Improving Stress and Positive Mental Health at Work via an App-Based Intervention: A Large-Scale Multi-Center Randomized Control Trial

Mobile health interventions (i.e., “apps”) are used to address mental health and are an increasingly popular method available to both individuals and organizations to manage workplace stress. However, at present, there is a lack of research on the effectiveness of mobile health interventions in counteracting or improving stress-related health problems, particularly in naturalistic, non-clinical settings. This project aimed at validating a mobile health intervention (which is theoretically grounded in the Job Demands-Resources Model) in preventing and managing stress at work. Within the mobile health intervention, employees make an evidence-based, personalized, psycho-educational journey to build further resources, and thus, reduce stress. A large-scale longitudinal randomized control trial, conducted with six European companies over 6 weeks using four measurement points, examined indicators of mental health via measures of stress, wellbeing, resilience, and sleep. The data were analyzed by means of hierarchical multilevel models for repeated measures, including both self-report measures and user behavior metrics from the app. The results (n = 532) suggest that using the mobile health intervention (vs. waitlist control group) significantly improved stress and wellbeing over time. Higher engagement in the intervention increased the beneficial effects. Additionally, use of the sleep tracking function led to an improvement in sleeping troubles. The intervention had no effects on measures of physical health or social community at work. Theoretical and practical implications of these findings are discussed, focusing on benefits and challenges of using technological solutions for organizations to support individuals’ mental health in the workplace.


INTRODUCTION
In recent decades, work-related stress has become increasingly prominent due to changes in working conditions (e.g., Kompier, 2002; Landsbergis, 2003; Broughton, 2010). Technology represents a potential solution to the issue of workplace stress, as smartphone technologies have become a common part of daily living (Bolier and Abello, 2014). Accessible, scalable, and providing return on investment (Ebert et al., 2014), mobile health interventions (i.e., mental health apps) may drive positive behavior change and are becoming increasingly available for individuals and organizations to manage mental health, both in terms of treatment and prevention programs. However, researchers, clinicians, and practitioners alike have identified both a paucity of empirical evidence for the effectiveness of mobile health interventions and a lack of congruence with scientific theories and guidelines (e.g., Webb et al., 2010; Donker et al., 2013; Bolier and Abello, 2014; Nicholas et al., 2015; Bakker et al., 2016; Howells et al., 2016; Mistretta et al., 2018).
The current project addresses this research gap by investigating the effectiveness of a science-backed mobile health intervention to manage stress and positive mental health at work. As part of a European Union Horizon 2020 research and innovation project, this paper describes findings of a large-scale, multi-center randomized control trial (RCT) to assess a smartphone-delivered health intervention to counteract workplace stress and increase wellbeing at the individual level within a naturalistic, occupational context.

Stress and Resilience in the Workplace
Workplace stress remains a common yet undertreated condition. If experienced over a prolonged period of time, stress can result in a variety of mental and physical health issues (Joyce et al., 2016), affecting both the individual and the organization. Concerning the former, excessive chronic workloads without time to rest can cause physical exhaustion, stress-related illnesses (Chandola et al., 2006), and psychological disorders such as depression (Hammen, 2005). As to organizations, this results in high costs. From 2008 to 2017, sickness absence days due to mental ill-health increased by 67.5% in Germany (Meyer et al., 2018). In the United Kingdom, in 2018, the Health and Safety Executive (2017) reported that stress, depression, and anxiety accounted for 15.4 million working days lost due to ill-health (57% of all days lost). Similar figures have been reported for other European countries (European Agency for Safety and Health at Work, 2014).
To counterbalance the risks that come with high levels of work stress, the concept of employee resilience has become increasingly important. Definitions of resilience vary, regarding it either as a rather stable trait that helps individuals to cope with difficulties and to attain good adjustment and development (e.g., Wagnild and Young, 1993; Schumacher et al., 2005; Ong et al., 2006), or as a state, suggesting a dynamic process, in which individuals actively adapt to and recover from major difficulties (e.g., Luthar and Cicchetti, 2000; Connor and Davidson, 2003; Fergus and Zimmerman, 2005; Windle, 2010). Taking the perspective that resources to counteract stress are adaptable, the Job Demands-Resources Model (JDR; Bakker and Demerouti, 2007; Schaufeli and Taris, 2014) describes stress as a response to an imbalance between the demands that work places on the individual and the resources a person has available to deal with those demands. Job demands describe "physical, social, or organizational aspects of the job that require sustained physical or mental effort and are therefore associated with certain physiological and psychological costs" (Demerouti et al., 2001, p. 501), for example, work overload, interpersonal conflicts, or job insecurity (Schaufeli and Taris, 2014). Job resources are defined as "those physical, psychological, social, or organizational aspects of the job that may do any of the following: (a) be functional in achieving work goals, (b) reduce job demands at the associated physiological and psychological costs; and (c) stimulate personal growth and development" (Demerouti et al., 2001, p. 501), resulting in higher employee resilience. If demands exceed resources, individuals experience a mental and physical health impairment process, leading to decreased energy and exhaustion. If the contrary is the case, a motivational process takes place, with an increase in work engagement and positive outcomes such as higher wellbeing and organizational commitment (e.g., Bakker et al., 2003; Schaufeli and Bakker, 2004; Xanthopoulou et al., 2007; Hakanen et al., 2008).

Digital Mental Health Interventions: Categorization, Scientific Foundation, and Validation
In order to tackle problems associated with stress in people's everyday lives, the mHealth sector, and with it, the availability of apps claiming to reduce stress, increase wellbeing, or improve mental health issues such as depression, is constantly growing (e.g., Muaremi et al., 2013; Research 2 Guidance, 2017; Statista, 2018). Such mobile health interventions can be classified into four categories (Muaremi et al., 2013): diaries (collection of subjective and aggregated data), guides (strategies to cope with the problems), relaxations (training of relaxation skills), and sensor measurements (sensor-based tracking of problem-associated behavior). Many stress and wellbeing apps include features from more than one of these categories. Mobile health interventions can provide personalized feedback and deliver summary statistics or progress scores, either via the app or transmitted through the app by a therapist or coach (e.g., Gaggioli and Riva, 2013; Ly et al., 2014; Heber et al., 2016; Firth et al., 2017). Research demonstrates that the use of scientific methods and a theoretical foundation positively influence the outcomes of technologically delivered health interventions and drive behavior change (e.g., Webb et al., 2010; Donker et al., 2013; Bolier and Abello, 2014).
Recent research has shown that there is a large discrepancy between using science to advertise mental health apps versus evaluating their effectiveness (Larsen et al., 2019). Few of the many mobile health interventions currently on the market are scientifically validated; as such, the quality of mental health apps differs widely regarding the level of scientific evidence (e.g., Bolier and Abello, 2014; Nicholas et al., 2015). Meta-analyses show that online interventions (e.g., internet-delivered cognitive behavioral therapy) can be equally as effective as traditional face-to-face interventions in treating psychological disorders (e.g., Barak et al., 2008; Carlbring et al., 2018). A comparable level of research on mental health apps is lacking. In the clinical domain, some apps have been successfully applied concerning the diagnosis and treatment of stress-related mental illnesses such as depression and anxiety (e.g., Watts et al., 2013; Birney et al., 2016; Ranjbartabar et al., 2016; Firth et al., 2017). Yet, there remains a need for more rigorous research on the effectiveness of mobile health interventions as prevention tools for mental health, particularly in the work context.

Preventative Mental Health Solutions in the Work Context
Whilst research on digital health interventions in the clinical domain is rapidly growing (Firth et al., 2017), research on preventative mental health solutions in the workplace is often overlooked. Although workplaces would constitute an ideal location for preventative programs, most organizations implement reactive measures targeting the symptoms of workplace stress (Deloitte, 2017). Previous workplace mobile health interventions are built on programs traditionally run in a clinical context, such as cognitive behavioral therapy (CBT) or mindfulness-based cognitive therapy. They often center around a 'virtual coach' or counselor guiding the user through content. Some pioneering studies found positive effects for smartphone interventions on workplace stress, for example, using Acceptance and Commitment Therapy (ACT; Ly et al., 2014) or CBT (Joyce et al., 2016). Initial results look promising; however, characteristics of the target population have to be considered when transferring tools to support mental health from a clinical domain to the workplace population (Joyce et al., 2016).
Mobile health interventions provide an opportunity for a proactive, preventative approach to employee mental health (e.g., Tan et al., 2014; Aryana et al., 2018). They are easily accessible to employees, enabling access to a service earlier than traditional methods, and therefore, preventing the onset of more severe mental health problems. A mobile preventative intervention approach may lend itself to higher levels of personalization than traditional workplace wellbeing support. Through a digital pathway, employees are able to take control of their individual journey by completing the intervention at their own pace, working on content that applies to them and their personal situation, and choosing a time that suits them while remaining anonymous (cf. Gaggioli and Riva, 2013; Ebert et al., 2016; Joyce et al., 2016). In sum, this creates an environment of high opportunity and low demand in which the desired behavior change can occur, leading to improved psychological outcomes (cf. Howarth et al., 2018). This may not only benefit the individual, but also the organization as a whole.
Programs specifically focusing on work-based problems often link to daily work routines (e.g., commuting or eating), and consist of targeted, short, simple, and easy to implement tasks, providing an opportunity to easily transfer learnings into daily work life (e.g., Gaggioli and Riva, 2013; Ebert et al., 2016; Joyce et al., 2016). Mobile health interventions benefit further when embedded into the work environment, for example, when downloaded onto an employee's personal technical equipment such as their work phone (Howarth et al., 2018; see also Martinez and Williams, 2010). Smartphones are commonly used among people of different age groups, socio-economic status, and cultural backgrounds; thus, app-based health interventions can be made accessible to a broad range of people in different workplaces. Often, users carry their smartphones almost everywhere and anytime (He and Li, 2013), providing a range of measurement and intervention opportunities, such as real-time symptom and activity monitoring, tracking of treatment progress, provision of personalized feedback and motivational support, portability and flexibility of use, and the potential to improve adherence to treatment (Donker et al., 2013). Furthermore, users may be empowered by the feeling of privacy and confidentiality of their engagement, if they only interact with an app on their personal phone, without connections to social networking sites (Bakker et al., 2016); still, data security policies need to be ensured.

The Current Research
Taken together, theory and research suggest that mobile health interventions may help to prevent mental health problems due to work-related stress, provided the intervention is grounded in theory, uses evidence-based techniques, and implements various behavior change strategies (e.g., based on self-monitoring or dual process theory). There is promise with regard to future science-backed mobile health interventions, as research in this domain is on the rise (e.g., Deady et al., 2018). Yet, there is still a lack of experimental research to validate app-based interventions to reduce stress at work.
The aim of this research was to examine whether a science-based health and wellbeing application, named "Kelaa Mental Resilience" and provided by Soma Analytics (London, United Kingdom), drives statistically and functionally significant improvements in validated measures of stress and wellbeing. The app is a digital tool that focuses on the prevention of mental ill-health, rather than a treatment thereof. It is theoretically grounded in the JDR (e.g., Demerouti et al., 2001; Bakker and Demerouti, 2007). The mobile health intervention has been developed specifically for the workplace and offers a combination of diaries, sensor measurements, and guides (see Muaremi et al., 2013). Details are provided in the paragraph Kelaa Mental Resilience App in the section "Materials and Methods." A previous version of the app was already shown to be effective for healthcare workers in a clinical work context based on a small-scale RCT (Mistretta et al., 2018).
Within the European Union's Horizon 2020 Research and Innovation Program, we conducted a large-scale longitudinal RCT to assess the intervention impact by comparing health-related outcomes between the app group and a waitlist control group. To increase external validity and generalizability of the findings, the trial was implemented at various organizations, offering a broad range of different jobs and work environments.
Indicators of stress (cognitive and general), wellbeing, resilience, and sleeping troubles served as dependent variables (DVs). Further, we analyzed whether the intervention impact was influenced by other factors, such as intensity of engagement with the app over time or trial site.
Based on the theory and research outlined above, we expected the following:
Hypothesis 1: Compared to the waitlist control, after using the app for 4 weeks, participants in the app group will report (a) lower levels of stress (cognitive and general), (b) higher levels of wellbeing, (c) higher levels of resilience, and (d) fewer sleeping troubles.
Hypothesis 2: The observed effects will be more intense the more the user interacts with the app throughout the duration of the study.
In addition to the a priori hypotheses, we explored two open research questions: First, does the app also affect other outcomes which are more remote to the content of the app, such as social community at work (i.e., a measure of the organizational climate) or physical health? And second, will the positive effects of the app persist after people cease using it? To examine potential long-term effects, we included a 2-week follow-up measurement occasion (i.e., a time point after the main part of the study when participants had stopped using the app).

Kelaa Mental Resilience App
The app, developed as a digital prevention tool, seeks to translate insights from scientific research on psychology, sleep medicine, and neuroscience into an action-based program. It draws on the tenets of clinical, health, positive, cognitive, biological, and social psychology to foster recovery and growth. "Kelaa" aims to reduce stress and increase wellbeing of the user, specifically in the workplace. Users learn new behaviors and best practices through different means, for example, based on CBT and mindfulness-based cognitive therapy. The app is designed to implement lifestyle changes through (1) measuring behavior, cognitions, and emotions (tracking module) and (2) providing psychoeducational content (intervention module).
Within the tracking module, users can track their stress, wellbeing, and resilience via short in-app questionnaires using validated scientific measures. The app also uses inbuilt sensors in smartphones (e.g., the accelerometer) to provide the opportunity to measure and track their sleep quality and quantity. Personalized feedback on questionnaire scores (e.g., what are my scores? What does this mean for me? What should I do about this?), as well as detailed feedback on sleep data (e.g., how do I interpret my sleep charts?), are given within the app.
In the intervention module, users access structured science-based content on factors contributing to reduced stress and improved wellbeing. "Kelaa" provides the user with evidence-based interventions grounded in current research, for instance, from sleep science, positive and social psychology, self-monitoring, CBT, and mindfulness. According to the Hedonic Adaptation Prevention Model, task variety is a prominent factor influencing the effectiveness of happiness interventions, especially in the long term. Engaging in a larger variety of exercises results in greater benefits from the intervention, causing an additional increase in wellbeing both in the short and long term (Sheldon et al., 2013). Building upon these findings, "Kelaa" offers users the opportunity to choose from a variety of intervention topics, based on their individual results from the tracking module and personal interest. This individual choice is also intended to stimulate higher intrinsic motivation. Then, the user journeys through self-selected goals on different content. Each goal includes six to seven "daily sessions" (each about 2-4 min to read), which are gradually unlocked. The goals aim to increase personal resources by providing information, exercises, and reflection. During each daily session, relevant research and expected benefits are outlined, before users are instructed, for example, in specific stress management and resilience techniques, while encouraging positive behavior transformation. The intervention module supports users to reach a variety of nominated goals. A detailed summary of all goals and sessions is provided in Table 1.

Participants
An a priori sample size calculation (G*Power; Faul et al., 2007), with a conservatively estimated main treatment effect of at least 5% for all parameters (small effect: Cohen's d = 0.28, Cohen's f = 0.14), assuming a significance level (alpha) of 5%, and a statistical power (1-beta) of 80%, resulted in a required sample size of N = 561 participants. Participants were recruited from six different European businesses in Germany, England, and Northern Ireland from the private and public sector. The complete data set consisted of N = 678 participants. At T1, n = 621 people completed the questionnaire. The number of participants dropped to n = 483 at T2, n = 396 at T3, and n = 363 at T4. A total of n = 301 (44.4%) people completed the questionnaires at all four measurement occasions, while n = 105 people completed three (15.5%), n = 99 two (14.6%), and n = 146 (21.5%) only one out of four measurement occasions (see section "Procedure" for details).
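The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' G*Power run (the exact test family they selected is not stated here); it only assumes the standard two-group relation f = d / 2 and recomputes the completion shares from the counts given above. The function and variable names are illustrative.

```python
# Illustrative cross-check of figures reported in the text.
# Assumption: two-group comparison, so Cohen's f = Cohen's d / 2.

def cohens_f_from_d(d: float) -> float:
    """Convert Cohen's d to Cohen's f for two equally sized groups."""
    return d / 2.0

# Counts from the text: participants by number of completed
# measurement occasions, relative to the complete data set (N = 678).
N_TOTAL = 678
completed = {4: 301, 3: 105, 2: 99, 1: 146}

# Share of the complete data set in percent, rounded to one decimal.
shares = {k: round(100.0 * n / N_TOTAL, 1) for k, n in completed.items()}

if __name__ == "__main__":
    print("f for d = 0.28:", cohens_f_from_d(0.28))
    print("completion shares (%):", shares)
```

Under these assumptions the conversion and the percentages agree with the values stated in the paragraph.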
As indicated by user metrics of the app, out of the n = 347 participants who were assigned to the app group, n = 137 people did not use the app at all, and thus, were excluded from the sample. Further, participants in the waitlist control group (n = 331) who downloaded the app before the end of the trial (n = 9) were also excluded. All participants who adhered to their assigned group (i.e., app group with at least one sign-in, waitlist control group with no app use) were included in the statistical analyses.
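The exclusion rule described above can be written out as a short calculation. This is a minimal sketch using only the counts given in the text; the dictionary layout and the function name are illustrative choices, not part of the study's materials.

```python
# Participant flow, using the counts reported in the text.
assigned = {"app": 347, "waitlist": 331}
excluded = {
    "app": 137,      # assigned to the app group but never used the app
    "waitlist": 9,   # downloaded the app before the end of the trial
}

def adherent_sample(assigned: dict, excluded: dict) -> dict:
    """Return per-group sample sizes after removing non-adherent participants."""
    return {group: assigned[group] - excluded[group] for group in assigned}

final = adherent_sample(assigned, excluded)
final_total = sum(final.values())

if __name__ == "__main__":
    print(final, final_total)
```

The resulting group sizes match the final sample reported for the analyses (210 in the app group, 322 in the waitlist control group, 532 in total).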
The final sample consisted of n = 532 participants with n = 210 in the app group and n = 322 in the waitlist control group. Participants were unevenly distributed across trial sites (site 1: n = 40, site 2: n = 78, site 3: n = 179, site 4: n = 61, site 5: n = 11, site 6: n = 163). Participants' age (M = 40.62, SD = 11.19) ranged between 17 and 72 years (based on n = 485 participants who shared their age). The gender distribution of the final sample was skewed with n = 119 (24.4%)

Procedure
This intervention study was conducted as a randomized control trial (RCT), following a longitudinal experimental design. Participation was voluntary. Data protection policies (i.e., GDPR) were strictly followed. Ethics approval was provided through the European Commission Horizon 2020 Ethics Appraisal Procedure (European Commission, 2018). Participants were blind to hypotheses and goals of the study, while HR managers were blind to each participant's group assignment. HR managers had no insight into questionnaire results, app use intensity, or other personal information.
The trial was conducted from January until September 2018. Launch days varied between the six companies. After signing up and giving their informed consent, participants were randomly assigned to one of two experimental conditions: app group vs. waitlist control. The trial was conducted over a period of 6 weeks. During the recruitment phase, employees were informed via email and information on a web portal that on launch day, they would receive an email containing a link to the first of the four assessments. Data were collected online using the survey software Qualtrics. Participants in the app group were asked to complete the first questionnaire prior to downloading and engaging with the app. Measurements (see section "Measures") of all participants (i.e., app group and waitlist control group) were taken at baseline (T1, week 0), mid-intervention (T2, week 2), end-intervention (T3, week 4), and two-week follow-up (T4, week 6). Invitations to the follow-up questionnaires as well as reminders were sent via email. The time frame to complete each questionnaire was restricted to seven days. After finishing T4, all participants were thanked and fully debriefed. They had the opportunity to provide feedback on their personal experiences with using the app and to suggest improvements.
The duration of the intervention was 4 weeks. Thus, participants in the app group could complete a maximum of 28 sessions and track a maximum of 28 nights. As people prefer to receive self-help support materials on a familiar medium (see Martinez and Williams, 2010), participants were offered the choice to use the app on their personal or their work phones, according to their individual preference. The extent of engagement with the app was left entirely to the user. Push-notifications were sent out as reminders, yet users had the option to turn them off. For the app group, active access to the intervention module within the app was withdrawn during the final 2 weeks, in order to get an indicator of whether and which gains from 4 weeks' use persisted (T4). Participants in the waitlist control group received no intervention and no tracking opportunity for the duration of the trial (6 weeks), yet they had unrestricted access to treatment as usual within their companies. Upon completion of the trial, participants in the waitlist control group received access to the "Kelaa" app.

Measures
Participants completed a series of questionnaires at all four measurement occasions (T1, T2, T3, and T4). At all times, reliability of the scales was good or excellent, as indicated by Cronbach's α (see Table 2 for descriptive statistics). Scales were presented in the order listed below. For each measurement occasion and scale, participants were instructed to answer the questions based on their experiences during the past 2 weeks. For each measure and each measurement occasion, the mean of the item responses was calculated as the scale score.
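For readers less familiar with the two quantities mentioned above, the following sketch shows how a mean scale score and Cronbach's α are conventionally computed. It uses the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the sum score) and a small set of hypothetical responses; it is not the study's data or scoring script.

```python
from statistics import mean, variance

def scale_score(responses):
    """Mean of one participant's item responses on a scale."""
    return mean(responses)

def cronbach_alpha(item_matrix):
    """Cronbach's alpha; rows are participants, columns are items."""
    k = len(item_matrix[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in item_matrix]) for j in range(k)]
    # Sample variance of the sum score across participants.
    total_var = variance([sum(row) for row in item_matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five participants to a four-item,
# five-point scale (values invented for illustration only).
toy_responses = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]

if __name__ == "__main__":
    print("scale scores:", [scale_score(r) for r in toy_responses])
    print("alpha:", round(cronbach_alpha(toy_responses), 3))
```

Values of α around 0.8 are commonly read as good and around 0.9 as excellent, which is the sense in which the reliability statement above is usually interpreted.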

Stress
Self-reported levels of stress were assessed with the two subscales General Stress (four items, e.g., "How often have you been stressed?") and Cognitive Stress (four items, e.g., "How often have you had problems concentrating?") from the Copenhagen Psychosocial Questionnaire -Revised Version (COPSOQ II;

Wellbeing
Subjective wellbeing was measured with the Warwick-Edinburgh Mental Wellbeing Scale (Tennant et al., 2007). Seven items (e.g., "I've been feeling relaxed.") were answered on a five-point scale (1 = none of the time; 2 = rarely; 3 = some of the time; 4 = often; 5 = all of the time). Higher values indicate more wellbeing.

Resilience
We assessed resilience with the 13-item Resilience Scale (RS-13; Leppert et al., 2008), a short form of the Resilience Scale (RS-25; Wagnild and Young, 1993). Items (e.g., "I usually take things in stride.") were answered on a seven-point scale (1 = strongly disagree to 7 = strongly agree). Higher values indicate more resilience.

Social Community at Work
Participants indicated their sense of cooperation and social community at work with the subscale Social Community at Work from the COPSOQ II (Pejtersen et al., 2010). Three items (e.g., "Do you feel part of a community at your place of work?") were answered on a five-point scale (1 = not at all; 2 = a small part of the time; 3 = part of the time; 4 = a large part of the time; 5 = all the time). Higher values indicate more sense of social community.

Sleeping Troubles
Participants were asked about sleeping troubles with the subscale Sleeping Troubles from the COPSOQ II (Pejtersen et al., 2010). Four items (e.g., "How often have you slept badly and restlessly?") were answered on a five-point scale (1 = not at all; 2 = a small part of the time; 3 = part of the time; 4 = a large part of the time; 5 = all the time). Higher values indicate more sleeping troubles.

Physical Health Impairment
We assessed participants' self-reported physical health levels with the SF-36 Version 2 (Jenkinson et al., 1999). Participants were asked to indicate their agreement with four items (e.g., "Because of your physical health, you were limited in the kind of work or other activities.") on a scale from 1 = none of the time to 5 = all of the time. Higher values indicate worse physical health.

Work Productivity and Activity Impairment
The Work Productivity and Activity Impairment Questionnaire: General Health V2.0 (WPAI:GH; Reilly et al., 1993) asked participants about the impact of health problems on their ability to work and perform regular activities over the past week. Three items (i.e., "During the past 7 days, how many hours did you miss from work because of your health problems?" "..., how many hours did you miss from work because of any other reason, such as vacation, holidays, time off to participate in this study?" "..., how many hours did you actually work?") were provided with open text fields. Two items asked for a rating on a bipolar 11-point scale (e.g., "..., how much did your health problems affect your productivity while you were working?"; 0 = Health problems had no effect on my work to 10 = Health problems completely prevented me from working). This scale was not analyzed further as part of this research, but is reported for transparency reasons.

Statistical Analyses
To obtain the final data set, we augmented the self-reported outcome measures at the four measurement points with the (self-reported) demographic information at baseline (i.e., age, gender), trial site, experimental group, and actual user behavior metrics from the app (i.e., number of daily sessions completed, number of nights tracked via the sleep tracker). Combining the self-reported DVs with predictors mirroring actual user behavior is a particular strength of the methodological approach of this study, as it eliminates the subjectivity in measuring compliance with the intervention by introducing an objective, fine-grained measure of app use intensity. The data were analyzed by means of multilevel modeling for repeated measures where measurements (Level 1) are nested within subjects (Level 2). Compared to repeated measures ANOVAs, multilevel analyses possess the advantage of being more robust with respect to missing data and underlying variance/co-variance assumptions (Hox, 2010). Additionally, particularly in the case of the current project, multilevel analyses potentially allow the addition of "trial site" as Level 3 (i.e., measurement occasions within subjects within trial sites). The recommended minimum is 30 cases per level (Hox, 2010). However, the study was conducted at only six trial sites. Thus, we could not include a third level in our analyses. Instead, the variable was disaggregated onto Level 2 (see controlled model), in order to assess the potential impact of higher order variables. The data were analyzed using the statistical programming language R (R Core Team, 2018) in combination with the nlme package (Pinheiro et al., 2018). Results were replicated with HLM 7 for Windows (Raudenbush et al., 2011).
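Multilevel software typically expects such data in "long" format, with one row per subject and measurement occasion and the Level 2 variables repeated on each row. The sketch below illustrates that layout with hypothetical records and field names (`id`, `group`, `sessions`, `stress`); it is written in Python for illustration and is not the authors' R code. Missed occasions simply produce no row, which is one reason multilevel models cope with incomplete repeated measures.

```python
# Hypothetical wide records: one dict per participant.
# None marks a measurement occasion the participant did not complete.
wide = [
    {"id": 1, "group": 1, "sessions": 12, "stress": [3.25, 3.0, 2.5]},
    {"id": 2, "group": 0, "sessions": 0, "stress": [3.0, None, 3.0]},
]

def to_long(records, outcome):
    """Reshape wide records into one row per subject and occasion."""
    rows = []
    for rec in records:
        for time, value in enumerate(rec[outcome]):
            if value is None:
                continue  # a missed occasion contributes no Level 1 row
            rows.append({
                "id": rec["id"],              # Level 2 unit (subject)
                "time": time,                 # 0 = T1, 1 = T2, 2 = T3
                "group": rec["group"],        # Level 2 predictor
                "sessions": rec["sessions"],  # Level 2 predictor from app metrics
                outcome: value,               # Level 1 outcome
            })
    return rows

long_rows = to_long(wide, "stress")

if __name__ == "__main__":
    for row in long_rows:
        print(row)
```

Participant 2 keeps two of three rows here instead of being dropped entirely, as would happen with listwise deletion in a repeated measures ANOVA.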
Using the notation of Raudenbush et al. (2011), the foundational, uncontrolled Level 1 model can be written as follows: within subject $i$, the outcome measurement $\mathrm{OUTCOME}_{ti}$ at measurement time $t$ depends on a baseline mean and a treatment effect according to the simple regression model

$$\mathrm{OUTCOME}_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{TIME}_{ti} + e_{ti}$$

for subject $i$; $e_{ti}$ is a measurement-specific residual assumed to be independently and normally distributed within subjects. TIME takes the values 0, 1, and 2, so that the intercept denotes the baseline outcome.
Within the framework of the hierarchical linear model, the coefficients at Level 1 become outcome variables at Level 2. The intercept and subject-specific treatment effect vary randomly across subjects ("random intercepts and slopes model") according to the regression models

$$\pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{GROUP}_{i} + r_{0i}$$

$$\pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{GROUP}_{i} + r_{1i}$$

Here, $\beta_{00}$ is the grand mean for the baseline outcome and $\beta_{10}$ is the average treatment effect (for one unit of TIME, i.e., per 2 weeks). $\mathrm{GROUP}_{i}$ is a dummy coded treatment contrast, with a value of 1 for the experimental group and 0 for the control group. $r_{0i}$ and $r_{1i}$ are subject-specific random effects that are independent of $e_{ti}$ and are assumed to have a bivariate normal distribution over subjects.
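To make the link between the two levels concrete, the sketch below substitutes the Level 2 equations into the Level 1 equation and evaluates the result for both groups. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study. With the dummy coding described above, the difference between the two groups' time slopes equals the coefficient on GROUP in the slope equation, which corresponds to the group * time interaction.

```python
def expected_outcome(time, group, beta, r0=0.0, r1=0.0):
    """Model-implied outcome for one subject (residual e_ti omitted)."""
    pi0 = beta["b00"] + beta["b01"] * group + r0  # Level 2: subject intercept
    pi1 = beta["b10"] + beta["b11"] * group + r1  # Level 2: subject time slope
    return pi0 + pi1 * time                       # Level 1: linear change over TIME

# Hypothetical fixed effects (illustration only, not study estimates).
beta = {"b00": 3.0, "b01": 0.1, "b10": -0.05, "b11": -0.15}

# Change per unit of TIME (2 weeks) in each group, with random effects at zero.
control_slope = expected_outcome(1, 0, beta) - expected_outcome(0, 0, beta)
app_slope = expected_outcome(1, 1, beta) - expected_outcome(0, 1, beta)

if __name__ == "__main__":
    print("control slope:", round(control_slope, 3))
    print("app slope:", round(app_slope, 3))
    print("difference in slopes:", round(app_slope - control_slope, 3))
```

In this toy setup the app group's outcome falls faster than the control group's by exactly the value assigned to `b11`, which is how a significant interaction in the results should be read.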
After establishing the basic treatment effect of the intervention (including T1, T2, and T3), the uncontrolled model was then extended in three distinct ways: (a) Controlled model: potentially confounding factors including age (normalized to mean 0), gender (effect coded: male = −1, female = 1), trial site (effect coded: trial site 6 = −1) were introduced as additional predictors on Level 2, (b) Intensity model: the categorical variable GROUP i was substituted by the numerical measures of app use intensity: sessions completed (values 0-28) and nights tracked (values 0-28), and (c) Follow-up model: the follow-up measurement T4 was included in a piecewise linear model.
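The predictor codings named in (a) and (c) can be sketched as follows. Mean centering and effect coding follow the description above (male = −1, female = 1; trial site 6 as the category coded −1). The piecewise time coding is an assumption: the text does not spell out how the two time segments were coded, so the sketch uses one common choice in which the first variable counts intervention periods and the second switches on at follow-up. All function names are illustrative.

```python
def center(values):
    """Center a numeric predictor at its mean (e.g., age)."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def effect_code_gender(gender):
    """Effect coding as described in the text: male = -1, female = 1."""
    return {"male": -1, "female": 1}[gender]

def effect_code_site(site, n_sites=6):
    """Effect codes for trial site; the last site is coded -1 on all columns."""
    if site == n_sites:
        return [-1] * (n_sites - 1)
    return [1 if site == s else 0 for s in range(1, n_sites)]

# ASSUMED piecewise coding (not stated in the text):
# first value  = intervention-phase time (T1 to T3),
# second value = follow-up phase (T3 to T4).
PIECEWISE_TIME = {"T1": (0, 0), "T2": (1, 0), "T3": (2, 0), "T4": (2, 1)}

if __name__ == "__main__":
    print(center([30, 40, 50]))
    print(effect_code_gender("female"), effect_code_site(3), effect_code_site(6))
    print(PIECEWISE_TIME["T4"])
```

Under this assumed coding, the slope on the second time variable would describe change between the end of the intervention and the follow-up, separately from change during the 4 weeks of app use.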

Basic Treatment Effect: Stress, Wellbeing, Resilience, and Sleeping Troubles Over Time
To test our hypotheses, we analyzed (1) between group differences, (2) changes over time, and (3) interaction effects of group and time. Results of the multilevel analyses are based on n = 532 participants (app group: n = 210; waitlist control group: n = 322) and measurement occasions T1 (baseline), T2 (mid-intervention), and T3 (post-intervention; see footnote 4). Detailed results of the basic treatment effects are displayed in Table 3.

Stress
The results revealed an improvement in the experience of general stress and cognitive stress with a continuous time-trend toward experiencing less stress (significant time slopes), as well as a significant difference between the two groups over time (significant group * time interactions). Persons in the app group experienced a greater decrease in both general and cognitive stress from T1 to T3 compared to the waitlist control group.

Wellbeing
Similarly, regarding wellbeing, there was a significant time-trend toward reporting more wellbeing over time (significant time slope). Participants in the app group compared to the waitlist control group reported significantly more wellbeing over time (significant group * time interaction).

Resilience
We could not confirm our hypothesis with respect to resilience. While there was indeed a significant time-trend toward reporting more resilience over time (significant time slope), the difference between the two groups over time was not significant (group * time interaction n.s.).

Sleeping Troubles
Self-reported sleeping troubles improved over time with a significant trend toward reporting reduced sleeping troubles (significant time slope). The difference between the two groups was not significant, even though descriptively, the app group showed a larger improvement (group * time interaction n.s.).

⁴ Controlled model (see Supplementary Material): All analyses were rerun with age, gender, and trial site as additional predictors (entered on Level 2). None of the basic treatment effects changed when we included the control variables. For the extended models, sample sizes may differ due to selectively missing data as indicated in the respective tables.

Intensity Model: Engagement With the App as a Predictor
We conducted a second set of analyses to examine whether app use intensity had an influence on changes in the DVs over time. We hypothesized that the more interaction with the app occurs over time, the larger the improvement. Specifically, we examined whether the number of completed sessions (see Table 4) and the number of tracked nights (see Supplementary Material for details) on the app predicted changes in the DVs over time. On average, participants in the app group completed M = 11.06 sessions (SD = 7.34, range: 1-28) and tracked M = 3.61 nights (SD = 6.11, range: 0-25). A substantial number of people in the app group (n = 111, 52.9%) did not track their sleep at all, and n = 22 (10.5%) only tried once. Confirming Hypothesis 2, the results revealed that the more sessions users completed, the less general and cognitive stress they reported over time (significant sessions * time interactions). The same applied to self-reported wellbeing: the more sessions users completed, the larger their increase in wellbeing over time (significant sessions * time interaction). In case of resilience, the interaction effect of sessions * time also became significant, while it had only been trending when experimental group was the predictor (see basic treatment effects).
Regarding sleeping troubles, the number of completed sessions was only a trend-significant predictor for an improvement, yet descriptively the results pointed in the expected direction. As the app offers the option to track sleep, and thus, focus specifically on creating awareness for and improving this health-related issue, we included the number of tracked nights instead of sessions as a predictor for sleeping troubles. Number of tracked nights turned out to be a significant predictor for the improvement in sleeping troubles over time (significant nights * time interaction, p < 0.001; see Supplementary Material for details).

Social Community at Work and Physical Health Impairment
Regarding the first open research question, the results revealed neither a significant time trend nor an intervention effect concerning participants' perception of their social community at work. The physical health impairment scale was included as a measure that should be insensitive to the mental health intervention. Indeed, we did not observe a change in physical health over time and no significant difference between the two groups. Social community at work and physical health impairment were also not affected by app use intensity (i.e., sessions completed or nights tracked). This suggests that only aspects of persons' mental health but not their physical health or perception of organizational climate were directly influenced by using the app.

Follow-Up Model: Effects of App Use at Follow-Up
Our second open research question aimed at examining the sustainment of effects after people stopped using the app in comparison to the waitlist control group. Thus, in a third step, we included the 2-week follow-up measurement occasion (T4) into our analyses. To analyze a potentially non-linear continuation into the follow-up trend, we incorporated two differently coded time predictors into the hierarchical analyses (time1: 0-1-2-2; time2: 0-0-0-1); additionally, group and the interaction terms (time1 * group and time2 * group) served as predictors for the DVs.
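The piecewise coding described above can be sketched as follows. The coefficient values are hypothetical and chosen only to illustrate the coding; they are not estimates from the study, and the names `TIME_CODES` and `predicted_mean` are introduced here for illustration.

```python
# Piecewise time coding for the four measurement occasions (T1..T4).
# time1 captures the linear trend during the intervention and stays at 2 afterward;
# time2 captures the additional change from T3 to the follow-up T4.
TIME_CODES = {
    "T1": (0, 0),
    "T2": (1, 0),
    "T3": (2, 0),
    "T4": (2, 1),
}


def predicted_mean(occasion, group, b):
    """Fixed-effects prediction for one occasion and group (1 = app, 0 = waitlist)."""
    time1, time2 = TIME_CODES[occasion]
    return (
        b["intercept"]
        + b["group"] * group
        + b["time1"] * time1
        + b["time2"] * time2
        + b["time1_x_group"] * time1 * group
        + b["time2_x_group"] * time2 * group
    )


# Hypothetical coefficients for a stress-like outcome (lower is better).
coef = {
    "intercept": 3.0, "group": 0.0,
    "time1": -0.1, "time2": -0.2,
    "time1_x_group": -0.2, "time2_x_group": 0.2,
}

for occasion in TIME_CODES:
    control = predicted_mean(occasion, 0, coef)
    app = predicted_mean(occasion, 1, coef)
    print(occasion, round(control, 2), round(app, 2))
```

Under this coding, the time2 coefficient is the T3-to-T4 change in the waitlist control group, and the time2 * group coefficient is the difference between the two groups in that change.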
The results revealed a significant interaction of group and time2 regarding general stress (beta = 0.24, SE = 0.07, p < 0.001) and wellbeing (beta = −0.15, SE = 0.07, p = 0.02), but not regarding any of the other DVs (all ps > 0.18). Remarkably, these findings indicate that while the effects remained stable in the app group, the control group significantly improved from T3 to T4 (see Figure 1 and see section "Discussion").

DISCUSSION
Systematic reviews and meta-analyses show that there are only a few scientifically validated mobile health interventions on the market, and even fewer applications designed for workplace interventions, which look at both individual mental health and overall organizational culture change (e.g., Zhao et al., 2016; Stratton et al., 2017; McKay et al., 2018). This RCT adds to the sparse research on preventative mental health solutions in the workplace. It was conducted in a 'naturalistic environment', as it was implemented in different work environments, in three countries, and across disciplines. By executing the trial as an experimental field study, and as such, including a large variety of workplaces and people, the current research bears high external validity. Our results are in line with recent research findings, suggesting that digital health interventions can improve mental health-related outcomes in a work context (e.g., Tan et al., 2014; Mistretta et al., 2018). Yet, they go beyond previous studies, as this large-scale RCT included multiple workplaces and addressed both positive mental health of individuals and indicators of organizational change. In addition to the core findings of this study, we also discuss aspects of user attrition and sustained engagement as important practical issues to address (cf. Howarth et al., 2018; see section "Practical implications"). Regarding individual mental health, the app was shown to be effective relative to a waitlist control group. Supporting our hypotheses, the results of this RCT indicate that using the app improves indicators of stress and wellbeing, and also, to a lesser extent, resilience. The results further suggest that within the app group, the positive effects remained stable after a period of 2 weeks without app usage. As indicated above, resilience comprises both a state and a trait component (cf. Windle, 2010).
Because the resilience measure used in the current research also incorporates trait-like aspects, improvements in resilience likely require more time to develop. Job resources, as defined in the JDR (Demerouti et al., 2001), can take more than 4 weeks to accrue and realize their full potential in stimulating personal growth. The actual content that users chose on the app⁵ (i.e., the goals and sessions they self-selected and completed within the interventional module) may further explain the specific effects on indicators of individual mental health outcomes (i.e., stress, wellbeing, resilience). Overall, we found the app to be effective in improving stress and wellbeing outcomes of all employees, regardless of their age, gender, and workplace.
To observe change processes in the organizations, we also examined the outcome variable 'social community at work' (i.e., a measure of the organizational climate). Based on anecdotal evidence from users in previous trials, we assumed that undergoing a joint wellbeing intervention may create some "common ground" among employees and increase group cohesiveness (Mistretta et al., 2018). Yet, no such effects manifested on the respective scale. However, we observed a significant improvement in general stress and wellbeing within the waitlist control group from T3 to T4. Since these improvements are confined to those outcome variables that responded to the intervention from T1-T3 (with cognitive stress trending into a similar direction), but did not materialize on the social community and physical health impairment scales, we suggest that it is unlikely that these changes took place only because of measurement effects. They appear to be neither random nor spontaneous. While we cannot provide a perfect explanation for this effect, we argue that effective aspects of the intervention may have spread across the organization after the active treatment phase had ended, potentially due to communication processes within the organizations. Previous research has shown that positive behavior changes can be contagious, for example, by communicating social norms (Goldstein et al., 2008). The current results are a first indicator that offering a digital health solution in the workplace might be beneficial not only for those who decide to make use of it, but also for their colleagues. Certainly, further research is needed to examine these potential carry-over effects over a longer time period.
The "Kelaa" app meets the criteria that have been recommended by previous researchers: It is grounded in theory (i.e., the JDR; Bakker and Demerouti, 2007), offers a variety of evidence-based interventions that target specific challenges that employees might face in the work context, and implements various science-backed behavior change strategies.
Our findings indicate that more intense use of the interventional module of the app increased the beneficial effects on stress and wellbeing, while specifically using the sleep tracking functionality can help to reduce sleeping troubles. This highlights the impact of self-monitoring and connects the study to a body of medical literature documenting diary-based approaches to tackle sleep impairment (Carney et al., 2012), in particular in the context of mHealth (Lorenz and Williams, 2017). Taken together, the current results support the assumption that building mobile health interventions on a theoretical foundation and using scientific methods bears great potential to positively influence mental health-related outcomes (cf. Webb et al., 2010;Donker et al., 2013;Bolier and Abello, 2014).

Practical Implications
Attrition in digital interventions is a challenge. Attrition rates can reach up to 64% in motivated, self-selected groups (Howells et al., 2016); however, they are usually lower in regular working samples (Payne et al., 2015; Howarth et al., 2018). This is a known barrier to workplace interventions. Developers of digital mental health interventions have been advised to implement a behavior change plan and an interactive framework to increase user engagement (Bakker et al., 2016). The Kelaa Mental Resilience App was developed based on these recommendations. Yet, we also observed moderate to high rates of non-adherence to the trial participation guidelines, in particular in the intervention group. We acknowledge that app-based interventions might not be suitable for everyone, which might have resulted in a self-selection mechanism over time. Broadening the scope of the content within the app while keeping the personalized approach for each individual might help to make it more applicable for a larger variety of users' needs.
Further, the relatively low compliance in questionnaire completion and the variance of completion rates between trial sites suggest that company-specific communication strategies may play a key role in user engagement. Strategies that make using the mobile health intervention a more social experience, and thus increase the sense of social community, might be beneficial in intensifying intervention impact (e.g., van Dick and Haslam, 2012). Prevention programs can be targeted universally across an organization, to individuals at high risk, or to individuals who show initial symptoms of risk (Cuijpers et al., 2012). Creating and implementing structured, targeted psychological interventions is more likely to be effective than generic stress management trainings. In line with previous research, we suggest that interventions should be targeted at both the individual and the organization, as the inclusion of team-based interventions can improve workplace stress (Tan et al., 2014). In general, the best outcomes may be achieved by providing the right type of intervention to the respective population, as this fit will likely influence the results (see Stratton et al., 2017).

Limitations and Future Research Directions
Despite our contribution to both research and practice, some limitations need to be noted. First, our a priori sample size calculation indicated that we would have needed more participants (N = 561 vs. n = 532). Further, we acknowledge the possibility of false positives due to multiple hypothesis testing. However, the effects that we found were large enough to be detected with the given sample size. Therefore, we conclude that the changes in the DVs, which originated from using the app, were large enough to be meaningful for employees' experiences of stress and wellbeing.
Second, the sample was rather heterogeneous, as described above. The over-representation of female participants as well as the relatively high educational standard in the sample is a reflection of the respective companies that participated in the trial. Despite the statistical precautions that were taken (see controlled model in the online Supplementary Material), the findings may not apply to other workplace populations. We recommend being cautious with over-generalizations of the findings.
Third, the scales used to assess the DVs varied in terms of their sensitivity to change. Thus, some may not have been sensitive enough to capture smaller changes that occurred, such as fluctuations in mood. For instance, the scale that was used to assess resilience targets rather stable (trait-like) features, while the scales for stress and wellbeing are more sensitive to short-term changes (states). This might have contributed to the less clear-cut effects for resilience. In addition, although previous studies have also shown a positive impact on stress reduction, this may not necessarily translate into an impact on work-related outcomes such as absenteeism (Joyce et al., 2016). More research in this area is needed.
Fourth, we did not conduct the study at enough trial sites, and we could not recruit sufficient participants within each site to include organization as a third level into our multilevel analyses. Thus, the questions remain whether we are able to generalize the findings across trial sites and whether it was more effective at certain sites compared to others. The results of the controlled model suggest that the treatment effect was present at all sites, regardless of differences in employees' stress and wellbeing baselines (see Supplementary Material). However, this finding should not be overgeneralized. Working with larger datasets by including more trial sites and associated context variables would allow a data-driven tailoring of the mobile health intervention to fit the individual needs of each organization and its employees.
Fifth, due to the personalization, and thus, the large variability in how people used the app, regarding both the tracking and the interventional module, we cannot conclude which of the elements contributed to the app being effective in reducing stress and increasing wellbeing. In order to identify the best combination of interventions and the most effective elements of the app, more data points would be needed. Analyses with larger data sets, which include detailed information on user metrics, could lead to more efficient interventions. To gain more differentiated insight, we support the call for better ways to assess the effectiveness of apps for health behavior change (McKay et al., 2018).
Last, due to the naturalistic nature of the RCT, there was a range of factors that could not be controlled for, and which therefore may have influenced the DVs over and above the intervention. Thus, at the cost of the high external validity, our study has rather low internal validity. While participants in both treatment and control group were instructed not to discuss contents of the study with members of the other group, it is difficult to enforce this constraint in practice. Both in the controlled and uncontrolled model, we witnessed significant effects of time, irrespective of experimental group, for stress, wellbeing, resilience, and sleeping troubles (significant time slopes). These effects were consistently smaller than the treatment effects, yet noteworthy. We suspect that they may result from perceived changes to the organizational context due to the sheer existence of the intervention. Recruitment efforts within the organizations, which included advertisements for an 'innovation and research project', likely have signaling effects for employees to notice that the employer is taking employee wellbeing seriously. Despite these limitations, the current findings suggest that there is a benefit to using technological solutions to mental health in the workplace to support organizations and their employees to thrive.

CONCLUSION
This paper presents evidence that a smartphone-based health intervention that is grounded in the JDR can decrease levels of perceived stress and sleeping troubles, and improve subjective wellbeing and resilience after 4 weeks, with sustained results at a 6-week follow-up. To the best of our knowledge, the current project offers the first large-scale, multi-center RCT using a theoretically sound smartphone application to manage and reduce stress in a work context, based on a sample of employees at various companies. The study therefore contributes to closing the empirical evidence gap concerning the effectiveness of app-based interventions to manage stress and positive mental health at work, in order to fulfill their potential as "enabler[s] of change" (Stevenson and Farmer, 2017, p. 7). While mental health professionals in traditional health care have an ethical obligation to provide patients with theoretically and empirically sound interventions (e.g., American Psychological Association, 2002; American Counseling Association, 2014), we argue that organizations should do the same for their employees. To reduce the costs of ill-health and keep organizations and their employees thriving, more research on effective solutions to positive mental health in the workplace is needed.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
Ethics approval was provided by an Independent Ethics Advisory Board (EAB) according to the European Commission Horizon 2020 Ethics Appraisal Procedure (European Commission, 2018), consisting of four independent ethics advisors from Germany and the United Kingdom. All members of the EAB either signaled "favorable opinion" or "favorable opinion with additional conditions." Ethics approval was required and obtained as per applicable institutional and national guidelines and regulations. Data protection policies according to GDPR guidelines were strictly followed. Participation in the study was voluntary. All participants gave their informed consent in written form.

AUTHOR CONTRIBUTIONS
SW conceptualized and designed the RCT and wrote the manuscript with valuable input from CL and NH. CL and NH implemented the study. CL organized the database and created the figures. SW and CL carried out the statistical analyses, interpreted the results, and created the tables. NH conducted a systematic literature search. All authors contributed to the manuscript revision, read, and approved the submitted version.

FUNDING
This project has received funding from the European Union's Horizon 2020 Research and Innovation Program under the grant agreement no. 725832. This publication was funded by the German Research Foundation (DFG) and the University of Wuerzburg in the funding program Open Access Publishing.