
BRIEF RESEARCH REPORT article

Front. Digit. Health, 02 January 2026

Sec. Digital Mental Health

Volume 7 - 2025 | https://doi.org/10.3389/fdgth.2025.1694666

Climbing the ladder: a ranking approach to burnout prediction

  • 1. Department of Innovative Technologies, Institute of Digital Technologies for Personalized Healthcare, University of Applied Sciences and Arts of Southern Switzerland, Lugano, Switzerland

  • 2. Faculty of Informatics, Università della Svizzera italiana, Lugano, Switzerland

  • 3. Institute of Computer Science, Faculty of Science, University of Bern, Bern, Switzerland

  • 4. Resilient SA, Lausanne, Switzerland

  • 5. Psy Bern AG, Bern, Switzerland

Abstract

Validated psychological assessment tools, such as the Shirom–Melamed Burnout Measure (SMBM), are essential for reliably assessing burnout. However, their reliance on active, self-reported input limits their suitability for continuous monitoring and early detection, and introduces the potential for human bias. The SMBM specifically targets the energy depletion component of burnout, with items organized into three subscales: Physical Fatigue (PF), Cognitive Weariness (CW), and Emotional Exhaustion (EE). In the present work, we investigate the feasibility of predicting burnout risk unobtrusively using the preceding trajectory of passive physiological data from wearable devices, supplemented by baseline demographic and occupational information. We evaluate classification, regression, and learning-to-rank formulations for the prediction of SMBM subscale scores on a 9-month real-world dataset of 239 workers, using both aggregate-based and sequential models. Binary classification yields modest performance [ROC AUC: PF (0.66), CW (0.67), EE (0.56)], and regression models offer negligible gains over naïve benchmarks. However, rank-based metrics suggest relative burnout severity can be partially inferred from wearable signals. Motivated by this, we propose a siamese recurrent neural network, explicitly tailored for sequential wearable data and optimized for pairwise risk estimation. Results show improved alignment with the ordinal nature of burnout scores for PF (Spearman’s ρ = 0.29, Normalized Discounted Cumulative Gain 0.93) and CW (Spearman’s ρ = 0.25, Normalized Discounted Cumulative Gain 0.89), whereas assessing EE may require additional modalities. Although real-world implementation remains challenging, ranking-based techniques could pave the way for more effective burnout risk monitoring.

1 Introduction

Burnout is classified in the ICD-11 as an occupational phenomenon resulting from chronic workplace stress that has not been successfully managed, characterized by energy depletion, increased cynicism toward one’s job, and reduced professional efficacy [1, 2]. Burnout is widespread across diverse professions, especially among healthcare workers [3–5], with rates rising sharply during the COVID-19 pandemic [6]. It shows distinct patterns between white- and blue-collar workers [7] and heightened vulnerability in shift workers [8], while also affecting broader segments of the population beyond the workplace [9]. Elevated burnout levels are frequently associated with somatic complaints such as chronic fatigue, gastrointestinal disturbances, and sleep disruptions [9], and have been linked to increased risk of type 2 diabetes [10], cardiovascular risk [11], and reduced vagal tone, a marker of impaired parasympathetic regulation [12]. Furthermore, burnout is closely tied to depression and anxiety, with meta-analytic evidence revealing substantial correlations with both [13], though the constructs are not isomorphic [13, 14]. Beyond individual health consequences, burnout also imposes significant economic costs on organizations and society at large [15, 16]. Given its high prevalence and broad impact, burnout requires urgent attention through effective prevention and early detection strategies, which are crucial for enabling timely intervention.

Effective prevention efforts rely on the accurate assessment of burnout. Validated self-report tools include the widely used Maslach Burnout Inventory (MBI), which evaluates emotional exhaustion, depersonalization, and reduced personal accomplishment [17]; the Shirom–Melamed Burnout Measure (SMBM), which captures burnout across cognitive weariness, emotional exhaustion, and physical fatigue [18, 19]; and the Copenhagen Burnout Inventory (CBI), which emphasizes fatigue and exhaustion as central symptoms [20]. While standardized self-report instruments remain the cornerstone for burnout assessment, they require individuals to actively initiate evaluation, consulting a healthcare professional, filling in questionnaires, and participating in interviews. Because these methods rely on individuals’ willingness and initiative to report their feelings and undergo assessment, detection may be delayed, particularly in the early stages of burnout when signs may be subtle. In this context, wearable devices present a promising, though still emerging, avenue for early identification and continuous monitoring of burnout risk [21, 22]. These devices, ranging from wristbands to smart rings, enable the unobtrusive collection of physiological and behavioral data, including cardiac metrics, sleep characteristics, and physical activity patterns [23, 24]. Growing evidence indicates that these signals are associated with mental health conditions and hold potential to support clinical decision-making in these areas [25]. In occupational contexts, research suggests that physiological indicators gathered through wearables may contribute to enhanced work performance and support the well-being of employees [21], although current implementations are still largely constrained [26].

Recent research has increasingly examined physiological markers of burnout that can be tracked through wearable devices. Sleep disruption is among the most consistent findings: individuals with burnout show fragmented sleep, reduced slow-wave and REM stages, lower sleep efficiency, and persistent daytime fatigue [27, 28], while prospective studies identify poor sleep quality and short duration as key contributors to burnout onset [29]. Physical activity has also been associated with burnout, with evidence suggesting that regular movement, both during work time and leisure time, may lower risk [30, 31]. Additionally, burnout has been associated with increased heart rate and reduced heart rate variability (HRV), reflecting heightened physiological stress responses and lower parasympathetic activity [12, 32, 33]. While wearable sensing in naturalistic settings faces inherent limitations in capturing correlates of stress in real time without contextual information [34, 35], recent deep learning methods that integrate time- and frequency-domain features across multiple physiological modalities have shown promising performance in detecting short-term stress in certain occupational contexts [36], though further validation on larger cohorts and unseen subjects is warranted. Moreover, we note that substantially higher accuracy in the measurement of HRV can be achieved under resting conditions, i.e., with overnight measurements [37]. Beyond physiological indicators, contextual and demographic factors also contribute meaningfully to burnout risk. The causal progression of burnout symptoms may differ between men and women [38], with several studies identifying gender as a significant predictor [9, 19, 39].

Resilience refers to an individual’s ability to maintain or regain mental well-being in the face of adversity. It has been shown to reduce the severity of burnout by protecting against emotional exhaustion and enhancing feelings of personal accomplishment [40, 41]. Building on this link, recent studies have begun modeling the dynamic relationship between resilience and mental health. Adler et al. [42] proposed a framework combining wearable sensors with ecological momentary assessments to detect markers of resilience, operationalized as individuals exhibiting minimal change in Patient Health Questionnaire-9 (PHQ-9) scores over time. Their density-estimation approach identified step count and sleep duration as key mobile-sensing features in high-stress contexts. More recently, large-scale longitudinal paradigms have sought to capture resilience processes over time by repeatedly monitoring exposure to both major life events and daily microstressors, alongside fluctuations in mental health [43]. Kalisch et al. [44] introduced a normative modeling approach that infers resilience from an individual’s residuals from the expected relationship between stressor exposure and mental health symptoms. However, these designs rely heavily on frequent self-reporting, and replacing self-reports with objective features may offer a less burdensome alternative [44]. An early attempt [45] used the same resilience definition as [44] in a machine learning framework to predict individual resilience scores from both daily-life psychological and physiological data. While psychological-based models performed moderately well, physiological-based models underperformed. Overall, whether predicting burnout directly or via resilience-based proxies, models based solely on physiological data continue to face significant challenges. Most studies report limited or negative results [26], with only a few achieving above-chance performance [42, 46], highlighting the need for further methodological advancement.

In this work, we explore fully unobtrusive [47] burnout prediction by estimating burnout risk using only passively collected data, without requiring active user input. We predict burnout scores based on prior trajectories of daily physiological markers, considering both aggregated features and sequential architectures, enriched with demographic and occupational data collected at baseline, assuming no access to prior burnout assessments. We begin by outlining the performance and limitations of conventional classification and regression approaches. Then, drawing from the observation that both burnout and resilience are often defined in relative terms, either in terms of deviation from the norm [43–45], from stress-resilient outcomes [42], or via population percentiles [48], we propose a shift toward a ranking-based formulation of burnout risk prediction. To the best of our knowledge, this is the first study to explicitly frame the task within a learning-to-rank paradigm [49]. Models are trained to learn the relative severity of burnout across individuals, focusing on the correct ordering rather than predicting absolute values or categories. To implement this, we propose a siamese recurrent neural network [50], drawing inspiration from the RankNet approach [51].

2 Materials and methods

2.1 The dataset

The data analyzed in this study were collected as part of the “Risk Identification and Prevention of Work-Related Stress Disorders” (WRSD) project. The study design adhered to the Declaration of Helsinki guidelines [52] (protocol number 628-2023-SPER-AUSLBO, Comitato Etico Area Vasta Emilia Centro, Italy; Clarification of Responsibility Req-2023-00377, SwissEthics, Switzerland). The detailed methodology, participant characteristics, and adherence analysis of the WRSD protocol are thoroughly described in another paper by the authors [53]. Here, we summarize all the relevant characteristics of the original dataset and the data collection protocol for the present work. The cohort considered comprised office and production workers from Italy and Switzerland, monitored over a 9-month period (mean SD age years; % female). Each participant was provided with a Garmin Venu Sq smartwatch and a custom mobile application, enabling continuous monitoring of physiological parameters. The devices captured minute-level heart rate, 15-min activity epochs, 3-min stress and breathing rate, and detailed sleep metrics, including sleep composition and nightly summaries. Burnout assessments were administered monthly within the application using the -item version of the Shirom-Melamed Burnout Measure [10], evaluating physical fatigue (PF), cognitive weariness (CW), and emotional exhaustion (EE) components. The survey consisted of questions asking participants to report how often, over the past 30 days, they experienced specific feelings, using a -point Likert scale ranging from (“Never or almost never”) to (“Always or almost always”); the resulting component scores and the overall score also ranged from to . At baseline, alongside their initial baseline SMBM assessment, participants also completed socio-demographic and work-related questionnaires.

2.2 Window extraction and feature representation

Since each SMBM survey prompts participants to reflect on their experiences over the preceding 30 days, in this work we base our predictions on physiological data collected from the 20- to 30-day window preceding each assessment. Shorter windows are used when the previous SMBM submission occurred less than 30 days earlier, ensuring no temporal overlap. Baseline assessments are excluded, as they lack prior physiological data and were modeled separately [46]. Each window is required to include at least 10 adherent days and 10 adherent nights. This requirement is guided by recent literature showing that accurate estimates of monthly sleep metrics can be obtained from 10 daily observations [54]. We define adherent days as those with 70% of device wearing time, and adherent nights as nights with non-manually reported sleep summaries from the device, restricted to primary sleep sessions lasting at least two hours. The window extraction process is illustrated in Supplementary Figure S1.
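The window rules described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `extract_window`, the constants, and the boundary conventions (the window ends the day before the assessment and may start on the day of the previous submission) are assumptions, and the lists of adherent days and nights are taken as given.

```python
from datetime import date, timedelta

MAX_WINDOW_DAYS = 30  # SMBM items refer to the preceding 30 days
MIN_WINDOW_DAYS = 20  # shortest window length mentioned in the text
MIN_ADHERENT = 10     # minimum number of adherent days and of adherent nights


def extract_window(assessment_day, previous_assessment_day, adherent_days, adherent_nights):
    """Return (start, end) of the window before an assessment, or None if rejected."""
    end = assessment_day - timedelta(days=1)
    start = assessment_day - timedelta(days=MAX_WINDOW_DAYS)
    # Shorten the window so that it does not overlap the previous one (assumed convention).
    if previous_assessment_day is not None:
        start = max(start, previous_assessment_day)
    if (end - start).days + 1 < MIN_WINDOW_DAYS:
        return None
    n_days = sum(start <= d <= end for d in adherent_days)
    n_nights = sum(start <= d <= end for d in adherent_nights)
    if n_days < MIN_ADHERENT or n_nights < MIN_ADHERENT:
        return None
    return start, end
```

With an assessment on 31 March and no earlier submission, the sketch returns the 30-day span from 1 to 30 March; a submission 25 days earlier shortens it to 25 days, and one 15 days earlier leads to rejection.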

Given the relatively limited amount of labeled data, we restrict our modeling to a focused subset of time-varying wearable-derived features that have demonstrated associations with burnout in prior research [27–29, 31, 32]. Within the sleep domain, we include total sleep time, time spent awake after sleep onset, and restorative sleep percentage, defined as the combined proportion of deep and REM sleep. For cardiac parameters, we select a daily and a nocturnal feature: median heart rate while awake and resting heart rate, i.e., the lowest 30-min moving average while sleeping. Since direct HRV measures are not available, we incorporate the mean overnight Garmin stress score as a proxy [55]. To summarize daily activity patterns, we consider the percentage of time spent in a sedentary state. We also incorporate a subset of time-invariant features from the onboarding socio-demographic and occupational assessments: gender [38], age, BMI, relationship status (defined as being in a stable relationship or not), and binary occupational variables such as seniority, work type (production/office), and engagement in shift work [8]. For time-varying features and SMBM assessments, we compute the intra-class correlation coefficient (ICC) [56], which quantifies the proportion of variance attributable to between- vs. within-person differences, with values closer to 1 indicating greater between-person variability.

2.3 Burnout prediction

All models are trained and evaluated using stratified (nested) 10-fold cross-validation, with subject-wise splits [57], and stratification achieved by binning the target scores into quartiles. Throughout all experiments, SMBM components (PF, CW, and EE) are treated as distinct outcomes and predicted independently.
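One possible way to build subject-wise folds that are also stratified by score quartile is sketched below. The text does not specify how a participant with several windows is assigned to a quartile, so the use of the per-subject mean score is an assumption, as are the function name `subject_wise_stratified_folds` and the round-robin assignment.

```python
import random
from collections import defaultdict
from statistics import mean, quantiles


def subject_wise_stratified_folds(subject_ids, scores, n_folds=10, seed=0):
    """Map each subject to one fold so that all of its windows stay together."""
    per_subject = defaultdict(list)
    for sid, score in zip(subject_ids, scores):
        per_subject[sid].append(score)
    # Assumption: a subject is stratified by the mean of its window scores.
    subject_score = {sid: mean(vals) for sid, vals in per_subject.items()}

    # Quartile cut points over subjects (needs at least two subjects).
    q1, q2, q3 = quantiles(subject_score.values(), n=4)
    bins = defaultdict(list)
    for sid in sorted(subject_score):
        value = subject_score[sid]
        bins[(value > q1) + (value > q2) + (value > q3)].append(sid)

    # Shuffle within each quartile, then deal subjects round-robin into folds.
    rng = random.Random(seed)
    fold_of, counter = {}, 0
    for quartile in sorted(bins):
        members = bins[quartile]
        rng.shuffle(members)
        for sid in members:
            fold_of[sid] = counter % n_folds
            counter += 1
    return fold_of
```

Because the fold is assigned per subject, no participant contributes windows to both the training and the test side of a split, which is the property the subject-wise design is meant to guarantee.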

We consider two modeling approaches to predict burnout from physiological data. The first relies on aggregating features across the extracted time windows, similarly to what was done in [45, 46]. For each time-varying feature, we compute a set of summary statistics, including the mean, standard deviation, 25th and 75th percentiles, skewness, kurtosis, and the number of zero-crossings. These aggregated features, combined with time-invariant variables, are used as inputs to traditional machine learning models. To mitigate dimensionality issues, recursive feature selection [58] is performed to identify a subset of 15 features. As a second approach, we model the temporal dynamics of physiological features within each window using Gated Recurrent Unit (GRU) recurrent neural networks (RNNs) [59]. Daily observations serve as time steps; sequences of varying lengths are handled by padding shorter windows and ignoring the padded time steps. Each sequence of daily observations is encoded into a fixed-length representation, which is concatenated with time-invariant features. This joint representation is passed through a feedforward neural network (FFN) with a single hidden layer and dropout to produce the final scalar output. The network described is represented in the upper branch of Figure 1. Training is performed using the Adam optimizer [60] for up to 100 epochs, with early stopping based on validation loss and a patience of 10 epochs.
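The aggregate-based representation can be illustrated with a small standard-library sketch. Several details are assumptions: the percentile interpolation rule, the use of population moments for skewness and excess kurtosis, and counting zero-crossings on the mean-centred series (a raw heart-rate or sleep series never crosses zero). Missing days are encoded as `None` and skipped, so the aggregates use observed values only.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(series):
    """Summary statistics of one daily feature within one window."""
    values = [v for v in series if v is not None]  # observed days only
    n = len(values)
    mu = sum(values) / n
    centered = [v - mu for v in values]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    std = math.sqrt(m2)
    ordered = sorted(values)
    # Sign changes of the mean-centred series (assumed definition of zero-crossings).
    signs = [c > 0 for c in centered if c != 0]
    crossings = sum(a != b for a, b in zip(signs, signs[1:]))
    return {
        "mean": mu,
        "std": std,
        "p25": percentile(ordered, 25),
        "p75": percentile(ordered, 75),
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / std ** 4 - 3 if std > 0 else 0.0,
        "zero_crossings": crossings,
    }
```

Applying `summarize` to each of the time-varying features yields seven aggregates per feature, which would then be concatenated with the time-invariant variables before feature selection.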

Figure 1

Diagram showing a neural network architecture with two sequences merging into a feed-forward network (FFN). The sequences start with nodes labeled \( h_{i,0} \) and \( h_{j,0} \), which process inputs \( x_{i,1} \) to \( x_{i,t_i} \) and \( x_{j,1} \) to \( x_{j,t_j} \), respectively. Each sequence then connects to circles labeled \( z_i \) and \( z_j \), each influenced by inputs \( x_{i,\mathrm{inv}} \) and \( x_{j,\mathrm{inv}} \). The outputs \( \hat{y}_i \) and \( \hat{y}_j \) are fed into a function \( h(\hat{y}_i - \hat{y}_j) \).

Pairwise ranking architecture for sequences \( i \) and \( j \). Each sequence is processed by a recurrent encoder with shared weights. The encoder takes as input the time-dependent features \( x_{i,1}, \dots, x_{i,t_i} \) and \( x_{j,1}, \dots, x_{j,t_j} \), and computes a series of hidden states \( h_{i,t} \), \( h_{j,t} \). The final hidden states \( h_{i,t_i} \), \( h_{j,t_j} \) are concatenated with the corresponding time-invariant features \( x_{i,\mathrm{inv}} \), \( x_{j,\mathrm{inv}} \), and passed through fully connected layers to produce scalar scores \( \hat{y}_i \) and \( \hat{y}_j \). These scores are then compared using a comparison function \( h \) to learn the relative ordering between the two instances.

We first evaluate both modeling strategies within standard classification and regression frameworks. For classification, burnout scores are dichotomized into “low” and “high” risk categories according to SMBM cut-off values (CW: 2.83, EE: 2.75, PF: 3.5) provided in [48]. In regression, the models are tasked with predicting the continuous SMBM component scores directly. We evaluate a range of machine learning algorithms: for classification, we employ Logistic Regression, Linear and Quadratic Discriminant Analysis, and Random Forest; for regression, we use Lasso and Ridge regressors, Elastic Net, and Random Forest. For sequential modeling, the architecture is held constant and only the loss function is adapted to the task. Classification models are trained to minimize binary cross-entropy and are evaluated using ROC-AUC and F1 score of the high-risk class. Regression models are trained to minimize mean squared error and are evaluated using mean absolute error (MAE) and Spearman’s ρ [61]. The hyperparameter space explored for each model can be found in Supplementary Table S2.
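To make the distinction between the two regression metrics concrete, the sketch below computes MAE and Spearman's rank correlation with the standard library only. The helper names are illustrative, and ties are handled with average ranks, which is the usual convention but an assumption with respect to the study's tooling.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A predictor that is systematically offset, for example always two points too high, has a large MAE but a perfect rank correlation, which is why the two metrics can tell different stories about the same model.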

We then frame burnout prediction as a ranking task. Learning to rank is a supervised machine learning framework that has evolved primarily within information retrieval systems but represents a general paradigm for problems where the primary objective is the accurate ordering of instances rather than absolute prediction [49, 62]. In ranking frameworks, training data consist of sets of items with partial ordering relationships specified between items, enabling models to learn relative preferences. In our context, ranking is expressed as a single-query task, as it seeks to determine the relative burnout risk ordering among individuals within the given population. Unlike classification approaches, ranking eliminates the need to select potentially arbitrary thresholds for distinguishing between risk categories. Moreover, regression can be seen as the most basic form of ranking, i.e., point-wise ranking, as it treats each instance independently, ignoring the relational structure inherent in comparative assessments. Instead, in a pairwise ranking approach, models are trained on pairs of instances with the objective of learning their relative order. This setup is commonly framed as a binary classification task: for each pair \( (i, j) \), the model predicts whether \( i \) should be ranked higher than \( j \). This is typically achieved by defining a shared scoring function that maps each instance to a real-valued score, and then comparing the scores \( \hat{y}_i \) and \( \hat{y}_j \) with a function \( h \). RankNet [51] is a seminal pairwise algorithm that formalizes this process: a feed-forward network computes the scores \( \hat{y}_i \) and \( \hat{y}_j \), and their difference is passed through a sigmoid to yield the probability that \( i \) should be ranked above \( j \), with training based on binary cross-entropy loss (BCE). Building on the RankNet framework, we employ a siamese recurrent neural network [50, 59] specifically designed for ranking longitudinal data. Each branch of the siamese network, with weights shared across the branches, mirrors the GRU-based sequential model described earlier. Given two instances \( i \) and \( j \), the network computes scores \( \hat{y}_i \) and \( \hat{y}_j \), which are compared to infer their relative ordering. We report an intuitive schematic of the proposed architecture in Figure 1. This approach allows us to leverage the strengths of pairwise ranking, namely the avoidance of thresholds and the focus on relative ordering, while modeling temporal patterns in the data. We restrict training to pairs of windows where the absolute difference in the burnout scores is at least 0.5, acknowledging that small differences in self-reported scores may not be significant. We experiment with two different loss functions: BCE, as used in RankNet [51], and the Margin Ranking Loss (MRL), a margin-based alternative commonly used in learning-to-rank tasks:

\[ \mathcal{L}_{\mathrm{MRL}}(\hat{y}_i, \hat{y}_j, y_{ij}) = \max\bigl(0,\; -y_{ij}\,(\hat{y}_i - \hat{y}_j) + m\bigr), \]

where \( y_{ij} \) denotes the ground-truth ordering: \( y_{ij} = 1 \) if instance \( i \) should be ranked above \( j \), and \( y_{ij} = -1 \) otherwise. The margin hyperparameter \( m \), set to 0.5 in our experiments, defines the desired separation between the scores of correctly ranked pairs.
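The pair selection and the two pairwise losses can be sketched in plain Python as follows. This is an illustration under assumptions, not the training code of the study: the function names are hypothetical, the scores passed to the losses stand in for the outputs of the two network branches, and the margin loss follows the common formulation max(0, -y (s_i - s_j) + margin) with y in {+1, -1}.

```python
import math
from itertools import combinations


def make_pairs(scores, min_gap=0.5):
    """Return (i, j, y) for pairs whose scores differ by at least min_gap.

    y is +1 if instance i should be ranked above instance j, and -1 otherwise.
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        if abs(scores[i] - scores[j]) >= min_gap:
            pairs.append((i, j, 1 if scores[i] > scores[j] else -1))
    return pairs


def margin_ranking_loss(s_i, s_j, y, margin=0.5):
    """Zero once the correctly ordered scores are separated by at least the margin."""
    return max(0.0, -y * (s_i - s_j) + margin)


def ranknet_bce_loss(s_i, s_j, y):
    """Binary cross-entropy on sigmoid(s_i - s_j); the target is 1 when y = +1, else 0."""
    z = y * (s_i - s_j)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

One practical difference between the two losses is visible here: the margin loss stops penalising a pair once it is ordered correctly with enough separation, whereas the cross-entropy keeps pushing the score difference further apart.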

A detailed description of the training procedure for the siamese ranking approach is reported in Supplementary Algorithm 1.

To assess the effectiveness of ranking models, we employ Spearman correlation [61] and the Normalized Discounted Cumulative Gain (NDCG) [63]. NDCG is a widely used position-sensitive metric in information retrieval that evaluates ranked lists by rewarding highly relevant items appearing earlier, with their contribution discounted logarithmically based on position. Formally:

\[ \mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}, \qquad \mathrm{DCG} = \sum_{k=1}^{n} \frac{rel_k}{\log_2(k+1)}, \]

where \( rel_k \) is the ground-truth relevance score of the item at rank \( k \), \( n \) is the number of ranked items, and IDCG is the DCG of the ideal ranking.
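A minimal NDCG computation is sketched below, assuming linear gain (the burnout score itself is used as relevance) and a base-2 logarithmic discount; other gain functions exist, and the source does not state which one was used. The helper `random_ranker_ndcg` is a hypothetical illustration of a random-ranking reference obtained by averaging over shuffled orderings.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(k + 1) for k, rel in enumerate(relevances, start=1))


def ndcg(true_scores, predicted_scores):
    """NDCG of the ordering induced by predicted_scores (highest first)."""
    order = sorted(range(len(true_scores)), key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_scores[i] for i in order]
    ideal = sorted(true_scores, reverse=True)
    return dcg(ranked) / dcg(ideal)


def random_ranker_ndcg(true_scores, n_rankings=1000, seed=0):
    """Average NDCG over random orderings, as a chance-level reference."""
    rng = random.Random(seed)
    ideal = dcg(sorted(true_scores, reverse=True))
    total = 0.0
    for _ in range(n_rankings):
        shuffled = list(true_scores)
        rng.shuffle(shuffled)
        total += dcg(shuffled) / ideal
    return total / n_rankings
```

With strictly positive relevance scores, even a fully reversed ranking keeps a fairly high NDCG (about 0.79 for the scores 3, 2, 1), so NDCG values are best read against a random-ranking reference rather than against zero.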

3 Results

After excluding individuals with minimal data contribution (<14 cumulative days), 206 participants were retained for analysis. These participants provided a total of 579 valid SMBM responses, representing 28% of the expected entries. Following the exclusion of baseline assessments and the application of the adherence criteria detailed in Section 2, a final set of 258 valid windows from 129 participants was selected for modeling. Of these, 237 spanned the full 30-day period, while the remainder ranged from 20 to 29 days. Average night and daily adherence were 82.7% and 66.4%, respectively. Missing values in time-varying features were imputed using the within-window median, while feature aggregates were computed exclusively from non-imputed data. Supplementary Table S1 presents the ICC values for physiological features and SMBM assessments. SMBM scores displayed high between-person variance, with ICCs ranging from 0.65 to 0.71, supporting its temporal stability [64]. Physiological features related to sleep, activity, and stress showed lower ICCs, reflecting greater within-person variability, while cardiac features showed higher between-person variability. Burnout risk scores were derived from participants’ SMBM responses by computing individual scores for each SMBM subscale and calculating the overall burnout score as their weighted average [10, 18]. Across the analyzed windows, the mean  SD scores were for PF, for CW, for EE, and for the overall burnout score. Modeling the overall burnout score directly, either as a composite metric or via joint modeling of the three subscales, consistently underperformed compared to modeling each component independently. Accordingly, we focus on results from separate subscale modeling. All statistical tests in the following sub-sections refer to one-sided Wilcoxon signed-rank tests, with a significance level .

3.1 Classification

Table 1 presents the ROC-AUC and F1-scores for the high-risk class across ten outer folds of nested cross-validation. As a baseline, we include a biased random guesser (BRG) that samples outcomes based on the empirical label distribution. The best performances obtained for the PF and CW components are comparable, significantly surpassing the baseline, with the GRU model achieving the highest scores by a small margin. Median ROC-AUCs for GRU are 0.66 (PF) and 0.67 (CW), with corresponding median F1-scores of 0.67 and 0.62. For EE, all models perform worse; logistic regression achieves the highest ROC-AUC (0.56), but does not significantly outperform BRG, and F1-scores remain uniformly low.

Table 1

Model PF CW EE
ROC-AUC F1 ROC-AUC F1 ROC-AUC F1
BRG
LR
LDA
QDA
RF
GRU

Median (p25–p75) ROC-AUC and F1-scores for binary classification of burnout risk across the three SMBM components.

BRG, biased random guesser; LR, logistic regression; LDA, linear discriminant analysis; QDA, quadratic discriminant analysis; RF, random forest; GRU, gated recurrent unit neural network.

Best results for each component and metric are specified in bold.

Represents significant improvement with respect to BRG.

3.2 Regression

Table 2 shows MAE and Spearman’s ρ, aggregated across outer folds and 10 random seeds. MAE improvements over a naïve predictor (always predicting the population mean for each subscale) were minimal and not statistically significant. For PF, the GRU model achieved the highest median (p25, p75) Spearman’s ρ at 0.25 (0.11, 0.40), and for CW, ElasticNet reached 0.28 (0.11, 0.38); both results were significantly greater than zero. EE remained the most challenging, with Ridge regression yielding the best result, though not statistically distinguishable from zero.

Table 2

Model PF CW EE
MAE Spearman’s MAE Spearman’s MAE Spearman’s
Naïve
Lasso
Ridge
ElasticNet
RF
GRU

Median (p25–p75) Mean Absolute Error (MAE) and Spearman’s ρ for regression prediction of burnout severity across the three SMBM components.

Best results for each component and metric are specified in bold.

Represents significant difference from zero.

3.3 Ranking

Table 3 summarizes Spearman’s ρ and NDCG across outer folds and 10 random seeds, assessing models’ ability to rank participants by burnout severity. For comparison, we include the average NDCG of a random ranker, computed from 1,000 random rankings per outer fold, and a baseline ranker, i.e., a feed-forward network using only time-invariant features, to assess the added value of wearable-derived data. Notably, the results from the point-wise ranking approach match those obtained from regression, given their equivalence in this context. For the siamese GRU model (S-GRU), we report results using both BCE (S-GRUBCE) and MRL (S-GRUMRL) losses. The latter yields the best predictive performance for the PF and CW subscales, with median (p25, p75) Spearman’s ρ values of 0.29 (0.11, 0.46) and 0.25 (0.03, 0.42), respectively, and corresponding NDCG scores of 0.93 (0.90, 0.95) and 0.89 (0.85, 0.92), significantly outperforming both the random and baseline rankers. For the EE subscale, the highest performance is achieved by the baseline model using only time-invariant features, though this result is not significantly better than random.

Table 3

Model PF CW EE
Spearman’s NDCG Spearman’s NDCG Spearman’s NDCG
Random
Baseline
GRU
S-GRUBCE
S-GRUMRL

Median (p25–p75) Spearman’s ρ and Normalized Discounted Cumulative Gain (NDCG) for ranking prediction of burnout risk across the three SMBM components.

Best results for each component and metric are specified in bold.

Represents significant improvement with respect to the random ranker.

Represents significant improvement with respect to the time-invariant baseline.

4 Discussion

When framing the burnout prediction task as binary classification, our findings are broadly consistent with those of our previous work [46], where baseline burnout levels were predicted based on subsequent data, effectively reversing the temporal structure compared to the approach proposed here. We hypothesize that such reversal has limited impact due to burnout’s considerable temporal stability [64]. Relying solely on unobtrusive features, binary classification yields modest predictive performance: PF and CW subscales are captured more accurately than EE, with our best-performing models achieving comparable median ROC AUC for PF (0.66 vs. 0.68), and slight improvements for CW (0.67 vs. 0.61) and EE (0.56 vs. 0.53), though EE performance remains insufficient for practical use, as it is not significantly better than a random baseline. Moreover, as also highlighted by [65], dichotomizing continuous scores based on predefined thresholds can obscure meaningful variation by discarding the underlying ordinal structure and weakening alignment with psychometrically validated survey instruments. This is avoided when framing burnout prediction as a regression task; however, our experiments show that improvements in mean absolute error over a naïve predictor are minimal and not statistically significant. A more nuanced picture emerges when evaluating models using Spearman’s ρ: while absolute predictions remain imprecise, they nonetheless capture meaningful relative structure in the data. This pattern is consistent with recent findings in short-term perceived stress estimation [65], where machine learning regressors failed to outperform baseline models on Symmetric Mean Absolute Percentage Error (SMAPE) but achieved better alignment in terms of rank correlation. Building on these insights, when the goal is to assign scores that reflect individuals’ relative position in terms of burnout risk, rather than to recover exact questionnaire values, explicitly modeling the relative order may be more appropriate. To test this, we reframed burnout prediction as a ranking task and evaluated whether models could capture the ordinal structure of the target scores more effectively. Our results show that the pairwise ranking approach using siamese recurrent networks outperforms the corresponding point-wise sequential model. Specifically, the siamese GRU trained with margin ranking loss (S-GRUMRL) achieves the best overall performance (PF: ρ = 0.29, NDCG = 0.93; CW: ρ = 0.25, NDCG = 0.89), slightly outperforming its variant trained with binary cross-entropy (S-GRUBCE), and clearly outperforming the baseline ranker that relies solely on demographic and occupational features, highlighting the added value of wearable-derived physiological trajectories. Notably, while pairwise modeling enhances performance in the sequential modeling setting and yields the best results for PF, the highest rank correlation for CW is still achieved by the ElasticNet regression model (ρ = 0.28), which relies on aggregated features. Overall, the results obtained correspond to weak-to-moderate effect sizes [66] for ordinal modeling of PF and CW using unobtrusive data, while EE remains largely out of reach across all frameworks. These findings are broadly consistent with recent work on perceived daily stress prediction from wearable devices [65], which achieved at most . Furthermore, unobtrusive modeling could be seen as a lower bound, with room for improvement through the integration of additional data sources.

We acknowledge several limitations in our study and outline directions for improvement. Although the sample size was adequate for exploratory modeling, it constrains both the generalizability of the findings and the predictive performance, especially for deep learning models, which typically benefit from larger datasets. Moreover, the participant cohort did not include individuals clinically diagnosed with burnout; applying the proposed methods to clinically characterized populations, including individuals currently experiencing burnout, those in recovery, and healthy controls, could yield more robust insights. Furthermore, the duration of the observation period may be suboptimal for capturing the gradual onset and progression of burnout [18, 64]. In this context, sustained adherence to the study protocol is especially critical, as missing data—and the imputation procedures required to address it—may introduce subtle biases. This concern is particularly relevant for sequential modeling, where patterns of missingness may themselves correlate with adverse underlying behavioral or physiological states. Regarding feature representation, several unobtrusive yet informative variables previously associated with chronic stress and burnout were unavailable for modeling. These include HRV metrics [33], contextual mobile sensing data (e.g., location entropy, home-work transitions, social proximity) [67], and time-invariant characteristics such as personality traits [68]. Finally, while some burnout sub-dimensions such as PF and CW appear amenable to unobtrusive prediction, others, like EE, are harder to infer passively, potentially requiring self-reported information about life events, either collected via validated surveys [43, 44] or inferred through conversational agents or other LLM-based technologies [69]. 
We also highlight that, while wearable-based burnout prediction systems can provide valuable organizational and individual insights, their deployment in workplace contexts necessitates robust privacy safeguards, transparent data handling practices, and strong ethical oversight to maintain employee confidence and protect worker rights [70, 71].

Regarding potential future iterations, the current pairwise-ranking siamese recurrent model, akin to the original RankNet formulation [51], treats all pairs of instances with equal importance, regardless of their position in the target ranking. While this uniform weighting can be appropriate for learning globally consistent burnout risk scores, it may be suboptimal when prioritizing individuals at highest risk. This suggests exploring ranking methods that more effectively optimize position-sensitive metrics (e.g., NDCG), for example by adapting metric-aware λ-gradient formulations to recurrent models [51, 72, 73].
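The difference between uniform and position-sensitive pair weighting can be made concrete with a small sketch. It computes the absolute change in NDCG caused by swapping two items, the quantity that LambdaRank-style methods use to scale pairwise gradients [72], whereas a uniform scheme would assign every pair a weight of 1. The linear gain, the base-2 logarithmic discount, and the example scores are assumptions made for illustration and need not match the NDCG variant used in our evaluation.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances_in_predicted_order):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0


def delta_ndcg(relevances_in_predicted_order, i, j):
    """Absolute NDCG change if the items at positions i and j were swapped."""
    swapped = list(relevances_in_predicted_order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(ndcg(swapped) - ndcg(relevances_in_predicted_order))


# True severity scores listed in the order produced by a hypothetical model,
# highest predicted risk first. Illustrative values only.
ranked_scores = [3.0, 1.0, 2.0, 0.0]

top_pair_weight = delta_ndcg(ranked_scores, 0, 1)     # pair at the top of the list
bottom_pair_weight = delta_ndcg(ranked_scores, 2, 3)  # pair at the bottom of the list

print(f"NDCG: {ndcg(ranked_scores):.3f}")
print(f"|dNDCG| top pair: {top_pair_weight:.3f}, bottom pair: {bottom_pair_weight:.3f}")
```

Both pairs differ by the same amount in true severity, yet the pair near the top of the list receives a larger weight, so errors among the individuals ranked at highest risk would be penalized more strongly than errors further down.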

Statements

Data availability statement

The datasets presented in this article are not readily available. The research data collected, as well as the data processing and machine learning pipeline for this study, can be made available upon reasonable request to M.G. Requests for data will be evaluated and responded to in a manner consistent with the study protocol. Requests to access the datasets should be directed to Max Grossenbacher.

Ethics statement

The studies involving humans were approved by protocol number 628-2023-SPER-AUSLBO, Comitato Etico Area Vasta Emilia Centro, Italy; Clarification of Responsibility Req-2023-00377, SwissEthics, Switzerland. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

AD: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing, Visualization. DM: Conceptualization, Data curation, Investigation, Visualization, Writing – original draft, Writing – review & editing. RŠ: Conceptualization, Writing – original draft, Writing – review & editing. JG: Writing – original draft, Writing – review & editing. VK: Software, Writing – original draft, Writing – review & editing. MG: Funding acquisition, Project administration, Writing – original draft, Writing – review & editing. FF: Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared financial support was received for this work and/or its publication. AD, DM, RŠ, and FF were funded by the Swiss State Secretariat for Education, Research, and Innovation (SEFRI), Aramis Project Number 62423.1 IP-LS.

Acknowledgments

The authors would like to thank Philip Morris International, Psy Bern AG, and Resilient SA for their support in the recruitment of the study participants and in the data collection phase.

Conflict of interest

MG, VK, and JG are respectively CEO, CTO, and shareholder of Resilient SA. Resilient SA is a company developing digital solutions for burnout prevention.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdgth.2025.1694666/full#supplementary-material

References

  • 1.

    World Health Organization. International statistical classification of diseases and related health problems (2019). Available online at: https://www.who.int/news/item/28-05-2019-burn-out-an-occupational-phenomenon-international-classification-of-diseases (Accessed May 10, 2023).

  • 2.

    Edú-Valsania S Laguía A Moriano JA . Burnout: a review of theory and measurement. Int J Environ Res Public Health. (2022) 19:1780. 10.3390/ijerph19031780

  • 3.

    Woo T Ho R Tang A Tam W . Global prevalence of burnout symptoms among nurses: a systematic review and meta-analysis. J Psychiatr Res. (2020) 123:9–20. 10.1016/j.jpsychires.2019.12.015

  • 4.

    Naji L Singh B Shah A Naji F Dennis B Kavanagh O , et al. Global prevalence of burnout among postgraduate medical trainees: a systematic review and meta-regression. Can Med Assoc Open Access J. (2021) 9:E189–E200. 10.9778/cmajo.20200068

  • 5.

    Bykov KV Zrazhevskaya IA Topka EO Peshkin VN Dobrovolsky AP Isaev RN , et al. Prevalence of burnout among psychiatrists: a systematic review and meta-analysis. J Affect Disord. (2022) 308:47–64. 10.1016/j.jad.2022.04.005

  • 6.

    Shanafelt TD West CP Dyrbye LN Trockel M Tutty M Wang H , et al. Changes in burnout and satisfaction with work-life integration in physicians during the first 2 years of the COVID-19 pandemic. In: Elsevier, editors. Mayo Clinic Proceedings. Vol. 97. Elsevier (2022). p. 2248–58.

  • 7.

    Schutte N Toppinen S Kalimo R Schaufeli W . The factorial validity of the Maslach burnout inventory-general survey (MBI-GS) across occupational groups and nations. J Occup Organ Psychol. (2000) 73:53–66. 10.1348/096317900166877

  • 8.

    Hulsegge G van Mechelen W Proper KI Paagman H Anema JR . Shift work, and burnout and distress among 7798 blue-collar workers. Int Arch Occup Environ Health. (2020) 93:955–63. 10.1007/s00420-020-01536-3

  • 9.

    Hammarström P Rosendahl S Gruber M Nordin S . Somatic symptoms in burnout in a general adult population. J Psychosom Res. (2023) 168:111217. 10.1016/j.jpsychores.2023.111217

  • 10.

    Melamed S Shirom A Toker S Shapira I . Burnout and risk of type 2 diabetes: a prospective study of apparently healthy employed persons. Psychosom Med. (2006) 68:863–9. 10.1097/01.psy.0000242860.24009.f0

  • 11.

    Melamed S Shirom A Toker S Berliner S Shapira I . Burnout and risk of cardiovascular disease: evidence, possible causal paths, and promising research directions. Psychol Bull. (2006) 132:327. 10.1037/0033-2909.132.3.327

  • 12.

    Wekenborg MK Hill LK Thayer JF Penz M Wittling RA Kirschbaum C . The longitudinal association of reduced vagal tone with burnout. Biopsychosoc Sci Med. (2019) 81:791–8. 10.1097/PSY.0000000000000750

  • 13.

    Koutsimani P Montgomery A Georganta K . The relationship between burnout, depression, and anxiety: a systematic review and meta-analysis. Front Psychol. (2019) 10:429219. 10.3389/fpsyg.2019.00284

  • 14.

    Glass D McKnight J . Perceived control, depressive symptomatology, and professional burnout: a review of the evidence. Psychol Health. (1996) 11:23–48. 10.1080/08870449608401975

  • 15.

    King DD Newman A Luthans F . Not if, but when we need resilience in the workplace. J Organ Behav. (2016) 37:782–6. 10.1002/job.2063

  • 16.

    Brunner B Igic I Keller AC Wieser S . Who gains the most from improving working conditions? health-related absenteeism and presenteeism due to stress at work. Eur J Health Econ. (2019) 20:1165–80. 10.1007/s10198-019-01084-9

  • 17.

    Maslach C Jackson SE . The measurement of experienced burnout. J Organ Behav. (1981) 2:99–113. 10.1002/job.4030020205

  • 18.

    Shirom A . Burnout in work organizations. In: John Wiley & Sons, editors. International Review of Industrial and Organizational Psychology 1989. Oxford, England: John Wiley & Sons (1989). p. 25–48.

  • 19.

    Gerber M Colledge F Mücke M Schilling R Brand S Ludyga S . Psychometric properties of the shirom-melamed burnout measure (SMBM) among adolescents: results from three cross-sectional studies. BMC Psychiatry. (2018) 18:1–13. 10.1186/s12888-018-1841-5

  • 20.

    Kristensen TS Borritz M Villadsen E Christensen KB . The Copenhagen burnout inventory: a new tool for the assessment of burnout. Work Stress. (2005) 19:192–207. 10.1080/02678370500297720

  • 21.

    Khakurel J Melkas H Porras J . Tapping into the wearable device revolution in the work environment: a systematic review. Inf Technol People. (2018) 31:791–818. 10.1108/ITP-03-2017-0076

  • 22.

    Wilton AR Sheffield K Wilkes Q Chesak S Pacyna J Sharp R , et al. The burnout prediction using wearable and artificial intelligence (BROWNIE) study: a decentralized digital health protocol to predict burnout in registered nurses. BMC Nurs. (2024) 23:1–14. 10.1186/s12912-024-01711-8

  • 23.

    Mohr DC Zhang M Schueller SM . Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annu Rev Clin Psychol. (2017) 13:23–47. 10.1146/annurev-clinpsy-032816-044949

  • 24.

    Hickey BA Chalmers T Newton P Lin CT Sibbritt D McLachlan CS , et al. Smart devices and wearable technologies to detect and monitor mental health conditions and stress: a systematic review. Sensors. (2021) 21:3461. 10.3390/s21103461

  • 25.

    Fedor S Lewis R Pedrelli P Mischoulon D Curtiss J Picard RW . Wearable technology in clinical practice for depressive disorder. N Engl J Med. (2023) 389:2457–66. 10.1056/NEJMra2215898

  • 26.

    Barac M Scaletty S Hassett LC Stillwell A Croarkin PE Chauhan M , et al. Wearable technologies for detecting burnout and well-being in health care professionals: scoping review. J Med Internet Res. (2024) 26:e50253. 10.2196/50253

  • 27.

    Ekstedt M Söderström M Åkerstedt T Nilsson J Søndergaard HP Aleksander P . Disturbed sleep and fatigue in occupational burnout. Scand J Work Environ Health. (2006) 32:121–31. 10.5271/sjweh.987

  • 28.

    Grossi G Perski A Osika W Savic I . Stress-related exhaustion disorder–clinical manifestation of burnout? a review of assessment methods, sleep impairments, cognitive disturbances, and neuro-biological and physiological changes in clinical burnout. Scand J Psychol. (2015) 56:626–36. 10.1111/sjop.12251

  • 29.

    Söderström M Jeding K Ekstedt M Perski A Åkerstedt T . Insufficient sleep predicts clinical burnout. J Occup Health Psychol. (2012) 17:175. 10.1037/a0027518

  • 30.

    Toker S Biron M . Job burnout and depression: unraveling their temporal relationship and considering the role of physical activity. J Appl Psychol. (2012) 97:699. 10.1037/a0026914

  • 31.

    Naczenski LM de Vries JD van Hooff ML Kompier MA . Systematic review of the association between physical activity and burnout. J Occup Health. (2017) 59:477–94. 10.1539/joh.17-0050-RA

  • 32.

    De Looff P Cornet L Embregts P Nijman H Didden H . Associations of sympathetic and parasympathetic activity in job stress and burnout: a systematic review. PLoS ONE. (2018) 13:e0205741. 10.1371/journal.pone.0205741

  • 33.

    Lennartsson AK Jonsdottir I Sjörs A . Low heart rate variability in patients with clinical burnout. Int J Psychophysiol. (2016) 110:171–8. 10.1016/j.ijpsycho.2016.08.005

  • 34.

    Martinez GJ Grover T Mattingly SM Mark G D’Mello S Aledavood T , et al. Alignment between heart rate variability from fitness trackers and perceived stress: perspectives from a large-scale in situ longitudinal study of information workers. JMIR Hum Factors. (2022) 9:e33754. 10.2196/33754

  • 35.

    Vos G Trinh K Sarnyai Z Azghadi MR . Generalizable machine learning for stress monitoring from wearable devices: a systematic literature review. Int J Med Inform. (2023) 173:105026. 10.1016/j.ijmedinf.2023.105026

  • 36.

    Xiang JZ Wang QY Fang ZB Esquivel JA Su ZX . A multi-modal deep learning approach for stress detection using physiological signals: integrating time and frequency domain features. Front Physiol. (2025) 16:1584299. 10.3389/fphys.2025.1584299

  • 37.

    Georgiou K Larentzakis AV Khamis NN Alsuhaibani GI Alaska YA Giallafos EJ . Can wearable devices accurately measure heart rate variability? a systematic review. Folia Med (Plovdiv). (2018) 60:7–20. 10.2478/folmed-2018-0012

  • 38.

    Houkes I Winants Y Twellaar M Verdonk P . Development of burnout over time and the causal order of the three dimensions of burnout among male and female GPs. A three-wave panel study. BMC Public Health. (2011) 11:1–13. 10.1186/1471-2458-11-240

  • 39.

    Norlund S Reuterwall C Höög J Lindahl B Janlert U Birgander LS . Burnout, working conditions and gender-results from the Northern Sweden MONICA Study. BMC Public Health. (2010) 10:1–9. 10.1186/1471-2458-10-326

  • 40.

    Rushton CH Batcheller J Schroeder K Donohue P . Burnout and resilience among nurses practicing in high-intensity settings. Am J Crit Care. (2015) 24:412–20. 10.4037/ajcc2015291

  • 41.

    Mahmoud NN Rothenberger D . From burnout to well-being: a focus on resilience. Clin Colon Rectal Surg. (2019) 32:415–23. 10.1055/s-0039-1692710

  • 42.

    Adler DA Tseng VWS Qi G Scarpa J Sen S Choudhury T . Identifying mobile sensing indicators of stress-resilience. Proc ACM Interact Mob Wear Ubiquitous Technol. (2021) 5:1–32. 10.1145/3463528

  • 43.

    Chmitorz A Neumann R Kollmann B Ahrens K Öhlschläger S Goldbach N , et al. Longitudinal determination of resilience in humans to identify mechanisms of resilience to modern-life stressors: the longitudinal resilience assessment (LORA) study. Eur Arch Psychiatry Clin Neurosci. (2021) 271:1035–51. 10.1007/s00406-020-01159-2

  • 44.

    Kalisch R Köber G Binder H Ahrens KF Basten U Chmitorz A , et al. The frequent stressor and mental health monitoring-paradigm: a proposal for the operationalization and measurement of resilience and the identification of resilience processes in longitudinal observational studies. Front Psychol. (2021) 12:710493. 10.3389/fpsyg.2021.710493

  • 45.

    Krause F van Leeuwen J Bögemann S Tutunji R Roelofs K van Kraaij A , et al. Predicting resilience from psychological and physiological daily-life measures (2023).

  • 46.

    Marzorati D Dei Rossi A Svihrova R Grossenbacher M Faraci FD . Burnout risk prediction through wearable devices: an initial assessment. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Copenhagen: IEEE Engineering in Medicine and Biology Society. Annual International Conference (2025). Vol. 2025, p. 1–5. 10.1109/EMBC58623.2025.11252971

  • 47.

    Weiser M . The computer for the 21st century. ACM SIGMOBILE Mob Comput Commun Rev. (1999) 3:3–11. 10.1145/329124.329126

  • 48.

    University of Zurich Specialist Center for Disaster and Military Psychiatry. SMBM evaluation (2023). Available online at: https://www.fzkwp.uzh.ch/de/services/Stressmgmt/ZEP-1/AuswertungBurnout.html (Accessed January 4, 2025).

  • 49.

    Li H . A short introduction to learning to rank. IEICE Trans Inf Syst. (2011) 94:1854–62. 10.1587/transinf.E94.D.1854

  • 50.

    Bromley J Guyon I LeCun Y Säckinger E Shah R . Signature verification using a “siamese” time delay neural network. Adv Neural Inf Process Syst. (1993) 6.

  • 51.

    Burges C Shaked T Renshaw E Lazier A Deeds M Hamilton N , et al. Learning to rank using gradient descent. In: Proceedings of the 22nd International Conference on Machine Learning (2005). p. 89–96.

  • 52.

    World Medical Association. World medical association declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. (2013) 310:2191–4. 10.1001/jama.2013.281053

  • 53.

    Marzorati D Rossi AD Švihrová R Kochergin V Grossenbacher M Faraci F . In-the-wild data collection with digital apps and wearable devices: insights from a longitudinal study on burnout with office and production workers (2025). Currently under review.

  • 54.

    Lau T Ong JL Ng BK Chan LF Koek D Tan CS , et al. Minimum number of nights for reliable estimation of habitual sleep using a consumer sleep tracker. Sleep Adv. (2022) 3:zpac026. 10.1093/sleepadvances/zpac026

  • 55.

    Myllymaki T , Stress and Recovery Analysis Method Based on 24-hour Heart Rate Variability. Firstbeat Technologies Ltd (2014).

  • 56.

    Hoffman L , Longitudinal Analysis: Modeling Within-Person Fluctuation and Change. Routledge (2015).

  • 57.

    Hammerla NY Plötz T . Let’s (not) stick together: pairwise similarity biases cross-validation in activity recognition. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (2015). p. 1041–51.

  • 58.

    Guyon I Weston J Barnhill S Vapnik V . Gene selection for cancer classification using support vector machines. Mach Learn. (2002) 46:389–422. 10.1023/A:1012487302797

  • 59.

    Cho K Van Merriënboer B Gulcehre C Bahdanau D Bougares F Schwenk H , et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv [Preprint] arXiv:1406.1078 (2014).

  • 60.

    Kingma DP Ba J . Adam: a method for stochastic optimization. arXiv [Preprint] arXiv:1412.6980 (2014).

  • 61.

    Spearman C . The proof and measurement of association between two things (1961).

  • 62.

    Liu TY . Learning to rank for information retrieval. Found Trends® Inf Retr. (2009) 3:225–331. 10.1561/1500000016

  • 63.

    Järvelin K Kekäläinen J . IR evaluation methods for retrieving highly relevant documents. In: ACM SIGIR Forum. Vol. 51. New York, NY, USA: ACM (2017). p. 243–50.

  • 64.

    Schaufeli W Enzmann D , The Burnout Companion to Study and Practice: A Critical Analysis. CRC Press (2020).

  • 65.

    Booth BM Vrzakova H Mattingly SM Martinez GJ Faust L D’Mello SK . Toward robust stress prediction in the age of wearables: modeling perceived stress in a longitudinal study with information workers. IEEE Trans Affect Comput. (2022) 13:2201–17. 10.1109/TAFFC.2022.3188006

  • 66.

    Cohen J , Statistical Power Analysis for the Behavioral Sciences. Routledge (2013).

  • 67.

    Mishra V Hao T Sun S Walter KN Ball MJ Chen CH , et al. Investigating the role of context in perceived stress detection in the wild. In: Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers (2018). p. 1708–16.

  • 68.

    Swider BW Zimmerman RD . Born to burnout: a meta-analytic path model of personality, job burnout, and work outcomes. J Vocat Behav. (2010) 76:487–506. 10.1016/j.jvb.2010.01.003

  • 69.

    Vaidyam AN Wisniewski H Halamka JD Kashavan MS Torous JB . Chatbots and conversational agents in mental health: a review of the psychiatric landscape. Can J Psychiatry. (2019) 64:456–64. 10.1177/0706743719828977

  • 70.

    Tindale LC Chiu D Minielly N Hrincu V Talhouk A Illes J . Wearable biosensors in the workplace: perceptions and perspectives. Front Digit Health. (2022) 4:800367. 10.3389/fdgth.2022.800367

  • 71.

    Roossien CC de Jong M Bonvanie AM Maeckelberghe ELM . Ethics in design and implementation of technologies for workplace health promotion: a call for discussion. Front Digit Health. (2021) 3:644539. 10.3389/fdgth.2021.644539

  • 72.

    Burges C Ragno R Le Q . Learning to rank with nonsmooth cost functions. Adv Neural Inf Process Syst. (2006) 19.

  • 73.

    Donmez P Svore KM Burges CJ . On the local optimality of lambdarank. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2009). p. 460–7.

Summary

Keywords

burnout, machine learning, ranking, siamese architecture, wearable devices

Citation

Dei Rossi A, Marzorati D, Švihrová R, Grossenbacher J, Kochergin V, Grossenbacher M and Faraci F (2026) Climbing the ladder: a ranking approach to burnout prediction. Front. Digit. Health 7:1694666. doi: 10.3389/fdgth.2025.1694666

Received

28 August 2025

Revised

01 December 2025

Accepted

08 December 2025

Published

02 January 2026

Volume

7 - 2025

Edited by

N. Sertac Artan, New York Institute of Technology, Old Westbury, United States

Reviewed by

Brandon Booth, University of Colorado Boulder, Boulder, United States

Junzhi Xiang, The First Affiliated Hospital of Wenzhou Medical University, China

* Correspondence: Alvise Dei Rossi
