AUTHOR=Ciharova Marketa, Amarti Khadicha, van Breda Ward, Gevonden Martin J., Ghassemi Sina, Kleiboer Annet, Vinkers Christiaan H., Sep Milou S. C., Trofimova Sophia, Cooper Alexander C., Peng Xianhua, Schulte Mieke, Karyotaki Eirini, Cuijpers Pim, Riper Heleen
TITLE=Machine-learning detection of stress severity expressed on a continuous scale using acoustic, verbal, visual, and physiological data: lessons learned
JOURNAL=Frontiers in Psychiatry
VOLUME=16
YEAR=2025
URL=https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2025.1548287
DOI=10.3389/fpsyt.2025.1548287
ISSN=1664-0640
ABSTRACT=
Background: Early detection of elevated acute stress is necessary if we aim to reduce the consequences associated with prolonged or recurrent stress exposure. Stress monitoring may be supported by valid and reliable machine-learning algorithms. However, algorithms that detect stress severity on a continuous scale have not yet been investigated, owing to the high demands such analyses place on data quality. The use of multimodal data, i.e., data coming from multiple sources, might contribute to machine-learning detection of stress severity. We aimed to detect laboratory-induced stress using multimodal data and to identify challenges researchers may encounter when conducting a similar study.

Methods: We conducted a preliminary exploration of the performance of a machine-learning algorithm trained on multimodal data, namely visual, acoustic, verbal, and physiological features, in detecting stress severity following a partially automated online version of the Trier Social Stress Test. College students (n = 42; M age = 20.79; 69% female) completed a self-reported stress visual analogue scale (VAS) at five time points: after the initial resting period (P1), during the three stress-inducing tasks (i.e., preparation for a presentation, a presentation task, and an arithmetic task; P2-P4), and after a recovery period (P5). For the whole duration of the experiment, we recorded the participants' voice and facial expressions with a video camera and measured cardiovascular and electrodermal physiology with an ambulatory monitoring system. We then evaluated the performance of the algorithm in detecting stress severity using three combinations of the visual, acoustic, verbal, and physiological data collected at each period of the experiment (P1-P5).

Results: Participants reported minimal (P1: M = 21.79, SD = 17.45) to moderate (P2: M = 47.95, SD = 15.92) stress severity, depending on the period at hand. We found a very weak association between the detected and observed scores (r² = .154; p = .021). In a post-hoc analysis, we classified participants into categories of stressed and non-stressed individuals. When applying all available features (i.e., visual, acoustic, verbal, and physiological), or a combination of visual, acoustic, and verbal features, performance ranged from acceptable to good, but only for the presentation task (accuracy up to .71, F1-score up to .73).

Conclusions: The complexity of the input features needed for machine-learning detection of stress severity based on multimodal data requires large samples with wide variability in stress reactions and inputs among participants. These are difficult to recruit for laboratory settings because of the high time and effort demands placed on both researchers and participants.
The resources needed may be reduced by automating the experimental procedures, although automation may introduce additional technological challenges and thereby cause other recruitment setbacks. Further investigation is necessary, with emphasis on high-quality ground truth, i.e., gold-standard (self-report) instruments, but also outside of laboratory experiments, mainly in general populations and among mental health care patients.
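To make the evaluation design concrete, below is a minimal, self-contained Python sketch of the two analysis steps summarised in the abstract: cross-validated regression of self-reported VAS stress scores on concatenated multimodal features, scored with r², followed by a post-hoc binarisation into stressed vs. non-stressed individuals, scored with accuracy and F1. It is not the authors' implementation: the synthetic feature arrays, the concatenation fusion, the Ridge learner, and the VAS cut-off of 50 are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import accuracy_score, f1_score, r2_score
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the four modalities: one row per
# participant-period observation (42 students x periods P1-P5).
# Real features would be extracted from video, audio, transcripts,
# and ambulatory cardiovascular/electrodermal recordings.
n_participants, n_periods = 42, 5
n_obs = n_participants * n_periods
visual = rng.normal(size=(n_obs, 8))
acoustic = rng.normal(size=(n_obs, 8))
verbal = rng.normal(size=(n_obs, 8))
physio = rng.normal(size=(n_obs, 8))
vas = rng.uniform(0, 100, size=n_obs)  # self-reported stress VAS (0-100)

# Multimodal fusion by simple feature concatenation (an assumption;
# the abstract does not specify the fusion scheme).
X = np.hstack([visual, acoustic, verbal, physio])

# Keep each participant's five periods in the same fold so repeated
# measures from one person never leak across train/test splits.
groups = np.repeat(np.arange(n_participants), n_periods)
cv = GroupKFold(n_splits=5)

# Step 1: continuous stress-severity detection, scored with r^2
# between detected and observed VAS scores. Ridge is a placeholder
# learner, not the model used in the paper.
pred = cross_val_predict(Ridge(alpha=1.0), X, vas, cv=cv, groups=groups)
print(f"r^2 detected vs. observed: {r2_score(vas, pred):.3f}")

# Step 2: post-hoc binarisation into stressed vs. non-stressed,
# mirroring the accuracy/F1 analysis. The cut-off of 50 is illustrative.
threshold = 50
y_true, y_pred = vas >= threshold, pred >= threshold
print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}, "
      f"F1: {f1_score(y_true, y_pred):.2f}")
```

On the random placeholder data the printed scores will hover around chance, which is expected; the sketch only illustrates the pipeline shape. On real data, each rng.normal stand-in would be replaced by extracted modality features, and the grouped cross-validation matters precisely because five observations come from each participant.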