ORIGINAL RESEARCH article

Front. Psychiatry

Sec. Digital Mental Health

Volume 16 - 2025 | doi: 10.3389/fpsyt.2025.1656292

This article is part of the Research Topic: Emotional Intelligence AI in Mental Health

Predicting Affective Engagement and Mental Strain from Prosodic Speech Features

Provisionally accepted
  • 1 University Hospital RWTH Aachen, Aachen, Germany
  • 2 LVR-Klinik Köln, Cologne, Germany

The final, formatted version of the article will be published soon.

Background: Emotional resilience and cognitive load (the mental effort required to process information) are critical aspects of mental health functioning. Traditional assessment methods, such as physiological sensors and post-task surveys, often disrupt natural behavior and fail to provide real-time insights. Speech prosody offers a non-intrusive alternative for evaluating these psychological constructs. However, the relationship between speech prosody, emotional resilience, and cognitive load remains underexplored.
Objective: This study proposes proxy measures for these constructs based on self-reported engagement, enjoyment, boredom, and cognitive effort during dyadic conversation. Using the SEWA database, developed through a European research project on emotion recognition, the research seeks to develop machine learning models that relate speech patterns to subjective self-reports of emotional and cognitive states.
Methods: Prosodic features were extracted from the SEWA database recordings, normalized to account for inter-speaker variability, and used as predictors in machine learning models. Regression and classification models were used to relate speech features to subjective self-reports, which served as ground truth for Positive Affective Engagement and Perceived Mental Strain. Data from English and German speakers were analyzed separately to account for linguistic and cultural differences.
Outcomes: The study establishes a significant relationship between speech prosody and psychological states, demonstrating that Positive Affective Engagement and Perceived Mental Strain can be effectively predicted from prosodic features. Higher emotional resilience is linked to more discernible prosodic patterns in German speech, such as higher loudness and greater voice probability consistency. In contrast, cognitive load prediction remains consistent across English and German datasets.
Conclusion: This research introduces a novel approach for assessing Positive Affective Engagement and Perceived Mental Strain through speech prosody, highlighting the significant impact of language-specific variations. By combining prosodic features with machine learning techniques, the study offers a promising alternative to traditional psychological assessments. The findings emphasize the need for tailored, multilingual models to accurately estimate psychological states, with potential applications in mental health monitoring, cognitive workload analysis, and human-computer interaction. This work lays the foundation for future innovations in speech-based psychological profiling, advancing our understanding of human emotional and cognitive states in diverse linguistic contexts.
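The normalization step mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not state which normalization the authors used, so per-speaker z-scoring is an assumption here, as are the function name `zscore_per_speaker`, the feature names `loudness` and `f0_hz`, and the toy values.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_per_speaker(rows, feature_names):
    """Z-score each feature within each speaker (assumed normalization scheme).

    rows: list of dicts, each with a "speaker" key and one key per feature.
    Returns a new list of dicts; the input rows are left unchanged.
    """
    # Group utterances by speaker so statistics are computed per speaker.
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row)

    # Per-speaker mean and population standard deviation for every feature.
    stats = {}
    for speaker, group in by_speaker.items():
        for name in feature_names:
            values = [r[name] for r in group]
            stats[(speaker, name)] = (mean(values), pstdev(values))

    normalized = []
    for row in rows:
        out = dict(row)
        for name in feature_names:
            mu, sigma = stats[(row["speaker"], name)]
            # A speaker with no variation in a feature gets 0.0 for it.
            out[name] = (row[name] - mu) / sigma if sigma > 0 else 0.0
        normalized.append(out)
    return normalized


# Hypothetical toy data: speaker B is louder and higher-pitched than speaker A.
rows = [
    {"speaker": "A", "loudness": 0.2, "f0_hz": 110.0},
    {"speaker": "A", "loudness": 0.4, "f0_hz": 130.0},
    {"speaker": "B", "loudness": 0.7, "f0_hz": 210.0},
    {"speaker": "B", "loudness": 0.9, "f0_hz": 230.0},
]
normalized = zscore_per_speaker(rows, ["loudness", "f0_hz"])
```

After this step, each speaker's features are expressed relative to that speaker's own baseline, so a regression or classification model sees within-speaker variation and not absolute differences in voice level or pitch between speakers.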

Keywords: speech prosody, positive affective engagement, Perceived Mental Strain, Machine Learning in Mental Health, Prosodic Feature Extraction, human-computer interaction

Received: 29 Jun 2025; Accepted: 26 Aug 2025.

Copyright: © 2025 Yache, Moradbakhti and Veselinovic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Vaishnavi Prakash Yache, University Hospital RWTH Aachen, Aachen, Germany

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.