ORIGINAL RESEARCH article

Front. Comput. Sci.

Sec. Human-Media Interaction

Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1629725

This article is part of the Research Topic: Emotional Intelligence AI in Mental Health

Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge

Provisionally accepted
Yupei Li1*, Shuaijie Shao2, Manuel Milling3,4, Björn W. Schuller1,3,4,5
  • 1Imperial College London, London, United Kingdom
  • 2University College London, London, United Kingdom
  • 3CHI – Chair of Health Informatics, TUM University Hospital, Munich, Germany
  • 4MCML – Munich Center for Machine Learning, Munich, Germany
  • 5MDSI – Munich Data Science Institute, Munich, Germany

The final, formatted version of the article will be published soon.

Depression is a growing concern that is gaining attention in both public discourse and AI research. While deep neural networks (DNNs) have been used for its recognition, they still lack real-world effectiveness. Large language models (LLMs) show strong potential but require domain-specific fine-tuning and struggle with non-textual cues. Since depression is often expressed through vocal tone and behaviour rather than explicit text, relying on language alone is insufficient. Diagnostic accuracy also suffers when psychological expertise is not incorporated. To address these limitations, we present, to the best of our knowledge, the first application of LLMs to multimodal depression detection using the DAIC-WOZ dataset. We extract audio features using the pre-trained model Wav2Vec and map them to text-based LLMs for further processing. We also propose a novel strategy for incorporating psychological knowledge into LLMs to enhance diagnostic performance, specifically using a question-and-answer set to provide LLMs with authorised knowledge. Our approach yields a notable improvement in both Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) compared with the baseline score reported in the related original paper. The code is available on GitHub.
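
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and RMSE are conventionally computed for a regression-style depression score. It is not taken from the authors' code. The function names `mae` and `rmse` and the example values are hypothetical, and the example assumes the target is a numeric questionnaire score per participant.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |prediction - target|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: the square root of the mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical scores for four participants (illustrative values only,
# not data from the article).
true_scores = [2, 10, 16, 7]
predicted_scores = [4, 9, 12, 7]

print(f"MAE:  {mae(true_scores, predicted_scores):.2f}")
print(f"RMSE: {rmse(true_scores, predicted_scores):.2f}")
```

Because RMSE squares each error before averaging, it penalises large individual errors more heavily than MAE does, which is why the two are often reported together.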

Keywords: large language models, depression recognition, psychological knowledge, spoken language, speech

Received: 16 May 2025; Accepted: 07 Aug 2025.

Copyright: © 2025 Li, Shao, Milling and Schuller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Yupei Li, Imperial College London, London, United Kingdom

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.