AUTHOR=Lalk Christopher, Targan Kim, Steinbrenner Tobias, Schaffrath Jana, Eberhardt Steffen, Schwartz Brian, Vehlen Antonia, Lutz Wolfgang, Rubel Julian

TITLE=Employing large language models for emotion detection in psychotherapy transcripts

JOURNAL=Frontiers in Psychiatry

VOLUME=Volume 16 - 2025

YEAR=2025

URL=https://www.frontiersin.org/journals/psychiatry/articles/10.3389/fpsyt.2025.1504306

DOI=10.3389/fpsyt.2025.1504306

ISSN=1664-0640

ABSTRACT=Purpose: In the context of psychotherapy, emotions play an important role both through their association with symptom severity and through their effects on the therapeutic relationship. In this analysis, we aim to train a large language model (LLM) for the detection of emotions in German speech. We apply this model to a corpus of psychotherapy transcripts to predict symptom severity and alliance, with the aim of identifying the most important emotions for the prediction of symptom severity and therapeutic alliance. Methods: We employed a public labeled dataset of 28 emotions and translated the dataset into German. A pre-trained LLM was then fine-tuned on this dataset for emotion classification. We applied the fine-tuned model to a dataset containing 553 psychotherapy sessions of 124 patients. Using machine learning (ML) and explainable artificial intelligence (AI), we predicted symptom severity and alliance from the detected emotions. Results: Our fine-tuned model achieved modest classification performance (macro F1 = 0.45, accuracy = 0.41, kappa = 0.42) across the 28 emotions. Incorporating all emotions, our ML model showed satisfactory performance for the prediction of symptom severity (r = .50; 95% CI: .42, .57) and moderate performance for the prediction of alliance scores (r = .20; 95% CI: .06, .32). The most important emotions for the prediction of symptom severity were approval, anger, and fear. The most important emotions for the prediction of alliance were curiosity, confusion, and surprise. Conclusions: Even though the classification results were only moderate, our model achieved good performance, especially for the prediction of symptom severity. The results confirm the role of negative emotions in the prediction of symptom severity, while they also highlight the role of positive emotions in fostering a good alliance. Future directions entail the improvement of the labeled dataset, especially with regard to domain specificity and the incorporation of context information. Additionally, other modalities and natural language processing (NLP)-based alliance assessment could be integrated.