ORIGINAL RESEARCH article
Front. Comput. Sci.
Sec. Human-Media Interaction
Volume 7 - 2025 | doi: 10.3389/fcomp.2025.1598099
This article is part of the Research Topic: Generative AI in the Metaverse: New Frontiers in Virtual Design and Interaction
Evaluation of Generative Models for Emotional 3D Animation Generation in VR
Provisionally accepted
- 1 Computational Science and Technology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- 2 Royal Institute of Technology, Stockholm, Sweden
- 3 Department of Human Centered Technology, School of Electrical Engineering and Computer Science, Royal Institute of Technology, Stockholm, Sweden
Social interactions incorporate various nonverbal signals, including facial expressions and body gestures, to convey emotions alongside speech. Generative models have demonstrated promising results in creating full-body nonverbal animations synchronized with speech; however, evaluations using statistical metrics in 2D settings fail to fully capture user-perceived emotions, limiting our understanding of the effectiveness of these models. To address this, we evaluate emotional 3D animation generative models within an immersive VR environment, emphasizing user-centric metrics (emotional realism, naturalness, enjoyment, diversity, and interaction quality) in a real-time human–virtual character interaction scenario. Through a user study (N=48), we systematically examine perceived emotional quality for three state-of-the-art speech-driven 3D animation methods across two specific emotions: happiness (high arousal) and neutral (low arousal). Additionally, we compare these generative models against real human expressions obtained via a reconstruction-based method to assess their strengths and limitations, and how closely they replicate real human facial and body expressions. Our results demonstrate that methods explicitly modeling emotions achieve higher recognition accuracy than those focusing solely on speech-driven synchrony. Users rated happy animations significantly higher than neutral animations, highlighting limitations of current generative models in handling subtle emotional states. Generative models underperformed compared to reconstruction-based methods in facial expression quality, and all methods received relatively low ratings for animation enjoyment and interaction quality, underscoring the importance of incorporating user-centric evaluations into generative model development. Finally, participants positively recognized animation diversity across all generative models.
Keywords: Generative models, 3D Emotional Animation, User-centric evaluation, virtual reality, Nonverbal Communication
Received: 22 Mar 2025; Accepted: 24 Jun 2025.
Copyright: © 2025 Chhatre, Guarese, Matviienko and Peters. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Kiran Chhatre, Computational Science and Technology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.