ORIGINAL RESEARCH article
Front. Psychol.
Sec. Cognitive Science
Volume 16 - 2025 | doi: 10.3389/fpsyg.2025.1520630
This article is part of the Research Topic: Crossing Sensory Boundaries: Multisensory Perception Through the Lens of Audition
Speaker-Story Mapping as a Method to Evaluate Audiovisual Scene Analysis in a Virtual Classroom Scenario
Provisionally accepted
- 1 Technische Universität Ilmenau, Ilmenau, Germany
- 2 RWTH Aachen University, Aachen, North Rhine-Westphalia, Germany
- 3 Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Kaiserslautern, Germany
This study explores how audiovisual Immersive Virtual Environments (IVEs) can assess cognitive performance in classroom-like settings, addressing limitations in simpler acoustic and visual representations. This paper examines the potential of a test paradigm using speaker-story mapping, called "audiovisual scene analysis (AV-SA)", originally developed for Virtual Reality (VR) hearing research, as a method to evaluate audiovisual scene analysis in a virtual classroom scenario. Factors of acoustic and visual scene representation were varied to investigate their impact on audiovisual scene analysis. Two acoustic representations were used: a simple "diotic" presentation, with the same signal presented to both ears, and a dynamically live-rendered binaural synthesis ("binaural"). Two visual representations were used: 360° (omnidirectional) video with intrinsic lip-sync, and computer-generated imagery (CGI) without lip-sync. Three subjective experiments were conducted with different combinations of the acoustic and visual conditions. The first experiment, involving 36 participants, used 360° video with "binaural" audio. The second experiment, with 24 participants, combined 360° video with "diotic" audio. The third experiment, with 34 participants, used the CGI environment with "binaural" audio. Each environment presented 20 different speakers in a classroom-like circle of 20 chairs. The number of simultaneously active speakers ranged from two to ten, while the remaining speakers stayed silent but were always shown. During the experiments, the subjects' task was to correctly map the stories' topics to the corresponding speakers. The primary dependent variable was the number of correct assignments made within a fixed period of 2 min; after each trial, subjects also completed two questionnaires on mental load. In addition, before and/or after the experiments, subjects completed questionnaires about simulator sickness, noise sensitivity, and presence.
Results indicate that the experimental condition significantly influenced task performance, mental load, and user behaviour but did not affect perceived simulator sickness or presence. Compared with the experiment using 360° video and "binaural" audio, performance decreased both in the experiment using 360° video with "diotic" audio and in the experiment using the CGI-based environment with "binaural" audio, showing the usefulness of the test method in investigating influences on cognitive audiovisual scene analysis performance.
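The primary dependent variable described above, the number of correct speaker-story assignments per trial, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the seat-number keys, and the story topics are hypothetical and chosen only for illustration.

```python
from typing import Dict


def count_correct_assignments(truth: Dict[int, str], response: Dict[int, str]) -> int:
    """Count assignments where the subject mapped a topic to the seat that told it.

    truth:    seat number -> topic actually told by the active speaker at that seat
    response: seat number -> topic the subject assigned to that seat
    Seats of silent speakers are simply absent from `truth`.
    """
    return sum(1 for seat, topic in response.items() if truth.get(seat) == topic)


# Hypothetical trial with three active speakers among the 20 seats.
truth = {2: "holiday", 7: "cooking", 15: "sports"}
# The subject got seat 2 right, swapped a topic at seat 7,
# and assigned a topic to a silent speaker at seat 11.
response = {2: "holiday", 7: "sports", 11: "cooking"}

score = count_correct_assignments(truth, response)
print(score)
```

In this sketch, an assignment to a silent speaker or a swapped topic simply does not count as correct; whether the study penalised such errors is not stated in the abstract.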
Keywords: immersive virtual environments, virtual reality, audiovisual scene analysis, audiovisual scene perception, dynamic binaural rendering, speaker-story mapping, classroom, task performance
Received: 31 Oct 2024; Accepted: 15 May 2025.
Copyright: © 2025 Fremerey, Breuer, Leist, Klatte, Fels and Raake. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Stephan Fremerey, Technische Universität Ilmenau, Ilmenau, Germany
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.