
SYSTEMATIC REVIEW article

Front. Virtual Real.

Sec. Virtual Reality and Human Behaviour

This article is part of the Research Topic: Exploring Meaningful Extended Reality (XR) Experiences: Psychological, Educational, and Data-Driven Perspectives

Diegetic and Object-Based Spatial Audio in Cinematic VR: A PRISMA-Guided Systematic Review with a Functional Taxonomy and Validation Framework

Provisionally accepted
Vimala Perumal1* and Zeeshan Jawed Shah1,2*
  • 1Multimedia University, Cyberjaya, Malaysia
  • 2Prince Mohammad Bin Fahd University, Al Khobar, Saudi Arabia

The final, formatted version of the article will be published soon.

Cinematic VR (CVR) removes the director’s frame, creating the challenge of guiding audience attention without breaking immersion. This systematic review examines two audio modalities with strong potential to act as narrative agents: diegetic audio (sounds from within the story world) and object-based spatial audio (discrete sound “objects” rendered with positional metadata). It synthesizes empirical evidence on how these modalities guide attention and shape affect and presence, and consolidates the methods used to validate these effects. Searches in IEEE Xplore, ACM Digital Library, Scopus, and Web of Science (June 2025) identified studies that used diegetic and/or object-based spatial audio as narrative devices in CVR with empirical user data; non-diegetic-only or purely technical papers without user measures were excluded except as baselines. Following PRISMA 2020, we qualitatively synthesized 18 studies employing behavioral (head/eye tracking), subjective (presence/engagement), and physiological (HR/EMG/EDA/PPG) measures. Across studies, world-locked, off-screen diegetic cues were repeatedly reported to redirect gaze and shorten time-to-ROI after cuts, while object-based rendering enabled precise, dynamic cue placement and was commonly associated with higher presence/immersion and affective arousal relative to non-spatial or head-locked baselines; however, methodological heterogeneity and small-to-moderate samples limit cross-study comparability and certainty. We contribute (i) a functional taxonomy of narrative audio techniques aligned to diegetic/object-based practice (Chion, 1994/2019; Rumsey, 2001/2012); (ii) a Validation Triangulation Framework integrating behavioral, subjective, and physiological evidence (Cacioppo, Tassinary, & Berntson, 2007; Laborde, Mosley, & Thayer, 2017); and (iii) a Minimum Reporting & Sharing Standard for CVR Narrative Audio (MRSS-CVR) specifying what to report, how to preregister, and how to share data and metadata in line with PRISMA 2020/PRISMA-S and FAIR principles (Page et al., 2021; Rethlefsen et al., 2021; Wilkinson et al., 2016). Overall, the available evidence suggests that diegetic and object-based spatial audio can support narrative guidance in CVR, but the strength of inference remains constrained by small samples, inconsistent reporting, heterogeneous methods, and limited direct measures of narrative comprehension. The proposed taxonomy and validation framework are therefore positioned as evidence-informed, forward-looking tools to improve comparability and enable cumulative progress (PRISMA 2020 abstract guidance followed; no protocol registration).
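To make the behavioral "time-to-ROI" metric referenced in the abstract concrete, the following minimal Python sketch shows one way such a measure could be computed from head-tracking logs: the delay between a scene cut and the first moment head yaw falls within a region of interest (ROI) around an off-screen diegetic cue. The function name, the 15° ROI half-width, and the sample log are illustrative assumptions, not values taken from the reviewed studies.

```python
# Hypothetical illustration of a "time-to-ROI" computation from head-yaw samples.
# All names and thresholds are assumptions for illustration only.

from typing import Optional, Sequence, Tuple


def time_to_roi(
    samples: Sequence[Tuple[float, float]],  # (timestamp_s, head_yaw_deg)
    cut_time_s: float,                        # timestamp of the scene cut
    roi_center_deg: float,                    # azimuth of the diegetic cue / ROI
    roi_half_width_deg: float = 15.0,         # assumed angular tolerance
) -> Optional[float]:
    """Return seconds from the cut until head yaw first enters the ROI,
    or None if the ROI is never entered."""
    for t, yaw in samples:
        if t < cut_time_s:
            continue
        # Wrap the angular difference into [-180, 180] before comparing.
        diff = (yaw - roi_center_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= roi_half_width_deg:
            return t - cut_time_s
    return None


# Example: a cue placed at 90° azimuth; the viewer's head reaches the ROI 1.2 s after the cut.
log = [(0.0, 0.0), (0.5, 20.0), (1.0, 55.0), (1.2, 80.0), (1.5, 92.0)]
print(time_to_roi(log, cut_time_s=0.0, roi_center_deg=90.0))  # -> 1.2
```

In practice, the reviewed studies operationalize gaze redirection in study-specific ways (head versus eye tracking, differing ROI definitions); this sketch is only intended to clarify the general form of the metric.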

Keywords: cinematic virtual reality, diegetic sound, gaze tracking, immersion, narrative guidance, object-based audio (ADM/MPEG-H), presence, spatial audio

Received: 01 Sep 2025; Accepted: 03 Feb 2026.

Copyright: © 2026 Perumal and Shah. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Vimala Perumal
Zeeshan Jawed Shah

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.