I Know It Is Not Real (And That Matters): Media Awareness vs. Presence in a Parallel Processing Account of the VR Experience

Inspired by the widely recognized idea that in VR/XR, not only presence but also encountered plausibility is relevant (Slater, Phil. Trans. R. Soc. B, 2009, 364 (1535), 3549–3557), we propose a general psychological parallel processing account to explain users' VR and XR experience. The model adopts a broad psychological view by building on interdisciplinary literature on the dualistic nature of perceiving and experiencing (mediated) representations. It proposes that perceptual sensations like presence are paralleled by users' belief that "this is not really happening," which we refer to as media awareness. We review the developmental underpinnings of basic media awareness and argue that it is triggered when users consciously initiate exposure to VR/XR. During exposure, the salience of media awareness can vary dynamically due to factors like encountered sensory and semantic (in)consistencies. Our account sketches media awareness and presence as two parallel processes that together define a situation as a media exposure situation. We also review potential joint effects on subsequent psychological and behavioral responses that characterize the user experience in VR/XR. We conclude the article with a programmatic outlook on testable assumptions and open questions for future research.


INTRODUCTION
When scholars explicate users' experience of Virtual Reality (in the following, we simply speak of VR or the VR experience, but we believe that our ideas extend to any extended reality/XR technology and experience), they traditionally focus on the sensation of presence. In this manuscript, when we talk about presence, we follow Lee (2004), who defines presence as a "psychological state in which virtual (para-authentic or artificial) objects are experienced as actual objects in either sensory or nonsensory ways" (p. 37). For the sake of simplicity, we treat presence as a unitary concept that includes subtypes such as spatial, social, and self-presence (Lee, 2004). Accordingly, presence entails users' sensation of owning a virtual body and of "being there" in a virtual or virtually augmented space, perhaps with social others feeling co-present. In general, we regard presence as a highly automatic, cognitively nontaxing, mostly sensory-driven perceptual sensation or feeling that is introspectively accessible (Schubert, 2009).
While presence has been highlighted as a defining part of the VR experience, scholars in the field have also frequently noted that VR users stay at least partially aware of the mediated nature of their experience (i.e., they know that "this is not real or really happening"; e.g., ISPR, 2001). What is this media awareness, as we call it in the present article, and when or how does it shape the VR experience? Do two users who feel equally present, but differ in their media awareness, have a different overall user experience? In the present paper, we address these questions. Our central proposition is that a comprehensive conceptualization of the VR experience (and, potentially, even the experience of any mediated representation or content) must emphasize both users' perceptual sensations like presence and their media awareness, and recognize how both jointly shape users' overall experience.
We are not the first scholars to suggest this idea. In a widely influential article, Slater (2009) proposed that users' responses to VR can only be fully understood if users' perceived plausibility is taken into account in addition to presence. According to Slater, users respond to VR as if it were real only if they both feel spatially present ("place illusion," PI) and simultaneously feel that events in the scenario refer to their presence and that these events are actually taking place ("plausibility illusion," Psi). Slater concluded his conceptualization with a call for further research on users' perceived plausibility: The "area of Psi is now a more fruitful and challenging research area than PI" (p. 3555). In the present article, we try to answer Slater's call by re-positioning plausibility and presence in a more general parallel processing account of users' VR experience, which is inspired by existing research on the dualistic nature of representations (e.g., Grodal, 2002; Nieding et al., 2017) and converges with recent discussions by other VR scholars (e.g., Gonzalez-Franco and Lanier, 2017; de Gelder et al., 2018; Pan and Hamilton, 2018).
We proceed in five steps to develop a new theoretical perspective on the VR experience. First, we briefly review Slater's influential conceptualization of the plausibility (vs. place) illusion and the revision developed by Skarbez (2016). Second, because we suggest considering plausibility as part of a bigger picture, we broaden the view (beyond plausibility, and beyond VR) by reviewing existing interdisciplinary research on the dualistic nature of users' experience of mediated representations. This research suggests that users' experience derives from both their perceptual sensations or intuitive feelings and their higher-order beliefs or knowledge about what is happening. Third, we explicate media awareness as users' belief that "this is not really happening," illustrate its developmental underpinnings, and discuss how it is cued at the onset and during media exposure. Fourth, we discuss how media awareness and perceptual sensations like presence might be related to each other, and how both might jointly affect the overall user experience. Fifth, we conclude the article by looking at how the proposed framework can guide and inspire future research.

The Plausibility Illusion
Presence is the hallmark of the VR experience, particularly in comparison with other media, which evoke this sensation to a lesser degree, if at all. Yet, VR users' experience is not fully or adequately described by focusing on presence alone (Pan and Hamilton, 2018). This point has been addressed most prominently in a widely recognized article by Slater (2009), which focuses on the question of when or why users respond realistically to VR. This is a relevant question, because VR is often said to trigger life-like experiences, and it is increasingly used as a tool to train or study real-world behavior (Fox et al., 2009). According to Slater (2009), users respond realistically to VR if they experience both the place illusion and the plausibility illusion, which he considers two "orthogonal components" (p. 3549). The place illusion is a perceptual illusion that refers to "the sense of being there" (commonly referred to as spatial presence, physical presence, or telepresence). This factor has received considerable attention in the past and is by now relatively well understood (for overviews, see, e.g., Haans and IJsselsteijn, 2012; Hartmann et al., 2015; Gonzalez-Franco and Lanier, 2017).
In contrast to presence, the plausibility illusion has received much less scholarly attention and is less well understood to date. According to Slater (2009), this illusion refers to users' sensation "that the scenario being depicted is actually occurring" (p. 3549), even if users know for sure that this is not true. The plausibility illusion results from the extent to which the virtual environment "acknowledges" users' presence in the world (e.g., shadows cast by a user's avatar, or an agent's gaze toward the avatar). Furthermore, the illusion results from "the overall credibility of the scenario being depicted in comparison with (users') expectations" (p. 3549). Hence, the plausibility illusion is "concerned with the "reality" of the situation depicted" (p. 3556), which implies that users' expectations are supported. Yet, Slater also recognizes that even if the scenario appears highly realistic and plausible, "at a higher cognitive level (users) know that nothing is "really" happening, and they can consciously decide to modify their automatic behaviour accordingly" (p. 3554). Slater's (2009) approach provides an intriguing elaboration of the VR experience. Yet, while it offers many valuable insights, some questions remain about its exact concept definitions and their integration into existing literature. For example, Slater's (2009) approach focuses on when or why users respond realistically to VR (i.e., as if they were in a non-mediated situation). Somewhat confusingly, responding realistically is referred to as presence (Skarbez et al., 2017), while what was (and probably still is) commonly understood as one important type of presence (i.e., the "feeling of being there") is dubbed the place illusion. Furthermore, while the idea of the plausibility illusion is very intriguing, its exact operationalization remained somewhat tentative in the original approach (see also Skarbez et al., 2017).
As introduced, the plausibility illusion potentially overlaps with the outcome (i.e., users responding as if the scenario were real; Berthiaume et al., 2021). Furthermore, the plausibility illusion seems to be closely related to perceived realism, a multi-dimensional concept that is well established in the literature (Popova, 2010). In addition, the fact that users always stay aware at a higher cognitive level that "this is not really happening" is noted but not fully elaborated in Slater's original approach, and it remains somewhat disconnected from the other ideas, e.g., about plausibility.
In our view, some of this ambiguity surrounding the plausibility illusion has been resolved by Skarbez and colleagues (Skarbez, 2016; Skarbez et al., 2017; see also Gilbert, 2016). The authors propose that the plausibility illusion builds on coherence (i.e., the extent to which a virtual scenario "behaves reasonably" or consistently with users' expectations). If the VR technology frequently fails to support users' expectations, it is unlikely that they will respond to the virtual environment as if it were real. Coherence thus appears to be central to understanding the user experience, yet it also remains an ambiguous concept. For example, Skarbez et al. (2017) focus on coherence as a system factor, while others consider it a user factor (Berthiaume et al., 2021). In perceived realism research (Hall, 2003), for instance, coherence has been conceptualized as users' perceived external (consistency with real-life knowledge) and internal (consistency within the depiction) plausibility of the media depiction (Busselle and Bilandzic, 2008). While Skarbez et al. (2017) consider coherence a factor that specifically affects the plausibility illusion, other scholars regard coherence as central to the general user experience, including potential presence experiences (Seth et al., 2012; Latoschik and Wienrich, 2021). How coherence affects users' VR experience is evidently still a topic of debate, but by highlighting coherence as a central factor, Skarbez and colleagues helped both to refine Slater's original idea and to integrate it more firmly into existing research.
The present account aims to contribute to and further expand these attempts to explicate the VR experience. We believe that current theorizing in this area can benefit from broadening the theoretical view and incorporating insights from a wider range of existing literature (e.g., about how users perceive and respond to media representations in general). Such a broader approach, we argue, shifts the focus from plausibility to media awareness, a concept that we introduce in the present article. In our view, whenever users encounter mediated content, they simultaneously feel that "this is real or happening" while knowing that their experience is mediated. Applied to VR, we propose that "feeling that this is real" refers to users' sensation of presence, while "knowing this is not real" refers to users' media awareness. We are convinced that a comprehensive understanding of users' VR experience requires modeling how users' presence sensations and their media awareness jointly shape the overall experience. Plausibility, in turn, matters in our account as a determinant of media awareness. 1

REVIEWING THEORETICAL ACCOUNTS OF THE DUALISTIC MEDIA EXPERIENCE
Our general approach is inspired by a central idea expressed in various conceptualizations of how users experience mediated representations. These mostly disconnected approaches stem from interdisciplinary research strands like film or book studies in the humanities, but also from research on art perception, optical illusions, philosophy, and (perceptual, media, cognitive, developmental) psychology. Most of these approaches target the experience of specific mediated representations, like sketched figures, photos, film, narratives (e.g., in books), or VR, while some (e.g., Wolf, 2017) set out to model the experience of any (mediated) representation. As diverse as they might be, a core idea expressed in all of these approaches is that the user experience is inherently dualistic. In the following, we review several related concepts and approaches in more detail:
• A first relevant concept is the aesthetic illusion, which is mostly studied in media (film/text) studies in the humanities and arts (for overviews, see, e.g., Wolf, 2014; Koblížek, 2017). Wolf (2017) defines the aesthetic illusion as primarily "a feeling, with variable intensity, of being imaginatively and emotionally immersed in a represented world and of experiencing this world as a presence (. . .) in an as-if mode, that is, in a way similar (but not identical) to real life. At the same time, however, this impression of immersion is counterbalanced by a latent rational distance resulting from a (. . .) (media-)awareness of the difference between representation and reality" (p. 32, italics added). The quote reveals what scholars of the aesthetic illusion define as the central dualistic nature of the media experience, namely users' intuitive sensations of an apparent reality and their parallel awareness of the mediated nature of these intuitive sensations.
• In the realm of picture perception, "seeing-in" (Wollheim, 1998) represents another relevant concept addressing the dualistic or parallel nature of the mediated experience. Wollheim argues that when looking at a picture, viewers have two simultaneous experiences: they are aware of the represented object (e.g., a house) and of the way the object is represented (e.g., red oil paint). Wollheim claims that the two experiences are not independent, but two aspects of a single experience, which he refers to as twofoldedness. According to Nanay (2005), "(a) visual experience of an agent is 'twofold' if she is simultaneously aware of both the represented object and the medium of representation" (p. 263).
• Relatedly, developmental psychological research on symbolic or pictorial competence highlights the dualistic nature of representations (or symbolic artifacts). "Every symbolic artifact is an object in and of itself, and at the same time it also stands for something other than itself" (DeLoache et al., 2003, p. 114). According to DeLoache et al. (2003), to "understand and use a symbol, dual representation is necessary-one must mentally represent both facets of the symbol's dual reality, both its concrete characteristics and its abstract relation to what it stands for" (p. 114).
• In his general theory of film perception and visual aesthetics (the PECMA flow model), rooted in cognitive film studies, Grodal (2002, 2006) regards cinematic experiences not as the processing of representations, but as primary (real-world) experiences, "although we know that this seeing is induced by artificial means" (2006, p. 3). According to Grodal (2002), the more salient users' media awareness, the more it "is added to, and enriches, the phenomenal experience" (p. 72). However, according to the PECMA flow model, recalling that the processed depictions are not real requires some cognitive effort.
• Communication science scholars have argued that media users can switch between an involved reception mode (accepting the presented world; thinking "within" the logic of the media offering) and an analytical reception mode (reflecting on the media offering; Michelle, 2007; Suckfüll and Scharkow, 2009; Frey, 2018). The latter, sometimes also referred to as psychical or aesthetic distance (Cupchik, 2001), includes considering how a certain film or scene was produced (Suckfüll and Scharkow, 2009). Relatedly, Frey (2018) distinguishes an experiential mode of reception from a thinking or nonexperiential mode of reception. The thinking mode is characterized by mental effort, and it can result either in greater belief and acceptance or in greater disbelief and rejection of the "apparent reality" (p. 500) suggested by the media depiction.
• In philosophy, Gendler (2008, 2019) proposed distinguishing alief from belief. According to Gendler, alief is an automatic or habitual belief-like attitude. "Charles believes that he is sitting safely in a chair in a theater in front of a movie screen (but) the alief has roughly the following content: 'Dangerous two-eyed creature heading towards me! H-e-l-p . . . ! Activate fight or flight adrenaline now!'" (Gendler, 2008, p. 637). As the example shows, for Gendler, alief represents users' acceptance of the depiction, whereas belief represents their co-existing knowledge that "this is just mediated."
• Many scholars have also stressed the dualistic nature of users' VR experience. However, rather than providing a full account of media awareness, past literature often treated VR users' "knowing that this is not real" as a curious side aspect. Recently, however, several scholars have started to focus more closely on the dualistic nature of the VR experience. Gonzalez-Franco and Lanier (2017), for example, discuss users' "partial awareness of (the presence) illusion" (p. 5).
They speculate that high plausibility and the strong sensory saturation provided by VR, as well as high cognitive load among users, might reduce media awareness. Similarly 2) stress that users' knowledge of the unreality of VR "denotes the special cognitive or epistemological status of the VR experience". Other scholars, too, have noted this dualistic nature of the VR experience. We identify related ideas, for example, in Turner's (2016) argument that experiencing presence requires pretense, and in Waterworth and Tjostheim's (2021) argument that VR users believe what is happening is real, "except in the sense that at some level, (they) know the virtual reality is a simulation" (p. 23).
Our literature review does not claim to be comprehensive. Yet, it shows how scholars from largely disconnected fields converge on a strikingly similar idea about how users process and experience (mediated) representations, and hence also VR. The idea is that users' experience of representations is inherently dualistic: Users intuitively process, perceive, and experience represented content "as if it was real or unmediated," while simultaneously, and with varying intensity, staying cognitively aware that their experience is triggered by a representation. Accordingly, we also assume that VR users might automatically feel present, while simultaneously staying aware that this sensation is triggered by VR technology.

PRESENCE
In the present approach, we endorse a broad conceptualization of presence that entails various forms, such as users feeling spatially present in VR or users perceiving artificial agents to be physically co-present in their augmented real environment. In general, presence is a conscious perceptual sensation or feeling (one feels present, something feels present). Presence builds on the interplay between the external sensory stimulation provided by the VR/XR system and users' motor actions, including the internal interoceptive and proprioceptive signals that accompany these actions (see, e.g., "sensorimotor contingencies," Slater, 2009, p. 3549). In terms of the predictive coding paradigm, presence arises if the technology matches the predictions about the external and internal sensory signals that accompany motor action so accurately that any residual error can be "explained away" by the brain (Seth et al., 2012).
We consider presence, as a perceptual sensation, to be an inevitable user response to any correctly calibrated VR system. Neither pretense nor a related suspension of disbelief (Wirth et al., 2007; Waterworth and Tjostheim, 2021) may be necessary to foster presence. For example, a properly calibrated VR system will always make human users feel spatially present. To provide another example, the gaze of an artificial agent that is augmented into users' actual environment will inevitably trigger a subtle feeling of social or co-presence (Senju and Johnson, 2009). However, our central argument is that in the context of VR exposure, these automatic perceptual presence sensations are always accompanied by the belief, or awareness, that they are triggered by human-made technology. Users' media awareness provides the (cognitive) backdrop against which perceptual sensations like presence are interpreted.

WHAT IS MEDIA AWARENESS?
In short, media awareness is about users' salient belief that "this is not really happening" during VR (or any media) exposure. We define media awareness as the salience of users' propositional belief or conviction that their experiences during exposure are based on human-made technology. In other words, when media aware, users believe and are conscious of the fact that what they perceive and experience in the present situation is largely determined by human-made technology rather than by authentic (non-artificial) stimuli that are actually present here and now. More specifically, media awareness can imply different things. It can imply that users believe that they currently do not perceive an object or event directly, but through a medium or interface (as in live events, tele-surgery, or when navigating drones). In addition, it can imply that users believe that currently encountered objects or events do not actually exist (in space and time) but are non-authentic or fictional. These rather stable and firm higher-order cognitive beliefs, which we deem central to media awareness, can be distinguished from lower-order perceptual beliefs that might originate from perceptual sensations like presence and that have only a tentative status (for similar distinctions, see, e.g., Gilbert et al., 1993; Grodal, 2002; Kahneman and Frederick, 2002; Gawronski and Bodenhausen, 2014; Herschbach, 2015; Gendler, 2019).
If media awareness is about the belief that "this is not really happening," when and how is it activated? And to what extent does it stay in mind during exposure? We claim that 1) media awareness is activated when individuals consciously initiate a media exposure episode and that it subsequently stays minimally salient in mind; we refer to this as basic media awareness. Furthermore, we claim that 2) media awareness can dynamically vary in salience above this basic level, which we refer to as dynamically salient media awareness. We start by discussing basic media awareness and its developmental underpinnings.

Basic Media Awareness
Whenever users consciously approach a medium or representation (and recognize it as such, e.g., a photo, a VR headset, or a hologram), the belief that "this is not really happening" is retrieved from propositional knowledge and activated. We assume that the belief is held in working memory (i.e., "the ensemble of components of the mind that hold a limited amount of information temporarily in a heightened state of availability for use in ongoing information processing," Cowan, 2017, p. 1163). Once activated, the belief stays in mind during media exposure (e.g., through attentional refreshing; Camos et al., 2018) and thus establishes a permanent baseline level of media awareness. We think basic media awareness is what scholars mean when they say that users, despite feeling present, still know that they are using technology (ISPR, 2001; Slater et al., 2006; Slater, 2009).
We assume that basic media awareness has important consequences. For example, if individuals look at the hologram of a duck, they may form a tentative perceptual sensation that "there is a duck" (Zeimbekis, 2015). They might even walk around the hologram duck and look at it from different angles. Yet, the belief that "this is not really happening" will embed this sensation, thus shaping the overall experience of their response toward (the representation of) the duck. Accordingly, basic media awareness should affect the construction of meaning and how a situation is subjectively interpreted. It subjectively defines the overall situation as a media exposure situation and provides a cognitive backdrop against which perceptual sensations are interpreted. To use the words of Grodal (2002, p. 72), we think that through basic media awareness, users' "knowledge of "reference" is added to, and enriches, the phenomenal experience". 2

Developmental Underpinnings of Media Awareness
Being "media aware" requires competence and learning. This competence is acquired during child development. Perceiving representations, from a simple Necker cube to a moving 3D object in VR, is easy and effortless, particularly if they sufficiently match authentic objects in appearance and functionality. The representation's sensory information (e.g., visual depth cues) feeds into quickly activated, hard-wired or heavily ingrained perceptual mechanisms that immediately create the perceptual sensation. Therefore, the represented object often springs to mind easily, and a vivid perception of it unfolds naturally. We effortlessly and automatically perceive a cube as a 3D object or see a face when looking at a picture. In fact, as Zeimbekis (2015) notes in the context of picture perception, in most cases the representation provides the natural perception, and the medium the nonnatural one (Grodal, 2002; Wolf, 2017).
Accordingly, it is perhaps not surprising that we as human beings first have to learn to become aware of mediated representations (i.e., to recognize them, to understand what they imply, and to learn how to use them; Flavell et al., 1983; Schlottmann, 2001). This ability is addressed in the literature as pictorial or symbolic competence (DeLoache et al., 2010). Humans routinely start developing symbolic competence very early in life and continue to develop this skill as they grow older. Infants, for example, when shown objects in pictures, initially tend to try to grasp these objects. They fail to accurately distinguish the representation from its authentic counterpart, as they have not yet learned what a picture is. Once they understand that objects in pictures are "these things that look like the actual object but can't be grasped" (Grodal, 2002), young children learn two important things. First, they learn that such sensory objects are called photos. From then on, they can categorize and interpret their photo-induced sensations adequately. That is, they start developing a theory or conceptualization of media. This resembles other important developments taking place at this age, such as the development of a theory of mind (Flavell, 2000). Second, they learn that the appropriate response to a photo is to point to objects rather than trying to grasp them (DeLoache et al., 2010). Of course, symbolic competence continues to develop from then on. 3 In summary, we consider symbolic competence to be the central ability that develops throughout ontogenesis and allows an individual to become "media aware." It is the developmental underpinning of users' propositional knowledge or belief that "this is not really happening."

Dynamically Salient Media Awareness
During media exposure, levels of media awareness can also fluctuate dynamically. It appears that the belief that "this is not really happening" can recede to the back of the mind or stay at the top of the mind (e.g., Jacobs and Silvanto, 2015), while never dropping below a certain baseline level. In other words, users might sometimes be very aware and sometimes barely aware that "this is just mediated," while never forgetting it. We assume that these dynamic shifts of media awareness might be driven by user factors, like users' motivated attention allocation, for example if they actively want to recall that this is not real (Busselle and Bilandzic, 2008). However, they might also arise from the interplay of medium and user. For example, due to inconsistencies or flaws noted by perceptual processes (Gilbert, 2016; Skarbez et al., 2017), users might intuitively sense that something is wrong, odd, or unreal. In searching for an explanation for this sensation of unrealness, users are likely to recruit basic media awareness (i.e., the firm propositional belief that experienced perceptual sensations originate from media technology). In other words, something seems strange, but this irritation can be smoothly explained by the already activated belief that "this is not really happening." As a consequence, perceptual sensations of unrealness might shift basic media awareness back into the focus of attention. 4

Triggering Dynamically Salient Media Awareness During Exposure
Developing symbolic competence and acquiring related propositional knowledge about media representations arise from individuals' encounters with stimuli in their environment that seem somehow different from their authentic counterparts.
Individuals encounter inconsistencies (i.e., violations of their expectations that are grounded in experience of the authentic world). These inconsistencies require a new classification of the encountered stimuli (see also Gilbert, 2016). Environmental stimuli that regularly trigger inconsistencies are categorized as nonauthentic, represented, or mediated. Subsequently, the same inconsistencies might cue this category when encountered again, including during media exposure. Which inconsistency cues exactly mark objects, events, or situations as odd, unreal, or mediated still seems to be an open research question. While we are unaware of an overarching psychological account to date, intriguing yet still tentative ideas have been offered in specific contexts, like picture perception (Zeimbekis, 2015) or film reception (Grodal, 2002). In addition, several systematic, yet still speculative, elaborations exist in the VR context (Lombard and Ditton, 1997; Timmins and Lombard, 2005; Gonzalez-Franco and Lanier, 2017). Closely following these ideas (e.g., Gilbert, 2016; Skarbez, 2016), we assume two clusters of inconsistency cues that plausibly categorize something as odd, unreal, or mediated, and thus also affect the salience of media awareness: 1) sensory inconsistency (i.e., the extent to which represented objects or events fail to match expectations about their authentic counterparts in terms of sensory information and affordances, and the overall visibility of the medium), and 2) semantic inconsistency (i.e., the extent to which represented objects and events are unexpected or seem unlikely, given the present context or situation). In general, in line with Gonzalez-Franco and Lanier (2017), we propose that these violations of sensory or propositional consistency might increase the salience of media awareness during exposure.
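To make the proposed dynamics more concrete, the following minimal Python sketch treats the salience of media awareness as a number between 0 and 1. It is a hypothetical toy formalization, not part of the account itself: the account does not specify a functional form, and the class name `ToyMediaAwareness`, the linear weighting of sensory and semantic inconsistency, the decay rate, and all parameter values are assumptions chosen purely for illustration. The sketch encodes three ideas from the text: basic media awareness is activated when exposure is consciously initiated, salience never drops below that baseline, and sensory or semantic inconsistencies temporarily push salience above it.

```python
from dataclasses import dataclass


@dataclass
class ToyMediaAwareness:
    """Hypothetical toy model of media awareness salience (all values illustrative)."""

    baseline: float = 0.2    # assumed level of basic media awareness
    decay: float = 0.5       # assumed share of above-baseline salience lost per step
    w_sensory: float = 0.6   # assumed weight of sensory inconsistency cues
    w_semantic: float = 0.4  # assumed weight of semantic inconsistency cues
    salience: float = 0.0    # 0.0 = no media awareness activated yet

    def start_exposure(self) -> None:
        # Consciously initiating exposure activates basic media awareness.
        self.salience = self.baseline

    def step(self, sensory: float, semantic: float) -> float:
        """Update salience given inconsistency cue strengths in [0, 1]."""
        # Salience above the baseline recedes over time ...
        excess = (self.salience - self.baseline) * (1.0 - self.decay)
        # ... and is pushed up again by encountered inconsistencies.
        excess += self.w_sensory * sensory + self.w_semantic * semantic
        # Salience is capped at 1.0 and never falls below the baseline.
        self.salience = min(1.0, self.baseline + max(0.0, excess))
        return self.salience


model = ToyMediaAwareness()
model.start_exposure()
# (sensory, semantic) cue strengths per step:
# calm, rendering glitch, calm, calm, implausible event
cues = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 1.0)]
trace = [model.step(s, m) for s, m in cues]
print([round(v, 3) for v in trace])
```

Under these arbitrary settings, the glitch at the second step raises salience from the 0.2 baseline to 0.8, after which it recedes (to 0.5, then 0.35) toward, but never below, the baseline until the implausible event raises it again. Whether actual salience dynamics are additive, decaying, or bounded in this way is an open empirical question that this sketch does not settle.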
Sensory inconsistency refers to users' sensing of the representation or interface. We distinguish two processes. First, sensory inconsistency can refer to the extent to which a depicted entity (e.g., an object) fails to provide the sensorimotor contingencies or affordances that are expected from interaction with its authentic counterpart (for related ideas, see predictive coding, Seth et al., 2012; sensory power and consistency, Skarbez, 2016; Gonzalez-Franco and Lanier, 2017; reality status, Grodal, 2006; authenticity, Gilbert, 2016). If expectations are not met and mismatches cannot be easily explained away or integrated (Biocca et al., 2001), the representation reveals itself. For example, Zeimbekis (2015) argues that pictures do not provide "binocular disparity" and thus no stereoscopic depth. Accordingly, they "do not engage the motion-guiding visual system" (dorsal (motion) vs. ventral vision, p. 319) of the brain, although the user might see, or rather construct, depth. "So perhaps the dorsal system dedicated to navigation 'knows' that the picture is a more or less flat object, while at the same time the ventral system picks up the volumetric contents and depth relations from the picture's surface" (Zeimbekis, 2015, p. 320). Hence, pictures, or the sensations they evoke, need to be categorized by the user as something different from their authentic counterparts. For VR representations, the situation is very similar, although the user can navigate, and the dorsal system is therefore active. However, VR representations, too, lack important cues (e.g., tactile, temperature, or olfactory cues) that a user likely expects from authentic counterparts. These mismatches might cue VR as something mediated. Potentially, the lower the match between the affordances provided by the VR (or any) representation and the expectations based on its authentic counterpart, the more likely it is that media awareness is cued.
In addition, technical glitches in VR (e.g., rendering problems, frozen screens) might violate a number of expectations and thus potentially represent a very strong trigger of media awareness.
Second, sensory inconsistency refers to the extent to which the medium reveals itself to the user, not through imperfect sensory representation and unsupported affordances, but through the visibility of the interface itself (e.g., visible canvas or pixels, a TV frame, the edge of VR goggles). The medium (or the interface) inevitably reveals itself, we believe, when the user consciously initiates the exposure situation. However, the medium might also reveal itself during exposure. For example, users might shift their attention to the cover of a book or the frame of a TV, and thus be reminded of the mediated origin of their experience. In VR, the user seems to be more enveloped by the technology (field of view covered by the headset, unrestricted movements, headphones; Slater and Wilbur, 1997), which potentially lowers the visibility of the medium (Lombard and Ditton, 1997). However, the weight of the headset, tangible cables, and visible pixels are among the cues that potentially reveal the medium during VR exposure, too. In addition, cross-cutting sensory information from the nonmediated environment (e.g., hearing a shout, bumping into an object) or from the body (cybersickness, Rebenitsch and Owen, 2016) might trigger media awareness if users fail to integrate this information and instead shift their attention onto the interface in an attempt to make sense of the situation.
The second cluster of factors that might dynamically increase media awareness, semantic inconsistency, is more cognitive than the sensory-based first cluster. We believe that media awareness might also depend on the extent to which depicted entities or events fail to meet expectations that users derive from their propositional knowledge. Hence, media awareness might vary with how plausible or likely users find encountered objects or events (see "plausibility" or "coherence," Gonzalez-Franco and Lanier, 2017; Skarbez et al., 2017; Latoschik and Wienrich, 2021). If encountered entities or events are very unexpected, or deemed highly implausible or unlikely, users' sense-making attempts might increase their media awareness. We propose two different types of plausibility judgments and, correspondingly, of violated expectancies. First, semantic inconsistency can refer to the extent to which encountered objects or events seem (im)plausible in light of users' real-world knowledge and expectations. This type has been addressed as external plausibility in perceived realism research (Busselle and Bilandzic, 2008; Popova, 2010; Hofer et al., 2020) and, when referring to a social world, as social realism (Lombard and Ditton, 1997). How irritating violations of external plausibility are depends on how much users expect the encountered format or genre to display reality (i.e., to match their real-world propositional knowledge). External plausibility violations (e.g., a flying elephant) might trigger media awareness only if users expect the format or genre to offer high semantic affinity with the real world (e.g., documentaries or news; Busselle and Bilandzic, 2008). Second, semantic inconsistency can refer to the extent to which encountered objects or events appear (in)consistent within the logic of the presented story or environment. This type has been addressed as internal plausibility (Popova, 2010) or narrative realism (Busselle and Bilandzic, 2008).
For instance, even if the format offers fiction, a flying elephant might appear implausible, if the narrative previously emphasized that elephants cannot fly and users subsequently fail to come up with a compelling reason for the flying elephant.
The list of cues that trigger media awareness suggests that virtually any media technology reveals itself once the user consciously activates it (e.g., from opening a book to putting on a VR headset). During exposure, virtually all existing media provide imperfect sensory fidelity and do not support all expected affordances (e.g., it is impossible to look behind objects or to touch them). Often, the interface also stays visible during exposure. Zeimbekis (2015, p. 321) argues that, to date, perhaps only an old media technique, namely trompe-l'oeil, provides perfect illusions (or delusions) of which users might be completely unaware. Trompe-l'oeils do not require conscious exposure and (if the vantage point is right) do not violate sensory and semantic consistency. The question is whether any other media technology, like VR, will one day be able to delude users so that they are completely unaware of using a medium. Presumably, this would require XR technology such as glasses that we would wear routinely and then forget we were wearing. Such a device, which would almost have to become a permanent part of one's body, might augment reality in such a sensorily and semantically consistent way that we might be completely unaware of encountering non-authentic objects, people, or events (Biocca, 1997).
Sensory and semantic inconsistency suggest that users become more fully "media aware" if the system does not sufficiently support their expectations. However, users might also vary the salience of media awareness voluntarily, independent of the content they encounter and its perceived consistency. For a variety of reasons, users might simply be motivated to actively recall that "this is not really happening," and thus momentarily refresh media awareness and increase its salience in working memory (Camos et al., 2018). Such motivated recall might be supported by directing attention to "evidence" that this is not really happening. An example would be a user who feels strongly co-present with a scary monster in VR and thus experiences strong fear. This user might be motivated to enhance media awareness and actively recall that "this is not really happening."

Reducing Dynamically Salient Media Awareness
Sensory-based and semantic inconsistency are likely to heighten media awareness during exposure. In addition, users might voluntarily heighten media awareness if they are motivated to recall "that this is not really happening." However, which factors might potentially lower media awareness beyond the mere absence of the above-mentioned factors?
Gonzalez-Franco and Lanier (2017) hypothesize that greater familiarity with VR and higher cognitive load might both decrease media awareness. We agree that greater familiarity might decrease media awareness, and we find this assumption plausible for two reasons. First, more familiar users might encounter fewer technical issues or need to pay less attention to the interface than less familiar users. Second, with repeated exposure, users might adapt their expectations to what is commonly displayed in VR, and thus encounter fewer surprises (e.g., about missing affordances or semantic inconsistencies; Gonzalez-Franco and Lanier, 2017; Berthiaume et al., 2021). Familiarity might thus plausibly affect media awareness, but future research is necessary to test this assumption.
While we agree with Gonzalez-Franco and Lanier (2017) that cognitive load is also an interesting factor to examine, we are skeptical that staying media aware, or simply recalling that "this is not really happening," qualifies as a cognitively taxing activity. Therefore, we also doubt that cognitive load would impede media awareness. In the absence of empirical evidence, we think it remains speculative, if not doubtful, that the belief that "this is not really happening" becomes less salient, or is less easily refreshed in working memory (Camos et al., 2018), when processing resources are largely occupied by attending to the displayed environment and objects in VR.

Intermediate Summary
In summary, our argument is that throughout early ontogenesis we develop the competence to distinguish representations from their authentic counterparts. While acquiring this skill, we also learn about the stimuli (interfaces, media technologies) that bring forth related and possibly "strange" sensory experiences, and their names (book, TV, smartphone, VR). Conscious initiation of exposure activates the belief that subsequent perceptual sensations like presence, no matter how compelling, are not really happening (in the sense of originating in the real world) but can be attributed to the technology. This belief stays in mind as basic media awareness and allows the user to subjectively interpret the situation as a media exposure situation. Therefore, we think that users never respond to encountered representations in the same way as they would if they believed that "this is really happening" (we return to this point later). However, the salience of the belief might also vary throughout exposure: it is heightened by sensory and semantic inconsistencies and by proactive or motivated recall, and potentially lowered by greater familiarity (see Figure 1). An additional factor that might reduce media awareness is cognitive load; however, we are skeptical that keeping in mind that "this is not really happening" is cognitively taxing, and therefore we did not include this factor in our model depicted in Figure 1. The moderating impact of the belief "that this is not really happening" on the overall user experience (discussed below; see also Figure 2) might partly depend on how salient it is. Figure 2 depicts the unfolding of media awareness over the course of a VR exposure episode. The x-axis represents time and the y-axis represents the salience of media awareness.
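The trajectory described above can be made concrete with a toy simulation. The sketch below is a hypothetical illustration only: the numeric baseline, the boost assigned to each cue, the exponential decay of dynamically salient awareness, and the way familiarity scales boosts down are all our assumptions, not parameters proposed or estimated in this account. It encodes just the qualitative claims: a basic level switched on by conscious exposure, transient rises triggered by inconsistencies or motivated recall, weaker rises for more familiar users, and no drop below the basic level.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical event that heightens media awareness, e.g., a rendering
    glitch, an implausible event, or a motivated recall."""
    time: int      # time step at which the cue occurs
    boost: float   # assumed non-negative increase in salience


def salience_trajectory(n_steps, baseline, cues, decay=0.5, familiarity=0.0):
    """Toy trajectory of media-awareness salience over one exposure episode.

    baseline    -- basic media awareness, switched on at exposure onset (t = 0)
    decay       -- assumed fraction of the extra salience retained per step
    familiarity -- assumed value in [0, 1] that scales cue boosts down
    """
    if not 0.0 <= familiarity <= 1.0:
        raise ValueError("familiarity must be in [0, 1]")
    # Sum the (non-negative) boosts of all cues occurring at the same step.
    boosts = {}
    for cue in cues:
        boosts[cue.time] = boosts.get(cue.time, 0.0) + max(cue.boost, 0.0)

    extra = 0.0  # dynamically salient part on top of the basic level
    trajectory = []
    for t in range(n_steps):
        extra = extra * decay + boosts.get(t, 0.0) * (1.0 - familiarity)
        # Salience is modeled as never dropping below the basic level.
        trajectory.append(baseline + max(extra, 0.0))
    return trajectory


if __name__ == "__main__":
    cues = [Cue(time=3, boost=1.0), Cue(time=7, boost=0.6)]
    novice = salience_trajectory(10, baseline=0.3, cues=cues)
    familiar = salience_trajectory(10, baseline=0.3, cues=cues, familiarity=0.5)
    for t, (a, b) in enumerate(zip(novice, familiar)):
        print(f"t={t}: novice={a:.2f} familiar={b:.2f}")
```

In this sketch, the same cue sequence produces smaller peaks for the more familiar user, while both trajectories decay toward, but never below, the shared baseline. Any functional form with these properties would serve equally well; exponential decay was chosen only for simplicity.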

A PARALLEL PROCESSING ACCOUNT OF THE VR EXPERIENCE: PRESENCE VS. MEDIA AWARENESS
So far, we have broadly suggested that media awareness co-occurs with the perceptual sensation of presence, and that together they define the typical VR experience. In the remainder of this article, we elaborate on these two ideas. Before we discuss how presence and media awareness jointly shape the overall VR experience, in this section we refine the idea that both result from two parallel processes during exposure. Drawing on the perception literature, we have so far considered presence an outcome of bottom-up perception, and media awareness an outcome of top-down cognition or knowledge. At the same time, we think both presence and media awareness can also be linked to two different processing systems that underlie people's reasoning, judgments, and beliefs, and that are prominently discussed in psychological research on dual processing (Evans, 2007). From a dual-processing perspective, we think that presence stems from associative processing and media awareness from propositional processing (Gawronski and Bodenhausen, 2011).
FIGURE 1 | Factors triggering (+) and factors reducing (-) dynamically salient media awareness. On the left, sensory inconsistency and semantic inconsistency are possible features of VR/XR, and motivated recall is a user factor that might enhance the salience of dynamically varying media awareness. On the right, familiarity with VR/XR is a user factor that likely reduces dynamically salient media awareness.
… Figure 1) influence the dynamically salient part of media awareness. We assume media awareness never drops below baseline levels, however.
According to this view (Hartmann, 2012; Hofer, 2016; Krcmar and Eden, 2019), presence, as a perceptual sensation, is the result of quick, effortless, and automatic sensory-driven perceptual, or so-called System-1, processing in the brain. System-1 processing requires no specific training or literacy; it is an innate capacity that human beings share with other animals and that is already used by infants. System-1 processing gives rise to an intuition, gut feeling, or tentative perceptual belief (Kahneman, 2012). Schubert (2009) proposes that users' feeling of spatial or social presence resembles such a gut feeling or tentative perceptual belief. In contrast, System-2 operates on knowledge and on rule-based logical or analytical processing. System-2 has been linked to uniquely human capacities, such as hypothetical thinking, mental simulation, and the detection of illusions (Evans and Stanovich, 2013). Accordingly, we propose that media awareness is evoked by System-2 processing.
An important yet thorny question is how both processes interact during exposure. For example, does System-1 processing, and hence presence as its output, interfere with System-2 processing, thus affecting how media aware users are during exposure? To address this question, we must also look at the extent to which both processes co-occur throughout exposure. Interaction seems possible only when both processes co-occur during exposure, not when one of them is muted. Research on dual processing distinguishes parallel-competitive (Sloman, 1996; Smith and DeCoster, 2000) and default-interventionist (Evans and Stanovich, 2013) dual processing theories. Applied to the present case, we think that presence vs. media awareness might be better modeled as resulting from parallel-competitive than from default-interventionist processing.

Parallel-Competitive Processing
From a parallel-competitive dual processing perspective, presence and media awareness would be the outcomes of two processes that constantly co-occur throughout media exposure, yet run largely independently of each other and do not causally affect each other (see also "simultaneous contradictory belief," Sloman, 1996, p. 11; and, for a related discussion of visual illusions, Kahneman, 2012). Presence can be considered a continuously updated output of associative System-1 processing, whereas media awareness can be considered a continuously refreshed output of parallel propositional System-2 processing. This notion implies that both presence and media awareness can be established quickly: System-2 processing resulting in media awareness would not be more cognitively taxing or slower than System-1 processing resulting in presence (Gawronski and Bodenhausen, 2014). Besides framing both as "default" processes that are quickly established at the onset of media exposure, the parallel-competitive notion would also suggest that they are largely independent processes that do not causally affect each other. What evidence supports this assumption?
First, the idea of two causally unrelated processes would imply that media awareness does not affect presence. This idea converges well with the notion that perception (e.g., of optical illusions) is cognitively impenetrable (e.g., Sloman, 1996; Zeimbekis, 2015): the perceptual impression is not affected by "better knowledge." Adapted to the present case, this principle would suggest that media awareness as a System-2 output does not directly alter perceptual presence sensations as a System-1 output. A user might not feel less present simply because s/he becomes more aware that "this is not really happening."⁵ Likewise, being engaged in propositional System-2 processing should not interfere with being engaged in parallel associative System-1 processing.⁶ Empirical evidence for this assumption is, however, scarce, indirect, and mixed. Two studies have addressed only indirectly whether media awareness affects presence: neither measured media awareness directly; instead, both manipulated consistency, which we consider a trigger of media awareness. A recent experiment (Hofer et al., 2020) manipulated the semantic consistency of a VR environment (i.e., the external plausibility of an apartment) and found that these variations in plausibility did not affect users' sensation of spatial presence. Another experiment (Skarbez et al., 2018, Study 2) manipulated coherence based on the degree to which events in the VR environment adhered to the laws of physics. This study, too, yielded no effects on different presence measures. However, in another recent study (Quaglia and Holecek, 2018), participants were subjected to a fear-of-height experience in VR. The authors found that virtual lucidity (i.e., "awareness that one is having a virtual experience," p. 1) was associated not only with lower fear and more daring behaviour in the presented "virtual plank" scenario, but also with lower spatial presence.
In summary, in light of these scarce and mixed findings, the idea that media awareness does not affect presence remains a plausible assumption that, however, needs further empirical scrutiny.
⁵ We assume media awareness can alter presence only indirectly (e.g., if users, perhaps after a peak in media awareness, shift their attentional focus onto the medium interface or the real world and thus change the perceptual input that establishes the sensation of presence). In the absence of such attentional shifts, media awareness might not alter the sensation of presence.
⁶ This view can be further refined by recalling what Gawronski and Bodenhausen (2014) call the operating principles of System 1 and 2. The operating principle of System-1, or associative, processing is that it works independently of truth judgments, whereas System-2, or propositional, processing serves as the "validation of momentarily activated information on the basis of logical consistency" (p. 189). Adapted to the present case, in line with the operating principle of System-1, we believe presence occurs independently of users' media awareness (as a truth judgment), just as optical illusions usually persist despite better knowledge. However, in line with the operating principle of System-2, media awareness might invalidate sensations of presence, not by diminishing the perceptual sensation, but by invalidating perceptual beliefs emerging from this sensation, such as "this is really happening right now in front of me" (Sloman, 1996; Kahneman, 2012), and subsequently by moderating the effects of presence on the overall user experience. Our view thus converges with the operating principles of System 1 and 2 suggested by Gawronski and Bodenhausen (2014).
Second, the idea of two causally unrelated processes also implies that feeling present does not affect media awareness. Users' ability to engage in propositional System-2 processing and stay "media aware" might be unrelated to the extent that they engage in parallel associative System-1 processing or the intensity of their presence sensations. Admittedly, however, to date we know of no theoretical account or empirical study that would explicitly inform about this assumption.

Default-Interventionist Dual Processing
According to the default-interventionist dual processing logic (Evans and Stanovich, 2013), presence would be the outcome of quickly established, default, and continuously activated System-1 processing, while media awareness would be the outcome of slow, cognitively taxing, and thus only occasionally activated System-2 processing. Following the default-interventionist logic, System-2 processes allow people to intervene in System-1 processing and regulate (e.g., dismiss, weaken, contrast) its outcomes. Because System-2 processing in a default-interventionist logic is cognitively taxing, it requires working memory and attentional focus. Therefore, perceptually driven System-1 outputs might be overridden or regulated by effortful System-2 operations only if cognitive resources and motivation allow.
According to a default-interventionist logic, associative System-1 processing would be the default mode. Relatedly, feeling present (and temporarily "believing" in this sensation) would be a quick, default System-1 output in media exposure. However, users might engage in effortful System-2 processing to trigger media awareness and recall that "this is not really happening." According to the default-interventionist logic, these interventions would require energy and would be triggered only if necessary. Only when sufficiently motivated and equipped with sufficient cognitive capacity might users effortfully become media aware by accessing their higher-order propositional knowledge. Accordingly, following a default-interventionist logic, System-2 and System-1 processing, and hence media awareness and presence, would co-occur only occasionally in media exposure, namely when media awareness is effortfully triggered to causally affect presence.
What evidence supports these assumptions? Above, we reviewed the mixed empirical evidence regarding a potential intervening influence of media awareness on presence. Apart from these studies, we are unaware of any direct empirical examination of how media awareness and presence develop and potentially interact throughout exposure. Hence, we can discuss the default-interventionist logic only on theoretical grounds. A default-interventionist view of presence and media awareness has been implied by several authors in the literature, including ourselves in the past (e.g., Schubert, 2009; Hofer, 2016; Hartmann, 2017). In general, many media scholars argue that users approach media as "believers," and that perceiving the represented content rather than the representation is the default mode in media exposure. In contrast, users might engage in effortful evaluation of the representation (and hence become "media aware") only if this default mode encounters problems like inconsistencies, or because it triggers undesired psychological states like obnoxious fear (Gilbert et al., 1993; Grodal, 2002; Busselle and Bilandzic, 2008; Kahneman, 2012; Shapiro and Kim, 2012).
While these arguments support a default-interventionist notion, we would also like to highlight two potential problems with it, which make us skeptical that it offers a more suitable view of how media awareness and presence are related than the alternative parallel-competitive notion. First, it is unclear whether recognizing a situation as a media exposure situation (i.e., both becoming and staying media aware) is actually cognitively taxing, as the default-interventionist view would imply. At least compared with the typical cognitively taxing System-2 activities addressed in the literature, such as solving mathematical problems or deeper analytical thinking (Tversky and Kahneman, 1974), activating and refreshing the belief that "this is not really happening" seems a relatively quick and effortless activity. However, we also call for future research that tests whether this assumption holds, that is, whether, as we expect, staying "media aware" is not cognitively taxing and does not become more taxing as presence sensations intensify.
A second potential problem we see in applying the default-interventionist logic is that it would consider presence a quick default response, and media awareness a slow, occasional interventionist activity. We admit that this view converges well with notions in the literature that believing is the default mode in media exposure, and that disbelieving might take effort (Gilbert et al., 1993; Shapiro and Kim, 2012). We also agree that perceptual presence sensations are quickly established. Finally, we also admit that our view converges with a default-interventionist logic in assuming that the (System-2) belief "this is not really happening" can become more salient if perceptual (default System-1) processing encounters inconsistencies (see Figure 1 and Figure 2). However, in our view, presence is also embedded in basic media awareness (i.e., a knowing state "that this is not really happening"). In other words, whereas the default-interventionist logic holds that presence precedes media awareness, we argue that basic media awareness precedes perceptual sensations like presence, because it is already triggered when a user consciously exposes him/herself to media technology. Similarly, the default-interventionist logic would suggest that feeling present (or "believing") is the default mode in exposure, and media awareness the occasional intervention. Our approach, in contrast, suggests that perceptual sensations and media awareness both constantly co-occur, and thus jointly define the default mode in media exposure.
To conclude, based on these arguments we think of presence and media awareness as two phenomena that are continuously co-occurring in VR exposure. This view converges with the notion of presence and media awareness as representing two parallel (competitive) processes. However, empirical research is needed to derive a more conclusive picture about how both processes co-occur and affect each other. Is it indeed not cognitively taxing, as we expect, to recall that "this is not really happening"? Is it indeed not harder, as we expect, for a user to stay media aware if presence intensifies? Does recalling that "this is not really happening" indeed not affect presence, as we expect? These central questions can only be more firmly answered based on future empirical evidence. While understanding the exact relations of media awareness and presence as two parallel processes requires further empirical scrutiny, we think the way both jointly shape the overall user experience can already be derived more firmly based on existing empirical research.

Contextualization: Media Awareness Qualifies the Consequences of Presence
A core assumption of our approach is that media awareness provides the cognitive backdrop, or context, based on which immediate perceptual sensations like presence are subjectively interpreted by a user. Media awareness thus qualifies, moderates, or contextualizes effects of perceptual presence sensations on subsequent affective, cognitive, and behavioral responses in media exposure (see Figure 3).
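One conventional way to make this contextualization claim testable is a moderated regression in which media awareness enters as a moderator of the presence effect. The notation below is our illustrative assumption, not a model specified in this account: $Y$ is an outcome such as self-reported fear, $P$ is presence, $M$ is the salience of media awareness (both typically mean-centered), and $\varepsilon$ is an error term.

```latex
\begin{align}
  Y &= \beta_0 + \beta_1 P + \beta_2 M + \beta_3 \,(P \times M) + \varepsilon \\
  \frac{\partial Y}{\partial P} &= \beta_1 + \beta_3 M
\end{align}
```

Under this sketch, the dampening assumption corresponds to $\beta_1 > 0$ together with $\beta_3 < 0$: the conditional effect of presence on fear, $\beta_1 + \beta_3 M$, shrinks as media awareness grows. A threshold-like reversal on a valence-scaled outcome would correspond to this conditional slope changing sign at $M^{*} = -\beta_1 / \beta_3$ (for $\beta_3 \neq 0$). Because the account distinguishes a basic and a dynamically salient part of media awareness, a single $M$ term is a simplification.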
Imagine a person sitting in a virtual living room in a perfect VR. This person might have exactly the same presence sensation as if she were sitting in her real living room. Nevertheless, she might respond to the environment quite differently. The way her sensation of presence motivates subsequent psychological responses (e.g., arousal, emotions, thoughts, behaviour) might be strongly qualified by whether she believes the environment is mediated (thus attributing her presence sensation to technology) or real. Feeling present while believing "this is not really happening" is not the same, and does not have the same consequences, as feeling present while believing "this is really happening." The person in this example might, for instance, perceive that she is being attacked by a bear in her living room. It will make a difference whether the person, in parallel to this perceptual sensation, believes that this is really happening or not. Accordingly, media awareness, or believing that "this is not really happening," matters, as this knowing state contextualizes perceptual sensations and changes their meaning (Berthiaume et al., 2021, p. 393), thus making it possible for users to respond to them differently.
More specifically, we propose that believing that "this is not really happening" tempers the effect of perceptual sensations, as they seem less self-relevant (Abraham and von Cramon, 2009) and more inconsequential (i.e., unable to seriously affect a person physically or psychologically; Hartmann and Fox, 2021). One analogy is to think of media awareness as a protective layer similar to a glass wall (e.g., when encountering a poisonous spider in a zoo; Russell, 1994; Gendler, 2019). The glass wall does not so much change the perceptual sensation of the object on the other side as it changes the overall meaning of the situation. Being aware of the glass wall when encountering the spider makes the situation less threatening and arousing, and perhaps even more exciting. This view of media awareness converges with the popular idea among scholars that media provide a protective layer (Andrade and Cohen, 2007), or playground (Vorderer, 2001), because represented objects have no physical impact (e.g., they do not hurt), and their psychological impact can be controlled relatively well (e.g., through regulation of undesired affect).
We think the strongest empirical evidence for this view of media awareness comes from experimental studies showing that participants treat identical sensory stimuli, including VR, differently simply based on how they cognitively categorize the stimulus (e.g., avatar vs. agent, Ahn et al., 2012; mediated vs. real, Pönkänen et al., 2011). Other evidence comes from studies comparing real-life vs. virtual stimuli (e.g., Blankendaal et al., 2015; Gallup et al., 2019), although these studies potentially confound the manipulation of perceptual sensations (e.g., perceiving the stimulus as a physically embodied human being) with people's higher-order cognitive beliefs or expectations (e.g., categorizing the other as an actually present human being or a representation).
FIGURE 3 | Possible Interconnections between Presence and Media Awareness and Joint (Interactive) Effects on Outcomes that Characterize the Overall User Experience. 1) Media awareness consists of a basic and a dynamically varying part. Basic media awareness is triggered by (conscious) media exposure. Dynamically salient media awareness is triggered by the factors outlined in Figure 1. 2) Media awareness and presence represent two parallel processes during exposure. We assume they might not causally affect each other (yet, in the absence of empirical evidence, this assumption remains speculative). 3) Presence affects outcomes, i.e., physiological (e.g., arousal), affective (e.g., fear), and behavioral (e.g., approach vs. withdrawal) aspects of the user experience. 4) We assume that basic and dynamically salient media awareness moderate these effects of presence on outcomes. For example, fear might be reversed into pleasurable excitation if media awareness reaches a certain salience level. Hence, users' overall media exposure experience needs to be explained based on the interaction of presence and media awareness.
Frontiers in Virtual Reality | www.frontiersin.org April 2022 | Volume 3 | Article 694048
Contextualization does not imply that presence has no effect on users' overall experience, including emotions and behaviour; it implies that media awareness qualifies these effects. In fact, in line with a large body of evidence, we assume that, in general, the stronger perceptual sensations like presence are, the stronger their psychological consequences. For example, users in a fear-of-height VR who feel more spatially present should experience greater fear than users who feel less present in the environment. However, we believe that media awareness loosens the coupling between perceptual sensations like presence and their psychological consequences. The greater media awareness is, the weaker the effect of perceptual sensations like presence on psychological consequences like fear might be, and hence the weaker these consequences. For example, among users feeling equally present in a fear-of-height VR, those who are more aware "that this is not really happening" might feel less scared. Besides dampening the consequences of perceptual sensations, greater media awareness might also open up novel ways of responding to perceptual sensations like "feeling present." For example, the stress of perceiving the physical presence of an aggressive bear might be reversed into excitement by being aware that the bear attack is not really happening. More specifically, we suggest that both basic and dynamically salient media awareness qualify the overall user experience in the following ways (see Figure 3; see also Hartmann and Fox, 2021):
• Media awareness reduces subsequent physiological responses, arousal, and affect. Perceptual sensations like presence might precede and inform physiological and affective responses (i.e., people respond affectively to what they perceive). However, media awareness might weaken the coupling and impact of perceptual sensations on these responses. In general, recalling that "this is not really happening" is considered an effective way to regulate undesired affect in media exposure (Cantor and Wilson, 1988; Hofer et al., 2015). However, media awareness is not just a coping strategy. In an experiment by Pönkänen et al. (2011), participants were exposed to an identical stimulus of another person's animated face. One group was led to believe that the face was a picture on a screen; the other group believed it to be the head of somebody looking through a window from the adjacent room.
Eye-gazing of the other person triggered less arousal among the group that believed the face was just a picture, as compared to the other group that believed seeing a real person (see Risko et al., 2016, for related findings). Relatedly, van der Waal et al. (2021) examined how people respond to food stimuli (e.g., chocolate) in real life vs. VR. They find that exposure to food vs. non-food stimuli leads to more salivation in participants in real life, but not in VR. Potentially, awareness "that this is not real" suppressed users' salvation response. Another study by Quaglia and Holecek (2018) found that participants in a fear-of-height VR reported less fear, the more they stayed aware that they were immersed in a VR application. In summary, these studies suggest that (greater) media awareness might weaken the effect of presence on physiological responses (e.g., salivation, arousal) and emotions (e.g., fear). • Media awareness triggers hedonic reversals. Media awareness is known to also allow for hedonic reversals in which negative primary affect is reappraised as something positive. Hence, the consequences of perceptual sensations (e.g., sensing the presence of an attacking bear) might not only be dampened but also reversed in their valence. For example, while sensing the presence of an attacking bear should instigate distress, being aware that "this is not really happening" allows to reverse the valence of this arousal, and thus turn distress into pleasurable excitement (Andrade and Cohen, 2007). This principle of hedonic reversals is well known from roller-coaster rides (body in fear, mind believes it is safe = fun) and other pleasurable body-over-mind experiences (e.g., chili consumption, Rozin et al., 2013). Hedonic reversals seem to work even if encountering highly immersive representations. For example, in the fear-ofheight VR study by Quaglia and Holecek (2018) participants indeed enjoyed the fearful sensation more, the more they stayed media aware. 
• Media awareness instigates more daring, exploratory, and playful behavior. If media awareness makes users recognize the situation as less consequential, and thus less threatening or risky than it seems, it is plausible that they adapt their behavior accordingly. Users might be inclined to engage in more exploratory, risky, and daring behaviour than they would if they were less media aware, or if they believed they were in a real-world situation. For example, in the fear-of-height VR study by Quaglia and Holecek (2018), in which participants had to step onto a plank, participants who were more media aware were more likely to dare to jump off the plank. This finding converges well with the idea that media provide a safe playground in which users explore boundaries (Vorderer, 2001), for better (e.g., entertainment, training) or worse (e.g., disinhibited harmful behaviour, like harassment or trolling).
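One way to make the moderation logic of Figure 3 concrete is a standard moderated regression. This is a minimal sketch under our own assumptions and not a functional form specified by the account. In the sketch, P denotes presence, M denotes media awareness (basic plus dynamically salient), Y is an outcome such as fear, arousal, or hedonic valence, and b_0 to b_3 are hypothetical regression weights.

```latex
\begin{aligned}
Y &= b_0 + b_1 P + b_2 M + b_3\,(P \times M) + \varepsilon \\
\frac{\partial Y}{\partial P} &= b_1 + b_3 M \\
M^{*} &= -\frac{b_1}{b_3} \quad \text{(value of } M \text{ at which the conditional effect of } P \text{ is zero)}
\end{aligned}
```

Under this sketch, the dampening described in the first point corresponds to b_3 having the opposite sign of b_1. The conditional effect of presence then shrinks in magnitude as M grows toward M*. The hedonic reversal described in the second point corresponds to M rising beyond M*. For example, if Y is coded as hedonic valence with b_1 < 0 (more presence, more distress) and b_3 > 0, the conditional effect of presence turns positive beyond that point. The linear interaction is a simplifying assumption; threshold-like or otherwise nonlinear moderation would be equally compatible with the account.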

CONCLUSION AND FUTURE RESEARCH AGENDA: HOW DO USERS EXPERIENCE VR?
In the present paper we aimed to conceptualize the typical VR experience. This goal was inspired by Slater's (2009) widely recognized approach, particularly his notion of a plausibility illusion, and by Skarbez et al.'s (2017) revision of the plausibility illusion as coherence. We embedded the concept of plausibility into a larger, more general model of how individuals might experience and respond to (mediated) representations, including highly immersive VR content. In our approach, both media awareness and presence are key concepts. We drew on interdisciplinary literature (particularly on the dualistic nature of representations) to draft a parallel-process account, in which the perceptual sensation of presence is contextualized by users' media awareness. We argued (see Figure 3) that both processes jointly shape the overall user experience, because media awareness moderates the effects of presence on physiological responses, affect, and behaviour. In our view, media awareness includes a basic component that is initiated when a user consciously starts a media exposure episode. Therefore, we argue that, to a certain extent, users are constantly media aware. In addition, however, media awareness can vary in salience above this baseline level throughout exposure. Integrating previous ideas about a plausibility illusion (Slater, 2009) and coherence (Skarbez et al., 2015), we argued that perceived (im)plausibility, at both a sensory/affordance and a semantic level, together with motivated recall, affects this dynamic part of media awareness. We suggested that familiarity might reduce this dynamic media awareness. We also addressed the thorny question of how perceptual presence sensations and media awareness might be related to each other, and proposed that both co-occur in parallel during exposure without causally affecting each other.
Stressing the importance of media awareness, we concluded with a view on how both processes interact to jointly affect the physiological, affective, and behavioral responses that characterize the overall user experience. Altogether, the proposed conceptualization suggests that media awareness matters, although it is often neglected in theoretical explications, perhaps particularly in the VR literature, which tends to emphasize users' lifelike responses. Based on the present approach, we believe that the user experience of VR, and arguably of any presence-evoking medium, can only be thoroughly understood if presence and media awareness are taken into account jointly. The proposed parallel-processing account answers related calls in the literature and promises to clarify important open questions. For example, Pan and Hamilton (2018, p. 3) recently argued in a special issue on VR as a tool to study social behaviour that "little is known about the cognitive processes which allow us to engage in dual realities." Indeed, most articles about VR or related media mention users' media awareness, or their knowledge "that this is not really happening," only as a curious side aspect. Other approaches have suggested that users remain mindless during media exposure and are hence deluded (e.g., the "media equation," Reeves and Nass, 1996). With the present account, however, we hope to have challenged these views and to make room for a more fine-grained understanding of the dual realities of media exposure in general, and of users' VR experience in particular.
A skeptical reader might wonder how our emphasis on media awareness aligns with evidence that users appear to respond to VR as if it were real. Our account of media awareness does not, of course, deny that particularly immersive media like VR can trigger perceptual sensations coupled with psychological responses that look very similar to responses to equivalent real-world stimuli. We think these responses occur because, as we discussed, presence is effective: if users feel present, changes in the environment can effectively trigger changes in users. For instance, in VR, just as in real life, standing on a small plank very high above the ground evokes more fear than standing on the ground, because users feel present. Hartmann (2017) suggested referring to these "lifelike responses" as structurally equivalent responses (the structure of responses to mediated conditions A vs. B is equivalent to the structure of responses to A vs. B we would observe in real life). However, as examples of hedonic reversals (users are more likely to enjoy standing on a high skyscraper in VR) or very risky behaviour (users dare to jump off the skyscraper in VR) show, structural equivalence is not a given, even in VR. Furthermore, almost all studies show significant differences in the intensity of responses. In almost any study on mediated exposure, including VR, both the difference in responses to conditions A vs. B and the overall intensity level are strikingly lower than for equivalent real-life conditions (e.g., Blankendaal et al., 2015). Our account suggests that this is largely a result of users' media awareness, which dampens or even reverses responses to conditions A vs. B.
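The distinction between structural equivalence and intensity differences can be stated compactly. The notation below is our own illustrative assumption. R_real(X) and R_VR(X) denote mean responses (e.g., arousal, coded so that higher values mean more intense responses) to condition X ∈ {A, B} in real life and in VR. We read "equivalent structure" as the A vs. B difference having the same direction in both settings.

```latex
\begin{aligned}
\text{structural equivalence:}\quad & \operatorname{sign}\big(R_{\mathrm{VR}}(A) - R_{\mathrm{VR}}(B)\big) = \operatorname{sign}\big(R_{\mathrm{real}}(A) - R_{\mathrm{real}}(B)\big) \\
\text{intensity difference:}\quad & \big|R_{\mathrm{VR}}(A) - R_{\mathrm{VR}}(B)\big| < \big|R_{\mathrm{real}}(A) - R_{\mathrm{real}}(B)\big|
  \quad\text{and}\quad R_{\mathrm{VR}}(X) < R_{\mathrm{real}}(X)\ \ \text{for } X \in \{A, B\}
\end{aligned}
```

In these terms, hedonic reversals and very risky behaviour would appear as violations of the first condition. The dampening that the account attributes to media awareness would appear in the second.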
To summarize, Figure 4 illustrates the idea of structural equivalence vs. intensity differences in people's responses under real vs. virtual conditions, using the findings of Blankendaal et al. (2015). Their study found that people are more aroused if the interaction partner behaves aggressively vs. peacefully, and that this is equally true for real and virtual interaction partners. Yet the study also found clear differences in the intensity of people's arousal (under both peaceful and aggressive conditions), with people interacting with a virtual agent being remarkably less aroused than people interacting with a real confederate in the lab. We would argue that people were more aroused when confronted with an aggressive vs. peaceful real or virtual interaction partner because they intuitively perceived the interaction partner to be co-present. At the same time, we assume that users' greater awareness that the virtual partner was not actually co-present dampened arousal under virtual conditions. Similar to Gonzalez-Franco and Lanier (2017, e.g., p. 7), our attempt to conceptualize the overall VR experience of course also raises many open questions. However, we think our explication directly suggests testable propositions or hypotheses.
First, to understand the VR user experience more fully, research examining how media awareness (including plausibility cues) and presence sensations relate to each other seems important. For example, future studies could test our model's assumptions by examining the proposed cognitive impenetrability of presence. Scholars could also test whether keeping media awareness salient is affected by perceptual sensations like presence, and whether or not it is cognitively taxing (perhaps depending on the intensity of presence).
Relatedly, mostly for the sake of simplicity, but also because these subtypes might be correlated, we treated presence as a unitary concept in this article and did not discuss commonly distinguished subtypes like spatial, social, and self-presence (Lee, 2004). However, future studies could discuss and test whether our assumptions apply to all types of presence. For example, is recalling that "this is not truly happening" indeed similarly effortless when feeling spatially present, socially present, or self-present? Or, to raise another question, are spatial, social, and self-presence indeed equally cognitively impenetrable and thus unaffected by parallel media awareness?

Second, and no less importantly, researchers might want to test our propositions regarding the joint effects of presence and media awareness on arousal, affect, or behaviour. For example, it would be intriguing to test these propositions by manipulating media awareness under conditions of high social presence. Adapting a design by Pönkänen et al. (2011), participants could be exposed to a scene played by real human confederates behind a window. The experimental manipulation could consist of convincing one experimental group that what they see is a sophisticated VR simulation. The other group sees the exact same scene but is convinced that they are simply observing people in the adjacent room.
Measures of presence, physiological responses, and subjective or behavioral measures could serve as dependent variables. In addition, a yet-to-be-developed measure of media awareness would have to be included. Such a design would allow researchers to test whether variations in media awareness qualify users' responses, even when users encounter "perfect stimuli."
Third, we think future research should test the cues that we propose trigger the belief that "this is not really happening." Further insight into when people believe that "this is not real," also in non-mediated situations (Timmins and Lombard, 2005), might help position VR more clearly as "another environmental stimulus" (like a picture or a real object) that triggers general psychological mechanisms of reality perception and sensemaking in a specific way, resulting in a specific user experience. In this context, our approach should also be integrated with other approaches that place (in)consistencies and plausibility, that is, the extent to which perceived objects meet users' expectations, at the heart of presence (Seth et al., 2012) and related user experiences (Latoschik and Wienrich, 2021). While we focused in this paper on (in)consistencies that potentially trigger media awareness, certain (in)consistencies, such as profound violations of basic laws of spatial perception, might also affect presence experiences (Hofer et al., 2020).
Fourth, as a methodological challenge, future research would require separate assessments of the basic and dynamic parts of media awareness. No psychometrically tested and validated measure of media awareness exists to date. A valid, reliable, and time-sensitive measure of media awareness might also require a solid theoretical understanding of its dynamic variation throughout exposure. In light of recent conceptualizations of the temporal development of perceptual processes (e.g., Wirth et al., 2007; Merfeld et al., 2016), does media awareness shift in a dichotomous way (from activated to non-activated) or in a more continuous way (Merfeld et al., 2016; see similarly Skarbez et al., 2018)?
Fifth and finally, we think the present approach might also raise questions about how users experience mediated representations in general (e.g., see Wolf, 2017). For instance, are the same mechanisms at play when we perceive robots as socially present beings? The above-mentioned findings by Pönkänen et al. (2011) might suggest this. Relatedly, we wonder whether the processes we sketch in the present approach are also at work in exposure to symbolic (text) vs. analog (e.g., audio-visual) stimuli. Does media awareness, for example, also qualify users' responses to a text like "you are attacked by a bear!", as we assume it qualifies responses in VR exposure? If these processes converge, it would be a bold yet worthwhile endeavour to work towards a general model of how users experience media or mediated representations.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.