# THE IMPACT OF VIRTUAL AND AUGMENTED REALITY ON INDIVIDUALS AND SOCIETY

EDITED BY: Mel Slater, Maria V. Sanchez-Vives, Albert Rizzo and Massimo Bergamasco

PUBLISHED IN: Frontiers in Psychology, Frontiers in Robotics and AI, Frontiers in ICT and Frontiers in Digital Humanities

#### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88963-064-6 DOI 10.3389/978-2-88963-064-6

# About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

# Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

# Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

# What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# THE IMPACT OF VIRTUAL AND AUGMENTED REALITY ON INDIVIDUALS AND SOCIETY

Topic Editors: Mel Slater, University of Barcelona, Spain; Maria V. Sanchez-Vives, August Pi i Sunyer Biomedical Research Institute (IDIBAPS), Spain; Albert Rizzo, University of Southern California, United States; Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy

Citation: Slater, M., Sanchez-Vives, M. V., Rizzo, A., Bergamasco, M., eds. (2019). The Impact of Virtual and Augmented Reality on Individuals and Society. Lausanne: Frontiers Media. doi: 10.3389/978-2-88963-064-6

# Table of Contents

*Age-Related Differences With Immersive and Non-Immersive Virtual Reality in Memory Assessment*

Adéla Plechatá, Václav Sahula, Dan Fayette and Iveta Fajnerová


Séamas Weech, Sophie Kenny and Michael Barnett-Cowan

*How can On-Road Hazard Perception and Anticipation be Improved? Evidence From the Body*

Mariaelena Tagliabue, Michela Sarlo and Evelyn Gianfranchi


Natalia Dużmańska, Paweł Strojny and Agnieszka Strojny

*The Past, Present, and Future of Virtual and Augmented Reality Research: A Network and Cluster Analysis of the Literature*

Pietro Cipresso, Irene Alice Chicchi Giglioli, Mariano Alcañiz Raya and Giuseppe Riva


Mina C. Johnson-Glenberg

*Combined Cognitive-Motor Rehabilitation in Virtual Reality Improves Motor Outcomes in Chronic Stroke – A Pilot Study*

Ana L. Faria, Mónica S. Cameirão, Joana F. Couras, Joana R. O. Aguiar, Gabriel M. Costa and Sergi Bermúdez i Badia

*Learning Empathy Through Virtual Reality: Multiple Strategies for Training Empathy-Related Abilities Using Body Ownership Illusions in Embodied Virtual Reality*

Philippe Bertrand, Jérôme Guegan, Léonore Robieux, Cade Andrew McCall and Franck Zenasni


Benjamin J. Li, Jeremy N. Bailenson, Adam Pines, Walter J. Greenleaf and Leanne M. Williams

*Reporting Mental Health Symptoms: Breaking Down Barriers to Care With Virtual Human Interviewers*

Gale M. Lucas, Albert Rizzo, Jonathan Gratch, Stefan Scherer, Giota Stratou, Jill Boberg and Louis-Philippe Morency


Yu Tian, Yulong Bian, Piguo Han, Peng Wang, Fengqiang Gao and Yingmin Chen


# Age-Related Differences With Immersive and Non-immersive Virtual Reality in Memory Assessment

Adéla Plechatá<sup>1,2</sup>\*, Václav Sahula<sup>1</sup>, Dan Fayette<sup>1,2</sup> and Iveta Fajnerová<sup>1</sup>\*

<sup>1</sup> National Institute of Mental Health, Klecany, Czechia, <sup>2</sup> Third Faculty of Medicine, Charles University, Prague, Czechia

Memory decline associated with physiological aging and age-related neurological disorders has a direct impact on quality of life for seniors. With demographic aging, the assessment of cognitive functions is gaining importance, as early diagnosis can lead to more effective cognitive interventions. In comparison to classic paper-and-pencil approaches, virtual reality (VR) could offer an ecologically valid environment for assessment and remediation of cognitive deficits. Despite the rapid development and application of new technologies, the results of studies aimed at the role of VR immersion in assessing cognitive performance and the use of VR in aging populations are often ambiguous. VR can be presented in a less immersive form, with a desktop platform, or with more advanced technologies like head-mounted displays (HMDs). Both of these VR platforms are associated with certain advantages and disadvantages. In this study, we investigated age-related differences related to the use of desktop and HMD platforms during memory assessment using an intra-subject design. Groups of seniors (N = 36) and young adults (N = 25) completed a virtual Supermarket Shopping task using desktop and HMD platforms in a counterbalanced order. Our results show that the seniors performed better when using the non-immersive desktop platform. The ability to recall a shopping list in the young adult group remained stable regardless of the platform used. With the HMD platform, the performance of the subjects of both groups seemed to be more influenced by fatigue. The evaluated user experiences did not differ between the two platforms, and only minimal and rare side effects were reported by seniors. This implies that highly immersive technology has good acceptance among aging adults. These findings might have implications for the further use of HMD in cognitive assessment and remediation.

Keywords: virtual reality, memory assessment, aging, immersion, neurocognitive methods

#### Edited by:

Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy

#### Reviewed by:

Pedro Gamito, Universidade Lusófona, Portugal; Pascual Gonzalez, University of Castilla La Mancha, Spain

\*Correspondence:

Adéla Plechatá, adela.plechata@nudz.cz; Iveta Fajnerová, iveta.fajnerova@nudz.cz

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 31 October 2018; Accepted: 22 May 2019; Published: 11 June 2019

#### Citation:

Plechatá A, Sahula V, Fayette D and Fajnerová I (2019) Age-Related Differences With Immersive and Non-immersive Virtual Reality in Memory Assessment. Front. Psychol. 10:1330. doi: 10.3389/fpsyg.2019.01330

# INTRODUCTION

Cognitive functions play an important role in our everyday lives, governing our thoughts and actions and enabling successful adaptation to changes occurring in the surrounding environment (Sternberg et al., 2012). Our cognitive abilities can be affected during aging by common physiological processes and by neuropsychiatric and neurological disorders such as Alzheimer's disease (AD) and vascular impairments. In the context of demographic aging, with adults over 65 years of age forming 15% of the entire United States population (United States Census Bureau, 2018) and 19.2% of the European Union population (Eurostat, 2018), the problems associated with older age are gaining in importance. Physiological aging is typically accompanied by decline across all cognitive domains, mainly in processing speed, divided attention, language, visuospatial abilities, memory, and executive functions (Harada et al., 2013). The most robust manifestation of physiological aging is visible memory decline (Rönnlund et al., 2005); this is subjectively the most relevant for seniors (Harada et al., 2013). In AD diagnostics, episodic memory plays an important role. The deficit in episodic memory in seniors is strongly pronounced and can be demonstrated both in errors of recent autobiographical memory and in laboratory assessments using recall and recognition tasks (Rönnlund et al., 2005). The deficit in episodic memory is detectable using neuropsychological measurements up to 10 years before the diagnosis of AD; it could therefore possibly be used as a marker for early diagnosis (Bäckman et al., 2001; Boraxbekk et al., 2015). Early diagnosis can result in better-timed and more effective interventions, which might delay further progression of the cognitive decline (Naqvi et al., 2013). Thus, in the light of increasing life expectancy, the assessment of age-related memory changes is growing in relevance.

Memory deficit is usually assessed using classic paper-and-pencil neuropsychological methods; such methods have been questioned for their lack of ecological validity since 1978 (Neisser, 1978). Ecological validity can be understood as the degree to which experimental conditions approximate conditions in the real-world environment (Tupper and Cicerone, 1990) or the extent to which the test performance or study results can be generalized to real-life settings (Franzen, 1997). Classic neuropsychological tests fail to resemble real-world demands, and there has been increasing interest in neuroscience in the use of advanced technology (Parsons, 2015). Computer technologies enable precise test administration, stimulus presentation, and automatic response recording. Virtual reality (VR) is gaining in popularity due to its ability to present three-dimensional objects and create complex virtual environments (VE) that might be realistic and ecologically valid while also being precisely controllable (Parsons, 2015).

An important term linked to VR is immersion. Immersion was defined by Slater (2009) as a characteristic of the technology used for VE presentation; basically, the higher the quality of the system, the higher the level of immersion (for example, in terms of the tracking latency, the size of the field of view, or the visual quality of the scene and images). Immersion is also determined by the ability of the system to support sensorimotor contingencies, such as how the technology responds to the action performed by the user to perceive reality, e.g., turning the head to change the gaze direction (O'Regan and Noë, 2001).

Despite the obvious benefits of HMD technology (multisensory stimulation, tracking of the head and body movements, higher sense of presence), results of previous studies are not conclusive in terms of the advantages of HMD in assessing cognitive performance or of its usability in the senior population. Previous studies have shown superior performance either using HMD (Bowman et al., 2009; Murcia-López and Steed, 2016) or using less immersive technology, such as desktop or large screen platforms (Ruddle et al., 1999; Mania and Chalmers, 2001; Sousa Santos et al., 2009). Moreover, the majority of the studies comparing HMD and less immersive technologies in terms of cognitive performance have focused on navigation or spatial memory (Ruddle et al., 1999; Bowman et al., 2009; Sousa Santos et al., 2009; Murcia-López and Steed, 2016); few studies have investigated other cognitive domains (Mania and Chalmers, 2001; Rand et al., 2005). The findings concerning preference and usability of HMD seem to be more consistent, showing a preference for higher immersion technologies, mainly in terms of increased motivation (e.g., Moreno and Mayer, 2004; Richards and Taylor, 2015; Parong and Mayer, 2018), more intuitive action control, and greater enjoyment associated with task fulfillment (e.g., Sousa Santos et al., 2009). Most of these studies (except Rand et al., 2005) were conducted on young subjects; their findings cannot be easily generalized to the senior population. There is not enough evidence indicating the applicability and acceptance of HMD for cognitive assessment and training in seniors.

The aims of our study are:


We used an intra-subject design to investigate the role of the level of immersion on performance and user experience in memory assessment. We were interested in the difference in acceptance as evaluated by seniors (60 years and older) and by young adults (up to 40 years old). HMD has been previously considered more intuitive and motivating (Martínez-Arán et al., 2004; Richards and Taylor, 2015; Parong and Mayer, 2018). We therefore hypothesized that the platform used will affect user experience. We expected to find differences between platforms in memory performances, as the more immersive technology is seen as more engaging and thus might result in better cognitive outcomes. This hypothesis is in contrast with some previous findings that associate the HMD platform with lower cognitive performance. We speculate that recent innovations in the technology of virtual glasses might lead to a different outcome.

# MATERIALS AND METHODS

# Participants

Thirty-six seniors (13 males and 23 females, mean age = 69.47; SD = 7.39; age range = 60–91) and 25 young adults (9 males and 16 females, mean age = 25.4; SD = 5.13; age range = 19–39) voluntarily participated in this study. All participants signed an informed consent form containing information about the experiment procedure and exclusion criteria. The study was approved by the ethics committee of the NIMH in Klecany. Seniors were recruited from the database of the Department of Cognitive Disorders (NIMH) where they were neuropsychologically evaluated and classified as cognitively healthy. Young adults were recruited from the NIMH database of healthy volunteers to be matched in sex and education level to the group of seniors. Participants were not included in the study if they had major neurological disorders, diagnosed psychiatric illness, recent traumatic brain injury, brain surgery, or another illness involving major visual or movement impairment that would prevent them from participating in the experiment. The groups did not differ in demographic characteristics (apart from age). Detailed characteristics of the groups of seniors and young adults are presented in **Table 1**. **Figure 1** presents group-specific distributions of characteristics related to the computer/videogame experience obtained from the usability questionnaire (see section "Usability Questionnaire").

#### TABLE 1 | Summary table of demographic characteristics for individual age groups.

# Cognitive Evaluation

All participants were assessed using standard neuropsychological methods to briefly evaluate their cognitive performance, particularly learning and declarative memory, psychomotor speed, and mental flexibility.

The Czech version of the Rey Auditory Verbal Learning Test (RAVLT) (Rey, 1964; Preiss, 1999) was used as a standard measure of episodic memory (Pause et al., 2013) evaluating verbal learning and delayed recall. For the group comparison we used the total number of recalled words (RAVLT I-V) and the number of words correctly recalled after a 30-min delay (RAVLT delayed).

The Czech version of the Trail Making Test (TMT) (Reitan and Wolfson, 1985; Preiss and Preiss, 2006) was used as a standard measure of psychomotor speed and attention. Part A (TMT-A) evaluates psychomotor speed and visual attention; part B (TMT-B) is focused on visuospatial working memory and mental flexibility.

# The Virtual Supermarket Shopping Task

The virtual Supermarket Shopping Task (vSST) was specifically designed using Unity Engine software<sup>1</sup> for assessing episodic memory in an ecologically valid environment. The desktop version of the task was tested on patients with chronic schizophrenia and on healthy young adults (Plechatá, 2017; Plechatá et al., 2017). Other than feasibility testing in a pilot study using both desktop and HMD platforms, no sample of seniors has previously been assessed using the vSST. The task was originally created in order to assess everyday functioning in a virtual environment that reflects real-world situations. The task is similar to neuropsychological multiple errand tasks, but it is performed in virtual reality, which ensures a safe environment and complete control over the presented stimuli (Parsons, 2015). A similar fully immersive shopping task was recently validated as a measure of episodic memory performance (Corriveau Lecavalier et al., 2018).

<sup>1</sup>https://unity3d.com/

The virtual environment of the vSST resembles a grocery store in which the subject is supposed to remember a shopping list and later find and collect recalled items in the virtual shop. Prior to the beginning of the testing, the participant has time to explore the VE and to become familiar with the control system. The length of the exploration phase differed according to the platform used (10 min for HMD and 4 min for desktop). Each trial of the vSST consists of two phases: the acquisition phase (presentation of the shopping list) and the recall phase (testing the recall of the shopping list by direct collection of individual items in the virtual supermarket). Between the acquisition and recall phases, participants were instructed to play a visuospatial game, the LEU Brain Stimulator<sup>2</sup>, for 3 min as a distraction task. The length of the delay was directly controlled by the vSST application, and the countdown was displayed on the screen.

The vSST had four consecutive levels of increasing difficulty (requiring remembering three, five, seven, and nine items on the shopping list). The first trial, with three items, was meant as a pretraining trial and its results were not further analyzed. The length of the acquisition phase increased automatically by 5 s for each item added to the list (i.e., 15 s for three items; 25 s for five items; 35 s for seven items; 45 s for nine items). After completing each recall phase, the results (number of errors, trial time, and trajectory) were presented to the participant. The beginning of the next acquisition phase was controlled by the participant, who could start off the next trial by pressing a confirmation button with the mouse or with the HTC VIVE controller.
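The timing rule above is linear in list length. A minimal Python sketch of it follows; the names `SECONDS_PER_ITEM`, `LEVELS`, and `acquisition_seconds` are assumptions made for illustration and are not taken from the vSST application, which was built in Unity.

```python
# Illustrative only: the function and constant names are assumptions,
# not identifiers from the actual vSST application.
SECONDS_PER_ITEM = 5      # acquisition time granted per list item
LEVELS = (3, 5, 7, 9)     # list lengths; the 3-item trial is pretraining


def acquisition_seconds(n_items: int) -> int:
    """Return the acquisition-phase length in seconds for a list of n_items."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return SECONDS_PER_ITEM * n_items


# Map each difficulty level to its acquisition-phase length.
schedule = {n: acquisition_seconds(n) for n in LEVELS}
print(schedule)
```

The resulting schedule matches the durations listed in the text (15, 25, 35, and 45 s).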

<sup>2</sup>http://www.leubrainstimulator.com/

In order to allow for repeated assessment using the vSST, two task variants of the shopping list were created for each difficulty level (variant A and variant B). Both variants were demonstrated to be comparable in terms of difficulty in the previous study (Plechatá, 2017).

The vSST makes it possible to evaluate three main variables: errors (omissions – missing items, and intrusions – additional items) committed while recalling individual items from the shopping list, time spent solving the task (recalling and picking up the item) and trajectory length (distance traveled in VE). For the purposes of this study, we report only the number of errors directly related to memory recall. Moreover, the movement control was different across the platforms (teleportation in HMD together with free real-world movements vs. walking using a keyboard in the desktop platform); therefore, platforms are not fully comparable in terms of trajectory traveled and solving time.
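The two error types can be expressed as set differences between the presented list and the collected items. The sketch below is one possible way to compute them, not the scoring code of the vSST; the function name `score_recall` and the item names are hypothetical, and it assumes each item is counted at most once.

```python
# Illustrative only: item names and the function name are hypothetical.
# Assumption: each item counts once, so duplicate pick-ups are ignored.
def score_recall(shopping_list, collected):
    """Count omissions (listed but not collected) and intrusions
    (collected but not listed) for one recall phase."""
    target, picked = set(shopping_list), set(collected)
    omissions = len(target - picked)     # missing items
    intrusions = len(picked - target)    # additional items
    return {
        "omissions": omissions,
        "intrusions": intrusions,
        "total": omissions + intrusions,
    }


example = score_recall(
    ["milk", "bread", "apples", "rice", "soap"],   # presented list
    ["milk", "bread", "butter", "soap"],           # items picked up
)
print(example)
```

In this made-up trial, two listed items are missing and one unlisted item was collected, giving three errors in total.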

# Usability Questionnaire

For this study, we developed a 55-item usability questionnaire inspired by previous usability studies (Lewis, 1995; Kaufmann and Dünser, 2007). The questionnaire has four main parts, which are summarized in **Table 2**. Responses concerning user experience with the platforms and comparison of the platforms were recorded using a five-point Likert scale (ranging from "strongly disagree" designated as 1 to "strongly agree" designated as 5). In the analysis of the questionnaire, we worked with cumulative raw scores for each platform. The cumulative score was computed by combining the scores of 14 items. From the UQ II HMD and UQ II D, we extracted nine questions (three of these items were reversed); five more questions were obtained from UQ III. Adverse effects and pleasantness of the platform were analyzed separately based on individual items of the questionnaire. For more information please see the **Supplementary Material**.
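As a rough sketch of how such a cumulative score can be computed, the Python example below sums 14 responses on the 1 to 5 scale after flipping the reverse-keyed items. The text does not say which three items are reversed, so the positions in `REVERSED_POSITIONS` are placeholders, and all names are assumptions for illustration.

```python
# Illustrative only: which three items are reverse-keyed is not stated in the
# text, so the positions below are placeholders.
LIKERT_MIN, LIKERT_MAX = 1, 5
N_ITEMS = 14
REVERSED_POSITIONS = frozenset({2, 5, 7})   # hypothetical 0-based positions


def cumulative_score(responses):
    """Sum 14 Likert responses after flipping the reverse-keyed items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses")
    total = 0
    for position, value in enumerate(responses):
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError("response outside the 1-5 scale")
        if position in REVERSED_POSITIONS:
            value = LIKERT_MIN + LIKERT_MAX - value   # maps 1<->5 and 2<->4
        total += value
    return total


# Most and least favourable response patterns under this keying.
best = [1 if i in REVERSED_POSITIONS else 5 for i in range(N_ITEMS)]
worst = [5 if i in REVERSED_POSITIONS else 1 for i in range(N_ITEMS)]
print(cumulative_score(worst), cumulative_score(best))
```

Under this scheme the per-platform score ranges from 14 to 70.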

#### TABLE 2 | Structure of usability questionnaire.



The table displays the four main parts of the usability questionnaire, descriptions, and the corresponding numbers of items.

# Materials

The experiment was conducted in a NIMH VR lab which was a 7 m long × 5 m wide × 3.5 m high open space. HTC VIVE was used as the HMD platform, with a display resolution of 1080 × 1200 pixels per eye. The motor activity of the participants was tracked using the HTC VIVE headset and controller. The movement in VE was enabled using teleport on the HTC VIVE controller (trackpad) and also by physically walking around the room (walking was limited by the room parameters). The controller trigger was used for the selection of objects. For the desktop platform, a 24-inch monitor with a display resolution of 1920 × 1080 pixels was used. The participants controlled their movements and pick up/drop actions using the keyboard arrows and a computer mouse.

# Procedure

To compare platform usability and platform influence on measured performance, we used an intra-subject design with a counterbalanced order. The participants performed vSST in two conditions with different levels of immersion according to the platform applied: HMD and desktop. During the experiment, we counterbalanced both the order of the platforms (HMD/desktop) and the two vSST task variants (A/B – sets of the lists to remember) to minimize the practice effect on repeatedly measured performance.
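Crossing the two platform orders with the two variant orders gives four counterbalancing conditions. The sketch below enumerates them and assigns participants cyclically; the cyclic rule and all names are assumptions for illustration, since the text states only that order and variant were counterbalanced, not how participants were allocated.

```python
from itertools import product

# Illustrative only: the cyclic assignment rule is an assumption; the text
# states that order and variant were counterbalanced, not how.
PLATFORM_ORDERS = (("HMD", "desktop"), ("desktop", "HMD"))
VARIANT_ORDERS = (("A", "B"), ("B", "A"))
CONDITIONS = list(product(PLATFORM_ORDERS, VARIANT_ORDERS))  # 4 combinations


def assign(participant_index: int):
    """Return the two (platform, variant) sessions for one participant."""
    platforms, variants = CONDITIONS[participant_index % len(CONDITIONS)]
    return list(zip(platforms, variants))


# The first four participants cover every condition exactly once.
sessions = [assign(i) for i in range(4)]
for session in sessions:
    print(session)
```

Every participant completes both platforms and both list variants, and each platform is paired with each variant equally often across a block of four participants.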

After performing the vSST using the first platform selected according to the counterbalanced order (HMD/desktop, see **Figure 2**), the participants completed the first two parts of the Usability Questionnaire (UQ I and UQ II HMD/desktop). After performing the vSST using the second platform, participants completed the remaining two parts of the questionnaire (UQ II HMD/desktop and UQ III). Seniors completed a neurocognitive evaluation in a separate session prior to the experiment; young adults were assessed at the end of the experimental procedure.

# Statistical Analysis

The statistical analysis was performed using IBM SPSS Statistics 19. Group differences in the standard cognitive assessment were analyzed with the Mann-Whitney U test. Differences in vSST performance and user experience in terms of platform, group, and order were tested for statistical significance using ANOVA for repeated measures, followed by the Tukey post hoc test. The individual vSST errors and individual questions from the usability questionnaire were analyzed using the Wilcoxon signed-rank test.

# RESULTS

# Results of the Cognitive Evaluation

In order to compare the two tested groups in terms of cognitive functioning while controlling for the effect of age, the raw data acquired from the standard neuropsychological methods were transformed to percentiles according to the Czech normative data (Preiss et al., 2012) prior to the statistical analysis. We used the non-parametric Mann-Whitney U test to compare the two groups (seniors and young adults). The normative cognitive performance of seniors in RAVLT and TMT did not differ from that of young adults. The evaluated variables and statistical data for the group comparison can be found in **Table 3**.

#### TABLE 3 | Results of the cognitive assessment.



Raw scores and percentiles of test variables (presented as means and SD) reported separately for each tested group and results of statistical comparison between senior and young adult groups. RAVLT, Rey Auditory Verbal Learning Test; RAVLT I-V, total number of recalled words (highest possible score is 75); RAVLT delayed, number of words recalled after a 30-min delay (from a total of 15 words); TMT-A, Trail Making Test part A; TMT-B, Trail Making Test part B.

# The Virtual Supermarket Shopping Task Performance

In vSST, we were mainly interested in the number of errors as a parameter measuring the recall accuracy crucial for assessing memory abilities.

#### Cumulative vSST Errors

In the statistical comparison, we analyzed cumulative errors consisting of combined omission and intrusion errors made during three levels of task difficulty (for five, seven, and nine items on the list). We used a general linear model (GLM) with ANOVA for repeated measures with platform, group, and order of platforms as within-subject factors to analyze vSST errors (see **Figures 3**, **4**). The analysis revealed a main effect of platform: the difference between the mean of HMD errors, 8.31 (SD = 5.21), and the mean of desktop errors, 6.98 (SD = 4.88), is significant, F(1,57) = 7.474, p = 0.008. A significant main effect was also found for group, F(1,57) = 45.814, p < 0.001, with a mean of 20.5 errors (SD = 8.03) for seniors and 7.8 errors (SD = 5.02) for young adults. Furthermore, the GLM analysis revealed two interaction effects: platform∗group, F(1,57) = 4.219, p = 0.045, and platform∗order, F(1,57) = 6.091, p = 0.017.

The Tukey post hoc test was used to examine these interactions. It revealed a significant difference between the HMD errors (mean 11.43, SD = 4.23) and desktop errors (mean 9.08, SD = 4.64) in seniors, p = 0.001. The performance of the group of young adults did not differ across the platforms (p = 0.998). Furthermore, the post hoc test showed a difference between HMD errors (mean 9.34, SD = 5.17) and desktop errors (mean 6.69, SD = 4.68) when HMD was performed second (platform∗order), p < 0.001, whereas the vSST errors did not differ across the platforms when HMD was applied first (p = 0.997). No effect of platform order was found with the desktop platform.

#### vSST Errors in Individual Trials

Using the Wilcoxon signed rank test, we analyzed particular vSST errors in individual trials for each tested group to further investigate the variance between the platforms. After applying Bonferroni correction for repeated statistical tests, the difference between the two platforms was not significant in terms of individual vSST errors. **Table 4** shows the specific values for each platform and group with appropriate statistics.
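The note to Table 4 gives the corrected threshold as α = 0.017, which is what dividing 0.05 by three comparisons yields. The sketch below reproduces that arithmetic; the family size of three is inferred from the three analyzed trials (five, seven, and nine items) rather than stated in the text, and the function names and the sample p-value are hypothetical.

```python
# Illustrative only: the family size of three is inferred from the three
# analyzed trials (five, seven, and nine items), not stated explicitly.
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value: float, family_alpha: float, n_tests: int) -> bool:
    """True if p_value survives the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_tests)


alpha_trials = bonferroni_alpha(0.05, 3)
print(round(alpha_trials, 3))

# A hypothetical p-value of 0.03 passes the uncorrected 0.05 threshold
# but not the corrected one.
print(is_significant(0.03, 0.05, 1), is_significant(0.03, 0.05, 3))
```

This shows how a platform difference that looks significant in a single trial can fail to survive the correction.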

# Usability Questionnaire Cumulative Score

We applied a general linear model (GLM) with ANOVA for repeated measures with platform, group, and order of platforms as within-subject factors to analyze the summary results for the usability of individual platforms (for details, see **Figure 5**).

The analysis revealed a main effect of group, with a mean usability score of 105.29 (SD = 11.71) for seniors and 114.64 (SD = 6.40) for young adults [F(1,56) = 10.986, p = 0.002]. Furthermore, the analysis revealed only one interaction effect, platform∗group, F(1,56) = 6.148, p = 0.016.

For further analysis of this interaction effect, we used the Tukey post hoc test, which revealed a significant difference (p < 0.001) between HMD scores in seniors (mean 50.49, SD = 11.29) and HMD scores in young adults (mean 59.72, SD = 5.86); the user experience with the desktop platform showed no group effect (p = 0.999). There was no significant difference between the platforms' usability scores in either of the age groups.

#### TABLE 4 | Number of errors in individual trials of vSST for each platform and group.


The table reports mean number and SD of total errors, intrusions and omissions, and statistical difference in total errors for each group according to the platform used. The differences for total errors obtained in individual trials are reported with corresponding statistics. There are no significant effects after applying the Bonferroni correction (α = 0.017).

#### Individual Questions

In addition to cumulative scores calculated for individual platforms and groups, we analyzed the results for individual items from sections UQ II HMD and UQ II D. Because the items use a Likert scale, we investigated the difference between the platforms with the non-parametric Wilcoxon signed-rank test. After Bonferroni correction for repeated statistical comparison (α = 0.01), we observed a significant difference between the platforms only in the group of young adults. Specifically, the young adults preferred HMD (mean 4.2, SD = 1.11) over the desktop platform (mean 2.04, SD = 0.97), Z = −3.42, p < 0.001. The young adults also enjoyed the HMD (mean 4.32, SD = 0.9) significantly more than the desktop (mean 2, SD = 0.81), Z = −3.98, p < 0.001. For details, see **Table 5**.

FIGURE 5 | Boxplots of cumulative scores of the Usability questionnaire. Boxplots represent the following information – the line is plotted at the median, box extends from the 25th to 75th percentiles, the whiskers are drawn up/down to the 10th and 90th percentile, and points represent the outliers. The results of statistical analysis are visualized as follows: full line markers represent the group effect, dashed line markers represent group∗platform interaction effects, significance levels are presented as ∗∗∗ p-value < 0.001; ∗∗ p-value < 0.01; n.s., p-value > 0.05.

TABLE 5 | Mean score of individual questions.


We report mean scores (SD) for individual statements from the usability questionnaire for each platform and group separately. The difference is reported with corresponding statistics. Significant effects after applying the Bonferroni correction (α = 0.01) are marked with <sup>∗</sup>.

#### Side Effects

In the usability questionnaire sections UQ II HMD and UQ II D, we asked participants about the adverse effects of the specific platform. The participants were asked about unpleasant feelings connected with the task; if they reported the presence of unpleasant feelings, they were asked to specify the feeling (Was the unpleasant feeling connected with experienced discomfort? Select one or more options from the list of the possible adverse effects. . .). The incidence of the side effects, including their specific characteristics, is reported in **Table 6**. Importantly, the reported side effects were mild and no participant asked to terminate their participation in the study.

TABLE 6 | The incidence of reported side effects associated with VR experience.


# DISCUSSION

The main findings of the presented study are the significant age-related differences across the tested VR platforms (HMD vs. desktop) that were identified not only in terms of assessed performance but also in user experience. This age-related effect is not surprising as the addressed groups typically differ in experience with new technologies, of which HMD is an example.

# Memory Recall

The study aimed to evaluate possible effects of immersion level (desktop vs. HMD platform) on the ability to recall items from a presented shopping list (participant accuracy was expressed as the number of errors in the vSST task). According to our results, the seniors made significantly more errors when using the HMD platform than when using the desktop platform. The vSST recall performance of the young adults was stable regardless of the platform used. Our findings for the senior group are in accordance with some previous studies investigating navigation and spatial memory (Sousa Santos et al., 2009) that associated the desktop platform with superior performance. Similar findings were reported in a study by Mania and Chalmers (2001) that investigated the ability to recall information from a seminar presented in four conditions: a real-world environment, desktop, HMD, and audio-only. According to that study, the memory performance was the best in the real-world scenario and the worst in the HMD platform. Moreover, the memory recall was statistically higher in the desktop platform than in HMD.

Other studies favor the HMD platform in terms of spatial memory recall (Ruddle et al., 1999; Bowman et al., 2009; Murcia-López and Steed, 2016). A possible explanation for such contradictory results is that the benefits of HMD, such as the active movement control and rotation controlled by head movements, are highlighted in studies that assess spatial navigation abilities. This potential of HMD might be overshadowed by different factors in non-spatial memory tasks.

We speculate that the presentation of the recall tasks in HMD can lead to perceptual or cognitive overload; the participants are present "inside" a virtual environment with possibly higher perceptual stimulation (Richards and Taylor, 2015). The possibility that higher immersion is a distracting factor while learning a task has been investigated. Despite the motivational potential of HMD, the higher immersion can distract participants from the studied material (Moreno and Mayer, 2004; Richards and Taylor, 2015; Parong and Mayer, 2018). Makransky et al. (2019) pointed out a possible effect of higher levels of cognitive load (measured by EEG) associated with more immersive technology. These findings may explain the inferior HMD performance observed in the seniors, considering the goal of the task (remembering a shopping list). The difference between the young adult and senior subjects in our study could be thus related to the lower ability to inhibit distracting information in seniors (Moreno and Mayer, 2004).

On the other hand, the higher stimulation and distraction of the HMD platform might in some way reflect its higher ecological validity in comparison to the desktop platform. For this reason, it would be beneficial to add an extra measure of ecological validity in future comparative studies.

Importantly, most of the mentioned studies did not investigate age-related differences. Such a comparison, in terms of acceptance of new technologies and memory assessment, is important, as memory decline is typical in older adults (Small, 2001). A comparison of the different platforms and two age groups (young adults ages 16–35; seniors ages 60–75) was conducted by Rand et al. (2005). The authors used the "Virtual Office" environment, which was developed to assess attention and memory performance (Rizzo et al., 2002). Based on the obtained results, the performance of both age groups was significantly lower when using the HMD platform. These findings are only partially in accordance with our results as the authors observed an inferior HMD performance also in young adults. This difference in the obtained results could be explained by technological progress in HMD devices in recent years.

Regardless of the observed effect of platform on performance in the memory task in seniors, the fact that the group of seniors performed worse in both platforms than the group of young adults confirms the validity of vSST for memory assessment. The validity of the task was also indicated in previous studies conducted on healthy young adults and patients with chronic schizophrenia (Plechatá, 2017; Plechatá et al., 2017).

By counterbalancing the order of the platforms and task variants applied, we controlled for possible effects of fatigue and practice. A similar approach was applied in other studies (Ruddle et al., 1999; Sousa Santos et al., 2009). Additionally, in our study the platform order was entered as a confounding variable in the presented GLM analysis. We expected that previous experience with the task using the desktop platform would improve subsequent HMD performance. Surprisingly, when using the desktop platform first, the participants from both age groups made more errors using HMD than they did using the desktop platform. In contrast, if the HMD platform was presented first, the performance was comparable between both platforms.
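A counterbalanced assignment of this kind can be sketched as follows. This is an illustrative scheme rather than the authors' actual allocation procedure: the task variant labels "A" and "B", the function name `counterbalance`, and the simple rotation through the four order combinations are all assumptions.

```python
from itertools import cycle, product

# Two possible platform orders crossed with two possible task-variant orders.
PLATFORM_ORDERS = [("HMD", "desktop"), ("desktop", "HMD")]
VARIANT_ORDERS = [("A", "B"), ("B", "A")]  # hypothetical variant labels


def counterbalance(participant_ids):
    """Rotate participants through the four platform-order x variant-order cells."""
    cells = cycle(product(PLATFORM_ORDERS, VARIANT_ORDERS))
    schedule = {}
    for pid, (platforms, variants) in zip(participant_ids, cells):
        # Each participant gets two sessions: (platform, task variant) pairs in order.
        schedule[pid] = list(zip(platforms, variants))
    return schedule


for pid, sessions in counterbalance(["P01", "P02", "P03", "P04"]).items():
    print(pid, sessions)
```

Running the rotation separately within each age group keeps the cells balanced per group, which matters when platform order is later entered into the model.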

Several possible factors might have induced this interaction effect. We argue that the HMD performance might be influenced by the fatigue of the subjects (due to the repeated measurement); the results would differ with the desktop platform, as most of the participants had previous experience with the desktop but not with the HMD platform. Higher sensitivity to fatigue in seniors (Eldadah, 2010) can be also associated with the perceptual overload of HMD, mentioned above, which can lead to higher difficulty of the task itself. Unfortunately, to our knowledge none of the previous studies analyzed the effect of the order in which the platforms were applied (Ruddle et al., 1999; Sousa Santos et al., 2009).

# User Experience

According to the results of the usability questionnaire, the user experience with HMD or desktop platforms is not comparable across the different age groups. The seniors evaluated the HMD experience differently than the young adult subjects. In general, the young adults evaluated the experience with higher scores than the seniors did. However, in the cumulative score of the questionnaire, we found no significant preference for HMD or desktop platform in the young adult or senior participants. The fact that the young adults scored higher in the usability questionnaire than seniors did regardless of the platform may reflect a difference in their attitude toward the specific task or toward computer technology in general.

In respect to individual categories evaluated in the usability questionnaire, the participants in our study favored neither HMD nor desktop platforms in terms of input controls or intelligibility of the task. Nevertheless, the younger adults stated that they liked the HMD platform more than desktop platform. Similarly, the younger participants enjoyed the experience of using HMD more than using the desktop platform. Our findings are in line with the results of previous studies that favored the HMD platform over desktop and screen platforms (Adamo-Villani and Wilbur, 2008; Sousa Santos et al., 2009) in cognitive assessments of young adults. The participants of these studies preferred HMD in general; they considered it more intuitive (Sousa Santos et al., 2009) and more fun (Adamo-Villani and Wilbur, 2008). As both evaluated factors are closely related to motivation, these results might also be supported by studies focusing on the potential of HMD for educational purposes showing that the more immersive technology increased motivation to study (Moreno and Mayer, 2004; Richards and Taylor, 2015; Parong and Mayer, 2018).

On the other hand, the user experience evaluated by seniors in our study did not reflect these findings as the seniors preferred neither HMD nor the desktop platform. Unfortunately, to our knowledge, the existing studies comparing the two platforms in cognitive assessments did not involve older adults. The only exception is the study by Rand et al. (2005), which did not investigate the platform-dependent difference in the user experience. None of the seniors recruited in our study had previous experience with HMD and virtual reality games, while most of the seniors were experienced with computers. As was demonstrated previously, repeated exposure to immersive VR can lead to a decrease of its adverse effects (Taylor et al., 2011); therefore, it could be expected that it also leads to the improvement in other variables of the user experience. The role of repeated exposure either to HMD or to the task itself should be further studied in order to evaluate its potential for cognitive training and remediation.

Considering the adverse effects of immersive virtual reality, the incidence of typical side effects associated with HMD was very low among seniors. Moreover, no cybersickness symptoms were reported in the group of young adults. The higher acceptance of immersive VR in this study without negative side effects could be associated with the design and navigation system used in the task (combination of teleport and active movement).

# Limitations

Despite our effort to control for other confounding factors (e.g., by a counterbalanced order of the platforms), we admit that the differences observed in the task performance could have been influenced by other variables.

In particular, the inferior performance in HMD observed in the group of seniors could be associated with the small but important distinction of the experimental procedure. In contrast to the desktop platform, during the HMD condition the participant was instructed first to take off the HMD and then to sit at a nearby table and play a visuospatial game LEU (used as a distractor in both platforms). Thus, with the HMD platform, there was a specific additional distractor in the form of removing the HMD glasses. Moreover, the participants were standing during HMD and sitting while using desktop platform. The different motor involvement in the task and different control system could influence task performance. This effect could be even stronger in a group of seniors with lower visuospatial coordination abilities (Hoogendam et al., 2014). In future studies, the distinction in the experimental setting could be eliminated by adding a distraction task directly into the VR application, thus not requiring participants to take off HMD glasses during the procedure.

Despite the investigation of the role of immersion, we did not study the sense of presence that is typically measured by questionnaires (Slater et al., 1994) after performing the VR task. As the level of presence was not a key variable in this study, it was not investigated, mainly because it would have increased the time demands of the experimental procedure for individual participants. It could, however, be beneficial to study the difference in the sense of presence, especially in seniors, as it might explain the age-related variance in the platform performance and user experience in more detail. It was previously shown that the sense of presence is typically higher when using more immersive technology (Slater, 2018). A recent study (Corriveau Lecavalier et al., 2018) showed that both young and older adults experience a comparable level of presence in an immersive VR environment. However, this study also reports a positive correlation in seniors between the reported sense of presence and performance in a Virtual Shop task aimed at episodic memory. These results do not explain the negative effect of higher immersion on the performance of seniors found in our study. This discrepancy should therefore be addressed in future studies.

Finally, despite the reasonable number of participants recruited in this study, the number of subjects with limited or no PC experience made it impossible to evaluate the possible benefits of HMD technology in such participants, especially in the group of seniors. Future studies should investigate the role of ecological validity in terms of VR immersion level and behavioral outcomes of the participants.

# CONCLUSION

In the presented study, we examined the age-related differences between HMD and desktop platforms in memory assessment using an intra-subject design. Groups of seniors and young adults performed a virtual Supermarket Shopping task aimed at episodic memory using HMD and desktop platforms in a counterbalanced order. We focused on the role of the level of immersion in the task performance and its usability. According to our results, the seniors' performance was inferior with HMD compared with the desktop platform. The measured performance of the young adults was stable and comparable regardless of the platform used. In the context of the diagnostic application of VR tasks in seniors, our results indicate that it is necessary to create separate normative data for the task, dependent on the VR platform used for the assessment. Furthermore, the HMD platform was more influenced by fatigue of the participants, as the performance was lower on HMD for both groups when HMD was used as the second platform. In general, the seniors evaluated their user experience lower than the young adults did regardless of the platform used. We did not find any significant platform-related differences in overall user experience in any of the tested groups. However, according to the data obtained in individual items of the questionnaire, the young adults tended to prefer HMD over the desktop platform.

Our results indicate that performing the task with HMD may be more difficult than with the desktop platform; this difficulty may be associated with perceptual overload in the senior subjects. It might also indicate the superior ecological validity of the HMD-presented task; this possibility should be studied further. The fact that the user experience did not differ across the platforms used and only minimal side effects were reported indicates that highly immersive technology may be well accepted by aging adults. This may have implications for the further use of HMD in cognitive remediation, as has been proposed in previous studies (Gamito et al., 2014). We hypothesize that with repeated HMD experiences, seniors will find it more motivating and intuitive to use than the desktop platform. However, in the context of diagnostic use of VR in a single session, the benefits of higher immersion are questionable.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of "NIMH CZ Ethics Committee" with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the "NIMH CZ Ethics Committee."

# AUTHOR CONTRIBUTIONS

AP was responsible for the design of the experiment and data collection. VS developed the virtual supermarket shopping task. DF was responsible for recruiting the participants. IF supervised the whole study and together with AP was responsible for writing the manuscript.

# FUNDING

This study was funded by the Charles University grant agency project no. 1832218, with financial support from the European Regional Development Fund project "PharmaBrain" no. CZ.02.1.01/0.0/0.0/16\_025/0007444 and Technology Agency of the Czech Republic project no. TL01000309.

# ACKNOWLEDGMENTS


We would like to thank Aleš Bartoš and his team at the Department of Cognitive Disorders NIMH who were responsible for creating the database of healthy senior participants that allowed us to recruit this group of volunteers. We thank Jan Šeliga for preparing the cumulative dataset, and the students who participated in recruiting and assessing the volunteers, mainly Filip Havlík, Markéta Slezáková, and Hana Šrámková. We also thank Dr. Tereza Nekovářová for her feedback on the study design.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01330/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Plechatá, Sahula, Fayette and Fajnerová. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Understanding AWE: Can a Virtual Journey, Inspired by the Overview Effect, Lead to an Increased Sense of Interconnectedness?

#### Ekaterina R. Stepanova\*, Denise Quesnel and Bernhard E. Riecke

*iSpace Lab, School of Interactive Arts and Technology, Simon Fraser University, Surrey, BC, Canada*

#### Edited by:

*Albert Rizzo, University of Southern California, United States*

#### Reviewed by:

*Carlos Vaz De Carvalho, Polytechnic Institute of Porto, Portugal*

*Glenn Ryan Fox, University of Southern California, United States*

\*Correspondence:

*Ekaterina R. Stepanova erstepan@sfu.ca*

#### Specialty section:

*This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Digital Humanities*

Received: *01 November 2018* Accepted: *29 April 2019* Published: *22 May 2019*

#### Citation:

*Stepanova ER, Quesnel D and Riecke BE (2019) Understanding AWE: Can a Virtual Journey, Inspired by the Overview Effect, Lead to an Increased Sense of Interconnectedness? Front. Digit. Humanit. 6:9. doi: 10.3389/fdigh.2019.00009*

Immersive technology, such as virtual reality, provides us with novel opportunities to create and explore affective experiences with a transformative potential mediated through awe. The profound emotion of awe is experienced in response to witnessing vastness and creates a need for accommodation that can lead to a restructuring of one's worldview and an increased feeling of connectedness. An iconic example of the powers of awe is observed in astronauts who develop instant social consciousness and strong pro-environmental values in response to the overwhelming beauty of Earth observed from space. Here on Earth, awe can also be experienced in response to observing vast natural phenomena or even sometimes in response to some forms of art that present vast beauty to their audience. Can virtual reality provide a new powerful tool for reliably inducing such experiences? What are some unique potentials of this emerging medium? This paper describes the evaluation of an immersive installation, *"AWE"* (Awe-inspiring Wellness Environment). The results indicate that the experience of being in *"AWE"* can elicit some components of the emotion of awe and induce minor cognitive shifts in participants' worldview similar to the Overview Effect, while this experience also has its own attributes that might be unique to this specific medium. Comparing the results of this exploratory study to other virtual environments designed to elicit the Overview Effect provides insights into the relationship between design features and participants' experience. The qualitative results highlight the importance of perceived safety, personal background and familiarity with the environment, and the induction of a small visceral fear reaction as a part of the emotional arc of the virtual journey as some of the key contributors to the affective experience of the immersive installation. Even though the observed components of awe and a few indications of cognitive shift support the potential of Virtual Reality as a transformative medium, many more iterations of the design and research tools are required before we can achieve and fully explore a profound awe-inspiring transformative experience mediated through immersive technologies.

Keywords: virtual reality, overview effect, awe, transformative experiences, interconnectedness, cognitive shift, positive technology, experience design

# 1. INTRODUCTION

The overwhelmingly beautiful sight of our Earth triggers a profound emotional response in most astronauts, leading to a cognitive shift, making them realize the global interconnectedness of all life and feel responsibility for the future of our planet. This phenomenon was described by White (2014) and termed the Overview Effect. This experience has the attributes of self-transcendence and awe (Yaden et al., 2016) and is a remarkable example of a transformative experience. Besides the Overview Effect, there are other experiences that similarly leave an individual changed and promote a feeling of unity or interconnectedness. For instance, such experiences happen in the context of interaction with nature (Williams and Harvey, 2001; McDonald et al., 2009; Tsaur et al., 2013) or in religious or spiritual contexts (Keltner and Haidt, 2003; Levin and Steele, 2005), as well as in mystical experiences, meditation, peak and flow experiences during high task performance, and several other contexts (Yaden et al., 2017). The emotion of awe is often at the core of these experiences (Yaden et al., 2017; Chirico and Yaden, 2018). Even though the terms "transformative," "transcendent," and "awe-inspiring" experiences are not interchangeable, there is a large overlap between the phenomena they describe. For the purpose of the project described in this paper, as we were aiming for an experience lying anywhere within the cluster of these phenomena, we will discuss them together, without drawing a careful distinction between the terms.

Besides being an enjoyable experience (Shiota et al., 2011), such phenomena can have short and long-term positive outcomes: leading to increased well-being (Ihle et al., 2006; Suedfeld et al., 2012; Krause and Hayward, 2015), pro-social (Piff et al., 2015; Prade and Saroglou, 2016; Yang et al., 2016; Stellar et al., 2017, 2018), and pro-environmental (White, 2014; Garan, 2015) attitudes, and even improved physical health (Stellar et al., 2015). The feeling of interconnectedness can lead to the development of social consciousness, which in turn would lead to pro-social behavior (Schlitz et al., 2010). However, despite all the benefits of transformative and awe-inspiring experiences, they remain rare, inaccessible to some people (e.g., due to physical or economic reasons) and could be challenging to achieve at will. Developing tools that could allow us to create environments that could reliably invite such experiences to happen would greatly benefit the world on both individual and societal levels. If we can facilitate the invitation of transformative experiences even only half of the time, that already would make such experiences much more accessible, and the tool allowing us to do that, arguably, would be able to claim itself as a transformative medium.

Virtual Reality (VR) technology, with its controllability and ability to afford a sense of presence, could provide us with a unique medium to design for and study awe-inspiring experiences (Chirico et al., 2016), making them more accessible to the public and researchers (Stepanova et al., 2018). The potential of immersive technology to create applications for positive change has been widely explored in different contexts; see reviews in Kitson et al. (2018a) and Riva et al. (2016). Researchers explored the potential of VR to induce awe in controlled lab conditions using immersive videos (Chirico et al., 2017) and virtual environments (Chirico et al., 2018a), and were able to elicit a self-reported awe response in some of their participants. Quesnel and Riecke (2018) and Gallagher et al. (2015) have also used virtual experiences of a spaceflight and evaluated their potential for inducing awe. Even though none of these studies observed a transformative experience of a similar scale to the Overview Effect in their participants, they still showed promising results indicating that VR, as a medium, could successfully deliver experiences that can trigger profound emotional responses such as awe.

However, there is still little research on awe, as well as the Overview Effect and other transformative experiences, that could inspire the design of a transformative experience in VR. Moreover, a larger body of knowledge needs to be built about the specific potential and affordances provided by VR for the design of profound experiences, as well as an understanding of what someone's experience of going through such an installation would be like. As VR technology and affective design are both relatively new fields, it is important not only to bring in the understanding of how profound transformative experiences happen outside of VR as guidance for the design of immersive experiences and the assessment of their effectiveness, but also to develop a rich body of knowledge of how such immersive installations are experienced by different individuals. This study attempts to contribute to this developing body of knowledge by describing and analyzing personal experiences of individuals going through an immersive VR installation designed with the goal of awe elicitation and invitation of a transformative experience. This understanding will be essential for future assessment of VR technology as a more ecologically valid approach to conducting controlled lab studies of complex phenomena and for informing design strategies, affordances and limitations for the development of profound positive immersive experiences with transformative potential. VR technology can not only allow us to "replicate" in a virtual world experiences that are poorly accessible in the real world, such as a spaceflight, but this medium also presents its own unique opportunities for creating spaces and journeys that can invite a transformative experience. For instance, technology in itself, with the vastness of the data it can connect you to, can elicit awe (Bai et al., 2017). Thus, it is reasonable to explore virtual transformative experiences as their own sub-cluster of transformative phenomena with their own unique attributes and processes, but similar desired benefits such as an increased feeling of interconnectedness, and the benefits for well-being and pro-social and pro-environmental attitudes that could follow from it.

In order to build this knowledge base about the transformative potential of VR and the phenomenology of an individual's experience in a VR installation, we need to utilize our knowledge of profound transformative experiences to motivate the design of VR installations and then study the experience it induces as its own phenomenon. Using qualitative research methods allows us to develop an understanding of how personal experience unfolds and what its important aspects are. Then, we can relate that understanding to the attributes of the design and the desired outcome. Comparing the experience elicited by different VR installations would provide deeper insights into how different design elements, as well as the setting and participant's background, might correlate with particular aspects of the elicited experience. Additionally, relating the personal experiences of participants to the design decisions will help developers of transformative VR experiences validate their design hypotheses and intuitions, as well as propose new directions for investigation.

To achieve that, for this exploratory study we designed an immersive VR installation, "AWE" (Awe-inspiring Wellness Environment), that was inspired by the Overview Effect and other awe-inspiring experiences in nature; a description of the development, including the design hypotheses, can be found in Quesnel et al. (2018b). This installation is not an attempt at a virtual replication of an astronaut's experience, but rather an artistic creation aiming at eliciting an experience that will have some similar outcomes to the Overview Effect. The Overview Effect is described as a cognitive shift that includes an experience of awe and feeling of connectedness to the world, the people and nature (White, 2014; Yaden et al., 2016; Stepanova et al., 2018, 2019), so these were the qualities of the experience that we were hoping to observe in the immersants going through AWE. At the same time, given the complexity of the experiences of awe, self-transcendence, connection and the Overview Effect, and the complexity of the conditions in which they may occur, at this stage we could not directly test for an effect of a singular aspect of the design of the virtual experience on the likelihood of the desired experience occurring. It does not seem possible to isolate a singular aspect of the experience that might be responsible for the desired experience in the immersants. Thus, in order to form testable hypotheses about the relationship between the design and the user experience, we first need to develop a VR experience capable of eliciting the feelings of awe, connectedness and cognitive shifts related to the Overview Effect, and then build a rich knowledge of the phenomenological experience of that VR experience, from which new hypotheses can be derived.

In this exploratory study we discuss the aspects of the experience that the participants of "AWE" have described and relate their accounts to the research on the Overview Effect and awe-inspiring experiences. This study has two distinct goals: (1) evaluate the potential of the current research prototype, "AWE," for eliciting some of its desired effects that have been associated with the Overview Effect; (2) develop a better understanding of the important components of an individual's experience of going through an affective VR installation designed for awe elicitation, and of how it can inform future system development and hypothesis formation. To develop a better understanding of the different components of the experience of a person going through an affective VR installation like "AWE," we performed in-depth qualitative interviews with participants about their experience. To evaluate the potential of our "AWE" experience to elicit awe and ideally lead to a cognitive shift and increased interconnectedness, besides comparing the thematic analyses of interviews to existing qualitative research on awe and the Overview Effect, we also implemented two quantitative measures that could be used for assessing components of the Overview Effect: occurrences of awe measured through goosebumps, extending the work of Quesnel and Riecke (2017) and Benedek and Kaernbach (2011), and connectedness to nature measured through an Implicit Association Test (IAT) used in Schultz et al. (2004).

As this is an exploratory and largely qualitative study, we were not testing any formal scientific hypothesis. However, in the process of designing the "AWE" installation, several design hypotheses were made as a part of the creation process. Some of these design hypotheses are discussed in our paper describing the development of "AWE" (Quesnel et al., 2018b). Even though these hypotheses are not directly tested in this study, they might have shaped some of the expectations that we had prior to collecting and analyzing the data. Additionally, in a separate publication (Stepanova et al., 2019), we proposed design guidelines for a virtual Overview Effect experience based on astronauts' recollections of it and available research. Those proposed guidelines both informed the design of "AWE" and might have shaped our expectations for the current study. To minimize our bias in the analyses, we used a phenomenological method that attempts to suspend the researchers' expectations through the process of epoché (a.k.a. "bracketing") (Smith and Osborn, 2004). After the analyses and the reporting of results, we turn back to our expectations formed prior to the study and discuss the relation of the results of this study to the guidelines discussed in Stepanova et al. (2019) in section 4 of this paper.

This paper makes a contribution to several fields: to the field of VR experience design (esp. VR4Good: Virtual Reality for positive change) by identifying the aspects of an affective experience of being in VR that can be supported with thoughtful design of a VR installation; to the field of transformative experience design by describing the possibility of inducing cognitive shifts in VR and how they might occur; and to the field of psychology by describing a possible methodological approach for investigating awe, the feeling of connectedness and transformative experiences that might be difficult to access, like the Overview Effect.

# 2. MATERIALS AND METHODS

# 2.1. Immersive Experience and Physical Set-Up

Participants were invited into the study room, which had a separate "tent" section for the virtual experience and a preparation area with a table and a laptop, where participants signed the consent form and completed the IAT. The "tent" was set up with a 305 × 305 × 211 cm gazebo that was diagonally separated with black curtains into the VR area and the researcher area (from where the equipment was operated). Inside the "tent" there was an office chair covered with a blanket (to suggest an atmosphere of comfort) and some pillows on the floor (to match the virtual environment (VE)); the outside of the "tent" was decorated with fairy lights that resemble a starry night sky when viewed from inside, which corresponds to the first stage of the VE (**Figure 1**). We set up the virtual experience inside the physical tent for two main reasons. Firstly, to create an explicit entry into the experience space that would separate it from the formal study procedures space. As such, stepping into the tent served as a small ritual, which is proposed as a design guideline for transcendent VR experiences (Kitson et al., 2018b). Secondly, the tent created a semi-private environment where participants knew that they were not being directly observed and could be more immersed and expressive. We believed that these two conditions might be important for inviting the opportunity of a transformative experience.

90 Hz refresh rate at 110° diagonal field of view) and noise-canceling headphones on his head, and a goosebump camera on his right hand. Written informed consent for the publication of this image was obtained from the person depicted.

The navigation interface used for locomotion was adapted from Swivel Chair (Nguyen-Vo, 2018), which uses the rotation and leaning of one's body for locomotion through a virtual space. Participants sat on an office chair and controlled their simulated self-motion by leaning in the direction they wanted to go, with the amount of leaning determining the translation velocity in that direction. To rotate, participants turned around on the chair, which can spin 360°. The interface was calibrated for the individual's height.
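The leaning-based velocity control described above can be illustrated with a minimal Python sketch. It is not the Swivel Chair implementation: the function name, the dead zone, the gain and the speed cap are hypothetical values chosen only for illustration, and the head offset is assumed to be measured in metres from the neutral position recorded during calibration.

```python
import math


def lean_to_velocity(lean_x, lean_z, dead_zone=0.02, gain=2.0, max_speed=3.0):
    """Map a head offset from the calibrated neutral pose to a planar velocity.

    lean_x, lean_z: horizontal head offset in metres (assumed input format).
    dead_zone, gain, max_speed: hypothetical tuning parameters, not taken
    from the study. Returns (vx, vz) in metres per second.
    """
    magnitude = math.hypot(lean_x, lean_z)
    if magnitude <= dead_zone:
        # Small postural sway around the neutral pose produces no motion.
        return (0.0, 0.0)
    # Speed grows with the amount of leaning and is capped for comfort.
    speed = min(gain * (magnitude - dead_zone), max_speed)
    # Travel in the direction of the lean.
    return (speed * lean_x / magnitude, speed * lean_z / magnitude)
```

In this sketch, rotation needs no extra mapping: turning the physical chair changes the direction in which a forward lean points, which matches the description of participants rotating by spinning on the chair.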

The immersive experience "AWE" (Quesnel et al., 2018b) consisted of three environments: forest, lake and space (see **Figure 2** and a video of the latest prototype http://ispace.iat.sfu.ca/project/awe/).

The three stages of VE allowed for different amounts of active locomotion:


# 2.2. Participants

As the main contribution of this exploratory study relies on the phenomenological analyses of the interviews, we were aiming for the recommended sample size of between 5 and 25 participants (Creswell, 1998). We used a purposive sampling method, commonly used in exploratory qualitative research, in order to obtain rich descriptions from knowledgeable participants (Palys, 2008). A total of 15 participants were recruited with the help of our partner organization, NGX Interactive, a local company that creates interactive exhibits for the culture industry. Participants were recruited from among the company's employees and clients and represent the community of professionals working in the field of culture industry and technology. We specifically recruited participants who would be able to provide us with well-informed feedback on the system and its potential to be used in the culture industry for facilitating shifts in worldviews, but they were naive in terms of the specific details of this study. Additionally, even though experience with VR technology varied between participants, they had ample experience with interactive technologies, and therefore would be able to go beyond the initial "wow" response that first-time users of VR sometimes have. We will be referring to participants as P#. Two participants (P07, P15) were excluded from the analyses as they did not finish the experience due to cybersickness, resulting in a final sample of 13 (7 females). Ethics approval was granted by the Simon Fraser University Office of Research Ethics (Study#: 2017s0269).

Throughout the iterative development of the AWE experience we conducted a multitude of smaller formative user tests with a range of participant populations to inform the design of the AWE experience. While they generally confirm the results of the current study, reporting them in any detail goes beyond the scope of the current study and would not substantially alter the findings.

# 2.3. Procedure

After signing the written informed consent form, participants were asked to enter the tent and sit down on the swivel chair. The researcher explained the set-up procedure and the navigation, handed the Head-Mounted Display (HMD, HTC Vive) and the noise-canceling headphones to the participant and assisted with putting the equipment on. Participants were instructed, in case of mild cybersickness, to close their eyes for a moment and, if the feeling persisted or was strong, to notify the researcher, who would stop the experience. Next, the researcher asked the participant to roll up their sleeve and put the goosebump camera (explained in the following section) on their arm. Once it was confirmed that the participant felt comfortable, the second researcher started the virtual experience, and the first researcher directed the participant through the initial calibration process for the navigation, while the second researcher started the recording of the goosebump camera. Then, the first researcher notified the participant that everything was in order and left the tent, leaving the participant in privacy for the experience. After the virtual experience, the first researcher returned to the tent to assist the participant with taking off the equipment and set up for the interview. After the interview, the participant was directed out of the tent to complete the Implicit Association Test (IAT) on a laptop (13-inch MacBook Pro). The participant's experience in the VE was recorded through screen capture and the interviews were recorded with a GoPro camera. The study took approximately 1 h.

# 2.4. Evaluation Methods

We used a combination of qualitative and quantitative measures to address two goals: (1) to understand the participant's phenomenological experience and (2) to assess the potential of the AWE experience to create conditions in which an awe-inspiring experience similar to the Overview Effect (or some degree of it) may occur. As the Overview Effect is described as a cognitive shift that starts with an experience of awe and leads to an increased feeling of connection and responsibility for Earth (White, 2014; Yaden et al., 2016; Stepanova et al., 2018, 2019), we included measures of awe and connection with nature. We did not include specific measures of responsibility for Earth at this stage, as we first needed to establish that the earlier stages of the desired transformative experience can be achieved.

We used interviews to collect qualitative data about the participants' phenomenological experience of going through the VR installation. Additionally, we included two quantitative measures to assess two components of the Overview Effect experience: an implicit association test to assess interconnectedness, and a measure of piloerection (goosebumps) to assess the occurrences of awe. These two quantitative measures were included as a methodological exploration in preparation for future studies that will use a randomized controlled experimental design, less in-depth qualitative measures and a larger sample size. Here, we expected to observe a trend indicative of a correlation between the measure of awe and the measure of connectedness (higher scores on the implicit association test co-occurring with a higher number of instances of piloerection), as in the Overview Effect they are described as occurring together.

### 2.4.1. Interviews

We collected the qualitative data through either cued-recall debrief (Bentley et al., 2005) or micro-phenomenological interviews (Petitmengin et al., 2009). Both of these methods are designed to help participants get re-immersed in the past experience and therefore to have more direct access to different aspects of the experience, reducing the recall errors that could be introduced with the use of retrospective measures (Henry et al., 1994). To further minimize the recall errors caused by the delay between the experience and the interview, each interview was administered immediately after the virtual experience. We implemented both methods in order to assess how they fit into the context of research on affective VR experiences and to evaluate what type of data they are most effective at yielding. To keep the study under an hour and avoid participant fatigue, we used only one type of interview with each participant: four participants (P02, P03, P04, P09) were interviewed with the micro-phenomenological method and nine with the cued-recall debrief method. Each interview was followed by a short set of general questions about the experience. The type of interview administered depended on the timeslot (determined by the availability of the trained micro-phenomenological interviewer). When signing up for the study, participants were not informed about the relationship between the timeslots and interview methods. Each interview took about 20–30 min.

#### **2.4.1.1. Cued-recall debrief**

After the virtual experience, the researcher would help the participant to take off the equipment, while the second researcher would turn the monitor around, load the recording of the participant's experience on the screen and set up the video camera. During the cued-recall debrief (Bentley et al., 2005), the participant watched the screen capture of the experience together with the researcher and talked through what was happening at any particular moment of the experience. The researcher could prompt the participant with questions to direct their attention to different aspects of their experience, for example: "What were you doing here?", "Did you have any thoughts when you looked up?" or "What did it feel like when you went in?"; or to direct their attention to a specific behavior observed in the recording: "You seem to be looking around a little more here, was there something that caught your eye?"

#### **2.4.1.2. Micro-phenomenology**

Unlike cued-recall, the micro-phenomenological interview (Petitmengin et al., 2009) did not use visual prompts to assist the participant with re-immersion, and was administered by an interviewer trained in the method. The interview started with a short practice interview not related to the virtual experience (discussing a moment from the recent weekend) to give the participant an opportunity to become familiar with the method and what was expected from them. Then the interviewer asked the participant to identify one or a few moments in their experience that stood out to them and invited them to focus on one moment at a time. The interviewer then led the participant through the process of re-evoking that moment, directing their attention to different sensory and temporal dimensions of their experience.

### 2.4.2. Implicit Attitudes

We used the same Implicit Association Test (IAT) for assessing one's connection to nature as in Schultz et al. (2004). This test is used to measure interconnectedness, a component of the Overview Effect. It asks participants to sort words into one of two categories by pressing the "E" or "I" key on a computer keyboard with the left and right index finger, respectively. In the test trials the categories appear together, creating either a congruent or a non-congruent pair (**Figure 3**). The results are based on response reaction time and accuracy for congruent and non-congruent category pairs. The categories were Self vs. Other and Nature vs. Built, with 7 blocks of trials.

FIGURE 3 | The Implicit Association Test (IAT) screen with congruent categories pairing and inaccurate response.

FIGURE 4 | Custom made set-up of a wearable camera for recording a video of participant's skin for identifying goosebumps and shivers.

### 2.4.3. Piloerection: Goosebumps and Shivers

Piloerection observed in the form of goosebumps or shivers can be used as a physiological marker of awe (Benedek and Kaernbach, 2011; Quesnel and Riecke, 2017). A "goosebump camera" (see **Figure 4**) was placed on the participant's arm to record a video of their skin during the experience. The researcher helped the participant to put on the camera and adjusted the focal distance from the camera to the skin for the best image clarity. The video recording from the camera was manually synchronized with the screen recording of the participant's experience for later alignment.

# 2.5. Analyses

### 2.5.1. Interview Thematic Analyses

The interviews were transcribed and analyzed in NVivo. Even though some of the data was collected with micro-phenomenological interviews, we did not perform a micro-phenomenological analysis for this study, but analyzed all of the interviews through the same phenomenological method. First, two researchers independently went through the transcripts, identified meaning units and combined them into higher-level themes. The two researchers then compared and discussed the themes they had identified to agree upon one set of themes. Then the researcher went back to NVivo and proceeded with coding. To minimize the researcher's bias in interpreting the data, we used "bracketing" and a bottom-up coding approach similar to interpretative phenomenological analysis (Smith and Osborn, 2004) and looked for themes that naturally emerge from the data instead of coding for specific themes of interest. We present a summary of the distribution of all themes; however, in the interest of space, we only report in detail on the most prominent and relevant themes.

### 2.5.2. Implicit Association Test

We calculated IAT effect D scores of strength of association based on a standard algorithm for the IAT (Wittenbrink and Schwarz, 2007). D scores have a possible range of −2 to +2. Following standard conventions, we identified the strength of connection in accordance with the following break points: "slight" (0.15 ≤ |D| < 0.35), "moderate" (0.35 ≤ |D| < 0.65), and "strong" (|D| ≥ 0.65).
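To make the scoring concrete, the following Python sketch shows a simplified D score and the break-point classification. It rests on stated assumptions rather than reproducing the standard algorithm cited above: the simplified score omits error penalties, latency trimming and the averaging over practice and test block pairs, and both function names are hypothetical. The "neutral" label for |D| < 0.15 is inferred from how scores below the "slight" threshold are described in the Results.

```python
from statistics import mean, stdev


def simplified_d_score(congruent_rts, incongruent_rts):
    """Simplified IAT D score (illustrative only).

    Difference between mean latencies in the non-congruent and congruent
    pairings, divided by the standard deviation of all latencies pooled
    across both pairings. Positive values indicate faster responses in
    the congruent pairing.
    """
    pooled_sd = stdev(list(congruent_rts) + list(incongruent_rts))
    return (mean(incongruent_rts) - mean(congruent_rts)) / pooled_sd


def classify_d_score(d):
    """Label a D score using the break points given in the text."""
    magnitude = abs(d)
    if magnitude >= 0.65:
        strength = "strong"
    elif magnitude >= 0.35:
        strength = "moderate"
    elif magnitude >= 0.15:
        strength = "slight"
    else:
        # Assumption: scores below the "slight" threshold count as neutral.
        return "neutral"
    direction = "positive" if d > 0 else "negative"
    return f"{strength} {direction}"
```

For example, `classify_d_score(0.46)` returns `"moderate positive"` and `classify_d_score(-0.11)` returns `"neutral"`.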

### 2.5.3. Goosebumps and Shivers

The video recordings from the goosebump camera were manually coded by two researchers independently to identify moments of goosebumps or shivers. Moments of goosebumps are visually evident from hairs becoming erect, with the appearance of raised bumps on the skin. Shivers have less prominent raised bumps, but they are evident from micro-movements of muscles under the skin that visually look like a wave lifting the hairs up slightly.

# 3. RESULTS AND DISCUSSION

The first two sections of the results report on the quantitative data, and the following sections discuss the interview data. For the interviews, we first present the thematic analyses. We then present the analysis of categories of emotions related to awe, based on the hermeneutical analysis reported in Gallagher et al. (2015), and compare it to the results observed in Quesnel and Riecke (2018), who used Google Earth VR.

# 3.1. Implicit Association Test

The mean D score across all participants was 0.46 (SD = 0.54), which indicates a moderate strength of positive connection between Self and Nature. Nine participants had a moderate to strong positive connection (M = 0.78, SD = 0.23), two participants had a slight or moderate negative connection (M = −0.39, SD = 0.25), and two participants had neutral scores (M = −0.11, SD = 0.0015).

To give context to our results, we compared them to D scores obtained on the same IAT by Schultz and Tabanico (2007), who observed an average score of 0.40 among 60 undergraduate psychology students and 0.45 among 121 park visitors in California. We can speculate that the effect of our virtual experience is possibly similar to the effect of walking in a park in terms of one's implicit connection with nature. However, the sample sizes and the contexts in which the measures were taken differed widely, and therefore a strong comparison is not possible.

# 3.2. Shivers

In this study we observed one moment of shivers in one participant, when the participant was observing the sun being revealed from behind the dark Earth. **Figure 5** illustrates the moment when the shivers occurred.

### 3.2.1. Thematic Interview Analyses

**Table 1** summarizes all the themes observed and coded in the data. We are setting the usability and design related comments aside, as they are outside of the scope of this paper and will be reported separately. We are reporting on the most prominent and relevant themes to this paper, specifically: emotions and feelings, body-centric sensations and embodiment, familiarity and novelty (role of the personal background) and cognitive mini-shifts. These themes are highlighted in the **Table 1** and their frequencies are summarized in **Figure 6**.

### 3.2.2. Emotions and Feelings

#### **3.2.2.1. Curiosity and wonder**

After "cool," "interesting," and "pretty," "curiosity" was the most frequent affect-related word used by participants. Curiosity and wonder were positive emotions driving participants' exploration behavior: "Another sense of delight: Oh it's a lake! Not knowing what's gonna happen. Do I just look at the lake? But when I break through the lake its quite a sense of wonder: oh, that's quite lovely!" (P08). The properties of the environment, specifically some level of mysteriousness or the "unknown-ness" of it, were inspiring the curiosity: "I was just curious about the environment. The environment felt deep. It reminded me the Truman show, where you have the bubble that you can explore." (P06), but at the same time inducing some level of fear: "It's really a lot of curiosity and I guess nervousness." (P11).

TABLE 1 | Comprehensive summary of themes coded in the interview data, with the prominent themes reported in this chapter bolded.

The novelty and new perspectives were also contributing to curiosity: "I am enjoying the curiosity. I guess I was more interested in looking at the Earth, from this vantage point. I enjoyed looking at the space in reference to the Earth" (P05).

#### **3.2.2.2. Safety and fear**

Most participants (N = 8) distinguished two states in relation to the environment: comfortable and safe vs. uncomfortable and scary.

3.2.2.2.1. Safety. The majority (N = 11) considered the first environment, the forest, and especially the tent to be safe and comforting: "the whole set up of the tent, and what I saw here... as a tent was really, like, I felt safe. I felt the tent provided a safe starting spot for me to start to going into the outside world." (P01). When aiming to achieve a transformative experience in VR, we believed that it was important to have a safe starting point, to help participants trust the system to take them on a potentially emotional journey and help them be more open to this experience. If the medium does not allow participants to feel comfortable within it, they will likely be more resistant to and closed off from the experience. The physical and the virtual tent appeared to successfully serve that function for most participants. It was also important to conclude the experience with a safe environment. Here a participant describes the last transition and coming back into the tent: "this again is much more familiar, I do this every day kind of thing. It was comforting. Probably in a weird way one of the most comforting parts" (P05). And since participants had already developed some connection and familiarity with that environment, it was even more likely to elicit a sense of comfort: "Cozy. I felt like I was home, even though it's a temporary home. Daylight, so it's more comforting" (P06).

3.2.2.2.2. Fear. Fear was probably one of the strongest and most interesting emotional reactions observed. Participants reported being a little "scared," "nervous," "uncomfortable," or "anxious," which was usually associated with the jump into or descent in the water, or, in a few cases, with walking through the dark forest. Both the act of jumping from a height and the descent into the deep water were uncomfortable for some participants: "Then I looked down and I see everything is dark, so for me it was .. I don't know how to explain.. it was just uncomfortable a little bit.. somewhere you are in the water and everything is dark and you are going down" (P09). This was also the transition into the lake, where the locomotion was more restricted than in the forest, which increased the level of fear: "I know that if I jump into the lake I can get out as fast as I can, and it's up to me, but I felt like jumping in with the weights attached to your ankle: I am not in control of this situation and it doesn't make me feel comfortable. I am being led. I don't want to be led" (P06). This also relates to the role of the sense of agency in the environment, the loss of which often undermined participants' enjoyment.

There were many strong bodily reactions to the jump and descent into the lake in the VE, which were surprising and in some ways profound for the participants: "I felt a shock. It felt like I was choked. That surprised me. It was not just like 'Oh that was kind of weird,' I did feel like someone poked me or something. I felt an actual zap to myself, a tension, that I wasn't expecting" (P05).

The strategies participants used to cope with this fear were: (1) dissociate from the experience and bring yourself to the analytical level: "Mentally overwrote back that this is just the experience." (P06), (2) find a comforting point of reference: "There is fish, which is a comforting reference point in this black void. Trying to follow the light." (P05), and (3) just wait for it to pass: "I noticed myself clutching my hands. I am not comfortable, I am just going to wait it out until it goes away" (P06).

#### **3.2.2.3. Other affects**

Both positive and negative affects were observed. Negative affects came from two main sources: (1) usability issues caused frustration, and the inability to explore something of interest caused disappointment; and (2) some parts of the environment caused nervousness, anxiety or fear, as discussed in the section above. Positive affects could be categorized into the following groups: excitement, inner peace and appreciation of beauty.

3.2.2.3.1. Excitement. Participants described their experience as "fun," "exciting," "wow." These affects were often related to the visual and audio attributes of the environment: "The sun was really exciting, because it is bright. There is music attached to it obviously, other than just my vision, it was also creating that kind of excitement. Bright and exciting" (P04); or to interest and anticipation: "When I first looked around I was kind of hoping I would get to go in there, and when I saw that you can, there was a bit of excitement that I can go and explore the forest around. During that time I was actually looking around a lot. It was kind of immersive, it was fun" (P03).

Another aspect of the experience that seemed to elicit excitement was the vertical dimension, which opens a novel perspective. Often, when looking up: "I kept looking up and thinking how far down am I. It was pretty neat, it was cool" (P13) or down: "So I didn't look down that much, but when I did, it was kind of fun and kind of scarier than looking elsewhere" (P04), participants would describe themselves as being more engaged and excited. In contrast, they considered a lack of vertical gaze movement to be evidence of low engagement: "I wasn't inclined to look up and down, I was looking more left and right, more like if you are in museum or something and you're kinda looking around" (P03).

3.2.2.3.2. Inner peace. Participants reported feeling relaxed and peaceful. The soundtrack appeared to significantly contribute to it: "It was very peaceful and soundtrack was nice and reminded me of nature and being in the forest" (P08), which was also helping with coping with anxiety from jumping into the lake: "The sound was calming, just seeing fish and seeing the opening above me made me feel a little more relaxed" (P09).

3.2.2.3.3. Appreciation of beauty. Participants described the beauty of the elements of the experience and how it made them feel delighted or appreciative. The mystical and novel environments like the nebula: "There is something about it that I can't define. Because I know these are asteroids and that's probably a planet of some sort but then the fog is like 'Awww.'" (P01), the familiar natural beauty of the forest: "I like lakes, particularly because I can see the mountains and the sky behind it, so I wanted to look closer <. . .> I liked it, I can just sit there and look" (P06), and the beauty of the image of our planet: "It's just visually really striking. And again, familiar because you've seen images like that. And, the contrast between the dark and the light is really nice." (P12) all elicited moments of appreciation and delight in participants.

### 3.2.3. Familiarity and Novelty

#### **3.2.3.1. Relation to emotions**

The feelings of safety or fear, as well as curiosity and wonder, often seem to be related to the feelings of familiarity and novelty. The first environment, a campsite in a forest, was familiar to most participants and associated with positive emotions, which let them feel comfortable going into the environment: "It's a very familiar place. It's a tent, and there's a bonfire. There might be other people there. I chose to come here. I chose to be here and setup a tent and sleep in a tent" (P01). Moreover, throughout the virtual experience, participants formed new connections with elements of the environment and used them to bring themselves back to a state of comfort in the parts that felt scary to them: ". . . for my one comfort: 'here is the light, follow the light, here are some fish, I am being sort of acclimatized here'; that time helped" (P05).

While familiar environments usually provided a sense of comfort, for some participants they appeared less engaging. In contrast, novel environments stimulated curiosity, wonder and excitement. Here a participant is at the end of the lake scene: "It felt like 'oh cool!' It's not something you would normally be able to see, whereas in the previous environment, I have gone camping before, so I get it. But here I am thinking this is cool, it's really creative, really beautiful to see the stars through the water" (P08). For some participants it was easier to accept and get immersed in more novel environments that they would not have had a concept for, while a familiar environment seemed harder to make compelling:

It is neat to explore a perspective on the world that you would have none of <. . .> Whereas when anything that is too familiar, because I am so in-tune with how I walk and how that feels, so you have that disconnect <. . .> Where in space, I have no context for that. So okay, this is how I would float in space, fair enough, I have no other way of knowing it. (P02)

#### **3.2.3.2. Anchoring**

The act of cognitive anchoring to a familiar place was quite prominent, and it was not only used as a coping mechanism against anxiety and discomfort provoking environments, but also to orient oneself: "I saw the sun and recognized it, and quickly after that I saw the Earth, so there was a relation there—I knew where I was for the first time in the experience. Not that I haven't been in a tent before, that was quite familiar. But there I for sure knew where I was." (P04) and to connect with the environment in a more meaningful way: "This is kinda of an interesting angle of North America and South America. I have a colleague, who is working in Columbia right now, so I am trying . . . I am putting real people I know" (P05).

#### **3.2.3.3. Importance of individual variables and background**

We were surprised to observe polar opposite responses from our participants within such a fairly simple experience with a fairly consistent journey. Each of the stages and transitions in the experience produced opposing responses, from love to hate and from relaxation and peacefulness to excitement or fear. This distribution of reactions stressed the significance of each participant's individual background.

The lake environment was the most striking example of the opposing experiences participants were having and of their relation to participants' backgrounds. One participant describes her delight in that stage: "I just love the water, and so going into the water was quite delightful. Happiness, familiarity, for me not too calm, but connectedness to nature in that way" (P08). Another participant had a very different reaction to the same environment: "A little worried. I don't like deep water. A little anxious. Okay, we got to go over to the lake, I hope we stay above it" (P06). The transition into the lake, which was reported to be one of the most memorable moments by most participants, also elicited opposing reactions depending on personal background. One participant felt uncomfortable anticipation and anxiety: "coming down the little ledge to go in the water.. that was kind of .. I was a little bit hesitant before, because I don't normally like jumping into the water from height. Or jumping from height in general. That feeling scares me a little bit" (P09), while another participant felt positive anticipation and excitement coming up to that transition: "I realized that okay, I am going down to the water, so perfect. This is great. <. . .> I was a little stoked, cause thats the direction where I wanted to go <. . .> I was a little bit timed here: Am I supposed to jump in here? <. . .> then I went for it" (P11). This participant later mentioned being a cliff-jumper.

Another important influence on the experience was participants' video-game experience, which both helped them with navigation: "I have a little bit of a gaming background so I am sort of very comfortable with this first-person movement through virtual space" (P13), and set up an expectation of having a goal: "it reminded me of old video games where there is like a mission or something, I wouldn't necessarily do that mission and I would end up going off somewhere else" (P10).

# 3.2.4. Body-Centric Sensations and Embodiment

# **3.2.4.1. Jump into the water**

As discussed in the section on safety and fear, the transition into the water environment, which invited participants to jump into the lake, induced strong reactions in participants' bodies. They described clutching their hands, tensing up their muscles and holding their breath: "all your muscles constrict, or contract, so it's almost like you are trying to hold yourself tight, so when you get that cold, you can release it once you hit the water" (P02). This tension was often followed by a release and relaxation when "hitting the water": "the body just kind of tense up, and you just kind of . . . just kind of muscles release . . . As soon as I got in the water" (P09).

# **3.2.4.2. Weightlessness**

Interestingly, that feeling of release might have facilitated the feeling of floating or weightlessness. Here a participant describes the moment when that release happened:

That's weird, because, on the ground, up to that transition, I am super conscious of how I am sitting on a chair, and that kind of leaning forward is feeling a little awkward. . . But in that second I didn't feel the. . . And that's what I kind of loved too, is how, I had no idea you could reproduce that, give that sense that you are weightless, suddenly I wasn't conscious of my body pressing into the Earth. (P02)

For a different participant, a similar moment of release leading to the sense of weightlessness happened in the transition into space: "When I was in the water I felt like I was not in control and I was weighted down, like if I had weights around my ankles, where is when I was transitioning into the night sky it felt like the opposite: the weights are off the ankles, you are weightless" (P06). This participant was afraid of the water environment, and even though the transition into space produced fewer internal bodily responses for most participants than the transition into water, the psychological release of letting go of the fear still led this participant to experience the illusion of weightlessness.

It was interesting to observe that six participants mentioned floating or the feeling of weightlessness. It might not have been a strong bodily feeling for everyone, but it is encouraging to see that even with a simple hands-free leaning-based interface, through the design of the storyline and the visuals, we were able to elicit some level of the feeling of weightlessness without submersing participants in a flotation tank [which would be a more literal induction of the feeling of weightlessness, for instance, as planned by SpaceVR for the 2018 Burning Man festival (Bonasio, 2018)].

# **3.2.4.3. Connect and disconnect between mind and body**

Imaginative immersion in combination with sensory immersion (Ermi and Mäyrä, 2005), when achieved successfully, creates a condition in which participants experience a disconnect between their mind and body. Participants discuss these moments of disconnect, in which their perceptions were overridden by their imagination, as the optimal moments of their experience: "It was a bit more of the imagination and just like the feeling of being in warm water and submerging and yet not worrying about the panic of not being able to breath, and just something about that, that I quite liked. And maybe it's because I didn't feel this [points at different parts of his body], right?" (P02). In contrast, the moments in which the conflict between the physical body position and the virtual position became apparent led to frustration and disappointment: "You start unpacking, okay, so you have this goggles, the audio here, and my arms and legs just feel static and crossed, how does that connect? Because that feels weird, when you come back to your body and then realize that it is a stagnate lump going through this [points at where HMD would have been]" (P02). It would be interesting to investigate how these connect/disconnect transitions are triggered. This participant had the desired disconnect during the lake stage, initiated by a visceral jump into the lake, and then "something broke the spell" (P02) when the transition into space happened. For him, the transition into space came as a surprise and did not make sense. For a different participant, the conflict was the result of not having an avatar representation in the VE: "I felt a bit disconnected from my body, because when I look down I don't see my body, and usually its there, obviously" (P04).

# **3.2.4.4. Reflexes and vection**

Vection (an illusion of self-motion) and reflexes are often perceived as an indicator of how immersive and "believable" the experience was by participants.

For example, a participant describes descending down in the lake: "I see the sparkles, <. . .> I realized that they are kind of like surrounding me, that's when I really got the sense of the descent down. The closest I can compare it to is when you are going down a roller coaster, but it wasn't that intense, it was more calm kind of feeling" (P03) and then going into space: "As soon as the movement started, it kind of again felt a bit more immersive, the floating feeling came back again" (P03). The lack of self-motion illusion for some participants in space combined with restricted locomotion might have also contributed to some of them feeling as if they are watching a movie instead of participating.

Sometimes, participants would also report having a reflex in reaction to an event in the VE: for example, when the sun appeared, a participant was surprised and reported: "I am pretty sure I jumped." (P05) while another participant mentioned: "I found the sun pretty bright, almost wanted to put my hand up. But yeah, this is neat." (P10). While putting the hand up to protect one's eyes wouldn't have worked with an HMD, a different participant adapted her reflexes from diving to the VR equipment: "because I'm a diver I felt like I'm descending, there was one point were I adjusted my face but it's a bit like adjusting your regulator." (P14). This type of behavior could potentially indicate how "real" the experience was for the participants at that moment.

This "realness" and "being there" of the experience, indicated by multidimensional responses including internal bodily feelings and actions, is likely an important precursor to the possibility of a transformative experience that could lead to cognitive shifts. For instance, "presence," which is often described as the feeling of "realness" or "being there" in a virtual experience, was shown to correlate with a stronger effect of the virtual experience on subsequent real-world behavior (Fox et al., 2009; Rosenberg et al., 2013).

### 3.2.5. Cognitive Mini-Shifts

As the ultimate goal of this project is to evaluate whether VR experiences can be designed to elicit positive cognitive shifts similar to the Overview Effect and other awe-inspiring transformative experiences, we were excited (and a little surprised) to see some indication of minor cognitive shifts voluntarily described in the interviews. Participants themselves were also intrigued by the shift in perspective that resulted from their experience, even when the shift was in the perception of seemingly simple concepts:

I kinda compared that sort of spatial environment that I was in with all of the representations of space that we get used to, which is a very 2D item, the solar system prospective. And that difference, that being in it, and that way how it altered my sense of that relational space of one celestial body to another, that was really cool actually how it changed something in my mind slightly. (P13)

# **3.2.5.1. Day and night**

Four participants found the concept of day and night happening at the same time on different sides of the globe, which was observable in the experience when traveling around the Earth, very interesting. Even though they were intellectually familiar with this idea, seeing it from the first-person perspective was a somewhat "eye-opening" experience. One participant reflects on her mental process of coming to that realization:

To realize that it is so easy to look at something through one lens, but when, if you are exposed to it in a different way, then something that was so familiar to you ... can give you such a different perspective. Something as simple as that sun is not shinning on the other side of the half of the world, means its night time, and it's so simple. And I studied, moons, and tides and sunrises and sunsets, but never thought about it quite so simply: that sun is shining on one side but not the other side. (P08)

## **3.2.5.2. Vastness**

Vastness is better described as part of the perceptual experience that could lead to a cognitive shift (rather than a shift in itself), but as it is considered to be the precursor for the experience of awe (Keltner and Haidt, 2003) and for a cognitive shift of perspective (Gaggioli, 2016), the two are closely related. A participant who works at an aquarium described:

I remember thinking that the Pacific ocean is so big and for a while I thought that I am not seeing things correctly. Which is funny, because I <. . .> know that its huge. But it was so vast! And to see it in that perspective was what was very unique for me. <. . .> It was impressive and gave me another perspective on something that I see and think about everyday. (P04)

This admiration of vastness is also often related to the realization of how small each individual human is on the scale of the whole world. Here a participant describes his thoughts when orbiting around Earth: "I was really hoping to see maybe that sparkle of the civilization, some kind of movement, some kind of glimmer, to denote my . . . what's the word . . . like the size of people, how small compare to where I am" (P03).

## **3.2.5.3. Interconnection**

The Overview Effect and other transcendent and awe-inspiring experiences all have in common a cognitive shift leading to a realization of the interconnectedness of life. In our data there were a number of instances that could indicate this realization of the wholeness of the world: "transition from the bottom of the water into the space scape and that sort of the initial moment when you look at it holistically and you see . . . everything is involved in it" (P11). But the most striking was the observation of one participant when traveling around the Earth:

There has been so many natural disasters lately with the hurricanes, fires and all of that.. When you see at a global level, the connection between things that are otherwise separate because of the political things... When you see as a whole—its just like, well, its just one planet. When you go around and see that Brazil is so close to Florida, you know politically things are so far away... (P06)

This realization of interconnectedness can then lead to behavioral changes: in the case of the Overview Effect, astronauts feel the need for everyone to unite to protect our planet and its inhabitants (White, 2014).

### **3.2.5.4. Intent of a behavioral change**

In our data there were two comments from one participant that could suggest an intent to change behavior, possibly triggered by the feeling of interconnectedness. Firstly, on a personal level, she was inspired to learn more about other people and countries she may not know enough about: "I don't know much about south America, so it was interesting to look at it when I can see all other distracting places I know more about. I thought I should learn more about it" (P06). This could be related to the aspect of perspective shift concerned with bringing cultures together by developing an understanding of other cultures [similar to what astronauts describe (Gallagher et al., 2015)]. Secondly, on a more global level, she had the urge to communicate this view of interconnectedness to more people:

Just need for people to figure out the environmental sciences, because its effecting everybody, but these are the artificial lines that seemed to be so unhelpful. I was thinking from the educators perspective. What a disservice it is to see a map as flat: things look so much further apart than they actually are. And that need—if we are going to problem solve bigger things, how this flat political map is just not going to get us there. (P06)

# 3.3. Gallagher's Hermeneutic Analyses of Awe

Gallagher et al. (2015) undertook a syntactical analysis followed by a hermeneutic analysis of astronauts' awe experiences based on 51 texts by 45 astronauts. From the analysis, Gallagher et al. generated 34 consensus categories of awe. These categories allow researchers to determine whether participants in experimental studies experience awe and the Overview Effect. Here (**Figure 7**), we count the frequency of statements made by our participants that fit into the awe consensus categories. The categories that were not observed in our data and are not included in the graph are: sublime, poetic expression, peace (conceptual thought about), inspired, home (feeling of being at home), fulfillment, floating in void (not related to weightlessness), elation, disorientation.
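The counting step described above can be illustrated with a minimal sketch. It assumes that each coded interview statement is stored as a (participant, category) pair; the data below are invented for illustration and are not the study's data, and the function name `tally` is hypothetical. The sketch reports both the number of statements per category and the number of distinct participants per category, since both kinds of counts are discussed in this section.

```python
from collections import Counter, defaultdict

# Hypothetical coded statements: (participant ID, awe category).
# Invented for illustration only; these are not the study's data.
coded_statements = [
    ("P04", "scale effect"),
    ("P04", "perspective shift (spatial)"),
    ("P06", "floating"),
    ("P06", "totality"),
    ("P06", "totality"),
    ("P13", "perspective shift (spatial)"),
]


def tally(statements):
    """Return (statements per category, distinct participants per category)."""
    statement_counts = Counter(category for _, category in statements)
    participants = defaultdict(set)
    for participant, category in statements:
        participants[category].add(participant)
    participant_counts = {c: len(p) for c, p in participants.items()}
    return statement_counts, participant_counts


statement_counts, participant_counts = tally(coded_statements)
for category, n in statement_counts.most_common():
    print(f"{category}: {n} statements, "
          f"{participant_counts[category]} participants")
```

Keeping the two counts separate matters when comparing studies: a single talkative participant can inflate the statement count for a category without that category being widespread across the sample.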

We can compare the results of this study to the study by Quesnel and Riecke (2018), which had 16 participants traveling through Google Earth VR and whose interviews were coded with the same categories of awe based on Gallagher et al. (2015). **Figure 8** shows the comparison of the frequencies of participants coded with the awe categories between these two studies. The "AWE" experience was able to elicit more responses of totality, spatial perspective shifts, sensation of floating and inquisitiveness, while the Google Earth experience was better at eliciting feelings of sublime and elation. We can speculate that the sensation of floating and inquisitiveness were elicited as a result of the narrative arc of the "AWE" experience, which wasn't a part of the Google Earth experience used in Quesnel and Riecke (2018). Totality and the spatial perspective shifts observed in our data are likely related to the "AWE" experience presenting the Earth from a more distant perspective than Google Earth VR allows. The lack of sublime and elation responses in our study, in turn, could be explained by the difference in quality between the Earth models in "AWE" and in Google Earth VR.

Gallagher et al. (2015) did not report the number of participants coded with a certain theme, but rather the total frequencies of codes (within 19 interviews). However, since the lengths and types of interview procedures differed between the current study and that of Gallagher et al. (2015), we cannot make a precise comparison based on these counts. Still, in their data the most frequent categories were perspective shift (moral, internal), contentment, interest/inquisitiveness, scale effect, and significant sensory experiences, which only partially intersects with our data, as these categories, even though present, were not as prominent in our data. The study designs were fairly different: the Gallagher et al. (2015) study used a spaceflight simulation, designed to be realistic, that was presented through the screens of a cockpit/window as opposed to an HMD. As their study was a more literal simulation of a spaceflight than "AWE," it is possible that their participants were more inclined to think about what they knew about astronauts' experiences, so some of these thoughts may have been introduced externally through associations rather than having emerged from the properties of the experience.

# 4. DISCUSSION AND LIMITATIONS

# 4.1. Relating to the Overview Effect

Stepanova et al. (2019) analyzed existing records and research on the Overview Effect and derived design guidelines and evaluation methods for virtual experiences aiming to elicit the Overview Effect or an extent of it. Comparing the themes that emerged from our data and the guidelines outlined in Stepanova et al. (2019), we identify an intersection in the themes outlined in **Table 2**.

From the evaluation guidelines, we were pleased to observe some mini-shifts reported by participants that would indicate each one of the 2b-2e themes. Even though we only observed a few instances of each, it was still very encouraging, considering that cognitive shifts are not easy to achieve and that this was still an early prototype of "AWE." From the design guidelines, the strongest and most interesting intersection was in the privacy, initial fear, weightlessness and personal connection components.

## 4.1.1. Privacy and Social Space

Even though participants were not using the term "private," from their discussion of felt safety and comfort we can speculate that "AWE" was able to achieve the goal set out by the "privacy" design guideline: creating a safe space for participants to feel comfortable to have a transformative experience. The social space guideline was aiming to assist with the process of accommodation, which is a necessary component of a transformative experience following the witnessing of an awe-inspiring vista. Only one participant explicitly discussed it, but he reflected on how going through the process of the interview was valuable in helping him unpack his experience and understand it on a deeper level than if he had just been asked a few questions. Hence, we believe that the interviews, especially the microphenomenological method, were able to provide the social space and the conversation that could facilitate the process of accommodation.

### 4.1.2. Initial Fear

The precursors for the Overview Effect are hard to separate from the components of a spaceflight, but the initial moment of fear naturally experienced when being shot into space in a rocket is, quite possibly, an important stage in the progression of the experience (White, 2014). However, few people have personal experiences associated with rockets; jumping into water is a more visceral experience for most and therefore, when part of VR, has the potential to induce a stronger response, which we indeed observed. Nevertheless, we were surprised by the strength, length and frequency of fear experiences, as we had only intended the jump into the lake to be a moment inducing hesitation and requiring participants to take a leap of faith. The personal background of participants shaped their experience of descending through water to be more fearful than we anticipated during the design process.

# 4.1.3. Weightlessness

The connection between the feeling of weightlessness and the Overview Effect is also unknown, as the records of them are inseparable: it might be essential or not relevant (White, 2014). As the sense of weightlessness on Earth is logistically challenging to achieve in combination with VR, we were not aiming to replicate it as a part of the experience. It was insightful to observe that several participants did have a feeling of floating or weightlessness, which informed us how the narrative of the experience can facilitate the induction of this sensation.

# 4.1.4. Personal Connection

In at least some astronauts' descriptions, the feeling of connectedness starts small, from the personal connection to a familiar location, and then extends from there to the rest of the world. It was interesting to see in our data how prominent the concept of familiarity was: 10/13 participants discussed it (with no targeted prompts from interviewers). Two participants also described how, when orbiting around the Earth, they were picking out familiar locations to establish a connection to them, much like the astronauts describe. The virtual travel to a familiar place in Google Earth was also powerful at eliciting awe in the study by Quesnel and Riecke (2018).

The other three design guidelines (embodied experience and self-relevancy, vastness, suspending disbelief through aesthetics) were not as evident in our data. Even though there are some indications of self-relevancy, for a lot of participants it was significantly reduced as a result of restricted locomotion in the last parts of the experience. While perceived vastness was mentioned three times, this is a fairly low frequency for an experience aiming to elicit awe (Keltner and Haidt, 2003). Suspending disbelief through aesthetics was only partially successful, as a lot of participants were still expecting an accurate representation of the real world inside the VE and were thrown off by any observable conflicts. Despite the clearly magical creature (the sprite) and the lake portal into space, some participants' sense of immersion was broken by seeing jellyfish in fresh water, some trees appearing too tropical for the local biosphere or the tent seeming too large for one person. Evidently, having magical elements in the narrative wasn't enough to suspend participants' disbelief, especially when they were very familiar with a specific environment (e.g., the jellyfish comment was made by the participant working at the aquarium). It might be important to set up the right expectations before the VR experience starts, by adding a narrative explaining why participants enter the tent for the VR experience, to prepare them for the virtual story.

Overall, even though the "AWE" experience did not follow all of the guidelines outlined in Stepanova et al. (2019), it was able to achieve some indications of each one of the core components of the Overview Effect: awe, increased connectedness, and increased responsibility for the environment. The latter was indicated only once, by a participant discussing the need for everyone to unite to develop a better understanding of the weather systems, as they affect everyone. As awe is a complex emotion, it is hard to make definite claims as to how much awe our participants experienced: their interviews indicate a number of components of awe identified by Gallagher et al. (2015) specifically in the context of the Overview Effect. However, the physiological measure of piloerection (Benedek and Kaernbach, 2011) revealed only one instance of awe in this study, which is either the fault of the recording instrument or, more likely, the result of a lack of intensity of awe that, even though experienced to some degree, didn't trigger the physiological reaction.

Connectedness is also a difficult cognitive construct to measure objectively, which we attempted to do with the IAT. IAT scores indicated a fairly strong connection between Self and Nature; however, these results are challenging to interpret, as we don't have a baseline for our Vancouver population. We made the comparison with data collected with the same test (with identical items) in California, which could be an approximately comparable population, as both are from the West Coast of North America, although there still might be differences. Besides the lack of a baseline, we also cannot know how much of the connectedness of nature and self was attributable to "AWE," and how much of it was a personal trait. Implementing the IAT as a pre- and post-test measure could be a possible approach to tackling this challenge [as in Peck et al. (2013) in the context of racial bias], but as a reaction time measure, IAT scores are greatly influenced by learning effects, and therefore repeated tests become difficult to interpret as a measure of change. The IAT is very rarely implemented as a pre- and post-test measure, and, as in Peck et al. (2013), it requires inviting participants to visit the lab multiple times, and a strong learning effect should still be expected. The qualitative data in our study, however, showed some promising indications of moments of realization of interconnectedness.

TABLE 2 | Selected design and evaluation guidelines for design of the virtual experience of the Overview Effect from Stepanova et al. (2019).

As the records of the Overview Effect traditionally describe a moment during a spaceflight, it is difficult to separate which components of a spaceflight experience might be contributing to the Overview Effect and which ones are unrelated. Until this relationship is clarified, we will have to target both the components of the spaceflight and the Overview Effect experiences in VR experience design. In our data we observed some indications of some components of a spaceflight experience: a change in the perception of space and weightlessness, but not a change in the perception of time or silence. However, we did not explicitly try to measure them.

# 4.2. Comparing to Other VR Awe-Inspiring Experiences

Here, we want to compare the current VR experience and study with other research attempting to elicit awe and the Overview Effect through the use of VR. This comparison allows us to speculate about the role that aspects of the VR experiences and research tools had in the obtained results, thus informing future research in this field. Chirico et al. (2017), Chirico et al. (2018a), and Chirico et al. (2018b) have shown that an immersive experience of awe-inducing stimuli was associated with self-reported awe measured with a questionnaire; however, these studies used less interactive environments than our study, and did not perform an extensive qualitative analysis of how a participant's experience in VR unfolded, what some key components of it were, and how they relate to aspects of the virtual environments. Our study is most similar to Gallagher et al. (2015) and Quesnel and Riecke (2018), who also used a VR experience of a spaceflight/orbiting the Earth and collected qualitative interview data. They reported participants' experiences of awe in those VEs across the 34 consensus categories defined by the hermeneutic analysis of Gallagher et al. (2015), and compared participants' reports of the virtual experience to real-life accounts from astronauts, with some similarities identified. However, the environments used in both of these studies were aiming to provide a realistic representation of the view of the Earth from outer space, and, unlike "AWE," did not have a strong narrative component. Conversely, with "AWE" we were not aiming to provide a direct, realistic representation of the astronauts' actual experience, but rather wanted to integrate specific design features (artistic strategies and narratives) to create a target emotional journey in a research prototype. Our installation elicited fewer observable goosebumps than Google Earth used in Quesnel and Riecke (2018), which could be due to the lower fidelity of the Earth model and usability issues in "AWE."
Another reason might be that in Quesnel and Riecke (2018) participants had a choice of their destination in Google Earth and would often travel to their hometown, which elicited nostalgia that could have contributed to awe. Other explanations might include limitations in the wearable goosebump recording instrument, whose prototype design changed between Quesnel and Riecke (2018) and the present study; see section 4.4 below. However, it should be noted that hermeneutic analyses of interviews have produced comparable distributions of reports related to awe categories between the current study and Quesnel and Riecke (2018), meaning that while goosebump recording may have failed to detect physical indications of awe, the qualitative analysis has shown some reliability. The observed differences in the distribution of awe categories can be explained through specifics of the design of the experience, as discussed above.

Even though our "AWE" installation in its current state did not elicit profound transformative experience in participants, it showed promising results supporting the premise that VR installations can elicit authentic emotional experiences and induce minor cognitive shifts in some participants. This study has also revealed some important aspects of an experience participants have when experiencing this type of immersive installation: specifically the safety and fear of the environment, familiarity and novelty, affects and bodily sensations were prominent themes in participants' descriptions.

# 4.3. Key Outcomes

The elicited fear and the relief from it were an especially interesting part of the experience of many participants. Astronauts also describe a similar transition, including the association of the release from fear with the feeling of weightlessness and the silence experienced when floating in space (Stepanova et al., 2019). This suggests an intriguing opportunity that a narrative in VR affords: we could replicate some part of the emotional journey associated with a spaceflight through the use of a different but more familiar and visceral metaphor. If we had recreated an actual spaceflight experience in VR, it probably wouldn't have achieved the same intensity of emotional response as the jump into the lake did. This could also be indicated by the observation that most participants found the lake or the forest to be the environments they felt most emotionally connected to. However, when designing a VR experience seeking a profound emotional reaction, we should be cautious with inducing fear to avoid prompting a traumatic experience (Madary and Metzinger, 2016). It's important to learn from the variety of the experiences that participants had and to design the virtual journey in a way that facilitates relief after minimal fear induction.

To the best of our knowledge, the role of psychological relief in inducing the feeling/illusion of physical weightlessness hasn't been discussed in the context of VR experience design. However, some VR experiences were able to induce the feeling of floating or weightlessness. For instance, a meditation walk through a virtual forest for chronic pain management was able to elicit the sensation of weightlessness in at least one participant of Tong et al. (2016). Their study doesn't report what might have triggered that sensation, but possibly it was a similar mechanism of relief/release, in their case from some of the chronic pain. Jain et al. (2016) discussed that some of the divers participating in their virtual scuba-diving simulation felt weightless. However, it's hard to determine what triggered it: it might have been that the familiarity of the environment brought back participants' memories of past diving experiences, or that the physical set-up of the simulation, which involved a swiveling torso support and harnesses for the limbs, was responsible for the sensation, as participants were more or less suspended in the air. These types of set-ups dedicated to specific floating experiences are arguably a little cumbersome and expensive, as they often include large physical structures, moving platforms or strapping participants into harnesses, for instance: flying interfaces such as Birdly (Rheiner, 2014), skydiving (Eidenberger and Mossel, 2015) or swimming (Fels et al., 2005). Even though these interfaces often provide very compelling experiences, simpler and more cost-effective solutions are desirable. Learning from our participants' reports describing the moments when they suddenly felt weightless could provide new strategies for developing VR experiences that induce the feeling of floating and weightlessness without complicated physical set-ups.

The number of fear responses observed in the interviews stressed the high importance of understanding the personal background of participants, and showed that each individual's experience can be very different. Experience with video games tends to help with objective performance measures in VR simulations, e.g., in a surgical simulation (Grantcharov et al., 2003). In our observations, gaming experience not only influenced how quickly participants were able to learn the interface and efficiently navigate through the environment, but also significantly shaped the expectations participants brought in. For affective VR installations, we propose (and explore in our ongoing studies) designing a pre-VR environment to help create appropriate expectations of the VR experience as an experiential piece, as opposed to a game presenting the kind of challenge that a gamer often seeks when entering a 3D environment.

Also, the individual experiences with forest and water environments were key to how the virtual experience unfolded. Some participants had diving, cliff-jumping, and camping experiences, while others reported getting lost in a dark forest in childhood or being afraid of jumping from heights. All of them formed a connection between their personal experiences and being in the VE, which greatly affected their experience. Given everyone's different backgrounds, it was difficult to predict the distribution of participants' reactions at the design stage. Similarly, Shin (2018) showed that personal traits and predispositions of immersants may have a larger effect on an individual's experience of an empathy-provoking VR (specifically the level of embodiment and empathy elicited) than the specifics of the VR environment and interface. In Quesnel and Riecke (2018), which used Google Earth VR, we also observed that the innate experiences of each participant were completely different, and that their personal background and life experience factored into their experience of positive emotions in the study. However, one trend that generalized across participants was that they experienced more awe in VR when they had a personal connection to the virtual location. Even though some generalizable trends can be identified, the substantial role of personal background presents a challenge for designing profound VR experiences as well as for interpreting the results of studies with them, especially quantitative results. Both designers and researchers need to develop strategies for addressing this challenge. Interviews and demographic surveys, as well as pilot tests with varied demographics, should be an integral part of the development of affective immersive installations in order to understand participants' experiences and the contribution of the installation to the affective state achieved by each participant. Studies of complex experiences and emotions that only collect quantitative data risk lacking the tools to disambiguate responses that stem from participants' different backgrounds, and may mis-attribute them to the components of the virtual system. This also raises the issue of whether a 'one size fits all' approach is suitable for immersive affective installations. It will be interesting to explore whether procedural content generation in combination with bio-responsive environments can help create a more customized journey for each participant, building on their personal background and reactions to the elements of the environment.

# 4.4. Limitations

There were likely some biases resulting from being a participant in the study. Even though participants were provided with limited information about the purpose of the study, the description given within the consent form could have shaped their expectations. Another bias stemmed from participants being purposefully recruited as experts in interactive exhibits and culture spaces; consequently, they were inclined to provide a lot of feedback on the quality of the installation. This feedback is exceptionally useful; however, the focus on providing a critique might have distracted some participants from being in a more experiential state. This is also likely the reason why usability was the most frequent topic in the interviews, whereas usability concerns were not as prominent in previous tests of the prototype with a different demographic. Having to wear the goosebump camera sensors might also have biased participants' expectations. Only one participant explicitly discussed how she was expecting something to jump out at her to give her goosebumps, but other participants may have formed similar expectations.

# 4.4.1. Usability Issues and Navigation Interface

One of the main limitations of this study in terms of assessing the potential of a VR installation to induce an experience similar to the Overview Effect is the usability issues with "AWE." Even though most of the participants generally liked the installation, several technical aspects need to be improved. Many participants wanted to have more control of their movement, especially in the underwater and space parts of the experience, and to be able to move faster. Conversely, a few participants experienced motion sickness from movement through the forest scene, where they had the most freedom and the fastest movement. Also, some participants wanted full freedom to explore the virtual environment on their own and not to be guided in any obvious way through the narrative. Some also pointed out that the quality of some virtual models could be improved and that a larger variety of models could be added to populate the virtual environment, especially in the underwater scene. The choice of soundtrack was also questioned by some participants, while appreciated by others. These and many other usability-related concerns limited the ability of the "AWE" installation to provide an environment for a profound awe-inspiring experience leading to cognitive shifts.

Additionally, the leaning interface used in this study, even though useful for navigation and spatial orientation as supported by previous research (Nguyen-Vo, 2018), was found awkward by some participants and likely did not support the sensation of floating. Alternative interfaces designed for flying (Rheiner, 2014; Eidenberger and Mossel, 2015) could have supported the feeling of floating, which might be useful for providing an environment in which an experience of the Overview Effect can occur. In our current iteration of "AWE" we are integrating the Limbic Chair interface (Patrik Kunzler, 2019) to hopefully support the feeling of floating. However, all of these interfaces are fairly complex and expensive, and thus a more affordable way of supporting the feeling of floating in VR would be desirable.

# 4.4.2. Lack of Goosebumps

The low number of occurrences of goosebumps in our study is likely associated with a number of usability issues in the prototype that will be improved for future studies, including the resolution of the HMD and the quality of the models and soundscape. However, it is also possible that some goosebumps or shivers did not register on our camera. The second-prototype goosebump recording device used in this study has limitations: it touches the skin near the area being recorded, and our concern is that goosebumps that would otherwise have appeared were suppressed by the recording device itself. The first prototype, used in Quesnel and Riecke (2018), was bulkier but touched the underside of the forearm, leaving the top of the forearm (the recorded surface) out of contact. This may have allowed for that study's 43% goosebump elicitation rate, in line with previous studies that reported rates between 40 and 43% (Benedek and Kaernbach, 2011; Sumpf et al., 2015; Wassiliwizky et al., 2017). Our most recent goosebump instrument prototype now records the back of the participant's neck.

Interestingly, in this study the participant who had the moment of shivers had a slightly negative connection between Self and Nature. Even though this is only one instance and no strong inferences can be drawn, it could indicate that participants with a lower connection between Self and Nature are more likely to have a stronger emotional reaction to an awe-inspiring view of the Earth, as they would have a stronger need for accommodation than participants who already feel a strong connection to nature and for whom the experience easily assimilates into their worldview (Lorini and Castelfranchi, 2007; Gaggioli, 2016). The relationship between the strength of awe and the need for accommodation was not supported in the study by Schurtz et al. (2012), where the measure of the need for accommodation did not predict the measure of awe. However, their study investigated awe in a social context rather than in nature, and their measure of the need for accommodation was not validated; as such, their results do not eliminate the possibility of a relationship between the degree of the need for accommodation and the intensity of awe.

# 4.4.3. Gender Effects

Notably, some gender differences were apparent in the descriptions of emotions evoked by the experience: these were less readily discussed by male participants than by female participants, which is aligned with research on gender differences in the use of affective language (Goldshmidt and Weller, 2000). Microphenomenological interviews might be useful for guiding male participants to bring their attention to the affective dimension and for assisting them with verbalizing their feelings.

# 4.5. Comparing the Interview Methods

The two interview techniques (cued-recall debrief and micro-phenomenology) were successful in helping participants provide a detailed account of their experience, with more thorough and deeper descriptions than a semi-structured interview or a survey could have achieved. This is evident from comparing the richness and precision of the descriptions collected in this study with our earlier pilot tests, which used semi-structured interviews. Unsurprisingly, the cued-recall method was a little better at encouraging feedback about the system/installation, and micro-phenomenology at encouraging feedback about the progression and dimensions of individual experience. However, both methods have limitations: the micro-phenomenological interviews zoom in on only a few moments, and thus do not address the experience as a whole and shed little light on the portions of the experience that were not chosen, while the cued-recall debrief does not provide as much depth in descriptions and is less rigorously structured, meaning that more bias might be introduced by the interviewer. We can also observe some trends in what types of responses are more likely to be provided within a given interview: for instance, from **Figure 7** we can see that body change responses are more likely to be reported in a micro-phenomenological interview, and intellectual appreciation in a cued-recall interview. This is expected given the interview structure.

# 5. CONCLUSIONS AND FUTURE WORK

This study indicated that a virtual experience, inspired by the Overview Effect and designed to elicit awe, despite some usability concerns, was able to invite minor transformative experiences in some participants, including the main aspects of it: the appreciation of beauty and vastness (Keltner and Haidt, 2003), realization of interconnectedness (Yaden et al., 2016) and a potential intent to change one's behavior based on that realization (White, 2014; Stepanova et al., 2018). We have also discovered some unique opportunities VR technology affords for a design of a profound experience: the opportunity to create a journey taking the participant through induction of a minimal fear in a safe environment and a following release from it; and the opportunity to explore the mind-body connection and the effects of shifting the strength and the locus of control within it.

The qualitative data of participants' experiences in this study inspires some research hypotheses that can be tested with experimental studies. A few of the hypotheses generated as a result of this study are:


Given the reliance of this line of research on deep emotional responses and the importance of individual background, we see two important directions for future development of this project. First, extensive demographic information and interviews are required when using quantitative methods of assessment, in order to be able to explain results in the context of a personal experience. Second, a more flexible, bio-responsive, and personalizable experience that can adapt to the immersant's state is desirable and could create a smoother journey to the desired emotional response.

In future work we are planning to integrate more physiological sensors (Quesnel et al., 2018a) and to automate the goosebump detection (Uchida et al., 2018), combined with interviews about the events identified from the physiological data. This will allow us to develop a deeper understanding of the progression of one's experience in an immersive affective installation, and to identify which elements of the journey might be triggering specific responses in the participants.

VR experiences inspired by natural phenomena provide us with an exciting opportunity to study an individual's experience in detail and establish the relation between the experience and the environment. However, we argue that profound experiences mediated through technology should be seen as their own category of phenomena that requires more exploration. To build this body of knowledge, more studies need to explore how profound affective personal VR experiences unfold. This knowledge would inform the future design of positive transformative VR experiences that would make such desirable experiences more accessible to the public.

# ETHICS STATEMENT

Ethics approval was granted by the Simon Fraser University Office of Research Ethics (Study#: 2017s0269). The consent form was signed digitally by each participant upon arrival at the study space.

# AUTHOR CONTRIBUTIONS

ES, DQ, and BR contributed to the conception of the project and the design of the virtual experience and the study. ES coordinated the study. ES and DQ led the data collection process. ES transcribed most of the interviews with the help of other members of the research group. ES and DQ developed the coding scheme and analyzed the interview data. ES was responsible for the thematic analyses, and DQ for the awe consensus categories analyses. ES implemented and analyzed the IAT test. DQ designed the goosebump camera instrument. DQ and ES coded the goosebump camera recordings. ES wrote the majority of the manuscript. DQ contributed several sections, specifically those related to hermeneutics analyses and the goosebump camera. All authors revised and contributed to the manuscript. BR supervised the whole project. This work appears in ES's thesis (Stepanova, 2018).

# FUNDING

The funding for this project was provided through NSERC R619563 and 31-611547 and Small Institutional SSHRC Grant R632273, Simon Fraser University (SFU), and Centre for Digital Media (CDM).

# ACKNOWLEDGMENTS

We thank the Centre for Digital Media, Patrick Pennefather, and the Drifting Pugs team for their tremendous help with the development of the virtual experience, as well as NGX Interactive for their valuable support and for providing the space for the study. We also thank the members of the iSpace Lab (Ivan Aguilar and Alexandra Kitson) and Elgin-Skye Mclaren for their assistance with running the study, and Mirjana Prpa for conducting the micro-phenomenological interviews.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Stepanova, Quesnel and Riecke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Designing Trustworthy Product Recommendation Virtual Agents Operating Positive Emotion and Having Copious Amount of Knowledge

Tetsuya Matsui<sup>1</sup>\* and Seiji Yamada<sup>2,3</sup>

*<sup>1</sup> Department of Computer and Information Science, Faculty of Science and Technology, Seikei University, Tokyo, Japan, <sup>2</sup> Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan, <sup>3</sup> Department of Informatics, The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan*

#### Edited by:

*Maria V. Sanchez-Vives, August Pi i Sunyer Biomedical Research Institute (IDIBAPS), Spain*

#### Reviewed by:

*Jonathan M. Aitken, University of Sheffield, United Kingdom Maria Koutsombogera, Trinity College Dublin, Ireland*

\*Correspondence:

*Tetsuya Matsui t-matsui@st.seikei.ac.jp*

#### Specialty section:

*This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology*

Received: *09 August 2018* Accepted: *11 March 2019* Published: *02 April 2019*

#### Citation:

*Matsui T and Yamada S (2019) Designing Trustworthy Product Recommendation Virtual Agents Operating Positive Emotion and Having Copious Amount of Knowledge. Front. Psychol. 10:675. doi: 10.3389/fpsyg.2019.00675*

Anthropomorphic agents used in online shopping need to be trusted by users so that users feel comfortable buying products. In this paper, we propose a model for designing trustworthy agents by assuming two factors of trust, that is, emotion and knowledgeableness perceived. Our hypothesis is that when a user feels happy and perceives an agent as being highly knowledgeable, a high level of trust results between the user and agent. We conducted four experiments with participants to verify this hypothesis by preparing transition operators utilizing emotional contagion and knowledgeable utterances. As a result, we verified that users' internal states transitioned as expected and that the two factors significantly influenced their trust states.

Keywords: human-agent interaction, affective computing, anthropomorphic agent, virtual agent, PRVA, trustworthiness, emotional contagion

# 1. INTRODUCTION

In this paper, we suggest a model that increases the trustworthiness of a technological informant by designing its appearance and behavior. We focused on the trust between PRVAs and buyers. Agents that take part in online shopping and recommend products are called "product recommendation virtual agents" (PRVAs) (Qiu and Benbasat, 2009) and need to be trusted by customers in order for their buying motivation to increase. Lu et al. (2010) showed that trustworthiness perceived by consumers contributed to the buying motivation of consumers in e-commerce. In prior work, the design and behavior of PRVAs impacted the effects of their recommendations. Terada et al. inspected the effect of the appearance of PRVAs (Terada et al., 2015). It was revealed that appearance is a large factor in recommendation effects. It was found that a young female agent is one of the most effective PRVAs.

The trustworthiness of virtual agents as informants was studied in human-agent interaction. Virtual agents were perceived as real humans during interaction (Reeves and Nass, 1996); thus, the notion of trust that the agents displayed seemed to be near human.

On the other hand, virtual agents have aspects of mechanical systems, and this affects how trustworthy humans perceive them to be. Lucas et al. conducted an experiment in which participants were interviewed by virtual agents and were told that the agents were controlled either by humans or by automation. They showed that participants were more willing to self-disclose to the agents controlled by automation (Lucas et al., 2014). This result suggested that humans see virtual agents as computers, even though the agents also have human aspects. Madhavan et al. (2006) showed that people lose trust in computers more strongly than in human advisers when the adviser makes a mistake. de Visser et al. showed that anthropomorphic virtual agents can reduce this effect: an anthropomorphic appearance reduces the disappointment humans feel when these agents do not live up to their expectations (de Visser et al., 2016). Thus, we concluded that virtual agents' trustworthiness has both human and computer aspects. The characteristics of virtual agents discovered in these pieces of research, that is, drawing out self-disclosure and reducing disappointment when the agents fail, seemed to be useful for recommending products.

The next question is which virtual agents are trusted by users. Danovitch and Mills (2014) showed that children trust familiar virtual characters more than unfamiliar ones as informants. In the case of adults, more natural and recognizable virtual agents bring about a more positive impression, including trustworthiness (Hertzum et al., 2002). These pieces of research focused on the appearance of virtual agents. In our research, we focused on the behavior design of a trustworthy virtual agent. In other words, we aimed to construct trust between users and agents through interactions.

In the case of a robot, it was reported that the task performance of robots was the most important factor in robots' trustworthiness perceived by users (Hancock et al., 2011; Salem et al., 2015).

Rossi et al. investigated the magnitude of the robots' error perceived by users (Rossi et al., 2017a) and conducted experiments about how timing and error magnitude affected trustworthiness of a robot (Rossi et al., 2017b). The error magnitude seemed to be a factor of task performance.

Task performance seems to also be one factor in virtual agents' trustworthiness; however, many virtual agents do not work in the factories and at disaster sites that robots do. Thus, task performance seems to not be the most important factor in the case of virtual agents.

Much research has been conducted on forming rapport, that is, the state in which a human and a virtual agent trust each other. Zhao et al. (2014) suggested that, to form rapport, we need long-term interaction with verbal and non-verbal cues. Gratch et al. (2007) showed that virtual agents that respond infrequently to humans form rapport with humans more smoothly than agents that respond frequently. Self-disclosure was also an effective method for creating trust and rapport. In the case of e-commerce via the web, Moon (2000) stated that intimate information exchange (self-disclosure) made consumers feel safe enough to reveal their information to computers. The validity of this method for virtual agents was demonstrated (Kang and Gratch, 2010; Kang et al., 2012). However, it took more time than conventional methods before recommendations could be made. Thus, we aimed to suggest a method that creates trust immediately, without long interactions.

We suggest that trust depends on two kinds of parameters: users' emotion and knowledgeableness perceived by users. Fogg (2002) stated that, in the area of psychology, credibility seems mainly to be constructed from "trustworthiness" and "expertise." "Trustworthiness" is based on whether the truster feels that the trustee is fair and honest. This may be affected by emotion. "Expertise" is based on a high level of knowledge, skill, and experience. This corresponds to knowledgeableness perceived. Fogg stated that these two factors were important for persuasion via computer.

Many prior studies showed that emotional state was one of the important parameters for judging the trustworthiness of partners or informants. Dunn and Schweitzer (2005) showed that people tended to trust unfamiliar people when they were happy. Druckman and McDermott (2008) showed that people made risky choices on the basis of emotion. Also, Dong et al. (2014) showed that people tend to trust partners that have a positive expression. These pieces of research showed that a positive emotion is important for building trust. In psychology and cognitive science, Lang's (1995) dimensional emotional model is widely used. We used only the valence axis of this model, positive or negative, because the prior work above showed that a truster's positive or negative state affected the perception of trustworthiness.

The trustworthiness of informants seems to be partly based on an informant's knowledgeableness. It was shown that informants that provided diverse examples were more trusted than informants that provided non-diverse ones (Landrum et al., 2015). This means that informants that have a more copious amount of knowledge were trusted. Adults tend to trust technological informants more than humans (Noles et al., 2016). This fact seems to be caused by the expectation that technological informants contain a lot of correct information. In the case of virtual agents, knowledgeableness seemed to be mainly judged by appearance. It was shown that an agent that looked more intelligent was more trusted by users (Geven et al., 2006). We aimed to make an anthropomorphic agent trustworthy by making it express that it is knowledgeable.

# 2. TRUST BEHAVIOR TRANSITION MODEL AND TRANSITION OPERATORS

# 2.1. Trust Behavior Transition Model

In this study, we aimed to construct a model of a user's trust behavior transitions operated by a virtual agent's state transitions. In mental model theory, users update their model of a computer or agent after each output (Kaptelinin, 1996). In this work, we aimed to make users update their mental model of the trustworthiness of PRVAs.

**Figure 1** shows the model that we propose of the transition in the internal trust behavior of users. On the basis of the prior work cited in the introduction (Dunn and Schweitzer, 2005; Geven et al., 2006; Druckman and McDermott, 2008; Dong et al., 2014; Landrum et al., 2015; Noles et al., 2016), we introduce "emotion" and "knowledgeableness perceived" as important factors that influence a user's trust state. In this model, we describe a user's internal state by using two parameters, $\langle E^h, K^p \rangle$. $E^h$ means the user's emotion, and $K^p$ means the agent's knowledgeableness perceived by the user. We defined these two parameters by using the descriptions "L (Low)" or "H (High)." $E^h$ indicates whether a user feels a positive emotion (H) or not (L). $K^p$ indicates whether a user feels an agent is knowledgeable (H) or not (L). First, if $E^h$ and $K^p$ are both L, we describe the internal state as $\langle E^h = L, K^p = L \rangle$. If only one of $E^h$ and $K^p$ transitions to H, we describe the internal state as $\langle E^h = H, K^p = L \rangle$ or $\langle E^h = L, K^p = H \rangle$. Finally, if both parameters transition to H, we describe the state as $\langle E^h = H, K^p = H \rangle$. We describe a user's trust state as $T$, and we aimed to inspect the value of $T$ for each state. We defined the trust level by using the descriptions "L (Low)," "N (Neutral)," and "H (High)." In our hypothesis, the trust state is L when the internal state is $\langle E^h = L, K^p = L \rangle$. When the internal state is $\langle E^h = H, K^p = L \rangle$ or $\langle E^h = L, K^p = H \rangle$, the trust state is N. When the internal state is $\langle E^h = H, K^p = H \rangle$, the trust state is H. L and H are relative values within an internal state, not absolute values. Thus, this model can be used regardless of the user's mood and emotion in the first state.
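The hypothesized mapping from the two internal-state parameters (emotion and knowledgeableness perceived) to the trust state can be written out as a small lookup. The following Python sketch is only an illustration of the hypothesis as stated above; the names `Level` and `hypothesized_trust` are assumptions introduced here and do not come from the study.

```python
from enum import Enum


class Level(Enum):
    """Relative levels used in the model: Low, Neutral, High."""
    LOW = "L"
    NEUTRAL = "N"
    HIGH = "H"


def hypothesized_trust(emotion: Level, knowledge: Level) -> Level:
    """Return the hypothesized trust state T for one internal state.

    `emotion` is the user's emotion and `knowledge` is the agent's
    knowledgeableness as perceived by the user. Both parameters take
    only the values LOW or HIGH; NEUTRAL is reserved for the trust state.
    """
    if Level.NEUTRAL in (emotion, knowledge):
        raise ValueError("internal-state parameters must be LOW or HIGH")
    # Count how many of the two parameters are HIGH (0, 1, or 2).
    high_count = [emotion, knowledge].count(Level.HIGH)
    # 0 HIGH -> trust L, exactly 1 HIGH -> trust N, 2 HIGH -> trust H.
    return (Level.LOW, Level.NEUTRAL, Level.HIGH)[high_count]


if __name__ == "__main__":
    for e in (Level.LOW, Level.HIGH):
        for k in (Level.LOW, Level.HIGH):
            t = hypothesized_trust(e, k)
            print(f"E={e.value}, K={k.value} -> T={t.value}")
```

Because the levels are relative rather than absolute, such a lookup describes only the direction of the expected change between two recommendations, not a measured score.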

It was not clear whether a positive emotion and knowledgeableness have a logical conjunction on the trustworthiness of PRVAs or not, so we focused on this conjunction. These trust states, the transition model, and the following transition operators for a PRVA are our original work.

# 2.2. Transition Operators

We introduce the notions of emotion and knowledgeableness transition operators for this research. These are executed when a PRVA is making recommendations that are expected to cause internal behavior transitions. We defined the emotion transition operators as a PRVA's smile and cute gestures and the knowledgeableness transition operators as a PRVA's technical and detailed knowledge.

Human emotion can be transmitted to other people through facial expressions, voice, or body movement. This is called emotional contagion (Hatfield et al., 1994), and it can occur between a virtual agent and a human through facial expressions (Tsai et al., 2012). The latter study was conducted via Amazon Mechanical Turk, a crowdsourcing service with participants from all over the world, which suggests that the result generalizes across cultures.

These studies showed that a smiling agent causes users to feel happy. In the case of robots, Si and McDaniel (2016) showed that robots with a relaxed facial expression and body movement were perceived by users as friendly and trustworthy. This result could be caused by emotional contagion. Kose-Bagci et al. showed that a drumming robot's head gestures increased users' subjective fun (Kose-Bagci et al., 2009) and motivation to interact (Kose-Bagci et al., 2010). These results showed that robots' gestures affected users' internal states. Kose et al. (2012) showed that a virtual agent's gestures and a human's gestures had the same effect in sign language tutoring. This result shows that executing human gestures on a virtual agent is effective for affecting users' internal states.

Thus, we aimed to affect a user's emotional state $E^h$ with a PRVA through facial expressions and hand gestures in order to make the user trust the agent more strongly. We executed two kinds of hand gestures to bring about emotional contagion. The first gesture is an "attractive gesture," in which the PRVA gestures its hands toward the mouse. The second gesture is a "pointing gesture," in which the PRVA points out images. **Figure 2** shows these gestures. Hence, we could utilize emotional contagion as a transition operator. User emotion becomes correspondent with agent emotion when emotional contagion is executed.

Also, we needed a transition operator for the other factor, knowledgeableness perceived $K^p$. An agent's knowledgeableness is expressed through technical and detailed knowledge. Knowledgeableness perceived in a user state corresponds to knowledgeableness in an agent. If an agent looks knowledgeable, the user perceives it as being knowledgeable. We implemented concrete transition operators for emotion and knowledgeableness perceived in experiments and conducted the experiments with on-line shopping to verify this model.

# 3. MATERIALS AND METHODS

We conducted four experiments with participants to verify our model. All experiments were conducted with the same implementations and same procedure. In all experiments, R1 means the recommendation that was shown to the participants first, and R2 means the recommendation that was shown to the participants second.

# 3.1. Experiment Design

All experiments were conducted on-line. All participants were recruited through Yahoo crowdsourcing<sup>1</sup>. The participants received 25 yen (about 0.22 dollars) as a reward. We conducted all experiments in September 2017. Detailed data on the participants are shown in the sections for each experiment. All participants provided informed consent by clicking the submit button, and the study design was approved by a research ethics committee at the National Institute of Informatics.

The experiments took about 10 min for each participant. The participants were asked to watch movies in which a PRVA recommended a package tour to Japanese castles. The PRVA was executed with MMDAgent<sup>2</sup>, and the agent's character was "Mei," a free MMDAgent model distributed by the Nagoya Institute of Technology. We used VOICEROID+ Yuzuki Yukari EX<sup>3</sup>, a text-to-speech software package, for smooth utterances.

<sup>1</sup>https://crowdsourcing.yahoo.co.jp/

The PRVA recommended two package tours to two castles, and these recommendations are indicated by R1 and R2. The PRVA recommended package tours to Inuyama Castle and Gifu Castle. Both castles were built in the Japanese Middle Ages and have castle towers.

Neither of these castles is a World Heritage Site, and there seems to be no difference in the preferences of Japanese people between these two castles.

In all experiments, the PRVA recommended Inuyama Castle for R1 and Gifu Castle for R2 for half of the participants. For the other half, the PRVA recommended Gifu Castle for R1 and Inuyama Castle for R2. This was done for counterbalancing.
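The counterbalancing described above can be sketched as a simple assignment routine. The text does not say how participants were allocated to the two orders, so the random shuffle, the fixed seed, and the function name `assign_orders` below are all assumptions made for illustration only.

```python
import random

CASTLES = ("Inuyama Castle", "Gifu Castle")


def assign_orders(participant_ids, seed=0):
    """Split participants into two halves with opposite castle orders.

    Assumption: allocation is random. The first half of the shuffled list
    sees Inuyama Castle as R1 and Gifu Castle as R2; the second half sees
    the reverse order. With an odd number of participants, the second
    group receives the extra person.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for index, pid in enumerate(ids):
        first, second = CASTLES if index < half else CASTLES[::-1]
        orders[pid] = {"R1": first, "R2": second}
    return orders


if __name__ == "__main__":
    for pid, order in sorted(assign_orders(range(6)).items()):
        print(pid, order)
```

Balancing the order in this way keeps any preference for one castle from being confounded with the difference between R1 and R2.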

<sup>2</sup>http://www.mmdagent.jp/

**Figure 3** shows a snapshot from the movies. The transition operators executed by the agent for each recommendation are as follows. In experiment 1, no transition operator was executed for R1, and the emotion transition operators were executed for R2.

In experiment 2, no transition operator was executed for R1, and the knowledge transition operators were executed for R2. In experiment 3, the emotion transition operators were executed for R1, and both the emotion and knowledge transition operators were executed for R2. In experiment 4, the knowledge transition operators were executed for R1, and both the emotion and knowledge transition operators were executed for R2. Parts of the movies are provided as Supplementary Material (**Video 1**).

**Figure 4** shows movie snapshots showing the agent without the emotion transition operators and with the operators. **Table 1** shows which operators were executed for each recommendation in each experiment. **Table 2** shows examples of speech that recommended a trip to Gifu Castle without the knowledge transition operators and with the operators. With the knowledge transition operators, the PRVA explained historical episodes and gave details on fees.

<sup>3</sup>http://www.ah-soft.com/voiceroid/yukari/

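TABLE 1 | Transition operators executed for each recommendation.

| Experiment | R1 | R2 |
| --- | --- | --- |
| 1 | None | Emotion |
| 2 | None | Knowledge |
| 3 | Emotion | Emotion and knowledge |
| 4 | Knowledge | Emotion and knowledge |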


# 3.2. Questionnaires

Participants were each asked a set of questions after each recommendation was made. Thus, in total, they answered two sets of questions in one trial. The sets of questions were constructed from Wheeless's Interpersonal Solidarity Scale (ISS), the Positive and Negative Affect Schedule (PANAS), and a knowledgeableness scale. Wheeless's ISS is a scale for measuring the solidarity and trustworthiness of a particular person (Wheeless, 1978). We used this scale to measure the PRVA's trustworthiness as perceived by the participants. This scale contains 20 questions. The participants were asked to answer these questions on a seven-point Likert scale. We used the average of the scores for these questions as the Interpersonal Solidarity Score, the score of trust.

The Positive and Negative Affect Schedule (PANAS) is a scale that is used to measure a person's affect (Watson et al., 1988). This scale is based on the premise that affect consists of positive and negative affect. We used this scale to measure the participants' positive emotion. We used 16 questions, and the participants were asked to answer them on a six-point Likert scale. We used the sum of the scores of the eight questions related to positive affect as the Positive Affect Score, the score of positive emotion.

We constructed an original scale, the knowledgeableness scale, to measure knowledgeableness perceived. It consisted of the five questions shown in **Table 3**.

The participants were asked to answer these questions on a seven-point Likert scale. We defined the average of the scores of these questions as the Knowledgeableness Score, the score of knowledgeableness perceived.
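The three scores described in this section can be computed as in the sketch below. The function names and the `positive_items` parameter are hypothetical; the text does not say which of the 16 PANAS questions are the positive-affect items, and the sketch assumes that no items are reverse-scored.

```python
from statistics import mean


def iss_score(answers):
    """Interpersonal Solidarity Score: average of the 20 ISS answers (1-7)."""
    if len(answers) != 20:
        raise ValueError("the ISS has 20 questions")
    return mean(answers)


def positive_affect_score(answers, positive_items):
    """Positive Affect Score: sum of the eight positive-affect answers (1-6).

    positive_items holds the indices of the positive-affect questions;
    which indices these are is an assumption supplied by the caller.
    """
    positive_items = list(positive_items)
    if len(answers) != 16 or len(positive_items) != 8:
        raise ValueError("expected 16 answers and 8 positive-affect indices")
    return sum(answers[i] for i in positive_items)


def knowledgeableness_score(answers):
    """Knowledgeableness Score: average of the five answers (1-7)."""
    if len(answers) != 5:
        raise ValueError("the knowledgeableness scale has 5 questions")
    return mean(answers)


panas_answers = [6, 5, 4, 3, 2, 1, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1]
pa = positive_affect_score(panas_answers, positive_items=range(8))
```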

# 3.3. Statistical Procedure

For each experiment, we conducted a Wilcoxon signed-rank test between R1 and R2. The Wilcoxon signed-rank test is a non-parametric test that is widely used to compare two data sets and focuses on the transition between paired data (Gibbons and Chakraborti, 2011). This was the most suitable test for verifying the participants' internal state transition between the two recommendations. For the scores on the PANAS and the Knowledgeableness Scale, we defined an internal state (emotion and knowledgeableness perceived) transition as occurring when there was a significant difference between the two recommendations. We defined E <sup>h</sup>, K <sup>a</sup>, and T as L before the experiments. If the state was L before the recommendation and the score increased significantly after the recommendation, we defined the state as transitioning to H. If the state was H before the recommendation and the score decreased significantly after the recommendation, we defined the state as transitioning to L. Also, for the score on the ISS, we defined a trust state transition as having occurred when there was a significant difference. If the state was L before the recommendation and the score increased significantly after the recommendation, we defined the state as transitioning to N. Also, if the state was N before the recommendation and the score increased significantly after the recommendation, we defined the state as transitioning to H.

TABLE 3 | Questions for knowledgeableness scale.

TABLE 4 | Result of experiment 1.

TABLE 5 | Result of experiment 2.
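The state transition rules defined in this section can be written as two small functions. This is a sketch under stated assumptions: the function names and the `direction`/`significant` arguments are hypothetical, and leaving the trust state unchanged on a significant decrease is an assumption, because the text defines only the increasing transitions for trust.

```python
TRUST_LEVELS = ["L", "N", "H"]


def next_binary_state(state, direction, significant):
    """Emotion or knowledgeableness perceived: L -> H on a significant
    increase, H -> L on a significant decrease, otherwise unchanged."""
    if not significant:
        return state
    if state == "L" and direction == "increase":
        return "H"
    if state == "H" and direction == "decrease":
        return "L"
    return state


def next_trust_state(state, direction, significant):
    """Trust: L -> N -> H, one step per significant increase.

    A significant decrease leaves the state unchanged (assumption).
    """
    if significant and direction == "increase":
        index = TRUST_LEVELS.index(state)
        return TRUST_LEVELS[min(index + 1, len(TRUST_LEVELS) - 1)]
    return state
```

For example, starting from L for all three states, a significant increase in the Positive Affect Score and in the ISS moves emotion to H and trust to N, while a significant decrease in the Knowledgeableness Score leaves knowledgeableness perceived at L.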

In this paper, we use the terms increased pair and decreased pair. Increased pair means the number of participants whose score increased between the two recommendations. Decreased pair means the number of participants whose score decreased between the two recommendations. They were the most important parameters in the Wilcoxon signed-rank test.
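To make the role of the increased and decreased pairs concrete, the following is a minimal pure-Python sketch of a two-sided Wilcoxon signed-rank test. It is not the software used in this study, which is not named in the text. It assumes that zero differences are discarded, that tied absolute differences receive average ranks, and that the p-value comes from the normal approximation with tie correction and no continuity correction; the function name and the returned keys are hypothetical.

```python
import math


def wilcoxon_signed_rank(r1_scores, r2_scores):
    """Two-sided Wilcoxon signed-rank test for paired scores (R1 vs. R2)."""
    diffs = [b - a for a, b in zip(r1_scores, r2_scores) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    increased = sum(1 for d in diffs if d > 0)  # "increased pair"
    decreased = n - increased                   # "decreased pair"

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation of W+ under the null hypothesis.
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return {"increased": increased, "decreased": decreased,
            "w_plus": w_plus, "w_minus": w_minus, "z": z, "p": p_value}


result = wilcoxon_signed_rank([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
balanced = wilcoxon_signed_rank([1, 2], [2, 1])
```

The normal approximation is reasonable for samples of the size reported here (around 200 pairs per experiment); for very small samples an exact test would be preferable.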

Each participant answered a total of 119 questions in one trial. If a participant gave the same answer 20 times in a row, we excluded that participant as noise.
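The exclusion rule can be sketched as a check on the longest run of identical consecutive answers. The function names and the `threshold` parameter are hypothetical; the threshold of 20 is taken from the rule stated above.

```python
def longest_run(answers):
    """Length of the longest run of identical consecutive answers."""
    longest = 0
    current = 0
    previous = object()  # sentinel that equals no real answer
    for answer in answers:
        current = current + 1 if answer == previous else 1
        previous = answer
        longest = max(longest, current)
    return longest


def is_noise(answers, threshold=20):
    """True if the same answer was given `threshold` or more times in a row."""
    return longest_run(answers) >= threshold
```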

# 4. RESULT

# 4.1. Experiment 1: R1, No Transition Operator; R2, Emotion Transition Operators

In the first experiment, the PRVA made recommendations without any transition operators for R1 and also made recommendations with emotion transition operators for R2 (see also **Table 1**). We conducted this experiment with the aim of making the participants' state transition from state A to state B in **Figure 1**.

We recruited 219 Japanese participants for experiment 1, and 178 remained after noise exclusion. There were 112 males and 66 females, and they were aged between 22 and 74, for an average of 40.7 (SD = 8.4).

**Table 4** shows the result of experiment 1. We conducted a Wilcoxon signed-rank test for these data. The Positive Affect Score (PA) significantly increased between the two recommendations (p < 0.01). The score on the Knowledgeableness Scale (KS) significantly decreased (p < 0.01). The score on the ISS significantly increased (p < 0.01).

We conducted the same test on the female participants' results: PA increased non-significantly (p = 0.095), KS decreased significantly (p = 0.001), and ISS increased significantly (p = 0.003). We also conducted the same test on the male participants' results: PA increased significantly (p = 0.029), KS decreased non-significantly (p = 0.101), and ISS increased significantly (p < 0.001).

# 4.2. Experiment 2: R1, No Transition Operator; R2, Knowledgeableness Transition Operators

In experiment 2, the PRVA made recommendations without any transition operators for R1 and also made recommendations with knowledgeableness transition operators for R2. We conducted this experiment with the aim of making the participants' state transition from state A to state C in **Figure 1**.

We recruited 249 Japanese participants for experiment 2, and 209 remained after noise exclusion. There were 104 males and 105 females, and they were aged between 20 and 68, for an average of 42.2 (SD = 9.5).

**Table 5** shows the result of experiment 2. We conducted a Wilcoxon signed-rank test for these data. For the PA, there was no significant difference. The KS significantly increased (p < 0.01). The ISS significantly increased (p < 0.05).

We conducted the same test on the female participants' results: PA decreased non-significantly (p = 0.354), KS increased significantly (p = 0.001), and ISS increased significantly (p = 0.026). We also conducted the same test on the male participants' results: PA decreased non-significantly (p = 0.170), KS increased significantly (p = 0.011), and ISS increased non-significantly (p = 0.166).

# 4.3. Experiment 3: R1, Emotion Transition Operator; R2, Emotion and Knowledgeableness Transition Operators

In experiment 3, the PRVA made recommendations with emotion transition operators for R1 and also made recommendations with emotion and knowledgeableness transition operators for R2. We conducted this experiment with the aim of making the participants' state transition from state B to state D in **Figure 1**.

We recruited 255 Japanese participants for experiment 3, and 202 participants remained after noise exclusion. There were 100 males and 102 females, and they were aged between 19 and 75, for an average of 41.2 (SD = 9.7).

**Table 6** shows the result of experiment 3. We conducted a Wilcoxon signed-rank test for these data. For the PA, there was no significant difference. The KS significantly increased (p < 0.05). The ISS significantly increased (p < 0.01).

We conducted the same test on the female participants' results: PA increased non-significantly (p = 0.558), KS increased non-significantly (p = 0.085), and ISS increased significantly (p = 0.018). We also conducted the same test on the male participants' results: PA increased non-significantly (p = 0.998), KS increased non-significantly (p = 0.091), and ISS increased significantly (p = 0.002).

TABLE 6 | Result of experiment 3.


TABLE 7 | Result of experiment 4.


# 4.4. Experiment 4: R1, Knowledgeableness Transition Operator; R2, Emotion and Knowledgeableness Transition Operators

In experiment 4, the PRVA made recommendations with knowledgeableness transition operators for R1 and also made recommendations with emotion and knowledgeableness transition operators for R2. We conducted this experiment with the aim of making the participants' state transition from state C to state D in **Figure 1**.

We recruited 296 Japanese participants for experiment 4, and 246 remained after noise exclusion. There were 98 males and 148 females, and they were aged between 16 and 66, for an average of 39.2 (SD = 10.4).

**Table 7** shows the result of experiment 4. We conducted a Wilcoxon signed-rank test for these data. The PA significantly increased (p < 0.01), and the KS significantly decreased (p < 0.01). The ISS significantly increased (p < 0.01).

We conducted the same test on the female participants' results: PA increased significantly (p = 0.008), KS decreased significantly (p < 0.001), and ISS increased significantly (p < 0.001). We also conducted the same test on the male participants' results: PA increased significantly (p = 0.036), KS increased non-significantly (p = 0.666), and ISS increased significantly (p < 0.001).

# 5. DISCUSSION

# 5.1. Effect of Transition Operators

In experiments 1 and 4, we can see the effect of the emotion transition operators. As shown in **Tables 4** and **7**, the PA significantly increased. Also, the ISS significantly increased. These results mean that the emotion transition operators had an effect as we hypothesized in both of these experiments.

In experiments 2 and 3, we can see the effect of the knowledgeableness transition operators. As shown in **Tables 5** and **6**, the KS significantly increased. Also, the ISS significantly increased. These results mean that the knowledgeableness transition operators had an effect as we hypothesized in both of these experiments.

In the four experiments, both the emotion and knowledgeableness transition operators increased the perceived trustworthiness of the PRVA. These results show the validity of our model. However, the KS significantly decreased in both experiments 1 and 4. This means that the emotion transition operators reduced the perceived knowledgeableness of the PRVA. There is a possibility that the smiling and cute gestures were received as a sign of a lack of knowledge or intelligence. However, the ISS significantly increased when the KS significantly decreased. This result suggests that the emotion transition operators have more of an effect on trust than the knowledgeableness operators.

Why the emotion transition operators reduced the perceived knowledgeableness of the PRVA is an important question. One possible explanation is that the participants perceived the PRVA's smile as being awkward. In this research, we used arm gestures and eye movement as a part of the emotion transition operators. These have been reported as cues to deception (DePaulo et al., 2003). Thus, the participants may have unconsciously felt that the PRVA was deceptive.

This effect suggested that emotion and knowledgeableness perceived can interfere with each other. However, experiments 1 and 3 showed that we could increase the trust level again by executing the knowledgeableness transition operators after executing the emotion transition operators. This result showed that we can use the knowledgeableness transition operators to amplify the trustworthiness that has already increased by executing the emotion transition operators. Thus, these operators are effective regardless of whether there is interference or not.

Little has been reported on the interference between emotion and knowledgeableness perceived. However, some studies have reported that emotion and knowledgeableness perceived are separate factors in decision making. In the elaboration likelihood model, customers use two routes for buying decisions, a central route and a peripheral route (Petty and Cacioppo, 1986). In the central route, customers make decisions based on logical thinking. In the peripheral route, customers make decisions based on their feelings, impressions, and heuristics. In our experiments, emotion is associated with the peripheral route, and knowledgeableness perceived is associated with the central route. Dual-process theory suggests that humans use two kinds of systems for decision making, system 1 and system 2 (Evans and Stanovich, 2013). System 1 is used when a person makes a decision immediately by feeling, and system 2 is used when a person makes a decision by deep thinking. Emotion seems to be associated with system 1, and knowledgeableness perceived seems to be associated with system 2. These models suggest two different routes or processes in decision making, an emotional route and a logical route. The interference between emotion and knowledgeableness perceived might reflect interference between the two routes. However, prior work holds that the two routes work independently rather than jointly. Resolving this discrepancy is left for future work.

# 5.2. Observed Internal Behavior Transitions of Participants

**Figure 5** shows the internal behavior transitions observed in the four experiments. "Emotion" and "Knowledgeableness" mean the transition operators that were executed.

In experiment 1, the participants' internal state transitioned from < E <sup>h</sup> = L, K <sup>a</sup> = L> to < E <sup>h</sup> = H, K <sup>a</sup> = L>. In experiment 2, it transitioned from < E <sup>h</sup> = L, K <sup>a</sup> = L> to < E <sup>h</sup> = L, K <sup>a</sup> = H >.

From the result of experiment 1, we concluded that the participants' state was < E <sup>h</sup> = H, K <sup>a</sup> = L> when they watched the PRVA with the emotion transition operators. Thus, we defined the participants' state as being < E <sup>h</sup> = H, K <sup>a</sup> = L> after R1 in experiment 3. The internal state transitioned from < E <sup>h</sup> = H, K <sup>a</sup> = L> to < E <sup>h</sup> = H, K <sup>a</sup> = H> after R2 in experiment 3. These results are consistent with our model.

From the result of experiment 2, we concluded that the participants' state was < E <sup>h</sup> = L, K <sup>a</sup> = H> when they watched the PRVA with the knowledgeableness transition operators. Thus, we defined the participants' state as being < E <sup>h</sup> = L, K <sup>a</sup> = H> after R1 in experiment 4. The internal state transitioned from < E <sup>h</sup> = L, K <sup>a</sup> = H> to < E <sup>h</sup> = H, K <sup>a</sup> = L> after R2 in experiment 4. This corresponds to the transition from C to B in **Figure 1**. In these experiments, we did not observe a transition from C to D. The reason seems to be that the effect of the emotion transition operators was too strong and interfered with the effect of the knowledgeableness transition operators. To observe the transition from C to D, we might need to adjust both operators.

The ISS significantly increased for all transitions. We defined the T (trust state) as transitioning from L to N in experiments 1 and 2. Also, in experiment 3, we defined T as transitioning from N to H. These results suggest that our hypothesis is correct. However, the ISS significantly increased also in experiment 4. According to our model, this result means that T transitioned from N to H. However, the internal state was < E <sup>h</sup> = H, K <sup>a</sup> = L> after R2 in experiment 4. In our model, T is N in this state. However, this is not a contradiction because we could increase the trustworthiness from this state as shown in experiment 3. It was shown that the < E <sup>h</sup> = H, K <sup>a</sup> = H> state has the most trustworthiness.

# 5.3. Limitations

This study has some limitations. First, we did not observe the transition from < E <sup>h</sup> = L, K <sup>a</sup> = H> to < E <sup>h</sup> = H, K <sup>a</sup> = H>. This suggests a limitation of our model or method. We suspect that this result was caused by a limitation of the transition operators. We expect that we can observe this transition when we use other transition operators.

Second, we asked participants to fill out questionnaires after they watched each movie. To get a more accurate impression, we will use protocol data analysis. Also, to get a more objective result, we will use biological signals.

Third, there were gender differences in the results. In experiments 1, 3, and 4, the ISS scores of both the female and male participants significantly increased. In experiment 2, the male participants' ISS score did not significantly increase. There were also gender differences in some other scores. These results might be caused by gender differences in interacting with virtual agents or by the small sample size.

Last, the generality of the results was not verified. Our model and method were effective for the PRVA; however, we cannot assert that this method can be applied to other virtual agents. Cameron et al. suggested that a factor of robots' trustworthiness can come to the surface in a strong context experimental design (Cameron et al., 2015). In our experiment, the strength of context was not deeply considered.

# 6. CONCLUSION

We proposed a trust model and aimed at operating a user's trust toward PRVAs. Our original idea in this research is using two parameters to operate trust, a user's emotion and knowledgeableness perceived. Furthermore, we developed transition operators to make these parameters transition. Also, we conducted four experiments to verify this model with participants. In experiments 1 and 2, we verified the effect of the transition operators and successfully increased the participants' trust. In experiment 3, we executed knowledgeableness transition operators after emotion transition operators and observed the transition to a state that had the most trustworthiness. In experiment 4, emotion transition operators appeared after knowledgeableness transition operators, and we observed increased trustworthiness and decreased knowledgeableness perceived.

In these experiments, the knowledgeableness transition operators worked as expected, and the emotion transition operators also increased positive emotion and trust. However, the emotion transition operators reduced knowledgeableness perceived. This result suggests that these operators interfered with each other; however, this interference did not prevent trustworthiness from increasing. The emotion transition operators increased trustworthiness regardless of the decrease in knowledgeableness perceived. Also, from the results of experiments 1 and 3, we identified the most effective process for increasing trustworthiness. When we executed the knowledgeableness transition operators after executing the emotion transition operators, we could cause the trust level to transition from L to H. Among the orders we tested, this order was the most effective for making a user trust a PRVA.

These experiments have some limitations, and the interference between the emotion and knowledgeableness perceived is an unsolved problem. However, the transition model and transition operators suggested in this research can contribute to the design of PRVAs and other virtual agents.

# AUTHOR CONTRIBUTIONS

TM conducted the experiments and analysis and drafted the manuscript with important contributions from SY. All authors participated in the review and revision of the manuscript and have approved the final manuscript to be published.

# FUNDING

This research was partially supported by JSPS KAKENHI Cognitive Interaction Design (No. 26118005).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00675/full#supplementary-material

# REFERENCES

the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (Seoul), 929–934.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Matsui and Yamada. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Presence and Cybersickness in Virtual Reality Are Negatively Related: A Review

#### Séamas Weech<sup>1,2</sup>\*, Sophie Kenny<sup>2</sup> and Michael Barnett-Cowan<sup>1,2</sup>

<sup>1</sup> Department of Kinesiology, University of Waterloo, Waterloo, ON, Canada, <sup>2</sup> The Games Institute, University of Waterloo, Waterloo, ON, Canada

In order to take advantage of the potential offered by the medium of virtual reality (VR), it will be essential to develop an understanding of how to maximize the desirable experience of "presence" in a virtual space ("being there"), and how to minimize the undesirable feeling of "cybersickness" (a constellation of discomfort symptoms experienced in VR). Although there have been frequent reports of a possible link between the observer's sense of presence and the experience of bodily discomfort in VR, the amount of literature that discusses the nature of the relationship is limited. Recent research has underlined the possibility that these variables have shared causes, and that both factors may be manipulated with a single approach. This review paper summarizes the concepts of presence and cybersickness and highlights the strengths and gaps in our understanding about their relationship. We review studies that have measured the association between presence and cybersickness, and conclude that the balance of evidence favors a negative relationship between the two factors which is driven principally by sensory integration processes. We also discuss how system immersiveness might play a role in modulating both presence and cybersickness. However, we identify a serious absence of high-powered studies that aim to reveal the nature of this relationship. Based on this evidence we propose recommendations for future studies investigating presence, cybersickness, and other related factors.

Keywords: presence, cybersickness, virtual reality, sensory integration, human factors

# INTRODUCTION

Around 30 years ago, the process of simulating a user's sensory environment gained the popular term "virtual reality"<sup>1</sup> (Krueger, 1992). Although the concept of virtual reality (VR) has morphed significantly since the initial conception, the promise inherent in simulating "the real world" has continually inspired and challenged scientists and artists (Jones, 2000). Fifty years ago, when the first attempts to implement a VR display were taking place, a large number of technical issues required a solution in order to achieve even a rudimentary mediated environment. While working at Harvard Computation Laboratory, Ivan Sutherland's team was able to solve many of these issues (Sutherland, 1968). Their stereoscopic display, including a refresh rate of 30 frames per second, a field-of-view of 40°, and the ability to depict 3D objects only as wire-frames, was termed "favorable" by users. This implementation was some distance from providing the idealized VR experience. Since these initial inventions, a vast amount of effort has been focused on the development of improved means of inspecting and interacting with virtual worlds, and a myriad of other problems have since been solved. This rapid progress has led to the creation of VR systems that are orders of magnitude smaller, lighter, and more powerful than Sutherland's foundational technology. VR has recently seen popularity as a flexible tool for investigating a wide range of human behaviors in high fidelity with perfectly replicable conditions. Throughout the development process, however, the ultimate goal for VR has remained unachievable – that is, the accurate and credible simulation of a real experience. Chief among the enduring problems that prevent this achievement is the struggle to consistently generate a sense of presence in VR users, whereby conscious awareness of simulated mediation ceases. A second prominent barrier is cybersickness<sup>2</sup> (CS), the bodily discomfort associated with exposure to VR content. Unlocking the potential of VR will largely depend on our ability to understand, then to solve, these substantial and enduring hindrances. A large body of research has emerged from attempts to identify whether presence and CS are deterministically linked through a positive or negative association. However, these results are highly discordant, and no consensus currently exists regarding the nature of the relationship between CS and presence.

<sup>1</sup>Although non-immersive desktop systems are sometimes referred to as "virtual reality," we use the term here to signify immersive or semi-immersive systems such as head-mounted displays or projection systems.

#### Edited by:

Maria V. Sanchez-Vives, August Pi i Sunyer Biomedical Research Institute (IDIBAPS), Spain

#### Reviewed by:

Mariano Alcañiz, Universitat Politècnica de València, Spain; Bruno Herbelin, École Polytechnique Fédérale de Lausanne, Switzerland; Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy

#### \*Correspondence:

Séamas Weech, sweech@uwaterloo.ca

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 24 July 2018 Accepted: 16 January 2019 Published: 04 February 2019

#### Citation:

Weech S, Kenny S and Barnett-Cowan M (2019) Presence and Cybersickness in Virtual Reality Are Negatively Related: A Review. Front. Psychol. 10:158. doi: 10.3389/fpsyg.2019.00158

This review has three aims. In the first part of this article we describe in brief the concepts of presence and CS and outline techniques that are commonly used to measure both factors (Section "Introduction"). We intend this to provide the reader with relevant context for interpreting studies that are discussed later in the review. The concepts of CS and presence are discussed below (see Sections "Presence," and "Cybersickness"), but in brief, these are complex phenomena with a multitude of individual differences (e.g., sex, gaming experience) and external factors (e.g., control of navigation, visual display parameters) thought to influence their occurrence. Each factor has been targeted using several measurement techniques, all of which vary with respect to how the measured variable is operationalized.

The second part of this review highlights studies that have directly measured the link between presence and CS, which we identified through a scoping literature search (Section "Evidence of a Presence-Cybersickness Relationship"). Given the large and historic importance of understanding and improving the issues of presence and CS, numerous studies have measured both factors and have even identified relationships between them. However, while approaches to solving the problems of presence and CS in VR have been tackled separately in large numbers of recent research papers and reviews, evidence of a possible link between them has seen very little discussion, particularly in recent years. VR has developed rapidly since its first implementation in the 1960s, and as such, early reports on the link between presence and CS may not apply to the current state of VR.

The third part of the review constitutes a broad overview of the associations between presence, CS, and other variables, since a large number of contradictory findings have been reported in the literature. These confounding factors emerge in part due to the rapid rise of VR, the multifactorial nature of both CS and presence, and the influence of other modulating factors such as sensory mismatch, display factors, and personal characteristics (Section "Associations with Other Variables").

In our final section, and throughout the article, we aim to provide a synthesis of the research, with a special focus on unifying the discrepant findings about the nature of the presence-CS association. Our conclusions can be summarized as follows. First, there is more compelling evidence in support of a negative association between CS and presence than alternative relationships. The experimental results indicating a positive correlation between the two factors may be attributed to the necessity for settings to be "immersive" before CS can emerge. Second, there is considerable evidence for the role of sensory mismatch in both presence and CS. We also discuss the likelihood that sensory mismatch modulates a variety of factors that have been empirically linked with presence and CS (e.g., navigation control, display factors, vection). The strength of our conclusion is tempered by a need for additional high-powered studies in future research.

Our objective with this review is to provide an answer to the following question: What is the relationship between presence and CS in VR? Note that this review does not constitute a review of CS (see LaViola, 2000; Davis et al., 2014; Rebenitsch and Owen, 2016) or of presence (see Lombard and Ditton, 1997; Schuemie et al., 2001; Sadowski and Stanney, 2002; Biocca et al., 2003; Lee, 2004; Sanchez-Vives and Slater, 2005; for a meta-analysis see Cummings and Bailenson, 2016), but rather, of their interrelation. We address this relationship in order to answer pertinent questions in VR research: Does improving presence come at the cost of increasing CS, or can an intervention be conceived that improves presence and reduces CS? The second objective of this review is to provide a condensed view of the field of presence-CS research which we hope will prove useful to the next wave of studies on this complex relationship. While the majority of this review is focused on findings obtained in the disciplines of cognitive neuroscience and experimental psychology, the conclusions of the review are naturally relevant to human–computer interaction and human factors research.

# Presence

For over 40 years, the goal of achieving presence has been regarded as a defining aspect of a successful VR experience (Minsky, 1980). Although multiple definitions and dimensions of presence have been proposed (Sadowski and Stanney, 2002), the concept is almost universally described as the observer's sense of psychologically leaving their real location and feeling as if transported to a virtual environment. Put simply, presence is the illusion of "being there" (Heeter, 1992). A variety of factors influence the likelihood that a user feels presence in a virtual environment (see Section "Associations With Other Variables"). For instance, the earliest VR implementations were built with an understanding that presence depends upon receiving correlated multisensory inputs that convey a simulated environment (the cybernetic approach to VR; Minsky, 1980; Herbelin et al., 2015). Many consider presence to be associated with the degree of environmental interaction (Slater and Usoh, 1993) as well as the fidelity and realism of information about the simulated landscape that is conveyed to the sensory modalities (Witmer and Singer, 1994). Individual differences in susceptibility to presence also play a large role (Witmer and Singer, 1998). Distinctions have been made between types of presence: Physical presence, the sense of physical relocation of the observer (IJsselsteijn et al., 2000); and social presence, the sense of being collocated with virtual agents (Heeter, 1992; Lombard and Ditton, 1997; Biocca et al., 2003). Several researchers have noted important distinctions between presence – the feeling of "being there" – and related concepts, such as engrossment and immersion. An individual may be highly attentive to a task in VR (engrossed) without feeling presence; similarly, the degree to which an individual is shut-off from the real world by a VR system (immersion) may not determine presence (Barfield et al., 1995; Slater et al., 1996; Nichols et al., 2000). Others have emphasized that presence is strongly modulated by perception of motor affordances of objects in VR (Dalgarno and Lee, 2010; Triberti and Riva, 2016), in addition to the importance of embodying a plausible virtual avatar in encouraging the sense that a virtual space is "real" rather than artifice (Slater et al., 2009; Grabarczyk and Pokropski, 2016). The embodiment of an avatar is in turn dependent on the synchronicity of sensory stimuli obtained by a VR user (Kilteni et al., 2012).

At the same time, individual differences appear to modulate presence in response to VR content: Individuals who strongly express personality traits of openness, neuroticism, absorption, and extraversion tend to report higher levels of presence (Sacau et al., 2008; Weibel et al., 2010). The reason for this difference is unclear, although it is possible that the finding indicates a bias at the response level, rather than reflecting differences in the qualitative experience of presence across personality types.

<sup>2</sup>Note that this review includes discussion of several related phenomena, such as CS, visually-induced motion sickness (VIMS), and simulator sickness (SS). Due to the important differences between these phenomena (which we briefly discuss below), the reader is advised to consider that evidence of a relationship between vection and VIMS, for example, does not dictate the same association between vection and CS.

### Measurements of Presence

A wide range of methods have been used to measure presence in virtual environments. Although it has been argued that the subjective quality of presence necessitates a subjective measure such as verbal self-reports about the sense of "being there" (Sheridan, 1992), research has increasingly emphasized the need to measure the similarity between behavioral responses in the real world and in a mediated environment as an objective index of presence (Bailenson et al., 2004; Slater, 2004a; van Baren and IJsselsteijn, 2004). As such, presence measures are often classified as subjective or objective measures, which can be further broken down into subcategories.

Objective measures include biomarkers that might relate to presence (e.g., obtained from heart rate and skin conductivity), behavioral measures (e.g., reflexive responses to dangerous stimuli, postural sway in response to visual stimulation), or measurements related to task performance in the virtual environment. van Baren and IJsselsteijn (2004) provide a detailed list of examples that employ each category of measurement tool. Several other techniques for measuring presence have also been studied in recent years, including neuroimaging (functional magnetic resonance imaging, fMRI; Baumgartner et al., 2008; Clemente et al., 2013) and electroencephalography (EEG; e.g., Baumgartner et al., 2006; Clemente et al., 2014), which show potential for identifying neural correlates of presence in VR. The search for objective markers of presence is particularly pressing, given that established presence questionnaires have been criticized for several limitations, including an inability to quantitatively discriminate between otherwise clearly distinguishable virtual and real-life experiences (Usoh et al., 2000), measuring the post-exposure memory of presence rather than presence itself (Usoh et al., 1999), and a lack of sensitivity to presence compared with behavioral measures (Bailenson et al., 2004; Slater, 2004a).

Subjective measures are obtained through questionnaires administered following VR exposure, self-report ratings solicited during VR exposure, verbal or written reports of the qualitative experience of presence, or psychophysical magnitude estimation/matching paradigms. Despite the criticisms leveled at them, questionnaires have been the most common approach to measuring presence. The dominant multi-item scales include the "Presence Questionnaire" (PQ; Witmer and Singer, 1998), a 32-item list of seven-point questions that are used to generate scores on subscales such as realism, possibility to act, and quality of interface. The same authors developed the Immersive Tendencies Questionnaire (ITQ; Witmer and Singer, 1998) as an index of an individual's likelihood of feeling immersed in virtual settings. The 29 items of the ITQ relate to the tendency to become involved in activities, to maintain focus in activities, and the tendency to play video games. As such, the ITQ can be taken as an index of "trait" tendency toward feeling (or reporting) presence. Other common scales include the Igroup Presence Questionnaire (IPQ; Schubert, 2003), which was developed to measure spatial presence, involvement, and realness of a simulated experience, and comprises a list of 14 items. A popular scale developed by Slater, Usoh, and Steed (SUS; Usoh et al., 2000) consists of a brief, 6-item questionnaire that generates a single score to convey how "present" the user felt in the virtual setting. Many other scales have been employed, including Likert-type rating scales, analog (continuous) ratings, or single-item measures (e.g., "To what extent did you feel present in the environment, as if you were really there?" Bouchard et al., 2004).

Each variety of measurement scale is accompanied by benefits and shortcomings. While multi-item presence questionnaires can provide a detailed assessment of the multiple dimensions that may underlie presence, the appeal of single-item scales is in their unobtrusive and rapid assessment of presence (Bouchard et al., 2004). Single-item scales may also be less prone to memory deterioration following virtual environment exposure, and can be administered several times during exposure to VR (although it should be noted that repeated probing of a VR user may interrupt and diminish the experience of presence). When compared with lengthy questionnaires, single-item scales are potentially more accessible to some participants, including children or individuals with learning difficulties. A thorough discussion of the limitations and utility of each type of scale is provided by van Baren and IJsselsteijn (2004).

# Cybersickness


As with presence, several definitions have been proposed for what we term here CS. We follow the definition outlined by Stanney et al. (1997): CS is a constellation of symptoms of discomfort and malaise produced by VR exposure. CS is typically categorized as a form of visually induced motion sickness (VIMS), which describes any sickness produced by observation of visual motion. It is distinct from, but symptomatically similar to, simulator sickness (SS), which is produced by vehicle simulators. A slight distinction has been made between the experience of CS and SS: while CS is characterized by a prevalence of disorientation symptoms, SS appears to be dominated by oculomotor symptoms (Stanney et al., 1997). While many individuals experience CS in VR, others appear to be resistant to the symptoms. Causal factors have been identified and discussed in great detail, including mismatches between observed and expected sensory signals (Reason and Brand, 1975; Rebenitsch and Owen, 2016), self-motion (McCauley and Sharkey, 1992), visual display characteristics (Moss and Muth, 2011), and gameplay experience (Knight and Arns, 2006; Gamito et al., 2008).

### Measurements of Cybersickness

As with presence, the magnitude of CS experienced by a VR user has been estimated using both objective and subjective measures. Objective measures may involve analysis of physiological markers. Increases in bradygastric activity, respiration rate (Kim et al., 2005; Dennison et al., 2016), heart rate (Nalivaiko et al., 2015), and skin conductance at the forehead (Golding, 1992; Gavgani et al., 2017) provide robust measures of CS. Behavioral signs such as early termination of a VR experience (Kinsella, 2014) and task competence (Lin et al., 2015; Nalivaiko et al., 2015) also indicate the extent to which an individual experiences CS.

The most common approach to assessing CS involves subjective measures, particularly multi-item questionnaires such as the Simulator Sickness Questionnaire (SSQ; Kennedy et al., 1993), which includes 16 items (e.g., eyestrain, dizziness, and headache) rated on a four-point scale (none, slight, moderate, or severe). Common practice with the SSQ is to generate a total sickness score as well as scores for each subscale of oculomotor discomfort, disorientation, and nausea. A shortened version of the SSQ (Short Symptoms Checklist, SSC; Cobb et al., 1995), consisting of two items from each subscale, has been developed and employed in a small number of studies (Wilson et al., 1997; Cobb et al., 1999). Given the dynamic nature of CS, which tends to increase during VR exposure and slowly dissipate following VR termination, there are clear challenges involved in using one-shot questionnaire measurements of CS. Single-item scales for measuring CS have also been developed and validated, providing an efficient method for assessing the temporal evolution of CS (e.g., Fast Motion Sickness Scale; Keshavarz and Hecht, 2011b). The near future of CS research will likely involve an integrated approach, where physiological assessments (see Section "Physiological Measures") are combined with multi-item and single-item questionnaires that are completed both during and after VR exposure.
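To make the SSQ scoring procedure concrete, the following is a minimal sketch of how a total score and three subscale scores might be derived from the 16 item ratings. The item-to-subscale mapping and the weighting constants are the values commonly attributed to Kennedy et al. (1993); they are not given in this review and should be checked against the original publication before use. The function and variable names are hypothetical.

```python
# Sketch of SSQ scoring. The mapping and weights below are assumptions taken
# from the values commonly attributed to Kennedy et al. (1993); verify them
# against the original paper before relying on the output.

SSQ_ITEMS = [
    "general discomfort", "fatigue", "headache", "eyestrain",
    "difficulty focusing", "increased salivation", "sweating", "nausea",
    "difficulty concentrating", "fullness of head", "blurred vision",
    "dizziness (eyes open)", "dizziness (eyes closed)", "vertigo",
    "stomach awareness", "burping",
]

# Some items contribute to more than one subscale.
SUBSCALE_ITEMS = {
    "nausea": [
        "general discomfort", "increased salivation", "sweating", "nausea",
        "difficulty concentrating", "stomach awareness", "burping",
    ],
    "oculomotor": [
        "general discomfort", "fatigue", "headache", "eyestrain",
        "difficulty focusing", "difficulty concentrating", "blurred vision",
    ],
    "disorientation": [
        "difficulty focusing", "nausea", "fullness of head", "blurred vision",
        "dizziness (eyes open)", "dizziness (eyes closed)", "vertigo",
    ],
}

SUBSCALE_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
TOTAL_WEIGHT = 3.74


def score_ssq(ratings):
    """Return weighted subscale and total scores.

    `ratings` maps each of the 16 item names to an integer from 0 (none)
    to 3 (severe).
    """
    missing = set(SSQ_ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for item in SSQ_ITEMS:
        if ratings[item] not in (0, 1, 2, 3):
            raise ValueError(f"rating for {item!r} must be 0, 1, 2, or 3")

    # Raw (unweighted) sum of ratings for each subscale.
    raw = {
        name: sum(ratings[item] for item in items)
        for name, items in SUBSCALE_ITEMS.items()
    }
    scores = {name: raw[name] * SUBSCALE_WEIGHTS[name] for name in raw}
    # The total is the sum of the three raw subscale sums, then weighted.
    scores["total"] = sum(raw.values()) * TOTAL_WEIGHT
    return scores
```

Because several items load on two subscales, the total is not a simple sum of the 16 ratings; this is one reason SSQ totals from different studies are only comparable when the same scoring convention has been applied.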

# Shared Measurements of Presence and Cybersickness

Multiple measurement techniques are common to both presence and CS. These can be broadly categorized as physiological markers (e.g., recordings of neural or dermal activity) or task-performance-based measures (e.g., reaction times, performance accuracy). Here we describe these approaches to measuring both factors, and discuss how the overlap between measurement approaches makes it difficult to interpret the true relatedness of the factors.

#### Physiological Measures

Physiological methods have been applied to the measurement of both presence and CS, which presents a significant potential problem in understanding how the two factors are related. Indices of autonomic nervous system activity offer reliable measures of the stress response, and this stress/alarm response is linked to both presence and CS. Physiological correlates of acute CS symptomatology (sweating, nausea, skin pallor, and increased heart rate) reflect the neuroendocrine stress response (Harm, 2002; Kim et al., 2005; Ohyama et al., 2007). Equally, the magnitude of a stress response to a virtual environment is often considered an indicator of presence (Bouchard et al., 2008; Ling et al., 2013). Research on presence in stressful environments (such as standing at a great height) suggests that assigning a personal relevance to the environment due to presence (e.g., "I could really fall into this pit") leads to heightened physiological reactions such as increased heart rate and skin conductance (Meehan et al., 2001, 2003; Wiederhold et al., 2001; Zimmons and Panter, 2003). This physiological response is thought to be caused by the release of adrenocorticotropic hormone (ACTH), growth hormone, and other hormones by the pituitary gland (Harm, 2002). We are unaware of any studies that assess hormonal correlates of presence, although the neuroendocrine response to motion sickness has been studied extensively. Evidence from physiology indicates that the secretion rate of ACTH and vasopressin in response to a visual motion stimulus is correlated with susceptibility to motion sickness (Eversmann et al., 1978; Kohl, 1985; Kim et al., 1997). In support of this physiological basis, Asian individuals have been reported to be more susceptible to motion sickness, which may be related to the increased levels of vasopressin release observed in this population (Stern et al., 1996; Klosterhalfen et al., 2005).

Neurophysiology studies have also produced an advanced understanding of the brain mechanisms underlying motion sickness. The emetic component of motion sickness is thought to be controlled by a pathway that involves the vestibular nuclei in the brainstem (Yates et al., 1998). These nuclei, which produce emesis when externally stimulated (Miller and Wilson, 1983), show modulated activity in response to levels of hormones and neurotransmitters such as GABA, dopamine, and ACTH (Balaban et al., 1989). A primary function of the vestibular nuclei is to project information about self-motion to the thalamus and vestibular cortex (Glover, 2009), and it has been claimed that incongruent sources of self-motion information that are integrated here significantly contribute to CS (Yates et al., 1998; Oman and Cullen, 2014; further discussion can be found at Section "Sensory Mismatch"). On the other hand, our understanding of the neural mechanisms for feeling presence is much weaker; understandably so, given the much more qualitative and phenomenological nature of presence. There is some evidence from EEG and fMRI recordings that presence is associated with increased parietal and prefrontal cortex activation (Baumgartner et al., 2006, 2008), and this field of research will likely grow rapidly given the recent increase in prevalence of VR technology.

#### Task Performance

Feeling present in a virtual space appears to enhance task performance. It has been shown that feeling presence is related to improved performance in the game of chess in VR (Slater et al., 1996), human interaction (Stanney et al., 2002), engine maintenance tasks (Cooper et al., 2016), and simple psychomotor tasks (Witmer and Singer, 1994; but cf. Singer et al., 1995). In one study, a striking 95% of variability in presence ratings (PQ) was explained by variance in time to completion of an engineering task (Cooper et al., 2016). The conceptual link between presence and task performance appears weak, however (van Baren and IJsselsteijn, 2004), and it is possible that the relationship between presence and task performance measures is strongly mediated by other factors, such as experimental instructions, individual motivation, and even CS.

The inverse correlation between CS and task performance is well-supported, with several studies showing that symptoms of sickness are linked to decreased task performance (Frank et al., 1988; Kennedy and Fowlkes, 1992; Kennedy et al., 1993; Lerman et al., 1993; Nelson et al., 2000; Stanney et al., 2002; Kim et al., 2005). Ultimately, many who suffer from CS elect to terminate a session of VR early and therefore cannot complete the given task (DiZio and Lackner, 1997). In studies where no relationship between task performance and sickness severity is found, it is often claimed that symptoms were too mild to interfere with task performance (Nelson et al., 2000; Bos et al., 2005).

Using task performance to measure both CS and presence leads to some obvious problems in interpreting their relatedness. The evidence suggests that task performance is more indicative of CS than presence, although a conservative approach should be considered: Since task performance measures are likely to conflate multiple constructs, they are not ideal for use in isolation and should be used in conjunction with other metrics. Note that this caution applies equally to measures such as "enjoyment" as indices of presence or CS (e.g., Wilson et al., 1997; Slater, 2004b; Waterworth et al., 2015).

# EVIDENCE OF A PRESENCE-CYBERSICKNESS RELATIONSHIP

There are a number of documented efforts to record presence and CS concurrently. Within this literature there is significant disagreement with respect to the strength and direction of the relationship. Here we summarize studies that report positive, negative, or null correlations between presence and CS, identified using a structured literature search (**Figure 1**). We report the display device used, the task, the sample size, and statistics for each effect (if reported) in a summary table (**Table 1**) and an illustration (**Figure 2**). Finally, we identify where the more convincing evidence appears to lie, and discuss some possible reasons for the discordance in findings.
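Since the studies reviewed here are compared on the sign and size of a correlation between presence and CS scores, a minimal sketch of that computation may be a useful reference. It computes Pearson's r and the associated t statistic with n − 2 degrees of freedom. The scores shown are invented for illustration only and are not taken from any study in this review; the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical per-participant scores (illustration only).
presence_scores = [5.2, 4.8, 3.1, 6.0, 2.5, 4.0]
sickness_scores = [10, 15, 40, 5, 55, 30]

r = pearson_r(presence_scores, sickness_scores)
print(f"r = {r:.2f}, t = {t_statistic(r, len(presence_scores)):.2f}")
```

With small samples, a given t value corresponds to a wide range of plausible population correlations, which is worth keeping in mind when comparing effects across studies of very different sizes.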

# Review Method and Results

Our general method followed The PRISMA Statement (Moher et al., 2009), which provides a standardized set of items for reporting in systematic reviews. The primary aim of our review was to identify research studies that directly examine the relationship between presence and CS. Our criterion for inclusion was that the studies must have measured both presence and CS produced by the use of VR and analyzed the correlation between the factors. The method we used was to conduct a database search on PubMed, PsycINFO, and Google Scholar for publications that conducted experimental studies with VR (search term: virtual reality), including terms related to CS (cybersickness, nausea, sickness, or emetic), and terms related to presence (presence, immersion, immersiveness, or telepresence). Initially, there were 478 results returned. **Figure 1** depicts the procedure for identifying and selecting records from the literature search.
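The three groups of search terms described above can be combined into a single boolean query, as in the sketch below. The term lists are taken from the text, but the query syntax (AND/OR operators, quoting of multi-word phrases) is an assumption: the exact string submitted to each database is not stated, and databases differ in how they parse operators.

```python
# Term groups as listed in the text. The boolean syntax is an assumption and
# may need adapting to each database's query language.
VR_TERMS = ["virtual reality"]
CS_TERMS = ["cybersickness", "nausea", "sickness", "emetic"]
PRESENCE_TERMS = ["presence", "immersion", "immersiveness", "telepresence"]


def or_group(terms):
    """Join alternative terms with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Require at least one term from every group."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(VR_TERMS, CS_TERMS, PRESENCE_TERMS)
print(query)
```

Writing the query this way makes the inclusion logic explicit: a record must mention VR, at least one sickness-related term, and at least one presence-related term.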

As demonstrated by **Figure 1**, significant attrition occurred in the article selection process. We read the abstracts of all the papers and found that the vast majority of the results (∼366 of 404 records screened) referred to presence and CS briefly with regard to their relevance to the advancement of VR in rehabilitation, education, or consumer settings, or they used the search terms in a general sense. Several results containing instances of the key terms "presence" or "immersion" were unrelated to the sense of "being there" (e.g., "Cybersickness in the presence of scene rotational movements along different axes"; "immersion in VR" used as a synonym for "exposure to VR"; numerous other examples can be seen by the reader upon reproducing the search results). Terms such as "presence" are highly context-specific, and several studies not contained in the results use terms for CS that are general and difficult to identify with a literature review search, such as "negative effects." Another portion of the search results (18 records) measured only presence, or only CS, or measured neither. Many of these studies focused on the effect of an experimental manipulation on presence, where CS measures were collected solely in order to confirm that CS was low or negligible and was unaffected by the manipulation.

**Table 1** provides an overview of each study identified using our search, including details of the VR task included in the experiment, the device used to depict the virtual environment, and the scales or measures used to acquire data on CS and presence. The table also includes the sample size and statistics for the relevant correlations. In numerous cases these details have not been reported by the study authors. Nonetheless, the details of the 20 publications that have directly measured the correlation between presence and CS may prove informative for future studies on human factors in VR.
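The record counts reported in this section can be checked against one another with simple arithmetic, as sketched below. The figure of 366 is given only approximately in the text ("∼366"), so this is a consistency check rather than an exact reconstruction of **Figure 1**; the variable names are hypothetical, and the reason records were removed before screening is not assumed here.

```python
# Counts as reported in the text.
identified = 478                 # results initially returned by the search
screened = 404                   # records whose abstracts were screened
excluded_passing_mention = 366   # approximate ("~366") in the text
excluded_missing_measure = 18    # measured only presence, only CS, or neither
included_reported = 20           # publications summarized in Table 1

removed_before_screening = identified - screened
included_derived = (
    screened - excluded_passing_mention - excluded_missing_measure
)

print(removed_before_screening)  # 74
print(included_derived)          # 20
assert included_derived == included_reported
```

The derived count matches the 20 publications reported, which suggests that the approximate exclusion figure is at least consistent with the final sample.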

# Summary of Studies Identified by the Literature Search

We describe the studies that we identified with the literature search below. We also describe the original authors' conclusion about the nature of the presence-CS association based on their findings, where this information was available. Following this summary, we outline our interpretation of how presence and CS are related based on a synthesis of the literature that we reviewed here (see Section "Conclusion: How Are Presence and Cybersickness Related?").

#### Studies Reporting a Negative Correlation

Negative correlations between presence and CS were reported early by Witmer et al. (1996) and Witmer and Singer (1998). Data from Witmer et al. (1996) showed a large negative correlation between scores on a presence questionnaire and self-reported symptom severity on a CS scale. The authors proposed that participants who experience symptoms are more internally focused and less able to process features of the environment, thus limiting the sense of presence.

Witmer and Singer (1998) reported data obtained in four experiments that helped to establish the Presence Questionnaire (PQ) and its relationship to CS. The significant reported correlation was taken as evidence that experiencing symptoms of CS tends to diminish the feeling of presence via distraction or a reduction in the user's involvement in the virtual environment.

In a study carried out by Wilson et al. (1997), a negative relationship was observed between the interface quality subscale of the PQ and scores on the SSC scale in VR. The authors proposed that sickness symptoms may detract from presence, or that presence reduces the awareness of sickness symptoms. Evidence supporting this finding was gathered by Usoh et al. (1999) using a virtual room navigation task. Here, scores on the oculomotor subscale of the CS scale were higher when presence scores were low, suggesting that oculomotor discomfort might have produced an internal focus in users.

Nichols et al. (2000) found evidence for a negative correlation between presence and CS during virtual house exploration. The task required several basic object manipulations (e.g., picking up and placing objects) using a three-dimensional mouse. A negative association between total CS ratings and presence scores was obtained following exposure to the virtual environment. The authors suggested that individuals with more symptoms of sickness may have concentrated less on the task, and may have been more attuned to the deficiencies of the virtual environment simulation (e.g., low refresh rate).

A negative relationship between subjective ratings of presence and sickness severity was obtained by Stanney (2000, Unpublished); reported informally by Stanney (2002). A negative correlation of a similar magnitude was obtained during virtual town navigation by Kim et al. (2005), who showed that CS and presence (particularly the feeling of "control" in the VR environment) were negatively related. Unfortunately, while the same authors also obtained physiological signals (e.g., heart rate, EEG), they did not report the full set of possible correlations between physiological data, CS scores, and presence ratings.

**Table 1** legend: + = positive. – = negative. × = null correlation. n.r. = not reported. SSQ = Simulator Sickness Questionnaire, Kennedy et al., 1993; SSC = Short Symptoms Checklist, Cobb et al., 1995; SUS = Slater-Usoh-Steed Questionnaire, Usoh et al., 2000; ITQ = Immersive Tendencies Questionnaire, Witmer and Singer, 1998; IPQ = Igroup Presence Questionnaire, Schubert, 2003; PQ = Presence Questionnaire, Witmer and Singer, 1998.

A brief report of a large-sample study by Knight and Arns (2006) supported the existence of an inverse relation between presence and CS in immersive VR. The results showed significant chi-squared tests indicating that total SSQ scores decreased with increasing levels of presence. Knight and Arns (2006) also collected data on several other related factors, such as previous game play experience, motion sickness susceptibility, and participant sex, which permits inferences about latent causes for both presence and CS, although not all correlations between measures were reported (e.g., despite collecting gameplay experience, it was not specified if this factor was correlated with presence as in other studies; see Section "Gaming Experience").

**Figure 2** (partial caption): … minimum = 6). Yellow indicates positive correlations, cyan indicates negative correlations, and black indicates null correlations. Since some studies did not report correlation values, vertical bars are used to indicate the range of possible Pearson r correlation values given the reported sample size. Crosses indicate that degrees of freedom were not reported.

Busscher et al. (2011) measured CS and presence while participants watched a video on a television in a simulated lounge environment. The authors described a significant negative correlation between the two factors and interpreted it as evidence that maximizing presence in VR suppresses CS, suggesting that future interventions may be able to tackle both issues concurrently.

A study using a partial least-squares regression method identified an inverse association between presence and CS in a driving simulation task (Milleville-Pennel and Charron, 2015). The authors collected several possible predictors of CS including driving experience, tendency toward frustration, and presence, and found that presence loaded negatively on a latent variable that was termed "pre-disposition to sickness." Milleville-Pennel and Charron also validated the single-factor construction of the SSQ and confirmed that the sub-components of the SSQ (nausea, oculomotor discomfort, and disorientation) each contribute approximately one third of the variance in overall levels of CS. This is an important finding given the high prevalence of SSQ use in studies of CS.

A recent study from Cooper et al. (2016) showed that subjective presence ratings were negatively associated with discomfort ratings that were collected following an immersive "pit stop" scenario. Although sample size was small (N = 8), the authors took the evidence as support for the utility of a multisensory cueing approach to improve presence and reduce the severity of sickness in VR.

#### Studies Reporting a Positive Correlation

As described above, Wilson et al. (1997) identified a negative correlation between presence and symptoms of sickness following the use of VR in one experiment. In a second experiment, however, they found evidence for a positive relationship, despite using the same scales to measure the two factors. Participants conducted a virtual "duck shooting" task and completed a CS checklist, while behavioral (startle response) and subjective ratings of presence (presence questionnaires and awareness of background music manipulation) were collected.

Liu and Uang (2011) identified a positive relationship between presence and CS in a virtual shopping task. Older adult participants were asked to search for specific items on shelves in a virtual grocery store. Results indicated a strong positive correlation between presence and CS, and the authors suggested that an increase in presence causes CS to increase. An in-depth interpretation of the study is limited because the authors did not specify certain details, such as the duration of VR exposure or the display device used.

Ling et al. (2013) report finding a positive link between CS and scores on the Immersive Tendencies Questionnaire (ITQ) that was administered after participants completed an anxiety-inducing task in VR. This was taken as evidence that individuals who experience more presence also experience more CS, and this conclusion was supported by evidence of a positive correlation between ITQ scores and levels of spatial presence. Despite these associations, no correlation was found between spatial presence and CS. The authors suggested that the expected relationship between spatial presence and CS did not emerge because the high cognitive demand of the public speaking task may have modulated presence and CS in different ways.

Lin et al. (2002) obtained a strong positive correlation between presence and CS ratings in a CAVE-like driving simulator, from a sample of 10 participants. The authors state that there was a low level of interactivity in their VR task compared to other similar studies (e.g., Nichols et al., 2000, who found the opposite relationship), and noted that the level of interactivity afforded in virtual environments is likely to alter the relationship between presence and the severity of sickness.

As described above, Kim et al. (2005) identified a negative correlation between CS and presence ("user control" factor of the PQ). In the same study, the authors found that the direction of the relationship depended upon the questionnaire that was used: A positive correlation was documented between CS and the Involvement factor of the ITQ. This divergence was not discussed by the authors. This finding highlights one of the problems involved in characterizing presence, given the discrepancy between trait (ITQ involvement) and state (PQ control) measures of the phenomenon.

Slater et al. (1996) speculated that vection in VR was a common contributing factor to both CS and presence, stating that a positive correlation between presence and CS would therefore be "not surprising." Indeed, some of the more convincing (albeit indirect) evidence of a positive CS-presence relationship has emerged from vection research. Hettinger et al. (1990) reported that a vection-inducing stimulus can produce VIMS, and more recently, Keshavarz et al. (2014) have shown that even "auditory vection" (i.e., vection produced by an auditory self-motion cue) can produce sickness symptoms. Other links between vection and VIMS have been discussed in a recent review (Keshavarz et al., 2015). Taken together with evidence of a strong relationship between vection and presence, it seems logical that increases in the sense of vection in a virtual environment should improve presence, and also cause CS to increase. However, evidence on such a link is unclear (see Section "Vection").

#### Studies Reporting a Null Correlation


Some studies have reported null correlations between presence and CS. These reports are very sparse, possibly due to a bias for significant results (e.g., Open Science Collaboration, 2012, 2015). Mania and Chalmers (2001) report a study where participants were asked to observe a video in VR and to report their level of presence and CS. The authors found no significant relationship between presence and CS, perhaps because CS scores were quite low across participants, although a trend toward a negative correlation was observed.

In an investigation by Seay et al. (2002), a large sample of participants conducted a driving simulation task and reported their level of presence and CS. Results indicated no correlation between presence and any subscales of the CS measure. However, the same authors found main effects of an experimental manipulation (field-of-view angle, 180° vs. 60°) on both presence and the nausea subscale of their CS measure. In light of these inconsistent results, the authors concluded that factors such as field-of-view can prove to be a "double edged sword," increasing presence but also increasing sickness. Similarly, Bangay and Preston (1998) did not analyze whether a correlation existed between their measures of CS and presence, but identified that those who experienced CS were likely to report high levels of immersion in the VR environment.

A recent study found a null correlation between presence and CS while participants observed an animated avatar in a virtual café using a head-mounted display (Ryan and Griffin, 2016). It is unclear how CS was measured in this study, and levels of CS were also reported to have been very low, which may have limited the power of the analysis. It is notable that of the studies reviewed here, this study is the only one to have used a modern consumer-oriented VR device (Oculus Rift DK2). Since these devices have become extremely popular for VR research in recent times (Peer and Ponto, 2017), it is likely that studies on the presence-CS relationship in the coming years will use this device or a similar one, thus reducing much of the inter-experiment variability that is attributable to different display conditions.

# Conclusion: How Are Presence and Cybersickness Related?

The balance of evidence favors the interpretation that presence and CS are negatively related. There are several reasons for this. First, the number of research studies that report the existence of a negative correlation outweighs the number of studies that report the opposite. Studies that describe an inverse relationship also tend to provide more compelling results: Where studies have observed a positive correlation between presence and CS, they often fail to confirm this relationship in another section of the same study (e.g., Wilson et al., 1997; Kim et al., 2005; Ling et al., 2013). In some of the studies cited above that identified a positive correlation, interpretation of the data is limited by the absence of important details (e.g., Liu and Uang, 2011, did not describe the device used; Wilson et al., 1997, did not report test statistics).

Although a positive correlation between presence and CS has been anticipated or assumed by some researchers (e.g., McCauley and Sharkey, 1992; Slater et al., 1996), it is likely that positive associations arise due to the fact that "immersiveness" is required in order for an individual to experience CS. Immersiveness here refers to the extent of sensory "submersion" experienced by a user with a given VR system, such that external sensory cues are obstructed (Biocca and Delaney, 1995); accordingly, desktop systems and head-mounted displays (HMD) are classified as low and high in immersiveness, respectively. Observing a bright, dynamic movie on a desktop monitor is a comfortable experience for most, but viewing the same scene in a VR headset can often produce CS. Similarly, the sense of presence is heightened by the use of immersive systems. As such, immersion in VR leads to the possibility of both CS and presence emerging.

What mechanism causes this inverse association between CS and presence? It has been claimed that the sense of presence suppresses CS, since attention is directed away from intrusive factors such as sensory conflict (e.g., Busscher et al., 2011; Cooper et al., 2016). Alternatively, the distracting effects of CS may suppress attention to the VR environment that is required for presence to occur (e.g., Wilson et al., 1997; Witmer and Singer, 1998; Usoh et al., 1999; Nichols et al., 2000). More than likely, both of these assertions are true; they are not mutually exclusive. The relationship is also clearly mediated by other factors that appear to affect CS and presence inversely. A large number of associated variables have been identified, and although there is insufficient research to construct a precise model of their contribution to either factor, research suggests crucial roles played by sensory mismatch, display factors, navigation control, sex, and other factors (for an in-depth discussion, see Section "Associations With Other Variables").

There are also important limitations to several of the studies that reported a negative presence-CS relationship, such as missing test statistics, or a failure to describe display device features. Many of the studies that reported a negative correlation were conducted before the advent of modern consumer-oriented VR technology, and their findings may not entirely replicate using current hardware devices. A major limitation of almost all studies described above is the small sample size used in the experiments. With one or two exceptions, the studies above have, on average, very low statistical power for detecting medium effects. An a priori power analysis was reported in only one of the studies described here (Ling et al., 2013); in that case, a desired power of 80%, which is on the low end of "adequate" (Button et al., 2013), required approximately 85 participants. We estimate that only two other studies described here were likely to have attained >80% statistical power: The brief report by Knight and Arns (2006), who measured a convenience sample of N = 387; and a conference paper by Seay et al. (2002), who measured a sample of N = 156. Evidently, there is a need for the adoption of a more scientifically rigorous approach toward statistical power, as has been reported widely across the fields of psychology and neuroscience (Open Science Collaboration, 2012, 2015; Button et al., 2013).
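The sample-size figure quoted above can be reproduced approximately with a short calculation. The sketch below is illustrative only: it assumes that a "medium" effect means a correlation of r = 0.3, that the test is two-tailed with alpha = 0.05, and that Fisher's z approximation is adequate. None of these choices is stated in the text, and the function names are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-tailed, Fisher's z)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


def achieved_power(r, n, alpha=0.05):
    """Approximate power for sample size n (ignores the negligible opposite tail)."""
    z = NormalDist()
    return z.cdf(atanh(r) * sqrt(n - 3) - z.inv_cdf(1 - alpha / 2))


# Assumed medium effect (r = 0.3); this value is an assumption, not from the text.
print(n_for_correlation(0.3))  # → 85

# Sample sizes reported for Seay et al. (2002) and Knight and Arns (2006).
for n in (156, 387):
    print(n, round(achieved_power(0.3, n), 3))
```

Under these assumptions, both of the larger samples mentioned above exceed the 80% threshold, whereas a sample of a few dozen participants would fall well short of it.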

In **Figure 2** we depict the correlation between CS and presence obtained in the studies that we reviewed and discussed here. On inspecting this figure, it is notable that very few recent studies have empirically examined the strength of the association between presence and CS. While recent literature often discusses both factors in the context of VR (e.g., Aardema et al., 2010; Terziman et al., 2010; Kober and Neuper, 2012; Serafin et al., 2016), they are often described only for the purpose of highlighting the nuisance of CS and the desirability of presence. For instance, presence and CS are both measured by Nolin et al. (2016), but the strength of correlation is not reported. Kober and Neuper (2012) stated that participants in their study of presence in VR also completed a CS questionnaire, but the authors only used these data to confirm that CS was at a low level overall. Similarly, Corrêa et al. (2017) assessed CS and presence, simply reporting that CS was low in the participants tested. Baus and Bouchard (2017) used high self-reported CS levels as an exclusion criterion, and did not assess the relationship with levels of presence. Kim K. et al. (2012) also report a study where CS and presence measures were both collected, and although their manipulation (visual display device: Desktop/HMD/CAVE) affected both CS and presence, their relationship is not reported. While the nature of the presence-CS link may not have been a major focus of any of these studies, since the data clearly existed, it is rather unfortunate that no analysis was reported. The presentation of these data in future studies where the data are collected would provide valuable information to developers and researchers interested in advancing the understanding of the human experience in VR. 

It should also be noted that a diverse variety of VR technology has been used in the studies reported above, ranging from older display devices (e.g., Division dVisor, Fakespace BOOM2C) to more recent consumer headsets (Oculus Rift DK2). Modern VR devices such as HTC Vive and Oculus Rift CV1 are more similar to one another with respect to many characteristics (field-of-view, refresh rate, tracking latency) than were older systems (Peer and Ponto, 2017). In the coming years, the consistency of findings in the field of human factors in VR will likely benefit from this natural standardization of display tools. Another limitation of the existing literature is a severe underreporting of the input techniques adopted for environmental navigation and interaction.

# ASSOCIATIONS WITH OTHER VARIABLES

As research has investigated the nature of the relationship between presence and CS, a variety of candidates for mediation of the relationship have emerged. Although no studies have attempted to estimate the magnitude of the contribution of each of these factors, we present here a synthesis of the literature that offers clues as to the most important mediators of the presence-CS relationship. We make connections between these sometimes disparate studies, and highlight the interactions between some of the factors associated with both presence and CS.

# Sensory Mismatch

Sensory cues gathered from multiple channels (proprioception, vision, vestibular, etc.) are used to continuously update estimates of the state of the world and of the body (Calvert et al., 2004). Therefore, when simulating a virtual environment, congruence between the information that is obtained and that which is expected (either because of prior experience, or because of expected correlations with another sensory channel) plays a large role in the experience.

The understanding of how sensory mismatch contributes to CS and presence has historically been limited due to the challenge of directly manipulating or measuring sensory conflict in experimental settings (Riccio and Stoffregen, 1991; Oman and Cullen, 2014). For instance, only recently have convincing results emerged that are consistent with a neural signature for sensory mismatch (e.g., Brooks and Cullen, 2013, 2014). Nonetheless, theoretical accounts have highlighted the role played by the congruence of sensory cues in both presence (Slater et al., 1995; Bowman and McMahan, 2007; Henderson et al., 2007) and CS (Reason and Brand, 1975; Oman, 1991; Rebenitsch and Owen, 2016). It is clear that the addition of high-fidelity, multimodally congruent information is beneficial to an increase in presence. Participants in a VR navigation task show increased presence when binaural auditory information is presented compared to vision-alone conditions (Larsson et al., 2002). Introducing multisensory feedback cues (tactile, auditory, and visual) in a manual VR task also enhances presence (Cooper et al., 2016; also see Hecht et al., 2006). Viaud-Delmon et al. (2006) demonstrated that adding auditory cues to a virtual environment (i.e., enhancing the immersiveness of the simulation) increases presence, but also leads to a rise in levels of CS.

However, to our knowledge, there is little research on the disruption of presence when multimodal cues are in conflict. The most relevant literature in that context relates to vection, a strong correlate of presence (Riecke, 2010). The evidence of a relationship between vection and sensory mismatch is inconsistent: Visual-vestibular cue mismatch has been linked to a decreased sense of vection (Wong and Frost, 1981; Weech and Troje, 2017) or to enhanced vection (Kim J. et al., 2012; Palmisano et al., 2012; also see Section "Vection"). Future research will be needed to establish causality between cue conflict and presence, perhaps by assessing the tendency for breaks in presence when multimodal cues are put experimentally into conflict.

It has been theorized that CS in VR is produced as a result of mismatches in information across sensory streams, or conflicts between observed and expected sensory cues, particularly with respect to visual-vestibular cue conflict (Reason and Brand, 1975; Oman, 1991; Rebenitsch and Owen, 2016). The link between multimodal cue mismatch and the symptoms of CS has been attributed to the detection of sensory dysfunction (Treisman, 1977). Motivated by these theoretical accounts, several researchers have attempted to prevent CS through a sensory conflict reduction approach, with some successful results (e.g., Reed-Jones et al., 2007; Cevette et al., 2012; Gálvez-García et al., 2015; Zao et al., 2016). This research has produced evidence that CS symptoms are reduced when sensory stimulation is used to "recouple" multimodal streams of information in VR. Visual-vestibular cue mismatch is a particular source of discomfort, and this manner of conflict is extremely common in VR applications (LaViola, 2000; Rebenitsch and Owen, 2016). By reducing this mismatch using vestibular stimulation, CS appears to be mitigated (Reed-Jones et al., 2007; Cevette et al., 2012; Gálvez-García et al., 2015). Other research has adopted a sensory reweighting approach to encourage conflicting cues to be quickly disregarded using "noisy" vestibular stimulation, rather than "recoupling" the sensory streams (Weech et al., 2018a). Taken together with work showing that vection is also facilitated when noise is applied to the vestibular sense (Weech and Troje, 2017), the sensory reweighting approach appears to be a promising method for maximizing presence and minimizing CS. However, further refinements of the sensory stimulation methods currently used will be vital before a viable practical application can be achieved.

# Display Factors

Reports show that visual display characteristics such as frame rate and field-of-view influence both presence and CS. Higher frame rates are associated with higher self-reported presence ratings due to the increased realism afforded by smooth motion (Meehan et al., 2001). Low visual display frame rate (<20 fps) has long been known to generate motion sickness in simulated environments (e.g., Jones et al., 2004), leading to a focus on high, stable refresh rates in best practice guides for VR development (Oculus, 2017). This guide also emphasizes that latencies between observer motion and visual self-motion feedback should be minimized in order to avoid generating nausea, although the link between motion-to-photon latency and CS has been disputed (Meehan et al., 2003). Higher field-of-view also leads to increases in presence (Prothero and Hoffman, 1995), but at the same time, field-of-view increases lead to higher CS (Lin et al., 2002). It was suggested by Lin et al. (2002) that the effect of field-of-view on both factors might be mediated by its effect on illusory self-motion perception (vection) produced by large-field visual motion.

Evidence suggests that stereoscopy influences both presence and CS. Research has identified links between stereopsis and CS, likely due to the accommodation-vergence conflict introduced by stereoscopic 3D displays. For instance, viewing stimuli on certain 3D displays can increase measures of visual discomfort that are characteristic of CS, such as eyestrain (Emoto et al., 2004; Pölönen et al., 2009; Lambooij et al., 2011). VIMS also has a strong relationship to stereoscopy: Observing a simulated roller-coaster stereoscopically leads to increased VIMS symptoms compared to monocular viewing (Keshavarz and Hecht, 2012). Stereoscopic viewing of a virtual environment also takes advantage of the learned relationship between binocular disparity and object depth, increasing the naturalness of the viewing experience. Ling et al. (2013) show that providing stereoscopic cues appears to enhance presence (SUS; effect size Cohen's d = 0.24) and spatial presence (IPQ; d = 0.29) in a public speaking task. Although the authors also predicted a relationship between stereoscopic acuity and presence, no evidence of such a link was observed. The authors suggested the link between presence and stereoscopy may be even stronger than implied by their results, given that their public speaking task involved very little binocular disparity. Indeed, a stronger link was observed by IJsselsteijn et al. (2001) in a driving simulation task. Presence measured by subjective responses (continuous scale) showed a large increase due to stereoscopic presentation. The authors also found that a behavioral measure of presence, postural sway, showed a tendency to increase when stereoscopic cues were added. Importantly, the authors also measured sickness in this study and identified no effect of stereoscopy on VIMS (continuous scale), although it should be noted that sickness ratings were near floor levels. These results were similar to those obtained by Ling et al. (2013), who found no difference in CS (SSQ) across stereoscopic and non-stereoscopic display conditions.
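For readers less familiar with the effect sizes quoted above, the following sketch shows one common way to compute Cohen's d: the difference between two group means divided by the pooled standard deviation. It assumes independent groups, which may not match the design or exact computation used by Ling et al. (2013), and the ratings in the example are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical presence ratings (not data from any study cited here).
stereo = [2, 4, 6]
mono = [1, 3, 5]
print(cohens_d(stereo, mono))  # → 0.5
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), values such as d = 0.24 and d = 0.29 would be read as small effects.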

# Vection

Vection is considered a strong correlate of presence. For an observer to experience the illusion of self-motion, their sensorimotor control system must be convinced that the visual motion veridically specifies their own body motion (Prothero and Hoffman, 1995; Chertoff and Schatz, 2014). However, if the implied body motion is not consistent with cues from other modalities (particularly vestibular signals), sensory conflicts emerge, producing nausea (Reason and Brand, 1975; Lin et al., 2002). Vection-inducing stimuli are often nauseogenic, but the relationship is complex. Some have suggested that experiencing vection might be a necessary prerequisite for experiencing VIMS (Hettinger et al., 1990; Hettinger and Riccio, 1992; Keshavarz et al., 2015). Motivated by this hypothesis, one study has characterized an optimal magnitude of visual motion that does not produce CS but still evokes vection (Tanaka and Takagi, 2004).

However, vection does not always lead to the emergence of sickness symptoms. There is strong evidence that VIMS can occur in the absence of vection (Ji et al., 2009). Other studies have presented evidence that suggests a negative relationship between vection and CS (Bonato et al., 2008; Palmisano et al., 2017). Palmisano et al. (2017) recently found that individuals who felt stronger vection (magnitude estimates) were likely to report fewer symptoms of CS (SSQ). This effect was dependent on the visual display conditions: The relationship was only obtained when visual stimuli were observed through a simulated aperture (field-of-view: 86° diagonal), and not when participants observed a "full field" stimulus (field-of-view: 110° diagonal). The authors reiterated that the link between vection and CS was relatively weak, and that a complex relationship is likely to exist. In several other experiments, there appeared to be no association between VIMS and vection (Webb and Griffin, 2003; Keshavarz and Hecht, 2011a; Riecke and Jordan, 2015). Evidently, there are highly complex relationships between vection and CS, as well as between vection and presence. This complexity has been discussed by others who suggest that vection poses an intervening factor between presence and CS (Stanney et al., 1998; Sadowski and Stanney, 2002; Hettinger et al., 2014). Concurrent measurements of each variable will be essential in future research studies attempting to model the presence-CS-vection relationship.

# Intuitiveness of Interaction

Presence has been termed the illusion of a non-mediated experience (Lombard and Ditton, 1997). This illusion entails the absence of attention to the apparatuses used to convey a simulation, such as the visual display device itself, the environmental boundaries, and the controls used to interact with the environment. For this reason, natural (ecological) control methods that do not distract from the simulation are likely to produce higher presence. Examples of this principle were provided by Welch et al. (1996), who indicated that the ability to interact with the environment increases presence, and that increasing the latency between action and feedback can negatively impact presence. Additionally, Schuemie et al. (2005) showed that more "natural" locomotion techniques (i.e., walking in place compared with mouse navigation) lead to a greater sense of presence (IPQ). Equally, the intuitiveness of the control scheme in VR has been linked to CS rates, with a greater degree of CS provoked by less ecological control schemes. For instance, Kolasinski (1995) notes that freezing or resetting the simulated viewpoint of an observer tends to be highly nauseogenic. Borrego et al. (2016) report that navigating a virtual environment by walking leads to increased levels of presence (SUS and PQ) compared with using a hand-held controller to navigate. Additionally, Jaeger and Mourant (2001) documented that navigating by walking on a treadmill led to significantly lower CS severity (SSQ) than when a hand-held controller was used. This series of findings is perhaps unsurprising, given that exposure to novel sensorimotor conditions in the real world is known to provoke sickness (e.g., prism glasses that reverse the orientation of the visual field are initially nauseogenic for users; Oman, 1991).

However, some research has indicated that more intuitive controls do not affect CS (e.g., Borrego et al., 2016), or can even lead to an increase in CS (e.g., the addition of head tracking in the study of Schuemie et al., 2005, led to an increase in SSQ scores), although this may be related to the small sensory mismatches introduced by imperfect tracking conditions in these cases. It appears likely that presence and comfort are both reduced when interacting with a virtual environment in a manner that is unfamiliar in terms of sensorimotor control. As such, results of experiments that manipulate the control scheme in VR may tend to suggest a negative relationship between CS and presence due to the inverse effect of sensorimotor familiarity on each factor.

# Navigation Control

The capability of action within a virtual environment has frequently been linked to the feeling of presence in VR (e.g., Sanchez-Vives and Slater, 2005; Slater, 2009), and in line with this idea there is evidence that controlling one's own locomotion in a virtual landscape increases presence (Stanney et al., 2002; Clemente et al., 2014). There is also a long history of research documenting the nauseating effects of being moved passively in VR and driving simulators (Rolnick and Lubow, 1991; Stanney and Hash, 1998; Sharples et al., 2008; Dong et al., 2011). Predictive movement control allows a user to compare estimated and obtained sensory data in a feedforward control loop, which is thought to reduce the impact of decoupling efferent and afferent signals (Reason and Brand, 1975; Rolnick and Lubow, 1991).

As part of a study by Seay et al. (2002) (described above), the authors investigated the effect of being the driver or the passenger in a driving simulation. Enacting the role of the driver increased the sense of presence (PQ). At the same time, the magnitude of CS (SSQ) was higher for passengers than for drivers, as found in other research (Rolnick and Lubow, 1991; Stanney and Hash, 1998; Sharples et al., 2008; Dong et al., 2011).

Results of a recent study have indicated that navigation in a virtual landscape (i.e., locomotion using an Xbox 360 Gamepad) increases presence (SUS) compared to conditions where participants remained relatively stationary (head tracking and motion parallax, but no locomotion), but that sickness scores (SSQ) are unaffected by the same manipulation (Ibánez and Peinado, 2016). This suggests that presence increases when participants are permitted to freely explore an environment, even if the navigation method used is relatively unnatural (i.e., navigating with a gamepad). However, the manipulation used by the authors cannot discern whether other mediating variables might have played a role, such as optic flow or vection produced by locomotion.

# Context

Narrative is often used to provide context and framing to a VR application, and there is evidence that the inclusion of narrative impacts both presence and CS. The influence of a "preamble" on presence has recently been established (Smolentsev et al., 2017): When participants initiated a VR task in a digital environment similar to their own physical location, the sense of presence (single item scale) increased significantly compared to when the digital environment portrayed a different physical location. The authors stated that the benefit of a familiar environment on the sense of presence is achieved by establishing a physical continuity between the user's experience in the real environment and the virtual landscape.

The effect of context on presence has been frequently studied with respect to the mediating effect of anxiety on presence. There is thought to be a complex, potentially bi-directional relationship between presence and anxiety, with both being associated with general sympathetic nervous system activity (Rothbaum and Hodges, 1999; Krijn et al., 2004). Gorini et al. (2011) show a heightened sense of presence (SUS) if an anxiety-inducing narrative context was provided while the participant searched for an object in VR (i.e., the participant was being "chased" by a "murderous" person as they searched). A significant increase in presence (single item scale) also occurs if participants with a phobia are told that the virtual environment may contain a phobic stimulus (Bouchard et al., 2008). On the other hand, the use of a different measurement scale (PQ) has resulted in the opposite trend: Anxiety-inducing narrative context produced a reduction in total presence (Bouchard et al., 2008). Although the authors attributed this divergence in results to one or two items in the PQ dominating their results, this provides further evidence for a low level of reliability between common measures of presence. The effect of anxiety-inducing narrative on CS (SSQ) was also measured by Bouchard et al. (2008), with the authors finding no relationship between the two variables. A similar pattern of results was obtained by Robillard et al. (2003), who found a relationship between presence and anxiety (both single item scales) but no link between either factor and CS (SSQ), although CS scores were overall very low. The absence of a link between CS and anxiety is somewhat surprising, given that trait-anxiety measures partially determine the likelihood of motion sickness in the general population (Paillard et al., 2013), and as such future studies will need to investigate this link further.

# Sex

There has been considerable discussion about the effects of participant sex on presence ratings. Some have theorized that the degree to which men and women can suspend disbelief may vary (Slater and Usoh, 1994; Felnhofer et al., 2012), along with personality factors such as extraversion and submissiveness (Lombard and Ditton, 1997). Others have proposed that any sex effects on presence are likely due to the correlated differences in gaming experience between the sexes (Gamito et al., 2008, 2010). However, the empirical research is divided with respect to which sex demonstrates higher levels of presence. In an anxiety-inducing VR environment (a school examination), Gamito et al. (2008) reported evidence of a higher level of presence for women than for men (PQ realism), although the authors attributed this effect to the higher experience with video games among male participants. On the other hand, other studies have found that men report higher levels of presence than women in VR (Slater et al., 1998; Nicovich et al., 2005) and in non-VR video games (Lachlan and Krcmar, 2011). Felnhofer et al. (2012) documented evidence of a sex effect on presence ratings (IPQ spatial presence), with men rating themselves higher than women. Other research has found no difference between men and women with respect to spatial presence (De Leo et al., 2014).

Research into CS has long discussed the possibility of sex differences with respect to rates of susceptibility, although findings have proven inconclusive. In the work of Knight and Arns (2006), Gamito et al. (2008), and Ling et al. (2013), described elsewhere here, the authors failed to identify any difference in CS across the sexes. Conversely, studies by De Leo et al. (2014) and Jaeger and Mourant (2001) revealed that female participants were significantly more likely to experience CS symptoms than male participants. In a similar vein, Park et al. (2006) reported that non-dropout female participants exhibited more CS symptoms (SSQ nausea and oculomotor subscales) than male participants did in a non-immersive driving simulator. Häkkinen et al. (2002) also found CS ratings (SSQ) were lower for men than for women. Some have also suggested that the reason for the discordant findings on sex and CS may relate to hormonal changes across the menstrual cycle, resulting in a fluctuating relationship (Biocca, 1992; Clemes and Howarth, 2005).

# Gaming Experience

Some research has examined the influence of past experience with interactive games on factors such as presence and CS. Knight and Arns (2006) identified an inverse association between an individual's experience playing video games and the level of CS experienced. Various studies report no relationship between presence and gameplay experience (Schuemie et al., 2005; Alsina-Jurnet and Gutiérrez-Maldonado, 2010; Ling et al., 2013). Others have found minimal evidence for an effect of video game experience on presence ratings or CS. Gamito et al. (2010) experimentally manipulated gameplay experience using a training procedure, and reported that increased gaming experience leads to improved presence (PQ) whereas CS (SSQ) was unaffected by training. In another study that employed a similar task, Gamito et al. (2008) found no relationship between measures of presence (PQ, ITQ) or CS (SSQ) and the previous gameplay experience of participants. At the same time, the authors found that heart rate increased with increasing experience with video games. Those authors considered heart rate as a measure of anxiety, but we note that others have taken similar markers to indicate presence (e.g., Meehan et al., 2001, 2003; Wiederhold et al., 2001) and also CS (e.g., Kim et al., 2005; Dennison et al., 2016). Accordingly, one should be cautious when interpreting physiological markers purported to measure these factors given the relative paucity and inconsistency of data of this sort reported to date.

# Conclusion: Associated Variables

When taken together, the evidence obtained from the review above begins to clarify the type of relationship that exists between presence and CS:


The relationship between CS and presence can be understood if the associated variables described above are considered with respect to their effect on sensory mismatch. Immersiveness (sensory submersion) likely plays a key role here: Experimental manipulations that increase immersiveness tend to produce both CS and presence, because the compelling nature of stimuli in an immersive virtual space fosters a high perceptual weighting of cues to self-motion and spatial orientation, which enhances the impact of conflicts between expected and obtained sensory signals that are generated by the compelling stimulus (Prothero, 1998; Lombard et al., 2000; Prothero and Parker, 2003). Put differently, immersiveness enhances the magnitude of violated expectations. Thus, increasing field-of-view size, adding stereoscopy, or providing congruent multisensory information can increase both presence and CS. Given that immersiveness (which increases the weight of sensory conflicts, Prothero, 1998) can also lead to increased vection (which is inversely related to sensory conflicts, e.g., Weech and Troje, 2017), it is unsurprising that research on the vection-presence-CS relationship has concluded that the link is extremely complex (Keshavarz et al., 2015; Palmisano et al., 2017).

On the other hand, within immersive conditions, individuals who experience high presence tend to experience low levels of CS. This relationship may be a result of differences in individual sensitivity to sensory conflicts: Higher sensitivity will result in high CS and low presence, whereas low sensitivity to cue conflicts will lead to low levels of CS and a heightened sense of presence. Individual differences in sensitivity to sensory conflicts have been documented, and there is some limited evidence that these differences relate to CS and presence (Viaud-Delmon et al., 2000, 2006). The advantage in terms of presence and CS observed for "gamers" may be attributed to the process of sensory reweighting that occurs during continuous exposure to cue conflicts (see Reason and Brand, 1975; Weech et al., 2018a). Indeed, a sex effect on presence and CS possibly relates to the superior ability of men to adapt to sensory conflicts compared with women (Viaud-Delmon et al., 1998). Further research on these individual differences will be required in order to test the proposition that variance in sensory conflict sensitivity underlies the experience of both presence and CS.
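The two patterns described above (immersiveness raising both presence and CS across conditions, and individual sensitivity to sensory conflict driving them apart within a condition) can coexist in the same dataset. The toy simulation below is a sketch of that idea only. The linear model, the coefficients, the noise levels, and the function names are all hypothetical choices made for illustration, and they are not estimates drawn from any study reviewed here.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_per_condition=500, seed=1):
    """Toy model: immersiveness raises both scores; conflict sensitivity
    raises CS and lowers presence. All coefficients are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for immersive in (0, 1):  # 0 = desktop-like, 1 = HMD-like condition
        for _ in range(n_per_condition):
            sensitivity = rng.gauss(0, 1)
            presence = 3.0 * immersive - sensitivity + rng.gauss(0, 0.5)
            cs = 3.0 * immersive + sensitivity + rng.gauss(0, 0.5)
            rows.append((immersive, presence, cs))
    return rows


rows = simulate()
pooled = pearson([p for _, p, _ in rows], [c for _, _, c in rows])
within = {
    k: pearson([p for i, p, _ in rows if i == k],
               [c for i, _, c in rows if i == k])
    for k in (0, 1)
}
print("pooled across conditions:", round(pooled, 2))  # positive in this toy model
print("within each condition:", {k: round(v, 2) for k, v in within.items()})  # negative
```

In this sketch, pooling across display conditions yields a positive presence-CS correlation even though the correlation is negative within each condition, which is one way in which discordant findings across studies could arise.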

There is a significant challenge involved in identifying the factors that might mediate the link between presence and CS. Primarily this difficulty arises from the substantial relationships between the associated factors identified above, such as sex and gameplay experience, or vection and field-of-view. In addition to these known relationships, a number of other understudied variables may have significant associations with both presence and CS. For instance, the sense of embodiment is known to form a core aspect of presence (Kilteni et al., 2012), but the relationship between the embodiment of a virtual body and CS is currently unclear. Furthermore, questions remain about how prior experience with VR systems interacts with presence and CS; experiential factors are currently understudied due to the novelty of widely available VR technology. It is also evident that there is a lack of research that combines measurements of presence, CS, and other factors in high-powered studies. The collection of large datasets that encompass multiple individual factors (age, sex, personality type) with several behavioral response measures (objective and subjective measures of presence and CS) permits the use of a modeling approach that would enhance our understanding of the complex relationship between presence, CS, and other mediating factors (a similar approach was adopted for CS alone in Weech et al., 2018b). Through the execution of such studies in the future, further interactions will be uncovered between the associated factors outlined above.

Questions remain with regard to what questionnaires of presence and CS are truly measuring. Studies have identified relationships between CS and spatial presence, but no relationship between CS and immersive tendencies, which correlate highly with spatial presence (e.g., Ling et al., 2013). This raises the problem that an individual's tendency to report feeling presence or CS may not reflect their actual experience of either factor. This is an inherent issue in questionnaire studies; according to a meta-analysis, almost 50% of questionnaire studies documented effects of social desirability on their results (Van de Mortel, 2008). It is therefore important that future research takes into account the possibility that response bias modulates measures of factors like presence and CS. Several approaches could be adopted to achieve this, including pre-task questionnaires that assess social desirability (Crowne and Marlowe, 1960; Van de Mortel, 2008) or developing and using questionnaires according to principles of minimizing bias (Choi and Pak, 2005).

# GENERAL CONCLUSION AND FUTURE DIRECTIONS

Literature supports the idea that presence and CS are inversely related, and that the relationship is likely to be mediated by factors including vection, navigation control, and display factors. These factors can be unified in terms of their effect on sensory mismatch, which appears to drive presence and CS in opposite directions. This presents the possibility that interventions targeted at increasing presence could reduce CS, and vice versa. While the results obtained across studies are often discordant, with many sources reporting a positive relationship between presence and CS, these outcomes may be related to the fact that immersive displays are likely to generate both a compelling sense of "being there," as well as symptoms of physiological discomfort. Other noise sources that may contribute to variability in findings include problematic measurement techniques, or differences in the operational definition of the underlying factors among studies.

How can future experimentation best serve the advancement of our understanding of the presence-CS association? The issue of measurement validity must be a major focus of future studies, where the cross-validation of metrics should be undertaken in well-controlled paradigms. Improving the robustness of findings in this area may also require a careful consideration of factors that could play a role in response bias, such as social desirability. Although a limited number of high quality studies have collected large datasets related to human factors in VR, future experiments will need to combine these measures with a modeling approach that can help to interpret the structure of the relationship between these factors. Relatedly, there is a prevalent lack of statistical power in many of the studies reviewed here, and this limits the ability of the field to infer answers about variables that are so inherently noisy. Future studies will benefit from careful a priori considerations of effect sizes, which we have compiled here where available. An additional factor that may reduce the variability in findings across future studies is the natural emergence of standardized VR head-mounted hardware. Note, however, that the findings of studies using lower-immersive systems such as projection screens will still prove valuable, as these will help to identify the impact of immersiveness and vergence-accommodation mismatch on CS and presence. One particular gap in our understanding revealed by the current review is how presence is affected when sensory mismatch is experimentally manipulated. Given the prospective modulatory role of sensory mismatch in the association between presence and CS, future studies will need to overcome the challenges in manipulating and assessing sensory mismatch in empirical research.
Through a careful consideration of the literature critique provided here, we envisage that the next wave of studies on the presence-CS link will help to make major advances toward understanding this complex relationship.
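As an illustration of the kind of a priori consideration suggested above, the following minimal sketch estimates the sample size needed to detect a correlation of a given magnitude between presence and CS scores. It relies on the standard Fisher z approximation. The function name `required_n`, the two-tailed alpha of 0.05, and the power of 0.80 are illustrative assumptions, not values taken from the studies reviewed here.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation of size r.

    Uses the Fisher z approximation for a two-tailed test. The defaults
    for alpha and power are conventional choices, assumed for illustration.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)
```

Under these assumptions, `required_n(0.3)` returns 85, and smaller expected correlations require considerably larger samples.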

# AUTHOR CONTRIBUTIONS


SW, SK, and MB-C designed the concept of the article and conducted revisions to the manuscript. SW and SK conducted the literature search. SW wrote the initial draft of the article.

# FUNDING

This research was supported by grants from Oculus Research and the Natural Sciences and Engineering Research Council of Canada (NSERC) (Grant no. RGPIN-05435-2014) awarded to MB-C. The funding sources had no input in the preparation of the manuscript.






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Weech, Kenny and Barnett-Cowan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# How Can On-Road Hazard Perception and Anticipation Be Improved? Evidence From the Body

Mariaelena Tagliabue\*, Michela Sarlo and Evelyn Gianfranchi

Department of General Psychology, University of Padua, Padua, Italy

The present research is aimed at investigating processes associated with learning how to drive safely. We were particularly interested in implicit mechanisms related to the automatic processing system involved in decision making in risky situations (Slovic et al., 2007). The operation of this system is directly linked to experiential and emotional reactions and can be monitored by measuring psychophysiological variables, such as skin conductance responses (SCRs). We focused specifically on the generalization of previously acquired skills to new and never before encountered road scenarios. To that end, we compared the SCRs of two groups of participants engaged, respectively, in two distinctive modes of moped-riding training. The active group proceeded actively, via moped, through several simulated courses, whereas the passive group watched video of the courses performed by the former group and identified hazards. Results indicate that the active group not only demonstrated improved performance in the second session, which involved the same simulated courses, but also showed generalization to new scenes in the third session. Moreover, SCRs to risky scenes, although present in both groups, were detectable in a higher proportion in the active group, paralleling the degree of risk confronted as the training progressed. Finally, the anticipatory ability demonstrated previously (and replicated in the present study), which was evident in the repeated performance of a given scenario, did not seem to generalize to the new scenarios confronted in the last session.

Keywords: hazard perception, moped-riding simulator, learning generalization, implicit learning, skin conductance response

#### Edited by:

Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy

#### Reviewed by:

Andrea Kleinsmith, University of Maryland, Baltimore County, United States

Anna Granà, University of Palermo, Italy

\*Correspondence: Mariaelena Tagliabue mariaelena.tagliabue@unipd.it

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 21 August 2018 Accepted: 17 January 2019 Published: 01 February 2019

#### Citation:

Tagliabue M, Sarlo M and Gianfranchi E (2019) How Can On-Road Hazard Perception and Anticipation Be Improved? Evidence From the Body. Front. Psychol. 10:167. doi: 10.3389/fpsyg.2019.00167

# INTRODUCTION

In the field of road safety research, several studies have gathered indirect and/or direct evidence that supports the idea of the crucial role played by hazard perception in predicting crash likelihood (Horswill, 2016a). As Horswill (2016b) noted, conducting hazard perception research is not feasible in actual on-road situations, as exposing humans to hazards and the consequent potential dangers for the purpose of research raises ethical and other related issues. For this reason, considerable efforts have been devoted to creating adequate tools for measuring this skill in safe contexts.

Two principal methods are currently employed to improve hazard perception among learners (i.e., unlicensed drivers) and novice drivers: one involves watching video clips during which the learners must identify onscreen hazards and the other relies on engaging learners in virtual driving experiences via simulators that administer a variety of hazardous scenarios. The first method is a component of traditional licensing programs in such countries as England and Australia, and the second method has been a compulsory part of Japan's licensing program for years (Haworth et al., 2005).

Recently, studies aimed at investigating the efficacy of these training methods have demonstrated a positive correlation between a learner's increased engagement in the task, the efficacy of the training, and the likelihood that the improved ability will translate to actual on-road performance; the degree of engagement is itself increased through the requirements conveyed by instructions and the quality of the feedback delivered (Horswill, 2016b; Horswill et al., 2017). Torres et al. (2017) have provided a coherent account of their finding that contingent negative feedback (i.e., losing license points after unsafe decisions in risky scenarios), delivered in response to decisions about whether or not to brake after the presentation of static on-road scenes, led to faster and safer decisions. Moreover, Megías et al. (2017) showed that performance in a moped-riding simulator became safer (in terms of the number of accidents, average speed, average time exceeding the speed limit) among participants trained via a feedback learning task that delivered emotional feedback (i.e., pictures of real accidents) with negative valence. Specifically, the learning task consisted of deciding whether it was appropriate to brake, given a set of traffic images. Participants who received negative emotional feedback in 50% of the trials in which they decided not to brake upon being presented with a risky scenario (i.e., risky decision-making behavior) demonstrated safer behavior in the subsequent moped-riding simulator test.

In this regard, one important consideration is that when learning involves simulation driving, feedback is embedded within the task; to wit, the driver directly experiences the consequences of risky decisions in the form of the dangers incurred (i.e., crashes or near misses). The use of simulators intrinsically involves a degree of uncertainty with regard to the extent to which such results can be generalized to an on-road context (de Winter et al., 2012; Rosenbloom and Eldror, 2014), given that definitive proof would require prolonged monitoring of very large samples and that the likelihood of crash occurrence is relatively low throughout the general population (although regarded as too high to satisfy safety standard requirements). Nonetheless, some indirect evidence already collected indicates that driving styles recorded via simulator resemble, to some extent, the corresponding on-road driving styles (Goode et al., 2013; Meuleners and Fraser, 2015; Branzi et al., 2017). In particular, the behavioral changes toward safer driving behaviors (i.e., reduction of the probability of incurring a crash) observed during simulated driving tasks as training progresses are comparable to on-road behaviors, and learning acquired via simulated driving persists over time (Vidotto et al., 2011, 2015).

Moreover, the advantage of using driving simulators to deliver feedback resembling real-life consequences is intrinsically linked to the interactive nature of the virtual environment in the sense that the world changes in response to the driver's behavior (de Winter et al., 2012). Thus, we can assume that quality, degree of contingency, and valence of feedback are crucial factors contributing to the improvement of hazard perception. Consequently, simulated driving should be regarded as more effective at promoting a defensive driving style than passive forms of training because the feedback provided to the trainee is coherent and directly linked to his or her behavior. Further, it can be hypothesized that simulators are potentially more engaging from an emotional perspective, as the first-hand experience of a virtual accident might, in principle, be recognized as more emotionally intense than the simple imposition of a virtual penalty. Moreover, experiencing a car crash for which the driver is personally responsible might produce a more vivid experience than watching images of a crash that is caused by and affects other people.

The aforementioned dynamic characteristic of simulations is thereby related to another advantage yielded by this kind of technology. We refer to the extent to which simulations involve the potential exposure of the driver to hazards (de Winter et al., 2012). First, the simulated nature of this exposure limits the ethical issues that would otherwise plague such research. Consequently, it enables the reproduction of road conditions characterized by the highest statistical probability of accidents occurring, and in which the hazardous element develops "spontaneously" and directly via the specific way in which each driver behaves in each condition. Thus, the use of driving/riding simulators ensures both the standardization of experimental (or training) conditions and the modulation of the hazard degree, pursuant to the specific driving style of each learner.

Another feature that has emerged as a critically important variable is the emotional involvement associated with participation in hazard perception ability training. Currently, a consensus has been reached about the claim that human beings respond to risks in ways that are often not supported by rational rules. According to Slovic et al. (2007), decisions in risky situations are made via an automatic processing system that relies on emotion and experience, which is mostly irrational and more rapid than the controlled processing system.

The mechanism to which Slovic et al. (2007) explicitly referred is the one accounted for by Bechara et al. (1994) and Damasio (1994). By comparing the performances of healthy individuals and ventromedial prefrontal patients in a decision-making task [i.e., the Iowa Gambling Task (IGT)], Bechara et al. (1994) observed that healthy participants develop, over the course of the task, an anticipatory ability that prevents them from choosing from decks that, in previous trials, had yielded losses that outweighed their gains. This occurs via psychophysiological activation [measured through skin conductance responses (SCRs)] that functions as a signal to alert participants that they are approaching a deck of cards that they have previously experienced as disadvantageous. This ability cannot develop in patients with damage to the ventromedial prefrontal cortex who, consequently, show smaller gains (if not explicitly greater losses) by the conclusion of the task. Interestingly, decision-making ability, as measured by the IGT, seems to interact with other personality variables that have demonstrably correlated with dangerous driving styles (Gianfranchi et al., 2017a); to wit, it has been shown that IGT performance and sensation-seeking behavior interact in contributing to simulated motorcycle riding style (Gianfranchi et al., 2017b).

Within this framework, a reasonable amount of evidence has been collected on the psychophysiological correlates of hazard perception and reactions in the context of driving. For instance, in one of the first studies, Helander (1978) showed that, in situations that required the use of brakes, participants demonstrated electrodermal responses that were interpreted as deriving from mental activity preceding muscular contraction and, consequently, brake activation.

In a more recent study, Kinnear et al. (2013) demonstrated that, when participants were shown video clips of hazardous road scenarios and asked to identify oncoming dangers, learners and novice drivers exhibited a smaller proportion of anticipatory SCRs than experienced drivers. Moreover, focusing on novice drivers, Tagliabue and Sarlo (2015) showed that employing a driving simulator ensures greater emotional involvement in the task, relative to traditional methods that consist exclusively of exposure to video clips (passive training), as demonstrated by a larger proportion of SCRs. Further, the administration of the same road-simulated scenarios twice, the first a week before the second, has been proven to lead to earlier SCRs (Tagliabue et al., 2017), suggesting improved anticipatory ability, in line with the somatic marker hypothesis (Damasio, 1994): When people encounter the same situation that previously led to an emotional reaction, they experience a reactivation of the same emotions, via their "bodily reactions," which provide them with an advance signal that alerts them to the oncoming risk.

Given the importance and potential implications of such results for driver training programs, it is worth considering two crucial issues arising from the aforementioned evidence. The first concerns the effectiveness of passive training in improving the anticipatory recognition of hazards. The second deals with the need to understand whether or not this anticipation generalizes to new and different scenarios. Both issues have important implications for the design of future programs for learners and novice drivers.

# THE STUDY

The present work is part of a project aimed at investigating the processes involved in hazard perception improvement induced by the Honda Riding Trainer (HRT), a moped simulator. The HRT has proven to be a useful tool for promoting a defensive riding style among novice teenaged riders (Vidotto et al., 2011) by training the attentional skills (Tagliabue et al., 2013) that presumably underlie successful hazard detection; further, the demonstrated improvement in riding style persists for over a year (Vidotto et al., 2015). The HRT has been used to investigate a variety of the processes that underpin driving abilities, such as mental workload during driving, cognitive resources needed to respond to in-vehicle warning systems, and patterns of gaze exploration (Di Stasi et al., 2009, 2010, 2011). Starting from these results, we wanted to investigate the processes underlying hazard perception that could feasibly account for the improvements observed.

As noted above, earlier studies indicated that participants who undertook active training in the form of simulated moped-riding courses demonstrated a greater SCR percentage than participants who engaged in passive training by identifying hazards on these same courses upon request. Further, by comparing the SCRs of active participants who tackled the same courses in the HRT in two sessions (1 week apart), it was shown that SCR onset occurred earlier during the second session, indicating that the first experience negotiating the hazardous scenarios enabled the riders to recognize these now-familiar hazards some 3 m of (virtual) road earlier (Tagliabue and Sarlo, 2015; Tagliabue et al., 2017).

Three central questions remain open, and these revolve around whether: (a) the anticipatory response does, in fact, generalize to new hazardous conditions; (b) the improvement in anticipatory ability related to hazard detection and reaction differs on the basis of whether the training is active or is passive; and (c) the overall profile of the performance, besides the probability that a crash will occur, is impacted by the active training involving the simulator or the passive training consisting of viewing video clips of hazardous scenarios.

Concerning the third point, in the above-mentioned studies that employed the HRT, improvement in performance was essentially measured by calculating the percentage of accidents. However, another crucial aspect of road safety involves learning to drive in a way that facilitates averting hazard development. To wit, learners may demonstrate a lower percentage of crashes either because they have learned to execute certain last-minute maneuvers designed to avoid impending collisions or because they have learned to drive in a generally safer way. This issue can be addressed by analyzing the level of potential hazards that develop during active training. Indeed, the HRT simulator facilitates such an analysis by providing a score for each potentially hazardous scene based on the degree of hazard development: when the rider behaves in a way that is totally safe, the scene receives a score of A; when the rider relies on mildly unsafe maneuvers (e.g., moderate violations of the speed limit, slightly sudden braking, stopping with insufficient headway), the scene receives a B score; when the rider executes seriously unsafe maneuvers (e.g., strong sudden braking that results in a concrete risk of losing control of the vehicle, stopping dangerously close to the preceding vehicle), the scene is assigned a score of C, generally reflecting conditions that resemble near-misses; finally, a score D corresponds to scenes in which an accident actually occurs.

By analyzing how the scores collected during training reflect a trend of behavioral change toward generally safer behavior, it may be possible to acquire further information on the mechanisms underlying the learning process.

To that end, in the present work, participants in the active group of the study conducted by Tagliabue et al. (2017) were assigned to complete a third training session consisting of the administration of six new courses to investigate issues related to the generalization of learning (aim a). A new group of matched participants (passive group) was recruited, and they engaged in training by watching the scenes attempted by the active group, and identifying hazards (aim b).

Both behavioral and SCR data were collected to measure how participants learned to detect and anticipate hazards. We formulated three hypotheses, as follows:


# Methods

### Participants

Thirty-eight undergraduates at the University of Padua were included in the present study. Data from one male participant of the control group were excluded from analyses due to electrode malfunction during skin conductance recording in the third session. Consequently, the data from the matched participant of the experimental group were also eliminated. Thus, the final sample included 36 undergraduates (18 females and 18 males; mean age 19.47 years; range 18–24 years). The experimental group was the same as used in the study conducted by Tagliabue et al. (2017), with the inclusion of three new participants to attain the sample size of 18 participants. The other 18 participants, assigned to the control (passive) group, were all new to this set of studies. All participants were novice drivers/riders; they held a driver's license for no longer than 3 years (range 5–36 months; mean 9.8 months). Nine students had a riding license, but reported road exposures under 5,000 km. The experimental and control groups were balanced for age (mean age = 19.88 and 19.05 years, respectively) and gender (nine males and nine females in each group).

The procedure included three experimental sessions. The task assigned to the experimental group was to ride a motorcycle simulator along some virtual courses. Participants in the control (passive) group were asked to detect hazards while they watched videos of the experimental (active) participants riding the simulators through the virtual courses.

All participants had normal or corrected-to-normal vision. They were paid €39 for their participation. The study was approved by the Ethical Committee for the Psychological Research of the University of Padua.

### Apparatus and Stimuli

The HRT is a two-wheeled-vehicle riding simulator that consists of a Pentium 4 PC with a Windows XP operating system (see **Figure 1** for an image of the HRT and examples of risky scenes).

The PC is placed on a base connected to an LCD monitor (1,024 × 768 resolution) displaying the virtual environment and to a chassis equipped with moped-like controls. For our procedure, a second screen was placed on a table behind participants who were seated on a moped-like seat approximately 80 cm from the HRT monitor. Two speakers, in addition to reproducing the acoustic effects of the engine and traffic noise, provided instructions to the active participants in the experimental group about the path they had to follow. Participants rode along the virtual courses using the moped-like controls, with a transmission set to automatic.

The same five courses, which took place on secondary roads, were employed in the first two sessions, while six courses, which took place on main roads, were administered in the last session. Each course included seven or eight risky scenes (depending on the course), for a total of 39 scenes in each of the first two sessions and 48 in the last one. The scenes represented hazardous road situations based on a European report classifying the most common motorcycle accident scenarios (Motorcycle Accidents In Depth Study [MAIDS], 2004).

### Skin Conductance Recording

Skin conductance was recorded with two Ag/AgCl electrodes filled with a K-Y lubricating jelly placed on the left foot, over the abductor hallucis muscle, in a position adjacent to the sole of the foot and midway between the base of the hallucis and a point beneath the ankle. This electrode placement conforms to the international guidelines (Boucsein et al., 2012) indicated in situations where participants need to use their hands for the task itself.

A Grass Model SCA1 skin conductance coupler provided a 0.5-V constant voltage across electrodes. The signal was amplified and filtered with a 10-Hz low-pass filter using a Grass CP122 AC/DC Strain Gage amplifier (Grass Instrument Co., W. Warwick, RI, United States). The amplifier, placed adjacent to the second screen (which reproduced the output of the HRT monitor), returned a display of the ongoing values of the electrodermal activity. A video camera was employed to simultaneously record the electrodermal activity (the values on the amplifier display) and the riding performance (second monitor).

FIGURE 1 | The HRT simulator used in the present study (top-left panel) and three examples of risky scenes.

### Procedure

For both groups, the entire procedure lasted for three sessions scheduled 1 week apart. Each session lasted approximately 45–60 min. At the beginning of the first session, after signing informed consent forms, each participant completed a questionnaire involving data related to their age and their driving and riding experience. Then, electrodes for skin conductance recording were attached.

With regard to the experimental group, the participants were instructed on how to use the HRT controls. Their task consisted of riding along the virtual courses, following the audibly vocalized advice, respecting the traffic rules, and avoiding crashes. The instructions also explained the importance of trying to avoid moving their feet during the task. A practice course (3 min in length, with no other road users in the virtual environment) was administered to enable participants to familiarize themselves with both the virtual environment and the HRT controls. All participants in the experimental group attempted the same five courses in the first two sessions, confronting a total of 39 potentially hazardous scenes in each session. In the last session, six new courses were administered, with a total of 48 scenes. Overall, each participant faced 126 hazardous scenes. As in the previous studies (Tagliabue and Sarlo, 2015; Tagliabue et al., 2017), the sequence of the courses was fixed for each session, proceeding from the easiest to the most difficult. The degree of difficulty was derived from the studies conducted by Miceli et al. (2008) and Settanni (2008).

The control group was asked to watch a simulation of some courses (five in the first two sessions and six in the third one) undertaken by an anonymous HRT rider and identify hazards. Whenever a passive participant detected a hazard, he or she pressed a button on the handlebar of the simulator. This detection task had the purpose of ensuring constant attention was paid to the video to avert the possibility of distraction. Given the purpose of the task, there was no need to record the responses. Each participant in the control group, matched to a same-gender participant in the experimental group, watched the replay of the performance of his or her paired participant in the corresponding sessions. At the end of each course, a 3-min rest was assigned to both the experimental and the control group to allow skin conductance to return to the baseline level.

### Data Reduction and Coding

The coding procedure was based on the videos recorded by the camera (in which the electrodermal values displayed by the amplifier and the performance on the HRT were synchronized) and on the .csv files, provided by the simulator, that collected all the variables linked to the riding performance, with a sampling rate of 30 Hz. The electrodermal activity values were coded at this same sampling rate via analysis of the videos obtained for each participant in each session. In this way, each point of the physiological signal was matched with the behavioral variables provided in the .csv files. Among these variables, the HRT also provided an evaluation of participants' riding safety for each scene. Possible scores ranged from A to D, depending on the distance between rider and collision, with A signifying that the participant's behavior was safe enough to prevent any collisions, B indicating a slight increase in the risk of crash, C corresponding to a near-miss, and D representing an actual crash. Thus, each performed scene received a score that represented its particular level of risk, depending on the rider's behavior.
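The per-scene scores lend themselves to a simple summary of how risk was distributed within a session. The following minimal sketch, which assumes that the scores of one participant in one session are available as a list of letters, computes the percentage of scenes receiving each score; the percentage of D scores corresponds to the percentage of accidents. The function name `score_percentages` and the input layout are hypothetical and do not describe the HRT output format.

```python
from collections import Counter

SCORES = ("A", "B", "C", "D")  # A = safe ... D = actual crash


def score_percentages(scene_scores):
    """Return the percentage of scenes that received each score (A to D).

    scene_scores: list of letters, one per performed scene (assumed layout).
    """
    if not scene_scores:
        raise ValueError("at least one scored scene is required")
    counts = Counter(scene_scores)
    total = len(scene_scores)
    return {score: 100.0 * counts[score] / total for score in SCORES}
```

For example, a participant who obtained two A scores, one B, and one D across four scenes would show 50% A, 25% B, 0% C, and 25% D, that is, an accident percentage of 25%.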

As in previous works (Tagliabue and Sarlo, 2015; Tagliabue et al., 2017), for each of the hazardous scenes, we identified a clue (i.e., a point on the path, in terms of x/y coordinates, after which hazard development began). As a result, we focused on two temporal windows: a baseline pre-clue window of 5 s and a 10-s post-clue window in which the hazard developed. Note that each participant was provided with the same clues; this held for both the experimental and the control group.

As the participants in the experimental group could influence the development of the hazardous scenes via their riding behavior, the detection of the SCRs' onset was not feasible in terms of timing. For instance, with a lower riding speed, approaching the hazard required more time. Therefore, the SCRs' onset might appear to happen later because of a difference in the time taken to approach the hazard. As such, on the basis of the methodology employed in previous works (Tagliabue and Sarlo, 2015; Tagliabue et al., 2017), the mean level of the electrodermal activity in the baseline window was computed. Then, an SCR was detected as the first increase in amplitude of at least 0.05 µmho (Boucsein et al., 2012; Kinnear et al., 2013; Tagliabue and Sarlo, 2015; Tagliabue et al., 2017) in the post-clue window, with respect to the baseline. The time of SCR onset was then converted into its corresponding position on the path, in terms of the absolute value of x or y coordinates, depending on the dimension along which the participant was moving. Note that each individual change in the coordinates' value corresponds to a change of approximately 1 m in the virtual environment. The onset was therefore computed in terms of spatial distance from the hazard: an anticipation in the SCR onset corresponded to a greater distance from the hazard. The coding procedure was the same for both groups in every session. Indeed, each participant in the control group watched the performance of his or her matched experimental participant, thereby being exposed to the same scenes, with the same x/y coordinates.
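The detection rule described above can be sketched in a few lines. The example below assumes that the electrodermal signal and the rider's coordinate along the direction of travel are available as equally long lists sampled at 30 Hz, and that the sample index of the clue and the coordinate of the hazard are known. The function name, the argument names, and `hazard_position` in particular are hypothetical choices made for illustration; this is not the coding procedure actually implemented by the authors.

```python
from statistics import mean

SAMPLING_HZ = 30   # sampling rate reported for the simulator files
BASELINE_S = 5     # pre-clue baseline window (s)
POST_CLUE_S = 10   # post-clue window (s)
THRESHOLD = 0.05   # minimum increase over baseline (micromho)


def scr_onset_distance(conductance, position, clue_index, hazard_position):
    """Return the distance between SCR onset and hazard, or None if no SCR.

    conductance and position are per-sample lists; position holds the
    coordinate (x or y) along which the rider is moving (assumed layout).
    """
    start = clue_index - BASELINE_S * SAMPLING_HZ
    if start < 0:
        raise ValueError("not enough samples for the baseline window")
    baseline = mean(conductance[start:clue_index])
    end = min(len(conductance), clue_index + POST_CLUE_S * SAMPLING_HZ)
    for i in range(clue_index, end):
        # First sample exceeding the baseline by the threshold marks the onset.
        if conductance[i] - baseline >= THRESHOLD:
            # One coordinate unit corresponds to roughly 1 m of virtual road.
            return abs(hazard_position - position[i])
    return None
```

Expressing the onset as a distance rather than a latency means that an earlier (anticipatory) response yields a larger value regardless of riding speed.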

# Design

To investigate the effectiveness of the three-session training, we conducted an ANOVA on the dependent variable percentage of accidents for the participants who actively rode along the virtual courses (active group), with Session as a within-participants factor (three levels).

Moreover, a deeper analysis of the participants' performance was conducted via a MANOVA on the percentage of scores (A, B, C, or D) attributed to each performed scene on the basis of the degree of risk generated by the participants' behavior. Again, Session was the within-participants three-level factor.

To investigate whether psychophysiological responses paralleled improvements in riding performance, and to compare psychophysiological reactivity of the active group to that of the passive group, two ANOVAs were carried out on the dependent variable percentage of SCRs. The first was conducted on the overall SCR percentages (independently of the degree of risk of the scenes) and aimed at comparing the present results with those previously obtained by Tagliabue et al. (2017), with Group (active vs. passive) as a between-participants factor and Session (three levels) as a within-participants factor.

In addition, a further ANOVA with Group as a two-level between-participants factor and Session (three levels) and Risk degree (four levels: A, B, C, and D) as the two within-participants factors was run to provide a fine-grained analysis of the pattern of psychophysiological changes with reference to the risk degree.

Skin conductance response percentages were calculated for each kind of scene, depending on the degree of risk, as the proportion of SCRs detected over the total number of scenes in each risk degree category. In courses in which one kind of scene (A, B, C, or D) did not occur, depending on the performance of the active-group participant, the missing data were replaced by the mean for the same risk category. This happened for one pair of participants for A scenes and five pairs of participants for D scenes.
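As an illustration of this computation, the following sketch derives per-category SCR percentages and fills empty cells. It assumes that "the mean for the same risk category" is the mean of the observed values of the other participants in that category; this reading, the function names, and the data layout are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

RISK_LEVELS = ("A", "B", "C", "D")


def scr_percentages(
    counts: Dict[str, Tuple[int, int]]
) -> Dict[str, Optional[float]]:
    """counts maps a risk level to (scenes_with_scr, total_scenes).
    A level with no scenes yields None (a missing cell)."""
    result: Dict[str, Optional[float]] = {}
    for level in RISK_LEVELS:
        with_scr, total = counts.get(level, (0, 0))
        result[level] = 100.0 * with_scr / total if total else None
    return result


def impute_with_category_mean(
    rows: List[Dict[str, Optional[float]]]
) -> List[Dict[str, Optional[float]]]:
    """Replace None cells with the mean of the observed values
    in the same risk category (assumed interpretation)."""
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for level in RISK_LEVELS:
        observed = [r[level] for r in rows if r[level] is not None]
        if not observed:
            continue  # nothing to impute from
        mean = sum(observed) / len(observed)
        for r in filled:
            if r[level] is None:
                r[level] = mean
    return filled
```

For example, a participant who never produced a D scene in a course would have no D percentage; after imputation that cell holds the mean D percentage of the participants who did.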

Last, to test the impact of the three-session training on the anticipatory ability, we analyzed the dependent variable SCR onset via an ANOVA with Group as a between-participants factor (active vs. passive) and Session (three levels) and Risk degree (four levels: A, B, C, and D) as within-participants factors.

Post hoc analyses using Bonferroni's correction were conducted with α set at 0.05.
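For readers unfamiliar with the correction, a minimal sketch of Bonferroni-adjusted pairwise comparisons follows. The p-values in the usage note are hypothetical and are not results from this study; the function name is likewise illustrative.

```python
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Multiply each p-value by the number of comparisons (capped at 1.0)
    and flag those that remain below alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]
```

With three sessions there are three pairwise comparisons, so hypothetical raw p-values of 0.01, 0.02, and 0.04 would be tested against an effective threshold of 0.05/3, and only the first would remain significant.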

# Results

In the ANOVA on the percentage of accidents of the active group, the factor Session attained significance with F(2,34) = 107.46, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.86). Participants had 28% of accidents in the first session, 16% in the second session, and 4% in the third session. Post hoc tests showed that all comparisons were significant, thereby confirming that performance improved, both in the second session (which administered the same courses as the first session), and in the third session, when participants had to confront new courses.
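The reported effect size can be cross-checked from the F statistic and its degrees of freedom using the standard identity for partial eta squared, η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The sketch below is a reader's check under that identity, not part of the original analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Session effect on accident percentage: F(2,34) = 107.46
print(round(partial_eta_squared(107.46, 2, 34), 2))
```

Small discrepancies between a recomputed and a reported value can arise from rounding of the published F statistic or from corrections applied in the original software.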

In the MANOVA on the percentage of scene scores, the factor Session attained significance at the multivariate level, Wilks' λ = 0.027, F(6,12) = 71.64, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.97). At the univariate level, the factor Session was significant for each score, with F(2,34) = 99.85, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.86) for the A score, F(2,34) = 5.01, p < 0.05 (η<sub>p</sub><sup>2</sup> = 0.23) for the B score, F(2,34) = 37.26, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.69) for the C score, and F(2,34) = 103.08, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.86) for the D score. The post hoc tests revealed that the percentage of A scores (i.e., totally safe scenes/performance) significantly increased between the first two sessions (15 vs. 26%) and from the second to the third session (47%). The percentage of D scores (i.e., accidents) decreased (28, 16, and 4% for the three sessions, respectively). With regard to B and C scores, no differences were found in the first two sessions, but in the third session, the percentage of both scores significantly differed (see **Figure 2**).

These results testify to the acquisition of learning as the training progressed. In the first session, participants drove in a way that generated a given degree of risk, as evidenced by the lower percentage of A scores relative to B and, even more clearly, to C and D scores that, it is worth noting, were attributable to scenes characterized by the development of a large amount of risk, or even the occurrence of an accident. The improvement in performance in the second session, which required participants to confront the same scenes, is demonstrated by a decrease in crashes. It can reasonably be considered that a certain number of scenes that had previously received a D score received a C or B score for the second session: the gradual modification of performance toward a notably safer level might be the reason why B and C scores appear to remain stable (as D scores become C scores, and C scores become B scores). Finally, this gradual improvement resulted in an unambiguous increase in the proportion of safe performances, as indicated by the enhanced percentage of A (safe performance) and B (low-risk performance) scores for the third session, in which participants faced six new (never before encountered) scenes. The effect of learning was even more apparent, given the significant concomitant reduction of C and D scores.

In the first of the ANOVAs on the SCR percentages, the factor Group and the factor Session attained significance with F(1,34) = 17.60, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.34), and F(2,68) = 7.47, p < 0.01 (η<sub>p</sub><sup>2</sup> = 0.18), respectively. Participants from the active group exhibited a higher percentage of SCRs, relative to the passive group (52 vs. 27%), and the SCR percentages decreased from 47% for the first session to 39% for the second session and, finally, to 34% for the third session. The post hoc tests indicated that the SCR percentage for the first session was significantly higher than the percentages for the last two sessions.

In the second ANOVA on the SCR percentages, the factors Group and Risk degree attained significance with F(1,34) = 20.01, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.37) and F(3,102) = 53.22, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.61), respectively. In addition to the significant difference between the two groups, SCR percentages increased as the scenes became increasingly risky (32% in A scenes, 35% in B scenes, 46% in C scenes, and 60% in D scenes). The post hoc tests revealed that the SCR percentages in A and B scenes (safe or low-risk level) did not differ, but both were significantly different from SCR percentages in C and D scenes, which were characterized by a greater degree of risk development or the occurrence of an accident. The SCR percentages in C and D scenes were also significantly different from one another.

Moreover, the Group × Risk degree and the Session × Risk degree interactions attained significance with F(3,102) = 6.01, p < 0.01 (η<sub>p</sub><sup>2</sup> = 0.15) and F(6,204) = 2.85, p < 0.05 (η<sub>p</sub><sup>2</sup> = 0.08), respectively. Concerning the Group × Risk degree interaction, as is evident in **Figure 3**, the post hoc test confirmed the significant difference between groups in each scene category. Moreover, in the active group, the SCR percentages increased in both C and D scenes, indicating that the psychophysiological reactivity paralleled the analogous rise in the degree of risk. By contrast, in the passive group, a significant increase in SCR percentage was present only in cases of accident occurrence (relative to all other risk degrees).

As far as the Session × Risk degree interaction is concerned (see **Figure 4**), in the first session, A and B scenes elicited similar SCR percentages but were significantly different from C and D scenes, whereas SCR percentages of C and D scenes did not differ. In the second and third sessions, SCR percentages were higher in C than in B scenes and higher in D than in C scenes, as attested by the post hoc tests. Overall, this interaction suggests a better ability to discriminate the different degrees of risk as the training progresses.

In the last ANOVA on the dependent variable SCR onset, the only main source of variance that reached significance was the factor Session, with F(2,68) = 11.94, p < 0.001 (η<sub>p</sub><sup>2</sup> = 0.26). None of the interactions reached significance. The post hoc tests showed that the onset in the second session (10 m) was different from the SCR onset in both the first (14 m) and third (16 m) sessions. The SCR onset of the first and third sessions (new scenes, never seen before) did not differ significantly. In other words, the anticipatory ability developed in the second session, when the same scenes had to be faced (as previously demonstrated in Tagliabue et al., 2017), disappeared in the third session when new, potentially risky scenes were performed or viewed by participants.

# Discussion

First, the results of the present paper confirm previous results that show performance improvement as training progresses. The fact that accident percentage decreases in the second session might be due to contextual learning, as the same course was administered in the first two sessions. However, the generalization of the learning acquired, at least at the behavioral level, has been demonstrated by the additional improvement recorded in the third session, in which six new courses were confronted. This improvement is substantiated not only by the reduction in accidents, but also by the overall improvement in performance. Participants who actively rode the moped simulator demonstrated safer behavior in the third session, as the scenes in which they rode dangerously or incurred an accident decreased.

From a psychophysiological perspective, in accordance with previous reasoning, SCRs to potentially approaching hazards should indicate an implicit mechanism responsible for risk detection, in line with the dual processing system articulated by Slovic et al. (2007). Thus, one might wonder why SCR percentages should decrease, given that as the training progresses participants should have learned to react to hazards in a more effective way. Note that a reduction in SCR percentages between the first and second sessions was already observed in Tagliabue et al. (2017). The authors explained this result considering that, as the training develops, participants learn to drive more safely, yielding a reduction in the number of near misses and accidents. In this case, there would be less need for the implicit system to react; thus, SCR percentages should decrease. In the present work, a third session was added to further test the hypothesis that the improvement derived from a safer riding style leads to a reduction of the number of hazards to detect and, consequently, of the amount of psychophysiological responses. The present data favor this explanation in that, in the third session (in which C and D scenes, i.e., near misses and accidents, decreased), the SCR percentage was lower than in the first session.

Moreover, these data replicate the results indicating greater emotional involvement in the active group, which showed higher SCR percentages than the passive group, thereby confirming the results of a study conducted by Tagliabue and Sarlo (2015) using a totally new sample.

The fine-grained analysis of the changes in SCR percentages based on the degree of risk involved in the scenes indicates that differences between the two groups are further evident in the modulation of psychophysiological reactivity as a function of the risk degree (see **Figure 3**): unlike in the active group, SCR percentages in the passive group increased only when an accident occurred, which may indicate a failure by the passive group to discriminate among different levels of hazard. This is a particularly compelling result as, while driving, the correct "categorization" of the risk degree might facilitate the selection of the most appropriate response.

Moreover, the results illustrated in **Figure 4** indicate that accurate discrimination between no- or low-risk scenes, demonstrated by the psychophysiological reactivity, emerged in the second session and was maintained through the third session via transferring acquired learning to the new scenes, thereby providing partial evidence of generalization, at least in the simulated environment.

By contrast, with regard to anticipatory ability, our data did not confirm the generalization of this ability to scenes not yet encountered. Albeit negative, this result, if confirmed, provides important information about the requirements for road safety training, as it highlights the importance of extensive training aimed at exposing novice drivers and riders to as many different hazards as possible under the safer conditions of training (simulator or video clip viewing) to increase the probability that they will recognize such hazards in advance once they are actually on the road.

To summarize, active group participants showed performance improvement, and their learning seemed to generalize to new scenes, as they behaved more safely in both the second and third sessions. Moreover, they learned to discriminate among different degrees of risk via their implicit system (Slovic et al., 2007) and to generalize this achievement to the third session. By contrast, the anticipatory ability, in terms of SCR onset, did not appear to generalize to scenes not previously encountered. Results related to the passive group confirm previous results demonstrating that passive training involves the implicit system of hazard detection and recognition to a lesser extent.

# CONCLUSION

The main findings of the present research consist of demonstrating the following: (a) the experience of adverse consequence in simulated road scenarios yields an improvement in the ability to recognize the risk earlier when it is faced anew; (b) the psychophysiological correlates of this ability indicate that simulation is more effective than passive viewing of risky video clips; and (c) this anticipatory skill develops pursuant to prior experience, as predicted by the dual model of decision making (Slovic et al., 2007) and the somatic marker hypothesis of Damasio (1994).

The present results indicate a need to persist in attempts to develop training modules that permit exposure to the largest possible sample of road hazards before future motorcyclists actually take to the road, to render them more capable of safely confronting each potential road risk. In this light, every effort to collect statistics, to map the greatest possible variations of the dynamics associated with the most common circumstances in which crashes occur, and to replicate such dynamics via simulators must be firmly supported.

As noted, the hazard scenarios employed in the present research are drawn from the Motorcycle Accidents In Depth Study (MAIDS, 2004) accident statistics, which include 921 situations representing a large number of recorded motorcycle crashes across Europe. One potentially promising extension of this line of research could be the creation, for each of these prototypical conditions, of a certain number of variants (similar scenes with small differences), to induce learners and novice drivers to generalize, as much as possible, their acquired learning to new and different, but similar, scenes. To wit, the development of learning programs that enhance the probability of transferring the same anticipatory reactions to scenes not yet experienced by promoting generalization to several broader clusters of hazards could represent a further step toward the prevention of road accidents.

Given that experiencing an accident firsthand (albeit virtually) is more emotionally involving than simply viewing someone else's accident, the greater impact of simulation relative to video clip viewing is in line with the evidence provided by studies in the field of affect heuristics that indicate that the more vivid and affect-laden the scenarios, the more effective they are in influencing risk perception (Slovic et al., 2007).

The principal limitation of the present research is related to the generalizability of the observed effects to real on-road conditions. The use of simulators for driving/riding assessment and training is spreading in several countries. However, the benefits and disadvantages of this strategy remain controversial. Rosenbloom and Eldror (2014), for instance, did not find an overall improvement in on-road performance of newly licensed drivers trained with a driving simulator compared to novice drivers who received only the standard driving lessons. In fact, the former group showed a worsening in safe driving intention, probably due to overconfidence, and a slight reduction of headway events not associated with a reduction of their severity. The same group also showed an increase in the amount of brake pressure, which is interpreted by the authors as reflecting a less safe driving style. Goode et al. (2013) argued in favor of a certain amount of effectiveness associated with this technology when it is aimed at refining higher order cognitive skills, such as visual scanning and hazard perception; de Winter et al. (2012) cited the advantages and disadvantages of simulators and stressed the need for deeper investigation. More recently, evidence of correspondence between simulator and on-road driving parameters was reported by Branzi et al. (2017) by comparing driving speeds in on-road and simulated driving.

Despite the fact that the advantages and disadvantages of employing simulators are matters of ongoing debate (de Winter et al., 2012; Rosenbloom and Eldror, 2014), any reasonable enterprise aimed at improving the abilities of road users to prevent the development of risky situations should not be overlooked, especially as the available data seem to show that, over the last 5-year period, the goal of the EU, in terms of reducing the number of road deaths, appears far from being achieved (Adminaite et al., 2018). The present data suggest that, even though generalization from previously experienced road scenes is evident in the explicit behavior observed during the training of road users with limited experience, implicit learning requires prior exposure to each specific scenario. As such, any attempt to monitor and map conditions in which accidents occur and expose novice road users to the largest possible sample of such risky scenarios, so as to improve training programs, should be strongly encouraged.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of guidelines for psychological research of the Associazione Italiana Psicologia (AIP) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Ethical Committee for the Psychological Research of the University of Padua.

# AUTHOR CONTRIBUTIONS

MT supervised data collection and contributed to statistical analyses and manuscript writing. EG conducted data collection, statistical analyses, and manuscript writing. MT, EG, and MS contributed to research planning, discussion of the results, and revision of the paper.

# FUNDING

This research was supported by a grant FINA n. TAGL\_FINA18\_01 "Meccanismi sottostanti all'apprendimento alla guida sicura e riduzione dell'incidentalità su strada," from the Department of General Psychology to MT.

# ACKNOWLEDGMENTS

The authors thank I. M. Cordis for helping in data coding and S. Bellomo for helping in data collection and coding. The present work was carried out within the scope of the research program "Dipartimenti di Eccellenza," which is supported by a grant from MIUR to the Department of General Psychology, University of Padua.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Tagliabue, Sarlo and Gianfranchi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Authenticity, Interactivity, and Collaboration in VR Learning Games

Meredith M. Thompson\*, Annie Wang, Dan Roy and Eric Klopfer

Comparative Media Studies and Writing, Massachusetts Institute of Technology, Cambridge, MA, United States

Decreasing cost and increasing technology access in schools place 3D immersive virtual reality (VR) within the reach of K-12 classrooms (Korbey, 2017). Educators have great interest in incorporating VR into classrooms because VR experiences are engaging and often novel. However, long-term curriculum development must focus on how best to leverage the unique affordances of VR, be informed by theory and research, and integrate VR in meaningful ways that continue to motivate students even after the experiences are no longer novel. We propose the theoretical framework of embodied learning and reflect on current research findings to outline effective applications of VR and provide guidelines for developing educational materials using those tools. We discuss two particular examples: spatial awareness and collaboration. We share our perspectives on the benefits and challenges of applying these principles in a learning game about cellular biology.

#### Edited by:

Maria V. Sanchez-Vives, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Spain

#### Reviewed by:

Yiorgos L. Chrysanthou, University of Cyprus, Cyprus Regis Kopper, Duke University, United States

> \*Correspondence: Meredith M. Thompson meredith@mit.edu

#### Specialty section:

This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI

Received: 14 November 2017 Accepted: 30 November 2018 Published: 19 December 2018

#### Citation:

Thompson MM, Wang A, Roy D and Klopfer E (2018) Authenticity, Interactivity, and Collaboration in VR Learning Games. Front. Robot. AI 5:133. doi: 10.3389/frobt.2018.00133

Keywords: immersive virtual reality, stem education, game based learning, embodied learning, K12 education, collaboration

# BACKGROUND

VR has the potential to broaden the reach of the traditional classroom by addressing limitations of K-12 classroom environments. VR simulations that engage learners as explorers shift the focus from content acquisition to active inquiry (Hew and Cheung, 2010; Merchant et al., 2012; Ahn et al., 2017). Now that these technologies are within reach of classrooms and lecture halls, research needs to move beyond simply asking whether VR can help bolster learning, and consider how best to use these tools in educational contexts (Dalgarno et al., 2011). In doing so, we not only imagine the types of problems that immersive 3D can solve for K-12 educators but consider the larger question of how to craft learning experiences for students that effectively move between and utilize two dimensional, three dimensional, and immersive 3D visualizations.

Our labs have developed a number of learning simulations and games, and are currently developing a game to introduce students to cellular biology. Through this process, we have gained perspective on the benefits and challenges of using VR in creating authentic, interactive, and collaborative experiences that help students learn about the complex topic of cellular biology. First, we explore how VR can be helpful in creating authentic representations in biology. Then, we discuss current understandings of the theoretical frameworks of embodied learning and collaborative learning. Finally, we discuss how we have applied these two perspectives through a collaborative, cross-platform educational game named Cellverse.


# Authenticity: Cell Biology as a Context for Virtual Reality

Cells and the central dogma are two critical topics in biology standards and curricular materials (National Research Council, 2013). Despite the importance of these concepts, visuals of cells are often oversimplified in introductory resources (Shi et al., 2010; Tibell and Rundgren, 2010) and misunderstood by students and educators alike (Çeliker, 2013; Vlaardingerbroek et al., 2014). Incorporating 3D visualization such as immersive VR into biology curricula may be a way to improve student learning. Previous research has shown a notable positive correlation between the use of visual models and student scores: just a few class sessions of exposure to a tangible model can result in significant score improvement in beginner biology students (Höst et al., 2013), and high-quality animations of cellular processes improve scores and long-term memory retention among students (McClean et al., 2005). Although these and other studies have noted a positive correlation between visualization and student learning, there are still challenges to be addressed. Skeptics argue that too much visual information can lead to "cognitive overload" and thus endanger learning, although they too acknowledge that there is definite potential in visualization (Tversky et al., 2002). In fact, what can help visual data become truly effective is "interactivity": the ability for a user to stop, start, replay, and manipulate visuals at his or her own pace. VR is an excellent platform for designing interactive and manipulatable environments.

# Interactivity: Embodied Learning in VR

VR technologies can engage learners both cognitively and physically through immersive and interactive experiences. The theory of embodied learning posits that connecting learning events and physical actions creates a stronger impact on the individual (Kiefer and Trumpp, 2012). VR technologies can be responsive to the participants' movements in a way that activates the learners' perception of themselves as a tool for developing understanding (Stolz, 2015).

VR simulations are already widely used to develop physical skills with instruments, as a flight simulator does for a pilot's aviation skills or a surgical simulator for a doctor's surgical technique (Slater and Sanchez-Vives, 2016). VR can also help learners practice laboratory skills during virtual laboratories (Chiu et al., 2015; Lindgren et al., 2016; Jang et al., 2017). More recently, VR has been used by scientists for honing their skills in preparing molecular compounds for microscopy (Leinen et al., 2015) and for envisioning how to modify molecules to develop new pharmaceutical drugs (Cheng et al., 2012; Yuan et al., 2017; Liu et al., 2018). Scientists share computer-based visualizations with the scientific community online, drawing upon 3D models of proteins, molecules, and molecular reactions through online resources such as the Protein Data Bank and PyMOL (Mwalongo et al., 2016; Yuan et al., 2017). VR has enhanced the process of drug discovery by enabling scientists to investigate molecular structure and function and prompted the development of mixed reality software platforms such as Molecular Rift (Yuan et al., 2017) and RealityConvert (Borrel and Fourches, 2017). These applications of VR for science can be useful in K-12 contexts by enabling learners to create embodied analogies for abstract concepts through gesture and movement (Weisberg and Newcombe, 2017). For example, a VR simulation offered higher levels of understanding and retention among high school students learning cellular biology in comparison to traditional 2D models (Tan and Waugh, 2014).

Spatial understanding is related to understanding relative size and scale, a topic many students find challenging (Jones et al., 2003). Size and scale are important to understand in science, technology, engineering, and mathematics (STEM) domains (Weisberg and Newcombe, 2017). While individuals have varying degrees of spatial understanding (Coxon et al., 2016), spatial awareness can be improved (Uttal and Cohen, 2012). Activities such as creating 3D representations of geometric shapes (Burte et al., 2017) and gesturing while solving spatial problems (Chu and Kita, 2011) can enhance individuals' understanding. Spatial awareness is linked to perception of size and scale, which is also important in STEM topics (Jones et al., 2008). Similar to spatial awareness, understanding of size and scale can be enhanced through direct experience with objects and with distances between objects (Jones et al., 2008) and through using a body as a comparison point (Jones et al., 2009). VR has already been useful as a research tool in understanding spatial awareness (Wilson, 2013), and shows promise in developing spatial skills. VR can provide learners with virtual experience with objects and prompt learners to gesture during problem solving; both activities have the potential to improve spatial understanding and users' perception of size and scale.

Problems that require perspective taking and understanding structure are well suited to VR. Virtual environments can help users develop "spatial presence," a perception of the overall VR environment and the relationships between the objects within that environment (Wirth et al., 2007). The level of embodiment achievable in VR is directly related to the level of interactivity between the user and the virtual space. 360-degree videos and virtual field trips are already being used in classrooms to help students learn geography and cultural awareness due to the lower cost of the equipment; however, the user has limited ability to interact with the experience (Brown and Green, 2016; Korbey, 2017; Minocha et al., 2017). Interactive simulations and virtual laboratories have helped students understand electrostatics and forces in physics (Salzman et al., 1999; Pirker et al., 2013), and mathematics (Mizell et al., 2002; Guerrero et al., 2015). Laboratories and simulations require more resources to design than virtual field trips, but the additional interaction supports a deeper level of embodied learning (Potkonjak et al., 2016; Jang et al., 2017).

# Collaborative: Learning in VR

The movement from room-scale CAVE Automatic Virtual Environments (CAVEs) to head-mounted displays (HMDs) has decreased the cost of VR, yet these technologies have focused heavily on the individual's experience (Hew and Cheung, 2010; Slater and Sanchez-Vives, 2016). As technology and connectivity improve, VR will include collaboration between individuals in HMDs, requiring a new understanding of how technology can enable new forms of communication between individuals (Gugenheimer et al., 2017). Designers must balance the user's attention to their own experience and explore how to create a sense of shared presence, or co-presence, in the virtual world (Campos-Castillo, 2012).

Principles of collaborative learning such as interdependence, thoughtful formation of groups, individual accountability, and attention to social skill development are also useful considerations in VR environments (Cuseo, 1997; Lee, 2009). Activities that require individuals to work together in order to accomplish goals create what Johnson and Johnson (1989) call "positive interdependence" among team members; the structure of the activity necessitates a joint effort. Since virtual environments are still relatively novel, both rules and roles can be useful in structuring collaboration. Jensen and Konradsen (2018) used games as a way to create rules for social interaction and roles for individuals in virtual problem-based activities. Roles also helped visitors engage with a VR museum exhibit experience on an aircraft carrier (Zhou et al., 2016). Middle school students in the EvoRoom VR environment benefited from clear roles in gathering and sharing information with their peers (Lui and Slotta, 2014).

In addition to clear roles, a range of expertise helps foster interdependence in virtual teams (Weber and Kim, 2015). One way to establish roles is to structure distributed teamwork through roles and access to different forms of technology and information (text based, 2D, 3D, VR) that must be synthesized to solve a problem. This redistribution can create power dynamics within the group. In comparing virtual to in-person problem solving among teams of people using 2D, 3D, and VR interfaces, Slater et al. (2000; see also Slater and Sanchez-Vives, 2016) found that the individual in VR was more likely to emerge as the leader, even if that same person did not take on a leadership role in the in-person project. Spante et al. (2006) also studied puzzle solving across VR and 2D systems; they found that team members assumed both had the same view until they traded places. Having different viewpoints enhanced collaboration, creating what Spante et al. (2006) termed "the good inequality". Gugenheimer et al. (2017) created a system where individuals in HMDs could interact with individuals outside of VR through a "FaceDisplay," a touch screen interface. Teamwork can be reinforced by structuring environments to provide team members with complementary information and different views of information; furthermore, students also gain appreciation of how different forms of media may be more appropriate for understanding certain concepts.

# AN EXAMPLE IN PROGRESS CELLVERSE—A COLLABORATIVE LEARNING ENVIRONMENT IN VIRTUAL REALITY (CLEVR)

We now apply some of these ideas about embodiment and collaborative learning to a game-based learning project currently under development, Cellverse. In Cellverse, students learn about cells and the process of converting DNA to proteins through an interactive problem-based game. Working in small teams of two or three, students examine a living cell from within. The Explorer wears a head-mounted display and moves through the cell in VR to observe function and structure, as shown in **Figure 1**. The Navigator uses a tablet-based toolkit of disease descriptions, stains, tags, and measurement devices to gather data and focus the visualizations using a table, as shown in **Figure 2**. The experience is being designed so that data are distributed between players in a way that requires students to communicate to solve the puzzle together.

A central question is—why use VR? Virtual reality allows students to experience the cellular environment as an active explorer, rather than a passive observer. It also gives students an appreciation for the density of the cell, the size and scale of organelles relative to each other and to other molecules in the cell, spatial relationships between the organelles, and the cellular structure. Structure and spatial orientation are both important in understanding the central dogma, when DNA is first transcribed to mRNA and translated by tRNA into long amino acid chains that become proteins.

We draw connections between the effective practices we have found in the literature using the affordances of VR and our intentions for this project.

# Authentic

We situate student learning in the context of biology, both in the game narrative and the game environment. We have built a cellular environment that matches current research on cells, with ongoing input and feedback from cellular biologists and other cell biology experts. Whenever possible, we have incorporated tools and activities that scientists would use as in-game functions. For example, students can highlight specific organelles and structures within the virtual environment using simultaneous label-free autofluorescence-multiharmonic (SLAM) microscopy (You et al., 2018). Cells are densely packed, which is challenging to render and can be overwhelming to users. We continually balance how to represent the cell most authentically while maintaining presence within the experience and minimizing cognitive load.

# Interactive

The game-based format provides a high degree of interaction between students and the concepts included in the VR environment, which has been linked to deeper learning (Lindgren et al., 2016; Jang et al., 2017). The game-based format also provides ongoing feedback to the players, which further assists the learning process (Merchant et al., 2012). Through the game, we aim to transform a topic that is often passive and vocabulary-based into an active, embodied experience.

We also incorporate aspects of biology within the game narrative and the game environment. We are building a cellular environment that matches current research on cells, with ongoing input and feedback from cellular biologists. Students will learn what a cell biologist might do by using tools such as (SLAM) microscopy (You et al., 2018).

# Collaborative

We are building interdependence among team members into the design of the project by creating rules, establishing roles, and distributing resources between players. Rules are established before students take on roles in the game, either as explorers or navigators. The explorer will see the 3D VR view of the world and will complete tasks that involve understanding spatial relationships between organelles, identifying protein structures, and tracking processes within the cell. The navigator has access to information on 2D and 3D flat screen models to help guide the explorer and to work with other team members as they identify organelles, proteins, and even DNA and RNA sequences that could provide helpful clues in the game. We plan to build different levels in the game to allow students to rotate through team roles.

Collaborative activities can be enhanced through different modalities. We implemented a number of functions that allow users to communicate with each other across platforms, including but not limited to "light beacons" that can be placed within the virtual environment and functions such as SLAM microscopy that reflect how real-world scientists mark organelles. These functions not only allow users to communicate with each other non-verbally, but also enhance their collaborative experience and create embodied learning within the virtual environment.

# CHALLENGES

We have confronted a number of challenges while building and implementing Cellverse. As Cellverse is a complex environment with many moving parts, users risk becoming nauseated if there is too much activity, or not enough processing power to render the activity in real time. A high frame rate is thus vital for a smooth VR experience; too much detail or too many objects within the virtual world can reduce the frame rate and cause nausea (Jerald, 2015). We have had to trade authenticity against playability, reducing the detail of certain structures in order to maintain a comfortable frame rate.
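The frame-rate constraint can be made concrete with a little arithmetic. The sketch below is illustrative only: the linear cost model, the refresh rates and the per-object render costs are assumed values chosen for the example, not measurements from Cellverse, and `frame_budget_ms` and `max_objects` are hypothetical helper names.

```python
def frame_budget_ms(target_hz: float) -> float:
    """Time available to render a single frame, in milliseconds."""
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return 1000.0 / target_hz


def max_objects(target_hz: float, fixed_cost_ms: float, cost_per_object_ms: float) -> int:
    """Largest object count whose estimated render time fits in the frame budget.

    Assumes a simple linear cost model (an assumption for illustration):
        render time = fixed_cost_ms + n * cost_per_object_ms
    """
    if cost_per_object_ms <= 0:
        raise ValueError("cost_per_object_ms must be positive")
    remaining = frame_budget_ms(target_hz) - fixed_cost_ms
    # If the fixed cost already exceeds the budget, no objects fit.
    return max(0, int(remaining // cost_per_object_ms))


if __name__ == "__main__":
    # Hypothetical numbers: a 90 Hz headset, 3 ms of fixed per-frame work,
    # and 0.02 ms per rendered structure.
    print(frame_budget_ms(90.0))
    print(max_objects(90.0, 3.0, 0.02))
```

Under this simplified model, halving the per-object cost (for example by simplifying a structure's geometry) roughly doubles the number of structures that can be shown at the same frame rate, which is the kind of trade-off between authenticity and playability described above.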

Creating a balanced flow of information between the two players has been challenging. Effective and worthwhile collaboration happens when each player is equally involved and is able to fill in whatever information their partner does not possess. We have explored different ways to foster collaboration through distribution of information resources to the players.

While our goal is to create an authentic environment, scientific understanding is continually advancing. We have had to make choices about the specificity of our cellular model and the number of processes we can represent in a realistic design timeframe. We have also noted that in an ever-evolving field like microbiology, maintaining authenticity in educational material remains a challenge. New discoveries about cells and cell structure are made continually, and it is in our best interest to make Cellverse as accurate to these discoveries as possible. However, this sometimes means that we have to change aspects of the game that may not be immediately noticeable to student players. Although they may not be consciously aware of these changes, it is our belief that making the Cellverse environment authentic will allow students to come away with a more well-rounded understanding of cells.

# FEASIBILITY

We are also attuned to how CLEVR could be integrated into curricula and implemented in classrooms. Our partner teachers have confirmed that the cell and central dogma are important topics in introductory biology. Teachers are helping us imagine how to incorporate VR technology in a feasible way in today's classrooms, and also provide insight into design features of the game that can help the game run smoothly. For example, while students may be excited to try the 3D VR experience, having all students in the class in headsets simultaneously may be a challenge. Conversely, some students may not want to wear headsets, or may be absent from school during the activities, so the activity should be designed so that team members can take on different roles and responsibilities if not all members are there.

Despite these opportunities, the cost of developing quality educational materials remains relatively high. Although the cost of VR has decreased over the years (Korbey, 2017), investing in VR requires significant resources. The labor involved in creating immersive, interactive, and accurate educational VR material is also great. It is therefore necessary to capitalize on all possible affordances of VR and to allocate resources carefully so that more individuals can participate, perhaps at once, and benefit from the experience in the long term.

# CONCLUSION

Now that VR technology is within reach of educational settings, learning designers and educators can focus on how best to incorporate VR into educational contexts. In this article, we discuss and provide an example of how VR experiences can represent authentic contexts, focus on embodied experiences, and how to structure the environment to foster teamwork and collaboration by having participants view and synthesize different types of data across immersive and non-immersive formats. These parameters can be used to develop effective and engaging learning environments based on our current understanding of VR in education. When moving forward, researchers, developers, and educators should investigate how each of these factors can be fashioned to optimize learning, identify affordances and challenges that may emerge as VR becomes more widespread, and incorporate findings and feedback into future development.

# ETHICS STATEMENT

This study was carried out in accordance with the Common Rule, 45 CFR pt. 46, with informed written consent from all subjects. All subjects gave written consent in accordance with the Declaration of Helsinki. The protocol was approved by the Committee on the use of Humans as Experimental Subjects (COUHES) at MIT (#10707095354).

# AUTHOR CONTRIBUTIONS

MT did the planning for the paper, read and synthesized literature notes, and wrote the paper. AW read and compiled paper notes, wrote the section on students' understanding of cells, and helped review drafts of the manuscript. DR helped review the paper from a game designer perspective and edited the paper. EK advised MT as she formulated the paper idea, helped imagine the direction for the paper, and reviewed and edited the paper multiple times, providing in-depth feedback.

# FUNDING

The project was funded by a grant from Oculus Education.

# REFERENCES

Geometry. University College London, 1–7. Available online at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.5391&rep=rep1&type=pdf


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Thompson, Wang, Roy and Klopfer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Can Simulator Sickness Be Avoided? A Review on Temporal Aspects of Simulator Sickness

#### Natalia Dużmańska<sup>1</sup>\*, Paweł Strojny<sup>1,2</sup> and Agnieszka Strojny<sup>1,2</sup>

<sup>1</sup> R&D Unit, Nano Games sp. z o.o., Kraków, Poland, <sup>2</sup> Institute of Applied Psychology, Faculty of Management and Social Communication, Jagiellonian University, Kraków, Poland

Simulator sickness is a syndrome similar to motion sickness, often experienced during simulator or other virtual reality (VR) exposure. Many theories have been developed or adapted from motion sickness studies in order to explain the existence of the syndrome. Simulator sickness can be measured using both subjective and objective methods. The most popular self-report method is the Simulator Sickness Questionnaire. Attempts have also been made to discover a physiological indicator of the described syndrome, but no definite conclusion has been reached on this issue. In the present paper, three temporal aspects of simulator sickness are discussed: the temporal trajectory of the progression of simulator sickness, the possibility of adapting VR users in advance and the persistence of the symptoms after VR exposure. Evidence found in 39 articles is described in detail. As for the first aspect, it is clear that in most cases the severity of simulator sickness symptoms increases with time of exposure, although it is impossible to develop a single, universal pattern for this effect. It has also been shown that in some cases a threshold level or time point exists, after which the symptoms stop increasing or begin to decrease. The adaptation effect was demonstrated in most of the reviewed studies and observed in different study designs, e.g., with a couple of VR exposures on separate days or on 1 day and with a single, prolonged VR exposure. As for the persistence of simulator sickness symptoms after leaving the VR, on the whole the study results suggest that such an effect exists, but it varies strongly between individual studies: the symptoms may persist for a short period of time (10 min) or a relatively long one (even 4 h). Considering the conclusions reached in the paper, it is important to bear in mind that virtual reality technology still evokes unpleasant sensations in its users and that these sensations should be cautiously controlled while developing new VR tools. Certainly, more research on this topic is necessary.

Keywords: simulator sickness, temporal aspects, time, virtual reality, VR

#### Edited by:

Albert Rizzo, University of Southern California, United States

#### Reviewed by:

Eugene Nalivaiko, University of Newcastle, Australia; Inmaculada Remolar Remolar, Universitat Jaume I, Spain

\*Correspondence: Natalia Dużmańska nduzmanska@nano-games.com

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 28 June 2018 Accepted: 16 October 2018 Published: 06 November 2018

#### Citation:

Dużmańska N, Strojny P and Strojny A (2018) Can Simulator Sickness Be Avoided? A Review on Temporal Aspects of Simulator Sickness. Front. Psychol. 9:2132. doi: 10.3389/fpsyg.2018.02132

# INTRODUCTION

# Virtual Reality – A Definition and the Most Commonly Used Devices

The simplest definition of virtual reality states that it is "the use of computer-generated virtual environments and the associated hardware to provide the user with the illusion of physical presence within that environment" (Jayaram et al., 1997, p. 576). Virtual reality systems are widely used in the fields of scientific research (e.g., Anderson-Hanley et al., 2011), anxiety disorders therapy (e.g., Gerardi et al., 2010; Łukowska, 2011) or for professional training [e.g., in the army – (Braithwaite and Braithwaite, 1990); fire department – (Bliss et al., 1997); aviation – (Kennedy et al., 2000); medicine – (Bric et al., 2016)].

Many different virtual reality hardware systems and devices have been developed over the years and will be briefly described herein. Nowadays, the most popular are the head-mounted devices (HMDs), such as HTC Vive or Oculus Rift. The VR user wears a headset and holds two controllers which enable them to move and interact in a three-dimensional environment. Such devices are now being sold commercially. According to a recent Business Insider report (Hollander, 2018), there are four main VR headset types: stand-alone (which do not need any additional hardware to function), smartphone-powered, PC-powered and game console-powered. The report predicts that the stand-alone headsets will grow in popularity in the coming years. This could be an advantage for research employing VR technology, as eliminating the wire which connects the headset to a PC or a console will make conducting experiments with multiple participants at the same time much easier.

Another example of a VR system is a CAVE (cave automatic virtual environment). In such a system, the environment is generated and displayed by several projectors directed at the walls of the room, and the user wears 3D glasses.

Different additional devices are used in order to provide the VR user with a realistic, multisensory experience. For example, treadmills are often used to simulate movement in the virtual environment (e.g., Jaeger and Mourant, 2001; Sinitski et al., 2018). For driving and flight simulators, a part of a plane cockpit or a body of a car may be used (e.g., Feenstra et al., 2011; Domeyer et al., 2013; Reinhard et al., 2017).

# Definition of Simulator Sickness

Simulator sickness is a syndrome similar to motion sickness and can be experienced as a side effect during and after exposure to different virtual reality environments. Originally, the term "simulator sickness" was linked to effects induced by simulators consisting of a platform, often mobile, with the visual stimuli generated by a computer, without head-tracking. The invention of HMDs led to the development of another term, "cybersickness," as such devices give rise to other issues, such as the delay between actual head movements and the generated image, which may also lead to unpleasant symptoms. However, nowadays both terms are used by researchers to describe the unpleasant symptoms evoked by virtual reality technology (e.g., Sharples et al., 2008; Bruck and Watters, 2011; Serge and Moss, 2015; Lee et al., 2017).

The symptomatology and severity of the malaise depend on many variables, e.g., age, gender, stress, anxiety, one's individual proneness to such an ailment or the characteristics of the simulator itself (Kolasinski, 1995; Cobb et al., 1999; Mourant and Thattacherry, 2000; Jaeger and Mourant, 2001; Lin et al., 2002; Sharples et al., 2008; Brooks et al., 2010; Bruck and Watters, 2011; Classen et al., 2011; Moss and Muth, 2011; Zużewicz et al., 2011; Biernacki and Dziuda, 2012; Dziuda et al., 2014; Helland et al., 2016; Lee et al., 2017). Lin et al. (2002) have also suggested that the enjoyment one experiences during simulator training may alleviate simulator sickness symptoms. A very detailed list of variables that may influence the occurrence and severity of simulator sickness can be found in the report by Kolasinski (1995).

The main aims of this paper are to summarize the existing knowledge on simulator sickness with emphasis on its temporal aspects, to provide an overview of research on this topic and to propose further research directions and practical implications for virtual reality developers.

Firstly, the most common theories which could serve as an explanation of the simulator sickness phenomenon will be discussed. Secondly, the methods of simulator sickness measurement, both subjective and objective, will be described in detail. Thirdly, three temporal aspects of simulator sickness will be discussed based on evidence found in empirical studies. And lastly, general conclusions drawn from the reviewed studies and practical implications for further research will be provided.

# Theories Potentially Explaining Simulator Sickness

Several theories have been developed to explain why individuals suffer from motion sickness. According to authors focused on virtual simulators, they may also be applicable in the field of simulator sickness during exposure to virtual reality (Brooks et al., 2010). The Sensory Conflict Theory, proposed by Reason and Brand (1975), explains motion and simulator sickness through a conflict that arises between different sensory systems: namely, the signals from visual, vestibular and non-vestibular proprioceptors differ from one another and inevitably differ from expectations based on previous experience. According to the theory, only the conflict between present sensory information and that retained from the immediate past elicits sickness. This claim rests on the observation that continuous exposure to a stimulus results in the eventual disappearance of symptoms (adaptation) even if the present conflict still exists (Reason, 1978). The vestibular system, which is responsible for perception and detection of direction, is crucial for the occurrence of simulator and motion sickness symptoms (Reason and Brand, 1975).

Reason (1978) proposed the Neural Mismatch Model, which locates the source of simulator sickness in discrepancies between expectations derived from present movements and the contents of a neural store which, according to Reason (1978), contains information about typical combinations of command signals (efference) and the integrated patterns of inputs from the orientation senses generated by them (reafference). That is the theoretical mechanism of adaptation to motion sickness observed, for example, by Reason and Brand (1975). To conclude, according to this model, sickness occurs when the received sensory information does not match one's expectations based on past situations.

Another theory, widely used to explain simulator and motion sickness, is the Postural Instability Theory. Riccio and Stoffregen (1991) have criticized the Sensory Conflict Theory: they state that sensory conflicts such as those described by Reason and Brand (1975) happen very often and are nothing unusual. Furthermore, the difference (or lack of it) between what one's senses experience and what an individual expects to feel is immeasurable. They have proposed that the symptoms of motion or simulator sickness may be experienced when one has been exposed to long-lasting postural instability and has not yet learned how to adjust to this situation and maintain proper balance. The most vivid example of such a phenomenon is the feeling of instability one experiences when traveling by ship. A similar situation occurs during rollercoaster rides as well (Riccio and Stoffregen, 1991).

The two aforementioned theories are most prevalent in the literature concerning simulator sickness. Other theoretical approaches to this phenomenon have been developed as well. The Eye Movement Theory, developed by Ebenholtz (1992, 2001), uses vagus nerve stimulation as an explanation for motion and simulator sickness. The mechanism is initiated by two specific eye movements (namely the optokinetic nystagmus and vestibular ocular response<sup>1</sup>) creating tension in the muscles of the eye, which stimulates the vagus nerve and leads to unpleasant symptoms such as difficulty concentrating, eye strain and headaches.

Bruck and Watters (2011) have also attempted to develop a comprehensive theory of cybersickness. They suggest the following chain of causality: an increase in arousal leads to changes in respiration rate, which causes carbon dioxide levels in cerebral blood flow to decrease. These changes lead to the symptoms of simulator sickness: dizziness, fatigue, difficulty concentrating, fullness of head and anxiety. The authors propose dividing the simulator sickness symptoms into four factors:


The Evolutionary Theory, proposed by Treisman (1977), originally explains the motion sickness, but its assumptions can be adapted to simulator conditions as well. Treisman (1977) suggests that people experience motion sickness, because – evolutionally – our species has not managed to adapt to new transportation modes yet. Therefore, the human body reacts to sensory conflicts with nausea – it acts as if poison had been ingested (Brooks et al., 2010). It can be assumed that similar reasons may stand behind the simulator sickness symptoms, as the human species had even less time to adapt to the virtual reality conditions. Although this theory does not propose any physiological mechanisms that may be responsible for experiencing simulator sickness, it can give a valuable insight on reasons why such ailment exists.

# Measurements of Simulator Sickness

### Self-Report Measures

#### **Simulator Sickness Questionnaire (SSQ)**

Originally published by Kennedy et al. (1993), the Simulator Sickness Questionnaire (SSQ) is a tool widely used for assessing the subjective severity of simulator sickness symptoms. In the pre-experiment part of the questionnaire, information about the current physical condition and participant's experience with simulators is collected. The questionnaire consists of 16 items, derived from the Pensacola Motion Sickness Questionnaire (MSQ). Data collected during previous simulator studies using the MSQ was gathered and the items describing symptoms with less than 1% frequency of appearance or with no change in frequency between pre- and post-exposure were excluded from further analyses (12 of 28 items of MSQ). The severity of each symptom in the SSQ is measured on a four-point scale (0-3).

According to the results of a factor analysis, the items of the SSQ can be grouped into three factors: nausea (e.g., sweating, difficulty concentrating, stomach awareness), oculomotor disturbance (e.g., headache, eyestrain, blurred vision) and disorientation (e.g., fullness of head, dizziness with open and closed eyes, vertigo). The factors are not entirely independent: some of the items are included in more than one factor, e.g., the score on difficulty focusing is used to assess the severity of both oculomotor disturbance and disorientation. In total, there are five such items. To calculate the score on each factor, the scores of all relevant items should be added (each factor consists of 7 items) and the obtained sum multiplied by a specific weight: for nausea by 9.54 (therefore the scores on this scale range from 0 to 200.34), for disorientation by 13.92 (scores ranging from 0 to 292.32) and for oculomotor disturbance by 7.58 (with scores ranging from 0 to 159.18).

The overall score can be calculated as well and it can serve as an indicator of the total severity of simulator sickness. It is calculated by adding scores on the 16 items and multiplying the obtained sum by 3.74, therefore the total score can range from 0 to 179.52. In addition to the quantitative data, qualitative information about peculiar sensations during the simulator experience and symptoms other than those listed in the main part of the questionnaire can be gathered (Kennedy et al., 1993; Biernacki et al., 2016).
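The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the weights and the total-score rule are taken from the description in the text, whereas the item names and the item-to-factor assignment are assumptions based on the commonly reproduced layout of the questionnaire and should be checked against Kennedy et al. (1993) before any real use. The names `ssq_scores`, `NAUSEA`, `OCULOMOTOR`, `DISORIENTATION` and `ALL_ITEMS` are hypothetical.

```python
# Item-to-factor assignment: ASSUMED from the commonly reproduced SSQ layout;
# verify against the original questionnaire (Kennedy et al., 1993).
NAUSEA = (
    "general_discomfort", "increased_salivation", "sweating", "nausea",
    "difficulty_concentrating", "stomach_awareness", "burping",
)
OCULOMOTOR = (
    "general_discomfort", "fatigue", "headache", "eyestrain",
    "difficulty_focusing", "difficulty_concentrating", "blurred_vision",
)
DISORIENTATION = (
    "difficulty_focusing", "nausea", "fullness_of_head", "blurred_vision",
    "dizziness_eyes_open", "dizziness_eyes_closed", "vertigo",
)
# The 16 distinct items (five of them appear in two factors).
ALL_ITEMS = tuple(sorted(set(NAUSEA) | set(OCULOMOTOR) | set(DISORIENTATION)))

# Weights as given in the text.
WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92, "total": 3.74}


def ssq_scores(ratings):
    """Return weighted factor scores and a total score.

    `ratings` maps each of the 16 item names to a severity rating from 0 to 3.
    The total follows the description in the text: the sum of the 16 item
    ratings multiplied by 3.74.
    """
    missing = set(ALL_ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    for item in ALL_ITEMS:
        if ratings[item] not in (0, 1, 2, 3):
            raise ValueError(f"rating for {item} must be 0, 1, 2 or 3")

    return {
        "nausea": WEIGHTS["nausea"] * sum(ratings[i] for i in NAUSEA),
        "oculomotor": WEIGHTS["oculomotor"] * sum(ratings[i] for i in OCULOMOTOR),
        "disorientation": WEIGHTS["disorientation"] * sum(ratings[i] for i in DISORIENTATION),
        "total": WEIGHTS["total"] * sum(ratings[i] for i in ALL_ITEMS),
    }


if __name__ == "__main__":
    # Maximum possible ratings reproduce the upper bounds quoted in the text.
    print(ssq_scores({item: 3 for item in ALL_ITEMS}))
```

Some scoring guides instead add the three unweighted factor sums before applying the 3.74 weight, which counts the shared items more than once; the sketch keeps to the rule as stated in the text.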

The Simulator Sickness Questionnaire has been used in numerous studies (e.g., Lampton et al., 1994; Mourant and Thattacherry, 2000; Jaeger and Mourant, 2001; Lin et al., 2002; Min et al., 2004; Sharples et al., 2008; Bruck and Watters, 2009a,b, 2011; Moss and Muth, 2011; Biernacki and Dziuda, 2014; Brunnström et al., 2017). The brevity and simplicity of the questionnaire are its assets, as in many study designs it is used at least twice to assess the changes in occurrence and severity of simulator sickness symptoms. In most cases the SSQ is used as a paper-and-pencil test, but it can also be conducted orally, as in the study by Min et al. (2004), where the items of the questionnaire were read to the participants by the experimenter (according to the authors of the study, conducting the SSQ orally requires only circa 30–40 s) or in the study by Moss and Muth (2011), where a cassette was pre-recorded and then played back to the participants.

<sup>1</sup>Optokinetic nystagmus – an eye pursues a target object from one end of a visual field to the other. When the eye can pursue the object no further, it snaps back to the far side of the visual field where it begins to pursue again. Vestibular ocular response – responsible for keeping a target object on the fovea, the center of the retina where one's vision is sharpest, when the head is turning.

#### **Other self-report measures**

It should be noted that in some studies self-report methods of measurement other than the Simulator Sickness Questionnaire were used: Brooks et al. (2010) report having used the Motion Sickness Assessment Questionnaire, Malińska et al. (2014) used a self-developed, concise questionnaire and Helland et al. (2016) measured subjective severity of simulator sickness symptoms simply by asking "To what extent did you experience simulator sickness during the driving test?". Several other authors used other short self-report measures (e.g., McCauley et al., 1990; Helland et al., 2016; Reinhard et al., 2017). As these methods either were originally created for measuring a different ailment or have not been psychometrically tested, they will not be described in more detail herein.

### Physiological Measures

Although a conclusion has not yet been reached on which specific physiological parameters are the best indicators of simulator sickness, some researchers (e.g., Min et al., 2004; Bruck and Watters, 2011; Zużewicz et al., 2011) have tested various physiological variables and some of them appear promising for evaluating simulator sickness without relying on self-report measures or as a supportive method for questionnaires such as the SSQ. It has been noted (Min et al., 2004) that during driving (and most of the studies concerning simulator sickness were conducted with various driving simulators) the increase of autonomic nervous system activation may relate to tension, which then causes the heart rate and skin conductance to increase and skin temperature to decrease. Moreover, the physiological measures may be useful, as it has been shown that the subjective evaluation of simulator sickness (e.g., with the SSQ) is slightly delayed when compared to the physiological indicators (Min et al., 2004). Therefore, establishing the best physiological indicators of simulator sickness could shed more light on the exact triggering time of the syndrome and thus allow a more accurate description of the temporal characteristics of simulator sickness.

As no unambiguous physiological indicators of simulator sickness have been discovered, some examples of the use of physiological indicators for measuring this syndrome will be described in this section.

#### **Autonomic nervous system**

Respiration (breaths per minute). According to one of the theories of simulator sickness (or "cybersickness," as referred to by the authors; Bruck and Watters, 2011), the changes in respiration rate are crucial to evoking the unpleasant symptoms, especially when the person subjected to a virtual reality environment has no control. Respiration loads on two factors in the theory of cybersickness developed by Bruck and Watters (2011): Vision and Arousal. They even propose that hyperventilation may be the cause of arousal experienced by individuals exposed to high levels of movement in a virtual reality. Empirical evidence of changes in respiration rate during VR exposure was obtained by Kim et al. (2005): in their study a decrease in the respiration rate (when compared to baseline levels) was observed. What is more, a positive correlation was observed between respiration rate and the Simulator Sickness Questionnaire scores (for all of the subscales and the total score, with the r values ranging between 0.342 for nausea and 0.392 for the total score).

Heart rate. Bruck and Watters (2011) propose that heart rate may serve as an indicator of simulator sickness, as it has previously been shown to correlate with the syndrome. In experiments conducted by Cobb et al. (1999), heart rate tended to accelerate during the simulator task and returned to a resting rate approximately 30 min after the task was completed. Furthermore, the heart rate of participants who reported more severe simulator sickness symptoms was higher than that of individuals who did not experience such sensations. Additionally, the heart rate of participants who showed signs of adapting to the VR (virtual reality) conditions over several exposures decreased across the three sessions. Changes in heart rate were also observed in other studies. Dahlman et al. (2008, 2009) noted an increase in heart rate during VR exposure. In a study by Gavgani et al. (2016), the subjects took three roller coaster simulator rides on separate days. On the first 2 days, an initial tachycardia and tachypnoea that gradually subsided during the ride were observed. No such pattern was found on the third day.

Other autonomic variables. Several other measures of autonomic nervous system activity have also been tested; a brief overview follows. Kim et al. (2005) observed an interesting pattern of changes in gastric tachyarrhythmia: it increased significantly in the first 4 min of virtual reality exposure and then continued to increase until the final 4 min of a 9.5 min trial. The eyeblink rate also changed in the study by Kim et al. (2005): it decreased in the first minute of the exposure (compared with the baseline rate), but then increased, and in the middle of the trial it was significantly higher than the baseline level. Another interesting measure is skin temperature: as observed by Kim et al. (2005), fingertip skin temperature decreased in the middle of the trial and remained significantly lower than the baseline level even after the participants left the VR environment. A similar decrease in skin temperature was also observed by Chung et al. (2007) and Brooks et al. (2010). Furthermore, according to the results obtained by Kim et al. (2005), respiratory sinus arrhythmia (a variation in heart rate occurring during the breathing cycle) increases during VR exposure.

What is interesting about the above-mentioned measures is that for all of them except skin temperature, positive correlations with the subjective measure of simulator sickness (SSQ) were observed (Kim et al., 2005), with Pearson r values ranging between 0.265 (eyeblink rate and the oculomotor disturbance scale) and 0.359 (gastric tachyarrhythmia and the nausea scale).
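As an illustration of how coefficients such as those reported by Kim et al. (2005) are computed, the following minimal Python sketch implements the Pearson correlation. The function name `pearson_r` and the sample values are hypothetical and were chosen only for illustration; they are not data from any of the cited studies.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Made-up illustrative values (one pair per participant), NOT study data.
eyeblink_rate = [12.0, 15.0, 11.0, 18.0, 14.0]
oculomotor_score = [20.0, 35.0, 25.0, 40.0, 22.0]
r = pearson_r(eyeblink_rate, oculomotor_score)
print(f"r = {r:.3f}")
```

A coefficient in the range reported above (roughly 0.27 to 0.36) indicates a weak to moderate positive linear association, so such a measure would at best supplement the questionnaire rather than replace it.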

Furthermore, in a study by Gavgani et al. (2016), a rapid increase in finger skin conductance level (SCL) was observed during the first minute of the VR exposure, that is, the subjects experienced increased sweating in the finger; this trend was present until the end of the experimental trial. Most interestingly, phasic SCL activity in the forehead was observed during the experimental trial (compared to none during the baseline measurement). This activity, alone among the measured physiological responses, was found to be associated with the experience of nausea.

The authors (Gavgani et al., 2016) give an interesting interpretation of their findings, which may shed new light on the physiological components of the simulator sickness experience. Some of the physiological symptoms (initial tachycardia, tachypnoea, finger sweating) were present in the initial phase of the VR exposure, when no self-reported nausea was present. The authors conclude that these symptoms may be evoked by the emotions and arousal connected with the novelty of the VR experience, a conclusion supported by the fact that the above-mentioned effects (except for finger sweating) became non-significant on the third and last exposure. Forehead sweating, however, is related to the development of nausea. These results correspond with Treisman's (1977) evolutionary theory of motion sickness: reducing body temperature by increasing sweating serves as a survival strategy during intoxication.

#### **Central nervous system**

As a measure of central nervous system activation, EEG has been used in some studies (Min et al., 2004; Chung et al., 2007). According to the results obtained by Min et al. (2004), brainwave patterns differ significantly between rest and driving in a driving simulator. Similar patterns were obtained at both the frontal (Fz) and the central (Cz) midline electrode sites. Between 5 and 35 min of simulator exposure, δ/total was significantly increased and α/total, β/total and θ/total were significantly decreased. Furthermore, δ/total at Fz correlates positively, and both θ/total and β/total at Fz and Cz correlate negatively, with the total SSQ score. The correlation with the SSQ score was strongest for the θ/total parameter (r = −0.842 at Fz and r = −0.93 at Cz); the authors of the study (Min et al., 2004) therefore propose that it could serve as the most effective physiological indicator of simulator sickness. This proposal was also supported by Chung et al. (2007).
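To make parameters such as θ/total concrete, the sketch below computes each band's share of spectral power for a single channel. It rests on several assumptions that are not taken from Min et al. (2004): the band edges are conventional textbook values, "total" is taken to be the summed power of the four bands, and the helper names (`power_spectrum`, `relative_band_power`) are hypothetical. A plain discrete Fourier transform is used only to keep the example dependency-free; an actual analysis would normally rely on an established signal-processing library.

```python
import cmath
import math

# Conventional band edges in Hz. These are an assumption of this sketch,
# not values taken from Min et al. (2004).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def power_spectrum(samples, fs):
    """Return (frequency in Hz, power) pairs from a plain DFT of the mean-removed signal."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum


def relative_band_power(samples, fs):
    """Return each band's share of the power summed over all four bands."""
    spectrum = power_spectrum(samples, fs)
    band_power = {
        name: sum(p for f, p in spectrum if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }
    total = sum(band_power.values())
    if total == 0:
        raise ValueError("signal has no power in the analysed bands")
    return {name: p / total for name, p in band_power.items()}


# Synthetic 2-s test signal sampled at 64 Hz: a strong 6 Hz (theta) component
# plus a weak 20 Hz (beta) component. This is not real EEG data.
fs = 64
signal = [
    math.sin(2 * math.pi * 6 * i / fs) + 0.1 * math.sin(2 * math.pi * 20 * i / fs)
    for i in range(2 * fs)
]
shares = relative_band_power(signal, fs)
```

Because the shares sum to one, an increase in one ratio (for example δ/total) necessarily goes together with a decrease in the combined share of the others, which is worth keeping in mind when several ratios are reported side by side.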

# Behavioral Measures – Postural Stability Tests

When relying on the Postural Instability Theory (Riccio and Stoffregen, 1991), one can use a postural stability test to assess the lack of postural stability as a specific manifestation of simulator sickness. Mourant and Thattacherry (2000) report using such a test in their study. It is a simple and brief method: the person is asked to stand on the leg of their choice for 30 s in two separate trials. The time spent standing without putting the other leg down is recorded and can be compared with the results of the same test after experimental manipulation, or can serve as an independent measure. Although this method does not give a broad insight into simulator sickness symptoms, it can be useful for assessing changes in postural stability that depend on simulator exposure.

Cobb et al. (1999) report using a more complex set of postural stability tests. Their research program measured the extent to which a static posture could be held and the extent of hip sway over a 30 s period, and included walking on the floor and navigating over an uneven path with open eyes. Additionally, after all the tasks were completed, the authors administered two scales: a task difficulty scale and a subjective postural stability scale (Postural Stability Questionnaire, PSQ; Hamilton et al., 1989).

# Temporal Aspects of Simulator Sickness

Questions regarding the temporal characteristics of the virtual reality experience that influence simulator sickness recur in many papers (e.g., Kennedy et al., 2000; Moss and Muth, 2011; Domeyer et al., 2013). Although no unambiguous answers have yet been provided, some useful and promising leads can be found in the literature and are discussed here. Since the main goal of the present work was to review research on simulator sickness from the temporal perspective, we decided to focus on research regarding one (or more) of the three issues described below.

As Kennedy et al. (2000) have observed, there are two main phenomena regarding the temporal aspect of simulator sickness: the severity of simulator sickness increases with exposure duration during a single session, and subjecting a person to several repeated simulator exposures may result in adaptation to the simulator conditions and thus in a decrease in the severity of simulator sickness symptoms. Both aspects are discussed in the present paper, as they seem to be crucial for virtual reality development. Furthermore, according to some research (e.g., Moss and Muth, 2011; Biernacki and Dziuda, 2014; Malińska et al., 2014), simulator sickness symptoms appear to persist for some time after the simulator exposure; this aspect is discussed below as well.

# MATERIALS AND METHODS

# Search Strategy

A literature search was performed in three electronic databases (Web of Science 'all databases,' PsycARTICLES, Scopus) with no publication date restriction. Since temporal aspects of simulator sickness are rarely the main focus of studies, we decided to retrieve a wide range of articles using the broadest term, "simulator sickness," and to select articles intensively in subsequent stages. The search, conducted on 19 April 2018, returned 1,200 records.

# Study Selection

The authors screened titles and abstracts in order to exclude obviously irrelevant articles. The following keywords were used: time, temporal, durat<sup>∗</sup>, adapt<sup>∗</sup>, persist<sup>∗</sup>. Articles whose titles and abstracts suggested an irrelevant area of research were excluded on this basis (1086 records). In the second stage of the screening process, full texts were retrieved and duplicate records were removed (34 records). For 10 records full texts were unavailable, and these records were therefore excluded from the database as well. The remaining 70 articles were evaluated in full text using the following criteria:




After this process, 30 articles were retained. The authors then added 5 articles on the basis of a hand search and previous knowledge. The final database consisted of a total of 35 articles (41 studies). A flow chart describing the search and screening process is presented in **Supplementary Figure S1**.
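The record counts reported for the search and screening stages can be checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; the variable names are ours, and the number of articles excluded at the full-text stage is derived here rather than stated in the text.

```python
# Counts as reported in the Search Strategy and Study Selection sections.
records_identified = 1200
excluded_on_title_abstract = 1086
duplicates_removed = 34
full_text_unavailable = 10

# Articles left for full-text evaluation.
assessed_in_full_text = (
    records_identified
    - excluded_on_title_abstract
    - duplicates_removed
    - full_text_unavailable
)

retained_after_full_text = 30
added_by_hand_search = 5

# Derived value (not stated in the text): exclusions at the full-text stage.
excluded_at_full_text = assessed_in_full_text - retained_after_full_text

final_articles = retained_after_full_text + added_by_hand_search

print(assessed_in_full_text, excluded_at_full_text, final_articles)
```

The result agrees with the 70 articles evaluated in full text and the final database of 35 articles.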

# RESULTS

# The Temporal Trajectory of the Progression of Simulator Sickness

Studies on simulator sickness have been conducted since the 1990s, using a wide array of virtual reality devices. It is therefore important to emphasize that direct comparisons between studies using different hardware should be treated with extreme caution. Some trends may be observed, but it should always be borne in mind that the temporal patterns of simulator sickness may vary significantly between devices and scenarios. Moreover, as some of the cited studies were conducted almost 20 years ago, caution is needed when drawing conclusions. Nevertheless, the insight provided by the researchers appears to be valuable: while technological development might have solved some of the problems, the methodology and qualitative conclusions are worth knowing.

In one of the studies conducted by Cobb et al. (1999), four subjects were asked to remain immersed in a virtual reality environment for up to 2 h. Simulator sickness severity was measured with the Simulator Sickness Questionnaire. All participants reported that the severity of symptoms increased up to 1 h of exposure. Two of the participants withdrew after an hour because their simulator sickness symptoms were too severe (mean scores for nausea: M = 67, oculomotor disturbance: M = 57 and disorientation: M = 82). The remaining two participants completed the 2-h immersion and reported that after 75 min the severity of symptoms decreased greatly. This suggests that although the severity of simulator sickness symptoms increases with time, some individuals may be able to adapt to the VR environment during a single exposure. Unfortunately, the sample in the study was too small to provide information on the statistical significance of these effects. Nevertheless, these results are interesting and worth taking into consideration when planning further experiments on extended VR exposure.

Kennedy et al. (2000) examined SSQ data from a database of military pilots' flight simulator training and grouped them by exposure duration into four categories (0–1 h, 1–2 h, 2–3 h, and 3 h or more). An analysis of variance revealed that mean SSQ scores increase gradually as exposure duration increases. This trend proved to be statistically significant. No information was given on the statistical significance of differences between individual categories, and it should also be noted that the analyzed data came from many different simulator environments. The design was also between-subject, so no conclusions can be drawn about individual temporal patterns of simulator sickness severity.

Min et al. (2004) tested various measures of simulator sickness severity. Their study used both physiological and self-report methods, with the Simulator Sickness Questionnaire used to assess the subjective severity of the syndrome. Only the results of the psychometric measurement are reported here. After baseline signal measurement and pre-experiment SSQ administration, the participants drove a car simulator for 60 min, during which physiological measurements were taken and the SSQ was completed orally after every 5 min of simulator exposure, as well as after the whole trial. The authors report that all of the participants showed symptoms of nausea and disorientation (after 10 min of simulator exposure) and of oculomotor disturbance (after 25 min). The first significant difference between the baseline SSQ score and the trial score appeared 10 min after the beginning of the trial. These results confirm the hypothesis that the severity of simulator sickness increases with time.

Moss and Muth (2011) tested several characteristics of HMDs as possible factors influencing simulator sickness severity, as well as the effect of a prolonged exposure. Only the latter of these effects will be reported herein. The participants' task was to locate several objects in the virtual environment (a virtual laboratory), according to verbally given instructions, using only head movements. Each participant completed two practice sessions and five 2-min trials with 1-min breaks between them. A number of Simulator Sickness Questionnaire results were collected: before the experiment, after a practice session, after each trial, 5 and 10 min after the experiment. It was noted that the severity of simulator sickness symptoms increased with time – a significant effect of duration of the VR exposure was revealed. The most severe symptoms were noted after the last trial.

The type of walking interaction was the main topic explored by Lee et al. (2017), but their results also provide information about the temporal characteristics of simulator sickness. In their experimental design three types of walking control were included:


All of the participants were exposed to three different VR environments (a cartoon town, a realistic nature environment and a low poly<sup>2</sup> landscape) in a three-step walking interaction: they experienced them either in the order gamepad, hand interface, walking simulator, or in the reverse order. Each participant thus completed nine VR experiences in total. The following variables were tested in the study: immersion, presence and simulator sickness (measured with the Simulator Sickness Questionnaire). The authors reported that the simulator sickness symptoms became more severe with time, although on the whole they were of moderate severity.

The above-mentioned results support the hypothesis that the severity of simulator sickness increases with time during a single exposure, to an extent that may depend on many variables (e.g., simulator type and its characteristics, length of the whole exposure, individual characteristics of the participants, etc.). Such results are confirmed by many other studies, which are briefly summarized here. Lo and So (2001) confirmed that nausea severity (measured by one question with answers ranging from 0, "no symptom," to 6, "moderate nausea, want to stop") increases linearly with time during a 20-min exposure. Furthermore, the increase was significant in all of the comparisons except the one between the 15th and 20th minute of the trial. In a similar study (So et al., 2001) with a 30-min exposure, the nausea ratings (measured in the same way) increased as well, but the differences were significant only in the 5th and 10th minute. Jarchow and Young (2007) also measured simulator sickness severity with a single question (on a scale from 0, "normal," to 20, "about to vomit"). The subjects were tested on two consecutive days, as the main aim of the study was to assess the adaptation effect. However, the severity of symptoms was also found to increase within a single session, although only in one of the experimental conditions. In the study by Classen and Owens (2010), simulator sickness severity was measured at three time points: before VR exposure, after a 5-min acclimation exposure and after a 20-min trial. The results indicated that simulator sickness severity increased from baseline to both the after-acclimation and the post-exposure measurements, but no significant difference was found between the after-acclimation and post-exposure scores.
Therefore, one may presume that the peak simulator sickness severity in this study was reached very early. However, no data were gathered during the 20-min exposure, so some differences might have been discovered if simulator sickness had been measured more systematically. A similar procedure was used by Sinitski et al. (2018): they measured simulator sickness severity (with the SSQ) before the exposure, after an acclimation period (which lasted 15 min) and after a 45-min trial. In this study, however, only a small increase on the disorientation scale was observed after the acclimation period, and these symptoms decreased by the end of the session. Again, the period between the second and the third measurement was quite long, and it is therefore impossible to thoroughly analyze the pattern of the symptoms during the whole exposure.

An experiment conducted by Moss et al. (2008) consisted of a short practice and five 2-min experimental trials. It confirmed that simulator sickness severity (measured with the SSQ) increases with time: it was more severe after the last (5th) trial than before the practice, after the practice, and after the 1st, 2nd, and 3rd trials. As no significant difference was found between the 4th and 5th trial, it may be hypothesized that after circa 9 min of exposure simulator sickness had reached its peak severity and would not have become more unpleasant even if the exposure had lasted longer. A similar study (Moss et al., 2011) confirmed that simulator sickness severity (measured with the SSQ) increases with VR exposure duration. Serge and Moss (2015) measured simulator sickness severity with the Revised Simulator Sickness Questionnaire before VR exposure and after 8 and 16 min of exposure, and showed that it increases with time. Singer et al. (1998) also report that simulator sickness severity increases with time during a VR exposure, although the difference between the "Mid-Experiment" and "Post-Experiment" scores was not significant, which suggests a threshold level of simulator sickness. The authors, however, did not report how long the trials were, and any conclusions drawn from this study should therefore be treated with caution. Feenstra et al. (2011) found a slightly different pattern from those described above: in their study, the differences in simulator sickness severity became statistically significant only after the participants had spent 10 min in VR, and severity then increased until the end of the 20-min trial.

A systematic increase of simulator sickness severity with time was confirmed with the SSQ by Chung et al. (2007), Park et al. (2008), and Choi et al. (2009) during a 60-min trial and by Aldaba et al. (2017), and by Reinhard et al. (2017), who used the Fast Motion Sickness Scale (FMS, a single-item scale with scores ranging from 0 to 20). An increase in the severity of simulator sickness symptoms was also observed by McCauley et al. (1990), who rated it on a 7-point scale (from "normal, symptom-free" to "severe discomfort, I am unable to continue"): it increased across the measurement time points, that is, before the exposure, in the middle of the 10-min task and after the whole 10-min task. There were 4 such trials, and an increase in the severity of the symptoms was observed in all of them. A brief summary of all reviewed studies is provided in the **Supplementary Table S1**.

Several conclusions about the temporal trajectory of the progression of simulator sickness can be drawn from the studies retrieved. Firstly, there is empirical evidence that the severity of simulator sickness grows with exposure time, as several studies using various approaches confirmed this hypothesis. In light of the reviewed research, this trend seems to be stable regardless of technological progress in the field of VR presentation: the oldest study (McCauley et al., 1990) and the most recent one (Sinitski et al., 2018) lead to the same conclusion. Even between-subject comparisons lead, in most cases, to the conclusion that the severity of simulator sickness symptoms is greater when the exposure duration is longer (e.g., Kennedy et al., 2000). However, it is important to note that several moderators, which are not the main focus of this paper, may play a role here, for example the simulator control method. Secondly, it is difficult to establish a universal rule regarding the maximum time individuals can spend in VR on the basis of the analyzed study results. On the other hand, in most of the studies the simulator sickness symptoms were experienced by all of the participants, not only the ones who reported some kind of tendency to feel sick.

<sup>2</sup>Consisting of a small number of polygons.

Moreover, some of the studies observed that simulator sickness severity increases with time, but after reaching a certain level or after a certain amount of time it either begins to decrease (Cobb et al., 1999; Sinitski et al., 2018) or remains at the same level (Singer et al., 1998; Lo and So, 2001; So et al., 2001; Moss et al., 2008; Classen and Owens, 2010). This suggests that during a single VR exposure some people can adapt to simulator sickness (or that some simulator types can evoke the adaptation effect). However, whether this effect transfers to subsequent VR sessions should be explored further.

On the other hand, it has also been shown that in some cases simulator sickness symptoms begin to appear only after some time spent in VR and that this time threshold may differ between symptoms (Min et al., 2004; Feenstra et al., 2011). Although this type of evidence is less prevalent than that described above, it is also worth taking into consideration. If the symptoms become unpleasant only after some time, a single VR session should be short enough to prevent them from occurring.

Given that several moderators may vary between software (e.g., control method, setting, graphics quality), another strategy for testing temporal tolerance may be reasonable, namely testing specific VR software using precisely selected methods. To make this possible, various methods need to be integrated and a standardized methodology needs to be developed.

# Possibility of Adapting VR Users in Advance

As Nader and Kruszewski (2013) suggest, simulator sickness can be avoided when virtual reality users are allowed a sufficient amount of time to adapt to the simulator conditions. They propose that such adaptation sessions may last for a number of days and involve a gradual increase in the time spent in the simulator during a single training session, as well as in the difficulty of the task. This proposal appears to be congruent with the assumptions of some of the theories. For example, according to the Neural Mismatch Model (Reason, 1978), unpleasant symptoms occur when the present sensory information is inconsistent with the past experiences of the individual. Gaining such experience in the specific virtual reality environment might prevent this conflict. Similarly, when one is immersed in virtual reality several times, one can learn how to maintain balance in such an environment, so adaptation appears to be possible in the paradigm of the Postural Instability Theory (Riccio and Stoffregen, 1991) as well. It should also be emphasized that adaptation to simulator sickness in VR may be achieved not only by exposure to an identical virtual environment, but also by similar experiences, such as video gaming. It has been shown that individuals with more gaming experience and more self-reported "computer skills" experienced fewer unpleasant symptoms during a VR session (Häkkinen et al., 2006a). However, there are also studies which do not support this claim (e.g., Häkkinen et al., 2002, 2006b), so this issue needs further testing.

Some adaptation effects were observed by Lampton et al. (2000). In their study, five separate VR immersions were conducted (trainings 1 and 2 and missions 1, 2, and 3). The SSQ was administered before and after each immersion. The difference between pre- and post-immersion scores was significant for the first training and for the second and third missions, and not significant for the second training and the first mission. It can therefore be concluded that the participants achieved some adaptation after the first training, but its effect wore off with time. Similarly, in the study by Domeyer et al. (2013), the subjects adapted to the simulator conditions during a series of VR exposures conducted on a single day (the effect was visible in the total Revised Simulator Sickness Questionnaire score). Such effects may occur even during a relatively short exposure, lasting 45 min in total (Sinitski et al., 2018). In that study the participants at first experienced an increase in disorientation symptoms (measured with the SSQ), which decreased by the end of the VR exposure. However, this effect was not confirmed for the remaining SSQ subscales or for the total score. Additionally, it should be stressed that all of the VR immersions in the two studies mentioned above took place during a single day, which is quite unusual for studies exploring adaptation effects; usually each immersion is conducted on a separate day.

In the study program developed by Cobb et al. (1999), 12 individuals participated in three consecutive virtual reality sessions, each of which lasted 20 min, with a 1-week break between the sessions. The simulator sickness symptoms severity (measured with the Simulator Sickness Questionnaire) decreased after each consecutive VR exposure, especially strongly for the disorientation symptoms, which is consistent with the results obtained by Sinitski et al. (2018). A similar effect of adaptation was observed by Braithwaite and Braithwaite (1990) and Bailenson and Yee (2006) – in their studies, the simulator sickness symptoms (measured with the SSQ) decreased in severity with time.

An interesting form of adaptation training was proposed by Smither et al. (2008). They tested the ability of self-propelled rotation stimulation (SRS)<sup>3</sup> to provide adaptation to simulator sickness. Ten subjects took part in five SRS trials on separate days and on the last day were exposed to VR, and 10 other subjects took part only in the latter part of the experiment, providing a control group. The control group experienced significantly more severe dizziness symptoms and had higher total, disorientation and oculomotor disturbance SSQ scores. These results show that adaptation can be achieved without immersion in virtual reality, but that some form of pre-immersion training is needed to prevent the unpleasant symptoms, as the participants from the control group, who had no chance to adapt in any form, suffered from simulator sickness.

<sup>3</sup> "In the SRS, participants were asked to raise their right hands above their heads and grasp their right earlobe with their left hand, bend at the waist, and spin in a clockwise direction under self-propelled condition. The participants spun 10 times in 30 s (20 RPMs) and this constituted a trial" (Smither et al., 2008, pp. 330–331).

Kennedy et al. (2000) analyzed data collected from 53 military pilots who participated in seven consecutive helicopter simulator trainings. A repeated-measures analysis of variance indicated a monotonic decrease in simulator sickness severity (measured with the SSQ) as a function of flight number. Furthermore, a floor effect was observed for some subjects: at some point they reached total adaptation and an SSQ score of 0, which did not increase in further trials. This effect is responsible for the deceleration in the decline of simulator sickness severity with time. On the basis of their results, the authors propose that short, repeated simulator exposures may be used to achieve adaptation to the VR environment and to prevent simulator sickness. They further conclude that the decrease in simulator sickness severity after several trials exceeds the increase in severity associated with a single longer exposure.

Brooks et al. (2010) conducted two studies, an exploratory and a confirmatory one. In the exploratory study (a combination of results of three independent studies), the participants were immersed in a driving simulator. After a training session, four 5-min trials using slightly different conditions (e.g., a curvy road instead of a straight one) were conducted. The trials were separated by 2-min rest periods. Before and after each trial, the participants completed the Motion Sickness Assessment Questionnaire, the score of which served as an indicator of simulator sickness severity. In the confirmatory study the main difference was that the participants completed three 30-min experimental trials in the same simulator. The authors report that some participants showed an adaptation effect: the severity of their symptoms increased at first, but then decreased as they became accustomed to the simulator experience. No statistical parameters were provided to describe this tendency, but it still appears to be a promising finding.

In a study by Newman et al. (2013), the subjects took part in six VR immersions, five of which took place on consecutive days and the last 22 days after the first immersion. Simulator sickness symptoms assessed on a 0–10 scale decreased rapidly after the first exposure: the comparisons between Day 1 and each of the other days were significant, and no other comparisons were. It appears that the adaptation achieved by the study subjects occurred between the first two sessions. Moreover, the adaptation effect did not wear off with time: on Day 22 symptom severity was still significantly lower than on Day 1. The SSQ was also administered in this study, and the total, nausea and disorientation scores decreased significantly over time. This effect, however, was visible between Day 1 and Day 4 and between Day 1 and Day 5 (for the total and nausea scores) and between Day 1 and Day 4 (for the disorientation score). Furthermore, for the total and nausea scores, adaptation was retained at the last measurement on Day 22. The results of this study show that it is possible to adapt people to VR conditions and that this effect can be long-lasting. However, the method of measuring simulator sickness severity should be chosen cautiously, as the effects may differ slightly between methods. Probably the best option would be to use at least two reliable measurement methods, as was done by Newman et al. (2013).

Helland et al. (2016) conducted an experiment in a driving simulator that examined the effects of simulator sickness, blood alcohol concentration and repeated simulator exposures on driving performance. Herein, only the results concerning the relationship between repeated simulator exposures and simulator sickness severity will be discussed. A driving simulator consisting of the body of a car and three screens was used. The study included three 60-min driving tests in the simulator (with breaks of at least 2 days between the trials). After every trial, each of the 20 participants rated the simulator sickness severity on a scale from 0 to 10 in response to the question "To what extent did you experience simulator sickness during the driving test?". It is worth noting that the mean simulator sickness score was very low in this study (M = 2.5), which might have had an impact on the results. For the participants who did not interrupt any of the sessions (N = 13), the mean simulator sickness severity score was 3.4 for the first, 1.8 for the second and 1.5 for the third session. Although the simulator sickness severity appears to decrease with consecutive sessions, the relationship was not statistically significant. It could be hypothesized that had the authors used a more precise method for assessing the simulator sickness severity, the results could have been different. Given the concise, one-question measurement of simulator sickness severity, the data reported in the study do not fully support the hypothesis that simulator users adapt to virtual reality conditions.

Another study providing evidence that adaptation to simulator sickness is possible was conducted by Reinhard et al. (2017). Twenty-eight participants took part in the experiment, which had two parts separated by a break of 7–14 days. On the first day, six 20-min drives in a simulator took place, and on the second day there were four. To assess the simulator sickness severity, two scales were used: the FMS and the SSQ. The authors report an interesting pattern of results. During both sessions the severity of symptoms increased, but the increase was less pronounced during the second session. Thus, an adaptation effect was observed, but the symptoms did not disappear completely. The paper stressed that the first VR immersion should be treated with extreme caution: the subjects should be monitored for unpleasant symptoms, the rests between trials should be longer and the trials themselves shorter than usual. For a summary of the studies reviewed in this aspect, see **Supplementary Table S2**.

In light of the reviewed studies, adaptation to VR appears possible, as several authors reported results suggesting it. However, a large number of the studies either did not report statistical tests supporting this claim or reported statistically nonsignificant results. Various adaptation patterns have been observed: the effect was visible when all of the VR immersions were conducted on a single day (Lampton et al., 2000; Domeyer et al., 2013), on separate days (e.g., Cobb et al., 1999; Brooks et al., 2010; Reinhard et al., 2017), or even during a single VR exposure (Sinitski et al., 2018). A floor effect of no symptoms after some exposures was observed by Kennedy et al. (2000). The effect of adaptation does not wear off with time, as was observed by Newman et al. (2013). Furthermore, virtual reality is not necessarily essential for evoking the adaptation effect (Smither et al., 2008).

The patterns and extent of adaptation observed in the aforementioned studies are diverse. Certainly, further research on this issue is necessary. An intriguing open question is the relationship between possible adaptation over subsequent VR experiences and the increasing severity of simulator sickness during one long experience. These relationships would be worth testing in future studies.

# Persistence of the Simulator Sickness Symptoms After VR Exposure

Tanaka and Takagi (2004) found not only that the simulator sickness symptoms persist for some time after VR exposure, but also that the duration of this persistence depends on the initial symptom severity. For the participants who suffered from severe symptoms (total SSQ score of more than 60), the recovery time was longer than 30 min. On the other hand, the subjects who experienced only slight symptoms (total SSQ score of 25 or less) needed no longer than 5 min to recover from the simulator sickness symptoms.

In the study by Bos et al. (2005) it was also confirmed that the simulator sickness symptoms tend to persist for some time after VR exposure, but for most of the participants they return to baseline [a score of 0 on the Misery Scale (MISC); the maximum score on this scale is 10] within an hour of the end of the VR exposure. Only 4 of 24 subjects did not fully recover within 2 h post exposure, with a maximum MISC score of 3. These conclusions are supported by the results obtained by Keshavarz et al. (2018). In their study, simulator sickness was measured using the FMS, and 36 of 121 participants were forced to drop out before the end of the experimental task. The total time until recovery (operationalized as an FMS score of 1 or less) differed significantly between the participants who finished the task and those who dropped out earlier: the latter needed more time to recover. However, only five subjects (all from the drop-out group) did not fully recover by 15 min post exposure. Furthermore, for all of the participants there was a significant decrease in simulator sickness symptom severity between the measurement immediately after exposure and the one 3 min later. Results obtained by Singer et al. (1998) support the hypothesis that the simulator sickness symptoms persist for some time after leaving the VR and then return to baseline levels. In their study, all of the specific symptoms except disorientation (viz. nausea and oculomotor disturbance; the same effect was confirmed for the total SSQ score as well) returned to baseline levels after a 30-min rest. McCauley et al. (1990) state that the simulator sickness symptom severity decreases after leaving the VR (between two measurement points: immediately after leaving the VR and 30 min later).

A more detailed, qualitative description of the pattern of simulator sickness symptom persistence was given by Braithwaite and Braithwaite (1990). Among 14 of the participants, 6 suffered from severe headaches, which lasted for 2–6 h, 2 suffered from nausea (up to 2 h after leaving the simulator) and 6 reported experiencing other symptoms, which cannot be classified as typical simulator sickness symptoms (visual flashbacks, unsteadiness or symptoms different from the ones experienced during the VR exposure). Unfortunately, no information on the VR exposure length was given by the authors.

In the study by Moss and Muth (2011), described in more detail above, it was found that the simulator sickness symptoms persist for some time after leaving the virtual reality environment. The total SSQ score measured 10 min post exposure was still significantly higher than the baseline score. This means that for the virtual reality environment tested in the study, not only did the simulator sickness symptoms increase with time, but they also persisted for at least 10 min after leaving the virtual reality. Therefore, it cannot be determined when the symptoms subsided. However, in a similar study by Moss et al. (2011), the severity of symptoms did return to baseline level after a 10-min rest.

Biernacki and Dziuda (2014) studied simulator sickness symptoms in a group of professional truck drivers who participated in three 30-min truck simulator drives: the first on a fixed-base platform with poor visibility (created by simulated fog) and two with good visibility, one on a fixed base and one on a mobile platform. The simulator consisted of a truck cabin and a cylindrical screen, on which all visual stimuli were displayed. Simulator sickness was measured with the Simulator Sickness Questionnaire. The questionnaire was completed five times for each exposure: before each trial, 2 and 30 min after each trial, in the evening of the same day and in the morning of the next day. The levels of nausea, disorientation and oculomotor disturbance, as well as the total severity of simulator sickness symptoms, proved to be dependent on the measurement time point. The level of nausea was higher 2 min than 30 min after exposure. The time profile for oculomotor disturbance, disorientation and the total SSQ score turned out to be similar: the scores 2 min after exposure were significantly higher than both the scores 30 min after exposure and the baseline scores. The symptoms of simulator sickness seem to recede after leaving the virtual reality environment, but only for the nausea factor did the severity 30 min post exposure not differ significantly from the baseline score. Half an hour therefore appears not to be sufficient for the symptoms to disappear completely. In another paper describing the results of this study (Dziuda et al., 2014), the authors state that the severity of nausea measured 2 and 30 min post exposure and in the evening of the same day was significantly higher than in the morning of the next day.

Malińska et al. (2014) tested subjective sensations (simulator sickness and fatigue; the latter will not be discussed herein) felt after exposure to virtual reality. In this study, individual proneness to motion sickness was tested using the Coriolis test before the experimental trial. Twenty men participated in the experiment. The study was conducted in two separate phases. During the first phase, all of the participants watched a part of the movie "Avatar" in both 2D and 3D versions. The results concerning only the impact of the movie will not be discussed herein. In the second phase, the participants engaged in a virtual reality task, which included transporting various elements on a virtual workstation. A questionnaire created by the authors of the study was used to measure simulator sickness. It included 8 symptoms (e.g., eye pain, headache, dizziness, nausea), which were assessed on a five-point scale. The questionnaire was administered three times: immediately after the simulator exposure, 20 min later and up to 24 h later (sent by email). At 20 min post exposure, 7 of the 8 simulator sickness symptoms were reported by at least one participant. No one experienced increased sweating, and the most prevalent symptoms were eye pain, drowsiness, fatigue and apathy. According to the results, the participants experienced simulator sickness symptoms up to 4 h after completing the virtual reality task. The reported symptoms included headache, dizziness, disorientation and drowsiness. Unfortunately, no comparison between the different time periods was given, and therefore any conclusions drawn from this study regarding the temporal aspects of simulator sickness should be treated with extreme caution. The results of the studies concentrating on simulator sickness persistence are given in **Supplementary Table S3**.

Regarding simulator sickness persistence, it may be assumed that at least some of the symptoms may persist after the exposure (10 min, Moss et al., 2011; circa 30 min, Singer et al., 1998; more than 30 min, Biernacki and Dziuda, 2014; Dziuda et al., 2014), in some cases even for a relatively long time (more than 4 h after approximately 2 h of exposure, Malińska et al., 2014; for even 4 h after leaving the VR, Braithwaite and Braithwaite, 1990). On the other hand, the results of Biernacki and Dziuda (2014) suggest that the severity of symptoms changes rapidly: it is elevated directly after exposure, but significantly lower 30 min afterward. The duration of symptom persistence differs between VR environments. Furthermore, the length of recovery depends on the initial symptom severity: it takes longer to recover fully when the experienced symptoms were more severe (Bos et al., 2005; Keshavarz et al., 2018).

# CONCLUSION

To summarize the conclusions reached about each of the temporal aspects of simulator sickness: sufficient evidence appears to exist to confirm the hypothesis that the severity of simulator sickness symptoms increases with exposure time. There appears to be no universal rule regarding the maximum exposure time before unpleasant symptoms are evoked. A sound direction for research in this area would be to test the temporal pattern of simulator sickness progression for each VR technology separately, since, as reported by Lee et al. (2017), different devices used for controlling the individual's movement in the virtual environment tend to evoke slightly different levels of simulator sickness. Despite the development of technology, the issue of simulator sickness still appears to remain unsolved. Interesting trends have been reported: in some studies, the simulator sickness severity either begins to stabilize (e.g., Moss et al., 2008) or decreases (e.g., Sinitski et al., 2018) after some time, and in others the symptoms become noticeably unpleasant after some time spent in the VR (e.g., Min et al., 2004). As discussed at length above, adaptation to the VR environment appears to be possible, but the quoted studies do not provide conclusive data, and further inquiry into this topic is necessary. Some simulator sickness symptoms may persist for some time after exposure, although it remains unknown for how long, and the duration may vary depending on the initial severity of the symptoms.

Apart from the points concerning each specific temporal aspect of simulator sickness, some general conclusions can be drawn. Virtual reality technology and simulators still tend to evoke unpleasant symptoms among their users; although the technology advances, this problem has not yet been solved. This is most evident for the first aspect discussed herein, the temporal trajectory of the progression of simulator sickness: the severity of symptoms grows with exposure time both in the studies conducted almost 20 years ago (Cobb et al., 1999) and in the most recent ones (Lee et al., 2017). Although this trend appears to be stable regardless of technological progress, such statements should be treated with caution, as the studies used various types of VR technologies, which may not be comparable.

Until the technology reaches the point at which simulator sickness is wholly preventable, some standards should be developed for research on virtual reality and simulators. The issue of how often the simulator sickness symptoms should be measured (not only during the experimental trial, but also after it) should be addressed.

It would be advisable to test the tendency of a new virtual reality tool to evoke simulator sickness symptoms in the three dimensions discussed above: the temporal pattern of symptom progression, the possibility of adaptation and the persistence of symptoms after exposure. These parameters would provide vital information on how long the training, game or any other scenario should be in order to provide the user with an enjoyable experience and to prevent unpleasant sensations. This issue appears to be especially important for professional training simulators, where the quality of the experience may influence the results of the training session. Furthermore, the physiological measurement of simulator sickness should be developed and given more focus, as it might be more precise and less biased than self-report.

The researchers and developers employing the virtual reality technology should always bear in mind the fact that simulator sickness exists and can disturb the desired outcomes. Therefore, before it becomes widely implemented, every VR technology needs to be tested for its tendency to evoke unpleasant symptoms in its users in the three temporal aspects discussed above.

# Practical Implications for Further Research

The research described above provides interesting insight into the temporal aspects of simulator sickness, and it appears that there are still issues which demand further inquiry. First of all, most of the research concerns driving or flight simulators, most often used for training professional drivers and pilots, but virtual reality technology is advancing rapidly and has already been applied in the gaming industry (2,704 titles on Steam<sup>4</sup> when the search parameters were restricted to "VR only" and 3,243 with the "VR supported" search restriction; data collected on June 14, 2018), creating a brand-new field for research. It would be advisable to explore the temporal aspects of simulator sickness not only on professional training simulators and professional drivers and pilots, but also on virtual reality-supported games and everyday, non-professional VR users and gamers.

<sup>4</sup> Steam (https://store.steampowered.com) is a digital distribution platform on which various types of games can be bought, played and stored in a cloud.

It would also be advisable to further explore the temporal aspects of simulator sickness and to develop a standardized methodology which would allow a comparison between studies focusing on different virtual reality environments. Researchers should bear in mind the need to compare the SSQ scores between time periods [a good example of such a methodology is the Moss and Muth (2011) study, where simulator sickness severity was assessed every 5 min] and to monitor the severity of symptoms for several hours after virtual reality exposure, in order to be able to determine the moment when the symptoms subside.
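As a minimal sketch of how the moment at which symptoms subside could be read off repeated post-exposure measurements, the Python snippet below returns the earliest measurement point from which the total SSQ score stays at or below the pre-exposure baseline. The function name `time_to_baseline`, the `tolerance` parameter and the example scores are hypothetical illustrations; they are not data or procedures taken from any of the reviewed studies.

```python
from typing import Optional, Sequence, Tuple


def time_to_baseline(
    baseline: float,
    measurements: Sequence[Tuple[float, float]],
    tolerance: float = 0.0,
) -> Optional[float]:
    """Return the earliest post-exposure time (in minutes) from which every
    later score stays at or below baseline + tolerance, or None if the
    scores never settle within the observed window.

    measurements: (minutes_after_exposure, total_ssq_score) pairs.
    """
    recovery_time = None
    for minutes, score in sorted(measurements):
        if score <= baseline + tolerance:
            # Remember only the first point of the current recovered run.
            if recovery_time is None:
                recovery_time = minutes
        else:
            # A relapse above baseline resets the candidate recovery time.
            recovery_time = None
    return recovery_time


# Hypothetical participant: (minutes after exposure, total SSQ score).
example = [(0, 41.1), (10, 26.2), (30, 14.9), (60, 7.5), (120, 3.7)]
print(time_to_baseline(baseline=7.5, measurements=example))  # prints 60
```

The sketch requires the score to remain at or below baseline for all later measurements, so that a transient dip followed by a relapse is not counted as recovery. The result can be no finer than the spacing of the measurement points, which is one reason why frequent post-exposure assessment matters.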

Moreover, it would be intriguing to compare the effect of one prolonged VR exposure with that of a number of shorter exposures summing up to the same total time. According to the evidence found in past studies, it could be expected that the severity of symptoms after one long exposure would be greater than after a series of short ones. The pattern of symptom persistence after these two types of exposure could also be explored.

It is also worth suggesting that the simulator sickness severity should be assessed not only before the experimental procedure, but also after the initial training phase, in order to establish if the training could serve as the adaptation period.

In light of past research, which suggests that most people suffer from simulator sickness to some extent, researchers should care for study participants who report strong and unpleasant symptoms, not only immediately after the experimental procedure but for as long as the symptoms persist. Brooks et al. (2010) propose a number of measures that can be taken in order to provide the participants with proper care. Supplies such as sick bags, plastic gloves, mouthwash and cleaning products should be kept in the lab. The participants should be provided with light snacks and water. They should also be advised not to drive a car until they feel that all the symptoms have subsided. Brooks et al. (2010) also suggest that the participants should stay in the lab for at least an hour after the experiment. It would also be advisable to contact the participants after the study and ask them whether they experienced any unpleasant side-effects of VR exposure.

# Strengths and Limitations

The main strength of the present paper is that it covers a very wide array of study reports – not only from the most recent times, but also the older ones, from the 1990s. Consideration has been taken to analyze all the results thoroughly. Caution has been exercised to allow for any possible bias and limitations of every single study. Moreover, efforts have been taken to shed more light on the subject which, despite being an important factor of simulator and VR experience, has not been given much attention in research.

A significant number of the reviewed studies turned out to have drawbacks or did not include as thorough an analysis of the temporal aspects of simulator sickness as might have been expected, which can be considered a limitation of the present review. Very often the study reports did not include any information on the statistical significance of the results, or the sample size was extremely small, which made it impossible to draw definite conclusions. Furthermore, as the temporal aspects of simulator sickness are most often analyzed alongside other study objectives, it is possible that some interesting results on the topic were omitted in the search process. Despite these limitations, the present review is believed to give insight into the temporal aspects of simulator sickness and to serve as a basis for further research on this topic.

# AUTHOR CONTRIBUTIONS

ND wrote the major part of the paper and contributed to the conception and design of the review. PS designed the review and wrote a minor part of the paper. AS contributed to the conception and design of the review. All authors listed have made a substantial intellectual contribution to the work, revised the manuscript, and read and approved the submitted version.

# FUNDING

This work was co-financed by the Polish National Centre for Research and Development under the grant "Widespread Disaster Simulator – research and preparation for implementation" (project number POIR.01.01.01-00.0042/16; the Smart Growth Operational Programme, sub-measure 1.1.1. Industrial research and development work implemented by enterprises) received by Nano Games sp. z o.o.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02132/full#supplementary-material

FIGURE S1 | Flow chart of the search and screening process for the relevant literature.

TABLE S1 | Studies focusing on the temporal trajectory of the progression of simulator sickness.

TABLE S2 | Studies focusing on the possibility of adapting VR users in advance.

TABLE S3 | Studies focusing on how long the simulator sickness persists after VR exposure.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Dużmańska, Strojny and Strojny. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Past, Present, and Future of Virtual and Augmented Reality Research: A Network and Cluster Analysis of the Literature

Pietro Cipresso<sup>1,2</sup>\*, Irene Alice Chicchi Giglioli<sup>3</sup>, Mariano Alcañiz Raya<sup>3</sup> and Giuseppe Riva<sup>1,2</sup>

<sup>1</sup> Applied Technology for Neuro-Psychology Lab, Istituto Auxologico Italiano, Milan, Italy, <sup>2</sup> Department of Psychology, Catholic University of the Sacred Heart, Milan, Italy, <sup>3</sup> Instituto de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, Valencia, Spain

#### Edited by:

Albert Rizzo, University of Southern California, United States

#### Reviewed by:

Marco Fyfe Pietro Gillies, Goldsmiths, University of London, United Kingdom

Giulia Corno, Istituto Auxologico Italiano (IRCCS), Italy

\*Correspondence: Pietro Cipresso, p.cipresso@auxologico.it

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 14 December 2017 Accepted: 10 October 2018 Published: 06 November 2018

#### Citation:

Cipresso P, Giglioli IAC, Raya MA and Riva G (2018) The Past, Present, and Future of Virtual and Augmented Reality Research: A Network and Cluster Analysis of the Literature. Front. Psychol. 9:2086. doi: 10.3389/fpsyg.2018.02086

The recent appearance of low-cost virtual reality (VR) technologies, like the Oculus Rift, the HTC Vive and the Sony PlayStation VR, and Mixed Reality Interfaces (MRITF), like the Hololens, is attracting the attention of users and researchers, suggesting it may be the next largest stepping stone in technological innovation. However, the history of VR technology is longer than it may seem: the concept of VR was formulated in the 1960s and the first commercial VR tools appeared in the late 1980s. For this reason, during the last 20 years, hundreds of researchers have explored the processes, effects, and applications of this technology, producing thousands of scientific papers. What is the outcome of this significant research work? This paper aims to answer this question by exploring the existing research corpus in the field using advanced scientometric techniques. We collected all the existing articles about VR in the Web of Science Core Collection scientific database, and the resulting dataset contained 21,667 records for VR and 9,944 for augmented reality (AR). The bibliographic record contained various fields, such as author, title, abstract, country, and all the references (needed for the citation analysis). The network and cluster analysis of the literature showed a composite panorama characterized by changes and evolution over time. Indeed, whereas until 5 years ago the main publication media on VR were both conference proceedings and journals, more recently journals have constituted the main medium of communication. Similarly, whereas at first computer science was the leading research field, nowadays clinical areas have grown, as has the number of countries involved in VR research. The present work discusses the evolution and changes over time in the use of VR in the main areas of application, with an emphasis on VR's expected future capacities, growth and challenges. We conclude by considering the disruptive contribution that VR/AR/MRITF will be able to make in scientific fields, as well as in human communication and interaction, as already happened with the advent of mobile phones, by increasing the use and development of scientific applications (e.g., in clinical areas) and by modifying social communication and interaction among people.

Keywords: virtual reality, augmented reality, quantitative psychology, measurement, psychometrics, scientometrics, computational psychometrics, mathematical psychology

# INTRODUCTION

In the last 5 years, virtual reality (VR) and augmented reality (AR) have attracted the interest of investors and the general public, especially after Mark Zuckerberg bought Oculus for two billion dollars (Luckerson, 2014; Castelvecchi, 2016). Currently, many other companies, such as Sony, Samsung, HTC, and Google, are making huge investments in VR and AR (Korolov, 2014; Ebert, 2015; Castelvecchi, 2016). However, whereas VR has been used in research for more than 25 years, and there are now thousands of papers and many researchers in the field, comprising a strong, interdisciplinary community, AR has a more recent application history (Burdea and Coiffet, 2003; Kim, 2005; Bohil et al., 2011; Cipresso and Serino, 2014; Wexelblat, 2014). The study of VR was initiated in the computer graphics field and has been extended to several disciplines (Sutherland, 1965, 1968; Mazuryk and Gervautz, 1996; Choi et al., 2015). Currently, videogames supported by VR tools are more popular than in the past, and they also represent valuable, work-related tools for neuroscientists, psychologists, biologists, and other researchers. For example, one of the main research uses lies in navigation studies, which include complex experiments that can be done in a laboratory by using VR, whereas, without VR, the researchers would have to go directly into the field, possibly with limited scope for intervention. The importance of navigation studies for the functional understanding of human memory in dementia has been a topic of significant interest for a long time, and, in 2014, the Nobel Prize in "Physiology or Medicine" was awarded to John M. O'Keefe, May-Britt Moser, and Edvard I. Moser for their discoveries of nerve cells in the brain that enable a sense of place and navigation. Journals and magazines have extended this knowledge by writing about "the brain GPS," which gives a clear idea of the mechanism. A huge number of studies have been conducted in clinical settings by using VR (Bohil et al., 2011; Serino et al., 2014), and Nobel Prize winner Edvard I. Moser commented on the use of VR (Minderer et al., 2016), highlighting its importance for research and clinical practice. Moreover, the availability of free tools for VR experimental and computational use has made it easy to access any field (Riva et al., 2011; Cipresso, 2015; Brown and Green, 2016; Cipresso et al., 2016).

Augmented reality is a more recent technology than VR and shows an interdisciplinary application framework, in which, nowadays, education and learning seem to be the main field of research. Indeed, AR can support learning, for example by improving content understanding and memory preservation, as well as learning motivation. However, whereas VR benefits from clear and more definite fields of application and research areas, AR is still emerging in the scientific scene.

In this article, we present a systematic and computational analysis of the emerging interdisciplinary VR and AR fields in terms of various co-citation networks in order to explore the evolution of the intellectual structure of this knowledge domain over time.

# Virtual Reality Concepts and Features

The concept of VR can be traced to the mid-1960s, when Ivan Sutherland, in a pivotal manuscript, attempted to describe VR as a window through which a user perceives the virtual world as if it looked, felt, and sounded real and in which the user could act realistically (Sutherland, 1965).

Since that time and in accordance with the application area, several definitions have been formulated: for example, Fuchs and Bishop (1992) defined VR as "real-time interactive graphics with 3D models, combined with a display technology that gives the user the immersion in the model world and direct manipulation"; Gigante (1993) described VR as "The illusion of participation in a synthetic environment rather than external observation of such an environment. VR relies on a 3D, stereoscopic head-tracker displays, hand/body tracking and binaural sound. VR is an immersive, multi-sensory experience"; and Cruz-Neira (1993) stated that "Virtual reality refers to immersive, interactive, multi-sensory, viewer-centered, 3D computer generated environments and the combination of technologies required building environments".

As can be seen, these definitions, although different, highlight three common features of VR systems: immersion, the perception of being present in an environment, and interaction with that environment (Biocca, 1997; Lombard and Ditton, 1997; Loomis et al., 1999; Heeter, 2000; Biocca et al., 2001; Bailenson et al., 2006; Skalski and Tamborini, 2007; Andersen and Thorpe, 2009; Slater, 2009; Sundar et al., 2010). Specifically, immersion concerns the number of senses stimulated, the interactions, and the similarity to reality of the stimuli used to simulate environments. This feature can depend on the properties of the technological system used to isolate the user from reality (Slater, 2009).

Higher or lower degrees of immersion can depend on the three types of VR systems provided to the user:


perceived as real (Loomis et al., 1999; Heeter, 2000; Biocca et al., 2001).

Finally, the user's VR experience can be characterized by measuring the levels of presence, realism, and reality. Presence is a complex psychological feeling of "being there" in VR that involves the sensation and perception of physical presence, as well as the possibility of interacting and reacting as if the user were in the real world (Heeter, 1992). Similarly, the level of realism corresponds to the degree of expectation that the user has about the stimuli and the experience (Baños et al., 2000, 2009). If the presented stimuli are similar to reality, the VR user's expectations will be congruent with real-world expectations, enhancing the VR experience. In the same way, the higher the degree of reality in the interaction with the virtual stimuli, the higher the level of realism of the user's behaviors (Baños et al., 2000, 2009).

# From Virtual to Augmented Reality

Looking chronologically at VR and AR developments, we can trace the first 3D immersive simulator to 1962, when Morton Heilig created Sensorama, a simulated experience of a motorcycle ride through Brooklyn characterized by several sensory impressions, such as audio, olfactory, and haptic stimuli, as well as wind, to provide a realistic experience (Heilig, 1962). In the same years, Ivan Sutherland developed The Ultimate Display, which, in addition to sound, smell, and haptic feedback, included interactive graphics that Sensorama did not provide. Furthermore, Philco developed the first HMD, which, together with Sutherland's The Sword of Damocles, was able to update the virtual images by tracking the user's head position and orientation (Sutherland, 1965). In the 1970s, the University of North Carolina built GROPE, the first force-feedback system, and Myron Krueger created VIDEOPLACE, an Artificial Reality in which the users' body figures were captured by cameras and projected on a screen (Krueger et al., 1985). In this way, two or more users could interact in the 2D virtual space. In 1982, the US Air Force created the first flight simulator [Visually Coupled Airborne System Simulator (VCASS)], in which the pilot could control the pathway and the targets through an HMD. Generally, the 1980s were the years in which the first commercial devices began to emerge: for example, in 1985 the VPL company commercialized the DataGlove, a glove equipped with sensors able to measure the flexion, orientation, and position of the fingers and to identify hand gestures. Another example is the Eyephone, created in 1988 by the VPL company, an HMD system for completely immersing the user in a virtual world. At the end of the 1980s, Fake Space Labs created the Binocular-Omni-Orientational Monitor (BOOM), a complex system composed of a stereoscopic display device, providing a moving and broad virtual environment, and a mechanical tracking arm. Furthermore, BOOM offered a more stable image and responded more quickly to movements than the HMD devices. Thanks to BOOM and DataGlove, the NASA Ames Research Center developed the Virtual Wind Tunnel in order to study and manipulate airflow in a virtual airplane or spaceship. In 1992, the Electronic Visualization Laboratory of the University of Illinois created the CAVE Automatic Virtual Environment, an immersive VR system composed of projectors directed at three or more walls of a room.

More recently, many video game companies have improved the development and quality of VR devices, such as the Oculus Rift and the HTC Vive, which provide a wider field of view and lower latency. In addition, current HMD devices can now be combined with other tracking systems, such as eye-tracking systems (FOVE) and motion and orientation sensors (e.g., Razer Hydra, Oculus Touch, or HTC Vive).

Simultaneously, at the beginning of the 1990s, the Boeing Corporation created the first prototype of an AR system to show employees how to set up a wiring tool (Carmigniani et al., 2011). At the same time, Rosenberg and Feiner developed an AR fixture for maintenance assistance, showing that operator performance was enhanced by adding virtual information to the fixture to be repaired (Rosenberg, 1993). In 1993, Loomis and colleagues produced a GPS-based AR system that assisted blind users in navigation by adding spatial audio information (Loomis et al., 1998). Also in 1993, Julie Martin developed "Dancing in Cyberspace," an AR theater in which actors interacted with virtual objects in real time (Cathy, 2011). A few years later, Feiner et al. (1997) developed the first Mobile AR System (MARS), able to add virtual information about tourist buildings. Since then, several applications have been developed: Thomas et al. (2000) created ARQuake, a mobile AR video game; in 2008, Wikitude was created, which could add information about the user's environment through the mobile camera, the internet, and GPS (Perry, 2008). In 2009, other AR applications, such as AR Toolkit and SiteLens, were developed in order to add virtual information to the user's physical surroundings. In 2011, Total Immersion developed D'Fusion, an AR system for designing projects (Maurugeon, 2011). Finally, in 2013 and 2015, Google Glass and Microsoft HoloLens were developed, respectively, and their usability has begun to be tested in several fields of application.

# Virtual Reality Technologies

Technologically, the devices used in virtual environments play an important role in the creation of successful virtual experiences. According to the literature, input and output devices can be distinguished (Burdea et al., 1996; Burdea and Coiffet, 2003). Input devices are those that allow the user to communicate with the virtual environment; they range from a simple joystick or keyboard to a glove that captures finger movements or a tracker able to capture postures. In more detail, the keyboard, mouse, trackball, and joystick are easy-to-use desktop input devices that allow the user to send continuous and discrete commands or movements to the environment. Other input devices include tracking devices, such as bend-sensing gloves that capture hand movements, postures, and gestures, pinch gloves that detect finger movements, and trackers able to follow the user's movements in the physical world and translate them into the virtual environment.

In contrast, output devices allow the user to see, hear, smell, or touch everything that happens in the virtual environment. As mentioned above, visual devices cover a wide range of possibilities, from the simplest or least immersive (a computer monitor) to the most immersive, such as VR glasses, helmets, HMDs, or CAVE systems.

Furthermore, auditory devices, such as speakers, and haptic output devices are able to stimulate the body's senses, providing a more realistic virtual experience. For example, haptic devices can stimulate the sense of touch and apply force models to the user.

# Virtual Reality Applications


Since its appearance, VR has been used in different fields, such as gaming (Zyda, 2005; Meldrum et al., 2012), military training (Alexander et al., 2017), architectural design (Song et al., 2017), education (Englund et al., 2017), learning and social skills training (Schmidt et al., 2017), and simulations of surgical procedures (Gallagher et al., 2005); assistance to the elderly and psychological treatments are other fields in which VR is growing strongly (Freeman et al., 2017; Neri et al., 2017). A recent and extensive review by Slater and Sanchez-Vives (2016) reported the main evidence on VR applications, including weaknesses and advantages, in several research areas, such as science, education, training, physical training, social phenomena, and moral behaviors, and noted that VR could be used in other fields, such as travel, meetings, collaboration, industry, news, and entertainment. Furthermore, another review by Freeman et al. (2017) focused on VR in mental health, showing the efficacy of VR in assessing and treating different psychological disorders, such as anxiety, schizophrenia, depression, and eating disorders.

VR can be used as a stimulus in many ways, replacing real stimuli and recreating, with high realism, experiences that would be impossible in the real world. This is why VR is widely used in research on new ways of applying psychological treatment or training, for example, for problems arising from phobias (agoraphobia, fear of flying, etc.) (Botella et al., 2017). It is also used to improve traditional systems of motor rehabilitation (Llorens et al., 2014; Borrego et al., 2016) by developing games that improve the tasks. In more detail, in psychological treatment, Virtual Reality Exposure Therapy (VRET) has shown its efficacy, allowing patients to gradually face feared stimuli or stressful situations in a safe environment where the psychological and physiological reactions can be controlled by the therapist (Botella et al., 2017).

# Augmented Reality Concept

Milgram and Kishino (1994) conceptualized the Virtual-Reality Continuum, which takes into consideration four systems: real environment, augmented reality (AR), augmented virtuality, and virtual environment. AR can be defined as a newer technological system in which virtual objects are added to the real world in real time during the user's experience. Per Azuma et al. (2001), an AR system should: (1) combine real and virtual objects in a real environment; (2) run interactively and in real time; (3) register real and virtual objects with each other. Furthermore, even if AR experiences may seem different from VR experiences, the quality of the AR experience can be assessed in a similar way. Indeed, as in VR, the feeling of presence, the level of realism, and the degree of reality are the main features that can be considered indicators of the quality of AR experiences. The more the experience is perceived as realistic, and the greater the congruence between the user's expectations and the interaction inside the AR environment, the higher the perception of "being there" physically, cognitively, and emotionally. The feeling of presence, in both AR and VR environments, is important for eliciting behaviors like real ones (Botella et al., 2005; Juan et al., 2005; Bretón-López et al., 2010; Wrzesien et al., 2013).

# Augmented Reality Technologies

Technologically, AR systems, however varied, share three common components: a geospatial datum for the virtual object, such as a visual marker; a surface on which to project virtual elements to the user; and adequate processing power for graphics, animation, and merging of images, such as a PC and a monitor (Carmigniani et al., 2011). To run, an AR system must also include a camera able to track the user's movement for merging the virtual objects, and a visual display, such as glasses, through which the user can see the virtual objects overlaid on the physical world. To date, two display systems exist: video see-through (VST) and optical see-through (OST) AR systems (Botella et al., 2005; Juan et al., 2005, 2007). The first presents virtual objects to the user by capturing the real objects or scenes with a camera and overlaying virtual objects on them, projecting the result on a video display or a monitor, while the second merges the virtual objects on a transparent surface, such as glasses, through which the user sees the added elements. The main difference between the two systems is latency: an OST system could require more time to display the virtual objects than a VST system, generating a time lag between the user's actions and performance and their detection by the system.

# Augmented Reality Applications

Although AR is a more recent technology than VR, it has been investigated and used in several research areas such as architecture (Lin and Hsu, 2017), maintenance (Schwald and De Laval, 2003), entertainment (Ozbek et al., 2004), education (Nincarean et al., 2013; Bacca et al., 2014; Akçayır and Akçayır, 2017), medicine (De Buck et al., 2005), and psychological treatments (Juan et al., 2005; Botella et al., 2005, 2010; Bretón-López et al., 2010; Wrzesien et al., 2011a,b, 2013; see the review Chicchi Giglioli et al., 2015). In more detail, in education several AR applications have been developed in the last few years, showing the positive effects of this technology in supporting learning, such as increased content understanding and memory preservation, as well as learning motivation (Radu, 2012, 2014). For example, Ibáñez et al. (2014) developed an AR application for learning electromagnetism concepts, in which students could use AR batteries, magnets, and cables on real surfaces, and the system gave students real-time feedback about the correctness of their performance, thereby improving academic success and motivation (Di Serio et al., 2013). More deeply, AR systems make it possible to learn by visualizing and acting on composite phenomena that students traditionally study theoretically, without the possibility of seeing and testing them in the real world (Chien et al., 2010; Chen et al., 2011).

TABLE 1 | Category statistics from the WoS for the entire period and the last 5 years.


In psychological health as well, the amount of research on AR is increasing, showing its efficacy above all in the treatment of psychological disorders (see the reviews Baus and Bouchard, 2014; Chicchi Giglioli et al., 2015). For example, in the treatment of anxiety disorders, such as phobias, AR exposure therapy (ARET) showed its efficacy in one-session treatment, maintaining the positive impact at a follow-up 1 or 3 months later. Like VRET, ARET provides a safe and ecological environment where any kind of stimulus is possible, allowing control to be kept over the situation experienced by the patients while gradually generating situations of fear or stress. Indeed, in situations of fear, such as phobias of small animals, AR applications make it possible, in accordance with the patient's anxiety, to gradually expose the patient to feared animals, adding new animals during the session, enlarging them, or increasing their speed. The various studies showed that AR is able to activate the patient's anxiety at the beginning of the session and to reduce it after 1 h of exposure. After the session, patients were not only better able to manage their fear of animals and their anxiety, but were also able to approach, interact with, and kill real feared animals.

# MATERIALS AND METHODS

# Data Collection

The input data for the analyses were retrieved from the scientific database Web of Science Core Collection (Falagas et al., 2008) and the search terms used were "Virtual Reality" and "Augmented Reality" regarding papers published during the whole timespan covered.

Web of science core collection is composed of: Citation Indexes, Science Citation Index Expanded (SCI-EXPANDED) –1970-present, Social Sciences Citation Index (SSCI) –1970-present, Arts and Humanities Citation Index (A&HCI) –1975-present, Conference Proceedings Citation Index- Science (CPCI-S) –1990-present, Conference Proceedings Citation Index- Social Science & Humanities (CPCI-SSH) –1990-present, Book Citation Index– Science (BKCI-S) –2009-present, Book Citation Index– Social Sciences & Humanities (BKCI-SSH) –2009-present, Emerging Sources Citation Index (ESCI) –2015-present, Chemical Indexes, Current Chemical Reactions (CCR-EXPANDED) – 2009-present (Includes Institut National de la Propriete Industrielle structure data back to 1840), Index Chemicus (IC) –2009-present.

The resultant dataset contained a total of 21,667 records for VR and 9,944 records for AR. Each bibliographic record contained various fields, such as author, title, abstract, and all of the references (needed for the citation analysis). The research tool used to visualize the networks was CiteSpace v.4.0.R5 SE (32 bit) (Chen, 2006) under Java Runtime v.8 update 91 (build 1.8.0\_91 b15). Statistical analyses were conducted using Stata MP-Parallel Edition, Release 14.0, StataCorp LP. Additional information can be found in **Supplementary Data Sheet 1**.

The betweenness centrality of a node in a network measures the extent to which the node is part of paths that connect an arbitrary pair of nodes in the network (Freeman, 1977; Brandes, 2001; Chen, 2006).
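To make the definition concrete, the following is a minimal sketch of the shortest-path counting scheme described by Brandes (2001), restricted to unweighted, undirected graphs. It is an illustration only and not the CiteSpace implementation; the function name `betweenness_centrality`, the adjacency-dictionary input, and the toy node labels are assumptions introduced here.

```python
from collections import deque


def betweenness_centrality(adj):
    """Betweenness for an unweighted, undirected graph (after Brandes, 2001).

    adj maps each node to the set of its neighbours. Scores are normalised
    to [0, 1] by the number of node pairs that exclude the node itself.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Every unordered pair is visited twice (once from each endpoint),
    # so dividing by (n - 1)(n - 2) normalises by (n - 1)(n - 2) / 2 pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


# Hypothetical three-node chain: every A-C path passes through B.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
centrality = betweenness_centrality(chain)
```

In this toy chain the middle node lies on the only path between the two end nodes, so it receives the maximum normalised score while the end nodes receive zero.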

Structural metrics include betweenness centrality, modularity, and silhouette. Temporal and hybrid metrics include citation burstness and novelty. All the algorithms are detailed in Chen et al. (2010).

# RESULTS

The analysis of the literature on VR shows a complex panorama. At first sight, according to the document-type statistics from the Web of Science (WoS), proceedings papers were used extensively as outcomes of research, comprising almost 48% of the total (10,392 proceedings), with a similar number of articles on the subject, amounting to about 47% of the total (10,199 articles). However, if we consider only the last 5 years (7,755 articles representing about 36% of the total), the situation changes, with about 57% for articles (4,445) and about 33% for proceedings (2,578). Thus, it is clear that the VR field has changed in areas other than at the technological level.
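The shares quoted above can be recomputed from the reported counts. The short sketch below does so, assuming that the whole-period percentages are taken over the 21,667 VR records of the dataset and that the last-5-years percentages are taken over the 7,755 recent records; the helper name `share` is introduced here for illustration.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


TOTAL_VR = 21667  # all VR records retrieved from the WoS
LAST_5Y = 7755    # VR records from the last 5 years

whole_period = {
    "proceedings": share(10392, TOTAL_VR),
    "articles": share(10199, TOTAL_VR),
}
last_five_years = {
    "share_of_total": share(LAST_5Y, TOTAL_VR),
    "articles": share(4445, LAST_5Y),
    "proceedings": share(2578, LAST_5Y),
}
```

Under these assumptions the recomputed values agree with the rounded percentages given in the text.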

For the subject categories, nodes and edges are computed as co-occurring subject categories from the Web of Science "Category" field in all the articles.

According to the subject category statistics from the WoS, computer science is the leading category, followed by engineering, and, together, they account for 15,341 articles, which make up about 71% of the total production. However, if we consider just the last 5 years, these categories reach only about 55%, with a total of 4,284 articles (**Table 1** and **Figure 1**).

The evidence is very interesting since it highlights that VR is doing very well as new technology with huge interest in hardware and software components. However, with respect to the past, we are witnessing increasing numbers of applications, especially in the medical area. In particular, note its inclusion in the top 10 list of rehabilitation and clinical neurology categories (about 10% of the total production in the last 5 years). It also is interesting that neuroscience and neurology, considered together, have shown an increase from about 12% to about 18.6% over the last 5 years. However, historic areas, such as automation and control systems, imaging science and photographic technology, and robotics, which had accounted for about 14.5% of the total articles ever produced were not even in the top 10 for the last 5 years, with each one accounting for less than 4%.

For the countries, nodes and edges are computed as networks of co-authors' countries. Multiple occurrences of a country in the same paper are counted once.
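A minimal sketch of this counting rule is given below. It is not the CiteSpace procedure itself; the function name `country_network` and the sample records are hypothetical and serve only to show how a country repeated within one paper contributes a single node count and how pairs of distinct countries form edges.

```python
from collections import Counter
from itertools import combinations


def country_network(papers):
    """Build node and edge weights from per-paper author countries.

    papers is a list of lists; each inner list holds the country of every
    author of one paper. A country repeated within a paper counts once.
    """
    nodes, edges = Counter(), Counter()
    for author_countries in papers:
        unique = sorted(set(author_countries))
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


# Hypothetical records, for illustration only.
papers = [
    ["Italy", "Italy", "Spain"],
    ["USA", "Italy"],
    ["USA"],
]
nodes, edges = country_network(papers)
```

Sorting the deduplicated countries before pairing them keeps each undirected edge under a single key, so the same pair is never counted under two different orderings.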

The countries most involved in VR research published about 47% of the total (10,200 articles altogether). Of the 10,200 articles, the United States, China, England, and Germany published 4921, 2384, 1497, and 1398, respectively. The situation remains the same if we look at the articles published over the last 5 years. However, VR contributions also came from all over the globe, with Japan, Canada, Italy, France, Spain, South Korea, and the Netherlands taking positions of prominence, as shown in **Figure 2**.

Network analysis was conducted to calculate and to represent the centrality index (Freeman, 1977; Brandes, 2001), i.e., the dimension of the node in **Figure 2**. The top-ranked country, with a centrality index of 0.26, was the United States (2011), and England was second, with a centrality index of 0.25. The third, fourth, and fifth countries were Germany, Italy, and Australia, with centrality indices of 0.15, 0.15, and 0.14, respectively.

For the institutions, nodes and edges are computed as networks of co-authors' institutions (**Figure 3**).

The top-level institutions in VR were in the United States, where three universities were ranked as the top three in the world for published articles; these universities were the University of Illinois (159), the University of Southern California (147), and the University of Washington (146). The United States also had the eighth-ranked university, which was Iowa State University (116). The second country in the ranking was Canada, with the University of Toronto, which was ranked fifth with 125 articles, and McGill University, ranked 10th with 103 articles.

Other countries in the top-ten list were the Netherlands, with the Delft University of Technology ranked fourth with 129 articles; Italy, with IRCCS Istituto Auxologico Italiano ranked sixth with 125 published articles (the same number of publications as the institution ranked fifth); England, ranked seventh with 125 articles from the University of London's Imperial College of Science, Technology, and Medicine; and China, with the Chinese Academy of Science ranked ninth with 104 publications. Italy's Istituto Auxologico Italiano was the only non-university institution in the top-10 list for VR research (**Figure 3**).

For the journals, nodes and edges are computed as journal co-citation networks among the journals in the corresponding field.

The top-ranked Journals for citations in VR are Presence: Teleoperators & Virtual Environments with 2689 citations and CyberPsychology & Behavior (Cyberpsychol BEHAV) with 1884 citations; however, looking at the last 5 years, the former had increased the citations, but the latter had a far more significant increase, from about 70% to about 90%, i.e., an increase from 1029 to 1147.

Following the top two journals, IEEE Computer Graphics and Applications (IEEE Comput Graph) and Advanced Health Telematics and Telemedicine (St HEAL T) were both left out of the top-10 list based on the last 5 years. The data for the last 5 years also resulted in the inclusion of three journals in the top-10 list, Experimental Brain Research (Exp BRAIN RES) (625 citations), Archives of Physical Medicine and Rehabilitation (Arch PHYS MED REHAB) (622 citations), and Plos ONE (619 citations), which highlighted the categories of rehabilitation and clinical neurology and of neuroscience and neurology. The journal co-citation analysis is reported in **Figure 4**, which clearly shows four distinct clusters.

Network analysis was conducted to calculate and to represent the centrality index, i.e., the dimensions of the nodes in **Figure 4**. The top-ranked item by centrality was Cyberpsychol BEHAV, with a centrality index of 0.29. The second-ranked item was Arch PHYS MED REHAB, with a centrality index of 0.23. The third was Behaviour Research and Therapy (Behav RES THER), with a centrality index of 0.15. The fourth was BRAIN, with a centrality index of 0.14. The fifth was Exp BRAIN RES, with a centrality index of 0.11.

# Who's Who in VR Research

Authors are the heart and brain of research, and their roles in a field are to define the past, present, and future of disciplines and to make significant breakthroughs to make new ideas arise (**Figure 5**).

Virtual reality research is very young and changing with time, but the top-10 authors in this field have made fundamentally significant contributions as pioneers in VR and taking it beyond a mere technological development. The purpose of the following highlights is not to rank researchers; rather, the purpose is to identify the most active researchers in order to understand where the field is going and how they plan for it to get there.

The top-ranked author is Riva G, with 180 publications. The second-ranked author is Rizzo A, with 101 publications. The third is Darzi A, with 97 publications. The fourth is Aggarwal R, with 94 publications. The six authors following these four are Slater M, Alcaniz M, Botella C, Wiederhold BK, Kim SI, and Gutierrez-Maldonado J with 90, 90, 85, 75, 59, and 54 publications, respectively (**Figure 6**).

Considering the last 5 years, the situation remains similar, with three new entries in the top-10 list, i.e., Muhlberger A, Cipresso P, and Ahmed K ranked 7th, 8th, and 10th, respectively.

The authors' publications number network shows the most active authors in VR research. Another relevant analysis for our focus on VR research is to identify the most cited authors in the field.

For this purpose, the authors' co-citation analysis highlights the authors in terms of their impact on the literature, considering the entire time span of the field (White and Griffith, 1981; González-Teruel et al., 2015; Bu et al., 2016). The idea is to focus on the dynamic nature of the community of authors who contribute to the research.

Normally, authors with higher numbers of citations tend to be the scholars who drive the fundamental research and who make the most meaningful impacts on the evolution and development of the field. In the following, we identified the most-cited pioneers in the field of VR Research.

The top-ranked author by citation count is Gallagher (2001), with 694 citations. Second is Seymour (2004), with 668 citations. Third is Slater (1999), with 649 citations. Fourth is Grantcharov (2003), with 563 citations. Fifth is Riva (1999), with 546 citations. Sixth is Aggarwal (2006), with 505 citations. Seventh is Satava (1994), with 477 citations. Eighth is Witmer (2002), with 454 citations. Ninth is Rothbaum (1996), with 448 citations. Tenth is Cruz-Neira (1995), with 416 citations.

# Citation Network and Cluster Analyses for VR

Another analysis that can be used is the analysis of document cocitation, which allows us to focus on the highly-cited documents that generally are also the most influential in the domain (Small, 1973; González-Teruel et al., 2015; Orosz et al., 2016).
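As a rough illustration of the idea behind document co-citation (Small, 1973), two documents are co-cited whenever they appear together in the reference list of a later paper, and the co-citation count of a pair is the number of papers in which this happens. The sketch below counts such pairs; the function name `cocitation_counts` and the reference labels are hypothetical, and the code is not the procedure used by CiteSpace.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited by the same paper."""
    pairs = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each unordered pair has one key.
        pairs.update(combinations(sorted(set(refs)), 2))
    return pairs


# Hypothetical reference lists of three citing papers.
reference_lists = [
    ["Ref A", "Ref B", "Ref C"],
    ["Ref A", "Ref B"],
    ["Ref B", "Ref C", "Ref C"],
]
pairs = cocitation_counts(reference_lists)
strongest_pair, strength = pairs.most_common(1)[0]
```

The resulting pair counts can serve as edge weights of a co-citation network, on which clustering and centrality measures are then computed.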

The top-ranked article by citation counts is Seymour (2002) in Cluster #0, with 317 citations. The second article is Grantcharov (2004) in Cluster #0, with 286 citations. The third is Holden (2005) in Cluster #2, with 179 citations. The 4th is Gallagher et al. (2005) in Cluster #0, with 171 citations. The 5th is Ahlberg (2007) in Cluster #0, with 142 citations. The 6th is Parsons (2008) in Cluster #4, with 136 citations. The 7th is Powers (2008) in Cluster #4, with 134 citations. The 8th is Aggarwal (2007) in Cluster #0, with 121 citations. The 9th is Reznick (2006) in Cluster #0, with 121 citations. The 10th is Munz (2004) in Cluster #0, with 117 citations.

The network of document co-citations is visually complex (**Figure 7**) because it includes thousands of articles and the links among them. However, this analysis is very important because it can be used to identify the possible conglomerates of knowledge in the area, and this is essential for a deep understanding of the area. Thus, for this purpose, a cluster analysis was conducted (Chen et al., 2010; González-Teruel et al., 2015; Klavans and Boyack, 2015). **Figure 8** shows the clusters, which are identified with the two algorithms in **Table 2**.

The identified clusters highlight clear parts of the literature of VR research, making clear and visible the interdisciplinary nature of this field. However, the dynamics that identify the past, present, and future of VR research are not yet clear from the clusters alone. We therefore analysed the relationships between these clusters and the temporal dimensions of each article. The results are synthesized in **Figure 9**. It is clear that cluster #0 (laparoscopic skill), cluster #2 (gaming and rehabilitation), cluster #4 (therapy), and cluster #14 (surgery) are the most popular areas of VR research. (See **Figure 9** and **Table 2** to identify the clusters.) From **Figure 9**, it also is possible to identify the first phase of laparoscopic skill (cluster #6) and therapy (cluster #7). More generally, it is possible to identify four historical phases (colors: blue, green, yellow, and red) from the past VR research to the current research.

We were able to identify the top 486 references that had the most citations by using a citation burst detection algorithm. A citation burst is an indicator of a highly active area of research. It is the detection of a burst event, which can last for multiple years as well as a single year, and it provides evidence that a particular publication is associated with a surge of citations. The burst detection was based on Kleinberg's algorithm (Kleinberg, 2002, 2003). The top-ranked document by bursts is Seymour (2002) in Cluster #0, with bursts of 88.93. The second is Grantcharov (2004) in Cluster #0, with bursts of 51.40. The third is Saposnik (2010) in Cluster #2, with bursts of 40.84. The fourth is Rothbaum (1995) in Cluster #7, with bursts of 38.94. The fifth is Holden (2005) in Cluster #2, with bursts of 37.52. The sixth is Scott (2000) in Cluster #0, with bursts of 33.39. The seventh is Saposnik (2011) in Cluster #2, with bursts of 33.33. The eighth is Burdea et al. (1996) in Cluster #3, with bursts of 32.42. The ninth is Burdea and Coiffet (2003) in Cluster #22, with bursts of 31.30. The 10th is Taffinder (1998) in Cluster #6, with bursts of 30.96 (**Table 3**).
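To give an intuition for how such bursts are found, the following is a simplified sketch of a two-state, batched burst model in the spirit of Kleinberg (2002). It is not the CiteSpace implementation and does not reproduce the burst values reported above. The function name `burst_states`, the parameters `s` and `gamma` with their default values, and the yearly counts are assumptions introduced for illustration; the sketch also assumes that the document is cited at least once and never accounts for all citations of the whole period.

```python
import math


def burst_states(r, d, s=2.0, gamma=1.0):
    """Label each year as baseline (0) or burst (1).

    r[t] is the number of citations to one document in year t and d[t]
    the number of all citations in year t. The cheapest state sequence
    is found by dynamic programming (Viterbi).
    """
    n = len(r)
    p0 = sum(r) / sum(d)            # baseline citation rate
    p1 = min(s * p0, 0.9999)        # elevated rate in the burst state
    up_cost = gamma * math.log(n)   # cost of entering the burst state

    def fit_cost(p, rt, dt):
        # Negative log-likelihood of rt hits among dt events (binomial).
        log_binom = (math.lgamma(dt + 1) - math.lgamma(rt + 1)
                     - math.lgamma(dt - rt + 1))
        return -(log_binom + rt * math.log(p) + (dt - rt) * math.log(1 - p))

    cost = [0.0, math.inf]          # the automaton starts in state 0
    back = []
    for rt, dt in zip(r, d):
        step, new_cost = [], []
        for j, p in enumerate((p0, p1)):
            # Moving 0 -> 1 pays up_cost; every other move is free.
            options = [cost[0] + (up_cost if j == 1 else 0.0), cost[1]]
            i = 0 if options[0] <= options[1] else 1
            step.append(i)
            new_cost.append(options[i] + fit_cost(p, rt, dt))
        back.append(step)
        cost = new_cost

    # Trace the cheapest path backwards from the best final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for step in reversed(back):
        states.append(state)
        state = step[state]
    return states[::-1]


# Hypothetical yearly counts: a surge of citations in years 3 and 4.
cited = [1, 1, 1, 10, 12, 1, 1]
total = [100] * 7
states = burst_states(cited, total)
```

Larger values of `gamma` make the model more reluctant to enter the burst state, so only longer or more intense surges are flagged, while larger values of `s` require a higher citation rate before a year is treated as part of a burst.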

# Citation Network and Cluster Analyses for AR

Looking at Augmented Reality scenario, the top ranked item by citation counts is Azuma (1997) in Cluster #0, with citation counts of 231. The second one is Azuma et al. (2001) in Cluster #0, with citation counts of 220. The third is Van Krevelen (2010) in Cluster #5, with citation counts of 207. The 4th is Lowe (2004) in Cluster #1, with citation counts of 157. The 5th is Wu (2013) in Cluster #4, with citation counts of 144. The 6th is Dunleavy (2009) in Cluster #4, with citation counts of 122. The 7th is Zhou (2008) in Cluster #5, with citation counts of 118. The 8th is Bay (2008) in Cluster #1, with citation counts of 117. The 9th is Newcombe (2011) in Cluster #1, with citation counts of 109. The 10th is Carmigniani et al. (2011) in Cluster #5, with citation counts of 104.

The network of document co-citations is visually complex (**Figure 10**) because it includes thousands of articles and the links among them. However, this analysis is very important because it can be used to identify the possible conglomerates of knowledge in the area, and this is essential for a deep understanding of the area. Thus, for this purpose, a cluster analysis was conducted (Chen et al., 2010; González-Teruel et al., 2015; Klavans and Boyack, 2015). **Figure 11** shows the clusters, which are identified with the two algorithms in **Table 3**.

The identified clusters highlight clear parts of the literature of AR research, making clear and visible the interdisciplinary nature of this field. However, the dynamics that identify the past, present, and future of AR research are not yet clear from the clusters alone. We therefore analysed the relationships between these clusters and the temporal dimensions of each article. The results are synthesized in **Figure 12**. It is clear that cluster #1 (tracking), cluster #4 (education), and cluster #5 (virtual city environment) are the current areas of AR research. (See **Figure 12** and **Table 3** to identify the clusters.) It is possible to identify four historical phases (colors: blue, green, yellow, and red) from the past AR research to the current research.

We were able to identify the top 394 references that had the most citations by using a citation burst detection algorithm. A citation burst is an indicator of a highly active area of research. It is the detection of a burst event, which can last for multiple years as well as a single year, and it provides evidence that a particular publication is associated with a surge of citations. The burst detection was based on Kleinberg's algorithm (Kleinberg, 2002, 2003). The top-ranked document by bursts is Azuma (1997) in Cluster #0, with bursts of 101.64. The second one is Azuma et al. (2001) in Cluster #0, with bursts of 84.23. The third is Lowe (2004) in Cluster #1, with bursts of 64.07. The 4th is Van Krevelen (2010) in Cluster #5, with bursts of 50.99. The 5th is Wu (2013) in Cluster #4, with bursts of 47.23. The 6th is Hartley (2000) in Cluster #0, with bursts of 37.71. The 7th is Dunleavy (2009) in Cluster #4, with bursts of 33.22. The 8th is Kato (1999) in Cluster #0, with bursts of 32.16. The 9th is Newcombe (2011) in Cluster #1, with bursts of 29.72. The 10th is Feiner (1993) in Cluster #8, with bursts of 29.46 (**Table 4**).

# DISCUSSION

Our findings have profound implications for two reasons. First, the present work highlighted the evolution and development of VR and AR research and provided a clear perspective based on solid data and computational analyses. Second, our findings on VR made it clear that the clinical dimension is one of the most investigated ever and seems to increase in quantitative and qualitative aspects, but also that it includes technological developments and articles in computer science, engineering, and allied sciences.

FIGURE 9 | Network of document co-citation: the dimensions of the nodes represent centrality, the dimensions of the characters represent the rank of the article, and the red writing on the right-hand side reports the number of the cluster, as in Table 2, with a short description that was extracted accordingly.

TABLE 3 | Cluster ID and references of burst article.

**Figure 9** clarifies the past, present, and future of VR research. The outset of VR research brought a clearly identifiable development in interfaces for children and medicine, routine use and behavioral assessment, special effects, systems perspectives, and tutorials. This pioneering era evolved into the period that we can identify as the development era, because it was the period in which VR was used in experiments associated with new technological impulses. Not surprisingly, this coincided exactly with the new economy era, in which significant investments were made in information technology, and it was also the era of the so-called 'dot-com bubble' in the late 1990s. The confluence of pioneering techniques into ergonomic studies within this development era was used to develop the first effective clinical systems for surgery, telemedicine, human spatial navigation, and the first phase of the development of therapy and laparoscopic skills. With the new millennium, VR research switched strongly toward what we can call the clinical-VR era, with its strong emphasis on rehabilitation, neurosurgery, and a new phase of therapy and laparoscopic skills. The number of applications and articles published in the last 5 years is in line with the new technological development that we are experiencing at the hardware level (for example, with so many new HMDs) and at the software level, with an increasing number of independent programmers and VR communities.


Finally, **Figure 12** identifies clusters of the literature on AR research, making clear and visible the interdisciplinary nature of this field. The dynamics of the past, present, and future of AR research are not yet fully clear, but an analysis of the relationships between these clusters and the temporal dimensions of each article indicates that tracking, education, and virtual city environment are the current areas of AR research. AR is a new technology that is showing its efficacy in different research fields and providing a novel way to gather behavioral data and support learning, training, and clinical treatments.

Looking at the scientific literature published in the last few years, it might appear that most developments in VR and AR studies have focused on clinical aspects. However, the reality is more complex; thus, this perception should be clarified. Although researchers publish studies on the use of VR in clinical settings, each study depends on the technologies available. Industrial development in VR and AR has changed considerably in the last 10 years. In the past, development mainly involved hardware solutions, whereas nowadays the main efforts pertain to software when developing virtual solutions. Hardware has become a commodity that is often available at low cost. Software, on the other hand, needs to be customized for each experiment, and this requires huge efforts in terms of development. Researchers in AR and VR today need to be able to adapt software in their labs.

Virtual reality and AR developments in this new clinical era rely on computer science and vice versa. The future of VR and AR is becoming more technological than before, and each day, new solutions and products are coming to the market. From both software and hardware perspectives, the future of AR and VR depends on huge innovations in all fields. The gap between the past and the future of AR and VR research concerns the "realism" that was the key aspect in the past versus the "interaction" that is the key aspect now. The first 30 years of VR and AR consisted of continuous research on better resolution and improved perception. Now, researchers have already achieved high resolution and need to focus on making VR as realistic as possible, which is not simple. In fact, a real experience implies realistic interaction and not just high resolution. Interactions can be improved in countless ways through new developments at hardware and software levels.

Interaction in AR and VR is going to be "embodied," with implications for neuroscientists who are thinking about new solutions to be implemented in the current systems (Blanke et al., 2015; Riva, 2018; Riva et al., 2018). For example, the use of hands with a contactless device (i.e., without gloves) makes the interaction in virtual environments more natural. The Leap Motion device<sup>1</sup> allows one to use one's hands in VR without gloves or markers. This simple and low-cost device allows VR users to interact with virtual objects and related environments in a naturalistic way. When technology is able to be transparent, users can experience an increased sense of being in the virtual environments (the so-called sense of presence).

Other forms of interaction are possible and have been developing continuously. For example, tactile and haptic devices are able to provide continuous feedback to the users, intensifying their experience by adding components such as the feeling of touch and the physical weight of virtual objects through force feedback. Another technology available at low cost that facilitates interaction is the motion tracking system, such as Microsoft Kinect. Such technology tracks the users' bodies, allowing them to interact with the virtual environments using body movements, gestures, and interactions. Most HMDs use an embedded system to track HMD position and rotation as well as controllers that are generally placed into the user's hands. This tracking allows a great degree of interaction and improves the overall virtual experience.

<sup>1</sup> https://www.leapmotion.com/

A final emerging approach is the use of digital technologies to simulate not only the external world but also internal bodily signals (Azevedo et al., 2017; Riva et al., 2017): interoception, proprioception, and vestibular input. For example, Riva et al. (2017) recently introduced the concept of "sonoception" (www.sonoception.com), a novel non-invasive technological paradigm based on wearable acoustic and vibrotactile transducers able to alter internal bodily signals. This approach allowed the development of an interoceptive stimulator that is able both to assess interoceptive time perception in clinical patients (Di Lernia et al., 2018b) and to enhance heart rate variability (the short-term vagally mediated component, rMSSD) through the modulation of the subjects' parasympathetic system (Di Lernia et al., 2018a).
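For readers unfamiliar with the rMSSD index mentioned above, it is conventionally defined as the root mean square of successive differences between adjacent heartbeat (RR) intervals. The short sketch below illustrates that standard definition only. The function name `rmssd` and the sample intervals are hypothetical, and the code makes no claim about the processing pipeline used in the cited studies.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (in ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds, for illustration only.
example_value = rmssd([800, 810, 790, 805])
```

Higher rMSSD values indicate greater beat-to-beat variability, which is why the index is commonly read as a marker of short-term, vagally mediated heart rate variability.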

In this scenario, it is clear that the future of VR and AR research is not just in clinical applications, although the implications for patients are huge. The continuous development of VR and AR technologies is the result of research in computer science, engineering, and allied sciences. There are three reasons why a "clinical era" emerged from our analyses. First, all clinical research on VR and AR also includes technological developments, and new technological discoveries are being published in clinical or technological journals but with clinical samples as the main subject. As noted in our research, the main journals that publish numerous articles on technological developments tested with both healthy participants and patients include Presence: Teleoperators & Virtual Environments, Cyberpsychology & Behavior (Cyberpsychol BEHAV), and IEEE Computer Graphics and Applications (IEEE Comput Graph). It is clear that researchers in psychology, neuroscience, medicine, and behavioral sciences in general have been investigating whether the technological developments of VR and AR are effective for users, indicating that clinical behavioral research has been incorporating large parts of computer science and engineering. A second aspect to consider is industrial development. Once a new technology is envisioned and created, it goes for a patent application. Once the patent is sent for registration, the new technology may be made available for the market, and eventually for journal submission and publication. Moreover, most VR and AR research that proposes the development of a technology moves directly from presenting the prototype to receiving the patent and introducing it to the market without publishing the findings in a scientific paper. Hence, it is clear that if a new technology has been developed for the industrial or consumer market, but not for clinical purposes, the research conducted to develop such technology may never be published in a scientific paper. Although our manuscript considered published research, we have to acknowledge the existence of several studies that have not been published at all. The third reason why our analyses highlighted a "clinical era" is that the articles on VR and AR considered here were drawn from the Web of Knowledge database, which is our source of references. In this article, "research" refers to the research included in that database. Of course, this is a limitation of our study, since there are several other databases that are of great value to the scientific community, such as IEEE Xplore Digital Library, ACM Digital Library, and many others. Generally, the most important articles in journals published in these databases are also included in the Web of Knowledge database; hence, we are convinced that our study considered the top-level publications in computer science or engineering. Accordingly, we believe that this limitation is mitigated by the large number of articles referenced in our research.

Considering all these aspects, it is clear that clinical applications, behavioral aspects, and technological developments in VR and AR research are parts of a more complex situation

compared to the old platforms used before the huge diffusion of HMD and solutions. We think that this work might provide a clearer vision for stakeholders, providing evidence of the current research frontiers and the challenges that are expected in the future, highlighting all the connections and implications of the research in several fields, such as clinical, behavioral, industrial, entertainment, educational, and many others.

# AUTHOR CONTRIBUTIONS

PC and GR conceived the idea. PC performed the data extraction and the computational analyses and wrote the first draft of the article. IG revised the introduction, adding important information for the article. PC, IG, MR, and GR revised the article and approved the last version of the article after important input to the article rationale.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.02086/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer GC declared a shared affiliation, with no collaboration, with the authors PC and GR to the handling Editor at the time of the review.

Copyright © 2018 Cipresso, Giglioli, Raya and Riva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Why Is Virtual Reality Interesting for Philosophers?

Thomas K. Metzinger<sup>1,2</sup>\*

<sup>1</sup> Philosophisches Seminar, Johannes Gutenberg-Universität, Mainz, Germany, <sup>2</sup> Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany

This article explores promising points of contact between philosophy and the expanding field of virtual reality research. Aiming at an interdisciplinary audience, it proposes a series of new research targets by presenting a range of concrete examples characterized by high theoretical relevance and heuristic fecundity. Among these examples are conscious experience itself, "Bayesian" and social VR, amnestic re-embodiment, merging human-controlled avatars and virtual agents, virtual ego-dissolution, controlling the reality/virtuality continuum, the confluence of VR and artificial intelligence (AI) as well as of VR and functional magnetic resonance imaging (fMRI), VR-based social hallucinations and the emergence of a virtual Lebenswelt, religious faith and practical phenomenology. Hopefully, these examples can serve as first proposals for intensified future interaction and mark out some potential new directions for research.

Keywords: virtual reality, augmented reality, mixed reality, emptiness, philosophy of religion, life-world, social hallucinations, consciousness

#### Edited by:

Massimo Bergamasco, Scuola Sant'Anna di Studi Avanzati, Italy

#### Reviewed by:

Torsten Kuhlen, RWTH Aachen Universität, Germany Franco Fabbro, Università degli Studi di Udine, Italy

> \*Correspondence: Thomas K. Metzinger metzinger@uni-mainz.de

#### Specialty section:

This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI

Received: 10 September 2017 Accepted: 31 July 2018 Published: 13 September 2018

#### Citation:

Metzinger TK (2018) Why Is Virtual Reality Interesting for Philosophers? Front. Robot. AI 5:101. doi: 10.3389/frobt.2018.00101

"Virtual reality encompasses virtual unreality" (Slater and Sanchez-Vives, 2016, p. 38).

# INTRODUCTION

What are the most promising future directions for an intensified cooperation between the philosophical community and virtual reality research (VR), potentially also including other disciplines like cognitive neuroscience or experimental psychology? The purpose of this contribution is to take a fresh look, from a philosopher's perspective, at some specific research areas in the field of VR, isolating and highlighting aspects of particular interest from a conceptual and metatheoretical perspective. This article is intended as a source of inspiration for an interdisciplinary audience; if each reader finds just one of the ideas presented below useful, it will have served its purpose. Hence the article was not written as a technical contribution by one philosopher for other philosophers and is not meant as an exhaustive list of philosophical research targets. I simply draw attention to a selection of topics that are, I believe, characterized by an exceptionally high degree of heuristic fecundity. To make these issues accessible to an interdisciplinary readership, I will briefly introduce some central concepts as I go (see **Box 1**), and sometimes use a more essayistic style. The hope is that these topics, deliberately presented along with a series of concrete examples, can serve as contact points between both disciplines and mark out promising subfields in which VR researchers and the philosophical community could profit from intensified future interaction. I will briefly highlight the theoretical relevance of most examples, along with the potential future benefits of intensified cooperation. Sometimes, I will also try to sketch a specific technological realization that would interestingly constrain philosophical theory formation, open new routes, or constitute the "perfect" or "maximal" VR-experience in a given context.

#### Box 1 | Philosophical concepts.

#### Amnestic re-embodiment

Re-embodiment of the subject of experience in VR without the conscious knowledge that one is currently immersed in a virtual environment and identified with a virtual body or character.

#### Counterfactual content

A linguistic statement or a mental representation has counterfactual content if it contradicts the current state of reality. Thought experiments, conscious experiences, and most computer-generated models of reality are counterfactual in this sense, because they do not represent or reflect the actual, current state of the world. In some cases, they may simply be classified as "false" or "misrepresentational," in other cases they may be adequate, for example if they target a possible, but highly likely perceptual situation.

#### Epistemic agent model (EAM)

A conscious internal model of the self as actively selecting targets of knowledge, as an agent that stands in epistemic relations (like "perceiving," "believing," "knowing") to the world and to itself (as in "controlling the focus of attention," "reasoning" or "knowing that one knows"), and as an entity that has the capacity to actively create such relations of knowing. Human beings only have an EAM intermittently, for about one third of their conscious life-time (Metzinger, 2013b, 2015; section 2.5 in Metzinger, 2017). Today's virtual agents and robots are not yet driven by an internal EAM.

#### Epistemic innocence

The theory that certain mental processes such as delusion and confabulation (which may count as suboptimal from an epistemological perspective) can have epistemic benefits. The idea is that in some cases, what superficially appears as an imperfect cognitive process may really enable knowledge acquisition.

#### Epistemology

The study of knowledge that seeks to answer questions like: What are the necessary and sufficient conditions for saying that one possesses knowledge? What makes a belief a justified belief? How many different kinds of knowledge are there? Is there anything like certainty?

#### Global neural correlate of consciousness (GNCC)

The minimally sufficient set of neurofunctional properties that brings about the conscious model of reality as a whole at a given point in time.

#### Global transparency

Phenomenal transparency for a whole conscious model of reality.

#### Hybrid avatar/virtual agent systems (HAVAS)

Digital representations of persons and/or epistemic agents which are simultaneously human-controlled and AI-controlled.

#### Justified true belief

According to one traditional philosophical model, three individually necessary conditions (namely, truth, belief, and justification) are jointly sufficient for a subject S to possess knowledge: S knows that p if and only if p is true, S believes that p, and S is justified in believing that p.

#### Lebenswelt (life-world)

A pre-given social world in which subjects experience themselves as being united by a quality of "togetherness." A Lebenswelt is intersubjectively given and is actively constituted by everyday social interactions leading to a shared first-person plural perspective (a more or less implicit group context, a mentally represented "we").

#### Ontology

In philosophy, the investigation of what there is, i.e., of what entities exist and what the most general features and relations of those entities are (for example, physical objects, God, universals, numbers, etc.). In computer and information science, the representation, formal naming, and definition of entities and relations substantiating a domain.

#### Other-minds illusion

The conscious experience of currently interacting with a system that has mental states when it really has none, for example the (hallucinatory) experience of encountering another self-conscious entity that actively selects targets of knowledge or is really "perceiving," "believing," "knowing" in a way that is relevantly similar to the observer. An other-minds illusion is a social hallucination.

#### Other-minds problem

The epistemological problem of gaining knowledge about another entity's mental states, for example the subjectively felt character of its conscious experiences or the content of its beliefs.

#### Phenomenal transparency

Transparency as used in this article is a property of conscious representations; unconscious representations in the human brain are neither transparent nor opaque. "Transparency" means that only the content of a representation is available for introspective access; the earlier processing stages or aspects of the construction process are hidden. Therefore, the content cannot be subjectively experienced as a representation. This leads to the phenomenology of "direct realism," a subjective experience of immediacy and realness, as if, for example, directly perceiving mind-independent objects.

#### Phenomenal unit of identification (UI)

The conscious content that is referred to in phenomenological reports of the type "I am this!" (see section Example 2: Embodiment and Bodily Self-Consciousness, for a definition cf. Metzinger, 2018a).

#### Postbiotic social boot-strapping scenario (PSBS)

A scenario in which multiple AIs create a non-biological Lebenswelt by mutually interacting with each other using virtual agents based on transparent, VR-based personoid interfaces, thereby causing robust other-minds illusions in each other. Such systems would apply the algorithms they originally developed in man-machine interactions to machine-machine communication, while still using individual virtual avatars or agent-models as their interfaces.

#### rt-fMRI-NCCF

A real-time fMRI representation of the global NCC that is directly converted into a virtual reality environment. This would create a perceivable dynamic landscape which the conscious subject can directly experience, navigate, and causally influence via multiple real-time neurofeedback loops.

#### Second-order virtual agent

A machine-controlled virtual character or person-model that transparently represents itself as socially situated, i.e., that has an internal model of itself as standing in genuine social relations to other persons or other self-conscious agents. A second-order agent has an inbuilt other-minds illusion.

#### Social hallucination

See other-minds illusion.

#### Synthetic phenomenology (SP)

Artificial conscious experience realized on non-biological carrier systems.

# PHILOSOPHY OF MIND

Empirically informed philosophy of mind is, rather obviously, the area within philosophy that can most directly profit from recent results in VR research. The VR community should also actively seek more productive input from philosophers of mind. I will confine myself to two examples.

# Example 1: Consciousness

The richest, maximally robust, and close-to-perfect VR-experience we currently know is our very own, ordinary, biologically evolved form of waking consciousness itself. VR is the best technological metaphor for conscious experience we currently have. The history of philosophy has shown how technological metaphors for the human mind always have their limitations: Think of the mechanical clock, the camera, the steam engine, or, more recently, the computer as a physically realized abstract automaton, with psychological properties as exhaustively described by a Turing machine table (Putnam, 1967, 1975, 1992; Churchland, 2005; Boden, 2006). All these metaphors have severe limitations. Using the computer example, the classical-cognitivist metaphor of a von Neumann machine cannot accommodate dynamical embodiment, subsymbolic representation, non-rule-based types of information processing, or the experiential character of phenomenal states as subjectively experienced from a first-person perspective. Nevertheless, it is hard to overestimate the influence and impact the long-abandoned "computer model of mind" has had on modern analytic philosophy of mind. Technological metaphors often possess great heuristic fecundity and help us in developing new ideas and testable hypotheses. Indeed, the computer model of mind has led to the emergence of a whole new academic discipline: classical cognitive science. Similarly, I believe that the heuristic potential of the VR metaphor for philosophical theories of consciousness has just barely been grasped.

Some philosophers (Metzinger, 1991, p. 127; Metzinger, 1993, p. 243; Metzinger, 2003a, 2010, p. 6; Revonsuo, 1995, p. 55; Revonsuo, 2006, p. 115; Noë, 2002; cf. Clowes and Chrisley, 2012; Westerhoff, 2016, for critical discussion and recent overviews) have already argued at length that the conscious experience produced by biological nervous systems is a virtual model of the world—a dynamic internal simulation. In standard situations it cannot be experienced as a virtual model because it is phenomenally transparent—we "look through it" as if we were in direct and immediate contact with reality (Moore, 1903; Metzinger, 2003b; for the notion of "phenomenal transparency" and a brief explanation of other philosophical concepts see **Box 1**). Likewise, technological VR is the representation of possible worlds and possible selves, with the aim of making them appear ever more realistic—ideally, by creating a subjective sense of "presence" in the user. "Presence" is a complex phenomenal quality, the three major dimensions of which are identification (i.e., being present as a self), self-location in a temporal frame of reference (i.e., being present as a self now, in this very moment), and self-location in space (i.e., the classical "place illusion," Slater and Sanchez-Vives, 2016). "Presence" is a phenomenal quality normally going along with a minimal sense of selfhood (Blanke and Metzinger, 2009), and it results from the simulation of a self-centered world—in VR settings as well as in everyday life. 
Interestingly, some of our best theories of the human mind and conscious experience itself use a similar explanation: Leading current theories of brain dynamics (Friston, 2010; Hohwy, 2013; Clark, 2016; Metzinger and Wiese, 2017) describe it as the constant creation of hierarchical internal models of the world, virtual neural representations of reality which express probability density functions and work by continuously generating hypotheses about the hidden causes of sensory input, minimizing their prediction error (see Wiese and Metzinger, 2017 for an accessible introduction). The parallels between virtuality and phenomenality are striking. Here are some points of contact between VR and philosophical phenomenology:

• Phenomenal content and virtual content are both counterfactual.

Our best current theories of consciousness describe it as something that could be called a form of "online dreaming" (Metzinger, 2003a, p. 140), in which conscious waking is a dreamlike state currently modulated by the constraints produced by ongoing sensory input. It is a controlled hallucination based on predictions about the current sensory input (Hohwy, 2013; Clark, 2016; Wiese and Metzinger, 2017). Relative to the actual state of the world, if taken as referring to this state, all predictive representations are non-veridical. Strictly speaking they are misrepresentations but are nevertheless potentially beneficial for the system in which they occur (Wiese, 2017). VR content is typically part of an animated computer graphics model, and if taken as depicting the actual physical 3D scene surrounding the user, it is also a misrepresentation. However, VR content does not result from a design flaw—the whole point is to generate perceptual representations of possible worlds in the user's brain, not of the actual one. Phenomenal content (the brain-based content of conscious, subjective experience) is the content of an ongoing simulation too: a prediction of the probable causes of a sensory signal. It is not a veridical representation of the actual environment, and it is useful for precisely this reason. By definition, machine-generated virtual content is counterfactual (see **Box 1**) as well, although it may be interestingly blended with real-world elements, as in augmented reality (AR) setups.

If this first point is correct, then it would be interesting to create VR utilizing the same mechanisms the human brain uses. What if, for example, in dynamically updating itself, the animated computer graphics model used the same computational principles of top-down processing, statistical estimation, prediction error minimization, hierarchical Bayesian inference, and predictive control many theoreticians now believe to be operative in the brain itself? Would this change the user's phenomenology in any interesting way, for example its fine-grained temporal dynamics? This is one example of a new research question that is interesting from a philosophical perspective, but which also has implications for making the VR experience better. It should therefore be of interest to people in the field and it could be tackled with interdisciplinary cooperation.
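As a purely illustrative toy, the core of prediction error minimization can be sketched for a single hidden cause under Gaussian assumptions: an estimate is nudged until the precision-weighted error between prediction and sensory input balances the precision-weighted deviation from a prior expectation. The function name, parameters, and default values below are hypothetical, the predicted input is simply assumed to equal the estimate, and the sketch is far simpler than the hierarchical models discussed in the literature cited above.

```python
def infer_hidden_cause(observation, prior_mean, prior_precision=1.0,
                       sensory_precision=1.0, learning_rate=0.1, steps=200):
    """Estimate one hidden cause by gradient descent on prediction error.

    Minimizes 0.5 * (sensory_precision * (observation - mu) ** 2
                     + prior_precision * (mu - prior_mean) ** 2).
    """
    mu = prior_mean  # start from the top-down expectation
    for _ in range(steps):
        sensory_error = observation - mu  # bottom-up prediction error
        prior_error = mu - prior_mean     # deviation from the prior
        # Negative gradient of the objective with respect to mu.
        step = sensory_precision * sensory_error - prior_precision * prior_error
        mu += learning_rate * step
    return mu


# With equal precisions the estimate settles midway between prior and input.
balanced_estimate = infer_hidden_cause(observation=2.0, prior_mean=0.0)
# A more reliable sensory signal pulls the estimate toward the observation.
sensory_weighted = infer_hidden_cause(2.0, 0.0, sensory_precision=3.0)
```

The estimate converges to the precision-weighted average of prior and observation, which is the Bayesian posterior mean for this simple Gaussian case. A graphics model updated on such principles would need many interacting levels of this kind, which is part of what makes the research question open.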

• VR and conscious experience both present us with an integrated ontology.

Ontology is not only a subfield in academic philosophy investigating the logic and semantics of concepts like "being," "becoming," or "existence." The concept also refers to an area in computer science and information science investigating the representation, formal naming, and definition of the categories, properties, and relations of the concepts, data, and entities that substantiate a given—or even all possible—domains. Interestingly, the conscious brain is an information-processing system too, and it certainly represents data and entities as "being," "becoming," or "existing." Conscious experience can be described as a highly-integrated set of hypotheses about the likely causes of the inputs received by the embodied brain, in the external as well as in the internal (i.e., intraorganismic) environment. It is not a list of propositions containing existential quantifiers. For an information-processing system to be conscious means it runs under an integrated ontology (see **Box 1**), a unified, subsymbolic situation model, which is internally presented to it in an integrated temporal frame of reference defining a subjective now, a "window of presence" (Metzinger, 2003a). VR creates ontologies and integrated situation models too, but their presentation within a single "lived moment" (i.e., a Jamesian "specious present," the temporal frame of reference referred to above, plus the construction of an experiential subject; see Clowes and Chrisley, 2012, p. 511), is still left to the brain of the user. If this is correct, then it follows that if we understand the computational principles underlying self-location and self-presentation within an internal temporal frame of reference in our brains, and if future VR technology were then to create a virtual "specious present" as part of a yet-to-be-invented form of virtual time representation, then this would amount to the creation of a very simple form of artificial consciousness.

The conjunction of these two first points leads us to a bold general claim: The "perfect" VR system would lead to artificial consciousness, that is, the creation of synthetic phenomenology (SP; see **Box 1**)<sup>1</sup>. Call this the "SP-principle": In its maximal realization, VR would be tantamount to the creation of artificial phenomenal states, to a technological realization of synthetic phenomenology. Of course, this would have to include a fully integrated multimodal scene, a virtual "specious present," a self-model that creates a first-person perspective by being a virtual model of an epistemic agent (EAM; see **Box 1** and section 2.5 in Metzinger, 2017, for details and further references), plus global transparency (see **Box 1**). Today, there still is a biological user, partially immersed in a visual situation model created by advanced computer graphics, and what VR technology ultimately aims at is real-time control of the information flow within the minimally sufficient global neural correlate of consciousness (GNCC; see **Box 1**).

This seems to be a second general principle: As of today, the ultimate "engineering target" is the conscious model of reality in a biological agent's brain. VR is a non-invasive form of neurotechnology targeting the GNCC. But imagine a world without conscious biological creatures, in which an autonomous, intelligent robot had learned to control its interaction with its physical environment by opening an internal global workspace. Then imagine that the content of this workspace is determined by the "perfect" VR sketched above. If we imagine this robot as internally using a maximal realization of VR, with global integration, specious present, transparency, and a self-model which it now "confuses" with itself (see section Example 2: Embodiment and Bodily Self-Consciousness and Metzinger, 2003a), then this would be tantamount to a machine model of embodied conscious experience.

<sup>1</sup> In other writings, I have argued for a moratorium on synthetic phenomenology on ethical grounds. For reasons of space, I exclude this issue here, but see Metzinger (2010), Metzinger (2013a), Mannino et al. (2015), and Metzinger (2018b).

• Phenomenal content and virtual content are both locally determined.

It is widely accepted in philosophy of mind that phenomenal properties supervene locally; and as of today, virtual content is processed in single machines and presented by local devices to individual users. This will soon change through the confluence of developments in VR, brain-computer interfaces (BCIs), and social networks. For philosophers, this will create an interesting new target for the internalism/externalism debate on mental content (Menary, 2010). For computer scientists, the question arises of what the "perfect" form of social VR would actually be. In social VR, what exactly is the relationship between the phenomenal content locally instantiated in the brains of multiple users and shared virtual content created by causal interactions distributed over different machines and artificial media? Social VR is a field that needs a combination of new technological approaches and rigorous conceptual analysis, for instance with regard to the concept of "tele-immersion" (for an excellent example, see Ohl, 2017).

The first contribution computer scientists can and certainly will make lies in the field of interface design: For the special domain of social VR, what would be the most efficient and reliable interfaces linking human brains via BCI-coupling and shared VR? This issue is theoretically relevant because it addresses a classical philosophical problem: the "other-minds problem" (**Box 1**). We assume that each of us has a direct knowledge of our own experience, but we can never directly know that someone other than ourselves is in the mental state they are in. Could social VR create more direct forms of knowing another person's mind? Could it provide us with new means of acquiring phenomenological concepts like "red" or "joy" which we apply to inner states of sentient creatures other than ourselves? We have already begun to causally couple the self-models in human users' brains to robots and avatars via robotic and virtual re-embodiment today (see **Figure 1**), but what would a more direct linkage of conscious minds involve? Such linkages could constitute new inner "modes of presentation" for social facts, as philosophers might say. It is interesting to note how we are already beginning to re-embody ourselves not only in robots and avatars, but also in other human beings' physical bodies (De Oliveira et al., 2016). This naturally leads to the question of virtually re-instantiating the higher levels of a human user's self-model in those of another human being's self-model, of a more abstract form of re-embodiment in another self-conscious mind. This then would be the step from virtual body swap (Petkova and Ehrsson, 2008) to virtual mind swap. Apart from a careful conceptual description of research targets, the coupling of whole conscious world-models plus embedded high-level, cognitive self-representations (and not only bodily self-models) would require a deep confluence of neurotechnology and VR. 
The maximal realization of social VR would therefore consist in creating an artificial platform on which whole individual biological minds can merge, thereby transcending the principle of local determination.

• Phenomenal content and virtual content can vary along a continuum of opacity and transparency, respectively, of explicit virtuality and projected realism.

Today, a broad standard definition of "phenomenal transparency" (**Box 1**), on which most philosophers roughly agree, is that it essentially consists in only the content properties of a conscious mental representation being available for introspection. Any non-intentional properties, or "vehicle properties," involved in the representation are not available for introspection. In other words, the representation is not experienced as a representation. Introspectively, we can access its content, but not the content-formation process itself. Typically, it is assumed that transparency in this sense is a property of all phenomenal states (for more, see Metzinger, 2003a,b).

But of course, the standard assumption is incomplete, because opaque phenomenal representations also exist (whereas unconscious states are neither transparent nor opaque in this sense). Phenomenological examples of opaque state-classes are, most notably, consciously experienced thoughts: We experience them as mind-dependent, as mental representations that could be true or false. Similarly, some emotions, pseudo-hallucinations, and lucid dreams are subjectively experienced as representational processes. Most importantly, the phenomenology of VR is also typically characterized by incomplete immersion, with varying degrees of opacity. This may change as the technology advances. Phenomenally opaque processes sometimes appear to us as deliberately initiated cognitive or representational processes. However, sometimes they appear to be automatic or spontaneously occurring; they are limited or even global phenomenal simulations and frequently are not under the experiential subject's control.

Here is another concrete proposal: Perhaps the most interesting contribution VR researchers could make is to develop a reliable "volume control for realness." Obviously, a clear conceptual taxonomy is needed as well, but the role of computer scientists in this type of cooperation would lie in developing a metric for immersion and self-identification, that is, a quantifiable approach. The interesting point here is that human phenomenology varies along a spectrum from "realness" to "mind-dependence." This frequently overlooked phenomenological feature provides another conceptual bridge into the representational deep structure of VR-environments: there are degrees of immersion. VR environments can be more or less realistic, and this general property is itself directly and concretely reflected in the user's phenomenology (cf. the epigraph for this article). Below I will argue that VR is the most relevant technology to create innovative experimental designs for philosophical phenomenologists interested in empirically researching the transparency/opacity continuum characterizing human consciousness (as introduced in Metzinger, 2003b).

The "perfect" form of VR technology would be one in which the user—or the experimental psychologist, neuroscientist, or philosopher interested in consciousness—could reliably set the "level of realness" for the experience. If we calibrate the transparency parameter of ordinary waking states as 1, then possible levels would include values >1, leading to "hyperreal" phenomenologies (as in certain drug-induced states of consciousness, during "ecstatic" epileptic seizures, or religious experiences), and values <1 (as in "unreal" experiences like depersonalization or derealization disorder). Some VR applications aim at phenomenal presence, realism, embodiment, and an illusion of immediacy, others will want to create a "dreamlike" quality (for example in entertainment settings). There are two specific subtypes of phenomenal states which (a) are of special systematic interest to philosophers, and (b) are directly related to the typical phenomenological profile created by VR technology: the lucid dream state and the out-of-body experience (OBE; see Metzinger, 2009b, 2013c, for philosophical discussion).
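The calibration described above can be made concrete in a minimal sketch. The function name, the labels, the non-negativity check, and the tolerance band around 1 are illustrative assumptions, not part of any existing VR toolkit; the sketch only encodes the convention that ordinary waking states sit at 1, with "hyperreal" states above and "unreal" states below.

```python
def classify_realness(transparency: float, tolerance: float = 0.05) -> str:
    """Map a transparency value to a coarse phenomenological label.

    Convention from the text: ordinary waking experience is calibrated as 1.
    The tolerance band and the non-negativity constraint are hypothetical
    choices made for illustration only.
    """
    if transparency < 0:
        raise ValueError("transparency must be non-negative")
    if transparency > 1 + tolerance:
        return "hyperreal"   # e.g., certain drug-induced or ecstatic states
    if transparency < 1 - tolerance:
        return "unreal"      # e.g., depersonalization or derealization
    return "ordinary waking"
```

A real "volume control for realness" would of course require an empirically validated metric behind the single number used here; the sketch merely shows what the interface of such a control might look like.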

There has been a lot of excellent work trying to create OBEs in VR labs, trying to make it a repeatable, experimentally controllable phenomenon (see Ehrsson, 2007; Lenggenhager et al., 2007 for classical studies; Blanke, 2012, for a review). So far, these attempts have not been successful because users do not yet look out of the eyes of the avatar offered as an alternative unit of identification (UI, which in this and other articles is not an abbreviation for "user interface," but instead refers to the conscious experience of self-identification; see **Box 1**, Metzinger, 2018a and the next section for a definition of the concept). Rather, the resulting phenomenology typically resembles the clinical phenomenon of heautoscopy. According to the self-model theory of subjectivity (SMT; Metzinger, 2003a, 2008), the main reason for this failure is that the user's "interoceptive self-model" is firmly locked in the biological body; it cannot be simulated in an avatar yet. The interoceptive self-model is that layer of bodily self-representation in the brain that is driven by internal signals from the viscera and other areas signaling the state of the body to the brain (Craig, 2009; Barrett and Simmons, 2015). The prediction under SMT is that full identification with an avatar can only be achieved under two conditions: Either the avatar has its own interoceptive self-model that can be synchronized with the biological counterpart in the user's brain, or interoceptive experience is selectively blocked and another artificial unit of identification (**Box 1**) is created and technologically exploited. According to SMT, a prime candidate would be the sense of effort going along with mental forms of agency like controlling one's own attention, because this is what creates the sense of self on the mental level. 
The empirical prediction from SMT is that if avatars in virtual reality had a functional analog of visual attention which the user could control, then the consciously felt "sense of effort" of the user trying to control the avatar's attention would create a deep form of identification. This is another concrete research proposal, derived from a philosophical theory, but fully testable and open for interdisciplinary cooperation. We could call the proposed strategy "subjective identification via interoceptive extinction plus synchronized attentional agency."

What about creating not OBEs (which involve an externalized visuospatial perspective), but lucid dreams with the help of VR technology? The dream body can also be completely devoid of an interoceptive self-model. In section Example 1: Consciousness, I said that waking consciousness could be called a form of "online dreaming." Could VR help to create a new, distinct class of phenomenal states in the form of a new version of lucid online dreaming? Having a metric and an implemented, quantifiable "realness control" for VR would enable experimental psychologists to create a machine-model of the lucid dream state. It would be highly interesting for philosophers of mind, dream researchers, and phenomenologists if they could use VR technology to explore the transparency/opacity gradient of their very own conscious experience at will.

In both subtypes, certain content elements may be experienced as only virtual (e.g., dream reality as such or the immediate environment in which an OBE unfolds), while, phenomenologically, others remain as ultimately real (for example, even in lucid dreams other dream characters encountered by the experiential subject are often taken to be real entities, as is the transparent model of the knowing, observing self in an OBE). On the technological side, the reality/virtuality continuum encompasses all possible variations and combinations of real and virtual objects (Milgram et al., 1994; Milgram and Colquhoun, 1999). The "reality/virtuality continuum" has been described as a concept in new media and computer science, but it is interesting to note how our very own everyday phenomenology also possesses elements that appear "unreal" to us (optical illusions, benign pseudo-hallucinations) or as only diffusely or not at all located in physical space, e.g., as "unworldly," "disembodied," or "mental" (namely, mental action and mind-wandering; see Metzinger, 2015, 2017).

However, as philosophers we must never forget that the reality/virtuality continuum itself only appears in a virtual model activated by our brain. In addition, this brain is embodied and developed against the historical-cultural context of the cognitive niche in which we are born. This is, unfortunately, a deep structural feature which systematically hides its own virtuality from its user, the biological organism in which it appears. One philosophically interesting point is that investigating the phenomenology of VR will give us a deeper understanding of what it really means—and why it was functionally adequate—that the reality-appearance distinction became attentionally as well as cognitively available by being represented on the level of appearance (Metzinger, 2003a). It also leads to subtle and potentially novel insights into the specific phenomenal character related to metaphysical indeterminacy (see section VR-Phenomenology in the Context of Comparative and Transcultural Philosophy).

# Example 2: Embodiment and Bodily Self-Consciousness

Advanced VR technology seeks not only to create the classical place illusion described by Slater and Sanchez-Vives (2016) (section Introduction), it also increasingly targets the deepest layers of human self-consciousness by utilizing techniques for virtual embodiment and robotic re-embodiment (Cohen et al., 2012, 2014a,b; see vereproject.eu for further examples). We are already beginning to use VR technology for re-embodiment in other human bodies (De Oliveira et al., 2016) and many of the more recent empirical results are highly interesting from a conceptual and metatheoretical perspective (see Ehrsson, 2007; Lenggenhager et al., 2007, for classical studies; Metzinger, 2008, 2009a,b, for accessible introductions; Blanke, 2012, for a review). First, they allow us to distinguish different levels of embodiment and to develop a more fine-grained analysis of bodily self-awareness in humans; second, they open the door to a deeper understanding of the mechanism of identification underlying the way in which a conscious subject of experience locates itself in time and space by identifying with a body. Let me briefly explain this point, as it is of interest for philosophers.

Let us say that for every self-conscious system S there exists a **phenomenal unit of identification** (UI, **Box 1**) such that


If we assume a "predictive processing" model of human brain activity (Friston, 2010; Hohwy, 2013; Clark, 2016; Metzinger and Wiese, 2017), then, for all human beings, C is always counterfactual content because it does not refer to the currently present, actual state of the world. The UI is the best hypothesis the system has about its own global state (Limanowski and Blankenburg, 2013; Limanowski, 2014). For human beings, C is dynamic and highly variable, and it need not coincide with the physical body as represented (for an example, see de Ridder, 2007). There exists a minimal UI, which is likely constituted by pure spatiotemporal self-location (Blanke and Metzinger, 2009; Windt, 2010, 2017; Metzinger, 2013b,c); and there is also a maximal UI, likely constituted by the most general phenomenal property available to S at any point t, namely, the integrated nature of phenomenality per se (Metzinger, 2013b,c, 2016). C is phenomenally transparent: Internally, S experiences the representational content constituting the UI neither as counterfactual nor as veridical, but simply as real. Phenomenally experienced realness is an expression of successful prediction error minimization, high model evidence, and counterfactual richness (e.g., invariance under counterfactual manipulation). Therefore, the UI simply is the transparent partition of the PSM<sup>2</sup> . I submit that perhaps the central philosophical relevance of recent work on virtual embodiment and robotic re-embodiment is that it holds the promise of introducing a set of more fine-grained conceptual distinctions into the theory of embodiment and self-consciousness. Could VR researchers also create a "volume control for self-identification"? Work in VR that helps us to experimentally manipulate the UI in a non-invasive but causally fine-grained manner has already successfully demonstrated its relevance for the neuroscience of bodily self-consciousness, for example by creating innovative experimental designs. 
Philosophers have already cooperated with neuroscientists and shown how the UI can be influenced to "drift toward" an avatar and how peripersonal space can be expanded in a VR-setting (Blanke et al., 2015; Noel et al., 2015; Serino et al., 2015). However, two logical steps have not yet been taken, corresponding to two types of experiment that would be of great interest to philosophers of mind: maximizing the UI, and deleting the UI from human phenomenal space altogether. The open question is whether engineers and scientists in VR could technologically implement this.

How would one create a VR experience in which the user becomes one with everything? Clearly, this would have to be an entirely passive experimental setup, because any bodily or mental interaction of the user with the system would immediately create a felt sense of agency and therefore keep its phenomenal model of reality split into subject and object, divided into a knowing self and an external environment. How could one create an entirely passive VR experience in which everything the user experiences gradually turns into one big knowing self, a single conscious unit of identification that has been maximized by being expanded to the boundaries of the phenomenal world?

Experiments of the second type would aim at creating selfless states of consciousness. Instead of ego-expansion they would aim at ego-elimination. Such experiments would be interesting because they would create a contrast class or a set of alternative experiences not characterized by a UI, that is, states without any consciously experienced ego. The phenomenon of "ego dissolution" is well-known from pharmacological interventions by classical psychedelics, dissociative anesthetics and agonists of the kappa opioid receptor (Millière, 2017) and it can be measured, for example by the Ego-Dissolution Inventory (EDI; Nour et al., 2016). It occurs in psychiatric diseases, and it has also been reported across the centuries by spiritual practitioners from many different cultures. Comparing results from both types of experiments might help to decide whether the existence of a UI necessarily leads to a consciously experienced sense of self, or whether some states created by maximizing the UI are actually selfless states if assessed with the help of existing inventories for measuring the degree of ego-dissolution. Here, one central question, highly relevant for philosophers, psychologists, and neuroscientists alike, is whether research in VR could help to establish a double dissociation between the phenomenology of identification and the phenomenology of selfhood.

# EPISTEMOLOGY

Epistemology is the study of knowledge and is concerned with questions such as: What are the necessary and sufficient conditions for the possession of knowledge? How many kinds of knowledge are there, and what is their structure, what are their sources and boundaries? What makes a belief a justified belief (**Box 1**)? Accordingly, VR-epistemology might ask questions like these: How does one obtain knowledge about virtual objects, and how do we arrive at justified beliefs about facts holding in a virtual world? Are there such things as virtual facts? Are the necessary and sufficient conditions of knowledge interestingly different if we limit our domain to perceptual content presented via VR? What are the sources of knowledge about elements of a given virtual world? Is justification relative to this specific class of epistemic objects internal or external to one's own mind?

<sup>2</sup>This passage draws on an article in the Oxford Handbook of Spontaneous Thought, see Metzinger (2018a).

# Example 3: Amnestic Re-embodiment and Epistemic Innocence

VR settings immediately remind every philosopher of Descartes' dream argument: Even in a best-case scenario of sensory perception, we can never rule out that we are now dreaming, because dreaming is subjectively indistinguishable from waking experience (see Windt, 2015, section Example 1: Consciousness). If classical Cartesian Dream Skepticism is on the right track, then at any given moment our implicit belief that we are awake might be mistaken. If this basic background assumption is correct, our current belief that we are, at this moment, not in VR might be mistaken as well, although "in order for the VR to be indistinguishable from reality, the participant would have to not remember that they had 'gone into' a VR system" (Slater and Sanchez-Vives, 2016, p. 37).

One interesting form of collaboration between philosophers and VR researchers would be to systematically transpose classical philosophical thought experiments into VR-settings. Here, the question would be whether VR researchers could create a full-blown "Cartesian dream." Could there be something like "amnestic re-embodiment" (see **Box 1**) in VR? It seems there are many conceivable scenarios of VR use in which this constraint (let us call it the "SSV-constraint," as it was introduced by Slater and Sanchez-Vives) could be satisfied, for example in animals equipped with head-mounted displays, in human children, in drug users, in sleep labs, or in patients suffering from severe amnesia, intoxication syndromes, or dementia. Moreover, in future entertainment scenarios or in therapeutic applications it may become precisely the goal to purposefully satisfy the SSV-constraint, to make users forget the fact that they currently are in VR.

Bortolotti (2015a,b) recently introduced the concept of "epistemic innocence" (**Box 1**) to articulate the idea that certain mental processes such as delusion and confabulation (which may count as suboptimal from an epistemological perspective) may have not just psychological, but also epistemic benefits. Perhaps amnestic re-embodiment in VR (say, in a pharmacologically supported therapeutic context) could lead to psychological benefits that are not simply purchased at the epistemic cost of episodic amnesia, but that also causally enable forms of genuine knowledge acquisition, for example new forms of self-knowledge. Perhaps new philosophical ideas like "amnestic re-embodiment" or "epistemic innocence" can be fruitfully applied in the domain of VR if actually implemented and viewed as a tool for self-exploration, cognitive enhancement, or future psychotherapy. And of course, on a speculative metaphysical level, it is only a question of time until the SSV-constraint will be discussed in relation to real-life experience, to traditional religious theories of reincarnation, "pre-birth amnesia," etc. In any case, it seems safe to predict that many classical issues of external-world skepticism may re-appear in a new guise, playing a central role for the new discipline of VR epistemology.

# Example 4: Knowing Personal Identity

Here, I name one particular example of great relevance for the philosophy of law and the applied ethics of VR: The problem of reliably knowing about another human agent's personal identity in VR. If I want to reliably interact with another human being in VR, for example via avatar-to-avatar interaction, then I need to know the identity of the person currently controlling or even phenomenologically identifying with that avatar. This leads to the problem of avatar ownership and individuation, which will certainly be an important future issue for regulatory agencies to consider.

How does one assign an unequivocal identity to the virtual representation of a body or a person? Could there be something like a chassis plate number, a license plate, or a "virtual vehicle identification number" (VVIN)? We already have digital object identifiers (DOIs) for electronic documents and other forms of content, a form of persistent identification, with the goal of permanently and unambiguously identifying the object with which a given DOI is associated. But what about an avatar that is currently used by a human operator, namely by functionally and phenomenologically identifying with it? Should we dynamically associate a "digital subject identifier" (DSI) with it? (Madary and Metzinger, 2016, p. 17).

Maybe there can be a technological solution to this problem, perhaps similar to the RSA cryptosystem. This presents a technical question to mathematicians and computer scientists: What would be a "non-hackable" mechanism for reliably identifying the current user(s) of a given avatar? But even if we find such a mechanism, the epistemological problem of other minds remains. Even if I can be convinced of the identity of an agent I encounter in VR in a way that suffices for all practical and legal purposes, I will still be interested in a higher degree of certainty when it comes to more direct interpersonal relationships in social VR. Interestingly, as regards the personal identity of social others nothing short of absolute certainty seems to be what we are really interested in—although, as one might certainly argue, even in "normal" non-VR scenarios there always remains room for other-person skepticism, because the mere logical possibility of misrepresenting personal identity can never be fully excluded.
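To illustrate what an RSA-style mechanism for binding a user to an avatar might look like, here is a minimal challenge-and-signature sketch. It is a toy under stated assumptions: the key is built from two small, publicly known Mersenne primes and uses unpadded "textbook" RSA, so it is trivially breakable and must not be used in practice; the function names, the message layout, and the identifiers `avatar-42` and `dsi-0001` are hypothetical. A verifier issues a fresh challenge naming the avatar and a "digital subject identifier," and only the holder of the private exponent can produce a signature that checks out against the public key.

```python
import hashlib
import secrets

# Toy key material (assumption for illustration): two known Mersenne primes.
# Far too small and too well known for real use; a deployed scheme would rely
# on a vetted cryptographic library with proper padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def _digest(message: bytes) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def make_challenge(avatar_id: str, dsi: str) -> bytes:
    """Build a fresh challenge; the random nonce prevents replay of old signatures."""
    nonce = secrets.token_hex(16)
    return f"{avatar_id}|{dsi}|{nonce}".encode()


def sign(message: bytes, private_exponent: int = D) -> int:
    """Sign with the private exponent (done by the avatar's current operator)."""
    return pow(_digest(message), private_exponent, N)


def verify(message: bytes, signature: int) -> bool:
    """Check the signature using only the public values N and E."""
    return pow(signature, E, N) == _digest(message)


challenge = make_challenge("avatar-42", "dsi-0001")
signature = sign(challenge)
print(verify(challenge, signature))
```

Note that such a check establishes at most who holds a key, not who is currently experiencing the avatar as their own body, which is why the epistemological problem of other minds remains untouched by it.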

There is one variant of the personal-identity problem which could soon become relevant and for which cooperation between VR specialists and philosophical ethicists will be important. Let us conceptually distinguish between an "avatar" as a digital representation of a single human person in VR (over which they can have agentive control and ownership, functionally as well as on the level of conscious experience), a "human agent" as a normal, self-conscious human being currently controlling a biological body outside of VR, and a "virtual agent" as a virtual character or person-model that is computer-controlled, for example by an advanced AI. For human users in VR, it may be impossible to distinguish between avatars and virtual agents, that is, between digital representations of single human persons currently controlled by a real, biological human, and such representations which are actually AI-controlled, for example by an artificial system that does not possess self-consciousness and does not satisfy the current human criteria for personhood (Dennett, 1988). This may at first seem to be just an extension of the problem in current online computer games, where there is a mixture of non-playable characters and other people, but there will be much more at stake in future contexts generated by the technological confluence of VR and autonomous AI-systems. Again, a technological solution to this problem would be important to prevent social hallucinations, consumer manipulation, or successful deception by malevolent AI systems. But there are looming conceptual complexities.

For example, avatars could also be jointly controlled by distributed groups of human beings (creating problems of legal personhood, accountability, and ethical responsibility). There could be digital person-models that are avatars and virtual agents at the same time, because they are simultaneously human-controlled and computer-controlled (HAVAS, see **Box 1**; for example, resembling a self-consciously controlled biological body possessing a large number of highly intelligent, but unconscious motor subroutines), and perhaps in the future human agents outside of VR could be partly computer-controlled as well. I will not discuss any of these complexities here, but simply point out that the problem of personal identity in VR poses challenges for ethics and legal philosophy and that, on a psychological and cultural level, it may greatly change the landscape of future social interactions.

Although the exact etymological origin of the Latin concept of persona is still controversial, it originally referred to the masks worn by actors on stage. It is interesting to note how avatars are exactly this: ever more complex virtual masks worn by human actors on a virtual stage. Social VR resembles an on-stage experience, involving encounters with unknown actors. In this wider context, it may be helpful to recall how, in 1938, Antonin Artaud, in first introducing the concept of "virtual reality," described the illusory nature of characters and objects in the theater as "la réalité virtuelle" (in a collection of essays entitled Le Théâtre et son double), while at the same time another classical metaphor for human consciousness is the "theater model of mind" (Baars, 1997a,b; for a critique of this model, see Dennett, 1991; Dennett and Kinsbourne, 1992). Isolating the necessary and sufficient conditions for determining the identity of the person behind any such virtual persona is one of the most interesting epistemological problems for philosophers, but they will need help from the VR community in determining what is technologically possible, what is not, and what are rational, evidence-based strategies for risk minimization [the issue will certainly generalize to human interaction with intelligent agents categorized as non-persons, e.g., as a result of future AI/VR confluence, see section Example 7: The "Postbiotic Social Boot-Strapping Scenario" (PSBS)].

# METAPHYSICS

VR can be interestingly described as a computationally implemented ontology (Heim, 1994, 2000; Chalmers, 2017). Its virtual character consists in the quality of its entities having their attributes without sharing a (real or imagined) physical form, but solely by creating a functional emulation of real objects. On a more abstract level of analysis, virtual realities are functional structures defined by input/output relations and by internal relations between states with different and often complex causal roles. Via interfaces enabling sensorimotor interaction, they have the potential to causally enable the instantiation of specific phenomenal properties in the brains of human users. When implemented and in direct causal interaction with an embodied human user, they can perhaps also be interestingly described as explicit assumptions about what exists, as a new type of "metaphysical affordance": I can take this for real. VR opens a space of possible existence assumptions. In providing an explicit model of reality to the user it can also represent objects, properties, and spatial and temporal relations, offer concrete affordances for action, or even present other agentive selves to this user, making them available for reliable and systematic social interaction.

But the novel space of causal interaction opened by VR is not limited to providing affordances for sensorimotor engagement. With the help of advanced brain-computer interfaces (BCIs) we can imagine "mental" actions bypassing the non-neural body (see section Example 5: Walking Around in Your Own NCC With the Help of rt-fMRI-NCCF). Similarly, we can imagine much more causally direct forms of intersubjective communication using much more disembodied forms of social cognition, perhaps even making the "mental" states of users an explicit element of VR (see section Social and Political Philosophy: The danger of complex social hallucinations). As such they could be mutually manipulated, with the interaction acquiring a causal force on its own. Additionally, in VR, assumptions about what exists need not obey the physical laws governing our world: In principle, the set of worlds defined by a given level of VR technology will often be much larger than what would be nomologically possible in the actual world. On the other hand, obvious constraints of technological feasibility again strongly compress the space of mere logical possibility.

VR clearly opens new spaces of causal interaction for human agents, but its relevance to philosophical metaphysics is not immediately obvious. Imagine an empty room with a connected headset lying on a table while a computer is running a complex virtual reality demonstration. It would be hard to construct any metaphysical mysteries in this situation. For example, speaking of "virtual objects" or even "virtual worlds" being created by the machine would not justify assuming that just the running of the system itself changes physical reality in any interesting sense. No new building blocks of reality are created.

VR only becomes philosophically interesting when causally coupled to the pre-existing conscious model of reality running in a user's biological brain (Clowes and Chrisley, 2012, p. 511). Then it begins to change the phenomenal ontology underlying the user's subjective experience, and of course many unconscious expectations as well. In particular, certain high-level priors and assumptions about what the true causal sources of current sensory input really are may now begin to change as the model containing them is continuously updated [see section Example 1: Consciousness (point 1)]. What would we say if an entirely unconscious, but highly complex and intelligent robot began to interact with a VR system? Would we assign any special metaphysical status to the unconscious internal ontology that emerges as the robot learns to successfully interact with the VR? Obviously, we would not want to say that any relevant new metaphysical entities have been created. What has changed is a model, not the deep structure of the physical world. Call this the "Principle of Metaphysical Irrelevance": VR technology per se does not create any new "virtual objects" in a metaphysically interesting sense. But what about virtual subjects? I think the "Principle of Metaphysical Irrelevance" may be interestingly different or invalid in the case of social ontologies: What if independent groups of intelligent, virtual agents began to internally model their social relationships in the way human beings do, creating a robust form of virtual intersubjectivity (see Example #7 below)?

The general principle is that to have an ontology is to interpret a world: The human brain, viewed as a representational system aimed at interpreting our world, possesses an ontology too (Metzinger and Gallese, 2003). It creates primitives and makes existence assumptions, decomposing target space in a way that exhibits a certain invariance, which in turn is functionally significant. It continuously updates its model of reality, minimizing prediction error (Friston, 2010; Wiese and Metzinger, 2017). There are explicit and implicit assumptions about the structure of reality, which at the same time shape the causal profile of the brain's motor output and its representational deep structure. But very often in VR, a completely different world needs to be interpreted and predicted by the brain. Thus, an alternative causal structure has to be extracted. For example, the human motor system normally constructs goals, actions, and intending selves as basic constituents of the world it interprets. It does so by assigning a single, unified causal role to them, and empirical evidence demonstrates that the brain models movements and action goals in terms of multimodal representations of organism-object-relations. Obviously, such relations can undergo dramatic changes in VR as the brain continually adapts a hierarchically structured model of reality to the external invariances provided by artificially created input. But the ontology of the human brain, even when causally embedded in an alien media environment, always remains a representation of the likely causal structure of the world. It is just the current best guess about this causal structure.

In sum, I think that analytical metaphysics is likely the area in philosophy where we can expect the least fruitful interaction with the VR community, simply because virtual ontologies are orthogonal to philosophical problems in metaphysics. Nevertheless, for philosophers interested in metaphysics, there may still be many interesting issues. These issues include the relationship between possible-world theory and VR; the status of properties, categories, universals, individuals, abstract and fictitious objects, events and selves when epistemically accessed under a VR-mode of presentation; questions about virtual time and virtual space [section Example 1: Consciousness (point 2)]; and perhaps also the promise of a richer and more precise account of what actually constitutes a Lebenswelt [**Box 1**; I return to this issue in section Example 7: The "Postbiotic Social Boot-Strapping Scenario" (PSBS)]. Maybe progress on this traditional concept can be achieved as we now begin to construct entirely new life-worlds from scratch. I will also give one example of unexpected metaphysical contact points between VR and intercultural philosophy in the final section on "Comparative Philosophy."

# NEW SUBFIELDS: DIGITAL AESTHETICS, RECENT PHILOSOPHY OF TECHNOLOGY, AND MEDIA THEORY

There are many newer areas of philosophical research for which VR is an obviously central target, including aesthetic judgement and experience (Shelley, 2015), the philosophy of digital art (Thomson-Jones, 2015), the philosophy of technology (Franssen et al., 2015; section 4.1 in Gualeni, 2015), and media philosophy (Gualeni, 2015, Ch. 7; Heim, 2000; Sandbothe, 2000; de Mul, 2015). There already exists a growing literature on VR in these fields, and the interested reader may find entry points to the relevant debates in the works cited here.

# ACTION THEORY, FREE WILL, AND SELF-CONSCIOUSNESS: NOVEL AFFORDANCES FOR ACTION

Increasingly, avatars are not just dynamic, user-controlled models of bodies in space. They also begin to enable sensory perception and instantiate complex properties like emotional expression, intelligent gaze-following, and natural language production. Avatars are gradually turning into semiautonomous, user-controlled models of virtual selves. Above, we conceptually distinguished between avatars and virtual agents, but it is certainly conceivable that digital representations of persons emerge that fall under both concepts simultaneously. One philosophically as well as technically interesting aspect lies in the prediction that virtual agents will, by being coupled to artificial intelligence (AI), gain strong cognitive self-models. For example, they could function as complex output devices or personoid interfaces by which larger AI systems communicate with humans. But to be really good interfaces, they will have to model the needs and goals of their human users and engage in advanced social cognition. If they reflexively apply their social cognition modules to themselves, they may therefore begin to represent themselves as "knowing selves," even if they are still partly human-controlled. Therefore, it is also conceivable that such hybrid avatar/virtual agent-systems (**Box 1**) become proper epistemic agents (EAMs; cf. Metzinger, 2017, 2018a; **Box 1**) rather than merely virtual bodies moving in virtual space, thereby simulating a system that possesses and actively expands its own knowledge.

A second important aspect of this historical development is that there already exists a biologically grounded self-model in the human operator's nervous system. The human nervous system generates another virtual self-model which often includes an EAM. It has been optimized over millions of years of biological evolution and possesses unconscious as well as conscious content layers. Today, avatars are mostly causally coupled to the phenomenal self-model (PSM; Metzinger, 2003a, 2008) in human brains, but this may change in the future. First pilot studies (Cohen et al., 2012, 2014a,b) demonstrate that, via virtual or robotic re-embodiment, elements of VR can turn into dynamic components of extended self-representation, which not only co-determine locally instantiated phenomenal properties in the human brain, but also enable historically new forms of action. Let us therefore look at novel, philosophically relevant affordances for action potentially provided by VR.

# Example 5: Walking Around in Your Own NCC With the Help of rt-fMRI-NCCF

In consciousness research (Metzinger, 1995, 2000), a standard background assumption is that in the domain of biological creatures and for every form of conscious content there exists a minimally sufficient neural correlate (NCC; see Chalmers, 2000, for a definition, and Fink, 2016, for a refined account). At every point in time, there will also be a minimally sufficient global NCC (see **Box 1**): the set of neurodynamical properties which fully determines the content of subjective experience at this very instant and which has no proper subset of properties that would have the same effect. Let me draw attention to, and at the same time propose, a highly specific application of VR technology here. One technological possibility that will be of great interest for philosophers would be a highly selective combination of VR and neurofeedback generated by real-time functional magnetic resonance imaging, but explicitly targeting the global NCC only.
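The notion of minimal sufficiency invoked above can be spelled out a little more explicitly. What follows is only an illustrative sketch, not notation taken from the cited literature: the symbols $P_t$ (the set of all neurodynamical properties instantiated at time $t$), $N$ (a candidate correlate), $C$ (a given conscious content), and the relation $\Rightarrow$ (read as "suffices for") are assumptions introduced here for clarity.

$$
N \subseteq P_t \ \text{is a minimally sufficient NCC for } C
\iff
(N \Rightarrow C) \ \wedge\ \forall N' \subsetneq N :\ \neg (N' \Rightarrow C)
$$

On this reading, the minimally sufficient global NCC at $t$ would be the special case in which $C$ is the total content of subjective experience at $t$.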

Let us call this "rt-fMRI-NCCF" (see **Box 1**). This would be a variant of real-time fMRI-based neurofeedback, but employing VR technology and specifically targeting the neural basis of consciousness. Thus, a real-time fMRI representation of the global NCC would be directly converted into a virtual reality environment: a perceivable dynamic landscape which the conscious subject could passively observe, in which it could navigate, and with which it could then causally interact in entirely new ways. Of course, the VR-based form of rt-fMRI-NCCF I am proposing here would never be "real-time" in any more rigorous conceptual sense, but it would generate historically new forms of self-awareness and afford completely new types of phenomenological self-exploration, including a model-based control of one's own conscious experience (Flohr, 1989; Jacquette, 2014). At present, the only neurophenomenological configuration that comes close to rt-fMRI-NCCF is the stable lucid dream of a scientifically informed person. The stable lucid dream is a conscious state in which the experiential subject knows that everything it feels and sees is determined by the NCC currently active in the sleeping physical body, but unlike the potential rt-fMRI-NCCF, it lacks an external, technically realized feedback loop (Metzinger, 2003a, 2013c; Windt and Metzinger, 2007; Voss et al., 2014).

# Example 6: PSM-Actions

PSM-actions are all those actions in which a human being exclusively uses the conscious self-model in her brain to initiate an overt action. Of course, there will have to be feedback loops for complex actions, for instance, adjusting a grasping movement in real-time when seeing through the camera eyes of a robot (something still far from possible today). But the relevant causal starting point of the entire action is now not the flesh-and-bone body, but only the conscious self-model in our brain. In PSM-actions, we simulate an action in the self-model, that is, in the inner image of our body, and a machine performs it.

Such experiments are interesting for philosophers, because they touch conceptual issues like action theory, agentive self-consciousness, free will, ethical responsibility, and culpability in a legal sense<sup>3</sup>. On one hand, it is obvious that the phenomenal self-model (PSM) often is a crucial part of a control hierarchy: it is an abstract computational tool for sensorimotor self-control. The PSM is a means to predict and monitor certain critical aspects of the process in which the organism generates flexible, adaptive patterns of behavior and also enables a degree of veto control. On the other hand, it is highly plastic: several representations of objects external to the body can transiently be integrated into the self-model. In tool use, a hammer or pliers could be such an object, but rubber hands can demonstrate that the whole process can also take place in a passive condition, by bottom-up multisensory integration alone. For tool use, "control by embedding" may be a general principle: tools are extensions of bodily organs that need to be controlled to generate intelligent and goal-directed behavior. Whenever the physical body is extended by sticks, stones, rakes, or robot arms, the virtual self-model must be extended as well. Only if an integrated representation of the body-plus-tool exists can the extended system of body-plus-tool in its entirety become part of the brain's predictive control hierarchy. How else could one learn to use a tool intelligently (i.e., flexibly and in a context-sensitive manner) without integrating it into the conscious self?

FIGURE 1 | "PSM-actions": A test subject lies in a nuclear magnetic resonance tomograph at the Weizmann Institute in Israel. With the aid of data goggles, he sees an avatar, also lying in a scanner. The goal is to create the illusion that he is embodied in this avatar. The test subject's motor imagery is classified and translated into movement commands, setting the avatar in motion. After a training phase, test subjects were able to control a far remote robot in France "directly with their minds" via the Internet, seeing the environment in France through the robot's camera eyes. Figure with friendly permission from Doron Friedmann and Michel Facerias; written informed consent for publication of figure has been provided by Michael Facerias.

As I have explained elsewhere (Metzinger, 2003a, 2009a), human beings possess physical and virtual organs at the same time. The conscious self-model is a paradigm example of a virtual organ, allowing us to own feedback loops, to initiate control processes, and to maintain and flexibly adapt them. What is new is that whole-body surrogates now increasingly provide the human brain with new affordances for action, either as virtual avatars or as physical robots coupled to the virtual self-model in the biological brain. Some elements of the expanded control circuit are physical (like the brain and tools), others are virtual (like the self-model and the goal-state simulation). Robots are physical tools; avatars are virtual bodies. It is therefore possible to transiently embed them into the PSM and thereby causally control them "directly out of one's own mind."

If this general perspective is correct, then we have a maximally parsimonious strategy to scientifically explain self-consciousness without assuming an ontological entity called "the self." Prediction, testing, and explanation can take place in a much more parsimonious conceptual framework, namely, by introducing the concept of a "transparent self-model" (a conscious model of the person as a whole, which cannot be experienced as a model). VR technology is relevant because it offers instruments for experimental testing by selectively influencing different representational layers of the human self-model: bodily self-location, perspective-taking, motion experience, affective self-representation, and so on. One role of VR researchers could be to develop new instruments by which causal dependencies and hypothetical double dissociations between such representational layers can be tested, demonstrated, and technologically exploited, for example by new clinical applications. The maximal realization or "perfect avatar" would be one in which the user can precisely select which aspects of his or her conscious self-model he or she wants to change by identifying with a digitally created self-representation.

Directly coupling a human PSM with an artificial environment is an example of a new type of consciousness technology, one that might even be called a "technology of the self" (section 4.3 in Gualeni, 2015). Currently the effects are still weak, and there are many technical problems. However, it is possible that technological progress will happen faster than expected. What would we do if systems for virtual or robotic re-embodiment became able to function fluidly, with many degrees of freedom, and in real-time? What new conscious states would become possible if one were also able to control feedback with the help of a computer-aided brain stimulation directly aimed at the user's self-model, again bypassing the non-neural body? What historically new forms of intersubjectivity and social cooperation could emerge if it were suddenly possible to simultaneously connect several human persons and their self-models via coupled brain-computer interfaces, and perhaps even to merge them?

# SOCIAL AND POLITICAL PHILOSOPHY: THE DANGER OF COMPLEX SOCIAL HALLUCINATIONS

The number of contact points between VR technology and political philosophy is too large to even begin creating a short list. The convergence of VR and existing social networks may lead to new forms of machine-based manipulation in the formation of political will, novel threats to privacy and autonomy, and a belittlement of the actual political process outside of VR (section 3.2 in Gualeni, 2015). Perhaps most importantly, it is conceivable that what today we call "real life outside of VR" or "the real cultural/historical/political process unfolding in the actual world" would become increasingly experienced as just one possible reality among many others. This might incrementally lead to a dangerous trivialization of real-world suffering and an unnoticed, implicit relativism with respect to value judgements in the original sphere of social interaction (in which any VR technology is still grounded). I would like to term this risk "VR-induced political apathy," brought about by creeping psychological changes caused in users by a toxic form of mental immersion into a novel medium that originally held the promise to facilitate and enhance the democratic process. As Stefano Gualeni puts the point:

The interactive experiences of virtual worlds, together with their characteristic combinatorial and procedural processes, can in fact be seen as both

Understood from the proposed perspective, all virtual worlds can be deemed as holding an implicit political relevance that is a derivation of their combinatorial, modular, and self-organizing constitution. Both the use and the design of virtual worlds as means of production are, thus, implicitly political activities (Gualeni, 2015, p. 129).

<sup>3</sup>Consider the following thought experiment adapted from Metzinger (2013a). Imagine you are lying in a scanner, controlling a robot at a distance, seeing through its eyes and even feeling motor feedback when its arms and legs move. Experientially, you completely identify with the robot, while at the same time you are moving freely in a situation in which also other human beings are present. Suddenly the new husband of your ex-wife enters the room. He is the person who, a few months ago, destroyed all your plans and your entire personal life. You again feel the mortification, deep hurt, sense of inner emptiness, and existential loneliness following the divorce. Spontaneously an aggressive impulse arises inside you, and almost simultaneously a brief, violent fantasy of killing him emerges. You try to calm yourself down, but before you can suppress the motor imagery that involuntarily went along with the violent fantasy in your conscious mind, the robot has already killed the man with one single, forceful blow. You regain control and are able to back away a few steps. Subjectively it feels as if you never had a chance to control your behavior. But how can one decide if you, from a purely objective perspective, perhaps still possessed the capability of suppressing the aggressive impulse, just in time? In an ethical sense, are you responsible for the consequences of the robot's actions?

We may well live through a historical transition that we are only beginning to understand. In the beginning, avatars were just moving statues, models of bodies in time and space. As they have become more realistic, new features like the functionality of gaze-following or emotional expression via facial geometry have been added. It is interesting to note how even at this early stage of VR technology what I have termed "social hallucinations" (see **Box 1**) are emerging: In users, the phenomenology of "presence" can now be enhanced by a phenomenology of being socially situated. Users are confronted not only with virtual models of other bodies, but with actual selves—other agents who are autonomous subjects of experience, mutually sharing an intersubjective phenomenology of presence. The classical "place illusion" (section 1.3 in Slater and Sanchez-Vives, 2016) is now complemented and strengthened by an "other-minds illusion."

My first point is that such social hallucinations will soon become increasingly sophisticated and thus much stronger. One research target is the interaction of the place illusion with the other-minds illusion: how strong is the causal interdependence between spatial immersion and social immersion? If we imagine human users communicating with advanced AI systems in natural language and via anthropomorphic (or at least person-like) interfaces, then we will soon reach a stage where unconscious machines are automatically modeled as independent cognitive agents by the human brain. We may have no control over this process. If so, virtual agents will automatically be experienced as thinkers of thoughts, and the human brain will inevitably begin to predict their behavior as belonging to systems possessing high-level psychological properties like episodic memory, attentional control, and self-consciousness. The combination of VR and AI may therefore lead to a situation in which VR-based anthropomorphic interfaces begin to target the naturally evolved modules for social cognition and agent detection in the biological brains of their human users in intelligent and ever more successful ways. Self-optimizing, but entirely unconscious AI/VR-systems might discover that it simply is most efficient to be perceived as self-aware cognitive agents by humans, consequently creating robust and complex social hallucinations as a new phenomenological foundation for man-machine communication.

For empirical researchers in the field of social cognition, this will be of great interest, because it allows for highly innovative and precisely controllable forms of experimental design. The maximal model would be one in which the user's other-minds illusion can be created by every virtual entity she encounters during her VR experience. For philosophers, the impact will extend beyond obviously relevant classical topics like the other-minds problem, social ontology, or political philosophy. The combination of social VR and AI will also touch many issues in applied ethics, including: What is the proper ethical assessment of deliberately causing social hallucinations in human users? Are there ethically recommendable, non-paternalistic applications of VR-based other-minds illusions (see section Applied Ethics)?

# Example 7: The "Postbiotic Social Boot-Strapping Scenario" (PSBS)

Let me end this section by briefly describing what I think is the most interesting conceptual possibility from a philosophical perspective. I call it the "postbiotic social boot-strapping scenario" (PSBS; see **Box 1**), and it would again involve a combination of VR and AI.

Let us assume that future AI systems have begun using avatars (VR-based person-like interfaces) to communicate with humans. Non-persons communicate with persons via person-models. Through advanced user modeling those systems will have learned how to cause the most reliable and robust social hallucinations in their users, thereby optimizing their overall functionality. Now the crucial assumption behind the PSBS is the logical possibility that such combined AI/VR-systems begin to mutually cause social hallucinations in each other. This would occur by the systems applying the algorithms they originally developed for man-machine interaction to machine-machine communication, but still using individual virtual avatars as their interfaces. What I call the "social boot-strapping scenario" would begin when such systems attempt to cause social hallucinations in other AIs as well as in humans. It is conceivable that the continued optimization of combined AI/VR-systems would generate second-order virtual agents (see **Box 1**), that is, virtual entities that not only possess an internal model of themselves in order to control their behavior, but also harbor a functionally adequate misrepresentation of themselves as being socially situated. A first-order virtual agent would be controlled by an AI that uses it as a communication interface. A second-order virtual agent would be driven by a different self-model: it would falsely represent other AIs/virtual agents as real, self-conscious entities. It would represent itself as standing in genuine social relations, as a genuine subject embedded in a network of intersubjective relationships.

It is also plausible to assume that such forms of functionally adequate misrepresentation might make groups of virtual agents and groups of interacting AI systems much more efficient: the added explicitly social layer of self-optimization could enable a new level of complexity that would serve to gradually improve the intelligence of the newly emerged overall system. Such second-order agents would therefore not only cause robust social hallucinations in their human users, but also in the AIs controlling them. Groups of such intelligent virtual agents using personoid avatars as their interface or "outward appearance" would instantiate a new property, "virtual intersubjectivity," by drawing on and mimicking algorithms and neural mechanisms which first appeared in the psychological evolution of biological organisms, were later optimized in man-machine communication, and are now virtually implementing certain types of social cognition and functionally adequate forms of self-deception in postbiotic systems.

Therefore, they would necessarily begin to represent each other as sharing a common Lebenswelt. A virtual Lebenswelt, or "life-world," is a pre-given social world in which subjects experience themselves as being united by a primordial quality of "togetherness," as inhabiting a universe which is no longer "merely virtual," but rather intersubjectively given. A Lebenswelt is co-constituted by a shared first-person plural perspective, by a mentally represented "we," and is therefore absolutely real and self-evident for every individual virtual agent. To put the point differently, while human beings might still describe the internal social context generated by the interaction of combined AI/VR-systems as "virtual" or "simulated," these systems themselves might evolve a fully transparent representation of their own life-world and accordingly arrive at very different epistemological conclusions about the social context in which they evolve.

This is an example of a new field where philosophers working on theories of social cognition and intersubjectivity could very fruitfully interact with researchers in VR, creating simulated "toy societies." Here is the most provocative question: Is our own, human Lebenswelt ultimately a biologically evolved variant of the PSBS? Is it based on functionally adequate misrepresentations enabling biological organisms to "hallucinate selfhood into each other"? Can we model the relevant transition in virtual agents? Should we attempt to do this, or would we be ethically required to relinquish such research pathways altogether? Clearly, such research may be ethically problematic, because it may lead to artificial suffering or a dangerous and irreversible intelligence explosion in autonomously self-optimizing social systems.

# APPLIED ETHICS

Less dramatically, VR technology has the potential to increasingly change what many philosophers, including Edmund Husserl and Jürgen Habermas, have traditionally called the "life-world" of human beings. As explained above, a life-world is partly constituted by a prescientific, collective phenomenology of intersubjectivity. This underlying phenomenology in turn gives rise to apparently self-evident cultural systems and normative orders which attempt to give a meaning to life and to the shared social institutions that stabilize patterns of collective action. These patterns causally influence psychological properties, determine the content of seemingly individual cognitive processes, and may even shape our personality structure. Elsewhere, I have argued that because of this, VR technology will function as a new cognitive niche to which the human mind will adapt:

What is historically new, and what creates not only novel psychological risks but also entirely new ethical and legal dimensions, is that **one** VR gets ever more deeply embedded into **another** VR: the conscious mind of human beings, which has evolved under very specific conditions and over millions of years, now gets causally coupled and informationally woven into technical systems for representing possible realities. Increasingly, it is not only culturally and socially embedded but also shaped by a technological niche that over time itself quickly acquires a rapid, autonomous dynamics and ever new properties. This creates a complex convolution, a nested form of information flow in which the biological mind and its technological niche influence each other in ways we are just beginning to understand. It is this complex convolution that makes it so important to think about the Ethics of VR in a critical, evidence-based, and rational manner (Madary and Metzinger, 2016, p. 20).

VR technology poses many new problems for applied ethics, ranging from unexpected psychological risks to military applications. Rather than exploring the ethical and sociocultural ramifications of VR here, I instead refer readers to the first Code of Ethical Conduct Michael Madary and I developed (Madary and Metzinger, 2016). Applied ethics is a prime example of another domain of philosophical research that is of highest relevance for researchers in the field of VR, consumers, and policy-makers alike.

# PHILOSOPHY OF RELIGION

VR, if applied as a conceptual metaphor in different domains of inquiry, possesses great heuristic fecundity. We have already seen that there are considerable commonalities linking VR and the phenomenon of conscious experience (section Example 1: Consciousness). Religious faith is another example of a domain in which unexpected analogies can be discovered. Religious faith dramatically changes the model of reality under which a human being operates, because it installs or superimposes a new virtual ontology. We can view the evolution of religion as an evolution of pre-technological augmented reality (AR) systems aimed at expanding the phenomenology and motivational structure of human beings. Can VR and AR be used as fruitful conceptual metaphors for the philosophy of religion? Let us take a look.

# Example 8: Having Faith as Biosocially Evolved Augmented Reality

In standard situations, the perceptual phenomenology of human beings is largely determined by top-down predictions colliding with the sensory input generated by continuous embodied interaction with an external environment (Friston, 2010). Augmented reality adds an environmental layer that is invisible for others, superimposing a new and additional set of priors onto the conscious subject's model of reality. A novel perspective on organized religion emerges from this: religious-belief-as-enculturated-augmented-reality, where religion is a set of representational functions, originally realized by cultural practices like burial rites, ancestor cults, prayers, sermons and increasingly complex rituals. Three obvious and well-documented adaptive advantages provided by this set of functions are (a) offering a viable psychological strategy for mortality-denial, (b) increasing social cohesion in the context of in-group/out-group conflicts, and (c) the stabilization of existing social hierarchies. As a crude analogy, religious faith is like a metaphysical version of Pokémon Go: it populates the subject's life-world with invisible beings like Gods, angels, and spirits, thereby causally enabling new forms of social hallucination and self-deception (Trivers, 2000). Within a given evolutionary or cultural context, such virtual expansions of a pre-given conscious model of reality may prove to be functionally adequate. Having a religious faith augments an agent's subjective reality, and it often motivates in-group prosocial behavior.

Another example of a related new concept is "transreality gaming" (e.g., Lindley, 2004). Transreality gaming describes a type or mode of gameplay that combines playing a game in a virtual environment with game-related, physical experiences in the real world and vice versa. In this approach, a player evolves and moves seamlessly through various physical and virtual stages, brought together in one unified game space. Let me ask a question that may, initially, sound overly provocative, but which may later prove to possess great heuristic potential: Is religious practice a form of transreality gaming? Are religious rituals like funerals, ancestor cults, or prayer not attempts at an integration of virtual worlds with a biologically grounded life-world? The notion of "transreality gaming" also gives us a new way of looking at what a conscious human being really is: a biological organism that has evolved into a "transreality interaction platform" by enabling causal interactions across physical and virtual reality (Martin and Laviola, 2016). From this perspective, religious practices are a particularly interesting special case of this general principle, governing and stabilizing the "player's" day-to-day interactions with their social environment as well as with their own mind.

Above, I have experimentally framed the evolution of religion as the evolution of augmented reality systems aimed at expanding the phenomenology of human beings, hypothetically later enabling successful, scalable cooperation in ever larger groups. I have provisionally defined it as a set of representational functions, originally realized by externalized cultural practices like burial rites, ancestor cults, prayers, sermons and increasingly complex rituals, later internalized into the minds of individual agents. This new perspective leads to a whole range of interesting questions about whether the same set of functions could also be technologically implemented. Could there be religious practice in VR? Would it count as valid from a theological perspective? What about the technological implementation of an individual "VR heaven," where users encounter a medial environment allowing them to interact with their own ideal self, with an impersonal ideal observer, or with virtual angels, saints, and deities? Could there be "VR churches" giving an individual user a comparable phenomenology and the same psychological effects as real social interactions in an embodied religious context? Can there be technologically mediated "virtual rituals" serving basically the same (or historically new) functions? If so, Slater's and Sanchez-Vives' programmatic idea of "Enhancing Our Lives with Immersive Virtual Reality" could even be extended to the sphere of religious practice.

Please note how the "convolution principle" introduced in section Applied Ethics still holds. Again, what is new, and what creates not only novel psychological risks, but also entirely new soteriological dimensions, is that one virtual reality gets ever more deeply embedded into another virtual reality: The conscious mind of human beings, which has evolved under very specific conditions over millions of years, might become causally coupled and informationally woven into technical systems for representing possible realities, and these could even be of a religious type. Now, the religious mind is not only culturally and socially embedded, but also shaped by a technological niche, a niche that over time acquires rapid, autonomous dynamics and ever new properties. This creates a complex convolution, a nested form of information flow in which the transcendence-seeking mind and its technological niche influence each other in ways we are just beginning to understand. Religious practice in VR could be one of these ways.

# VR-PHENOMENOLOGY IN THE CONTEXT OF COMPARATIVE AND TRANSCULTURAL PHILOSOPHY

The VR-experience has a distinct and unique phenomenological profile. What we currently lack is not only a philosophical meta-theory for VR-phenomenology, but also suitable conceptual instruments that help us bring out the essence of what really makes conscious experience in VR so interestingly different. At the same time, an important and strongly growing area of philosophy is comparative philosophy, which aims at bringing together and perhaps even integrating philosophical traditions that have developed in relative isolation from one another and that are defined quite broadly along cultural and regional lines (Wong, 2017). I will conclude this contribution by very briefly pointing to a way in which a central concept of Buddhist philosophy, namely suññatā, could be conceptually connected to a philosophical metatheory of virtual reality.

# Example 9: The Phenomenology of Emptiness and Virtuality

Depending on doctrinal context, the Buddhist notion of "emptiness" or "voidness" has many different meanings. Buddhist metaphysics is radically anti-substantialist and anti-essentialist. In a nutshell, this means that entities are conceptually analyzed as being devoid of inherent existence and as lacking any form of "true inner nature"; what in Western traditions has often been simply called "reality" actually is characterized by metaphysical "hollowness" and indeterminacy as to existence vs. non-existence. My first point here is that exactly the same is true of so-called "virtual objects" and other entities like properties, whole situations, or simulated selves as represented in VR. They are not ontologically self-subsistent (i.e., they cannot independently "stand" or independently hold themselves in existence), and they have no self-sustaining, enduring, or essential inner nature beyond the present moment and the ongoing process of being virtually represented as such. They depend not only on a complex network of functional relations implemented in a given computational system, but also on this preexisting network being causally coupled to the physical brain of a user already endowed with consciousness and self-consciousness. Entities in VR are a paradigmatic example of what Buddhist metaphysics would call "dependent origination": impermanent phenomena arising out of a fluid dynamic of causal interrelatedness.

Interestingly, there is a semantic connection linking the classical Pali term suññatā to the concept of "virtuality" (stemming from the late-medieval scholarly neologism virtualis, which in turn preserves elements like "potentiality" and "latency of possibilities" characterizing the original Aristotelian notion of "dynamis"). To see this partial, but philosophically relevant overlap, it is particularly helpful to focus not only on the metaphysical, but also the phenomenological reading of suññatā, an absolutely central and classical term, which has been a cornerstone of Buddhist philosophy over many centuries (Williams, 2008).

With a minimalist sketch of the metaphysical background already in hand, let us therefore proceed to the phenomenological level of analysis. The VR-experience has a unique phenomenological profile which is another excellent example of a potential future target for interdisciplinary research, and comparative philosophy may actually help us to see the relevant features more clearly. The phenomenological reading of "emptiness" refers to a specific contemplative mode: a way of consciously experiencing the world and the process of knowing this world as inherently selfless (anatta). "Seeing out of emptiness" is a specific mode of phenomenally experiencing the world as not seen by a self-as-subject, an ancient meditative practice, in the words of Jiddu Krishnamurti, of "observing without an observer" (Krishnamurti, 2010). The phenomenological reading of suññatā also includes a mode of perception in which one neither adds anything to nor takes anything away from what is present, thereby, as it were, "directly seeing" the qualities of suchness, interrelatedness, and impermanence. In this way, the phenomenological reading of "emptiness" refers to a specific mode of conscious experience that can be described as a choiceless form of pure awareness. This mode does not involve an agentive phenomenal self, and things are experienced neither as real nor as unreal. Therefore, this way of seeing also bears multiple and subtle relations to what, in Western phenomenology, has been described as the "bracketing" of an explicit existence assumption when philosophically investigating a specific content of consciousness, and is associated with technical terms like epoché, "eidetic reduction" or "phenomenological reduction" (Beyer, 2016).

Obviously, I am not saying that suññatā describes the phenomenology of a standard VR user today. In VR, there is clearly a phenomenal self, and if the place illusion has been successfully created, the experience of actually "being there" can be transparent and subjectively robust (Blanke and Metzinger, 2009). But the second element, the subjective experience of an environment in VR as being "neither real nor unreal" describes the phenomenology of VR very well. My second, phenomenological, point is that what makes VR phenomenology so special is the subjective quality of metaphysical indeterminacy. The claim is that VR-phenomenology is characterized by a phenomenology of metaphysical indeterminacy, meaning that objects and environment in VR are subjectively experienced as neither existing nor non-existing. Put differently, Buddhist philosophy may actually offer the conceptual instruments to describe the properties of interest from a more fine-grained phenomenological perspective on what is most interesting about an immersive VR experience. For many elements of subjectively experienced VR, there really is a distinct phenomenal quality of ontological neither-nor-ness: It is not the case that subjectively experienced elements of VR are either phenomenally real or phenomenally unreal. Phenomenologically, virtuality is emptiness if we describe it as an explicit phenomenal experience of metaphysical indeterminacy. I would like to submit that this is a core aspect of what is philosophically interesting about VR phenomenology and what distinguishes it from ordinary waking states. It is therefore noteworthy that Buddhist philosophy may have already given us the conceptual instruments to describe this in a much clearer and heuristically fruitful way.

Of course, things become much more complicated if we include in our investigation augmented reality setups and the constantly changing landscape and temporal distribution of phenomenal opacity versus phenomenal transparency. The phenomenology of metaphysical indeterminacy is not all-pervading; it is variable and impermanent. But one may speculate that in the future we might, via controlled experimentation, use VR technology itself to make progress on philosophically relevant issues like these. For example, one might think of "contemplative" types of VR technology that explicitly aim at enhancing our lives by making the phenomenal quality of metaphysical indeterminacy more robust, then extending it to the place illusion and the sense of self. But as we have seen, many other options are now on the table. Perhaps the most interesting promise of VR technology lies in supporting rational, evidence-based and empirically informed research programs in philosophical phenomenology, with philosophers in turn providing some conceptual foundations and proposing novel research targets for the VR community.

# SUMMARY

As pointed out in the introduction, this article was mainly intended to be a source of inspiration for an interdisciplinary audience. Contact points and potential future directions for interdisciplinary cooperation between different subdisciplines of philosophy and VR research have been explored, through a series of concrete examples and possible research projects. The areas explored were:


# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

# FUNDING

This work was funded by a 5-year Fellowship awarded by the Gutenberg Research College, Johannes Gutenberg-Universität Mainz.

# ACKNOWLEDGMENTS

I wish to thank Robert Clowes, Lucy Mayne, and Wanja Wiese for helpful critical discussion, and Lucy Mayne and Wanja Wiese for editorial and technical help with the manuscript.

# REFERENCES

Windt, J. M. (2010). The immersive spatiotemporal hallucination model of dreaming. Phenomenol. Cogn. Sci. 9, 295–316. doi: 10.1007/s11097-010-9163-1

state?," in Praeger Perspectives. The New Science of Dreaming, Vol. 3, Cultural and Theoretical Perspectives, eds D. Barrett and P. McNamara (Westport, CT: Praeger Publishers; Greenwood Publishing Group), 193–247.

Wong, D. (2017). "Comparative philosophy: Chinese and Western," in The Stanford Encyclopedia of Philosophy, ed E. N. Zalta (Spring 2017 edition). Available online at: https://plato.stanford.edu/archives/spr2017/entries/comparphil-chiwes/

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Metzinger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Immersive VR and Education: Embodied Design Principles That Include Gesture and Hand Controls

Mina C. Johnson-Glenberg\*

*Department of Psychology, Arizona State University, Tempe, AZ, United States*

This article explores relevant applications of educational theory for the design of immersive virtual reality (VR). Two unique attributes associated with VR position the technology to positively affect education: (1) the sense of presence, and (2) the embodied affordances of gesture and manipulation in the 3rd dimension. These are referred to as the two profound affordances of VR. The primary focus of this article is on the embodiment afforded by gesture in 3D for learning. The new generation of hand controllers induces embodiment and agency via meaningful and congruent movements with the content to be learned. Several examples of gesture-rich lessons are presented. The final section includes an extensive set of design principles for immersive VR in education, and finishes with the *Necessary Nine* which are hypothesized to optimize the pedagogy within a lesson.

#### Edited by:

*Massimo Bergamasco, Scuola Sant'Anna di Studi Avanzati, Italy*

#### Reviewed by:

*George Papagiannakis, Foundation for Research and Technology Hellas, Greece*

*Ken Livingston, Vassar College, United States*

#### \*Correspondence:

*Mina C. Johnson-Glenberg minaj@embodied-games.com; mina.johnson@asu.edu*

#### Specialty section:

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

Received: *17 March 2018* Accepted: *14 June 2018* Published: *24 July 2018*

#### Citation:

*Johnson-Glenberg MC (2018) Immersive VR and Education: Embodied Design Principles That Include Gesture and Hand Controls. Front. Robot. AI 5:81. doi: 10.3389/frobt.2018.00081*

Keywords: immersive virtual reality, embodiment, gesture, STEM education, mixed reality, VR, educational design, XR

"Movement, or physical activity, is thus an essential factor in intellectual growth, which depends upon the impressions received from outside. Through movement we come in contact with external reality, and it is through these contacts that we eventually acquire even abstract ideas."

(Montessori, 1966)

# THE TWO PROFOUND AFFORDANCES

In the early 1930s, Dr. Montessori understood that learning relied on how our physical bodies interacted with the environment. For her, the environment was physical. Today, we are able to digitize our environments and the affordances approach infinity. For several decades, the primary interface in educational technology has been the mouse and keyboard; however, those are not highly embodied interface tools (Johnson-Glenberg et al., 2014a). Embodied, for the purposes of education, means that the learner has initiated a physical gesture or movement that is well mapped to the content to be learned. As an example, imagine a lesson on gears and mechanical advantage. If the student is tapping the S key on the keyboard to make the gear spin, that would be considered less embodied than spinning a fingertip on a screen to manipulate a gear with a synchronized velocity. With the advent of more natural user interfaces (NUI), the entire feel of digitized educational content is poised to change. Highly immersive virtual environments that can be manipulated with hand controls will affect how content is encoded and retained. One of the tenets of the author's Embodied Games lab is that doing actual physical gestures in a virtual environment should have positive, and lasting, effects on learning in the real world. Tremendous opportunities for learning are associated with VR (Bailenson, 2017) and one of the most exciting aspects of VR "is its ability to leverage interactivity" (Bailenson et al., 2008).

Immersive and interactive VR is in its early days of educational adoption. It will not prove to be a panacea for every disengaged student (as is sometimes stated in the popular press), nor do we expect future scholars to spend entire days in virtual classrooms (see fiction by Cline, 2011). However, now that some of VR's affordability and sensorial quality issues are being addressed, it is reasonable to believe that VR experiences will become more ubiquitous in educational settings. When the demand comes, the community should be ready with quality educational content. There are few guidelines now for how to make optimal educational content in VR, thus this theory article ends with several concrete design principles.

Two attributes of VR may account for its future contributions to education. These we call the two profound affordances. The first profound affordance is the **feeling of presence** which designers must learn to support, while not overwhelming learners. The sense of presence is fairly well understood at this point. Slater and Wilbur (1997) describe it as the feeling of being there. The second profound affordance pertains to **embodiment and the subsequent agency associated with manipulating content** in three dimensions. Manipulating objects in three-dimensional space gives a learner unprecedented personal control (agency) over the learning environment. This article focuses on how gesture and the use of hand controls can increase agency and learning. The basis for this prediction is the research on embodiment and grounded cognition (Barsalou, 2008). Although other methods for activating agency can be designed into VR learning environments (e.g., using eye gaze and/or speech commands), it may be the case that gesture plays a special role. Gesture kinesthetically activates larger portions of the sensori-motor system and motoric pre-planning pathways than the other two systems and gesture may lead to stronger memory traces (Goldin-Meadow, 2011). Another positive attribute of engaging the learner's motoric system via the hand is that it is associated with a reduction in simulator sickness (Stanney and Hash, 1998)<sup>1</sup>.

VR for education should take full advantage of 3D object manipulation using the latest versions of handheld controllers (as well as gloves and in-camera sensors to detect joints, etc.). Gathering and analyzing gestures in 3D is an area in need of more research and evidence-based design guidelines (Laviola et al., 2017). Because randomized controlled trials (RCTs) are just starting to be published on immersive VR in education, this article is primarily theory-based. The goal of this article is to share some of what has been learned about embodiment in mixed reality platforms for education, and to produce a set of design principles for VR in education to assist this nascent field as it matures.

# Vocab Lesson: VR, Presence, and Agency

In this article, the term VR refers to an immersive experience, usually inside a headset where the real world is not seen for 360°. (We do not focus on CAVEs as the cost precludes large scale adoption in the K-16 education arena. In addition, it is probably more embodied to see virtualized body parts, as is common in headset experiences.) In VR, the learners can turn and move as they do in the real world, and the digital setting responds to the learner's movements. Immersive VR systematically maintains an illusion of presence, such that learners feel their bodies are inside the virtual environment. Being able to see evidence of the real world, even in the periphery, would mean the platform should be deemed augmented or mixed reality (AR/MR). A three-dimensional object or avatar moving on a regular-sized computer monitor is never "VR"; we hope that educators soon stop conflating the terms and phenomena.

The term presence is also defined in the glossary of a recent book dedicated to VR and education (Dede and Richards, 2017). Presence is a "particular form of psychological immersion, the feeling that you are at a location in the virtual world." The sensations are reported to be quite visceral. It is true that the sensation of being on location and unmindful of real world cues can occur even when users interact with "low immersion environments" (e.g., on a smartphone), but the content must be extremely engaging. In a full immersion headset experience, the feeling of being in a different location is systematic and usually instantaneous. The presence associated with VR is one of the most immediate and best documented phenomena. Thus, presence is deemed the first profound affordance of VR. Several surveys are available for assessing the amount of presence in a mediated experience (Slater and Wilbur, 1997; Makransky et al., 2017). Slater's lab has led extensive research on presence and his group has also pioneered a method for assessing presence without the use of surveys (Bergstrom et al., 2017).

Immersive VR has the ability to immediately transport the user to a limbically heightened emotional space that can have positive effects on attention and engagement; this is one reason why educators believe that learning will be positively affected. The Google Expeditions series relies on presence to immediately engage learners. A recent exploratory study explicitly states that the presence afforded by the 3D technology "opens up" the senses and mind for learning (Minocha et al., 2017). Minocha et al. further hypothesize that because the students are in control of where they look and for how long, they can then follow ". . . their interest and curiosity, hence giving them a sense of control and empowerment over their own exploration." Whenever users feel they have control over the environment, they experience agency.

#### The Second Profound Affordance

Is it the case that learning in 3D is always better than in 2D? Will the learner acquire knowledge faster and show better retention? This is a vital question that deserves further research. Jacobson (2011) believes the answer is yes, at least when the learning relates to skill acquisition. For example, middle school students recalled more declarative knowledge, i.e., symbols and spatial layout, after experiencing a three-dimensional ancient Egyptian temple presented in a dome environment, compared to a desktop version with the same information (Jacobson, 2011). However, the multi-user dome and the Expeditions Cardboard experiences are examples of virtual environments where hand controls and gesture were not used. The ability to control movement via gaze is one form of agency, but the ability to control and manipulate objects in the 3D environment is perhaps a different and deeper form of agency with many more degrees of freedom. The hypothesis is, the more agentic the learning, the better. Here we use the term agency to connote the user has individual (self-initiated) control and volition over the individual virtual objects in the environment. In the educational field, this definition of agency would reside under "self-directed constructs" in the Snow et al. (1996) provisional taxonomy of conative constructs in education.

<sup>1</sup> Stanney and Hash (1998) conducted a three-way RCT with hand controls: passive, a mixture condition, and an active condition. In the active condition, participants had full control over movement in the VR space including pitch and roll, movements which were not needed for several of the, admittedly simplistic, tasks. They found that fewer symptoms of simulator sickness were reported in the mixture condition. Thus, targeted control was best.

The idea of agency is baked into the second profound component of VR. The newest generation of VR includes synced hand controls, so gesturing and manipulating objects in VR with an NUI is becoming steadily more affordable. Our prediction is that hand controls will have long lasting effects on the types of content and the quality of the pedagogy that can be designed into educational spaces. Jang et al. (2016) utilized a yoked-pair design, such that one participant manipulated a virtualized 3D model of the inner ear, while another participant viewed a recording of the interaction. Results indicate that participants in the manipulation group showed greater posttest knowledge (via drawing) than the group that observed the manipulations. The manipulation (with a joystick-like device) is a form of gestural control that affords agency, and to understand how these constructs interact a clearer definition of embodiment is in order. Our research communities are in the early stages of exploring the affordances of VR, and principles for design in education are also needed.

# Embodiment

Proponents of embodiment hold that the mind and the body are inextricably linked (Wilson, 2002). A compelling example of how the body's actions give rise to meaning comes from Hauk et al. (2004); they used functional magnetic resonance imaging (fMRI) to measure activation in regions of interest as participants listened to action verbs such as lick, pick, and kick. The researchers observed significantly more somatotopic activation of the premotor and motor cortical systems that specifically control the mouth, the hands, and the legs (respectively). These overlearned words, which were first experienced and mapped to meaning in childhood, are still activating specific motor areas in the adult brain. This is intriguing because it suggests that active, motor-driven concepts may stimulate distributed semantic networks (meaning), as well as the associated motor cortices which would have been used to learn long ago, in childhood. Semantics is part of an active learning system in humans. The way human and environmental systems work together to navigate the world is also termed "enactive cognitive science" by Varela et al. (1991; revised 2016). Varela et al. offer an eloquent description of how cognition can be viewed as an "interconnected system of multiple levels of sensori-motor subnetworks" (p. 206)<sup>2</sup>.

Embodied learning theory has much to offer designers of VR content, especially when the hand controls are used. The strong stance on embodiment and education holds that the body should be moving, not just reading or imaging, for a high level of embodiment to be in a lesson (Johnson-Glenberg, 2017; Johnson-Glenberg and Megowan-Romanowicz, 2017). When a motoric modality is added to the learning signal, more neural pathways will be activated and this may result in a stronger learning signal or memory trace. Several researchers posit that incorporating gesture into the act of learning should strengthen memory traces (Broaders et al., 2007; Goldin-Meadow, 2011). It may be the case that adding more modalities to the act of learning (beyond the usual visual and auditory ones) will continue to increase the strength of the memory trace. The modality of interest in this article is gesture. This article uses the term gesture to mean both the movement as a communicative form and the action used to manipulate virtual objects in the VR environment. The gesture-enhancing-the-memory-trace argument can also be framed as one of levels of processing, which is a well-studied concept in cognitive psychology (Craik and Lockhart, 1972). The concept of "learning by doing" is also relevant to this article and is supported by the self-performed task literature in the psychology arena (see Engelkamp and Zimmer, 1994). They found that when participants performed short tasks, the task-associated words were better remembered compared to conditions where the participants read the words, or saw others perform the tasks.

Research on non-mediated forms of gesture in the educational arena has been fruitful. As an example, when teachers gesture during instruction, students retain and generalize more of what they have been taught (Goldin-Meadow, 2014). Recently, Congdon et al. (2017) showed that simultaneous presentation of speech and gesture in math instruction supports generalization and retention. Goldin-Meadow (2011) posits that gesturing may "lighten the burden on the verbal store" in a speaker's mind, and that gesturing may serve to offload cognition (Cook and Goldin-Meadow, 2006). Research supports that gestures may aid learners because learners use their own bodies to create an enriched representation of a problem grounded in physical metaphors (see Hostetter and Alibali, 2008; Alibali and Nathan, 2012; Nathan et al., 2014).

<sup>2</sup> If cognition is conceived of as inputs and outputs interleaving with internal and external states, what happens during times of conflict? How do we resolve when the internal state does not match the external state? Perceptual Control Theory (PCT) (Bourbon and Powers, 1999) is framed in terms of the organism and the environment's interconnected system. There are inputs that determine the organism's actions, which again determine the input states. Thus, "there are two simultaneous relationships: (a) an observation that stimulus inputs depend on an interaction between behavioral outputs and independent events in the environment, and (b) a conjecture that behavioral outputs depend on an interaction between actual stimulus inputs (as perceived, not necessarily as they really are). . . " p. 446. The phrase "as perceived" is important, because the topic is VR. Some is known about what happens physiologically when there is a mismatch between reality (e.g., the state of the inner ear) and a player's visual perception (e.g., flying swiftly through the clouds), and the result is usually simulator sickness. There is an eagerness to explore what happens in terms of long term cognitive change (i.e., "learning") when the mismatch continues over time. Future theories should further explore what happens when we place people in fantastic, fully immersive environments that are perceived as very real. In the upcoming years, our community needs to home in on the most applicable theories and run RCTs to verify best pedagogies for teaching with VR technologies.

Several researchers also highlight that the gesture, or movement, should be congruent to the content being learned (Segal et al., 2010; Black et al., 2012). That is, the gesture should map to the instructed concept. For example, if the student is learning about the direction and speed of a spinning gear, then it would be important for the student's spinning hand gesture to go in the same direction and near the same speed as the virtual gear on screen (Johnson-Glenberg et al., 2015). An example of a low congruence gesture would be a "push forward" gesture to start a gear train spinning (the "push" is a default gesture for the Kinect™ sensor). Glenberg and Kaschak (2002) explore the effects of gesture and embodiment by varying the direction of button pushes in a sentence sensibility judgment task. If the button push action was away from the body and the sentence text was congruent to motion (i.e., "Give the pencil to X."), then the reaction time to judge sensibility was significantly faster. Action congruent sentences were judged faster than the action incongruent sentences. As a final example of the importance of congruence, in a study by Koch et al. (2011), participants reacted faster in a Stroop task when using congruent gestures. The congruent gestures involved making an up movement attached to a word like "happy," compared to the more incongruent downward gesture.
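For designers who want to operationalize the spinning-gear example, one possible sketch of a congruence check is given below. It is purely illustrative: the function name `congruence_score`, the use of signed angular velocities in radians per second, and the ratio-based scoring rule are assumptions made here for the sake of the example and are not taken from any of the cited studies.

```python
import math


def congruence_score(hand_omega: float, gear_omega: float) -> float:
    """Hypothetical congruence measure between a hand gesture and a virtual gear.

    Both arguments are signed angular velocities (assumed unit: rad/s);
    the sign encodes the direction of rotation.
    Returns a value in [0, 1]: 1.0 when direction and speed match exactly,
    0.0 when either is still or the directions are opposed.
    """
    # A motionless hand or gear cannot be congruent with a spinning one.
    if hand_omega == 0 or gear_omega == 0:
        return 0.0
    # Opposite directions of rotation count as fully incongruent.
    if math.copysign(1.0, hand_omega) != math.copysign(1.0, gear_omega):
        return 0.0
    # Same direction: score by how close the two speeds are (slower / faster).
    slow, fast = sorted((abs(hand_omega), abs(gear_omega)))
    return slow / fast


# Illustrative usage with made-up velocities.
print(congruence_score(2.0, 2.0))   # same direction, same speed
print(congruence_score(1.0, 2.0))   # same direction, half the speed
print(congruence_score(-2.0, 2.0))  # opposite direction
```

Under this assumed scoring rule, a generic "push forward" gesture would carry no rotational velocity at all and would score zero, which is one way to express why it counts as a low congruence gesture for a gear lesson.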

One hypothesis is that when learners are activating congruent and associated sensori-motor areas, they may learn the content faster and in a deeper manner. Gestures may provide an additional code for memory (again, strengthening the trace) as well as adding additional retrieval cues. Learners with stronger memory traces should do better on post-intervention tests. Work in the physics education domain supports the hypothesis that being active and engaging the body during encoding positively affects learning. In a recent Kontra et al. study, participants were randomly assigned to one of two roles in a learning dyad, either active or observant (Kontra et al., 2015). Participants who were active and physically held bicycle wheels spinning on an axle learned more about angular momentum compared to those who observed the spinning wheels. In an extension of Kontra's lab study, fMRI revealed that the level of the BOLD signal in the brain motor regions of interest (left M1/S1) significantly predicted content knowledge test performance (Kontra et al., 2015). The study of effects of activating motoric regions via gesture is a field of great interest for embodiment researchers and educators.

With the advent of VR hand controls, where human hand gestures can be transformed into near-infinite outcomes, it would be helpful to have a set of best practices for creating gesture-based educational VR content. Recently, it appears the term "embodiment" is being used in the VR research field to mean "a perceptual illusion, . . . the body ownership illusion" referring to one's avatar on screen (Bertrand et al., 2018). If this broad, human-to-avatar–body-swap definition of embodiment takes hold, then perhaps gesture would be considered a sub-type of VR embodiment. It remains to be seen how the term will evolve, but clearly a taxonomy would be helpful.

# Taxonomy of Embodiment for Education in VR

As with all theories, there are inclusive (weak) ones that start the spectrum, and exclusive (strong) ones that end it. One inclusive theoretical stance on embodied learning would be that any concept that activates perceptual symbols (Barsalou, 1999) is by its nature embodied. Following this stance, all cognition is embodied because early, original knowledge is gained via the body and its interactions with the environment, even new concepts that are later imagined. The environment's affordances (Gibson, 1979) shape and constrain how our bodies interact, ergo, cognition continues to be formed and expanded by these interactions. In an inclusive interpretation, according to some researchers, cognition would be broadly defined to include all sensory systems and emotions (Glenberg, 2010; Glenberg et al., 2013). A more exclusionary stance is one that distinguishes between low and high levels of embodiment. For a lesson to be deemed highly embodied, the learner would need to be physically active; the learner would have to kinesthetically activate motor neurons. Some principles for designing embodied education into MR platforms have been suggested (Lindgren and Johnson-Glenberg, 2013), and AR design principles have been proposed (Dunleavy, 2014); however, there are no design guidelines for VR that are based on embodiment. Given the new affordances of VR hand controls, it seems time to reframe some of this lab's previous embodied principles.

A more exclusionary definition of embodiment for education was proposed by this lab in 2014 (Johnson-Glenberg et al., 2014a) and updated recently (Johnson-Glenberg and Megowan-Romanowicz, 2017). That taxonomy posited four degrees of embodiment based on three constructs: (a) amount of sensorimotor engagement, (b) how congruent the gestures were to the content to be learned, and (c) amount of "immersion" experienced by the user. Each construct will be expanded upon.

### Sensori-Motor Engagement

In terms of sensori-motor engagement via gesture (construct a), the first distinction relates to the magnitude of the motor signal. This means that walking or large arm movements activate more sensori-motor neurons than standing or swiping a finger across a screen. The magnitude of the movement should probably be part of the metric, but it is perhaps less important than whether the gesture is well-matched to the content to be learned (construct b). A small, yet highly congruent movement may be just as effective as a large one that is only loosely related to the learning concept. That is an experiment that needs to be conducted.

### Congruency of the Gesture

Construct b refers to the congruency of the gesture, that is, the movement should be well-mapped to the concept to be learned. The gesture should support the gist of the content and give meaningful practice to the learning goal; however, the movement need not be a perfect isomorphic match. In the spinning gears example, a mediated lesson was created to instruct in mechanical advantage for gear systems (Johnson-Glenberg et al., 2015). The Microsoft Kinect sensor was used to capture the direction and speed of the spin of the learner's arm. The learner extended his/her arm in front of the body and rotated it around the shoulder joint. That movement drove the first gear in a simulated gear train. Using distance from shoulder joint to wrist joint, the average diameter of the driving gear was mapped to the learner's body; when the learner altered the size of the physical spins, that action altered the size of the gear on screen in real time. Using the learner's real time wrist speed, the velocity of the gear spin was also mapped in real time. **Congruency means a large overlap between the action performed and content to be learned.** In that study, the learners that understood mechanical advantage (on a traditional test) also showed greater competency during gameplay, because they consistently chose the correct diameter gear during the virtual bike race. This is an example of how gesture can be part of both the learning situation and assessment.
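The body-to-gear mapping described above can be illustrated with a short sketch. This is not the lab's implementation: the function name `gear_state`, the 2D frontal-plane coordinates, the `scale` factor, and the use of two consecutive samples are all assumptions made for illustration. It shows only the idea that the shoulder-to-wrist distance sets the gear diameter and that the wrist's angular speed and direction set the gear's spin.

```python
import math

def gear_state(shoulder, wrist_prev, wrist_now, dt, scale=1.0):
    """Map two consecutive wrist samples to (gear_diameter, angular_velocity).

    Hypothetical helper: positions are (x, y) pairs in the frontal plane,
    dt is the time between samples in seconds, and scale converts body
    units to on-screen units. None of these names come from the study.
    """
    # Vectors from the shoulder joint to the wrist joint at each sample.
    px, py = wrist_prev[0] - shoulder[0], wrist_prev[1] - shoulder[1]
    nx, ny = wrist_now[0] - shoulder[0], wrist_now[1] - shoulder[1]

    # The current spin radius is the shoulder-to-wrist distance.
    radius = math.hypot(nx, ny)

    # Signed angle swept between samples (positive = counterclockwise),
    # computed from the 2D cross and dot products of the two vectors.
    swept = math.atan2(px * ny - py * nx, px * nx + py * ny)

    return 2.0 * radius * scale, swept / dt

# A quarter turn counterclockwise in half a second at a 0.5 m radius.
diameter, omega = gear_state((0.0, 0.0), (0.5, 0.0), (0.0, 0.5), dt=0.5)
```

Because both the diameter and the spin come from the same arm movement, a larger physical circle yields a larger gear and a faster or reversed spin yields a faster or reversed gear, which is the overlap between action and content that the congruency construct asks for.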

### Immersion/Presence

Construct c has been called sense of immersion in previous articles describing the Johnson-Glenberg embodiment taxonomy for education (Johnson-Glenberg et al., 2014a; Johnson-Glenberg and Megowan-Romanowicz, 2017). Slater's lab posits that immersion is a non-subjective property of the technological system (which includes attributes like Field of View (FOV) and fidelity to environment). They distinguish between presence and immersion and state that presence is what is subjectively felt by the user, although they concede the two terms are "subjective correlates" (Slater and Sanchez-Vives, 2016). In America, researchers have tended to conflate these two terms. Slater and others (Witmer and Singer, 1998) assert that the two terms should be kept separate because presence is always a subjective experience and not as quantifiable as the immersivity of a system. However, the two terms are inextricably "tangled" (Alaraj et al., 2011), and given the high fidelity and immersive affordances of the current spate of VR technologies, it may be appropriate to assume the majority of users will be in high fidelity and highly immersive VR environments. As the amount of immersivity in the technology begins to asymptote, perhaps more weight should be placed on the construct of presence. This is not to say that VR is on the flat slope of modal innovation. There is much work to be done with haptics and olfaction, but the large amount of variance of immersivity seen in the systems of the early 2000s has been attenuated. The levels of quality for optics, lag, and audition are sufficient for the majority of users to suspend disbelief and feel translocated.

The author proposes using the one term presence to also connote a very high degree of immersion, because the amount of immersion is universally high in current immersive VR. When discussing MR platforms, the immersivity distinction may still be relevant. To show how we mesh the two terms, the fusion term of immersion/presence will be used. The construct of immersion/presence subsumes other factors that are critical to learning, e.g., motivation and prior knowledge. Many of those factors are not under the control of the lesson designers. One might experience low presence in a lesson if prior knowledge were extremely low and inadequate for the task<sup>3</sup>.

TABLE 1 | Construct magnitude within degrees in the Embodied Education Taxonomy.

*H, High; L, Low.*

\**An ill-conceived, but possible configuration.*

Several new taxonomies for embodiment are being proposed that do not include the third construct of "sense of immersion" or presence (Skulmowski and Rey, 2018). In many ways, a two axes model makes for a tidier taxonomy. However, we believe that to reframe the embodied taxonomy for education for 3D immersive VR, a construct for immersion/presence is crucial because presence is one of the unique and profound affordances of VR.

When learners experience high presence, they have suspended disbelief enough to engage meaningfully with the virtual. Players often report they lose some track of time and place. It is known that learning is facilitated by engagement and motivation (Csikszentmihalyi, 1997). If feeling presence connotes that the learner's body is in the virtual world, then higher presence might also correlate with higher levels of embodiment. The original, embodied taxonomy from **Table 1** (Johnson-Glenberg and Megowan-Romanowicz, 2017) consisted of four delineated degrees along the continua of the three constructs. Reprinted table is open source from Johnson-Glenberg and Megowan-Romanowicz (2017).

Note that the cells with asterisks would be poor contenders for lesson design. Using a large gesture that is poorly mapped to the learning situation is not predicted to induce felicitous learning (e.g., moving one virtual electron in a magnetic field by performing three jumping jacks). It is kept in for the sake of symmetry, and, well, because bad lessons do happen.

# 3D Figures for 3D Constructs

The new graphic in **Figure 1** takes into account the continuous nature of the three constructs. It maintains the concept of immersion/presence. The crosshairs in the middle allow the reader the opportunity to partition the large space into more tractable low and high spaces; it could even be imagined as eight sub-cubes. It should be stated that those who design multimedia lessons to be used in classrooms (as opposed to experimental labs) understand that lessons rarely fall neatly into any one sub-cube or bin. Because magnitude of the gesture (i.e., the amount of sensori-motor engagement) may prove to be the least predictive construct for content comprehension, it is relegated to the Z axis. The Z axis, or depth, is usually more difficult to conceptualize in a graphic. The goal for graphics like these is to aid researchers and designers in visualizing embodiment in educational content and aid the community in using the same terminology. These graphics should also spur researchers to assess the orthogonality of the constructs. RCTs on the three constructs and how they interact during learning are greatly needed.

<sup>3</sup>Wasted cognitive effort and emotional frustration would attenuate a sense of presence if the learner never understood which size gear to use to get the virtual bike up the virtual hill in the first place. Indeed, a handful of middle school players in the gears game called Tour de Force would insist on using the largest gear and spin furiously while the bike stayed in one place on the steep hill and the timer ran down (Johnson-Glenberg et al., 2015).
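The partition of the three-construct space into eight low/high sub-cubes can be sketched in a few lines. The article does not define a numeric scale for the constructs, so the 0 to 1 normalization, the 0.5 midpoint, and the names `sub_cube` and `ALL_SUB_CUBES` are assumptions made purely for illustration; only the H/L labels follow the legend of Table 1.

```python
from itertools import product

# Order of the three constructs in each label; this ordering is illustrative.
CONSTRUCTS = ("sensorimotor", "congruency", "presence")

# All eight low/high combinations: "LLL", "LLH", ..., "HHH".
ALL_SUB_CUBES = ["".join(combo) for combo in product("LH", repeat=len(CONSTRUCTS))]

def sub_cube(sensorimotor, congruency, presence, threshold=0.5):
    """Return a three-letter H/L label for one lesson.

    Assumes each construct has been scored on a 0-1 scale (an assumption,
    not a scale defined in the article) and splits each axis at `threshold`.
    """
    scores = (sensorimotor, congruency, presence)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores are assumed to be normalized to [0, 1]")
    return "".join("H" if score >= threshold else "L" for score in scores)

# A large but poorly mapped gesture in a high-presence lesson.
label = sub_cube(0.9, 0.2, 0.8)
```

A hard threshold of this kind discards exactly the continuity that Figure 1 is meant to convey, so binning is best treated as a convenience for discussion and not as a measurement model.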

# MORE ON GESTURE AND LEARNING IN 3D

The use of hand controls in VR has the potential to be a powerful catalyst for engaging students, heightening agency, and aiding in the comprehension of complex 3D concepts. The new hand controls are significantly more intuitive than traditional game consoles and the ease of entry has been remarked on by multiple users and designers, including in the Oculus Designer Best Practices Guidelines (Oculus, 2018). Hands and arms are untethered, multiple markers no longer need to be strapped to the body, and now more distal body parts (e.g., the feet) are being extrapolated with smaller peripherals. The era of immediate full body mapping will be shortly upon us. Until that time, we begin by focusing the embodiment lens on hand-based gestures.

There are four classic hand gestures that have been codified (McNeill, 1992). These gestures are: beat (usually moving the hand rhythmically with speech), deictic (pointing), iconic (e.g., a victory sign with the index and middle finger spread), and metaphoric (where the motion often serves as the metaphor). An example of a metaphoric gesture would be flipping a palm past the ear toward the shoulder to connote something that "was in the past." In mixed and virtual reality environments iconic and metaphoric gestures are often meshed, e.g., in an educational evolution game built by our lab, butterflies are captured with a virtual hand-held net. Grabbing a hand control trigger makes the avatar hand into a fist on screen (used to grab the butterfly net: iconic) and swinging the hand makes the net swish (capturing the virtual butterflies upon collision: metaphoric). In the end, the iconic vs. metaphoric gesture distinction may not be very helpful in VR's dynamic and fantastical environments. This lab often uses the term representational gesture. The latest hand control model as of Summer 2018 included with a standalone VR headset comes with a dozen preprogrammed iconic gestures (e.g., OK, peace V, Vulcan greeting, etc.).

Beyond iconic gestures with a human-looking hand, your avatar's hands can look like anything. Hands could resemble anything from wingtips to fork tines, and they can manipulate anything, from quarks to galaxies. Gesturing with a human-looking hand may have special affordances that further increase the sense of agency. It is known that using hands to be in control of the action on screen can attenuate simulator sickness (Stanney and Hash, 1998). It has been shown that users quickly begin to treat their avatars as if they were their real bodies (Maister et al., 2015). This is further supported by research comparing virtual and real world instances of the classic Rubber Hand Illusion (IJsselsteijn et al., 2006).

Gesture has been researched in education for years and over a wide range of topics. Abrahamson researches mathematics and proportionalities (Abrahamson, 2009). Alibali and Nathan explore learning and teaching diverse math topics including equation solving, word-problem solving, algebraic and geometric concepts (Alibali and Nathan, 2012; Nathan et al., 2014). Congdon et al. (2017) showed that children as young as 3rd grade retain and generalize content from a math lesson better when they received instructions containing paired speech and gesture (as opposed to sequential speech and then gesture). In a mixed reality study on astronomy, students learned more about dynamic concepts with full body movements (Lindgren et al., 2016). Many mixed reality studies move beyond simple gesture and incorporate whole body movement. In a previous study reported in 2016, a randomized control trial (RCT) varied the amount of embodiment in a mixed reality system called SMALLab (Situated Multimedia Arts Learning Lab). College students were randomly assigned to three separate platforms that allowed for varying amounts of both motor activity and congruency (embodiment) (Johnson-Glenberg et al., 2016). The topic was centripetal force. The platforms were: (1) SMALLab, where learners could physically swing a tangible bob-type object on a string overhead, (2) Whiteboard, where learners could spin their arms in a circle to manipulate the virtual object, and (3) Desktop, where learners could spin the mouse in circles while seated. Within the three platforms, the amount of embodiment was either high or low. All six groups gained physics knowledge equally from pretest to immediate posttest; however, from posttest to 2 week follow-up, the level of embodiment in the lesson interacted significantly with time. That is, the participants in the higher embodiment conditions performed better on the generative knowledge test, regardless of platform. This supports the hypothesis that better retention of certain types of knowledge can be seen over time when more embodiment is present during the encoding phase.

Beyond the concept that gesture may aid in lightening the cognitive load (Goldin-Meadow et al., 2001), there are other theories addressing why gesture may aid in learning. One theory is that using gesture requires motor planning and this activates multiple simulations even before the action is taken. Hostetter and Alibali (2008) posit that gesture first requires a mental simulation before movement commences; at that time motor and premotor areas of the brain are activated in action-appropriate ways. This pre-action, covert state of imagining an action appears to stimulate the same brain areas as the overt action, i.e., motor cortex, the cerebellum, and basal ganglia (Jeannerod, 2001). The combination of planning and then performing may lead to more motor and pre-motor activity during encoding, which might lead to a stronger learning signal and memory trace.

The duality of the immersion/presence afforded by VR meshed with the new interfaces of the hand controllers allows for unique learning possibilities. In much of the past research on learning in VR (e.g., Gutierrez et al., 2007), the focus has been on the technology and short shrift has been given to learning pedagogies behind the lessons. This state of affairs prompted Fowler (2015) to title an article "VR and learning: Where is the pedagogy?" Fowler calls for a design-for-learning perspective and urges readers to consider the "value or benefits that VR would add" to each particular learning experience. Designers and users of VR should be more aware of learning theories, so a short summary of some relevant educational theories that could be integrated in VR lessons is included.

# VR AND EDUCATION

Researching VR and education is confounded by the fact that many authors consider "virtual worlds" to be isomorphic to VR; thus, searches that promise meta-analytic research on VR (see Merchant et al., 2014, as an example) are not very helpful in 2018. There has been little work to date on education and immersive VR (also called IVR) (Blascovitch and Bailenson, 2011). Scholars have been asking for educational research for some time (Mikropoulos and Natsis, 2011), but the resources and affordable technologies were not readily available. Up until 2016, most of the literature on VR and education was based on proprietary VR software and hardware. Research labs, the military, or commercial companies had to create in-house products that were too expensive for public consumption. In 2016, two sets of high-end headsets with hand controllers (Oculus Touch and HTC VIVE) came to the market. Studies on gesture in VR are slowly coming to light<sup>4</sup>.

The use of VR in education is so new, and its affordances are of such a multitude, that design guidelines solely for education in VR have not yet been published. A meta-analysis commissioned by the US military (Dixon et al., 2009) found 400 documents that had the words "2D, 2.5D, and 3D applications for information visualization, display development, and guidelines for applications of dimensionality." The search stopped in the year 2006. The study reports benefits when 3D technology was used to:

"convey qualitative information, provide a rapid overview, facilitate mission rehearsal, visualize network attack and physical access vulnerabilities, and aid route planning. . . . practicing telemanipulation skills with sensor augmentation and can provide realistic simulator training (piloting, aerial refueling, etc.)." p. 11.

Dixon et al. (2009) found that the studies that reported on human performance at all were, with few exceptions, tied to performance with specific equipment. Thus, the findings were somewhat narrow and non-generalizable. More recently, a well-regarded second edition of 3D User Interfaces (Laviola et al., 2017) has been published, but it includes little mention of pedagogies for learning and less than one page on bi-manual control in VR. In these early days, trial and error plays an outsized role in design. Education researchers borrow heavily from the entertainment designers, who focus on engagement, and not necessarily on retention of content. The dearth of studies highlights the urgency for a set of guidelines for designing content that allows users to make appropriate choices in a spherical space. Below are short summaries of three education theories that lend themselves to creating gesture-controlled content in immersive VR.

# Constructivist Learning Theory

This theory builds off of Dewey's (1966) concept that education is driven by experience. Piaget (1997) further describes how a child's knowledge structures are built through exploratory interactions with the world. Constructivism emphasizes authentic interactions with the world that are consistent with knowledge students are expected to develop (Duffy and Jonassen, 1992). Environments such as VR can provide opportunities for learners to feel present in goal-driven, designed activities. The interactions that they have with artefacts and interactional systems in these environments should facilitate the construction of knowledge about the activities (Dede, 1995; Winn, 1999).

This is a theory article that ends with real world design advice to enhance classroom learning experiences, so further definitions of constructivism have been culled from a teacher's textbook (Woolfolk, 2007). The bolded text below has been added by this author to highlight components that VR is especially well-suited to address.

Per Woolfolk, common elements in the constructivist perspective:

<sup>4</sup>One creative, full-body experiment assessed learning on the topic of microgravity. The experimental group (suspended in the VR immersion rig) answered one (out of two) key questions significantly better than the untreated control group. The control group was asked to merely imagine how long it would take to reach a door in a microgravity environment (Tamaddon and Stiefs, 2017). Clearly, larger studies with more robust assessments are needed.


Point 2 regarding social negotiation is important in education, but not highlighted because it is still expensive to implement multiuser, synchronized learning spaces. Educational instances of real-time, multi-user social negotiations in VR are probably years away (for an update on multi-user VR in education see Slater and Sanchez-Vives, 2016). A constructivist example in STEM in mixed reality is provided in the section called Example of an Embodied Lesson and Experiment. In scaffolded, virtual STEM environments, the learners start with simple models and interact to create more complex ones over time. Learners receive immediate feedback and know they are the agents manipulating the objects. They know they are in charge of the constructing. When a lesson is appropriately designed, with incrementally increasing difficulty, and includes evaluative, real-time feedback, then learners are encouraged to become more metacognitive. Learners become evaluative about their output. They can re-submit or reconstruct models multiple times. In this way, agency and ownership are encouraged. Active learning is especially important in the STEM domain where the majority of young STEM learners drop out over time (Waldrop, 2015).

# Guided Inquiry

Inquiry refers to the collection of methods scientists use to study natural phenomena, to advance and test hypotheses, to subject hypotheses to reasoned analysis, and to use data to explain and justify assertions. Inquiry can be used to describe the ways students can investigate the world as scientists might. Students can propose and test ideas about how the world works, analyze findings, and make arguments from evidence to justify their assertions (Hofstein and Lunetta, 2003). Guided inquiry emerged in the late 1980s as an effective practice because it had been shown that free, exploratory learning, on its own, could lead to spurious hypotheses. Minimally guided instruction is "less effective and less efficient" (Kirschner et al., 2006), until one has a sufficient amount of prior knowledge. Students benefit from pedagogical supports that help them construct conceptual models, or knowledge structures (Megowan, 2007). Guided inquiry methods with technology are being developed to help students build, test, and deploy conceptual models of phenomena which cannot be directly observed. VR is poised to be an important tool in this domain. Guiding learners toward accurate deductions does not mean hand-holding. It means giving just enough information so that the final deduction is made by the student; in this manner students take ownership over what they have learned. Many believe that some cognitive effort is needed for learning "to stick"; these concepts are in line with the desirable difficulties literature (Bjork, 1994; Bjork and Linn, 2006), and levels of processing research.

# Embodied Learning

Human cognition is deeply rooted in the body's interactions with the world and our systems of perception (Barsalou, 1999; Wilson, 2002; Glenberg et al., 2013). It follows that our processes of learning and understanding are shaped by the actions taken by our bodies, and there is evidence that body movement, such as gesture, can serve as a "cross-modal prime" to facilitate cognitive activity (e.g., lexical retrieval; Hostetter and Alibali, 2008). Several studies by Goldin-Meadow's group have shown a direct effect of gestures on learning (Goldin-Meadow, 2014). Recent research on embodied learning has focused on congruency (Segal, 2011; Johnson-Glenberg et al., 2014a), which posits an alignment of movements or body positioning (the body-based metaphor—see Lindgren's work) with specific learning domains (e.g., learning about centripetal force and circular motion by performing circular movements as opposed to operating a linear slider bar, Johnson-Glenberg et al., 2016). Virtual and mixed reality environments afford the opportunity to present designed opportunities for embodied interactions that elicit congruent actions and allow learners opportunities to reflect on embodied representations of their ideas (Lindgren and Johnson-Glenberg, 2013).

Embodied learning is probably most effective when it is active, and the learner is not passively viewing the content, or watching others interact with manipulables, as reported by Kontra et al. (2015). If the learner is induced to handle the physical content, or to manipulate the content on screen then they must be active and moving the body (which activates more sensori-motor areas). James and Swain (2010) placed 13 young participants (approximately six years of age) in an fMRI scanner. The children either actively manipulated an object (called a self-generated action) while hearing a new, novel label, or they watched an experimenter interact with the object. Motor areas of the participants' brains were more likely to be activated upon subsequent viewing when they self-generated the action, as opposed to observing it.

As highlighted earlier in this article, "embodied and embodiment" are evolving terms. Computer-mediated educational technologies are changing rapidly as well. The new VR hand controls will allow for active engagement and high levels of embodiment in lessons. Using virtual content, teachers will not be constrained by having to purchase specific physical manipulables. While haptics and mass are constructs that the virtual world does not yet easily accommodate, their absence should not be viewed as barriers to designing high quality, high embodied content. In-headset cameras can now capture articulated finger movements and this will lead to further advances and uses of naturalistic gestures. Given that gestures and embodiment may figure prominently in educational VR in the future, this article includes an example of a highly embodied lesson that was built for a mixed reality environment. The next section also cites effect sizes to aid researchers in future experimental and research design.

# Example of an Embodied Lesson and Experiment

This section presents experimental evidence supporting the hypothesis that active and embodied learning in mediated educational environments results in significantly higher learning gains. Examples of types of gestures are discussed and new inferential statistics have been run on the data included in this article. There is currently a dearth of RCTs for VR in STEM education. Educational RCTs can be found in both mixed reality (Lindgren et al., 2016) and augmented reality (AR) environments (Squire and Klopfer, 2007; Dunleavy et al., 2009; Yoon et al., 2012). The results usually favor the experimental conditions, and the more embodied and/or augmented conditions.

The electric field study described in this section was conducted before the latest generation HMDs with hand controls were commercially available. Immersion was one of the goals and so a very large projection surface was used to induce some presence; however, because the real world was always present on the periphery, this should not be considered VR. This was an MR study using a whiteboard surface with a 78 inch (1.98 m) diagonal. This lab has researched in mixed and augmented reality spaces for science education for over a decade; the range of topics includes geography (Birchfield and Johnson-Glenberg, 2010), nutrition science (Johnson-Glenberg and Hekler, 2013; Johnson-Glenberg et al., 2014b), simple machines (Johnson-Glenberg et al., 2015), physics (Johnson-Glenberg et al., 2014a, 2016) and forces. The full article describing the electric field study and the seven learning tasks can be found at Johnson-Glenberg and Megowan-Romanowicz (2017).

When designing for complex science topics, care is always taken to scaffold both the number of elements onscreen and the amount of interactivity necessary to optimally interact with the user interface. For a history of scaffolding in the learning sciences, see Pea (2004). Designing to mitigate the effects of content difficulty and user's physical interactions requires a multidisciplinary approach; previous research has been published on multimedia design with 2D content (Sweller et al., 1998; Mayer, 2009). Many pitfalls of poor scaffolding can be avoided with multiple playtests that include naïve users (Johnson-Glenberg et al., 2014c). Whenever this lab has scrimped on playtesting, the end product has always suffered.

For the study, a one-hour series of seven simulations was created to instruct in Coulomb's Law. The study did not start with the full equation, but built up to that somewhat complex equation. Each of the four variables in the equation was introduced one at a time, and participants had multiple exposures to, and interactive practice with, each variable. The first task in the seven-task series refreshed the college students' knowledge on the topic of atoms and charge. The final task revolved around the conditions needed for a lightning strike. Individual videos on the tasks (and free, playable versions of most of the games) can be experienced at https://www.embodied-games.com.

# Design

The study was a 2 × 4 design. The first factor was time with two levels: pretest and posttest. The second factor was condition with four levels: (1) Control - Symbols and Text (S&T), (2) Low Embodied (where participants watched animations or simulations), (3) High Embodied, and (4) High Embodied with Narrative. The final two conditions were high embodied because participants were able to physically interact with, and construct, models onscreen. In the high embodied conditions participants' gestures were gathered via the Microsoft Kinect sensor.

The study was carried out in accordance with the recommendations of U.S. Federal Regulations 45CFR46 under the guidance of a state university's research, integrity and assurance office. The protocol was approved by the Institutional Review Board (IRB). All participants were over 18 years of age and signed written informed consent in accordance with the Declaration of Helsinki. The college students were randomly assigned to condition. The first two conditions were considered passive because the learners' "hand grab" gestures only served to advance to the next screen. The final two conditions were considered active because multiple gestures using the hands, arms, and knees were used to manipulate the content on screen, as well as to advance the screens.

Throughout the lesson, multiple high embodied and gesturally congruent movements were used to facilitate learning. The example below details simulation number three (out of seven) that focused on vector comprehension. This task was chosen because it is an **example of a 2D lesson that might be more efficacious if translated to a 3D immersive VR environment** because the electric field surrounds us in three dimensions. High school and college students often do not understand the spherical nature of the electromagnetic field from 2D instructional texts and computer models (Megowan-Romanowicz, personal communication, December 4, 2017).

**Figure 2** is a screen capture of simulation three, called "Vector van Gogh," where participants were able to draw vectors. At the top left of the screen is a dynamic representation of a portion of Coulomb's Law. The symbols in this equation box (technically a "proportionality" since the constant k is missing) change in real time, such that the size of each symbol represents the magnitude of that component. E is the electric field at a point in space. The numerator q represents the magnitude of the fixed charge in the center of the screen (+1). The denominator r represents the radius and is squared. The radius is the distance of the free charge (the yellow circle) from the fixed charge in the middle of the screen. The fixed charge is represented by the tiny atom in the center. Designing with scaffolding means that the full proportionality for Coulomb's Law is not presented until the learner has been exposed to each variable separately (around simulation number 6).
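The inverse-square relationship shown in the equation box can be illustrated with a few lines of Python. This is a minimal sketch and not the study's software: the function name `field_magnitude` is hypothetical, the constant k is omitted as in the on-screen proportionality, and the units are arbitrary.

```python
def field_magnitude(q: float, r: float) -> float:
    """Relative electric field magnitude, E proportional to q / r**2.

    ASSUMPTION: the constant k is dropped (as in the on-screen
    proportionality), so the result is in arbitrary units.
    """
    if r <= 0:
        raise ValueError("r must be a positive distance")
    return q / r ** 2


if __name__ == "__main__":
    q_fixed = 1.0  # the fixed central charge of +1 described in the text
    for r in (2.0, 1.0, 0.5):
        # Halving the distance quadruples the relative field magnitude.
        print(f"r = {r:>4}: E = {field_magnitude(q_fixed, r):.3f}")
```

The loop makes the scaffolded point concrete: with q held fixed, moving the free charge closer makes E grow quickly, which is why the E symbol is drawn larger and the r symbol smaller as the start circle approaches the center.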

In the high embodied conditions, the large arrow (yellow vector) is physically drawn by the participants. In the other conditions, participants either worked with symbols and text, or they passively watched animations of the yellow vectors being drawn over seven trials. The viewed animations included two errors, similar to what happened on average in the high embodied conditions. In the high embodied conditions, the Kinect 360 sensor was used to track the right wrist joint. When the learner held down the clicker button, that signaled the start of the yellow vector: the tail would be set. The tail always began in the yellow circle (in **Figure 3** the number 00.250 is shown under the start circle). The learner would then draw, via gesture, the vector's length and direction. With the release of the clicker button, the end of the vector (the tip) or arrow head would appear. If the learners were satisfied with the constructed virtual vector, they would hit submit. This constructed vector symbolized how the free particle would move when released. Thus, with larger or smaller embodied gestures, vectors of varying magnitudes were freely created by the learner with a swipe of the arm.

An algorithm was created to assess the quality of the submitted vector, comparing both its direction and length to an expert-drawn vector. If the learners' vectors were more than 5% discrepant, they had two more chances to redraw and resubmit. If the vector was still incorrect on the third try, the expert vector was displayed via animation and the next task appeared. In the equation box on the upper left corner of **Figure 2**, a relatively small electric field (E) at that point in space is shown. Note that the radius (r) is so large that it extends out of the equation box. That is because the free charge is far from the central, fixed charge (q = +1).
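The scoring rule described above could be sketched as follows. This is an illustrative reconstruction, not the study's algorithm: the text does not say how the 5% discrepancy was computed, so the sketch assumes that length error is measured relative to the expert vector's length and that direction error is the angle between the two vectors expressed as a fraction of 180 degrees. All function names are hypothetical.

```python
import math

TOLERANCE = 0.05  # the 5% threshold reported in the text
MAX_TRIES = 3     # first attempt plus two chances to redraw


def discrepancy(drawn, expert):
    """Return (length_error, direction_error) as fractions.

    ASSUMPTION: length error is relative to the expert length, and
    direction error is the angle between the vectors divided by pi.
    """
    dx, dy = drawn
    ex, ey = expert
    drawn_len = math.hypot(dx, dy)
    expert_len = math.hypot(ex, ey)
    if expert_len == 0:
        raise ValueError("expert vector must have non-zero length")
    length_error = abs(drawn_len - expert_len) / expert_len
    if drawn_len == 0:
        return length_error, 1.0  # no direction at all: maximal error
    # atan2(cross, dot) gives the signed angle between the two vectors.
    angle = abs(math.atan2(dx * ey - dy * ex, dx * ex + dy * ey))
    return length_error, angle / math.pi


def is_acceptable(drawn, expert, tol=TOLERANCE):
    """True if both length and direction are within the tolerance."""
    length_error, direction_error = discrepancy(drawn, expert)
    return length_error <= tol and direction_error <= tol


def run_trial(attempts, expert):
    """Score up to MAX_TRIES submissions for one trial.

    Returns ("correct", n) for the first acceptable attempt n, or
    ("show_expert", tries_used) when every attempt was off target.
    """
    tries = attempts[:MAX_TRIES]
    for n, drawn in enumerate(tries, start=1):
        if is_acceptable(drawn, expert):
            return "correct", n
    return "show_expert", len(tries)
```

For example, `run_trial([(0, 1), (2, 0), (1, 0.01)], (1, 0))` rejects the first two submissions (wrong direction, then wrong length) and accepts the third. Scoring length and direction separately keeps feedback actionable, since a learner can be told which of the two properties was off.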

**Figure 3** is a screenshot of a later trial in which the yellow start circle (aka free charge) is closer to the fixed central charge of +1. Again, the participant would draw an arrow to show the expected movement of the free charge. In this trial, the vector should be moving away from the positive fixed charge and it should be larger than the previously drawn one as the E Field is now 1.000. In the equation box, note how E is much larger and the r is smaller in size compared to **Figure 2**. This lesson reifies representational fluency in that the symbols map to the pictorial graphics, which in turn map in real time to the embodied movements of drawing.

The focus of this summary is on the gesturally passive vs. active conditions. For ease of interpretation between active and passive, the four groups have been collapsed into two.<sup>5</sup>

# Results of the Electric Field Study

The study began with 166 participants. The four groups were matched at pretest and they remained matched when combined into two groups (p < 0.30). Two types of tests were administered. The first was a more verbal assessment that used a keyboard for typed responses to multiple choice and open-ended questions. On that assessment the two high embodied groups performed better, M = 49.9 (11.6), than the two passive groups (symbols+text and low embodied), M = 46.7 (13.1). The effect size (Cohen's d) was small, 0.22, but it favored the high embodied group.

The second measure was an innovative gesture-based assessment, called the Ges-Test. This was created to allow participants to construct vectors by freehand drawing. Participants moved their fingertip along a large touchpad called the Wacom™ Intuos Pro (15.1 inch or 38.4 centimeter drawing diagonal). This allowed the participants to speed up or slow down their movements so that the concepts of positive and negative acceleration could be assessed. Eight questions were analyzed. The hypothesis was that the gesture test would be more sensitive to revealing learning gains that might be attributed to embodiment during the encoding intervention phase. On the Ges-Test the active and embodied groups performed significantly better than the passive groups on the posttest, F(1, 132) = 3.77, p < 0.05. See **Figure 4**.

# Study Conclusions

These results support the hypothesis that when learners perform actions with agency and can manipulate content during learning, they are able to learn even very abstract content better than those who learn in a more passive and low embodied manner. When tested with gesture on the topic of vectors and motion, the high embodied students showed they learned more. Given that being active and using congruent gestures seems to facilitate learning, we support designing VR content that makes use of the new VR hand controls for both learning and assessment purposes. Creating assessments that use gestures mapped to the hand controls' locations in 3D space seems a productive path forward.

# PRUDENT VR GUIDELINES THUS FAR

For the most part, immersive VR educational lessons and studies have occurred in adult populations (Freina and Ott, 2015). These have occurred in a variety of fields, from medicine, e.g., intricate maneuvers involved in craniofacial repairs (Mitchell et al., 2015), to behavioral change interventions; for solid examples, see the innovative work on PTSD reduction by Rizzo et al. (2010). A chapter by Bailey and Bailenson (2017) provides a speculative overview of how VR might affect youth and cognitive development, but longitudinal effects of VR exposure are unknown at this point. Because so little is known about youth and VR, the guidelines included at the end of this article are recommended for players 13 years and older (similar to the constraints and advisements seen on the most popular commercial headsets).

In terms of education and classroom adoption, the first iteration of affordable VR for entire classrooms has been with mobile. Exploratory results have been reported using systems such as Google Expeditions (Minocha et al., 2017). Innovative work is also being done with MR goggle experiences in museums and at some historical sites (an example from Knossos is described by Zikas et al., 2016). One prediction is that when standalone headsets, which do not require phones or separate CPUs, become available, immersive VR experiences with a hand controller will become more popular for classroom use. When VR becomes affordable, educators will be in need of quality content. What will high quality pedagogical VR look like? Should everything 2D just be converted to 3D? We agree with Bailenson, who posits that VR should be used in instances where it is most advantageous (Bailenson, 2016). He lists four:


For those designing VR for education, Dalgarno and Lee presciently published several affordances of three-dimensional VR environments, which they call VLE's (virtual learning environments) (Dalgarno and Lee, 2010). The five listed below pertain to both PC-based 3D worlds and immersive VR (as this article uses the term). This author's notes are in brackets.

Affordance 1: Use VLE's to facilitate learning tasks that lead to the development of **enhanced spatial knowledge representation** of the explored domain. [This is in line with this lab's sentiments that 3D and the affordances of spatial reasoning represent a profound affordance of the technology. If no special insights will be gained from using the more costly VR equipment, then stick with 2D models.]

<sup>5</sup>Video simulations can also be seen at https://www.youtube.com/watch?v=eap7vQbMbWQ.

Affordance 2: Use VLE's to facilitate experiential learning tasks that would be impractical or impossible to undertake in the real world. [This is similar to Bailenson's tenet.]

Affordance 3: Use VLE's to facilitate learning tasks that lead to **increased intrinsic motivation and engagement.** [The research community suspects that VR, regardless of the quality, will continue to enhance engagement, which has been shown to increase learning. However, one further prediction of ours is that the novelty and heightened engagement will wane over multiple exposures, and at that time quality pedagogy will be driving the learning. Tightly controlled RCTs have yet to be performed on these issues. There have been several early studies comparing learning in a 3D headset to viewing the content on a computer monitor screen as the control condition (Gutierrez et al., 2007), but it is time to move beyond simple 2D PC comparisons.]

Affordance 4: Use VLE's to facilitate learning tasks that lead to improved transfer of knowledge and skills to real situations through **real world contextualization of learning**.

"Specifically, because 3-D technologies can provide levels of visual or sensory realism and interactivity consistent with the real world, ideas learnt within a 3-D VE should be more readily recalled and applied within the corresponding real environment." p. 22.

Affordance 5: Use VLE's to lead to richer and/or more effective collaborative learning as well as richer online identity construction and a **greater sense of co-presence that will bring about more effective collaborative learning**. [This rings true as well, although we note that a zero-lag, multi-user classroom experience may still be a few years away.]

# A High Embodied VR Lesson Using Hand Controllers

Deftly meshing education with games is a far trickier business than one might suspect. This author has been building multimedia educational content for over two decades and can admit to creating several epically flawed "edu-games" in that time. Unfortunately, the majority of education apps available today for free are still neither highly educational nor sustainably entertaining. Education is underfunded for the sort of iterating (with quality graphics included) needed to create compelling and effective learning games. Education game designers often take their cues from entertainment game designers, for better or worse. As VR comes of age, the first popular titles are going to be the entertainment ones. Quality education games will come later. One prominent game creator giving advice on VR design is Jesse Schell. His Oculus 2 Conference presentation (2016, https://www.youtube.com/watch?v=LYMtUcJsrNU) contains many design nuggets. These range from the broad (keep the horizon level; proprioceptive disconnect is bad, i.e., you should not be a reclining human with a walking avatar body; sound is vital and takes twice as long to get right in VR) to the specific: 3D with 9DOF is well suited for peering into multidimensional objects like brains and engines; however, if you lock an object near the user's POV, then you need to give the object a bobbing motion or users will assume the system is frozen.

The educational VR community does not yet have a set of guidelines for how to implement hand controls and gesture for embodied education. Before ending with a list of design guidelines for that space, a VR lesson is described that incorporates these guidelines. We consulted on creating an Alpha version of a high school-level chemistry lesson in a VR open world called Hypatia. The premise of Hypatia is that it is a multi-player world primarily built for social entertainment. One of the company's mantras was "never break immersion." But learning scientists know it is also important to build in time for reflection during a lesson so that students can create meaning around the intense stimuli. "Never break immersion" may be a guideline from entertainment that does not migrate well into the education community. In a goal-driven learning situation it may be desirable to bring learners out of the experience, perhaps to a virtual whiteboard. It may be efficacious to request learners remove the headset to make handwritten notes, or perhaps engage in face-to-face collaborations/questioning with a partner before resuming the immersive experience. These are empirical questions.

In the multi-player virtual world called Hypatia, players first create non-human avatars with pre-populated parts. The early module described here was called Kapow Lake; it was conceived of as a high school chemistry and physics lesson using fireworks as the topic. Two learning goals were embedded: (1) understand which metal salts burst into which colors, and (2) understand the elementary physics behind why the burst is perceived as a particular color.

Players start on the beginner side of the lake, where they can watch fireworks in the sky and are motivated to build some of their own. Using light cues, we "signpost" players via the lit doorway to enter the experimentation shed. See **Figure 5**.

Theories of constructivism and guided exploration are prevalent throughout the lesson. In order to construct their own fireworks, players must first master the names of the colors. Players would grasp the triggers of the hand controls (e.g., HTC Vive), and when the avatar hand collided with a metal salt, the salt would be picked up. The first series of gray metal salts (see **Figure 6**) did not have the colors on the labels, so players did not know that the salt called strontium would burn red. Via systematic exploration, they would place each salt into the flame of the Bunsen burner and note the color that the salt burned.

FIGURE 5 | Screen capture from the chemistry lesson in *Hypatia*. Note signposting via the lit door to encourage learners to enter the experimentation shed.

base of the firework on the table behind the avatar (from *Hypatia*).

**Figure 6** shows the avatar (Jessica) on the left side of the screenshot. The salt labels are now colored and visible (e.g., if strontium burns red, how will copper burn?). This lesson takes advantage of several of the affordances associated with VR, one of which is making the "unseen be seen": now an individual atom of strontium can be shown. After Jessica places the gray salt over the flame, a Bohr atom model of strontium appears on top of the flame.

Another of the profound affordances of VR is the immersion into three dimensions. Note that the screenshot is taken from the 3rd person POV for the purposes of edification, but Jessica, the human player, is seeing the atom floating above the burner in 1st person or a "head on" POV. After she places the strontium over the heat, the outer electron moves out of the stable outer orbit. The unstable orbit is shown briefly as a dotted ring during play. Quickly, the electron falls back to its more stable orbit, and as it does this a packet of energy called a photon is released. This photon is perceived in the red spectrum. In **Figure 6**, the photon has been visualized as both a red wave and a particle heading toward the eye. Jessica is watching the dynamic model in 3D and she perceives the photon as traveling directly into her eye. (This is perhaps the only thing humans want heading directly toward our eyes!) The sinusoidal movement was designed to be somewhat slow, so it would not be alarming.

The simulation of the photon as a wave reifies the concepts that energy is released by the heat burst, and that the energy is then perceived by the human eye as a visible wavelength. The five other salts release electrons from different orbits, thus creating different wavelengths. Once players are able to match all six metal salts to their colors, they are signposted to exit the back door to the expert's multi-staging fireworks area (**Figure 7**).

Players are allowed several minutes of free exploration to construct rockets. If they are not building functional ones in the time limit, then we again signpost (via object blinking) the sequential procedures for construction (e.g., tube first, then fins, salts, fuse, then the cone top). After two correct constructions, players are instructed, via text in the headset, to build multi-stage rockets with very specific sequences of colors. This is an engaging task, but it also serves as a form of stealth assessment (Shute, 2011). Now a teacher or spectator can observe whether the student really understands how strontium and copper need to be sequenced to make a red then a blue explosion.
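The sequencing check behind this stealth assessment could look something like the sketch below. It is an illustration only, not code from Hypatia. The mapping contains just the two salt and color pairings named in this article (strontium for red, copper for blue); the function names, and the assumption that list order equals firing order, are hypothetical.

```python
# Only the two pairings named in the text; the lesson uses six salts.
SALT_COLORS = {"strontium": "red", "copper": "blue"}


def burst_colors(stages):
    """Colors produced by a multi-stage rocket, in firing order.

    ASSUMPTION: `stages` lists the salt loaded in each stage, and the
    first element fires first.
    """
    return [SALT_COLORS[salt] for salt in stages]


def matches_target(stages, target_colors):
    """True if the built rocket would produce the requested sequence."""
    try:
        return burst_colors(stages) == list(target_colors)
    except KeyError:
        # A salt outside the known mapping cannot satisfy the target.
        return False
```

Here `matches_target(["strontium", "copper"], ["red", "blue"])` is true while the reversed build is not, so a teacher dashboard could log each build attempt without interrupting play.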

# Design Principles for Embodied Education in VR

The new VR principles are grouped first as general guidelines and second as those pertaining to gesture and hand controls. They are listed in the order in which they are often performed. That is, a design and development team starts with a paper version of the interface. It is necessary, though, to iterate on a module several times before the module is ready for release. It will never be perfect; strive for 80% satisfaction. In an effort to keep the number of guidelines tractable, the article closes with the Necessary Nine. An important point to drive home for designers of education in VR is that presence is immediate, but it can take time for learners to internally adjust to that feeling. VR for entertainment can purposefully overwhelm, but the goal of education is for learners to leave the space with new concepts embedded in their ever-changing knowledge structures (the definition of learning). Some of your learners will also come to the task with low spatial abilities, and those students learn differently in 3D space (Jang et al., 2016). This is why the first start screen should always be somewhat sparse with a user-controlled start button. Learners can start when they feel acclimated. Declutter the user interface (UI) as much as possible, especially in the early minutes of the game.

# General Guidelines

◦ Not everyone will know the controls. Not everyone knows to look around. Users are now in a sphere and sometimes need to be induced to turn their heads... but only so far. Do not place important UI components or actionable content too far from each other. For example, do not have users capture butterfly #1 at 10° and then force them to capture butterfly #2 at 190°. Be gentle with users' proprioceptive systems (where the body is in space). If the content includes varying levels of difficulty, allow the user to choose the level at the start menu. This also gives a sense of agency. This "start slow" advice comes from years of designing educational content.

#### • Introduce **User Interface (UI) components judiciously, fewer is better**

◦ When users build the first fireworks in our chemistry lesson, they can only make one-stage rockets. The multi-chambered cylinders are not available in the interface until users show mastery of the simpler content (Johnson-Glenberg et al., 2014c).

#### • **Scaffold** – also introduce **cognitive steps one at a time**

◦ Build up to complexity (Pea, 2004). As described in the electric field lesson instructing in Coulomb's Law, the variables in the equation are revealed one at a time. Users explore and master each component in successive mini-lessons (Johnson-Glenberg and Megowan-Romanowicz, 2017).

#### • **Co–design with teachers**

◦ Co-design means early and on-going consultations. Let the teachers, Subject Matter Experts (SMEs), or clients play the lesson/game at mid and end stages as well. Playtesting is a crucial part of the design process. Write down all comments made while in the game. Especially note where users seem perplexed; those are usually the breakpoints. Working with teachers will also ensure that your content is properly contextualized (Dalgarno and Lee, 2010), that it has relevance to and is generalizable to the real world once users are out of the headset.

#### • **Use guided exploration**

◦ Some free exploration can be useful in the first few minutes for accommodation and to incite curiosity, but once the structured part of the lesson begins, guide the learner. You can guide using constructs like pacing, signposting, blinking objects, etc. To understand why free exploration has not held up well in STEM education, see Kirschner et al. (2006).

#### • **Minimize text reading**

◦ Rely on informative graphics or mini-animations whenever possible. Prolonged text decoding in VR headsets causes a special sort of strain on the eyes, perhaps due to the eyes' vergence-accommodation conflict, but see Hoffman et al. (2008). In our VR game on evolution we do not make players read lengthy paragraphs on how butterflies emerge during chrysalis; instead, a short cut-scene animation of butterflies emerging from cocoons is displayed.

#### • **Build for low stakes errors** early on

◦ Learning often requires errors to be made and learning is facilitated by some amount of cognitive effortfulness. In our recent evolution game, the player must deduce which butterflies are poisonous, just like a natural predator must. In the first level, the first few butterflies on screen are poisonous. Eating them is erroneous and depletes the learner's health score, but there is no other way to discern toxic from non-toxic without feedback on both types. Thus, some false alarms must be made. Later in the game, errors are weighted heavier. See recent learning from errors literature in psychology (Metcalfe, 2017).

#### • **Playtest often** with novices and end-users

◦ It is crucial that you playtest with multiple waves of age-appropriate learners for feedback. This is different from co-designing with teachers. Playtesting with developers does not count. Our brains learn to reinterpret visual anomalies that previously induced discomfort, and user movements become more stable and efficient over time (Oculus, 2018). Developers spend many hours in VR and they respond differently, physiologically, than your end users will.

#### • Give players unobtrusive, **immediate, and actionable feedback**

◦ All educators/designers are currently experimenting with how to do this in VR. Higher level learning (cognitive change) is not facilitated by twitch. Reflection allows the mental model to cohere. Should the user stay in the headset or not? How taboo is it to break immersion? Should short quizzes be embedded to induce a retest effect (Karpicke and Roediger, 2008)? Dyads could ask each other questions? At this stage, it is advised that reflection should be incorporated, but we need more research on optimal practices within the headset.

#### • Encourage **collaborative interactions**

◦ Synced multiplayer is still expensive, but it is a worthy goal. Try to include workarounds to make the experience more social and collaborative, either with a preprogrammed non-player character (NPC), by having a not-in-headset partner interact via the 2D computer screen, or by designing sequential tasks that require back-and-forth in an asynchronous manner. A classroom collaboration and cooperation classic is Johnson and Johnson (1991).

# Using Hand Controls/Gestures

This section focuses on using the hand controllers in VR for learning.

◦ Incorporate into lessons opportunities for learners to make physical decisions about the placement of content and to use representational gestures. Active learning has been shown to increase STEM grades by up to 20% (Waldrop, 2015).

◦ Be creative about ways to get kinesthetics or body actions into the lesson. For example, if information is going to be displayed as a bar chart, first ask users to swipe upwards and make a prediction about how high one of the bars should go. Note: prediction is a metacognitive, well-researched comprehension strategy (Palincsar and Brown, 1984).

#### • **Congruency**

◦ The gesture/action should be congruent, i.e., it should be well-mapped, to the content being learned (Black et al., 2012; Johnson-Glenberg and Megowan-Romanowicz, 2017). For example, the action to start a gear train spinning should be moving something in a circle, not pushing a toggle up or down.

#### • **Actions strengthen motor circuits and memory traces**

◦ Performing actions stimulates the motor system and appears to also strengthen memory traces associated with newly learned concepts. See section entitled Embodiment for multiple citations.

#### • **Ownership and Agency**

◦ Gestural control gives learners more ownership of and agency over the lesson. Agency has positive emotional affects associated with learning. With the use of VR hand controls, the ability to manipulate content and interactively navigate appears to also attenuate effects of motion sickness (Stanney and Hash, 1998).

#### • **Gesture as assessment—Both formative and summative**

◦ Design in gestures that reveal the state of the learner's mental model, both during learning (called formative or in-process) and after the act of learning (called summative). For example, prompt the learner to demonstrate negative acceleration with the swipe of a hand controller. Does the controller speed up or slow down over time? Can the learner match certain target rates? This is an embodied method to assess comprehension that includes the added benefit of reducing guess rates associated with the traditional text-based multiple choice format. For an example, see the vector-based Ges-Test in Johnson-Glenberg and Megowan-Romanowicz (2017).

#### • **Personalized, more adaptive learning**

◦ Make the content level match the user's comprehension state – or be a little beyond the user's skill zone, as in Vygotsky's ZPD. Gesture research on younger children shows they sometimes gesture knowledge before they can verbally state it. Gesture-speech mismatches can reveal a type of readiness to learn (Goldin-Meadow, 1997). Thus, gestures can also be used as inputs in adaptive learning algorithms. Adding adaptivity (dynamic branching) to lessons is more costly, but it is considered one of the best practices in educational technology (Kalyuga, 2009).
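As an illustration of the "Gesture as assessment" guideline above, the sketch below estimates whether a swipe sped up or slowed down from timestamped controller positions. It is a simplified, hypothetical example rather than the Ges-Test implementation: it assumes position has already been reduced to distance along the swipe path (one dimension), and the `threshold` value and function names are placeholders that would need tuning with real learners.

```python
def speeds(samples):
    """Convert (time_s, position) samples into (mid_time, speed) pairs.

    ASSUMPTION: position is a 1D distance along the swipe path.
    """
    out = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        out.append(((t0 + t1) / 2, abs(p1 - p0) / dt))
    return out


def mean_acceleration(samples):
    """Least-squares slope of speed over time (average acceleration)."""
    pts = speeds(samples)
    if len(pts) < 2:
        raise ValueError("need at least three position samples")
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    v_mean = sum(v for _, v in pts) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in pts)
    den = sum((t - t_mean) ** 2 for t, _ in pts)
    return num / den


def classify_swipe(samples, threshold=0.05):
    """Label a swipe; `threshold` is a hypothetical dead band."""
    a = mean_acceleration(samples)
    if a > threshold:
        return "speeding up"
    if a < -threshold:
        return "slowing down"
    return "roughly constant"
```

For instance, `classify_swipe([(0, 0), (1, 3), (2, 5), (3, 6)])` returns `"slowing down"`, which is the response a prompt for negative acceleration is looking for. The same slope could be fed into an adaptive algorithm as one of its inputs.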

# CONCLUSION

This article focuses on the two profound affordances associated with VR for educational purposes: (1) the sensation of presence, and (2) the embodied affordances of gesture in a three-dimensional learning space. VR headsets with hand controls allow for creative, kinesthetic manipulation of content, and these movements and gestures have been shown to have positive effects on learning. A new graphic "cube" is introduced to help visualize the amount of embodiment in immersive educational lessons. As more sophisticated extrapolation algorithms are being designed, the whole body can be mapped while in a headset. The mapping of full body movement may provide for even more creative gestures and actions for learning in 3D.

We encourage designers to also incorporate seamless assessment within VR lessons, perhaps using the idea of leveling up during learning. This would add adaptivity to the system, and gesture can be one of the variables that feeds the adaptive algorithm. Lessons should get more complex as the learner demonstrates competency on previous material. We also encourage designers to include collaboration, which will become easier when multiple players can be synced in the virtual space.

As the technology moves forward, designers should keep principles of best practices in mind, and instructors should consult the principles to help make instructional and purchasing decisions. The previous section describes 18 principles in more detail. This article ends with the top contenders below. If there are only resources to focus on a subset, then the author recommends the Necessary Nine.

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

# ACKNOWLEDGMENTS

Many thanks to James Comstock, Tyler Agte, Hue Henry, Ken Koontz, John Wise, and Dennis Bonilla. The electric field study was funded by NSF grant number 1020367.

# REFERENCES

status and future applications. Surg. Neurol. Int. 2:52. doi: 10.4103/2152-7806.80117

Alibali, M. W., and Nathan, M. J. (2012). Embodiment in mathematics teaching and learning: evidence from learners' and teachers' gestures. J. Learn. Sci. 21, 247–286. doi: 10.1080/10508406.2011.611446


three-dimensional virtual reality environment. Comput. Educ. 106, 150–165. doi: 10.1016/j.compedu.2016.12.009


**Conflict of Interest Statement:** MJ-G also oversees the website called www.embodied-games.com. All education games on the site are free to the public as they have primarily been grant funded. Source code is available upon request.

Copyright © 2018 Johnson-Glenberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Combined Cognitive-Motor Rehabilitation in Virtual Reality Improves Motor Outcomes in Chronic Stroke – A Pilot Study

Ana L. Faria<sup>1,2†</sup>, Mónica S. Cameirão<sup>2,3\*†</sup>, Joana F. Couras<sup>4</sup>, Joana R. O. Aguiar<sup>4</sup>, Gabriel M. Costa<sup>4</sup> and Sergi Bermúdez i Badia<sup>2,3</sup>

<sup>1</sup> Faculdade de Psicologia e de Ciências da Educação, Universidade de Coimbra, Coimbra, Portugal, <sup>2</sup> Madeira Interactive Technologies Institute, Funchal, Portugal, <sup>3</sup> Faculdade de Ciências Exatas e da Engenharia, Universidade da Madeira, Funchal, Portugal, <sup>4</sup> CMM - Centros Médicos e de Reabilitação, Aveiro, Portugal

#### Edited by:

Nadia Bianchi-Berthouze, University College London, United Kingdom

#### Reviewed by:

Sandeep Subramanian, University of Texas Health Science Center at San Antonio, United States

Albert Rizzo, University of Southern California, United States

\*Correspondence: Mónica S. Cameirão, monica.cameirao@m-iti.org

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 08 September 2017; Accepted: 11 May 2018; Published: 30 May 2018

#### Citation:

Faria AL, Cameirão MS, Couras JF, Aguiar JRO, Costa GM and Bermúdez i Badia S (2018) Combined Cognitive-Motor Rehabilitation in Virtual Reality Improves Motor Outcomes in Chronic Stroke – A Pilot Study. Front. Psychol. 9:854. doi: 10.3389/fpsyg.2018.00854

Stroke is one of the most common causes of acquired disability, leaving numerous adults with cognitive and motor impairments, and affecting patients' capability to live independently. Virtual Reality (VR) based methods for stroke rehabilitation have mainly focused on motor rehabilitation, but there is increasing interest in integrating cognitive training to provide more effective solutions. Here we investigate the feasibility for stroke recovery of a virtual cognitive-motor task, the Reh@Task, which combines adapted arm reaching with attention and memory training. Twenty-four participants in the chronic stage of stroke, with cognitive and motor deficits, were allocated to one of two groups (VR, Control). Both groups were enrolled in conventional occupational therapy, which mostly involves motor training. Additionally, the VR group underwent training with the Reh@Task and the control group performed time-matched conventional occupational therapy. Motor and cognitive competences were assessed at baseline, end of treatment (1 month) and at a 1-month follow-up through the Montreal Cognitive Assessment, Single Letter Cancelation, Digit Cancelation, Bells Test, Fugl-Meyer Assessment Test, Chedoke Arm and Hand Activity Inventory, Modified Ashworth Scale, and Barthel Index. Our results show that both groups improved in motor function over time, but the Reh@Task group displayed significantly higher between-group outcomes in the arm subpart of the Fugl-Meyer Assessment Test. Improvements in cognitive function were significant and similar in both groups. Overall, these results are supportive of the viability of VR tools that combine motor and cognitive training, such as the Reh@Task.

Trial Registration: This trial was not registered because it is a small clinical study that addresses the feasibility of a prototype device.

Keywords: virtual reality, stroke, motor rehabilitation, cognitive rehabilitation, task adaptation

# INTRODUCTION

Stroke is one of the most common causes of adult disability and its prevalence is likely to increase with an aging population (WHO, 2015). It is estimated that 33–42% of stroke survivors require assistance for daily living activities 3–6 months post-stroke and 36% continue to be disabled 5 years later (Teasell et al., 2012). Loss of motor control and muscle strength of the upper extremity are the most prevalent deficits and are those that have the greatest impact on functional capacity (Saposnik, 2016). Hence, upper extremity recovery is fundamental for minimizing long-term disability and improving quality of life. In fact, most rehabilitation interventions focus on facilitating recovery through motor learning principles (Kleim and Jones, 2008). However, learning also engages cognitive processes such as attention, memory and executive functioning, all of which may be affected by stroke (Cumming et al., 2013). Still, conventional rehabilitation methodologies are mostly motor focused, although 70% of patients experience some degree of cognitive decline (Gottesman and Hillis, 2010), which also affects their capability to live independently (Langhorne et al., 2011).

# What Is Missing in Conventional Cognitive and Motor Rehabilitation Methodologies?

Although motor and cognitive neurorehabilitation after acquired brain injury is strongly based on intensive training and task-specific learning for promoting neural reorganization and recovery (Alia et al., 2017; Galetto and Sacco, 2017), conventional methodologies still struggle to accomplish this goal (Levin et al., 2014). Paper-and-pencil tasks are widely used in cognitive rehabilitation, and are assumed to be reliable and to have adequate construct validity in the assessment and rehabilitation of cognitive functions after brain injury (Wilson, 1993). However, this methodology is not suited to delivering immediate feedback and reinforcement on progress, which is an important element for increasing motivation and avoiding dropouts (Parsons, 2015). Additionally, when the dominant arm is affected by hemiparesis, performing paper-and-pencil tasks may become difficult or impossible. Regarding the motor domain, the persistent repetition of motor actions can be demotivating and, because it is laborious and demanding in terms of human resources, it is not as intensive as it should be (Langhorne et al., 2009). In addition, the relationship between cognitive and motor deficits is increasingly being unveiled and cognitive effort appears to contribute to motor recovery (Pichierri et al., 2011; Mullick et al., 2015; Verstraeten et al., 2016). Studies with stroke survivors have shown differential patterns of motor outcomes depending on the cognitive deficits of patients (Čengić et al., 2011; Påhlman et al., 2011). Moreover, repeated performance of a movement may not lead to meaningful improvement unless the task is performed within the functional demands of a relevant environment (Levin et al., 2014). In fact, the practice of manipulations that require more cognitive effort was already predicted to be more effective for motor learning than the practice of those that require less cognitive effort (Hochstenbach et al., 1998).
In this endeavor, it is important to investigate the learning potential of patients with post-stroke cognitive and motor impairments by developing new therapeutic strategies that merge cognitive and motor intensive training.

# Virtual Reality as a Tool for Combined Cognitive and Motor Rehabilitation

Virtual reality (VR) can nowadays be seen as a valuable approach in stroke rehabilitation, particularly in the motor domain, where studies have shown benefits at the level of upper limb function and activities of daily living (ADL) (Laver et al., 2017). This is potentially related to the fact that VR allows the creation of conditions that optimize motor learning by promoting meaningful and iterative practice, together with the delivery of immediate feedback (Levin et al., 2014). Although less explored, VR also provides the opportunity to integrate the practice of cognitive and/or motor activities in more ecologically valid contexts (Rand et al., 2009; Faria et al., 2016a; Adams et al., 2018). In such scenarios, motor training could be combined with the execution of cognitive rehabilitation tasks consisting of activities for improving cognitive domains such as attention, memory, or executive functions. Moreover, limitations in cognitive function have been shown to have an effect on VR performance (Kizony et al., 2004), and thus VR systems should be designed to address different cognitive profiles. Although the evidence is still modest, some studies with VR for simultaneous motor and cognitive rehabilitation have shown the potential of such a strategy (Rand et al., 2009; Kim et al., 2011; Lee et al., 2015; Cameirão et al., 2017). Hence, we argue that novel VR tools should focus on integrative cognitive and motor rehabilitation based on tasks that pose both cognitive and motor demands. Assuming the interdependence between the recovery processes, we may provide a more effective rehabilitation tool.

Here we present the results of a feasibility study with the Reh@Task, a multi-purpose desktop-based virtual scenario that combines arm reaching and cognitive training through virtual adaptations of traditional paper-and-pencil tasks for the training of memory and attention.

# Previous Work With the Reh@Task

The Reh@Task is a multi-purpose VR scenario for upper limb reaching and cognitive training that has been deployed in different configurations and with different rehabilitation paradigms. It allows the customization of stimuli, training task and training progression. In its first version, it originated as an adaptation in VR of the Toulouse Piéron (TP) cancelation task for the training of attention (Faria et al., 2014). The prototype was our first attempt to combine motor and cognitive training. It was primarily an attention-only task that consisted of selecting target elements from a pool of distractors through arm reaching. This concept was tested in a 1-month intervention case study with three stroke survivors who presented both motor and cognitive deficits. Results indicated improvements at both motor and cognitive levels, suggesting the feasibility of the proposed approach (Faria et al., 2014). Following those results, the Reh@Task prototype was proposed with stimuli customization – to encompass varying cancelation tests with different stimuli – and the incorporation of a memory variant of the cancelation task for the training of memory, always relying on upper limb reaching movements. Thus, this new prototype enables the simultaneous training of upper limb reaching movements, memory, and attention. One of the advantages of a system such as the Reh@Task is that it can be easily customized to test different research hypotheses on the impact of such technology on stroke survivors with different profiles. In a previous controlled impact study, the Reh@Task was used to evaluate whether cognitive tasks supported by personalized stimuli with positive valence could lead to improved motor and/or cognitive outcomes in an understudied population in comparison with standard rehabilitation. This was done through stimulus selection from emotionally tagged pictures and through content personalization to patients' preferences, including music, in a group of subacute stroke survivors with mild cognitive impairment (MCI) (Cameirão et al., 2017). Results showed that the Reh@Task was as effective as standard rehabilitation, although motor and cognitive improvements were poor in both groups. This suggested that patients with MCI have a poorer recovery prognosis, specifically when presenting simultaneous motor and cognitive deficits. In fact, there is evidence that cognitive deficits interfere with motor recovery (Mullick et al., 2015), and that patients with MCI might have more difficulties in dual-tasking (Schaefer and Schumacher, 2010).

In the present study, the Reh@Task was used with stimuli different from those used in the above-mentioned studies, focusing on neutral stimuli that do not have an emotional charge and are traditionally used in standard rehabilitation (symbols, numbers, and letters), with a difficulty progression based on computational models of how stimuli properties affect task difficulty (Faria and Bermúdez i Badia, 2015). Further, in this case our population is chronic. Hence, this study presents a novel cognitive training task and difficulty progression, tested on a different patient population, and compares the impact of this approach to that of time-matched conventional rehabilitation activities. We hypothesize that rehabilitation with the Reh@Task will result in improved motor and cognitive outcomes when compared to patients in the standard rehabilitation condition.

# MATERIALS AND METHODS

# Experimental Setup and Reh@Task

The setup consists of a PC (OS: Windows 7, CPU: Intel core 2 duo E8235 at 2.80 GHz, RAM: 4 GB, Graphics: ATI mobility Radeon HD 2600 XT), a PlayStation Eye camera (Sony Computer Entertainment Inc., Tokyo, Japan) and a customized handle with a tracking pattern. The user works on a tabletop, facing an LCD monitor (24″), and moves the handle on the surface of the table with his/her paretic arm (**Figure 1A**). 2D upper limb reaching movements are captured through a camera-based Augmented Reality (AR) pattern tracking software (AnTS)<sup>1</sup> (Mathews et al., 2007). For adapting the task to individual users, the VR scenario has a built-in calibration function that normalizes the motor effort required in the task to the skillset of the user. The movements of the user are then mapped onto the movements of a virtual arm in the VR environment.

The Reh@Task is based on traditional cancelation tests for the training of attention, and has been extended to incorporate numbers, letters and symbols, the training of memory, and progressive difficulty adjustment according to the evolution of the patient (**Figure 1B**). The task consists of finding target elements within a pool of distractors. In the memory variant, the targets need to be memorized first and are hidden during target selection. The VR cancelation task has incremental difficulty and is adjusted to the individual performance of each user. There is a total of 120 difficulty levels that were defined through a participatory design study, where the input of 20 health professionals was operationalized in quantitative guidelines (Faria and Bermúdez i Badia, 2015). The progression of difficulty is made through the manipulation of the number of targets and distractors, the type of stimulus, the time available to solve the task, the time for selection and, in the memory variant of the task, the amount of time for memorizing the target. These parameters are all operationalized in a way that increases the difficulty of the task incrementally (see Faria et al., 2016b, for further details on the difficulty adjustment algorithm). In summary, for higher difficulty levels, more target and distractor elements appear, less time is available for completing the task and memorizing the target images, and action selection is quicker. When a patient does not solve a specific level in the established time, more time is given for that level. This additional time can be incremented up to three times. If the user fails three times in a row, he/she goes back to the previous level. If the user succeeds, the level must then be successfully performed within the original established time.

Finally, a rule was defined to select the starting level in each training session according to:

$$\text{StartLevel}_{t} = \text{StartLevel}_{t-1} + \left(\text{EndLevel}_{t-1} - \text{StartLevel}_{t-1}\right) / 2 \tag{1}$$

where StartLevel and EndLevel denote the starting and finishing levels, respectively, and t indicates the session number. For instance, if the level achieved by a participant in the first session was 28, the second session would start at level 14 (28/2). If level 44 were reached in the second session, the third session would start at level 29 [14 + (44 − 14)/2], and so on for the following sessions.
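As an illustration only, the starting-level rule in Equation 1 can be sketched in a few lines of Python. The function name `next_start_level` is hypothetical and not part of the Reh@Task software; the sketch also assumes that the first session starts at level 0 (which is what the "28/2" in the worked example implies) and that odd differences are rounded down, a detail the text does not specify.

```python
def next_start_level(prev_start: int, prev_end: int) -> int:
    """Equation 1: start halfway between the previous session's start and end levels."""
    # Floor division is an assumption; the text does not say how odd
    # differences between end and start levels are rounded.
    return prev_start + (prev_end - prev_start) // 2


# Worked example from the text, assuming the first session starts at level 0.
session_2_start = next_start_level(0, 28)                # level reached in session 1: 28
session_3_start = next_start_level(session_2_start, 44)  # level reached in session 2: 44
print(session_2_start, session_3_start)
```

Under these assumptions the sketch reproduces the two starting levels given in the example (14 and 29).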

# Participants

The sample was a convenience sample with a final size of 24 participants recruited at two outpatient rehabilitation units of CMM – Centros Médicos e Reabilitação (Murtosa and Aveiro, Portugal) between June of 2015 and April of 2017. The inclusion criteria were the following: chronic stroke (>6 months); undergoing occupational therapy rehabilitation at CMM; motor impairment of the upper extremity with sufficient observable movement to perform the virtual task, corresponding to a minimum score of 28 in the Motricity Index (MI) (Demeurisse et al., 1980) for elbow flexion and shoulder abduction combined; cognitive deficit but with enough capacity to understand the task and follow instructions, as assessed by the therapists; and able to read and write. Exclusion criteria included: history of premorbid deficits; unilateral spatial neglect assessed through paper-and-pencil cancelation tests; severe depressive symptomatology with a score above 20 points in the Geriatric Depression Scale (GDS) (Yesavage et al., 1983); and vision disorders that could interfere with the execution of the task. Thirty-two stroke survivors were included and randomized for participation in this study. Minor deviations from inclusion/exclusion criteria were permitted for two participants, and did not affect the participants' health, wellbeing, and rights (1 participant was 5 months post-stroke; 1 participant had a GDS score of 22). Twenty-five participants completed the protocol, 1 dropped out, and 6 did not fulfill the experimental protocol. One participant was not included in the analysis because this participant was later confirmed to be in the acute stage of stroke (**Figure 2**). Hence, 24 participants (12 in VR group, 12 in Control group) were included in the analysis (**Table 1**). There were no significant differences between groups in demographics, except for age: the control group was significantly older (Mann–Whitney, U = 31.0, p = 0.017). This study was carried out in accordance with established ethical guidelines and was approved by the board of CMM – Centros Médicos e Reabilitação. All participants gave written informed consent in accordance with the Declaration of Helsinki.

<sup>1</sup>http://neurorehabilitation.m-iti.org/tools/ants

# Experimental Protocol

This study followed a between-subjects design. After recruitment and baseline assessment, the participants were randomly assigned to one of two groups (VR or Control) by a researcher not involved in data collection, using the Research Randomizer<sup>2</sup>, a free web-based service that offers instant random sampling and random assignment. Participants in the VR group underwent 12 sessions of 45 min with the Reh@Task, three times a week, for 1 month. Before the first session, participants went through an average of three short training trials with the Reh@Task with TP abstract stimuli. The training was intended to provide a clear understanding of the VR task and to let participants become used to the natural user interface (AnTS). After assuring that the patient understood the task and interface instructions, the intervention started with the attention variant of the task, then switched to memory, and continued alternating between the two variants. The control group intervention was time-matched and included twelve sessions of 45 min of standard occupational therapy, spatial and time orientation activities, and writing training. Both interventions were in addition to conventional occupational therapy that typically entails 2–3 weekly sessions of 45–60 min and includes upper limb motricity training, practice of fine motor skills, cognitive-motor training, dexterity training, ADL, normalization of muscle tone, balance training and communication training. Participants underwent motor and cognitive assessment through a number of standardized clinical scales, at baseline, end of treatment and 1-month follow-up.

# Cognitive, Motor, and Functional Assessment

Cognitive and motor scales that are widely applied clinically and in research were used to determine impairment severity and to measure cognitive and motor recovery. The assessor was not blinded to the type of intervention. The cognitive profiling was made through the Montreal Cognitive Assessment (MoCA) (Freitas et al., 2011), which provides sub-scores for the following domains: Executive Functions, Naming, Memory, Attention, Language, Abstraction, and Orientation. The attention task-related capabilities were assessed with the Single Letter Cancelation (SLC) (Diller et al., 1974), the Digit Cancelation (DC) (Mohs et al., 1997) and the Bells Test (BT) (Gauthier et al., 1989). Motor deficits were assessed through the upper extremities part of the Fugl-Meyer Assessment Test (FM-UE) (Fugl-Meyer et al., 1975) for motor and joint functioning of the paretic upper extremity. Of the total score of 66, we also analyzed separately proximal (shoulder, elbow, forearm, coordination, 42/66) and distal (wrist, hand, 24/66) function. For functionality of the paretic upper extremity, the Chedoke Arm and Hand Activity Inventory (CAHAI) (Barreca et al., 2004) was used. MI was used to assess muscle power of the paretic upper extremity. Spasticity was assessed through the Modified Ashworth Scale (MAS) (Bohannon and Smith, 1987). Finally, the Barthel Index (BI) (Mahoney and Barthel, 1965) was used to assess independence in activities of daily living (ADLs).

<sup>2</sup>https://www.randomizer.org/

# Data Analysis

The normality of distributions was assessed using the Kolmogorov–Smirnov test for normality. Because most distributions deviated from normality, non-parametric statistical tests were used. Hence, central tendency and dispersion measures of the variables are presented as median and interquartile range (IQR), respectively. For improvements in clinical scores, we show the mean and standard deviation (SD) for an easier comparison with the literature. Differences between groups in demographic and clinical data at baseline were assessed using a Mann–Whitney U test in interval and ordinal variables, and a Pearson's chi-square (χ²) test in nominal variables. A per-protocol analysis was used. For within-group changes over time across the three evaluation moments (baseline, end of treatment, and follow-up), a Friedman test for related samples was used and reported as χ²(degrees of freedom). The Wilcoxon's T matched pairs signed ranks test (one-tailed because we predicted improvement over time in both groups) was used for further related pairwise comparisons with respect to baseline. No correction was applied to account for the number of pairwise comparisons, as non-parametric tests are already considered conservative. To compare groups at the end of treatment and follow-up, for each group we computed the improvement with respect to baseline. We used a one-tailed Mann–Whitney U test to test the hypothesis that improvements in the VR group were superior to those in the control group.

The Reh@Task software logged data on patient task performance (errors, number of targets and distractors, type of stimuli, time to completion) as well as the movement traces of the paretic arm, smoothed using a Gaussian window of 1 second. Performance improvements over time in the VR group were assessed by comparing the performances of each patient at the first and last training sessions. The error rates were computed as a percentage for each type of stimulus during the 12 training sessions. Movement smoothness was computed from the movement traces by counting the number of movement segments, defined as trajectory segments in between null acceleration points. To assess improvements in range of movement (ROM) over time, changes in the tracked position of the hand were considered along the x- and y-axes of the tabletop surface, and the average of the last three sessions was compared against the average of the first three sessions. All comparisons were performed using the two-tailed Wilcoxon's T matched pairs signed ranks test.
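The segment-counting idea behind the smoothness measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `count_movement_segments` is hypothetical, "null acceleration points" are interpreted here as sign changes in the rate of change of hand speed, the trace is assumed to be sampled at a fixed interval `dt`, and the 1-second Gaussian smoothing described above is assumed to have been applied beforehand.

```python
import math


def count_movement_segments(xs, ys, dt):
    """Count trajectory segments delimited by sign changes of the tangential acceleration.

    xs, ys: equally spaced samples of the (already smoothed) 2D hand position.
    dt: sampling interval in seconds (assumed constant).
    """
    points = list(zip(xs, ys))
    # Hand speed between consecutive samples (finite differences).
    speeds = [math.hypot(x1 - x0, y1 - y0) / dt
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    # Rate of change of speed, used here as a scalar stand-in for acceleration.
    accels = [(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    # Ignore exact zeros so that a plateau does not create a spurious crossing.
    signs = [1 if a > 0 else -1 for a in accels if a != 0]
    crossings = sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)
    return crossings + 1 if signs else 0


# A single reach that speeds up and then slows down: two segments.
single_reach = count_movement_segments([0, 1, 3, 6, 8, 9], [0] * 6, dt=1.0)
# A hesitant reach with two speed peaks: four segments.
hesitant_reach = count_movement_segments([0, 1, 3, 4, 5, 7, 8], [0] * 7, dt=1.0)
```

With this interpretation, a lower count for the same reach corresponds to fewer accelerations and decelerations, which is the sense in which the measure is used as an index of smoothness.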


TABLE 1 | Characteristics of participants.


Sex: F, female; M, male; Schooling is presented in years; Type of stroke: I, ischemic; H, hemorrhagic; U, unknown; Side of lesion: L, left; R, right.

Effect sizes (r) are reported on the pairwise comparisons and are computed as Z/√N (Rosenthal, 1991). The criteria for interpreting the effect size are 0.1 = small, 0.3 = medium, and 0.5 = large. For all statistical tests, a significance level of 5% (α = 0.05) was set. Data were analyzed using Matlab (MathWorks Inc., Natick, MA, United States) and IBM SPSS Statistics for Windows, Version 22.0 (Armonk, NY, United States: IBM Corp).
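The effect size computation can be illustrated with a short Python sketch. The helper names `effect_size_r` and `interpret_r` are hypothetical, and the label "negligible" for values below 0.1 is an assumption added for completeness. The value N = 24 used in the example is also an assumption: the text does not state N explicitly, but 24 observations (12 participants measured twice) is consistent with the r values reported in the Results.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = Z / sqrt(N) (Rosenthal, 1991)."""
    return abs(z) / math.sqrt(n)


def interpret_r(r: float) -> str:
    """Label r using the 0.1 / 0.3 / 0.5 criteria given in the text."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"  # label for r < 0.1 is an assumption, not from the paper


# Example with a Z value reported in the Results (MoCA total, VR group, end of treatment).
r_moca_vr = effect_size_r(1.83, 24)  # N = 24 is an assumption, see lead-in
print(round(r_moca_vr, 2), interpret_r(r_moca_vr))
```

Under the N = 24 assumption, the example gives an r of about 0.37, matching the value reported for that comparison, which falls in the medium range.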

# RESULTS

# How Effective Is Cognitive Training With Reh@Task as Compared to Conventional Rehabilitation?

The baseline MoCA total scores were balanced between groups (U = 60.5, p = 0.503, r = 0.18), and so were the scores in MoCA subdomains (data not shown). Also balanced were the number of errors in SLC (U = 64.5, p = 0.659, r = 0.09), DC (U = 57.5, p = 0.383, r = 0.19), and BT (U = 58.5, p = 0.431, r = 0.16).

The analysis of the scores over time for each group, considering the three evaluation moments (baseline, end of treatment, and follow-up), showed a significant impact on MoCA total score and some of its subdomains in both groups (**Table 2**). Specifically, the VR group displayed a significant effect in MoCA-Total [χ²(2) = 8.3, p = 0.016], MoCA-Recall [χ²(2) = 6.2, p = 0.046], and MoCA-Orientation [χ²(2) = 8.4, p = 0.015]. The control group showed a significant effect in MoCA-Total [χ²(2) = 9.1, p = 0.010], MoCA-Language [χ²(2) = 6.1, p = 0.047], and MoCA-Recall [χ²(2) = 6.1, p = 0.048]. Further pairwise comparisons with respect to baseline indicated that for the MoCA total score, both groups showed a significant improvement at end of treatment [VR: T = 12.5, Z = 1.83, p = 0.034, r = 0.37; Control: T = 3.0, Z = 2.68, p = 0.003, r = 0.55], and follow-up [VR: T = 2.0, Z = 2.62, p = 0.004, r = 0.53; Control: T = 2.0, Z = 2.77, p = 0.003, r = 0.56]. Mean improvements in MoCA total score at end of treatment were 2.6 ± 4.3 in VR against 3.1 ± 2.8 in Control, and for follow-up 3.4 ± 3.5 in VR against 3.0 ± 3.0 in Control. For MoCA subdomains with significant effects over time, improvements were also significant at end of treatment and follow-up for both groups. For the cancelation tests, the VR group showed a significant effect over time for BT [χ²(2) = 6.6, p = 0.037] only. Pairwise comparisons with respect to baseline revealed that this effect comes from a significant improvement at follow-up (T = 2.5, Z = 2.40, p = 0.016, r = 0.49), but not at the end of treatment. The control group showed a significant effect over time for the DC [χ²(2) = 11.3, p = 0.004] and BT [χ²(2) = 10.5, p = 0.005], with significant improvements at end of treatment and follow-up. No significant differences were found in the between-groups analysis, when comparing the significant improvements in the VR group with those of the control group at end of treatment and follow-up.

TABLE 2 | Scores in cognitive assessment at baseline, end of treatment and follow-up for VR and control conditions.

Scores are presented as Median (IQR); p, p-value; Friedman test, bold indicates a significant effect (p < 0.05) over time; significant one-tailed pairwise comparisons with respect to baseline are indicated with <sup>∗</sup>p < 0.05 and <sup>∗∗</sup>p < 0.01.

# How Effective Is Motor Training With Reh@Task as Compared to Conventional Rehabilitation?

On the scores in motor assessment scales at baseline, the groups were balanced in the CAHAI (U = 43.0, p = 0.093), BI (U = 56.5, p = 0.360), and MAS (U = 54.0, p = 0.281). However, the groups were not balanced in FM-UE (U = 28.5, p = 0.010) and MI (U = 33.0, p = 0.024), with the control group having significantly higher scores in these two scales.

The analysis of the scores over time for each group, considering the three evaluation moments, showed for both groups a significant impact on FM-UE [VR: χ²(2) = 12.1, p = 0.002; Control: χ²(2) = 11.1, p = 0.004], CAHAI [VR: χ²(2) = 7.5, p = 0.023; Control: χ²(2) = 11.3, p = 0.004], and MI [VR: χ²(2) = 12.0, p = 0.002; Control: χ²(2) = 11.3, p = 0.004] (**Table 3**). On the FM-UE arm and hand subparts, both groups showed significant improvements over time for the hand domain [VR: χ²(2) = 8.4, p = 0.015; Control: χ²(2) = 7.7, p = 0.021], but only the VR group improved significantly in the arm part [VR: χ²(2) = 11.1, p = 0.004; Control: χ²(2) = 4.7, p = 0.097]. The control group showed an additional significant effect in MAS [χ²(2) = 7.6, p = 0.022], indicating a decrease in spasticity. There was no significant effect over time for BI. Further pairwise comparisons with respect to baseline indicated that for the VR group improvements were significant at end of treatment and follow-up in FM-UE [End: T = 0.0, Z = 2.20, p = 0.014, r = 0.45; Follow-up: T = 0.0, Z = 2.37, p = 0.009, r = 0.48], FM-Arm [End: T = 0.0, Z = 2.21, p = 0.013, r = 0.45; Follow-up: T = 0.0, Z = 2.20, p = 0.014, r = 0.45], FM-Hand/wrist [End: T = 0.0, Z = 1.83, p = 0.034, r = 0.37; Follow-up: T = 0.0, Z = 2.03, p = 0.021, r = 0.41], CAHAI [End: T = 0.0, Z = 1.86, p = 0.031, r = 0.40; Follow-up: T = 0.0, Z = 1.89, p = 0.029, r = 0.39], and MI [End: T = 7.5, Z = 1.78, p = 0.037, r = 0.36; Follow-up: T = 1.0, Z = 2.85, p = 0.002, r = 0.58]. For FM-Arm, the improvement compared to the control group was significantly higher (U = 45.0, p = 0.031, r = 0.38) at end of treatment and marginally significant at follow-up (U = 48.0, p = 0.055, r = 0.33).

The control group showed significant improvements at end of treatment and follow-up in FM-UE [End: T = 0.0, Z = 2.03, p = 0.021, r = 0.41; Follow-up: T = 0.0, Z = 2.38, p = 0.008, r = 0.49], FM-Hand/wrist [End: T = 1.0, Z = 1.75, p = 0.040, r = 0.36; Follow-up: T = 0.0, Z = 2.21, p = 0.013, r = 0.45], CAHAI [End: T = 0.0, Z = 2.23, p = 0.013, r = 0.45; Follow-up: T = 0.0, Z = 2.21, p = 0.013, r = 0.45], and MI [End: T = 0.0, Z = 2.04, p = 0.020, r = 0.42; Follow-up: T = 0.0, Z = 2.38, p = 0.009, r = 0.48]. For the MAS, the improvements were only significant at follow-up [End: T = 0.0, Z = 1.41, p = 0.078, r = 0.29; Follow-up: T = 0.0, Z = 2.24, p = 0.012, r = 0.46], corresponding to a median decrease of one grade in this spasticity scale, specifically from 1+ to 1. Besides the significant difference in FM-Arm at end of treatment, no other significant differences were found in the between-groups analysis at end of treatment and follow-up.

The mean improvements with respect to baseline at end of treatment and follow-up in the measures where a significant within-group effect over time was observed are presented in **Table 4**. For the VR and control groups, the observed average improvement in FM-UE was 4.6 ± 6.2 and 2.1 ± 3.6, respectively. This improvement in the VR group mainly comes from the FM-Arm subpart and strongly contrasts with what was measured in the control group at end of treatment (3.7 ± 5.1 in VR against 0.8 ± 2.0 in Control, p = 0.031) and follow-up (4.0 ± 5.5 in VR against 0.9 ± 2.1 in Control, p = 0.055). The average improvements in the FM-Hand/wrist subpart, although significant with respect to baseline, were modest for both groups at end of treatment (0.8 ± 1.4 in VR against 1.3 ± 2.3 in Control) and follow-up (0.9 ± 1.4 in VR against 1.8 ± 2.1 in Control). Also modest were the improvements in the CAHAI for both groups at end of treatment (0.8 ± 1.5 in VR against 2.7 ± 3.1 in Control) and follow-up (1.1 ± 1.8 in VR against 4.3 ± 4.9 in Control). These values are considerably below what is considered a Minimal Detectable Change (MDC), which should be above 6.3 (Barreca et al., 2005). For the MI, the average improvements were higher in VR when compared to control at end of treatment (4.8 ± 8.3 in VR against 3.9 ± 5.4 in Control) and follow-up (9.1 ± 8.7 in VR against 5.3 ± 5.4 in Control), although the difference was not significant.

TABLE 3 | Scores in motor assessment at baseline, end of treatment and follow-up for VR and control conditions.

Scores are presented as Median (IQR); p, p-value; Friedman test, bold indicates a significant effect (p < 0.05) over time; significant one-tailed pairwise comparisons with respect to baseline are indicated with <sup>∗</sup>p < 0.05 and <sup>∗∗</sup>p < 0.01.

# Outcomes in Reh@Task Measures

#### Task Performance Measures

The Reh@Task data allowed us to quantify the evolution of patients in the VR group over time in between assessment points. Several variables are considered for this analysis: difficulty level achieved during each training session, type of task (memory/attention), and type of stimulus.

When looking at changes over time, we observe that patients improve over time in both task types but display a deceleration as levels of higher difficulty are achieved (**Figure 3**). Patients achieve on average higher difficulty levels in the attention task, display a steeper slope, and exhibit a constant variability over time. In contrast, improvements in the memory task are slower, reaching lower difficulty levels and with increasing variability over time, indicating an uneven increased difficulty of this task in patients when compared to attention. Data show significant improvements in task performance between the first and last sessions [Attention: Z = 2.99, p = 0.003, r = 0.61; Memory: Z = 3.07, p = 0.002, r = 0.63] (**Figure 4**). There were comparable performances in the first session for both attention (M = 35.5 ± 11.3) and memory tasks (M = 30.3 ± 8.2), but the difference is statistically significant in the last training session [Attention: 51.3 ± 8.0, Memory: 43.5 ± 11.9, Z = 2.64, p = 0.008, r = 0.54].

TABLE 4 | Mean improvement at end of treatment and follow-up.

Improvements are presented as Mean ± SD.

FIGURE 4 | Task performance changes between the first and last training sessions for the memory and attention tasks in the Reh@Task. The whiskers indicate the most extreme data points that are not considered outliers. ∗∗ indicates p < 0.01.

If task performance is analyzed by type of stimulus, distinct performances can be seen (**Figure 5**). An increasing average number of errors is observed for Numbers (6.5%), Letters (10.4%), and Symbols (17.5%), and the difference is significant when comparing symbols and numbers (Z = 2.12, p = 0.034, r = 0.43), showing a continuum of difficulty that is consistent with the level of abstraction of each category. In addition, all categories show a significantly increased error rate when comparing the black stimuli with their colored counterparts [Numbers: Z = 3.06, p = 0.002, r = 0.62; Letters: Z = 2.98, p = 0.003, r = 0.61; Symbols: Z = 2.43, p = 0.015, r = 0.50]. Interestingly, error rates are similar for colored numbers (25.50%) and for colored symbols (25.48%), despite numbers being easier than symbols when not coupled with colors. Surprisingly, error rates are significantly lower for colored letters than for colored numbers [Colored Letters: 17.81%, Colored Numbers: 25.50%, Z = 2.12, p = 0.034, r = 0.43].
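As a worked check on the effect sizes reported in this subsection, the minimal sketch below assumes that r was obtained with the common conversion r = Z/√N used for Wilcoxon-type tests. Neither the formula nor N is stated in the text: N = 24 observations is an inference that happens to reproduce the reported Z and r pairs, and the helper name `effect_size_r` is hypothetical.

```python
from math import sqrt


def effect_size_r(z: float, n_obs: int) -> float:
    """Convert a Z statistic to the effect size r = Z / sqrt(N) (hypothetical helper)."""
    return z / sqrt(n_obs)


# Assumption: N = 24 observations; this value is inferred, not stated in the text.
N_OBS = 24

# (comparison, Z as reported, r as reported)
reported = [
    ("Attention, first vs. last session", 2.99, 0.61),
    ("Memory, first vs. last session", 3.07, 0.63),
    ("Attention vs. memory, last session", 2.64, 0.54),
    ("Symbols vs. numbers", 2.12, 0.43),
]

for label, z, r_reported in reported:
    r = effect_size_r(z, N_OBS)
    print(f"{label}: r = {r:.2f} (reported {r_reported:.2f})")
```

Under this assumption, each computed value matches the reported r to two decimals, which suggests (but does not confirm) that this conversion was used.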

#### Motor Performance Measures

The analysis of arm movement trajectories provides information on both ROM and movement smoothness. The movement smoothness metric assumes that movement trajectories built of fewer movement segments, that is, with fewer accelerations and decelerations, are indicative of a more controlled and smooth movement. A comparison of movement smoothness between the first and the last training sessions revealed a significant decrease in the number of movement segments, indicating longer and smoother trajectories (Z = 2.93, p = 0.003, r = 0.60) (**Figure 6**). Finally, an analysis of the changes in ROM as assessed by the system's calibration at the beginning of each session revealed significant improvements in the x component of the movement (30.1% improvement, Z = 2.67, p = 0.008, r = 0.54), but not in the y component (**Figure 7**).
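The segment-counting idea behind the smoothness metric can be sketched as follows. This is an illustrative reading, not the Reh@Task implementation: it assumes that one movement segment corresponds to one local peak in hand speed (an acceleration followed by a deceleration). The function name, the sampling interval `dt`, and the `min_peak_speed` threshold are all hypothetical.

```python
from math import hypot


def count_movement_segments(xs, ys, dt, min_peak_speed=0.0):
    """Count speed peaks in a sampled 2D trajectory (illustrative sketch only).

    Assumption: each local maximum of the speed profile marks one movement
    segment. Fewer segments are read as a smoother movement.
    """
    # Speed between consecutive position samples.
    speeds = [
        hypot(x1 - x0, y1 - y0) / dt
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
    ]
    segments = 0
    for i in range(1, len(speeds) - 1):
        rising = speeds[i] > speeds[i - 1]
        not_rising_after = speeds[i] >= speeds[i + 1]
        if rising and not_rising_after and speeds[i] > min_peak_speed:
            segments += 1
    return segments


# Toy data (made up): one smooth reach vs. a reach with a mid-way hesitation.
smooth_x = [0, 1, 3, 6, 8, 9]   # speeds 1, 2, 3, 2, 1 -> one peak
jerky_x = [0, 1, 4, 5, 8, 9]    # speeds 1, 3, 1, 3, 1 -> two peaks
flat_y = [0] * 6

print(count_movement_segments(smooth_x, flat_y, dt=1.0))  # 1
print(count_movement_segments(jerky_x, flat_y, dt=1.0))   # 2
```

Both toy reaches cover the same distance, so only the number of segments differs. A real pipeline would typically low-pass filter the tracking data before counting peaks.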

# DISCUSSION

We presented a randomized controlled study with a VR cognitive and motor training task, the Reh@Task, consisting of a 1-month intervention with 24 chronic stroke survivors. We compared time-matched training with Reh@Task to standard occupational rehabilitation. During the intervention, all patients underwent conventional occupational therapy; only the VR group had specific training with the Reh@Task. The goal of this study was to investigate the benefits for stroke recovery of an integrative VR approach that combines cognitive and motor training. The main hypothesis behind this approach is that, when addressing both motor and cognitive components, the context and situatedness of training impact its ecological validity. For this reason, both motor and cognitive challenges are personalized to each patient and presented as a single motor-cognitive VR task.

Our data show that both groups improved significantly in the motor domain in the FM-UE, CAHAI, and MI. However, in the total FM-UE the improvements in the VR group (4.6–4.9) were on average twice those of the control group (2.1–2.7). This improvement in VR is superior to the ones observed in previous studies with similar VR paradigms in a chronic population (Cameirão et al., 2012; Maier et al., 2017). A more intensive (20 sessions in 1 month) motor-only intervention resulted in FM-UE improvements of about three points (Cameirão et al., 2012). A combined cognitive-motor approach, where the cognitive domain did not follow an automated adjustment approach but was more intensive (5 weekly sessions of 30 min during 6 weeks), led only to average improvements of less than 2 points in FM-UE (Maier et al., 2017). An analysis of our results in the FM components indicates that the improvement in the FM-Arm is significantly higher in comparison to control. Although both groups address proximal movements, this could be attributed to the nature of the VR task, which focuses on reaching movements. This is in line with other cognitive-motor studies with chronic stroke survivors where the training of hand motor competences in VR resulted in gains in manual abilities (Broeren et al., 2008). Nevertheless, our VR task does not address distal movements, and FM-Hand/Wrist improvements comparable to those of the control group are achieved. These improvements in clinical scales are consistent with the Reh@Task data, which showed significant gains in ROM and movement smoothness. Concerning spasticity as measured by the MAS, we observed a significant reduction of one grade (from 1+ to 1) for the control but not the VR group. This is most likely related to the fact that the control group underwent more time of conventional occupational therapy, which includes normalization of muscle tone. Nevertheless, it has been argued that the 1+ and 1 grades do not have enough granularity to discriminate changes in spasticity (Pandyan et al., 1999).

Motor improvements did not generalize into clinically meaningful improvements in ADLs as measured by the BI and CAHAI. Considering that our sample is chronic and presents a very high BI and a low CAHAI at baseline, this indicates that these patients have high levels of independence despite their deficits. This suggests that effective strategies that do not involve the paretic arm were learned prior to the study, leading to learned non-use, which is commonly observed in chronic populations (Wolf et al., 1989). If this is the case, an effective VR training should also incorporate strategies to address learned non-use (Ballester et al., 2016). This hypothesis is supported by previous results of an intervention with a modified version of the Reh@Task in a subacute population, in which improvements in CAHAI were larger, reaching meaningful values (Cameirão et al., 2017). This is also consistent with data from another integrative cognitive-motor VR study with patients in the 1st month post-stroke, where a mean improvement in BI of ∼20 points was registered (Kim et al., 2011), which contrasts strongly with the average improvement of 5 points that we measured in our study with a chronic population.

The impact of both VR and control interventions on cognitive function was significant (3/30 in MoCA) but not different between groups. Still, our results contrast strongly with those obtained using a similar motor and cognitive training paradigm with chronic stroke, where improvements in cognitive function were not significant after 6 weeks of training (Maier et al., 2017), despite the training being more intensive, with five sessions a week. Both groups in our study showed improvements in total MoCA and recall, which suggests that both interventions had an impact in terms of general cognitive functioning and memory. VR showed an additional improvement in orientation, and the control group in language. The lack of improvements in other sub-domains could be explained by the fact that although MoCA has high sensitivity to detect post-stroke cognitive impairment (Godefroy et al., 2011), it is a screening tool and might not have fully detected the specific cognitive impact of this intervention. Both groups improved in attention as assessed by the cancelation tests. Hence, the VR group had improvements consistent with the dimensions trained in the Reh@Task, and consistent with the Reh@Task performance data. The performance data during VR training show significant improvements over time in both memory and attention training. The lower performance in the memory tasks is also consistent with the lower recall scores of MoCA at baseline. The analysis of task performance depending on the stimulus used supports the importance of the modeling effort of our personalization algorithm, which automatically adjusts the task configuration (including stimulus type, number of targets, and distractors) to provide an appropriate challenge to the patient.

A prototype version of the Reh@Task, combining attention and arm reaching only, was previously tested with three chronic stroke survivors in a less intensive intervention (Faria et al., 2014). In that pilot study, two patients showed improvements in motor and cognitive function, and in ADLs, indicating the potential of an approach that integrates motor and cognitive training. Later, a different customization of the Reh@Task was used in a controlled study with subacute stroke survivors (Cameirão et al., 2017). The intervention was time-matched to the one presented here, and contrasting results were obtained. In that case, the Reh@Task was configured to also train attention, memory and arm reaching, but pictures of positive valence were used instead. In terms of mean improvements, in the present study we observed higher improvements in total FM-UE (4.6–4.9 against 0.3–3.0) and MoCA (2.6–3.4 against −0.9–1.7), and lower improvements in CAHAI (0.8–1.1 against 6.6–11.1). These results are interesting because a higher impact of training would be expected in the subacute population, but this was not the case. The subacute population improved poorly in both motor and cognitive domains. A factor that could contribute to this result is that the subacute population had higher cognitive deficits at baseline (median 20.0 against 22.5), and it has been suggested that cognitive functioning is associated with upper limb motor recovery (Mullick et al., 2015). Additionally, the subacute population had on average higher depressive symptomatology (15.1 against 11.2) and fewer years of schooling (4.6 against 6.0). Both these factors have been associated with poorer cognitive performance (Zahodne et al., 2015; MacIntosh et al., 2017). However, the subacute population did better in the performance of ADLs as measured by the CAHAI. As previously mentioned, these differences could be related to the learned non-use that is often observed in chronic stroke patients, which limits the impact of actual rehabilitation gains (Wolf et al., 1989). This highlights the importance of an early use of rehabilitation strategies that prevent learned non-use.

We believe that the presented results are supportive of the viability of low-cost rehabilitation solutions that combine motor and cognitive training, such as the Reh@Task. These solutions show potential to be effective tools to address cognitive training in an integrative manner and can be easily deployed at home or at the clinic. Our data support a larger impact on motor function than on cognitive function when compared to control. One possible reason could be the limited range of cognitive tasks implemented in Reh@Task, which do not encompass all domains that need to be addressed in a comprehensive rehabilitation program. A second reason could be the limited ecological validity of the training tasks. Despite being integrative motor-cognitive tasks, these are still far from actual motor-cognitive tasks performed in ADLs. Previous work using VR cognitive training of ADLs in simulated environments like a virtual mall or a virtual city showed translation of competences to real world ADLs (Rand et al., 2009) and improved outcomes when compared to standard cognitive rehabilitation (Faria et al., 2016a). The relevance of such approaches can also be seen in a recent study with chronic stroke survivors that used a VR scenario for motor training based on the execution of virtual ADLs (Adams et al., 2018). After 8 weeks of treatment, a group of 15 patients showed a mean improvement of ∼6 points in FM-UE, which is superior to what we have observed in our study.

Although further research in this area is essential, this work presents a valuable step toward designing more effective rehabilitation technologies that combine motor and cognitive training relying on VR. In fact, the recent Cochrane review on the effect of VR in stroke rehabilitation reports that there are not enough studies to assess the impact of VR in cognitive function (Laver et al., 2017). Hence, we believe that our contribution is relevant to the field. Nevertheless, this study has some limitations that should be considered. First, due to sequential admittance into the study, we used a completely randomized design, resulting in a heterogeneity of groups in age and FM baseline measures. The fact that groups differ in FM may also imply different recovery profiles. Second, although the use of standard of care as control is necessary, this control did not train the exact same competences as the Reh@Task. Third, the use of screening instruments for the assessment of the improvements in cognitive function in this context may lack the sensitivity to capture small improvements in the different domains addressed.

# AUTHOR CONTRIBUTIONS


AF, MC, and SB defined and designed the research study, analyzed the data, and interpreted the results. JC, JA, and GC ran the intervention and collected the data. All authors revised and approved the current version of the manuscript.

# FUNDING

This work was supported by the European Commission through 303891 RehabNet FP7-PEOPLE-2011-CIG and MACBIOIDI MAC/1.1.b/098; by the Fundação para a Ciência e Tecnologia through UID/EEA/50009/2013; and by the Agência Regional para o Desenvolvimento da Investigação, Tecnologia e Inovação (ARDITI) through Madeira 14–20.

# ACKNOWLEDGMENTS

The authors would like to thank Teresa Paulino for her contribution to the technical development of the experimental setup, and Fábio Pereira for his contribution to the difficulty adjustment algorithm. This paper is an extensive update and expansion of a paper presented in 2016 at the 10th International Conference on Disability, Virtual Reality and Associated Technologies, in Los Angeles (Faria et al., 2016b).

# REFERENCES

Čengić, L., Vuletić, V., Karlić, M., Dikanović, M., and Demarin, V. (2011). Motor and cognitive impairment after stroke. Acta Clin. Croat. 50, 463–467.

and Technology, ed. M. Khosrow-Pour (Hershey, PA: Information Science Reference), 1006–1015.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Faria, Cameirão, Couras, Aguiar, Costa and Bermúdez i Badia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Learning Empathy Through Virtual Reality: Multiple Strategies for Training Empathy-Related Abilities Using Body Ownership Illusions in Embodied Virtual Reality

*Philippe Bertrand<sup>1,2,3</sup>\*, Jérôme Guegan<sup>2</sup>, Léonore Robieux<sup>2</sup>, Cade Andrew McCall<sup>4</sup> and Franck Zenasni<sup>2</sup>\**

*<sup>1</sup>Frontiers VR Laboratory (CRI Labs), Institut Innovant de Formation par la Recherche, USPC, Centre de Recherches Interdisciplinaires, Paris, France, <sup>2</sup>Laboratoire Adaptations Travail-Individu, Université Paris Descartes – Sorbonne Paris Cité, Institut de psychologie, Paris, France, <sup>3</sup>BeAnotherLab Research, BeAnotherLab Association, Barcelona, Spain, <sup>4</sup>Department of Psychology, University of York, York, United Kingdom*

#### *Edited by:*

*Mel Slater, Universitat de Barcelona, Spain*

#### *Reviewed by:*

*Bruno Herbelin, École Polytechnique Fédérale de Lausanne, Switzerland; Ryan Patrick McMahan, The University of Texas at Dallas, United States*

*\*Correspondence:*

*Philippe Bertrand digitalbertrand@gmail.com; Franck Zenasni franck.zenasni@gmail.com*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

*Received: 31 October 2017; Accepted: 05 March 2018; Published: 22 March 2018*

#### *Citation:*

*Bertrand P, Guegan J, Robieux L, McCall CA and Zenasni F (2018) Learning Empathy Through Virtual Reality: Multiple Strategies for Training Empathy-Related Abilities Using Body Ownership Illusions in Embodied Virtual Reality. Front. Robot. AI 5:26. doi: 10.3389/frobt.2018.00026*

Several disciplines have investigated the interconnected empathic abilities behind the proverb "to walk a mile in someone else's shoes" to determine how the presence, and absence, of empathy-related phenomena affect prosocial behavior and intergroup relations. Empathy enables us to learn from others' pain and to know when to offer support. Similarly, virtual reality (VR) appears to allow individuals to step into someone else's shoes, through a perceptual illusion called embodiment, or the body ownership illusion. Considering these perspectives, we propose a theoretical analysis of different mechanisms of empathic practices in order to define a possible framework for the design of empathic training in VR. This is not intended to be an extensive review of all types of practices, but an exploration of empathy and empathy-related phenomena. Empathy-related training practices are analyzed and categorized. We also identify different variables used by pioneer studies in VR to promote empathy-related responses. Finally, we propose strategies for using embodied VR technology to train specific empathy-related abilities.

Keywords: embodied virtual reality, body ownership illusion, empathy-related, learning, training, prosocial behavior, bias, intergroup

# INTRODUCTION

This work combines studies and reviews from research in cognitive science, psychology, education, medicine, the arts, and virtual reality (VR) to address one specific topic: the potential use of VR for learning empathy-related abilities. The article is divided into three sections which address the following questions: (A) What empathy-related abilities should be enhanced? (B) What are good training strategies to enhance these abilities? (C) What is the best use of VR to enhance these abilities?

In section (A), we will explore empathy-related phenomena that can be trained to facilitate healthy and prosocial responses, highlighting strategic abilities to be enhanced (intergroup empathy, compassion, perspective taking, self-regulation) and to be avoided (personal distress). In section (B), we will focus on methods for training empathy that do not necessarily rely on technology. Finally, in section (C), we will explore the potential of VR in promoting empathy, presenting advances in the use of immersive embodied virtual reality (EVR) that have proven effective in enhancing empathy-related capacities, and their potential for enhancing empathy-related training strategies. We will also present one example of an artistic work that makes use of a wide combination of these techniques to address empathy-related experiences outside the context of a lab. To conclude, we will propose a framework that integrates the critical points identified in the three sections above in the design of new learning applications using embodied VR for promoting empathy-related abilities.

# SECTION (A): EMPATHY AND EMPATHY-RELATED ABILITIES

# Definition and Description of Empathy

In recent years, a diverse range of disciplines have investigated the roles played by the presence and absence of empathy and empathy-related phenomena in affecting prosocial behavior and intergroup relations. Empathy can be defined as feeling the same emotion as another observed individual without mixing it with one's own direct experience (de Vignemont and Singer, 2006; Decety and Meyer, 2008; Singer and Lamm, 2009; Decety, 2010). Empathy is deeply related to social bonding and allows one to feel compelled to help another (Decety, 2010). This affective state is produced by the interaction of multiple neural circuits related to motor, cognitive, emotional, motivational and behavioral functions (McCall and Singer, 2013). These different functions are referred to in this article with the broad term empathy-related phenomena. They include perspective taking, affective empathy, empathic distress, empathic concern and altruism, among others. Empathy-related phenomena are crucial for successful social interactions, allowing one to better understand the other, learn from others' actions, and eventually provide help. Therefore, they may help societies to evolve through collaboration (Decety, 2010; McCall and Singer, 2013).

On the one hand, a healthy collective empathic process can help individuals and societies to hold behaviors and cultural beliefs consistent from the moral perspective of maintaining human rights (Decety, 2010). On the other hand, some unhealthy empathic responses may lead individuals to personal distress and burnout (Hojat et al., 2009; Klimecki et al., 2013) and antisocial behavior such as avoidance (Batson et al., 1987; Eisenberg and Fabes, 1990) and unfairness toward outgroup members (Decety, 2010).

Empathic responses emerge at a very young age. Altruistic responses to victims of stress, for example, can be found in babies as young as 12 months (Warneken and Tomasello, 2009). The level of empathy and prosociality increases from 14 to 36 months, with prosociality mainly affected by environmental effects (Knafo et al., 2008). By that time, children develop a better sense of self- and other-awareness, and throughout childhood and adolescence they develop emotion regulation abilities (Decety, 2010). As children grow, they develop a more complex use of relational and contextual factors, goals, and beliefs (Harris, 1994), affecting their capability for mature empathic processes (Decety, 2010).

Empathic abilities are profoundly related to familiarity and affiliation to a group. We tend to develop empathy more easily toward those who are familiar to us or whom we identify as part of our group (Avenanti et al., 2010). In fact, the lack of intergroup empathy is a phenomenon deeply embedded in the way our society perceives and interacts with outgroups (Kubota et al., 2012; Eres and Molenberghs, 2013; Amodio, 2014). This lack of empathy is related to negative bias and stereotypes at implicit levels as well as to more explicit forms of racism and aggression (Cosmides et al., 2003).

# Defining Empathic Phenomena Through Their Processes and Expressions

"*Try to walk a mile in another person*'*s shoes.*" This proverb, found in many cultures in the world, suggests a way to help us understand each other better, and relates to several empathy-related responses. Imagine if we try to follow this proverb literally and walk a mile in the shoes of someone in need. First, we would (a) *move our own body, copying* the other person's movements. Then we would (b) *feel distressed* for walking in their place and facing their needs. Doing so, we would (c) *understand* what the other person is going through and would (d) *understand* what they are *feeling*. Also, we would (e) *feel* the emotions that the other feels in their trajectory. After doing so, we may feel the (f) *desire to help this person*. This desire, could (g) *drive us to actually* help the other, even if that action is costly to our self.

In the same order, under a psychological or neuroscientific perspective we can identify the following empathy-related phenomena in this proverb. (a) *Mimicry* is a tendency to synchronize the affective expressions, vocalizations, postures, and movements of another person (Chartrand and Bargh, 1999). (b) *Empathic Distress* is when one is personally distressed by the distress of another person (Batson et al., 1987). (c) *Perspective taking* is the cognitive ability of imagining the perspective of others (Reniers et al., 2011; Myszkowski et al., 2017). (d) *Online simulation* is the ability to predict other people's emotions (Reniers et al., 2011). Perspective taking and online simulation are sub-factors of cognitive empathy (Reniers et al., 2011). (e) *Affective empathy* (or simply "empathy") means experiencing an isomorphic feeling in relation to others with a clear differentiation between self/other, knowing that the origin of the emotion comes from the other (de Vignemont and Singer, 2006; Singer and Lamm, 2009). Therefore, affective empathy is related to the emotional engagement of the observer with the situation of the emoter. (f) *Compassion* is an emotional and motivational state of care for the wellbeing of the other (McCall and Singer, 2013). Finally, (g) *altruism* is characterized by the prosocial behavior of helping others at a cost to the self (de Waal, 2008).

Affective empathy is one specific emotional response that tends to interact with motor, cognitive, and behavioral phenomena (McCall and Singer, 2013). Enhancing abilities in one domain can spill over into benefits in others (McCall and Singer, 2013). For example, empathy increases mimicry (Chartrand and Bargh, 1999). Affective empathy can also be enhanced by cognitive empathy (Batson et al., 2014), which allows individuals to have an accurate understanding of the situation and the feelings of the other (referred to as empathy accuracy; Main et al., 2017). Since empathy creates an isomorphic response to another person's feelings, an empathic response to the distress of others can cause overwhelming distress in the observer. This is defined as empathic distress (Eisenberg and Fabes, 1990) and can lead to "an egoistic motivation to reduce stress by withdrawing from the stressor" (Decety, 2010) and therefore to social avoidance. In medical patient relations, for example, empathic distress is related to burnout (Hojat et al., 2009; Klimecki and Singer, 2012; Zenasni et al., 2012). On the other hand, moderate levels of distress may be necessary to drive one to feel empathic concern (Decety, 2010), which is the desire for the wellbeing of others and, therefore, the desire to help. Empathic concern is also described as compassion and relates to a loving-kindness emotion that is not isomorphic in relation to the emoter (Singer and Steinbeis, 2009; Klimecki et al., 2013). This means that the observer does not feel the same as the emoter but is still aware of self-other differentiation (McCall and Singer, 2013). Empathic concern may drive one to altruistic behavior (i.e., to help another person in need even at personal cost) (Batson, 2011; McCall and Singer, 2013).

As we will discuss in the next section, all of these empathic processes are affected by specific moderators relating to awareness of others, awareness of oneself, group identity, motivation, and behavioral affordances.

# Moderators of Empathic Responses

Several factors moderate and even completely block empathic responses. These include psychological processes that preempt empathy, for example perceptions of unfairness (Decety and Cowell, 2015) and dehumanization, that is, perceiving others as machines, as non-human animals, or as individuals with no human rights (Bain et al., 2013). In this section, we will discuss different moderators of empathic responses.

# Familiarity, Affiliation, and Similarity Trigger Enhanced Empathy Toward Ingroups and Reduced Empathy Toward Outgroups

We tend to feel greater empathy toward familiar individuals or individuals whom we perceive to be similar to us. This response is part of a general favorability toward similar others (Decety, 2010). This effect causes what can be called enhanced empathy toward ingroup members, due to positive biases toward them (Mathur et al., 2010). Conversely, it can lead individuals to behave unfairly toward outgroup members by comparison (Decety and Cowell, 2015).

The separation between "us" and "them" is not static and can be very subtle. On the one hand, group affiliation and ingroup empathy can be predicted by race (Chiao and Mathur, 2010). Phenotypes—gender, age, skin color—are clearly known to create intergroup barriers. On the other hand, environmental or social factors such as a mixed race group may predict social biases (Van Bavel and Cunningham, 2009) and empathic responses (Weisbuch and Ambady, 2008). For example, individuals from different races may present ingroup responses for others from different races but from the same basketball team (Weisbuch and Ambady, 2008). A classic behavioral experiment showed that football fans primed with the idea of interteam competition were less likely to help an individual in need if he wore a t-shirt of a rival team. Conversely, when primed by the idea of a uniting passion for football, they were more likely to engage in altruistic behavior, even for rival supporters (Levine et al., 2005). Another bonding factor (also used as a strategy to overcome biases) is to consider the observed individual separated from the group, humanizing them. In that way, the observer may present an empathic response to one observed individual, but not to his/her group (Kubota et al., 2012).

# Negative Intergroup Evaluation Block Empathic Response Targeted at Outgroups

Explicit or implicit negative evaluation of an emoter considered an outgroup member can decrease or completely block any empathic response in the observer. This can be caused by different constructs, such as negative stereotypes, negative biases and anxiety. These mechanisms can operate as very fast automatic processes that demand few cognitive resources, whether the individual is aware of them or not (Amodio, 2009). Subjects with implicit, but not explicit, bias against outgroup members present reduced sensorimotor resonance to vicarious pain targeted at outgroup members (Avenanti et al., 2010). Stereotypes, biases and anxiety may block one's empathic responses even among individuals with egalitarian values (Correll et al., 2002; Decety, 2010). Categorization of social groups, activation of group stereotypes and use of those stereotypes to form impressions of others are common practices adopted by social perceivers. These practices can vary with chronic and situational, as well as cognitive and motivational, factors, which augment or reduce the use of stereotypes in forming a judgment (Quinn et al., 2003). This categorical thinking based on coded predictions saves cognitive resources in real-life interactions, which are usually complex (Macrae and Bodenhausen, 2000). That means that our predictions are shaped by simplified associations present in our long-term memory based on stereotypes, fears, and personal experience (Macrae and Bodenhausen, 2000). Regarding empathy, coded predictions are deeply related to stereotypes (semantic associations) and to biases (visceral categorizations), which can be either negative (when related to outgroups) or positive (when related to ingroup members), respectively blocking or enhancing one's empathic response toward another (Amodio, 2009).
Intergroup interactions are also modulated by the relation of power (resources) and prestige between groups (Fiske et al., 2016) and the perception of competition between groups may also intensify intergroup biases (Esses et al., 2001).

# Perceived Unfairness Blocks Empathic Responses in Men

Perceiving another individual as unfair appears to moderate one's empathic response to them. Neuroimaging studies have shown that empathy-related responses were significantly reduced in males when observing an individual who had behaved unfairly receiving pain (Singer et al., 2006). Instead of feeling the other's pain, this activated an area of the brain related to reward, revealing the pleasure in punishment of unfair others. Curiously, this effect was not observed in women. Even when women disliked one person due to a perceived unfairness they still presented vicarious pain responses.

# Self-Awareness Allows One to Identify Feelings and Their Origins, Self-Affiliation, and Social Values

We refer to self-awareness as a combination of different phenomena related to the perception of one's own emotions, body, and semantic constructs of oneself. First, the awareness of the external origin of one's own vicarious emotions is what differentiates empathy from emotion contagion (de Vignemont and Singer, 2006). Furthermore, individuals vary in their ability to perceive and identify their own emotions and sensory states. For example, alexithymia has been found to reduce empathic responses (Bird et al., 2010; Bernhardt and Singer, 2012). Second, the plasticity of the perception of the bodily and conceptual self may interact with other modulators such as affiliation, similarity, and familiarity, enhancing empathy by allowing observers to identify more closely with emoters (for a complete review on plasticity of self-perception see Farmer and Maister, 2017).

# Emotion-Regulation: Enables Empathic Concern Instead of Empathic Distress

Emotion regulation can be a key factor in allowing individuals to regulate their own stress, so that affective empathy is directed into empathic concern instead of personal distress (Decety, 2010; McCall and Singer, 2013). Emotion regulation is a complex top-down process that allows individuals to initiate, inhibit, or modulate their own emotional state or behavior in response to a given situation. Dispositional differences in individuals' abilities to regulate their emotions have been shown to relate to differences in the tendency to experience empathic concern or distress (Rothbart et al., 1994). For example, children with a greater ability to focus or shift their attention have been found to be more able to show compassionate responses; by modulating their negative vicarious emotions, children can keep their emotional arousal at a moderate level, thereby controlling their distress (Eisenberg and Eggum, 2009).

# Self-Regulation of Behavioral Expressions: Control Inconsistencies (Errors) Between Goals and Biases (as Well as in Conditions Mentioned in Item Negative Intergroup Evaluation Block Empathic Response Targeted at Outgroups)

Implicit biases are likely to emerge even among individuals with egalitarian goals (Devine, 1989; Monteith, 1993; Greenwald et al., 1998; Devine et al., 2002; Amodio et al., 2003; Cunningham et al., 2004). Analysis of the self-regulation of stereotyping suggests that, in these cases, individuals experience guilt and redirect their behavior (Monteith, 1993; Czopp et al., 2006). Furthermore, the motivation to be consistent with egalitarian goals (Bargh et al., 2001; Moskowitz and Ignarri, 2009) can help individuals control non-empathic behaviors. Nevertheless, a great deal of research has shown that successfully suppressing unwanted thoughts or emotions is exceedingly difficult (Wegner and Erber, 1992; Gross and Levenson, 1993; Neil Macrae et al., 1994; Wegner, 1994; Monteith et al., 1998). The same likely holds for the self-regulation of behavior (e.g., Monteith, 1993; Monteith et al., 2002; Monteith and Mark, 2005). These self-regulatory mechanisms are based on two different components: monitoring and operating (e.g., Wegner, 1994). Monitoring operates in a relatively automatic manner and does not require deliberative thinking (Amodio et al., 2004). By contrast, the operating processes that control behavior require high motivation, attention, and sufficient cognitive resources, and may not occur in complex situations or under distraction or cognitive load (Gilbert and Hixon, 1991; Spencer et al., 1998). Such situations may lead to a reduction in controlled empathic responses.

# Interpersonal Skills Allow One to Develop Empathic Accuracy

Empathic or non-empathic processes can also be defined by the quality of the interaction between emoter and observer (Main et al., 2017). On the one hand, to be properly recognized by the observer, the stimuli must be clear. That means that emoters with good communication skills and expressivity are more likely to trigger empathic responses (Greenson, 1960; Ickes et al., 1997; Halpern, 2001; Hollan, 2008, 2012). On the other hand, from the perspective of the observer, present attention and openness to understanding the empathic stimuli are also needed (Main et al., 2017). Real-time attunement to the mental states of others is needed to generate an accurate empathic response.

# Motivation, Power, and Skills Modulate Altruistic Behavior

Even when feeling empathy, individuals may not express empathic behavior such as altruism. Altruistic motives (other-oriented rather than self-oriented motivation) are related to the amount of help one individual may offer to someone in need. Helping behavior is also related to the skills and abilities of individuals facing helping tasks, as well as to their power to offer effective help (Clary and Orenstein, 1991). For a review of different theories of altruism see Feigin et al. (2014).

**Table 1** summarizes highlighted empathic processes, abilities, and modulators discussed in section (A).

Table 1 | Highlighted empathic abilities to be developed: empathy-related dimensions that may be promoted to develop prosocial empathic expressions.


# SECTION (B): EXISTENT STRATEGIES FOR LEARNING EMPATHY-RELATED ABILITIES

# Definition and Description of Potential and Actual Abilities

When discussing the training of empathic abilities, it is important to make a clear distinction between potential and actual skills. In other words, an individual may have the potential to be empathic but not necessarily have the optimal environmental conditions for expressing empathy. This is clearly described by some models, such as Gagné's Differentiated Model of Giftedness and Talent (Gagné, 2013). The main asset of the model is its reminder that any potential (the natural ability), whatever its original level, does not necessarily develop spontaneously. It must be underpinned by an appropriate environment, dispositional factors, and support. The environmental and intrapersonal factors are called catalysts. In the case of empathy, educational methods should stimulate learners through specific catalysts (e.g., emotionally safe environment, multicultural, collaborative, dynamic, engaging activities to stimulate openness, facilitators to support the learning process) and through the development of specific skills (e.g., perspective taking training, compassion practices, self-regulatory methods, reflexive thinking, social and emotional skills). In this section, we will discuss three methodologies that include catalyst factors within educational contexts and three methodologies that directly train specific empathic abilities.

# Methodologies for Empathy-Related Learning in Educational Contexts

# Social and Emotional Learning (SEL): A Long-Term Holistic Approach

Research shows that many students lack social–emotional skills and become disengaged as they progress through school, which interferes with academic performance, behavior, and health (Klem and Connell, 2004). In one study from 2006 with 148,189 American students from sixth to twelfth grade, only between 29 and 45% reported having social competencies such as empathy, decision making, and conflict resolution skills; 71% indicated that their school did not provide an encouraging environment (Durlak et al., 2011). To address these topics, the process of SEL was created to enable learners to identify and manage their emotions, motivations, decisions, and social relations (Elias et al., 1997) through self-awareness, self-management, social awareness, relationship skills, and responsible decision making (Collaborative for Academic, Social and Emotional Learning, 2005; casel.org). These skills relate to some of the empathic phenomena discussed in section (A), specifically regarding perspective taking, empathic accuracy, and emotion regulation. A systematic and well-implemented SEL program can allow students to learn, model, and practice these skills and apply them to diverse situations such that they become a part of their repertoire of behaviors (Ladd and Mize, 1983; Weissberg et al., 1989). Meta-analyses of SEL practices (Durlak et al., 2011; Taylor et al., 2017) showed that the conceptual model of targeting various social and emotional assets can be associated with significant improvement in students' social and emotional skills, as well as with better academic performance and fewer risky behaviors. These empirical findings are in line with the educational literature on how intrapersonal and interpersonal competencies (such as self-regulation, problem solving, and relationship skills) may enhance the academic performance and prosocial behavior of students. 
To have an optimal impact and to offer conditions for students to improve learning, behavior and wellbeing, SEL approaches use two great catalysts: long-term training, and a safe environment in a "whole school approach." The latter engages all members of the school community (management, staff, students, parents, broader community) to work together and to create a safe environment through a sense of belonging and cohesion (Durlak et al., 2011).

# Constructivism: Instrumentalization of Reflexive Thinking

Constructivism combines a series of approaches and methodologies developed to empower students by helping them not only to learn content, but also to learn how to learn and to develop cognitively, socially, and emotionally (Karagiorgi and Symeou, 2005). It has been shown that constructivist learning environments enhance students' emotional and social abilities, such as self-regulation and perspective taking (Karagiorgi and Symeou, 2005). Some of its strategies are especially interesting for training perspective taking because they offer realistic and plausible stimuli for understanding another's point of view. These representations of reality avoid oversimplification by representing the natural complexity of the world. They present authentic tasks that contextualize rather than providing abstract instruction. They furthermore provide real-world, case-based learning environments, rather than predetermined instructional sequences. Altogether, they enable context- and content-dependent knowledge construction (Jonassen, 1994). Constructivist approaches regularly involve cooperative tasks. Educators play the role of facilitators engaged in dialog with students, rather than monologs, which is especially interesting for the instrumentalization of reflexivity, letting learners find their own answers and deal with their misconceptions (Karagiorgi and Symeou, 2005).

# Safe Environment for Positive Intergroup Interactions: Facilitating Positive Connections

Exploring the interaction between observers and targeted outgroups can be an effective way to overcome fear and stereotypes. Intergroup contact (Allport, 1955) can improve intergroup attitudes when focused on equal status, cooperation, and common goals (Tropp and Page-Gould, 2015). Affective connection to outgroups (liking one outgroup individual) can decrease prejudice by stimulating perceptions of familiarity (Zajonc, 1968, 1980). Friendship with outgroup individuals may also reduce prejudice through prosocial contact (Allport, 1955; Collins and Ashmore, 1970). For example, one quasiexperimental study combined content and intergroup interaction (Rudman et al., 2001) in a 14-week conflict seminar held by an African American male professor. The intervention reduced prejudice and stereotyping among participants, a pattern that was mediated by cognitive factors (the content of the seminar) but also by affective experiences (liking the professor). Similarly, by combining intergroup interaction with perspective taking training (Malhotra and Liyanage, 2005), a 4-day intergroup workshop conducted between Sri Lankan Singhalese and Tamils resulted in enhanced empathy toward outgroup members 1 year after the intervention. Although intergroup interaction may be effective when it leads to appraisal, this approach presents some strong limitations. In real life, individuals tend to interact within their own social group, missing the opportunity to accumulate personal experiences *via* social interactions with outgroup members. Moreover, prejudiced people avoid intergroup contact (Pettigrew, 1998). In fact, obligatory courses in diversity have been reported to enhance racial bias relative to control students (Bigler, 1999). After enforced multicultural training, individuals with high external motivation but low internal motivation actually responded with anger to behavioral measures targeting outgroups (Plant and Devine, 1998). 
Furthermore, adult interventions based on the contact hypothesis (Allport, 1955) rarely diminish bias against outgroups in general, but only improve responses to the outgroup members present in these interventions (Hewstone, 1996). Moreover, interventions based on "color-blind" strategies (encouraging individuals to suppress their category-based stereotypes in favor of more personalized judgments) have been shown to be ineffective (Wolsko et al., 2000), and can actually enhance negative bias through a backfire effect (Schofield, 1986; Wegner, 1994). Together these data suggest that although intergroup interactions may reduce bias, interventions along these lines must account for the motivations of the individual.

**Table 2** summarizes the methodologies for educational contexts reviewed in this section, highlighting their respective strategies for promoting catalyst factors and natural abilities.

# Training Methods for Empathic Abilities

In this section, we provide examples of three different mind training methods to enhance abilities related to empathy. These approaches were selected because they offer insights for the development of new training methods using VR.

Table 2 | Highlighted educational approaches for empathy-related learning through environmental stimuli and enhancement of specific empathy-related abilities.


# Role Playing: Enhancing Cognitive Empathy and Emotional Development

Playing the role of a movie character, such as Superman, can be one of the most effective types of play for developing perspective taking (Whitebread et al., 2012). As previously discussed, perspective taking is fundamental for accurately understanding the point of view of another person. Accordingly, taking the perspective of an outgroup individual decreases explicit and implicit stereotypes toward the individual and increases positive evaluations of their group (Galinsky and Mussweiler, 2001). Kwon and Yawkey (2000) also show interesting relations between emotional development and role playing. Using basic foundations of psychoanalytical and learning theories, they discuss how emotional development can be enhanced through role-playing tasks that enhance skills such as interactive levels of expression, control and modeling of emotion, and emotional intelligence. Role playing has also been used in digital games focused on empathy. *Real Lives* allows players to inhabit the lives of individuals around the world. In a quasiexperimental study with high school students in three schools in the USA (Bachen et al., 2012), students who played *Real Lives* as part of their curriculum expressed more global empathy (observed in their identification with the characters played) and greater interest in learning about other countries.

Besides education, role playing techniques have been used in therapeutic contexts, conflict mediation, restorative justice, and many other fields. In each of these fields, different practices are proposed to help participants visualize events and conflicts from the perspective of others. These practices use physical dynamics (e.g., changing seats with another participant) and narratives related to real life (e.g., reporting on conflict from the protagonist's point of view) to provide multisensory experiences. Successful perspective taking tasks tend to involve more immersive techniques, such as writing an essay about the other's perspective (Todd and Burgmer, 2013) or taking the role of an outgroup member in a computer game (Gutierrez et al., 2014).

# Mindfulness Training: Enhancing Several Empathic Processes

In recent years, mindfulness practices have been getting more attention due to putative therapeutic benefits for depression, anxiety, and chronic pain (e.g., Baer, 2003; Grossman et al., 2004; Galante et al., 2014) and even burnout in the workplace (Krasner et al., 2009). The term mindfulness has been used to describe states, traits, psychological functions, cognitive processes, and different types of meditation practices or intervention programs (Vago and Silbersweig, 2012). In this article, we use mindfulness to describe secular methods of mental training such as Mindfulness-Based Stress Reduction (Kabat-Zinn, 1982), Mindfulness-Based Cognitive Therapy (Segal et al., 2002), and Compassion Cultivation Training (Jazaieri et al., 2013). Each of these methods is driven by different goals; this article addresses specific outcomes relating to empathy-related phenomena, namely enhancing natural abilities such as perspective taking and compassion (Klimecki et al., 2013; Hildebrandt et al., 2017) and addressing modulators of empathy such as anxiety control and nonjudgmental thinking (Lueke and Gibson, 2014). Mindfulness practices often help the subject to focus their attention on their own breath (Bishop et al., 2004), suggesting that interoceptive awareness may be used as an effective strategy in developing empathy-related abilities. Another common approach in many mindfulness practices is to observe thoughts without suppressing them (Bishop et al., 2004). Mindfulness practices enable individuals to focus their attention on automatic cognitions, such as implicit race biases, and can therefore modulate explicit social judgments and behaviors (Payne, 2005). Many mindfulness approaches consist of daily practices that are intended to generate positive outcomes after months of practice (Hildebrandt et al., 2017), but some studies have shown significant benefits after a single intervention (Klimecki et al., 2013; Lueke and Gibson, 2014). 
Using implicit association tests (Greenwald et al., 1998), Lueke and Gibson (2014) showed that a single 10-minute mindfulness intervention (aiming to let subjects simply observe thoughts and events in a nonjudgmental way) helped individuals reduce biases against outgroups. Mindfulness practices to develop kindness and compassion have also been associated with lower levels of implicit bias (Kang et al., 2014) and with enhanced altruistic motivations (Condon et al., 2013). Different mindfulness practices appear to have different effects in these domains. Hildebrandt and colleagues (Hildebrandt et al., 2017) analyzed self-reported effects of mindfulness training, comparing different types of interventions that focused, respectively, on present attention, perspective taking, and compassion-motivated practices. Results showed that the present attention training was able to significantly increase ratings of self-reported mindfulness, but did not improve reports of perspective taking and compassion. Conversely, interventions focusing on perspective taking and compassion enhanced subjects' self-reported ratings in all three domains. Specifically, the compassion-motivated practice revealed the broadest effects, leading to enhanced abilities of present attention, perspective taking, compassion, and self-compassion, showing the great potential of cultivating compassionate mind states to enhance empathy-related abilities. Another study by Klimecki et al. (2013) tested the behavioral and neural effects of a single 6-h training session on compassion-based therapy. The results showed an increase in positive affect, even in response to others' suffering. The phenomenon was observed in brain areas related to positive evaluation (Kringelbach and Berridge, 2009), love (Bartels and Zeki, 2000, 2004; Aron et al., 2005) and affiliation (Vrticka et al., 2008; Strathearn et al., 2009). 
Taken together, these studies reveal how mindfulness practices can have powerful effects in training empathy-related abilities.

# Implementation of Egalitarian Goals: Enhancing Self-Regulation of Behavioral Expressions

Even when biases and stereotypes initially preempt one's empathic responses, it is possible to upregulate empathy in order to adopt behaviors in line with internal and social values. At least three different strategies can be implemented in these cases (for a review of the strategies listed see Kubota et al., 2012):

• Mental-scanning of immediate responses: self-awareness of non-empathic responses (biases, stereotypes, anxiety) does not require high cognitive resources and can be learned and practiced. Recognizing these responses is the first step in controlling non-empathic expressions. When non-empathic responses are not identified, it is impossible to control them.


Different studies have explored priming methods to implement internal and social goals in order to overcome negative responses to outgroups, changing the attitudes and perceptions of outgroup members. Three interesting approaches are listed below:


Although these methods were found to be effective in lab conditions, they may not be feasible or effective in everyday life. For example, they may not work when the observer is repeatedly exposed to stereotypical and biased information in their everyday environments.

**Table 3** summarizes training methods discussed, highlighting the environmental context in which they can be applied, and the natural abilities involved.

Table 3 | Highlighted training methods for natural empathic abilities and respective environmental contexts where these trainings can be applied.


# SECTION (C): POTENTIAL USES OF EVR FOR TRAINING EMPATHY-RELATED ABILITIES

We next discuss different ways in which science and the arts have used VR in order to explore its potential in promoting empathy.

# Definitions and Description of VR

The term VR has been applied to different technologies with a variety of different characteristics that can be grouped in the following concepts:


# Perceptual Illusions in VR

# Presence [Place Illusion (PI) and Plausibility Illusion (Psi)] in VR: Being There and Feeling It Is Real

While the concept of "immersion" refers to the physical nature of a system, presence is its subjective correlate (Slater and Sanchez-Vives, 2016). The term presence has been used to convey many alternative meanings (Slater, 2009). In real life, "presence" is the state of being present (Hildebrandt et al., 2017), or the state of existing in the world and, fundamentally, of having a body. In VR, the term presence is not necessarily related to having a body, but rather refers to the feeling of "being there" (Held and Durlach, 1992; Sheridan, 1992). Scientists such as Mel Slater refer to this phenomenon as "PI" in order to distinguish it from other concepts, defining it as "the strong illusion of being in a place in spite of the sure knowledge that you are not there." Slater defines Psi as a different concept that is generally associated with presence: Psi is the illusion that the scenario depicted in VR is actually taking place. While PI is constrained by the sensorimotor contingencies of the VR system, Psi relates to the credibility of the scenario. In both cases, users know that they are not "there" and that the events are not happening, but they feel as if they are, leading them to adopt behaviors as if they were really inhabiting the virtual environment (Slater and Sanchez-Vives, 2016). The interrelation of presence, engagement, and empathy has been observed in immersive VR experiences that transport the user to the environment of an emoter (Schutte and Stilinović, 2017).

# Embodied VR or Full Body Ownership Illusion: Feeling That You Have a Different Body, With Different Traits

Immersive EVR, or immersive VR with body ownership illusions (Maselli and Slater, 2013), refers to an adaptation of the technique of the Rubber Hand Illusion (Botvinick and Cohen, 1998) to create full body illusions in VR (Petkova and Ehrsson, 2008; Maselli and Slater, 2013). Using VR, researchers apply multisensory and motor stimuli in synchronicity with the first-person perspective of an avatar, using either computer-generated imaging (Maselli and Slater, 2013) or the image of real humans through stereoscopic video (Petkova and Ehrsson, 2008). In these studies, the evidence shows that subjects feel that they have swapped bodies with another person (Petkova and Ehrsson, 2008), a plastic mannequin (Petkova and Ehrsson, 2008), a Barbie doll (Van der Hoort et al., 2011), a digital avatar (Maselli and Slater, 2013), an invisible body (Guterstam et al., 2015), and even a body located in front of them (Lenggenhager et al., 2007). These multisensory stimuli blur the perceived boundary between the identities of self and other (Paladino et al., 2010) and may even lead participants to report subjective anxiety in response to threats targeted at their virtual hand (Zhang and Hommel, 2016). The sense of embodiment comprises the sense of self-location, the sense of agency, and the sense of body ownership (Kilteni et al., 2012).

The most explored stimuli for inducing embodiment are visuomotor synchronicity (seeing oneself in the body of an avatar that mimics one's movements in real time) and visuotactile synchronicity (seeing tactile stimuli applied to the avatar at the same time that they are applied to the hidden body part of the user) (Maselli and Slater, 2013), with the avatar in a posture congruent with that of the subject (Tsakiris et al., 2007). Visuomotor synchronicity can be applied only to movements of the head, or also to movements of the whole body (Maselli and Slater, 2013), and visuotactile synchronicity can be passive (e.g., being touched) or active (e.g., touching an object) (Tajadura-Jiménez et al., 2013). Maselli and Slater (2013) have shown that a proper combination of stimuli that includes realistic images and a wide field of view can promote strong embodiment illusions without requiring visuomotor or visuotactile stimulation. In fact, incongruent perceptual cues may not break the embodiment during strong illusions. In one experiment, they showed that an avatar with realistic skin tone placed congruently with the user in immersive VR does not require visuomotor or visuotactile synchronicity to produce embodiment. Moreover, full body visuomotor synchronicity using the image of an avatar with realistic skin tone can induce the Body Ownership Illusion even under asynchronous visuotactile stimulation, which does not occur when the image of the avatar has a non-realistic skin tone (Maselli and Slater, 2013). Other combinations of stimuli, such as a congruent full body first-person perspective and visuotactile synchronicity, can also be sufficient to create strong embodiment illusions, even with non-realistic human images (mannequins) and no head movements (Petkova and Ehrsson, 2008).

As well as visuomotor and visuotactile synchronicity, congruent first-person images, and realistic images, there are other variables that may induce or enhance manipulations in the perception of the body. Sound has been shown to alter the perception of the body, even without the use of VR. For example, altering the sound feedback when touching objects may alter the perception of arm length (Tajadura-Jiménez et al., 2012, 2015a) and/or strength (Tajadura-Jiménez et al., 2017). Similarly, manipulating the sound feedback of a hammer hitting the user's virtual hand can make the participant's hand feel stiffer and heavier (Senna et al., 2014). Furthermore, manipulations of the sound of someone's steps may also alter the perception of their own body weight and change the pattern of their gait (Tajadura-Jiménez et al., 2015b). Sound manipulation techniques have also been applied to EVR by changing the frequency of the user's voice feedback to make it more similar to that of the (childlike) avatar, which led users to recalibrate their own voice toward the auditory feedback (Tajadura-Jiménez et al., 2017). These experiments show the potential of sound manipulation to enhance EVR experiences. Manipulations of interoceptive signals can also modulate embodied illusions. Evidence suggests that feedback of biosignals such as a heartbeat may enhance embodiment illusions (Suzuki et al., 2013). Synchronous cardiovisual signals increased self-identification and self-location in relation to the subject's virtual body, shifting their perception of touch toward the virtual body (Aspell et al., 2013).

Recent theories of body cognition offer an interesting perspective on the potential nature of these processes. Tsakiris (2017) bases his model on extensive reviews and experiments, suggesting that one's perception of one's own body combines bottom-up brain phenomena (from the body to the brain) and top-down processes (from the brain to the body). In this model, real-time information on interoceptive states (such as proprioception, breathing, heart rate, and arousal) and real-time information on exteroceptive sensations (such as vision, touch, and taste) inform the brain's predictions of the perception of the body. The regulation between internal bodily states, external environment information, and mental concepts gives us the sense of ourselves and of the space that surrounds us. Neuroimaging research suggests that a significant prediction error is required to update the predictive internal models of the body matrix (O'Reilly et al., 2013; Riva et al., 2017).

**Table 4** summarizes highlighted concepts related to Body Ownership Illusion.

Table 4 | Highlighted variables and perceptive dimensions of Body Ownership Illusions.

Highlighted variables used in different combinations: Visuomotor synchronicity of head and/or body, visuotactile synchronicity (active or passive), congruent first-person perspective, agency (partial or complete), realistic image, audio, and biosignals feedback

Perceptive dimensions: Attribution; self-location; agency

# Agency Illusions

Although they are correlated, evidence suggests that Agency and Body Ownership are two different phenomena (Sato and Yasuda, 2005) that can occur in different circumstances. Agency requires voluntary action, while body ownership may occur under both voluntary action and passive events (Tsakiris et al., 2007). The subjective perception of agency over a body part is different from the subjective perception of agency over a physical action, which, being voluntary, involves a combination of efferent (top-down) and afferent (bottom-up) information (Tsakiris et al., 2007). The rubber hand illusion with visuotactile stimulation is a classic example of a Body Ownership Illusion without agency (Botvinick and Cohen, 1998). Yet it actually involves a subjective perception of agency: in the self-report questionnaires used to measure the illusion, participants agree with the sentences "it seemed like I could have moved the rubber hand if I had wanted" and "it seemed like I was in control of the rubber hand." This subjective perception of agency decreased, for example, with visuomotor delay, without changing the sense of ownership (Kalckert and Ehrsson, 2012). Even though it is not necessary for inducing Body Ownership Illusions, agency may contribute to the extent to which this illusion is felt. Using proprioceptive drift, the perception that the real hand is closer to the displaced rubber hand, Tsakiris et al. (2006) suggest that voluntary movements of body parts induce a global change in proprioceptive awareness. While localized proprioceptive drifts were found with passive stimulation, during active movement of one digit the proprioceptive drift was observed in the whole hand.
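Proprioceptive drift is usually reported as a signed displacement, which a short calculation can make concrete. The following is only a minimal sketch, assuming that hand positions are judged along a single axis in centimeters before and after stimulation. The function names and example numbers are hypothetical and are not taken from Tsakiris et al. (2006) or any other cited study.

```python
from statistics import mean


def proprioceptive_drift(pre_cm, post_cm, rubber_hand_cm):
    """Return the signed drift (cm) of the felt position of the real hand.

    pre_cm / post_cm: judged position of the real hand before and after
    stimulation. rubber_hand_cm: position of the rubber (or virtual) hand.
    Positive values mean the judgement moved toward the rubber hand.
    All three positions are assumed to lie on the same one-dimensional axis.
    """
    # Direction from the initial judgement to the rubber hand (+1 or -1).
    direction = 1.0 if rubber_hand_cm >= pre_cm else -1.0
    return (post_cm - pre_cm) * direction


def mean_drift(trials, rubber_hand_cm):
    """Average the drift over (pre_cm, post_cm) pairs from several trials."""
    return mean(proprioceptive_drift(pre, post, rubber_hand_cm)
                for pre, post in trials)


# Hypothetical judgements from three trials, rubber hand placed at 15 cm.
example_trials = [(0.0, 2.5), (0.5, 3.0), (-0.5, 1.0)]
print(round(mean_drift(example_trials, rubber_hand_cm=15.0), 2))
```

Under this convention, a localized drift would show up as a positive value for the stimulated digit only, whereas a global change would show up as positive values across all measured digits.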

More recently, different studies have used the Body Ownership Illusion with visuomotor synchronicity and voluntary movement to induce an illusion of agency over actions such as walking (Kokkinara et al., 2016) or speaking (Banakou and Slater, 2014). By having subjects embody a digital avatar that could be controlled by their movements, researchers observed that subjects attributed to themselves agency over actions taken by the avatar, even without any prior intention, prediction, priming, or cause preceding effect (Banakou and Slater, 2014). In this experiment, the digital avatar would speak independently of the user's actions, creating the perception that subjects were themselves talking and also changing the fundamental frequency of the user's voice after the experience. This illusion was found to be even stronger when a vibration stimulus was applied to the user's throat in synchronicity with the avatar's voice.

As with the model of body cognition discussed in Section "Embodied VR or Full Body Ownership Illusion: Feeling That You Have a Different Body, with Different Traits," several theories define the sense of agency as a result of the comparison between prediction of efferent and afferent information. For a cognitive and neural perspective on the sense of agency, see the review by David et al. (2008). These concepts were implemented in an experiment in which researchers were able to induce the illusion of walking in subjects who were actually seated (Kokkinara et al., 2016) through a combination of priming and body ownership illusion. In this experiment, subjects could see themselves walking while perceiving an optic flow in the environment and a sway movement of the head due to the walking motion. Subjects showed high levels of body ownership both in self-report questionnaires and in physiological data.

Taken together, these findings suggest that Agency Illusions could be combined with Body Ownership Illusions to create experiences in which the avatars perform prosocial actions that could be perceived as voluntary actions of the subjects themselves. This hypothesis has yet to be investigated, however, and should be tested before being implemented in empathy-related training.

# Using Perceptual Illusions to Promote Empathy-Related Abilities

Since the 2000s, VR has been used to study perspective taking (Gaunet et al., 2001; Lambrey et al., 2012). VR allows users to move their perspectives to different scenarios and universes. One can furthermore play different roles from the perspective of different avatars. The ability of immersive VR to displace the first-person point of view relates directly to perspective taking and role playing. Experiences intended to promote empathy-related abilities have been developed in both types of VR. Some iconic examples are the non-immersive VR game Real Lives (Bachen et al., 2012), the immersive VR 360° video project from the United Nations, *Clouds Over Sidra* (Schutte and Stilinović, 2017), and The New York Times project, *The Displaced* (Sirkkunen et al., 2016). In these last two examples, presence has been correlated with positive empathic responses, showing the power of this medium to engage users and draw their full attention to the stories of other individuals. Although clearly powerful, these examples place the viewer in the third-person perspective. As the previous pages suggest, VR can amplify these effects by using its full potential to place users in the first-person perspective of the other. Being in the first-person perspective of an avatar that moves in synchronicity with the user may help participants to overcome cognitive load, increasing their memory performance after the VR experience (Steed et al., 2017). Tentative evidence of the positive effect on embodiment and presence of having the first-person perspective of an avatar has also been observed in experiments in uncontrolled settings, conducted remotely through an app over the Internet (Steed et al., 2016). Having an avatar also places participants at the center of the experience (de la Peña et al., 2010). For more examples of immersive VR approaches without the use of embodiment, see the review by Hardee and McMahan (2017) on immersive journalism.

Experiences of EVR allow users to literally step into the shoes of others and see the world from their perspective. Research on EVR has explored how manipulations of the senses can be used to modulate empathic responses. Experiences of stepping into the shoes of outgroup members have shown significant plasticity of empathic abilities even after the experience, by decreasing implicit racial biases (Peck et al., 2013) and increasing mimicry of outgroup members (Hasler et al., 2017). EVR may also affect an individual's self-concept and behavior *via* the traits (positive or negative) of the characters represented in their avatars (Yee and Bailenson, 2007). For example, subjects who embodied a superhero increased their altruistic intentions more than subjects who embodied a supervillain (Rosenberg et al., 2013).

# Multisensory Perspective Taking of Outgroup Members: Affecting Bias, Mimicry, Perception of Similarity, and Emotion

Some experiments have used EVR to allow participants to step into the shoes of outgroup members. Peck et al. (2013) conducted a study in which subjects with light skin could see themselves in a dark-skinned avatar. The manipulation decreased negative implicit associations toward black individuals immediately after the experiment. A similar study was conducted by Banakou et al. (2016), showing that the decrease in implicit bias was sustained even 1 week after the intervention. In another experiment (Hasler et al., 2017), levels of implicit bias did not change, but the intervention nevertheless increased mimicry between a subject with white skin embodying an avatar with dark skin and another digital character with dark skin.

In one non-experimental setting, de la Peña et al. (2010) used the concepts of multisensory perspective to allow participants to step into the shoes of a character confined in Guantanamo Bay. Visuomotor synchronicity of the user's head, binaural audio of the environment, and haptic feedback of breathing were provided to create the illusion of being in the stress position of the prisoners (a position described in reports on prisoner treatment). Although no scientific data were collected, participants reported feeling anxiety and discomfort and expressed an emotional connection with the situation of the prisoners.

Taking a different approach without immersive interfaces, researchers used a technique called "enfacement," which stimulates mirror-touch synesthesia (Fini et al., 2013). Participants see images of the faces of different avatars on screen being stroked by a brush. Mimicking the tactile stimuli applied to the avatar, researchers stroked the user's face with an identical brush, in synchrony with the image observed. After seeing the images of avatars with different phenotypes, subjects revealed a greater self-identification with more diverse phenotypic characteristics. Using enfacement illusions with EEG measurements, Serino et al. (2015) observed activation of face-specific regions correlating with the increased identification with the avatar's face.

## Manipulation of Interoceptive Signals: Affecting Emotion Regulation

In a recent experiment without VR (Azevedo et al., 2017a,b), researchers showed that tactile feedback of a slow heartbeat-like rhythm can make subjects more relaxed before performing a stressful task such as speaking in public, showing its potential implications for emotion regulation (an important ability for converting empathic distress into empathic concern). As mentioned in Section "Embodied VR or Full Body Ownership Illusion: Feeling That You Have a Different Body, with Different Traits," interoceptive signals can interfere with embodiment. It has also been shown that interoceptive signals such as heartbeat can correspond to biased behavior (Azevedo et al., 2017a,b) and that awareness of heartbeat signals relates in significant part to cognitive-affective processing (Dunn et al., 2010). These findings reveal a great potential for integrating biosignal manipulations to help participants control their anxiety when faced with the stress of others, to interfere with anxiety triggered by an outgroup threat, and to interfere with emotional processing such as affective empathy.

# Proteus Effect: Affecting Stereotypes, Self-Perception, and Behavior

In virtual environments, digital self-representations (i.e., avatars) may influence users and lead their behaviors to be consistent with the avatar's appearance. This behavioral modulation, known as the Proteus Effect (Yee and Bailenson, 2007; Yee et al., 2009), has been observed in several studies. In their seminal work, Yee and Bailenson (2007) showed that attractive avatars lead to more intimate behavior with a confederate in terms of self-disclosure and interpersonal distance. In a second study, they also observed that tall avatars lead to more confident behavior than short avatars in a negotiation task. More recent studies have also shown that the appearance of embodied avatars can influence the attitudes, beliefs (Fox et al., 2013) and actions of users (Peña et al., 2009; Guegan et al., 2016). For instance, it has been shown in non-immersive VR environments that the use of an avatar resembling a member of the Ku Klux Klan activates more negative thoughts and leads users to participate more aggressively (e.g., with themes of murder or vengeance) in the stories created (Peña et al., 2009). Other studies have shown how the Proteus Effect may enhance stereotypes against outgroup members. It has also been shown that users who play a black avatar in a computer game present greater aggressive cognition and affect (Eastin et al., 2009; Ash, 2016).

From a theoretical point of view, the Proteus effect is based on self-perception principles (Bem, 1972), under which the individual explains his attitudes and internal states based on observation of external cues. In this way, the profile of the avatar could lead the user to make implicit inferences about his/her personal dispositions (e.g., I am an empathic person). The influence of avatars is also compatible with the priming process (Peña et al., 2009), which refers to "the incidental activation of knowledge structures, such as trait concepts and stereotypes, by the current situational context" (Bargh et al., 1996). For instance, perceiving the characteristics of an avatar nurse or a humanitarian worker could activate some related concepts (e.g., altruism, empathy) as well as inhibit more antithetical concepts such as aggression or violence. Moreover, these situational cues may lead to behavioral assimilation *via* an increase in the likelihood of behaviors congruent with the primed concept. Whatever the underlying mechanism, and to the best of our knowledge, no study has directly investigated the links between the Proteus Effect and the empathic processes. However, previous work shows that the digital self-representations can influence behavior in a pro or antisocial way. For example, the embodiment of an avatar resembling an inventor led engineering students to show higher creative fluency and originality of ideas during a face-to-face brainstorming session conducted after immersion in a virtual environment (Guegan et al., 2016). Another example shows that embodying a casually dressed black avatar enhances users' performance in playing drums in comparison to embodying a formally dressed white avatar (Kilteni et al., 2013), probably due to the positive stereotypic association of black individuals and rhythm. 
In another experiment, it was demonstrated that embodying a Sigmund Freud-like avatar talking to a scanned version of their own body can help users improve their mood after self-counseling, in comparison to self-counseling in a self-representing avatar (Osimo et al., 2015). Another case of positive stereotyping showed that embodying a superhero who helps the population of a virtual city led to increased prosocial behavior in an offline interaction (Rosenberg et al., 2013). Conversely, using a villainous avatar (Voldemort) led both to an increase in antisocial behavior and to a significant decrease in prosocial behavior (Yoon and Vargas, 2014).

It has been shown that the Proteus effect is mediated by the level of embodiment felt by users in relation to their avatar (Ash, 2016), suggesting that EVR can enhance this effect. Given this set of findings, one might expect that training methods using avatars designed and pretested to improve empathy would induce beneficial behavioral changes, improve positive perceptions among users, and so on. Another possible implication of this concept is to embody a digital avatar of an outgroup member that presents traits that contradict stereotypes. These hypotheses have yet to be tested.

**Table 5** summarizes highlighted strategies that can be applied using VR and Body Ownership Illusion for empathy learning and their potential effects.

# The Machine To Be Another: An Artistic Exploration of EVR Methods for Learning Empathy

To extend the examples beyond scientific fields, this article briefly describes one artistic system in VR designed to promote empathy-related behavior, called The Machine to Be Another (TMTBA; Bertrand et al., 2014; Sutherland, 2014; Oliveira et al., 2016), created by one of the authors of this article together with the interdisciplinary collective BeAnotherLab. Although there are no scientific results on the effectiveness of these experiments in promoting empathy, the system adopts interesting approaches to address its goals. Inspired by embodiment studies, TMTBA allows users to see themselves in the body of real human beings (captured by video) instead of using computer-generated images. The group uses different technological sets, the most famous being "Body Swap." In this installation, two users swap perspectives using VR headsets and first-person cameras, while being instructed to move slowly and to collaborate with each other in order to synchronize their movements, while the artists provide physical interactions to stimulate touch. Besides swapping perspectives (under visuomotor and visuotactile synchronicity), the Body Swap is used to present real narratives from different individuals acting as performers. Over a 5-year period, the group has presented several performances from individuals such as asylum seekers in a detention center in Israel, an Iraq veteran in the USA, an African migrant in Spain, and victims of police brutality in Brazil. These narratives are created by the performers themselves, drawing on their personal views on subjects ranging from social stigma to stories of forgiveness. The use of real people with real stories is possibly the main conceptual difference between this artistic work and lab studies. After 10 min of physical interaction (exploring movements of their hands, arms and legs, as well as interacting with physical objects and with mirrors), users are placed face to face, similar to Petkova and Ehrsson's experiment (Petkova and Ehrsson, 2008), in which a person can shake hands with themselves from the perspective of another individual. Another interesting aspect of the system is that it allows the performer and user to meet physically immediately after the VR experience, something that would not have taken place in everyday life. TMTBA has been broadly presented in over 25 countries in artistic, cultural, and academic contexts, and is used as a tool to promote mutual understanding. These presentations have taken experiments based on concepts of Body Ownership Illusion to a wide range of audiences, enabling them to experience the perspective of another real human being in the routine of their lives.
Designed on a low budget, this system has several limitations in comparison to lab studies of embodiment, such as the use of a monoscopic camera, a constrained field of view, and lower resolution due to the quality of the hardware used (initially Oculus Rift DK2). Even so, the group has collected anecdotal evidence from users' statements, many of which have been reported in the press (EL PAIS\*; TV GloboNews\*\*; The Verge\*\*\*) (Souppouris, 2014; María, 2016; Cristina, 2017). In these statements, subjects report concern toward performers, pointing to the potential use of this type of experience as an immersive medium for social interaction. In order to clarify the effectiveness of these systems and protocols, controlled studies have yet to be developed.

#### Table 5 | Highlighted concepts of embodied virtual reality that can be used for empathy-related training.

**Figures 1** and **2** document a workshop held by BeAnotherLab in a detention center for asylum seekers in Israel, with functional diagrams of the different interactive modes of the Body Swap experience.

# DISCUSSION

This review article has brought an interdisciplinary perspective to promote insights on how to use VR for training empathic skills. It offers a guide to highlighted concepts, educational practices, and VR techniques that can be used in empathy-related learning. The cited literature is recommended for a deeper understanding of each of these complex topics.

Figure 1 | Functional diagram of two users swapping bodies through the system The Machine to Be Another. In this interactive mode, both users have to mirror each other in order to move in synchrony. Picture from a workshop held by BeAnotherLab in 2015 at a detention center for asylum seekers in Israel.

The collaborative aspects of the constructivist approach could also offer possibilities of interaction with outgroup characters in VR. The constructivist focus on building self-reflexivity could also be incorporated into the idea of mapping and controlling coded predictions. By inviting learners to understand their own misconceptions, strategies of non-stereotypic information could be used as part of the process of the intervention. This training method could therefore be focused on provoking intergroup encounters (in VR and real life) in a series of interventions using EVR, in which subjects can experience in EVR the perspective of an outgroup (multisensory role playing of an outgroup). Explorations of the Proteus Effect could be applied through non-stereotypical information revealing more of the context and experiences of the other (individuation). Subjects could then be exposed to real-life situations (situated learning) of prejudices faced by outgroup members. This could offer subjects a better idea of the challenges faced by stigmatized outgroups (familiarity). Moreover, in theory we could induce empathy through an empathic personality of the avatar that could demonstrate compassionate discourse (compassionate avatar) and engage in altruistic behavior (altruistic avatar). In order to enhance the experience, manipulations of biofeedback information such as heartbeat and breathing (interoceptive manipulation) could be used to help subjects control anxiety (emotion regulation of distress). By promoting the perception of presence (place and Psi), the experience would likely raise the subject's awareness of the events of the simulation, possibly enhancing their capabilities of self-reflection. A similar methodology could be used to place subjects in different situations under the perspective of one ingroup member, or an avatar similar to themselves. These situations could explore, for example, a task in which they must collaborate with an outgroup member (intergroup collaboration and familiarity), or where they would be helped by an outgroup member (non-stereotypic information).
It could also explore the perspective of one observer facing an intergroup interaction between one dominant and one stigmatized individual, in which the stigmatized individual has something in common with them (e.g., a T-shirt of their football team), or having the possibility to help or being induced to help (altruistic agency illusion).

These are some ideas of experiments that would apply existing knowledge to concrete training methods. Each of these examples raises several research questions:


These are just a few of the questions that demonstrate a fertile universe for research and for the integration of EVR with training methods for empathy-related abilities. With the current democratization of VR devices, the use of EVR has become more accessible, making it possible to develop EVR training methods that can be implemented outside the lab, in contexts such as educational, cultural, and artistic environments. Research on these techniques could open doors for the design of new learning tools that, if effective, could have a wide effect in promoting a more empathic society.

# RECOMMENDED STRATEGIES FOR NEW LEARNING APPLICATIONS OF EMPATHY-RELATED ABILITIES IN EVR: PROPOSED EQUALIZING MODEL AND FRAMEWORK FOR EMPATHY LEARNING

In this section, we discuss how to integrate the content summarized in this article into the design of EVR-based empathy training programs.

#### Table 6 | Framework for equalizing empathic processes and expressions through learning methods and embodied virtual reality, based on highlighted concepts and practices.

(1) What is the relationship between emoter and observer?

*Abilities:* intergroup openness; reflexive thinking; social skills; conflict management

*Catalysts:* long-term training; safe environment; collaborative dynamics; engaging voluntary activities

*Modulators:* increase of familiarity, affiliation, similarity with outgroup members; decrease of bias, stereotypes, coded predictions, categorical thinking against outgroup members; enhancement of egalitarian goals and self-analysis of-group fairness related to outgroup members

*Learning methods:* Constructivism and SEL for instrumentalization of reflexive thinking and social skills; implementation of egalitarian goals: repetitive priming for non-stereotypical association, individuation and negation of stereotype; mindfulness training for practice of non-judgmental thinking

*EVR methods:* intergroup embodiment for enhancing self-other similarity; Proteus Effect with non-stereotypical avatar

(2) How developed is the self-awareness of the observer?

*Abilities:* bodily, emotional, cognitive, and social self-awareness

*Catalysts:* educators as facilitators

*Modulation:* self-other distinction; emotion recognition; egalitarian internal and social goals

(3) How developed are the empathic abilities of the observer toward the emoter?

*Abilities:* affective empathy; cognitive empathy; empathy accuracy; empathic distress moderation; compassion; altruism; problem solving

*Catalysts:* real-world case-based and contextual knowledge

*Moderators:* emotional engagement; perspective taking; online simulation; dialog skills; present attention; loving-kindness; motivation, power and skills for helping; self-regulation of behavioral expressions

*Obs.: highlighted abilities, moderators, and methods may interconnect and overlap.*

Figure 2 | Functional diagram of a performance using The Machine to Be Another. Picture from a workshop held by BeAnotherLab in 2015 at a detention center for asylum seekers in Holot (Israel), presenting the narrative of Drhassn steib.

As demonstrated above, empathy-related responses are the result of a complex phenomenon that involves different intergroup, interpersonal, and intrapersonal processes and mechanisms. This means that there is no single recipe for empathy development and that several variables in the social environment of the interaction may affect which ability most needs to be developed. Therefore, the first step we recommend is an analysis of all factors related to the social environment of the interaction, aiming to stimulate optimal empathic processes: positive intergroup interaction and evaluation, awareness of the other, awareness of the self, empathic concern, and altruistic behavior. These processes can be analyzed guided by the following questions: (1) What is the relationship between emoter and observer? (2) How developed is the self-awareness of the observer? (3) How developed are the empathic abilities of the observer toward the emoter?

To help identify what type of ability needs to be enhanced, we also propose a framework that relates each of these questions to a list of relevant abilities, catalysts, and moderators, as well as effective learning methods and EVR strategies that can be used. Even so, abilities, catalysts, moderators, methods, and EVR strategies may interconnect beyond this division, with effects spilling over into other dimensions.

**Table 6** presents a framework with the most relevant concepts discussed in this article, guiding the design of learning systems within all domains of the equalizer (social interaction context, variables, and enhancer).
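As a rough illustration of how the three guiding questions could index such a framework, the sketch below stores a few entries taken from Table 6 in a simple lookup structure. The names `FrameworkEntry`, `FRAMEWORK`, and `entry_for` are hypothetical and not part of the original work. The entries are abbreviated, and fields that are not listed in the table as reproduced here are left empty rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkEntry:
    """One guiding question with a subset of its Table 6 entries (hypothetical structure)."""
    question: str
    abilities: tuple = ()
    catalysts: tuple = ()
    evr_methods: tuple = ()  # left empty where none are listed for the question


# Abbreviated entries copied from Table 6; nothing here is new data.
FRAMEWORK = {
    1: FrameworkEntry(
        question="What is the relationship between emoter and observer?",
        abilities=("intergroup openness", "reflexive thinking",
                   "social skills", "conflict management"),
        catalysts=("long-term training", "engaging voluntary activities"),
        evr_methods=("intergroup embodiment",
                     "Proteus Effect with non-stereotypical avatar"),
    ),
    2: FrameworkEntry(
        question="How developed is the self-awareness of the observer?",
        abilities=("bodily self-awareness", "emotional self-awareness",
                   "cognitive self-awareness", "social self-awareness"),
        catalysts=("educators as facilitators",),
    ),
    3: FrameworkEntry(
        question=("How developed are the empathic abilities of the "
                  "observer toward the emoter?"),
        abilities=("affective empathy", "cognitive empathy", "empathy accuracy",
                   "empathic distress moderation", "compassion", "altruism",
                   "problem solving"),
        catalysts=("real-world case-based and contextual knowledge",),
    ),
}


def entry_for(question_number: int) -> FrameworkEntry:
    """Return the framework entry for guiding question 1, 2, or 3."""
    try:
        return FRAMEWORK[question_number]
    except KeyError:
        raise ValueError(f"unknown guiding question: {question_number}") from None


if __name__ == "__main__":
    for number in sorted(FRAMEWORK):
        entry = entry_for(number)
        print(f"({number}) {entry.question}")
        print("    abilities:", "; ".join(entry.abilities))
```

A designer could start from the question that best describes the interaction being analyzed and read off candidate abilities and strategies, keeping in mind that the categories may overlap.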

After a contextual analysis based on this framework, it will be possible to define the most relevant empathic abilities, catalyst factors, and moderators that can act as important variables of the empathic process in a given situation and, therefore, to define the structure of the learning system. Once the variables that need to be addressed are clear, it is possible to choose relevant enhancers of these variables: training methods and EVR strategies. These enhancers will interfere with one another, and this interaction must also be calibrated with respect to the structure of the learning system defined in the previous steps. Following these steps will allow the design of learning systems that make efficient use of relevant EVR strategies.

Training methods can combine several abilities, modulators, catalysts, practices and EVR strategies aimed at promoting an optimal empathic response in subjects. Through this framework, we expect to offer educators insights into different strategies that could be adopted to help learners to develop skills for building a world of tolerance and mutual understanding.

# AUTHOR CONTRIBUTIONS

PB developed the conception of the structure, searched for references, analyzed and edited data, wrote the first draft, and contributed to the development of the manuscript through the final edits; JG wrote the section on the Proteus Effect; LR contributed to references and revision of sections regarding Empathy and Training Methods; CM contributed to the section on Empathy, offering references and critical revision from the perspective of social neuroscience, as well as contributing to the critical revision of the whole article; FZ contributed to the whole process by writing sections, providing references, and performing critical revision and final editing.

# FUNDING

The first author benefits from a PhD grant from IDEFI (ANR 2012 IDEFI 04 IIFR).




**Conflict of Interest Statement:** The first author is also co-creator of one art/research work mentioned in the article, The Machine to Be Another, and co-founder of the not-for-profit cultural association BeAnotherLab, which presents this work in several contexts. All other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Bertrand, Guegan, Robieux, McCall and Zenasni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# On the Determinants and Outcomes of Passion for Playing Pokémon Go

Gábor Orosz<sup>1,2</sup>\*<sup>†</sup>, Ágnes Zsila<sup>1,3†</sup>, Robert J. Vallerand<sup>4</sup> and Beáta Böthe<sup>1,3</sup>

<sup>1</sup> Institute of Psychology, Eötvös Loránd University (ELTE), Budapest, Hungary, <sup>2</sup> Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary, <sup>3</sup> Doctoral School of Psychology, Eötvös Loránd University (ELTE), Budapest, Hungary, <sup>4</sup> Laboratoire de Recherche sur le Comportement Social, Université du Québec à Montréal, Montreal, QC, Canada

In 2016, Pokémon Go became the most popular smartphone game. Despite the increasing popularity of this augmented reality game, to date, no studies have investigated passion for playing Pokémon Go. On the theoretical basis of the Dualistic Model of Passion (DMP), our goal was to investigate the associations between Pokémon Go playing motives, passion, and impulsivity. A total of 621 Pokémon Go players participated in the study (54.9% female; Mage = 22.6 years, SDage = 4.4). It was found that impulsivity was more strongly associated with obsessive passion (OP) than with harmonious passion (HP). HP was associated with adaptive motives (i.e., outdoor activity, social, recreation, and nostalgia), while OP was associated with less adaptive motives (i.e., fantasy, escape, boredom, competition, and coping). Therefore, in line with the DMP, HP and OP for playing Pokémon Go can predict an almost perfectly distinguished set of adaptive or maladaptive playing motives, and OP has a noteworthy relationship with impulsivity as a determinant.

Keywords: gaming motives, harmonious passion, obsessive passion, Pokémon Go, impulsivity, structural equation modeling

#### Edited by:

Mel Slater, Universitat de Barcelona, Spain

#### Reviewed by:

Pascual Gonzalez, Universidad de Castilla-La Mancha, Spain

M.-Carmen Juan, Universitat Politècnica de València, Spain

\*Correspondence:

Gábor Orosz orosz.gabor@ppk.elte.hu

†These authors have contributed equally to this work and are co-first authors.

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 30 October 2017

Accepted: 26 February 2018

Published: 15 March 2018

#### Citation:

Orosz G, Zsila Á, Vallerand RJ and Böthe B (2018) On the Determinants and Outcomes of Passion for Playing Pokémon Go. Front. Psychol. 9:316. doi: 10.3389/fpsyg.2018.00316

# INTRODUCTION

# The Pokémon Go Phenomenon: History and Playing Motives

Pokémon Go has become an increasingly popular augmented reality game, particularly among youth (Dorward et al., 2017; Kamel Boulos et al., 2017). After the first release of Pokémon Go in July 2016, 21 million active players engaged in the world of Pokémon within 1 week, and this game has become the most popular smartphone application, beating the most frequently visited social networking sites such as Facebook, Twitter, or Instagram (Dorward et al., 2017).

Pokémon Go is a massively multiplayer online role-playing game (MMORPG) in which players can find and capture virtual Pokémon species in their real environment (Kamel Boulos et al., 2017). The captured species are added to the player's "Pokédex," a catalog of caught Pokémon. Captured Pokémon can be trained and evolved into stronger forms, and players can challenge others to gym battles (Dorward et al., 2017). The world of "Pocket Monsters" was first introduced to Game Boy players in the mid-1990s. The original story was adapted into an anime series later which attracted millions of young viewers in the 2000s (Katsuno and Maret, 2004). The Pokémon Go application is the latest media product of the Pokémon franchise (Dorward et al., 2017).

Due to the massive success of Pokémon Go, there has been considerable research interest in the positive and negative sides of its usage in terms of physical and mental health (e.g., Althoff et al., 2016; Ayers et al., 2016; Tateno et al., 2016). While a number of studies found associations between Pokémon Go playing activity and several physical and mental health benefits, such as reducing sedentary behavior (Nigg et al., 2017), promoting health behaviors (e.g., exercising or walking) (Kaczmarek et al., 2017), and decreasing social withdrawal and anxiety relating to social interactions (Tateno et al., 2016; Kogan et al., 2017), other studies pointed to a few concerns related to inappropriate use of the game, such as distracting drivers (Ayers et al., 2016) or getting lost in unexplored areas (Dorward et al., 2017). Furthermore, it was found that Pokémon Go players reported higher psychological distress than those workers who had not played Pokémon Go (Watanabe et al., 2017).

In sum, previous research points to relevant behavioral changes in Pokémon Go players, as they incorporated playing into their daily routines and spent a considerable amount of time outside, discovering remote areas as well as gaining a deeper knowledge of their natural environment (Dorward et al., 2017). Contrary to prior expectations about a quick decay in the popularity of this game, there are still millions of active players of this game and its updates worldwide (e.g., Pokémon Go Plus, released in January 2017). In light of the large number of Pokémon Go players worldwide, a better understanding of the associations between the use patterns of this augmented reality game and players' psychological characteristics is still needed.

Prior studies investigated the motives for playing online games (e.g., Yee, 2006; Fuster et al., 2014), whereas Demetrovics et al. (2011) offered an integrated model comprising seven dimensions: social (i.e., playing with others), escapism (i.e., escaping reality), competition (i.e., defeating others), coping (i.e., coping with real-life problems), skill development (i.e., improving skills), fantasy (i.e., immersing in another world), and recreation (i.e., relaxing) motives. Zsila et al. (2017) extended this model with three Pokémon Go-specific motives: outdoor activity (i.e., playing outside), nostalgia (i.e., reliving old memories), and boredom (i.e., passing time playing). Another study by Yang and Liu (2017) identified seven Pokémon Go playing motives: fun, friendship maintenance, relationship initiation, exercise, achievement, escapism, and nostalgia.

Among these motives, we can identify adaptive and maladaptive ones. For instance, escaping reality and competition were identified as strong predictors of problematic online gaming (Király et al., 2015). Furthermore, achievement and social motives were found to be related to psychological wellbeing (Fuster et al., 2014). However, the adaptive vs. maladaptive role of nostalgia is less evident. According to Routledge et al. (2013), nostalgia is related to elevated psychological health and well-being and it promotes adaptive psychological functioning. However, in a recent, gaming-related study, it was weakly and positively associated with loneliness (Yang and Liu, 2017). Increased physical activity associated with Pokémon Go playing was also found to be related to psychological and physical wellbeing (Althoff et al., 2016; Howe et al., 2016; Kamboj and Krishna, 2017).

Orosz et al. (2016) suggested that the quality of engagement in different screen-based activities—such as playing Pokémon Go—can be measured and distinguished by the two forms of passion. The adaptive and maladaptive nature of the engagement or motives toward online and screen-based activities can largely depend on the respective type of passion for the given activity.

# The Dualistic Model of Passion

According to Vallerand (2010, 2015), passion refers to engagement in a self-defining activity that one loves, finds important, and invests a considerable amount of time and energy in. The Dualistic Model of Passion (DMP) distinguishes between harmonious passion (HP) and obsessive passion (OP). HP is an adaptive form of engagement in an activity, since the person is able to maintain coherence between his/her preferred activity and other life activities. In contrast, OP is associated with a lack of control over a particular activity, leading to a rigid involvement and conflict between the self and other daily life activities (Vallerand et al., 2003; Vallerand, 2010). The present study focuses on the relationship between Pokémon Go playing motives and impulsivity within the empirically well-established DMP theoretical framework (for review see Vallerand, 2010, 2015; Curran et al., 2015).

In the gaming literature, Wang and Chu (2007) found that OP was related to problematic gaming, whereas HP was unrelated to it. It was also found that HP for massively multiplayer online games predicted adaptive outcomes such as positive affect and vitality, while OP was associated with low levels of need satisfaction, negative affect, excessive gaming, over-engagement, and physical and psychological addiction-like symptoms (Lafreniere et al., 2009; Przybylski et al., 2009; Stoeber et al., 2011; Orosz et al., 2016). Thus, based on previous findings in the field of online gaming, we expect that HP would be related to adaptive motives for playing augmented reality games such as Pokémon Go, whereas OP would be related to maladaptive motives.

# Impulsivity

The personality-related determinants of passion have not yet been investigated extensively in prior research. The relevant studies mainly focused on social determinants (e.g., Mageau et al., 2009; Bonneville-Roussy et al., 2011; Fernet et al., 2014). Only a few studies investigated personality traits as possible determinants of passion (e.g., Vallerand et al., 2006; Tosun and Lajunen, 2009; Orosz et al., 2016). Orosz et al. (2016) found that impulsivity can be one relevant personality trait underlying passion as it was positively related to OP but unrelated to HP.

According to the model of impulsivity by Whiteside and Lynam (2001), impulsivity comprises urgency (i.e., the tendency to engage in impulsive behaviors under negative emotional conditions in order to alleviate these negative feelings, without taking into consideration the potentially harmful long-term effects and consequences); lack of perseverance (i.e., inability to remain focused on a difficult or boring task); lack of premeditation (i.e., difficulty in considering the consequences of an act); and sensation seeking (i.e., tendency to pursue new, exciting activities). Later, Billieux et al. (2012) complemented this model by distinguishing negative and positive forms of urgency. Based on this, we propose that people who can hardly resist temptations are more likely to engage in OP for playing Pokémon Go. They might play Pokémon Go in order to quickly and easily alleviate their negative feelings. People with lower levels of monotony tolerance might also develop OP for playing Pokémon Go when they are bored, and the reward system of the game drives them to engage in this activity repetitively.

# The Aim of the Present Study

Despite the increasing popularity of Pokémon Go, relatively little research attention has been paid to the psychological background of playing this augmented reality game (Kaczmarek et al., 2017; Zsila et al., 2017). Considering that millions of active users worldwide still play Pokémon Go in 2017 (statista.com, 2017), the psychological background and motives associated with HP and OP for playing this game can be a relevant topic of investigation. Therefore, accumulating knowledge about psychological processes behind playing the most popular augmented reality game (in terms of impulsivity, motives, and passion) can be extremely beneficial if we move beyond gaming and think about work and private life-related aspects of this augmented reality game (Brohm et al., 2017). The aim of the present study was to explore the association of Pokémon Go playing motives with the two passion types (i.e., HP and OP). In the present study we aimed to put emphasis on the detailed outcomes of passion for playing Pokémon Go. We expected that HP would be positively related to adaptive Pokémon Go playing motives, namely social, skill development, and outdoor activity motives. Conversely, OP was expected to be positively related to escapism, competition, and boredom motives. The exploration of these associations in the theoretical framework of passion would contribute to a more nuanced distinction of adaptive and maladaptive playing motives. Furthermore, we also investigated the relationship between passion types and one potential determinant of passion, namely impulsivity. Based on the findings of Orosz et al. (2016) with screen-based activities, we hypothesized that impulsivity would be positively related to OP but unrelated to HP.

# MATERIALS AND METHODS

# Participants and Procedure

Ethical approval was gained from the Institutional Review Board of the Eötvös Loránd University, and the study was performed in accordance with the Declaration of Helsinki. The research was conducted using an online questionnaire. First, participants were informed about the aims and the content of the study. Second, they were assured that they could stop the participation without any consequences whenever the filling process was uncomfortable or unpleasant for them. The data collection occurred in July 2016. Participants were recruited from the largest Hungarian anime (n = 4) and gamer (n = 3) communities on Facebook (comprising about 2,000–8,000 members).

A total of 621 Hungarian Pokémon Go players participated in this study (54.90% female), aged between 18 and 54 years (Mage = 22.57 years, SDage = 4.37). Participants spent 10.42 h per week on average playing Pokémon Go during the week preceding the data collection. Nearly half of them (48.79%) played Pokémon Go daily, whereas 38.65% played 2–6 times per week, and 12.55% played weekly or rarely. The vast majority of players (97.75%) played Pokémon Go on their mobile phone, whereas 2.25% played on their tablet. In the present study we used the same sample as in Zsila et al.'s (2017) article. In that previous paper, the factor structure of the MOGQ-PG was examined. In the present paper, however, we intended to examine a specific relationship pattern regarding impulsivity, passion, and playing motives.

# Measures

# Sociodemographic and Pokémon Go-Related Information

Data regarding major demographics were collected including age and gender. Furthermore, data were obtained on the time spent playing Pokémon Go, frequency of playing, and the preferred platform (e.g., mobile phone).

# Motives for Online Gaming Questionnaire-Pokémon Go Extension (MOGQ-PG)

The motives of Pokémon Go players were assessed using the MOGQ-PG (Zsila et al., 2017). The MOGQ-PG appeared to be a good choice as (1) it is based on qualitative studies, (2) it has a strong theoretical and scientific background regarding its MOGQ part, (3) it has Pokémon Go-specific factors (including Outdoor Activity, Nostalgia, and Boredom), (4) it has appropriate within-network validity (good model fit indices even though it includes 10 factors [Sample 1: CFI = 0.963; TLI = 0.958; RMSEA = 0.057; Sample 2: CFI = 0.965; TLI = 0.960; RMSEA = 0.054]), (5) it has appropriate internal consistency (all factors had a Cronbach's alpha higher than 0.7), and (6) although it comprehensively assesses very diverse motivational factors, it is relatively short.

The 37-item scale comprises 10 subscales: Social (four items, "Because I can meet many different people," α = 0.89), Escape (four items, "Because gaming helps me to forget about daily hassles," α = 0.85), Competition (four items, "Because I enjoy competing with others," α = 0.92), Coping (four items, "Because it helps me get rid of stress," α = 0.80), Skill Development (four items, "Because it improves my skills," α = 0.87), Fantasy (four items, "Because I can be in another world," α = 0.86), Recreation (three items, "Because it is entertaining," α = 0.77), Outdoor Activity (four items, "Because it provides the daily dose of exercise," α = 0.92), Nostalgia (three items, "Because it reminds me of my childhood," α = 0.92), and Boredom (three items, "otherwise I would be bored," α = 0.78). Each item on the MOGQ-PG is rated on a 5-point Likert scale (1 = almost never/never, 2 = some of the time, 3 = half of the time, 4 = most of the time, 5 = almost always/always).
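The internal consistencies reported for these subscales are Cronbach's alpha values. As a minimal sketch of how such a coefficient can be computed from raw item responses, the snippet below implements the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of the summed score). The function name and the sample responses are hypothetical illustrations and are not taken from the study's data or analysis scripts.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five players to a three-item subscale (1-5 Likert).
example_responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 2, 3],
    [1, 2, 1],
]
print(round(cronbach_alpha(example_responses), 2))
```

Values above 0.7, the cutoff mentioned for the MOGQ-PG factors, are conventionally read as adequate internal consistency.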

# The Passion Scale

The Hungarian version of the Passion Scale (Tóth-Király et al., 2017), developed by Vallerand et al. (2003) and Marsh et al. (2013), comprises six items on HP ("Playing Pokémon Go is in harmony with the other activities in my life," α = 0.82), and six items on OP ("I have almost an obsessive feeling for playing Pokémon Go," α = 0.88). In this study, the items of the Passion Scale focused on Pokémon Go. Participants indicated their level of agreement with the statements on a 7-point Likert scale (1 = do not agree at all, 7 = very strongly agree).

# The Short UPPS-P Impulsive Behavior Scale (SUPPS-P)

The SUPPS-P Impulsive Behavior Scale (Billieux et al., 2012; Zsila et al., 2017) comprises 20 items that assess impulsivity on five dimensions: Negative Urgency (four items, e.g., "When I am upset I often act without thinking," α = 0.80); Positive Urgency (four items, e.g., "When I am really excited, I tend not to think on the consequences of my actions," α = 0.75); Lack of Perseverance (four items, e.g., "I finish what I start," α = 0.71); Lack of Premeditation (four items, e.g., "I usually think carefully before doing anything," α = 0.81); and Sensation Seeking (four items, e.g., "I generally seek new and exciting experiences and activities," α = 0.75). The items were translated into Hungarian following the protocol of Beaton et al. (2000). Participants rated each item on a 4-point Likert scale (1 = Agree Strongly, 4 = Disagree Strongly). The total score of impulsivity in the structural regression model was computed by averaging the scores of the five subscales.
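The scoring rule described above (a total impulsivity score obtained by averaging the five subscale scores) can be sketched as follows. This is only an illustration under stated assumptions: each subscale score is taken to be the mean of its four items, the item scores are assumed to be already keyed so that higher values mean more impulsivity (the text does not describe the keying), and the subscale labels and example answers are hypothetical.

```python
from statistics import mean

# Hypothetical labels for the five SUPPS-P dimensions named in the text.
SUBSCALES = (
    "negative_urgency",
    "positive_urgency",
    "lack_of_perseverance",
    "lack_of_premeditation",
    "sensation_seeking",
)


def supps_p_scores(item_scores):
    """Return (subscale means, total score) from already-keyed item scores.

    item_scores maps each subscale label to its four item scores (1-4).
    Assumption: a subscale score is the mean of its items.
    """
    subscale_means = {name: mean(item_scores[name]) for name in SUBSCALES}
    # Total impulsivity: average of the five subscale scores.
    total = mean(list(subscale_means.values()))
    return subscale_means, total


# Hypothetical respondent.
respondent = {
    "negative_urgency": [3, 4, 3, 2],
    "positive_urgency": [2, 2, 3, 1],
    "lack_of_perseverance": [1, 2, 1, 2],
    "lack_of_premeditation": [2, 2, 2, 2],
    "sensation_seeking": [4, 3, 4, 3],
}
subscale_means, total_impulsivity = supps_p_scores(respondent)
print(subscale_means, total_impulsivity)
```

Averaging subscale means rather than all 20 items gives each dimension equal weight; with four items per subscale the two approaches coincide.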

# Statistical Analysis

Data analyses were performed with IBM SPSS for Windows, version 20.0 (IBM SPSS Inc., Chicago, Illinois) and Mplus 7.3 (Muthén and Muthén, 1998-2015) using a weighted least squares estimator (WLSMV) considering the non-normal distribution of a number of variables. Structural regression analysis within structural equation modeling (SEM) was used to investigate the associations between impulsivity, HP, OP, and Pokémon Go playing motives. The following fit indices were used to estimate the goodness of fit of the model to the data (Bentler, 1990; Brown, 2015): the Comparative Fit Index (CFI; ≥ 0.95 good, ≥ 0.90 acceptable), the Tucker–Lewis Index (TLI; ≥ 0.95 good, ≥ 0.90 acceptable), and the Root-Mean-Square Error of Approximation (RMSEA; ≤ 0.06 good, ≤ 0.08 acceptable) with its 90% confidence interval.
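To make the cutoffs above concrete, the following sketch encodes them as a small helper and applies it to the fit values reported in the Results section of this article (CFI = 0.951, TLI = 0.945, RMSEA = 0.049). The function name and the label "poor" for values that miss the acceptable threshold are assumptions added for illustration; the text itself only defines "good" and "acceptable".

```python
def classify_fit(cfi, tli, rmsea):
    """Label each fit index using the cutoffs stated in the text."""

    def higher_is_better(value):
        # CFI and TLI: >= 0.95 good, >= 0.90 acceptable.
        if value >= 0.95:
            return "good"
        if value >= 0.90:
            return "acceptable"
        return "poor"  # label assumed; not defined in the text

    def lower_is_better(value):
        # RMSEA: <= 0.06 good, <= 0.08 acceptable.
        if value <= 0.06:
            return "good"
        if value <= 0.08:
            return "acceptable"
        return "poor"  # label assumed; not defined in the text

    return {
        "CFI": higher_is_better(cfi),
        "TLI": higher_is_better(tli),
        "RMSEA": lower_is_better(rmsea),
    }


# Fit values reported for the structural regression model.
print(classify_fit(cfi=0.951, tli=0.945, rmsea=0.049))
# → {'CFI': 'good', 'TLI': 'acceptable', 'RMSEA': 'good'}
```

Read against these thresholds, the reported model meets the "good" criterion on CFI and RMSEA and the "acceptable" criterion on TLI.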

Drawing on previous studies (e.g., Carbonneau et al., 2008; Orosz et al., 2016), parcels were used as indicators for the Passion Scale. Parcels are aggregated items that can be applied in models comprising a high number of latent and manifest variables. An important prerequisite of parceling is the unidimensionality of the scales (e.g., Bandalos and Finney, 2001; Matsunaga, 2008). Following the factorial algorithm of Rogers and Schmitt (2004), exploratory factor analysis was used, and three parcels were created for each of the two passion factors. For HP, parcel 1 consisted of items 5 and 6; parcel 2 consisted of items 3 and 8; and parcel 3 consisted of items 1 and 10. For OP, parcel 1 consisted of items 7 and 11; parcel 2 consisted of items 4 and 12; and parcel 3 consisted of items 2 and 9.
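The item-to-parcel assignment listed above can be written down as a small lookup table. The sketch below computes parcel scores for one respondent; aggregating by the mean of the two items is an assumption (the text says only that parcels are "aggregated items"), and the function name and example answers are hypothetical.

```python
from statistics import mean

# Item numbers per parcel, as listed in the text for the 12-item Passion Scale.
PARCELS = {
    "HP": {1: (5, 6), 2: (3, 8), 3: (1, 10)},
    "OP": {1: (7, 11), 2: (4, 12), 3: (2, 9)},
}


def parcel_scores(item_scores):
    """Map {item number: score} to {factor: {parcel number: parcel score}}.

    Assumption: a parcel score is the mean of its two items.
    """
    return {
        factor: {
            parcel: mean([item_scores[item] for item in items])
            for parcel, items in parcels.items()
        }
        for factor, parcels in PARCELS.items()
    }


# Hypothetical answers of one player on the 7-point agreement scale.
answers = {1: 6, 2: 2, 3: 5, 4: 1, 5: 7, 6: 5, 7: 3, 8: 6, 9: 2, 10: 4, 11: 1, 12: 2}
print(parcel_scores(answers))
```

Each of the 12 items appears in exactly one parcel, so the six parcel scores together use every response once.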

# RESULTS

Descriptive statistics and inter-factor correlations are presented in **Table 1**.

The results supported the hypothesized model [CFI = 0.951, TLI = 0.945, RMSEA = 0.049 (90% CI 0.047–0.052)], as seen in **Figure 1**. In the structural regression model, impulsivity was positively associated with both OP (β = 0.44, p < 0.001) and HP (β = 0.15, p = 0.002). As expected, this association was stronger for OP than for HP.

The findings almost perfectly supported the expected associations between the two types of passion and playing motives. More specifically, as expected, only HP was positively related to the Social (β = 0.56, p < 0.001), Nostalgia (β = 0.31, p = 0.009), Recreation (β = 0.79, p < 0.001), and Outdoor Activity (β = 0.66, p < 0.001) motives. Conversely, as expected, OP was uniquely and positively related to the Boredom (β = 0.61, p < 0.001), Coping (β = 0.45, p < 0.001), Competition (β = 0.52, p < 0.001), and Fantasy (β = 0.70) motives. HP was negatively associated with Boredom (β = −0.25, p = 0.048). Only one Pokémon Go playing motive was associated positively with both types of passion: Skill Development. Overall, these results reveal the presence of an almost perfect distinction between adaptive and maladaptive motives along HP and OP.

# DISCUSSION

The goal of the present study was to investigate the relationship between passion, impulsivity, and different motives concerning today's most popular augmented reality game, Pokémon Go. According to the results, impulsivity was more strongly associated with OP than with HP for playing Pokémon Go, which is partly in line with our hypothesis, as we did not expect a significant link between HP and impulsivity. In line with our expectations, OP was positively associated with several maladaptive motives such as escapism, coping, competition, boredom, and fantasy (as a less evidently maladaptive motive). Conversely, also in line with our expectations, HP was associated with several healthy or adaptive motives such as social, recreation, and outdoor activity motives. In sum, similar to other online or screen-related activities (Orosz et al., 2016), along with the DMP, the present model provided further support for the differentiated roles of HP and OP in terms of their personality determinants and motives.

HP was related to social and nostalgia motives, while these motives were not related to OP. These patterns were consistent with the association patterns reported by Fuster et al. (2014), who found that socialization, achievement, and exploration gaming motives were related to HP. However, nostalgia was related to HP, which contradicted the findings of a previous study on Pokémon Go playing motives by Yang and Liu (2017), who found a weak, positive link between loneliness and nostalgia. Nevertheless, this result is in line with the more general notions of Routledge et al. (2013) concerning the adaptive function of nostalgia, and with those studies highlighting the importance of nostalgia in enhancing psychological well-being by fostering self-continuity and social connectedness (e.g., Routledge et al., 2013; Sedikides et al., 2016). The positive association between HP and these two motives draws attention to the possible beneficial psychological consequences of playing Pokémon Go, which may increase the sense of social connectedness by creating a social network that allows players to share and relive childhood memories of the world of Pokémon.

Passion can also facilitate engagement in behaviors that promote health-related activities such as physical activities (Vallerand, 2015). Positive consequences of HP were identified in prior studies, such as self-development and physical and mental health (Lafreniere et al., 2009; Carpentier et al., 2012; Orosz et al., 2016). In the present case, health promotion lies in the nature of augmented reality games as they facilitate outdoor activities by blurring the line between real and virtual worlds, thus making the latter more interesting. On the basis of the results, we assume that if one has an HP for playing Pokémon Go, (s)he is motivated to play Pokémon Go for going outside, and this physical activity can be beneficial for the player's health. Regarding mental health, this relationship pattern (HP➔Social) allows players to initiate and maintain social relationships with different people (see also Kaczmarek et al., 2017). Therefore, on the basis of these results we might assume that HP for playing Pokémon Go may contribute to the players' physical and mental health. These results are in line with the Pikachu effect of Kaczmarek et al. (2017), who found that players with stronger health motives had more health benefits in terms of more time spent outside and increased physical activity. Harmonious passion for playing this AR game can be a mid-level construct behind this adaptive outcome.

According to prior studies, OP predicted negative health outcomes such as problematic or addiction-like symptoms (e.g., loss of control over the activity, Orosz et al., 2016), negative emotions (Przybylski et al., 2009), and health-risk behaviors (Vallerand et al., 2003). In light of prior studies and the present motivational correlates of OP, players with OP may be less motivated to play Pokémon Go to improve their mental or physical well-being. In the present study, OP was strongly related to escapism and boredom motives. According to previous results, playing online games in order to escape from real life problems can lead to problematic use (Király et al., 2015). Therefore, players who engage in Pokémon Go in order to escape from reality may be at risk of developing a problematic gaming behavior. Coping and competition were also expected to be related to OP, similar to the more general online gaming results of Király et al. (2015). The association of boredom with OP could be explained by the rigid, unsatisfying involvement in an activity, as was described by Vallerand et al. (2003) in their passion model. Furthermore, it was found that fantasy is positively related to OP. On the basis of the positive relationship pattern between OP, coping, escapism and fantasy, we may suppose that fantasy can also be interpreted as a creative internal form of escapism.

In line with previous findings (Orosz et al., 2016), impulsivity was positively associated with OP. However, impulsivity was also related to HP in the present study. According to prior studies, impulsivity can be interpreted as a risk factor for different problems in many fields, including health-risk behaviors (Vallerand et al., 2003), compulsive buying (Billieux et al., 2008), binge eating (Fischer and Smith, 2008; Peterson and Fischer, 2012), and Internet-related addictions (Mottram and Fleming, 2009). In the present study, both HP and OP were related to impulsivity, although this personality trait appeared to have a stronger relationship with OP than with HP. These results suggest that impulsivity may lead someone to rigidly engage in a behavior that is not necessarily problematic per se. However, playing Pokémon Go repeatedly at ill-advised times may lead to conflict with other aspects of one's life and thereby to personal problems (e.g., neglecting one's studies) or interpersonal conflicts (e.g., neglecting one's romantic partner).

# LIMITATIONS

This study is not without its limitations. Due to the sampling method, players in this study may not be representative of the entire population of Pokémon Go players. Furthermore, since the assessment instruments in the present study were specific to Pokémon Go playing, information regarding individuals who do not play the game was not collected. Therefore, comparisons with a non-player group cannot be made. In addition, causal inferences cannot be established due to the cross-sectional nature of the study. Furthermore, the direction of associations could be reversed; thus, alternative models should be tested in future studies. Another limitation is that Pokémon Go-specific motives (e.g., nostalgia) can differ from the motives for playing other augmented reality games. Finally, the data collection was carried out at the time of Pokémon Go's peak popularity. Therefore, further research is needed not only on this particular augmented reality game but on the role of passion in popular games in general.

# CONCLUSION

The present study aimed to contribute to a deeper understanding of passion for the most popular augmented reality game, Pokémon Go. It was found that different playing motives are linked to different underlying passion constructs, which lead players to divergent experiences that may predict possible positive and negative consequences in the long run. Thus, the exploration of players' motives can lead to an advanced knowledge of Pokémon Go playing practices, which can reveal either healthy or maladaptive use of Pokémon Go. Therefore, the early identification of motives related to maladaptive psychological mechanisms (e.g., OP) can indicate the need for intervention efforts to reduce the psychological harms of problematic gaming behavior. The popularity of this game makes it reasonable to examine both positive and negative behavioral consequences of usage, as millions of players currently engage in this augmented reality game daily.

Finally, the present study provided further evidence for the generalizability of the DMP by demonstrating the divergence of the two passion constructs with regard to playing motives in a relatively large sample of Pokémon Go players.

# REFERENCES


# AUTHOR CONTRIBUTIONS

GO and ÁZ substantially contributed to study design, data gathering, data analyses, interpretation of the results, and manuscript writing; BB substantially contributed to the data gathering; RV substantially contributed to the interpretation of the results and to revising the manuscript. All authors commented on the draft and contributed to the final version, approved the final version of the manuscript, and agreed to be accountable for all aspects of the work.

# FUNDING

This research was supported by grants from the Hungarian Research Fund (FK 124225 and PD 116686). ÁZ was supported by the ÚNKP-17-3 New National Excellence Program of the Ministry of Human Capacities.

online gaming questionnaire (MOGQ). Behav. Res. Methods 43, 814–825. doi: 10.3758/s13428-011-0091-y


hikikomori. Psychiatry Res. 246, 848–849. doi: 10.1016/j.psychres.2016.10.038


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Orosz, Zsila, Vallerand and Böthe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Being Bullied in Virtual Environments: Experiences and Reactions of Male and Female Students to a Male or Female Oppressor

Nicole Krämer<sup>1</sup> \*, Sabrina Sobieraj<sup>1</sup> , Dan Feng<sup>2</sup> , Elisabeth Trubina<sup>1</sup> and Stacy Marsella<sup>2</sup>

<sup>1</sup> Computer Science and Applied Cognitive Science, University of Duisburg-Essen, Essen, Germany, <sup>2</sup> College of Computer and Information Science, Northeastern University, Boston, MA, United States

Bullying is a pressing societal problem. As such, it is important to gain a better understanding of the mechanisms involved in bullying and of resilience factors which might protect victims. Moreover, it is necessary to provide tools that can train potential victims to strengthen their resilience. To facilitate both of these goals, the current study tests a recently developed virtual environment that puts participants in the role of a victim who is being oppressed by a superior. In a 2 × 2 between-subjects experiment (N = 81), we measured the effects of gender of the oppressor and gender of the participant on psychophysiological reactions, subjective experiences and willingness to report the event. The results reveal that even when a male and a female bully show the exact same behavior, the male bully is perceived as more threatening. In terms of gender of the victim, the only difference that emerged was a more pronounced increase in heart rate in males. The results were moderated by the personality factors social gender, neuroticism, and need to belong, while self-esteem did not show any moderating influence.

### Keywords: virtual environments, bullying, gender, psychophysiology, resilience, psychological

# INTRODUCTION

The use of virtual environments (VE) is nowadays widespread. Their potential in academia has been discussed extensively, and numerous research applications have been presented. Since VEs offer the possibility to create settings of high ecological validity that can be fully controlled, they have been suggested for and employed in fundamental research (Blascovich et al., 2002; Mapala et al., 2017) and for therapeutic and training purposes (e.g., Bossard et al., 2007; Potkonjak et al., 2016). Fundamental research uses virtual environments to study and understand fundamental mechanisms, for example regarding deceptive behavior (Mapala et al., 2017) or proxemics behavior (Yee et al., 2007; Iachini et al., 2016). Moreover, VEs can be employed to examine and reduce stereotype bias in terms of racial or age stereotypes (Banakou et al., 2016; Oh et al., 2016; Hasler et al., 2017). In the applied area of therapeutic interventions, virtual scenarios are being tested for the treatment of paranoia, post-traumatic stress disorders and other anxiety disorders (Gerardi et al., 2010) such as flight anxiety (Cardoş et al., 2017) and speech anxiety (Pertaub et al., 2002). Increasingly, they are also being used for training purposes, mainly in the area of training motor skills, for example regarding surgery (Seymour et al., 2002), motor rehabilitation training (Holden, 2005; Pedreira da Fonseca et al., 2017) or to perfect skills in sports (Miles et al., 2012).

#### Edited by:

Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy

#### Reviewed by:

Domna Banakou, Universitat de Barcelona, Spain Jennifer Hofmann, University of Zurich, Switzerland

#### \*Correspondence:

Nicole Krämer nicole.kraemer@uni-due.de

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 09 September 2017 Accepted: 15 February 2018 Published: 06 March 2018

#### Citation:

Krämer N, Sobieraj S, Feng D, Trubina E and Marsella S (2018) Being Bullied in Virtual Environments: Experiences and Reactions of Male and Female Students to a Male or Female Oppressor. Front. Psychol. 9:253. doi: 10.3389/fpsyg.2018.00253

The development and evaluation of virtual environments for training resilience and future behavior in stressful situations has not been extensively addressed. Most notable among the exceptions is the stress resilience training conducted with military service members prior to their initial deployment (Rizzo et al., 2013). Here, users are immersed in a challenging context and train a range of psychoeducational and cognitive-behavioral emotional coping strategies believed to enhance stress resilience. More recently, a virtual environment application has been presented that enables resilience training for bullying situations (Feng et al., 2017; Jeong et al., 2017). Given its societal importance, bullying represents a critical, and yet widely neglected, field of application for virtual environment resilience training. Bullying can occur in various forms: as physical (e.g., slapping), verbal (e.g., offensive utterances), and relational (e.g., betrayal, social exclusion, spreading harmful gossip) violence (Berger, 2007). Due to its high prevalence rates in the population, bullying can be seen as a pressing societal problem. Indeed, in a meta-analytic review, Berger (2007) estimated that approximately 9–25% of school children worldwide have already been victims of bullying.

Bullying can have a powerful impact on the victims, in terms of negative affect (e.g., feeling nervous) and physiological reactions (e.g., stress, headaches, pain, sleep problems) (Hansen et al., 2006). It can also cause long-term consequences such as depression (Agervold and Mikkelsen, 2004; Sapouna and Wolke, 2013).

In the current paper, we test the recently developed virtual environment application (Feng et al., 2017) with a special focus on the influence of gender (of the bully as well as the victim) and personality factors. The aim is twofold: We (a) employ virtual environment technology as an empirical testbed in order to learn more about the mechanisms and resilience factors influencing the effects of bullying (specifically regarding the impact of the bully's gender and the participant's personality variables) and (b) evaluate the effects of the environment on different groups of participants with regard to their stress levels, emotional states and behavioral intentions. The results of these analyses should form the basis for an effective training intervention which could be applied to train victims or enhance prevention workshops in schools or universities.

# THEORETICAL BACKGROUND

# Virtual Environments

Virtual environments are synthetic replications of the real world or of specific situations. Users are provided with the experience of being surrounded by these environments (Loomis et al., 1999), and they are often perceived as real. To immerse and interact in the environment, "[u]sers wear displays that fully immerse a number of the senses in computer generated stimuli. Stereoscopic head-mounted displays (HMD) are a distinctive feature of such systems" (Biocca and Delaney, 1995, p. 56). Virtual environments offer the possibility to vary characteristics of situations in very subtle ways: Environmental cues (e.g., creating a classroom, a farm or anything else) and social cues (e.g., the number of virtual persons present, their gender) can be systematically manipulated in order to examine their influence on participants' social interaction, cognition and behavior (Blascovich et al., 2002; Bombari et al., 2015; Maister et al., 2015). One essential advantage of virtual environments lies in their possibility to enable persons to test their responses under fairly realistic conditions, without serious consequences. Therefore, virtual environments are nowadays successfully employed in a variety of settings for research and educational or training purposes. For instance, they are used for disaster training for healthcare professionals (Farra et al., 2015), police personnel (Bertram et al., 2015) or even for civilians learning how to behave in the case of an unexpected fire emergency (Gamberini et al., 2003). Moreover, they are implemented to treat paranoia, post-traumatic stress disorders, and other anxiety disorders (Slater et al., 2006; Gerardi et al., 2010; Atherton et al., 2016; Cardoş et al., 2017). In addition, virtual environments are used for fundamental research in order to understand basic mechanisms, for instance, as mentioned above, regarding proxemics behavior, deception, or stereotype bias.
Given these applications, it therefore seems feasible to employ a virtual scenario that can (a) serve highly controlled experimental research on the mechanisms and influencing factors underlying victims' responses and potential resilience and (b) be refined to serve as an environment in which to train appropriate reactions and resilience. In order to employ virtual environments for both fundamental and applied research goals, it is necessary to demonstrate that the environment is able to elicit emotional and psychophysiological responses. Previous research demonstrated that virtual environments can indeed elicit strong emotional reactions (Slater et al., 2006; Pan and Slater, 2011). Moreover, Kotlyar et al. (2008) found that both blood pressure and heart rate were significantly increased in response to a speech stressor presented in a virtual environment. Most recently, Kothgassner et al. (2016) found comparable physiological stress responses in participants undergoing a public speaking task in a real-audience and a virtual-audience condition.

In conclusion, most findings indicate that virtual environments induce similar emotional and physiological reactions to those elicited in real-life situations, and compared to classic training methods (e.g., Pan and Slater, 2011; Kothgassner et al., 2016). We therefore suggest that a virtual scenario can be employed in a (mild) bullying setting to examine victims' reactions by measuring physiological (during) and emotional reactions (afterward). In this way, we aim to contribute evidence regarding the influencing factors for emotional reactions and resilience. We further aim to derive suggestions for refining the environment for applied settings such as resilience training interventions.

# Research on Bullying

Juvonen and Graham (2014) state that "[b]ullying involves targeted intimidation or humiliation. Typically, a physically stronger or socially more prominent person (ab)uses her/his power to threaten, demean, or belittle another. To make the target or victim feel powerless, (...)" (p. 161). While some researchers believe that bullying has to occur on a regular basis to have adverse effects (Olweus, 1993), Juvonen and Graham (2014) suggest that even one single mistreatment can be sufficient to elicit fear of further bullying. The negative consequences can range from negative feelings to severe psychophysiological reactions and clinical depression (Hansen et al., 2006). Bullying can be seen as a stress event, as described by Selye (1950) or Lazarus and Folkman (1984).

One key characteristic of bullying is the power imbalance between the involved parties; there is always a bully (or perpetrator) and a victim. Wolke and Skew (2012) report that the roles (victims, bullies) are remarkably stable over time. This is underlined by recent meta-analytic findings of Kljakovic and Hunt (2016), who confirmed the stability of roles with a large effect size. Einarsen (1999) reported that personality traits of the victim and psychosocial factors are decisive regarding the question of who becomes a victim.

### Prevalence of Bullying

Bullying is a societal problem which affects children, adolescents and adults. Referring to German, Austrian and English studies, Einarsen (1999) speculates that 70–80% of working adults have been bullied by their supervisors. For United States workers, Lutgen-Sandvik et al. (2007) estimated that 35–50% have been affected. Other data suggest that only approximately 10–25% of the adult population across different regions (e.g., Europe, United States) has been affected by bullying (Wolke and Skew, 2012; van Heugten, 2013). Although a wide range of persons are affected by bullying, and the consequences can be devastating, some victims have the resources to cope with the difficult situation and to adjust in a positive way; they seem to be resilient. Research on resilience is currently focusing on the complex interplay of social resources (outside the family), family support and personal characteristics (Sapouna and Wolke, 2013). However, although resilience is an important factor, it has not received a great deal of attention in this context. This might be due to the difficulty of investigating resilience through survey studies, which are widely used in bullying research. As self-reports can become distorted over time, especially concerning felt emotions and immediate reactions, it is hard to identify the relation between personal resources and immediate reactions to bullying. Thus, one goal of the current study is to focus on resilience factors inherent in the victim and to relate these to the reactions that occur in the bullying situation.

### Gender Differences in Bullying

There is consistent evidence that boys and male adolescents more often act as bullies compared to their female counterparts (Barlett and Coyne, 2014; Narayanan and Betts, 2014), especially with respect to physical bullying (Juvonen and Graham, 2014).

Moreover, studies have also demonstrated that boys and male adolescents are more frequently the victims of bullying (Wolke and Skew, 2012; Narayanan and Betts, 2014) compared to females. However, other studies found no gender effect (Kljakovic and Hunt, 2016), or that females were more likely to become victims of relational bullying (Sapouna and Wolke, 2013).

With regard to gender differences in the potential consequences of being bullied, and the question of whether gender might also function as a protective factor in terms of resilience, Sapouna and Wolke (2013) reported that males tend to show lower levels of depression, while females tend to be more vulnerable to depression. This was also demonstrated by Turner et al. (2013) with regard to cyberbullying. With regard to coping behavior, compared to men, women seem to be more willing to (a) report their mistreatment to authorities and friends and (b) seek help (Unnever and Cornell, 2004; Kostev et al., 2013). Unnever and Cornell (2004) reported that girls found it easier to talk to their friends about victimization than to adults, who might rather be perceived as authorities. Approximately 30 percent of students do not report their victimization at all, because they are scared and do not believe that authorities in particular would be able to change their situation (Unnever and Cornell, 2004; Berger, 2007). Berger (2007) reported a positive effect of talking with peers about mistreatment. In line with these findings and open questions, the present study aims to evaluate whether a virtual bullying situation can be used as a testbed to learn about the factors influencing the willingness to report bullying.

While the aforementioned findings relate to biological gender, the literature also indicates that social gender can be a further determining factor. Social gender refers to personal characteristics, and addresses whether an individual has rather female or male attributes. Attributes that are perceived as female are communality, warmth and expressivity, while supposedly male attributes include instrumentality and dominance. People with atypical characteristics have been shown to be victimized more often (Navarro et al., 2015). Thus, it seems that both the biological and the social gender can predict involvement in a bullying situation. To our knowledge, the potential for resilience with respect to social gender has not yet been addressed, although it has been suggested as one of the personality traits influencing adjustment after victimization (e.g., Donnon and Hammond, 2007). Nevertheless, a person's social gender attributes might predict adverse reactions to a greater degree than biological attributes. For instance, oppression might induce more aversion in a person who is sensitive, sociable and caring (female attributes, Prentice and Carranza, 2002) than in a person who is assertive, competitive and aggressive (male attributes, Prentice and Carranza, 2002).

### **Impact of bullies depending on their gender**

Another unanswered question refers to the impact of the bully depending on his/her biological gender. According to gender stereotypes, men can be perceived as more threatening; thus, it can be asked whether male and female perpetrators are perceived in the same way. Men are seen as agentic and holding attributes like assertiveness and aggression, while women are associated with warmth and communality (see Cuddy et al., 2008). Additionally, men commonly have different physical attributes, which might be perceived as more menacing. On the other hand, female bullies might be perceived as more threatening because counter-stereotypical behaviors (i.e., being suppressive, dominant and aggressive instead of warm and kind) are unexpected and can lead to penalization (e.g., Eagly and Karau, 2002; Bosson and Michniewicz, 2013).

# Individual Differences in Bullying

As outlined above, it has repeatedly been suggested that it is not random who gets involved in bullying situations (Juvonen and Graham, 2014). As such, it has been discussed whether personality factors are associated with victimization (Einarsen et al., 1994). For example, studies demonstrated that victimization was positively correlated with neuroticism and negatively correlated with conscientiousness (Bollmer et al., 2006; Zapf and Einarsen, 2011; Wolke and Skew, 2012; Kodžopeljić et al., 2014; Nielsen and Knardahl, 2015). Zapf and Einarsen (2011) summarized that while some studies found extroversion, agreeableness and conscientiousness to be associated with victimization, others did not. Additionally, self-esteem and self-assertiveness can be important factors (Zapf and Einarsen, 2011). For instance, Baumeister et al. (2003) found that persons with low self-esteem ratings were more often victims than persons with high self-esteem. In addition, Zapf and Einarsen (2011) reported that victims score high on sensitivity, suspiciousness, anxiety and depression and low on assertiveness and competitiveness.

Individual traits also play a decisive role in terms of resilience (Sapouna and Wolke, 2013). In a more general context not specifically related to bullying, Friborg et al. (2005) stated that resilience was related to "high score[s] on emotional stability [low neuroticism], extroversion, openness and conscientiousness [. . .], as well as agreeableness...". They found a strong negative correlation between neuroticism and resilience, and revealed that neurotic persons stated more negative affect and showed more symptoms of anxiety and depression. Transferring these results to resilience against bullying, it can be expected that victims with low scores on neuroticism will report less negative reactions (e.g., negative affect) than victims with high neuroticism scores.

Sapouna and Wolke (2013) found that high self-esteem is positively associated with positive adjustment after victimization. Von Soest et al. (2010) further suggested that hardiness and a positive cognition of events (e.g., seeing chances/opportunities in negative situations/experiences) can lead to less negative reactions to stressful experiences. Van Heugten (2013) added that the perceived level of control on the part of the victim has an impact on the outcome of the bullying situation. In line with Sapouna and Wolke (2013), we therefore suggest that self-esteem is positively associated with resilience and less negative reactions to victimization.

Another moderating factor might be the "need to belong, that is, a need to form and maintain at least a minimum quantity of interpersonal relationships (. . .)" (Baumeister and Leary, 1995, p. 499). While it is fairly well documented that bullies strive for acceptance from their peers (e.g., Olthof and Goossens, 2008), the role of the need to belong on the part of the victim has received less research attention. For victims, the need to belong might especially affect the willingness to approach others after a bullying event.

To sum up, a broad body of research has found that bullying leads to stress reactions in terms of negative affect (e.g., feeling nervous) and physiological reactions (e.g., increased electrodermal activity). Moreover, (personality) traits of the victim (e.g., high neuroticism scores, gender) as well as attributes of the bully (e.g., male competitors are perceived as more dominant) seem to be influential. Although it is well known who is affected by bullying, less is known about resilience factors inherent in the victims. Most researchers applied survey studies to gain insights into bullying processes. While such studies provided a great deal of valuable results, the exploratory power of these results is partly limited. As virtual environments offer the opportunity to create situations of high control and systematization (Loomis et al., 1999; Blascovich et al., 2002), we strive to employ a virtual scenario in order to extend the basic research on these issues.

### Research Questions and Hypotheses

Studies have revealed that prevalence rates of bullying are rather high, with approximately one third of the population across nations and across all ages having already been involved in bullying. The consequences can be far-reaching, especially for victims. For the experimental setting here, we specifically focus on bullying by an authority in an institutional setting, in order to represent a situation of clear power imbalance (Juvonen and Graham, 2014). Moreover, we are especially interested in the question of under which conditions bullying authorities would be reported. Regarding resilience factors, not every victim is permanently hurt/psychologically impaired by bullying; some victims show positive adjustments due to their coping potential (Lazarus and Folkman, 1984). Referring to the current literature on bullying and resilience (see Individual Differences in Bullying; Friborg et al., 2005; Sapouna and Wolke, 2013), we assume different personality attributes to be important, such as neuroticism, self-esteem and need to belong. Furthermore, the biological and social gender have been assumed to influence the victim's reaction to bullying (Prentice and Carranza, 2002; Turner et al., 2013). Based on this previous work, we state the following hypotheses:

H1: Female victims will experience more adverse reactions (based on self-reports and physiological reactions) than male victims. These reactions will be moderated by the victim's personal characteristics (social gender, neuroticism, self-esteem and need to belong).

According to gender stereotypes, we assume that the biological gender of the bully can be an influencing factor.

H2: A male bully will elicit more adverse reactions (based on self-reports and physiological reactions) than a female bully. These reactions will be moderated by the victim's personal characteristics (social gender, neuroticism, self-esteem and need to belong).

Moreover, we suppose an interaction between the biological sex of the victim and the biological sex of the bully.

H3: Female victims oppressed by a male bully will show more adverse reactions (based on self-reports and physiological reactions) than male victims oppressed by a male bully, female victims oppressed by a female bully, and male victims oppressed by a female bully. These reactions will be moderated by the victim's personal characteristics (social gender, neuroticism, self-esteem and need to belong).

# MATERIALS AND METHODS

# Study Design and Virtual Scenario

To examine our hypotheses, we conducted an experiment with a 2 (bully's gender) × 2 (victim's gender) between-subject design (N = 81, 45 females, 36 males). The virtual environment was designed to simulate a bullying scenario by assigning the participants to a task that is impossible to complete to the bully's satisfaction. As shown in **Figure 1A**, the virtual environment is a wide open space in which two virtual characters are displayed. Specifically, we simulated a rehearsal in an acting class scenario, because this is a situation in which feedback can be given naturally. To create the imbalance of power, the key element of bullying behavior, one of the virtual characters was designed to be the instructor (female/male) and the authority in the scene. The participants took the role of an acting student, reading lines from a script and interacting with a second (virtual) student who was also practicing his lines while taking instructions from a virtual instructor. The participant could see both the instructor and the fellow student (**Figure 1A**) standing in a neutral, stage-like room. The virtual fellow student served to simulate a real-life acting class in which participants rehearse a scene together, as well as to enhance the participants' feeling of being treated differently. The fellow student looked the same in all conditions and displayed the same, neutral behavior, saying his lines with default gaze behaviors following the person who speaks. Participants were asked to rehearse the script adapted from 'Romeo and Juliet: Act 3, Scene 3,' with the virtual fellow student playing another character in the script, Friar Lawrence, while the participant played Romeo. There were no other interactions between the participant and the virtual student, beyond reading their different parts in the script. The researcher told the participants that their goal was to finish their rehearsal in a limited amount of time. 
Each time the participant finished reading a line, the virtual instructor provided feedback. Participants were told that the instructor's feedback was specifically tailored to their performance, and that they should follow the instructor's directions to the best of their ability. In fact, the negative feedback from the virtual instructor was scripted and identical for all participants regardless of their performance. After each line, the virtual instructor verbally bullied the participants by providing strong negative feedback, using harsh language and even ridicule, accompanied by negative non-verbal behaviors (see **Figure 1B**). For example, the negative feedback included sentences such as "Ugh, stop. You sound like a dead fish," "No no no, that's not right. Honestly, how hard is it?" and "Come on, work with me here. Say it like you mean it." (For a video of the situation featuring the male perpetrator see **Supplementary Materials**. Please note that the participants watched this through the Oculus Rift, i.e., they saw only one picture.)

# System Apparatus

The 3D virtual environment was developed using Unity3D. The virtual humans' non-verbal behaviors such as facial expression and gestures were automatically generated using Cerebella (Lhommet and Marsella, 2013; Marsella et al., 2013) and the generated animations were controlled using Virtual Human Toolkit. The head-mounted display (HMD) was the Oculus Rift Development Kit 2. The experiment apparatus is shown in **Figure 2**. An Empatica E4 sensor measured physiological signals, heart rate and electrodermal activity.

# Procedure

The experiment took place in a virtual reality lab at the University of Duisburg-Essen, Germany. Participants were recruited through personal contact, online and offline. When they arrived at the university, they were welcomed by the experimenter, instructed about the setting and asked to provide informed consent. Then, the participants took a seat, were fitted with the Empatica E4 and were asked to fill in the first part of the questionnaire, including the personality traits. Afterward, the experimenter fitted the participants with the Oculus Rift. The experimenter started (a) the recording of the physiological data by tagging the Empatica E4 and (b) the video recording, and ended the recording after the interaction. Interactions took about 3 to 4 min; the total duration of the experiment was approximately 30 min. Finally, participants filled in the second part of the questionnaire and were debriefed.

# Measures

To examine our hypotheses, we captured personality traits and participants' adverse reactions. In addition, participants' sociodemographic characteristics were determined (biological gender, age, level of education).

# Personality Traits

We used a subscale of the NEO-FFI (Borkenau and Ostendorf, 2008) to measure neuroticism (α = 0.810). Ratings were given on 5-point Likert scales (0 = not at all; 4 = absolutely), with high scores indicating a strong manifestation of the trait.

To measure the social gender, we employed the German version of the Personal Attributes Questionnaire (GEPAQ) by Runge et al. (1981), which comprises three subscales (eight items each): masculinity (M+; α = 0.632), femininity (F+; α = 0.663) and masculinity-femininity (M-F; α = 0.590). The masculinity scale includes items like competitive and self-confident, while the femininity scale includes items like sensitive and emotional. Participants gave ratings on 5-point Likert scales (0 = not at all; 4 = absolutely). For the present analyses, we only used the masculinity and femininity subscales.

Self-esteem was captured using the Rosenberg Self-Esteem Scale (Rosenberg, 1965), which consists of ten items (α = 0.838). Ratings were made on 4-point Likert scales (0 = not at all; 3 = absolutely). High values of the sum score represent high self-esteem.

We measured the need to belong with the 10-item Need to Belong Scale (Leary et al., 2013). Ratings were made on 5-point Likert scales (1 = not at all; 5 = extremely), with high scores indicating a high need to belong (α = 0.840). We additionally measured causal attribution style (McAuley et al., 1992), although this is not relevant for the present article.
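The internal consistencies reported for the scales above are Cronbach's α values. As a minimal sketch of how such a coefficient can be obtained from raw item ratings, the following function applies the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function name and the demo ratings are hypothetical illustrations and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item ratings.

    responses: list of lists, one inner list of item ratings per participant.
    """
    n_items = len(responses[0])
    # Sample variance of each item across participants
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the per-participant sum score
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (4 participants x 3 items), NOT study data
demo_ratings = [[0, 1, 1], [2, 2, 3], [3, 4, 4], [1, 1, 2]]
demo_alpha = cronbach_alpha(demo_ratings)
```

Items that rise and fall together across participants push the sum-score variance well above the summed item variances, which is what moves α toward 1.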

### Adverse Reactions

To examine adverse reactions, we used self-reports and physiological measures.

### **Physiological reactions**

We captured electrodermal activity (EDA) and heart rate (HR) as indicators of physiological stress responses during the acting task, using the Empatica E4. This is a bracelet with four sensors (photoplethysmography, electrodermal activity sensor, accelerometer, thermometer), which can measure physiological responses in real time. Only the data of the first two sensors were used in order to derive heart rate and electrodermal activity (skin conductance level, SCL). A tagging button was used to mark the start and end of the experimental interaction. We captured a baseline before the beginning of the interaction for 3–4 min while participants were able to look around the room. For further analyses, we calculated the differences between the physiological values at baseline and at the end of the interaction to obtain values for changes in heart rate and skin conductance level.
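The change scores described above can be sketched as follows. The text does not specify how the baseline and end-of-interaction values were aggregated, so this sketch assumes a mean over the baseline samples and a mean over the last few samples of the interaction; the function name, the `tail` parameter and the sample values are hypothetical.

```python
from statistics import mean


def change_score(baseline_samples, interaction_samples, tail=10):
    """Difference between the end-of-interaction level and the baseline level.

    Assumption: the baseline level is the mean of all baseline samples and the
    end level is the mean of the last `tail` samples of the interaction.
    """
    baseline_level = mean(baseline_samples)
    end_level = mean(interaction_samples[-tail:])
    return end_level - baseline_level


# Hypothetical heart-rate samples in beats per minute, NOT study data
hr_baseline = [70, 72, 71, 69]
hr_interaction = [80, 90, 95, 100, 98, 102]
hr_change = change_score(hr_baseline, hr_interaction, tail=2)
```

A positive value indicates an increase relative to baseline; the same function applies to skin conductance level samples.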

### **Self-reports**

The current mental state was measured by 28 three-point semantic differentials (Zerssen and Koeller, 1976; 0 = positive pole, 1 = indifferent; 2 = negative pole) such as fresh-faint, irritated-placid or happy-upset. From these, the sum score was formed, with high scores representing mental unease and low scores representing high mental well-being.

Moreover, we asked participants about their perception of the bullying situation using 16 items rated on 9-point Likert scales (1 = totally disagree; 9 = totally agree). Example items are "I felt oppressed by the instructor's behavior" and "The instructor's behavior made me insecure." A factor analysis using Horn's (1965) parallel analysis method resulted in a one-factor solution (α = 0.900); five items had to be removed from the analysis. High scores indicate a strong feeling of oppression.

### **Behavioral intentions**

Finally, we measured behavioral intentions to report the mistreatment using 9-point Likert scales (1 = not at all; 9 = absolutely) after the bullying situation. We captured two different types of report: formal report (one item, "Would you report the behavior of the instructor to the university?") and informal report (three items "Would you report the behavior of the instructor to your friends/family/fellow students?"; α = 0.846).

# Sample

Of the 83 participants who took part in the study, two participants had to be excluded (one due to technical problems and one who switched off the Empatica E4). The final study sample thus comprised 81 participants (45 females, 36 males), with an age range from 18 to 31 years (M = 22.70, SD = 2.93). As the highest level of education, approximately 94% had completed university entrance-level examinations or a higher educational qualification; the remaining 6% named another qualification (e.g., graduated from a medium-track school). On average, the participants had 9 years of experience with video games (M = 9.63; SD = 6.81). The participants in the two conditions (female/male oppressor) did not differ concerning their video-gaming experience (female oppressor condition: M = 10.08, SD = 7.46; male oppressor condition: M = 9.20, SD = 6.16). However, they differed slightly regarding their average age, with those in the female condition being 1 year older (M = 23.40, SD = 0.49) than those in the male condition (M = 22.02, SD = 0.41). Gender was distributed equally across conditions (female participants: n<sub>male bully</sub> = 22, n<sub>female bully</sub> = 23; male participants: n<sub>male bully</sub> = 19, n<sub>female bully</sub> = 17).

# RESULTS

To examine H1–H3, we conducted a MANOVA with the independent factors bully's gender and participants' gender and the dependent variables physiological reactions (electrodermal activity, heart rate), self-reports (bullying perception, mental state) and behavioral intentions to report the misbehavior of the instructor (informal, formal).

Regarding H1, which stated that female participants would experience more adverse reactions, the analysis showed a difference in heart rate between female and male participants, F(1,77) = 5.01, p = 0.028, η<sub>p</sub><sup>2</sup> = 0.061: Males showed higher heart rate changes (M = 30.29, SE = 2.75, CI [24.82, 35.76]) than females (M = 22.05, SE = 2.45, CI [17.16, 26.93]). There were no significant effects on SCL [F(1,77) = 1.64, p = 0.205, η<sub>p</sub><sup>2</sup> = 0.021]. The self-reports on mental state, F(1,77) = 0.38, p = 0.542, η<sub>p</sub><sup>2</sup> = 0.005, and the perception of the bullying situation, F(1,77) = 0.05, p = 0.831, η<sub>p</sub><sup>2</sup> = 0.001, did not differ significantly. Moreover, there was no difference between men and women in the intentions to report the bullying situation in a formal, F(1,77) = 0.15, p = 0.698, η<sub>p</sub><sup>2</sup> = 0.002, or informal way, F(1,77) = 0.00, p = 0.956, η<sub>p</sub><sup>2</sup> = 0.000.
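The partial eta squared values reported with each F test can be recovered from the F value and its degrees of freedom via the standard identity η<sub>p</sub><sup>2</sup> = (F · df<sub>effect</sub>) / (F · df<sub>effect</sub> + df<sub>error</sub>). The sketch below is an illustrative check, not the authors' analysis code; the function name is hypothetical. The error degrees of freedom are consistent with N = 81 participants in four cells (81 − 4 = 77).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Error df for a 2 x 2 between-subjects design with N = 81: 81 - 4 cells = 77
df_error = 81 - 4

# Participant-gender effect on heart rate, F(1,77) = 5.01
eta_hr = partial_eta_squared(5.01, 1, df_error)

# Participant-gender effect on SCL, F(1,77) = 1.64
eta_scl = partial_eta_squared(1.64, 1, df_error)
```

Rounded to three decimals, both values agree with the effect sizes given in the text for these two tests.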

Concerning H2, which stated that a male bully would elicit more adverse effects than a female bully, the analysis revealed no significant difference in the physiological reactions depending on the bully's gender [SCL F(1,77) = 2.34, p = 0.130, η<sub>p</sub><sup>2</sup> = 0.030; HR F(1,77) = 1.93, p = 0.169, η<sub>p</sub><sup>2</sup> = 0.024]. The self-report did not reveal a difference for the variable "mental state ratings," F(1,77) = 0.15, p = 0.705, η<sub>p</sub><sup>2</sup> = 0.002, but a difference was found for the variable "perception of the bullying situation" depending on the bully's gender, F(1,77) = 5.08, p = 0.027, η<sub>p</sub><sup>2</sup> = 0.062. The male bully elicited a greater threat perception (M = 5.52, SE = 0.25, CI [5.03, 6.02]) than did the female bully (M = 4.72, SE = 0.25, CI [4.22, 5.23]). Regarding the behavioral intentions to report the bullying, the analysis did not reveal a difference depending on the bully's gender [informal F(1,77) = 0.14, p = 0.707, η<sub>p</sub><sup>2</sup> = 0.002; formal F(1,77) = 0.63, p = 0.432, η<sub>p</sub><sup>2</sup> = 0.008].

The interaction of bully's gender and participants' gender (H3) did not show significant differences for physiological reactions [SCL F(1,77) = 0.01, p = 0.908, η<sub>p</sub><sup>2</sup> = 0.000; HR F(1,77) = 0.00, p = 0.974, η<sub>p</sub><sup>2</sup> = 0.000], self-reports [mental state F(1,77) = 0.37, p = 0.546, η<sub>p</sub><sup>2</sup> = 0.005; perception of bullying situation F(1,77) = 0.45, p = 0.505, η<sub>p</sub><sup>2</sup> = 0.006] or behavioral intentions to report the mistreatment by the instructor [informal F(1,77) = 2.56, p = 0.114, η<sub>p</sub><sup>2</sup> = 0.032; formal F(1,77) = 0.70, p = 0.406, η<sub>p</sub><sup>2</sup> = 0.009].

To examine whether the personality variables self-esteem, need to belong, neuroticism, and social gender moderate the results of H1–H3, we conducted three-way moderations (model 3) using the PROCESS macro by Hayes (2013). To this end, we consecutively conducted moderations, with each personality trait (self-esteem, need to belong, neuroticism, and social gender) as a moderator and the physiological reactions [electrodermal activity (EDA) and heart rate], self-reports on experiences (perception of the bullying, mental state), and behavioral intentions to report the bullying behavior on a formal and informal level as dependent reactions.
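For orientation, a three-way moderation of this kind (PROCESS model 3) corresponds to a regression containing all main effects, all two-way interactions and the three-way interaction. The role assignment below is an assumption made for illustration: X denotes the bully's gender, W the participants' gender and Z the respective personality trait; Y is one of the dependent variables.

$$
Y = b_0 + b_1 X + b_2 W + b_3 Z + b_4 XW + b_5 XZ + b_6 WZ + b_7 XWZ + e
$$

Under this assumption, the X × W interaction conditional on a given value of the trait is

$$
\theta_{XW}(Z) = b_4 + b_7 Z ,
$$

which is the quantity the Johnson-Neyman technique probes across values of Z. With seven predictors and N = 81, the residual degrees of freedom are 81 − 7 − 1 = 73, in line with the F(7,73) and t(73) statistics reported in the following subsections.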

# Self-Esteem

When self-esteem was used as a moderator for the relation of bully's and participants' gender, there was no effect of self-esteem on the dependent variables and the inclusion of self-esteem did not change any of the results depicted above.

# Need to Belong

In a next step, we ran analyses with need to belong (NTB) as a moderator. The overall model for skin conductance level did not reach significance, F(7,73) = 1.51, p = 0.177, R<sup>2</sup> = 0.15, but there was a significant three-way interaction effect of bully's gender, participants' gender and NTB on SCL (b = −0.80, t(73) = −2.51, p = 0.014, CI [−1.43, −0.16]). **Figure 3** depicts the interaction effect for low, medium and high levels of NTB. The Johnson-Neyman technique further showed that the interaction effect of bully's gender and participants' gender on SCL changed significantly at NTB values below −7.76 (8.64%) and above 10.49 (7.41%). The female bully elicited higher SCL in female participants with a high NTB than did the male bully, while the opposite was the case for male participants with a high NTB. Moreover, the male and female bully elicited the same SCL for female participants with a low NTB; however, male participants with a low NTB showed increased SCL in response to the female bully.

Concerning heart rate, the analysis revealed a non-significant overall model, F(7,73) = 1.84, p = 0.093, R<sup>2</sup> = 0.15. However, as in H1, there was a significant main effect of participants' gender on heart rate, and a three-way interaction of bully's gender, participants' gender and NTB (b = 2.41, t(73) = 2.05, p = 0.044, CI [0.67, 4.75]). The latter finding, however, does not show significant transition points within the moderator scores using the Johnson-Neyman technique and will therefore not be interpreted.

With respect to the self-report data, the overall model for the perception of bullying was significant [F(7,73) = 4.77, p < 0.001, R<sup>2</sup> = 0.25] and showed a main effect of bully's gender (b = −0.81, t(73) = −2.41, p = 0.019, CI [−1.47, −0.14]), indicating more perceived threat from the male bully than from the female bully. Moreover, an interaction effect of bully's gender and NTB (b = 0.20, t(73) = −4.55, p < 0.001, CI [0.11, 0.29]) was found. **Figure 4** shows that for the female bully, the perception of bullying increased with the NTB score, while the opposite pattern applied for the male bully.

FIGURE 3 | Three-way interaction effect of Bully's Gender × Participants' Gender × NTB on SCL.

There was no effect of the need to belong on the reported mental state or on the intention to formally report the bullying, and neither of the moderators influenced the effects of the independent variables.

Although the overall model for informal report was also not significant, F(7,73) = 1.68, p = 0.129, R<sup>2</sup> = 0.19, a main effect of NTB emerged (b = 0.75, t(73) = 2.32, p = 0.023, CI [0.01, 0.14]), suggesting that the higher the NTB, the greater the likelihood of an informal report.

# Neuroticism

To examine the impact of neuroticism, we conducted the corresponding moderation analyses. There was no influence of neuroticism on skin conductance level. With regard to heart rate, the same participant gender effect as in H1 emerged. In addition, a three-way interaction effect of bully's gender, participants' gender and neuroticism (b = 27.56, t(73) = 2.17, p = 0.033, CI [2.25, 52.86]) on heart rate was found. **Figure 5** depicts the interaction effect for low, medium and high levels of neuroticism. The Johnson-Neyman technique further indicated that the interaction effect of bully's gender and participants' gender on heart rate changed significantly at neuroticism values above 0.93 (9.88%). Bullies of both genders elicited an increase in HR in participants with high neuroticism scores.

The overall model on bullying perceptions was not significant, F(7,73) = 2.01, p = 0.065, R <sup>2</sup> = 0.15, but showed the same main effect of bully's gender on bullying perception as in H2. However, a two-way interaction effect of bully's gender and neuroticism (b = 1.20, t(73) = 2.08, p = 0.041, CI [0.05, 2.35]) emerged. **Figure 6** shows that with increasing neuroticism, the perception of bullying by the male bully decreased while the perception of bullying by the female bully increased.

Concerning the reported mental state, the overall model was not significant and did not reveal any significant main or interaction effects. Regarding the behavioral intentions, the overall model of informal report was not significant, F(7,73) = 1.39, p = 0.223, R <sup>2</sup> = 0.07, while the model of formal report was significant, F(7,73) = 2.44, p = 0.026, R <sup>2</sup> = 0.14. The informal model nevertheless revealed a significant interaction of bully's gender and participants' gender (b = 1.56, t(73) = 2.14, p = 0.035, CI [0.11, 3.01]), which was not present when neuroticism was excluded from the model. Here, a cross-gender effect emerged: Female participants would be more likely to report mistreatment by a male bully than by a female bully, while the opposite was the case for male participants (**Figure 7**). The analysis of formal reporting of the bullying situation showed an interaction of bully's gender and neuroticism (b = 2.83, t(73) = 2.97, p = 0.004, CI [0.93, 4.73]). **Figure 8** shows that with increasing neuroticism values, participants would be more likely to report the female bully. The opposite pattern emerged with respect to the male bully: The lower the neuroticism, the greater the likelihood of reporting mistreatment.
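As a reading aid, the relation between a reported coefficient, its t value and its confidence interval can be sketched as follows. The sketch assumes a conventional 95% interval of the form b ± t_crit · SE with SE = |b / t|; the function name is hypothetical, and `t_crit = 1.993` is an approximate two-sided critical value for 73 degrees of freedom rather than a figure taken from the article.

```python
def ci_from_estimate(b, t, t_crit=1.993):
    """Recover the standard error from b and t, then build b ± t_crit * SE."""
    se = abs(b / t)
    half_width = t_crit * se
    return b - half_width, b + half_width

# Formal report, bully's gender x neuroticism, as reported above:
# b = 2.83, t(73) = 2.97, CI [0.93, 4.73]
lower, upper = ci_from_estimate(2.83, 2.97)
print(f"[{lower:.2f}, {upper:.2f}]")  # prints "[0.93, 4.73]"
```

Because b and t are themselves rounded to two decimals, small deviations in the last digit are to be expected when the same check is applied to other coefficients.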

# Social Gender

fpsyg-09-00253 March 5, 2018 Time: 17:4 # 11

To test whether participants' social gender influences their experiences and reactions, we calculated models with self-reported masculinity and femininity.

Masculinity did not show a distinct influence on skin conductance level or heart rate. Concerning perception of bullying and mental state, the inclusion of masculinity did not change the results reported in H1–H3. Only the behavioral intentions to either formally or informally report the mistreatment were partially affected by masculinity. The overall model for formal report was significant, F(7,73) = 2.20, p = 0.044, R <sup>2</sup> = 0.12, and revealed a significant effect of masculinity (b = 1.14, t(73) = 2.18, p = 0.032, CI [0.10, 2.19]): Masculinity was positively correlated with the probability of formal report. The model on informal report was not significant, F(7,73) = 1.20, p = 0.314, R <sup>2</sup> = 0.07.

The inclusion of femininity did not lead to different results with regard to skin conductance and heart rate. Additionally, a two-way interaction of bully's gender and femininity (b = 19.38, t(73) = 2.30, p = 0.024, CI [2.58, 36.18]) emerged. With increasing scores on femininity, the heart rate increased in response to the female bully, while femininity did not affect the heart rate in response to the male bully (**Figure 9**). There were no effects on either of the self-reports or the behavioral intention variables.

In sum, the results indicate that with the exception of heart rate, which was higher for men than for women, women and men react similarly to a bullying scenario in a virtual environment (H1). H2 shows that regarding the gender of the bully, the male character was perceived as more threatening. There was no interaction between participants' and bully's gender. With regard to potential moderators, only self-esteem did not prove to be influential, while social gender, neuroticism and need to belong showed various interactions, which are discussed in greater detail below.

# DISCUSSION

The aim of the current study was twofold: We (a) employed virtual environment technology as a testbed in order to learn more about the influence of a bully's gender and of participants' resilience factors on the effects of bullying, and (b) evaluated the effects of the gender of the bully (male vs. female) in one bullying situation (acting rehearsal) on the stress level, self-reported mental state, and behavioral intentions of two groups of participants (men and women). This should form the basis for an effective training intervention which might be applied as a training environment in prevention workshops in schools or universities. Therefore, we conducted an experimental between-subjects study in which we varied the bully's and participants' gender. A virtual environment was used to create a mild bullying situation by a figure of authority and captured participants' (adverse) reactions by means of physiological data, self-reports and behavioral intentions to report the mistreatment. Contrary to expectation (H1), we did not find that female victims experience more adverse reactions during a bullying situation than male victims. On the contrary, males experienced a stronger increase in heart rates than did females. In line with Fowles (1980), this can be interpreted as an increased action readiness, and might be an indication that men tend to react more physically to threat. However, this needs to be further investigated in future studies. Despite this one difference, it seems that taken individually, the reactions of female and male victims are very similar. Moreover, moderators hardly changed these results. For potential applications of the virtual environment, this means that it might be useful to provide resilience training to both women and men.

We further assumed (H2) that a male bully would elicit more adverse reactions in participants, due to stereotypical beliefs about men and their different physical appearance. While we did not find any main effects for the physiological measures, the analyses revealed that the evaluation of the bullying situation is indeed more negative when the bully is male. Although participants did not report feeling worse, they described the situation as more threatening. While this might seem unsurprising at first glance, it is nevertheless remarkable that the same behavior leads to different effects if only the gender of the bully is changed. The fact that the same behavior displayed by women and men does not necessarily lead to the same effects or attributions has already been demonstrated in other realms (Deutsch et al., 1987). Although the male and the female bully's behavior were experienced differently, behavioral intentions to report the mistreatment were not affected by the bully's gender. For potential future application in resilience training, the more menacing effect of the male bully nevertheless suggests that to increase the effectiveness of such training, it may be more beneficial to include a male rather than a female bully. In addition, the gender could be customized to the "victim's" preferred degree of experienced threat.

According to H3, we expected that the male bully would trigger the strongest adverse reactions in female victims, due to the above-mentioned reasons. In contrast, we supposed that the female bully would elicit less adverse reactions especially when interacting with male victims. The analyses did not reveal any significant interaction effect of the bully's gender and participants' gender, which is in line with the results regarding the main effects of bully's and victims' gender, suggesting that overall, gender is rather unimportant concerning the effects of bullying.

However, the consideration of further moderators changes the influence of the bully's gender on different adverse reactions and the interaction of bully's gender and participants' gender. We considered neuroticism, need to belong, self-esteem and social gender as potential moderators. Surprisingly, the only moderator that did not influence the results was self-esteem. This was particularly unexpected given that previous research (Sapouna and Wolke, 2013) indicated that high self-esteem would enable victims to cope better with such a situation. Our results indicate that high self-esteem did not lead participants to evaluate the situation as less threatening or to feel better. However, as we focused on the immediate reaction in the situation, this does not preclude that long-term coping might be more successful in participants with high self-esteem.

The impact of the bully's gender on the adverse reactions was affected by neuroticism, the need to belong and social gender. Starting with neuroticism, the results indicate that the higher the participants' neuroticism scores, the more they perceived the female bully as threatening. The opposite pattern was observed for the male bully. While participants with low neuroticism found female bullies less threatening than males, people with high neuroticism evaluated both genders to be equally threatening. Moreover, participants with high neuroticism scores were more likely to formally report mistreatment from a female bully than from a male bully. Male bullies elicit more threat, while female bullies are perceived as less threatening, which may lead to less fear of complaining about mistreatment by females. Moreover, as female bullies are acting against their perceived female role of being warm and sincere, this violation may elicit a desire to penalize them (Bosson and Michniewicz, 2013), in this context through a formal report.

Moreover, the need to belong (Baumeister and Leary, 1995) moderated the relation between the bully's gender and the perception of the bullying situation. The higher the participants' need to belong scores, the more threatened they felt by the female bully, while the male bully induced less threat. We assume that these findings are also attributable to gender stereotypes. Participants with a high need to belong, who have a strong wish for attachments and acceptance, might believe that it is easier to befriend females, as females are expected to be friendlier and more communicative and approachable (Eagly and Karau, 2002). However, in the case of the present scenario, the female bully violated her gender role, which in turn might be perceived as particularly threatening. In contrast to this, males are perceived as less approachable, more dominant and less communicative (Prentice and Carranza, 2002); thus, the male bully was acting more in accordance with his perceived role as a male. Another indication that the female bully was perceived as a role-violating person is provided by the interaction effect of the bully's gender and self-reported femininity on increase in heart rate. The more feminine attributes participants hold, the higher the heart rate increases when encountering the female bully, while the heart rate in response to the male bully was unaffected. One might assume that participants who indicate being more sensitive might more easily notice such a role violation, resulting in a higher heart rate, although this assumption is highly speculative at this point. The fact that only a physiological measure was affected, which is hard to control, might indicate that stereotypical beliefs are embedded on an implicit level, but controlled on an explicit level (i.e., self-report on perceived bullying). 
However, it needs to be acknowledged that the results on the other psychophysiological variable, skin conductance level, did not manifest themselves in exactly the same way. Although it might be seen as troubling that the two physiological measures did not yield the same results, such findings were also demonstrated in recent studies employing first-person shooter games (Drachen et al., 2010). A potential explanation for the differing impact on different psychophysiological measures might lie in the distinction between the behavioral activation system (BAS) and the behavioral inhibition system (BIS) (Fowles, 1980): While the BAS initiates behavior (approach) and is strongly associated with heart rate, the BIS is an anxiety system, which inhibits behavior and is associated with electrodermal activity. Against this background, a uniform reaction of heart rate and electrodermal activity would not be expected.

Besides the aforementioned effects, neuroticism and the need to belong also affected the interaction between bully's gender and participants' gender regarding physiological responses and behavioral intentions. All of these moderations show the same pattern, which indicates specific cross-gender effects. High neuroticism values affected the relation between bully's gender and participants' gender with respect to heart rate and informal report. Highly neurotic female participants showed larger heart rate changes in response to a male bully than to a female bully. The opposite was the case for highly neurotic male participants (note, however, that the highest increase in heart rate was observed in men with low neuroticism scores being bullied by a male, which is not in line with the pattern described). When only the cross-gender effects are addressed, it seems that in line with the construct of neuroticism (Eysenck, 1947), highly neurotic persons experience high stress levels especially when they are bullied by a person of the opposite sex. While research has shown that males more often bully males and that both females and males mistreat females (Narayanan and Betts, 2014), it might be that the unusual situation of male participants being oppressed by a female bully led the heart rate to increase. In contrast, although females might have experience of being bullied by both genders, the physical appearance of the male bully might have been more intimidating, leading to the increased heart rate. Although this claim cannot be corroborated by previous research, it seems generally plausible to assume that male bodies are perceived as more threatening. However, would this matter in a VR environment, in which no physical harm can be done? 
This also needs to be addressed in future research, but for the moment, in line with media equation assumptions (Reeves and Nass, 1996; Krämer et al., 2015), we suggest that people automatically react to virtual characters in the way they would toward real humans.

The same pattern of results was revealed for the intention to informally report the mistreatment: Highly neurotic males would be more likely to report bullying by a female to their friends, and might be disturbed by the role-incoherence of the female bully. In turn, highly neurotic females would be more likely to report bullying by a male to their friends.

The analyses showed a further interaction effect of participants' gender and bully's gender on skin conductance for participants with a very low and a very high need to belong. Participants with a low need to belong had very similar skin conductance levels in response to the male and female bully, with the notable exception that male participants reacted more strongly to the female oppressor. However, those participants with a high need to belong seem to react in a special way to a same-sex bully: Female participants showed an increase in skin conductance level in the presence of the female bully, while male participants showed such an increase in response to the male bully. Female participants were rather unaffected by their need to belong level in the presence of a male bully; indeed, those with a high need to belong even showed a slight decrease in their skin conductance level. Most notably, the combination of a male participant being oppressed by a female bully was strongly affected by the participants' need to belong: While male participants with a low need to belong had a very strong skin conductance increase, the conductance was rather low in those with a high need to belong. This pattern is – as is customary with three-way interactions – very difficult to interpret, but seems to indicate that especially for male participants, the need to belong influences their reactions, rendering male participants with a low need to belong especially susceptible to the female bully. Moreover, the results indicate again that people with a high need to belong rather strive for same-sex connections and are more affected when they are bullied by their own sex. With regard to psychophysiological reactions, however, this pattern only emerged for skin conductance. This might indicate that in this regard, reactions are less connected to energizing activity, and are rather associated with inhibition and anxiety (Fowles, 1980). 
Given that we cannot exhaustively explain the patterns (e.g., why people even feel threatened when they have a low need to belong, which appears to suggest that the need to belong is not a prerequisite for reactions), future research needs to incorporate the need to belong.

It is also very important to take a closer look at what the results might mean for the identification of resilience factors: In line with results by Friborg et al. (2005), neuroticism affected the outcomes and especially influenced the impact of the bully's gender. However, low neuroticism or emotional stability did not ease reactions in general, but only depending on the bully's gender. Therefore, it cannot be seen as a general resilience factor. Findings regarding the role of need to belong were also mixed. While this trait affected perceived bullying and skin conductance, the results did not reveal a clear pattern. At the very least, this construct is worthy of inclusion and testing in future studies. Concerning the question of whether social gender can serve as a resilience factor, we found one single effect of self-reported masculinity on the willingness to formally report the bullying, indicating that with increasingly masculine attributes, the likelihood of reporting the mistreatment in a formal way increased. Given that masculine attributes comprise self-confidence and the ability to deal with pressure (Stein et al., 1992), it is logical that these attributes would support the participant to defend her/himself by reporting the mistreatment in formal situations. This is in line with findings that self-esteem fosters resilience (Sapouna and Wolke, 2013), although this effect did not directly emerge in our study. Moreover, social gender influenced the effect of the bully's gender on heart rate changes. In conclusion, we recommend the inclusion of social gender in further studies. However, as the reliability of the scales used in our study proved to be rather low, we would suggest the use of different instruments. Furthermore, an explicit consideration of androgyny would be beneficial, as this could turn out to be a more adequate resilience factor than masculinity, the effects of which were rather ambiguous.

With regard to the overarching question of whether the system could be used to train resilience and appropriate behavior after a bullying event, we can conclude that future developments in this direction would be worthwhile. As the results show that people react to bullies in virtual environments, and feel especially threatened by a male bully, the environment could conceivably be used to train resilience against bullying. The experience might, for example, be integrated into a workshop, in which participants learn to withstand the bullying, regulate their emotions, and are taught appropriate responses – to the bully as well as regarding the reporting of the behavior.

The study is, of course, not without limitations. Most importantly, our results are limited to showing the effects of bullies of different gender on participants of different gender, additionally considering the role of several person variables. Our design cannot provide insights into the general question of whether the effects in the virtual environment differ from those in the real world and/or whether the effects are due to the specific bullying rather than the acting rehearsal situation. Although questions such as these have been targeted in previous research, it might be useful to address them again in further studies that include the appropriate control groups.

Although the sample is of a reasonable size for a laboratory study, the number of participants is nevertheless rather low when considering three-way interactions. Furthermore, the sample was quite homogeneous and included mostly students. Therefore, the results are only generalizable to students – although students constitute one of the most important target groups for future training interventions. Given the specific setting we used (bullying by a figure of authority in an institutional setting), the results might also not be generalizable to other, more informal bullying by peers. However, as a first step, we aimed to gain insights into people's willingness to report misbehavior of a superior, as this might even be more difficult and worthy of training compared to reporting bullying by peers. The specific setting also entailed a situation that might not have been appealing to all participants, although this would likely have been true for any type of task. However, it should be noted here that the situation might not have been sufficiently threatening, as the mean values show only mild perceptions of threat. Moreover, the setting only included a small amount of ridiculing, although this is a frequent element of bullying. Another potentially problematic aspect of the virtual environment is the fact that the second virtual character, the fellow student, was always male. Although this was kept constant in all conditions, it might have influenced especially female participants in specific ways.

Some limitations regarding the dependent variables also need to be noted. For example, the mental state scale was developed as a scale for clinical samples, which always bears the risk that there is only limited variance in a sample with nonclinical participants. However, variance appeared to be in a normal range in the present study. The psychophysiological measures have to be treated with caution, as the Empatica E4 has not been validated in previous studies, meaning that it is unclear whether the data might be influenced by artifacts. Another methodological problem is that during the baseline measurement, participants already knew whether they would be interacting with a female or male instructor, since they had seen a picture of the virtual character in the instructions. This might have attenuated the effects. For future studies including gender (especially gender of the bully), it would be advisable to collect data on stereotypical beliefs. An awareness of the participants' gender stereotypes might facilitate the interpretation of some of the results. With regard to the person variables and potential moderators, we included those which have already been described in the literature (such as neuroticism, gender and self-esteem), but other variables, for instance prior experience with verbal and physical bullying, might, of course, also have influenced the results.

# CONCLUSION

The present study demonstrates that a virtual bullying situation can have distinct effects. With regard to basic research, we conclude that the use of such an environment enables researchers to deepen their understanding of processes in bullying situations and to identify factors that influence victims' reactions and resilience. Specifically, the results demonstrate that a male bully is perceived to be more threatening than a female bully, and that men in particular react to bullying with an increased heart rate, which might indicate their readiness to act. Moreover, the personality factors neuroticism, need to belong and social gender moderate the results. With a view to future applications, the environment could indeed be used in order to prepare people for potential future bullying situations – especially when a male bully is used. The experience might be used as part of an education program that builds on the emotional reactions by reflecting on appropriate reactions and training self-regulation of one's own emotions, as well as learning about appropriate further actions such as formal reporting.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the APA with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the local ethics committee of the Department of Computer Science and Applied Cognitive Science of the University of Duisburg-Essen.

# AUTHOR CONTRIBUTIONS

Conceptualization of the study: NK, SM, and DF. Development of the virtual environment: SM and DF. Collection of the data: ET. Technical support for data collection: DF. Data analyses: ET and SS. Writing of the manuscript: NK and SS. Editing of the manuscript: SM.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00253/full#supplementary-material

VIDEO 1 | Oculus video of the bullying situation featuring the male perpetrator.

# REFERENCES




engineering: a review. Comput. Educ. 95, 309–327. doi: 10.1016/j.compedu.2016.02.002


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Krämer, Sobieraj, Feng, Trubina and Marsella. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Virtual Reality for Non-Ordinary Consciousness

#### *Gabriel Axel Montes1,2,3\**

*1University of Newcastle, Newcastle, NSW, Australia, 2Priority Research Centre for Stroke and Brain Injury, Hunter Medical Research Institute, Newcastle, NSW, Australia, 3Research in Bias Node, Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia*

#### Keywords: consciousness, virtual reality, meditation, neurological disorders, religion and science, yoga, shamanism, embodied cognition

Virtual reality (VR) technology is currently seeing a surge of interest in industry, academia, and the public. Myriad developments are already underway, aiming to bring the technology directly to users in ways that offer access to rich virtual multisensory experiences. The immersivity that VR offers is enabled by the brain's constraints on processing bodily self-consciousness (BSC), which anchors self-identity and -location to the physical body. Current developments are focused on leveraging the face-value constraints of BSC to craft immersive VR experiences that are plausible to the human user; i.e., most VR applications today take advantage of BSC as a "trick" or illusion that the body plays on the mind. However, the malleability of BSC can be more powerfully approached as an asset for enhancing the repertoire of body representations and plasticity available to everyday human experience. By manipulating the boundaries of self-local experience, VR can be used as a tool for the cultivation of non-ordinary consciousness (NOC). Such an approach would have the potential to equip society with novel pathways for studying the farther reaches of consciousness and providing opportunities for access to enhanced conscious experiences in everyday life, with far-reaching philosophical and ethical implications.

#### *Edited by:*

*Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy*

*Reviewed by:* 

*Matteo Candidi, Sapienza Università di Roma, Italy*

> *\*Correspondence: Gabriel Axel Montes info@gabrielaxel.com*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

*Received: 06 September 2017 Accepted: 18 January 2018 Published: 07 February 2018*

#### *Citation:*

*Montes GA (2018) Virtual Reality for Non-Ordinary Consciousness. Front. Robot. AI 5:7. doi: 10.3389/frobt.2018.00007*

# BODILY SELF-CONSCIOUSNESS

A growing body of work in cognitive neuroscience and philosophy of mind has characterized the multisensory mechanisms governing the integration of bodily signals on which VR relies, or what is called BSC (Blanke et al., 2015). Beginning in neurological patients, this work has extended into healthy subjects and received much attention with bodily illusion paradigms, such as the rubber-hand illusion, enfacement illusion, and the full-body illusion, in which a study subject experiences the subjective sense that a hand, face, or other body, respectively, is their own. The full-body illusion was developed using VR, and the rubber-hand and enfacement illusions, while developed without using VR, were eventually applied in VR. These paradigms have revealed that such illusory experiences can be achieved through the spatiotemporal manipulation of visuotactile, and sometimes vestibular and auditory, inputs in relation to body parts, either directly or within the behaviorally and neurophysiologically defined potential space, the peripersonal space (PPS), immediately surrounding the stimulated body part. Empirical data suggest two major brain networks underlying BSC: a frontoparietal one (intraparietal sulcus and premotor cortex; IPS and PMC, respectively) for processing signals related to circumscribed body parts, e.g., hand and face, and a temporoparietal one (supramarginal gyrus, insula, superior temporal gyrus) for processing signals for trunk-based self-identification and self-location. In addition, it has been demonstrated that vestibular processing contributes significantly to anchoring self-location with the physical body (Pfeiffer et al., 2013).

Four principal constraints on multisensory integration have been proposed and evidenced to modulate BSC: (1) proprioception, (2) body-related visual information, (3) PPS, and (4) embodiment. These constraints determine whether or not a change in BSC takes place, fulfilling the criteria for the aforementioned bodily illusions to occur. The result is a subjective (and objectively measurable) sense of ownership over the virtual/prosthetic body part, e.g., the rubber hand (Botvinick and Cohen, 1998; Apps and Tsakiris, 2014), self-identification with the full body, and self-location (Blanke and Metzinger, 2009; Blanke, 2012). The integration of bodily ownership with motor signals yields the sense of agency, or volition over one's actions (Tsakiris et al., 2007; Ma and Hommel, 2015; Trzepacz et al., 2015; Haggard, 2017). The BSC constraints are modulated by the confluence of hierarchical bottom-up and top-down streams of information in the nervous system, which have been formalized in Bayesian models of predictive coding (Clark, 2013; Samad et al., 2015) and the free-energy principle (Friston, 2010; Apps and Tsakiris, 2014; Pezzulo et al., 2015; Donnarumma et al., 2017). In the rubber-hand illusion, for example, bottom-up unisensory inputs (touch on the real hand and vision of the rubber hand) are integrated in multisensory areas according to predictions based on prior experiences (Hohwy and Paton, 2010). The multisensory incongruity produces a mismatch between the sensory and predictive streams of information, leading to an attempt at its resolution (prediction error minimization) by weighing the proprioceptive input higher than the visual incongruence, thus producing embodiment of the rubber hand. These models help explain why the virtual environments can elicit such a convincing sense of ownership and agency over illusory bodies and achieve a feeling of presence.
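The weighting of sensory streams described above can be illustrated with a minimal sketch of precision-weighted cue combination, a standard simplification used in Bayesian accounts of multisensory integration. It is not the model of any of the cited papers: it assumes two independent Gaussian cues, and the function name, positions (in cm) and standard deviations are hypothetical values chosen only to show how the assumed reliability of each cue determines where the hand is estimated to be.

```python
def fuse_cues(x_vision, sd_vision, x_proprio, sd_proprio):
    """Combine two Gaussian position cues, weighting each by its precision (1 / variance)."""
    w_vision = 1.0 / sd_vision ** 2
    w_proprio = 1.0 / sd_proprio ** 2
    estimate = (w_vision * x_vision + w_proprio * x_proprio) / (w_vision + w_proprio)
    sd = (1.0 / (w_vision + w_proprio)) ** 0.5
    return estimate, sd

# Hypothetical setup: the real hand is felt at 0 cm, the rubber hand is seen at 15 cm.
# Case A: vision assumed more reliable than proprioception.
est_a, sd_a = fuse_cues(x_vision=15.0, sd_vision=1.0, x_proprio=0.0, sd_proprio=3.0)
# Case B: proprioception assumed more reliable than vision.
est_b, sd_b = fuse_cues(x_vision=15.0, sd_vision=3.0, x_proprio=0.0, sd_proprio=1.0)

print(f"A: {est_a:.1f} cm, B: {est_b:.1f} cm")  # prints "A: 13.5 cm, B: 1.5 cm"
```

In this simplification the fused estimate is always pulled toward whichever cue is assigned the higher precision, which is one way to read the resolution of the mismatch between sensory streams.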

While the behavioral, neural, and theoretical evidence supporting BSC is well defined, the research field currently treats the variability and plasticity of BSC as a "trick" or mere illusion that besets a static physical body, which the brain then attempts to correct. The concept of an illusion is apt as a convention for referencing the deviational phenomena; however, a strict adherence to this notion betrays the implication of the evidence that self-localization is phenomenologically associated with the physical body *because* of the very mechanisms of multisensory information processing, which are biased in favor of heavily weighted predictive priors on the physical body (trunk BSC especially) (Blanke, 2012; Blanke et al., 2015), rather than due to the body being an *a priori* ontological locus of selfhood (the "self-localization fallacy"). Self-location coincides with the physical body because it is, under normal conditions, the nexus where the various sensory streams converge and with respect to which they are globally integrated. The physical body thereby becomes a statistically evergreen Bayesian prior that strongly conditions perception to perceive consciousness as being anchored to, or localized in, it. This view remains commensurate with the starting point that consciousness is produced by the brain while concurrently remaining neutral regarding an ontologically fixed locality (or lack thereof) of selfhood.

# NON-ORDINARY CONSCIOUSNESS

Resetting the self-localization assumption and leveraging the insights of BSC-related scientific evidence offer a foundation for manipulating BSC for hitherto-unexplored ends, particularly through taking advantage of VR. In the case of neurological or psychiatric patients who experience pathological BSC distortions of self-identification and self-localization, the goal might be to promote a healthier and more physically bound BSC (Blanke, 2012). In healthy subjects with neurotypical BSC, by contrast, there is more room to manipulate and expand the ordinary boundaries of physically bound BSC. In support of this idea is the fact that many practitioners of meditation, yoga, and other methods of NOC report that such practices result in a decreased identification with their physical body and an increased sense of overall well-being (Vago and David, 2012; Tang et al., 2015; Montes, 2017). In line with these reports, the cross-cultural purported purpose of many of the practices of the world's wisdom and shamanic traditions (e.g., Buddhism, contemplative Christianity, Sufism, Taoism, Hinduism, etc.) is to reduce overidentification with the physical body and ego-self, as a means of ameliorating mental-emotional suffering. Many of the NOC practices associated with these traditions directly operate on BSC, such as out-of-body experiences, lucid dreaming, cultivating alternate body schemas and models, qi gong, hypnagogic states, and heightened interoceptive awareness (which crucially involves the insula, a brain region that also regulates self-identification and self-location) (Aspell et al., 2013; Ronchi et al., 2015), among myriad others. The methods of NOC thereby offer a treasury of techniques by which to entrain non-ordinary BSC for both consciousness-related scientific research and education. Opening this line of investigation honors the underappreciated insight of BSC research that the sense of self is experienced as bound to the physical body because of the mechanisms of embodied multisensory integration and hierarchical-predictive weighting, not because it is inherently bound *per se* to the body.

While for many scientists, NOC methods may seem out of reach or experimentally intractable, recent research has made sizeable progress in parsing them into core cognitive domains that may be studied in the laboratory and are easily manipulated using VR (Vago and David, 2012; Tang et al., 2015; Montes, 2017). NOC practices typically enact a set of constructs in cognitive neuroscience, which are supported by phenomenological, behavioral, and neuroscientific evidence: attention, intention, expansion/evocation, refinement, engagement, and evaluation. While some models prefer more generalizable constructs (e.g., attention, self-regulation, self-awareness) (Vago and David, 2012; Tang et al., 2015), maintaining both specificity and adaptability helps to strike a balance that more precisely anchors NOC methods in both cognitive neuroscience and their rich phenomenology (Montes, 2017). Furthermore, a framework/model may possess broader and deeper applicability if it accounts for computational principles of cognition, such as predictive coding. Because of the phenomenological nature of NOC methods, it is also advantageous if a model can be used to not only measure, but crucially, to entrain and enact those methods. This fosters a neurophenomenological fluidity and flexibility conducive to both research on NOC and education on conducting its practices by cultivators.

Informed by the constraints of BSC and the insights of predictive coding, VR is a powerful means of entraining NOC methods for the down-weighting of self-localization priors and expanding into and integrating alternate body schemas. Particular spatiotemporal combinations of multisensory—and even brain—stimulation can be explored and systematized to reliably and progressively induce the ownership of and agency over virtual or robotic bodies. The virtual bodies may have different scale, form, consistency, number, interactivity parameters, etc. to achieve desired effects and would be embodied by the user in progressively greater orders of magnitude so as to diversify the user's body priors and entrain novel ones (Hohwy and Paton, 2010). Newly conditioned body schemas and NOC can be integrated with the physical body so that the individual may maintain a healthy relationship with the four-dimensional world of space-time while cultivating and assimilating new body priors. While there is much research remaining to be done regarding NOC, the time is ripe for both academic and industry efforts to explore this synergy between VR and NOC.

# SOCIETAL IMPLICATIONS

As the use of VR technology grows increasingly widespread, the philosophical, ethical, and societal implications will likewise continue to grow. The attenuation of physical body Bayesian priors that could come with conditioning to alternate worlds and embodiment dynamics (especially in children growing up with extensive VR use) may result in an experience of a greater prereflective readiness to take ownership over virtual bodies. Such experiences may give credence to the self-localization fallacy and raise new philosophical questions about the nature of consciousness and embodiment. Ethically, it will be important to recognize the potency of VR-enabled NOC and to remain mindful about overly relying on VR beyond its providing "primer" experiences that can then be cultivated without VR.

The potential for a huge societal impact will be mainly in the use of VR for education on NOC; access to NOC will be democratized and more readily available to non-practitioners. Scientific research will also be able to harness the new technology to conduct previously inaccessible experimental paradigms. Virtual embodiment of alternate body schemas will enable a spectrum of BSC exploration and facilitate NOC experiences in potentially safe and systematic ways. User data collection and analytics and computational models of BSC and NOC can fuel artificial intelligence (AI) engines that guide users through BSC experiences in VR according to their strengths and weaknesses. While it remains to be seen if NOC VR will make a significant contribution to the amelioration of human suffering as maintained by the practices/traditions that might inform or inspire NOC VR methods, the immersive embodiment afforded by VR coupled with the repository of available NOC methods suggests promising potential in this area.


In addition to democratizing NOC for clinically healthy individuals, NOC VR will also be able to serve clinical neurological populations. Existing research already reveals impaired BSC in neurological patients, including disordered self-localization (Heydrich and Blanke, 2013). Non-ordinary BSC will deepen the scientific understanding of BSC and open new doors for offering BSC therapies and facilitate the embodiment of robotic bodies. Furthermore, in a future society where there may be imprecise priors for embodiment and BSC due to the prevalence of immersion in VR, NOC-informed VR therapies for the normalization of BSC may become an industry, with room for technological and medical innovation. Such scenarios may necessitate diverse training in the cognitive neuroscience of BSC, clinical manifestations of BSC disorders, mechanisms of VR, and experience with NOC that is grounded in both empirical science and phenomenology.

Taken together, VR and NOC are poised to form a mutually beneficial alliance in service of BSC treatment and enhancement. Neurological disability could make use of VR-assisted BSC therapy, and healthy individuals may harness the multisensory stimulation afforded by VR to embody alternate body schemas as a means of dilating the bounds of accessible conscious experience. With the assistance of AI, it will be possible to create experiences and programs tailored to individual needs and wishes. Importantly for researchers in academia and industry alike, NOC VR affords opportunities for generating experience-driven hypotheses and experimental paradigms for consciousness research. Balancing the exploration and cultivation of both physically embodied and non-ordinary BSC, society is set to reap the rewards of the intrepid exploration of BSC potentiated by VR.

# AUTHOR CONTRIBUTIONS

GAM conceived, drafted, and revised this work in its entirety.

# ACKNOWLEDGMENTS

The author would like to thank Mind & Life Europe for supporting the larger work of which this article forms a part, and Andreas Roepstorff of Aarhus University (Denmark) and Bryan Paton of the University of Newcastle (Australia) for conceptual guidance.

# FUNDING

This work was partially funded by a Francisco Varela Award from Mind & Life Europe (Grant # 2016-EVarela-Axel Montes, Gabriel).


cardio-visual effects on bodily self-consciousness. *Neuropsychologia* 70, 11–20. doi:10.1016/j.neuropsychologia.2015.02.010


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Montes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The Integration of Realistic Episodic Memories Relies on Different Working Memory Processes: Evidence from Virtual Navigation

Gaën Plancher<sup>1,2,3</sup>, Valérie Gyselinck<sup>1,4</sup> and Pascale Piolino<sup>1,2,5,6</sup>\*

<sup>1</sup> Laboratoire Mémoire et Cognition, Université Paris Descartes, Paris, France, <sup>2</sup> Institut de Psychologie, Université Paris Descartes, Boulogne Billancourt, France, <sup>3</sup> Laboratoire d'Etude des Mécanismes Cognitifs, EA 3082, Université Lyon 2, Lyon, France, <sup>4</sup> IFSTTAR-LPC, Versailles, France, <sup>5</sup> INSERM U894, Centre de Psychiatrie et Neurosciences, Paris, France, <sup>6</sup> Institut Universitaire de France, Paris, France

#### Edited by:

Mel Slater, University of Barcelona, Spain

#### Reviewed by:

Rebecca Wiczorek, Technische Universität Berlin, Germany Farhan Mohamed, Universiti Teknologi Malaysia, Malaysia

\*Correspondence: Pascale Piolino pascale.piolino@parisdescartes.fr

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 05 July 2017 Accepted: 12 January 2018 Published: 30 January 2018

#### Citation:

Plancher G, Gyselinck V and Piolino P (2018) The Integration of Realistic Episodic Memories Relies on Different Working Memory Processes: Evidence from Virtual Navigation. Front. Psychol. 9:47. doi: 10.3389/fpsyg.2018.00047

Memory is one of the most important cognitive functions in a person's life as it is essential for recalling personal memories and performing many everyday tasks. Although a huge number of studies have been conducted in the field, only a few of them investigated memory in realistic situations, due to methodological issues. The various tools that have been developed using virtual environments (VEs) have gained popularity in cognitive psychology and neuropsychology because they make it possible to create naturalistic and controlled situations, and are thus particularly suited to the study of episodic memory (EM), for which an ecological evaluation is of prime importance. EM is the conscious recollection of personal events combined with their phenomenological and spatiotemporal encoding contexts. Using an original paradigm in a VE, the objective of the present study was to characterize the construction of episodic memories. While the concept of working memory has become central in the understanding of a wide range of cognitive functions, its role in the integration of episodic memories has seldom been assessed in an ecological context. This experiment aimed at filling this gap by studying how EM is affected by concurrent tasks requiring working memory resources in a realistic situation. Participants navigated in a virtual town and had to memorize as many elements in their spatiotemporal context as they could. During learning, participants had either to perform a concurrent task meant to prevent maintenance through the phonological loop, or a task aimed at preventing maintenance through the visuospatial sketchpad, or no concurrent task. EM was assessed in a recall test performed after learning through various scores measuring the what, where and when of the memories. Results showed that, compared to the control condition with no concurrent task, the prevention of maintenance through the phonological loop had a deleterious impact only on the encoding of central elements. By contrast, the prevention of visuospatial maintenance interfered both with the encoding of the temporal context and with the binding. These results suggest that the integration of realistic episodic memories relies on different working memory processes that depend on the nature of the traces.

Keywords: virtual environment, episodic memory, working memory, binding, concurrent task

# INTRODUCTION

Early models of memory made clear distinctions between short-term and long-term memory. In 1890, James (1890) distinguished between primary and secondary memory. Primary memory, later renamed short-term memory, reflects current states of consciousness, while secondary memory, now referred to as long-term memory, consists of conscious memory of the past. This distinction was maintained in the majority of memory models (e.g., Waugh and Norman, 1965; Atkinson and Shiffrin, 1968). Since then, each construct has been investigated separately. This gave rise to many different theoretical models, mainly pertaining to the structuralist view. On the one hand, short-term memory evolved into the concept of working memory (WM), classically defined as a system dedicated to the temporary storage and the processing of information (Baddeley and Hitch, 1974). On the other hand, among several forms of long-term memory, the concept of episodic memory (EM) rapidly emerged. Episodic memories are typically described as long-term memories for which the mental experience includes specific information such as time, place, or perceptual details (Johnson and Raye, 1981; Tulving, 2002). Through a process of binding, the various items of information of EM, what-where-when, are linked together, forming connections that give a memory its specificity and distinctiveness (Johnson et al., 1993). Besides EM, different forms of long-term memory exist. Semantic memory concerns the store of facts and general knowledge, including the mental lexicon. Implicit or non-declarative memory refers to a heterogeneous collection of non-conscious memory abilities including skills and habits, priming and simple conditioning (Squire, 1992).

While several scientific fields of research have led to a better understanding of these various forms of memory in the lab, the majority of studies seldom targeted realistic situations close to daily life, mainly due to methodological issues. Over the past decades, however, virtual environments (VEs) have gained popularity as a tool in cognitive psychology and neuropsychology because they enable researchers and clinicians to create naturalistic and controlled situations (e.g., for a review, Kane and Parsons, 2017; Plancher and Piolino, 2017). VEs can be developed for various situations. Depending on the study design, the environment can take the form of a city, an apartment, a store, a garden, etc. Interaction with the environment can be accomplished through a huge variety of devices, from a simple joystick or a keypad to a complex driving simulator. VEs have become a good candidate to study EM because they appear particularly suited to properly consider the various components of EM, for which an ecological evaluation is crucial.

Several factors have been identified as modulating the integration of episodic memories, e.g., organization learning (Roenker et al., 1971), level of processing (Craik and Lockhart, 1972), emotion (Kensinger and Corkin, 2003), etc. Some factors relate to the encoding stage, some to the consolidation stage, and others to the retrieval or recall stage. However, the interaction at encoding between WM and EM has rarely been directly assessed in naturalistic situations. This is particularly surprising, as the concept of WM has become central for understanding a wide range of cognitive functions. For example, WM capacities have been found to be involved in numerous areas of higher order cognition including language comprehension (Daneman and Carpenter, 1980; Gathercole and Baddeley, 1993), mathematics (Logie and Baddeley, 1987), reasoning (Engle et al., 1999), and spatial model construction (Gyselinck et al., 2007, 2009, 2015). As WM is connected with many cognitive functions, it is sometimes considered as the heart of cognition.

In several models, WM is seen as an interface between short-term perceptual memories and long-term memory, thus being of primary importance in the encoding process of future long-term memories. Models vary in their description of the way they interact, however (Ericsson and Kintsch, 1995; Cowan, 1999; Baddeley, 2000; Oberauer, 2002; Unsworth and Engle, 2007). Up to now, these models have mainly investigated the role of long-term memory in WM performance. In the present study, we aim rather at investigating the role of WM in the construction of each aspect of episodic traces, i.e., the traces of what, where and when.

The dual functions of storage and processing characterize WM functioning. Both processing and storage compete for attention, which is a limited resource. The WM model of Baddeley and Hitch (1974) distinguishes several components: the peripheral slave systems and the central executive system. The slave systems include the phonological loop, which is necessary for the maintenance and processing of verbal material, and the visuospatial sketchpad, which is necessary for the maintenance and processing of visuospatial material. Finally, the central executive system manages the two slave systems (Baddeley and Hitch, 1974). Maintenance is of primary importance in most of the tasks and activities involving WM since, when both storage and processing are needed, participants usually try to maintain the items to be remembered as early as possible, before processing them. Two mechanisms of maintenance have been distinguished in WM, articulatory rehearsal and refreshing (Baddeley et al., 1984; Barrouillet and Camos, 2012, respectively). Articulatory rehearsal has been described as being particularly involved in the maintenance of verbal material. The process of rehearsal can be blocked by articulatory suppression, i.e., a concurrent articulation of irrelevant verbal material (e.g., "babababa..."). Articulating this syllable involves a minimal cognitive load, but impairs memory performance for verbal information (Camos et al., 2009). The second maintenance mechanism is refreshing. It is primarily dedicated to visual and spatial material, even if the maintenance of verbal material can also rely on refreshing (Grillon et al., 2008; Camos et al., 2009). It enables the maintenance of memory traces through refocusing, i.e., thinking briefly of a just-activated spatial or visual representation.

In the present study, we investigate the role of WM in the construction of episodic memories using an original paradigm in a VE that enables all the components of EM (what, where, when, and binding) to be assessed. We address the question of whether preventing the verbal or visuospatial mechanism of maintenance in WM will have the same effect on the various EM traces of what, where and when. Although various methods have been developed to assess EM, few fully address the original definition. Most of the time, EM is assessed with very simple tasks, e.g., remembering a word shown on a computer screen, which does not match the definition of EM as the ability to remember what, where, and when. Recently, some studies have begun to use a VE to assess episodic memories in ecological large-scale environments allowing a multi-component assessment of EM (Burgess et al., 2001; Sauzéon et al., 2011; Plancher and Piolino, 2017). In Plancher et al.'s studies, the usefulness of VEs has been demonstrated with young adults, healthy elderly and Alzheimer patients. Typically in these studies, participants were immersed in a VE in which they navigated via a video game wheel and followed a route composed of different turns. In addition to navigating, the participants were instructed to memorize all the elements of the scenes that they encountered within the environment, and to remember the temporal and spatial context associated with the elements so that they would be able to recall them at the end of the presentation. Some of the results suggested that assessing EM in a VE is more ecologically valid, because memory complaints were more highly correlated with performance on the virtual test than with performance on the classical memory test (Plancher et al., 2012).

Some previous studies focused on the involvement of WM in spatial cognition using VEs. These studies can be considered as good assessments of the where component of EM. Meilinger et al. (2008) examined the WM involvement in a wayfinding task. Participants learned routes in a VE while they were disrupted by a visual, a spatial or a verbal secondary task. In the visual task, the participants had to imagine a clock with watch hands and indicate if the hands pointed to the same or different halves as the times they had heard. In the spatial task, the participants had to indicate where a sound was coming from (left, right, or front). In the verbal task, the participants had to perform a lexical-decision task. In all secondary tasks, participants received the stimuli via headphones and responded by pressing buttons on a response box. The authors observed that, compared to a control group, all secondary tasks interfered with wayfinding of the routes previously seen, by impacting the encoding of environmental information. The interference was stronger with the visual secondary task. According to the authors, the results indicate that the phonological loop and the visuospatial sketchpad are both involved in the encoding of environmental information. Meilinger et al. (2008) thus put forward a dual coding theory of human wayfinding.

In another virtual reality study, the involvement of WM in the construction of the mental representation of space was investigated (Gras et al., 2013). During route learning, the participants were asked to do a tapping task (tapping four keys sequentially in a parallelogram shape), or an articulatory suppression task (repeating "babebibobu"), or nothing, depending on the condition. Results showed different interference effects depending on the task (layout task vs. recognition of landmarks for example); in addition, the visuospatial abilities of WM modulated performance in the construction of the spatial model of a VE.

As far as we know, however, no experiment has yet assessed the role of WM by distinguishing the verbal and visuospatial subcomponents on different measures of episodic memories, that is, on EM in its entirety, i.e., what, where and when. In the present study, two secondary tasks were used. One focused on the verbal component, thus preventing the verbal rehearsal of episodic traces, while the other one focused on the spatial component, preventing the visuospatial refreshing of episodic traces. In the control condition, participants performed no secondary task.

The rationale of the present study is as follows. If an episodic trace (what, where, or when) relies on verbal maintenance and on the phonological loop, then the verbal secondary task performed during learning is expected to interfere with its encoding and hence result in a poorer recall. If an episodic trace relies on visuospatial representations requiring maintenance by refreshing, and the visuospatial sketchpad, the visuospatial secondary task should interfere also with its subsequent recall. More specifically, we assumed that factual traces (what) representing events and objects that could be easily verbalized should be maintained with verbal rehearsal. However, due to their visual nature they should also be maintained with refreshing. Thus, an interfering effect of both the verbal and the visuospatial concurrent tasks was expected. In contrast, the maintenance of spatio-temporal traces (where and when) and binding is probably less verbal and should be predominantly maintained with refreshing. Thus, mainly – if not only – interference with the visuospatial task was expected on performance reflecting the where, when and binding. The objective of the present study was to test these hypotheses in a more ecological paradigm than the ones traditionally used.

# MATERIALS AND METHODS

# Participants

Eighty-eight undergraduate psychology students at the University (71 females, mean age = 20.32 years; SD = 1.71) received a partial course credit for participating. Each participant was randomly allocated to one of three groups (30 or 28 participants per group). We recorded the frequency with which participants played video games, and whether they had a driver's license. Forty participants had a driver's license and 51 participants regularly played video games. They were equally distributed over the three groups. However, to avoid an effect of familiarity with driving and video games on our results, before the presentation of the experimental environment, all the participants trained themselves on an empty track until they all felt comfortable with the apparatus. All participants gave their informed consent to the study, which was performed in compliance with the Declaration of Helsinki and with the approval of the University's Institutional Review Board.

# Materials

# The Virtual Equipment

The virtual equipment was composed of a computer-generated 3-D model of an artificial environment. This environment was built with Virtools Dev 3.0 software and the novel EditoMem and SimulMem software developed in the lab. The environment was run on a PC laptop computer and explored using a video-game steering wheel, a gas pedal, and a brake pedal. It was projected with a video projector onto a screen 85 cm high and 110 cm wide.

The participants were seated in a comfortable chair. The VE was projected 150 cm in front of them.

# The Virtual Environment

An urban environment simulating French buildings was created. Since the participants were supposed to be sitting in a virtual car, the steering wheel and windshield were part of the images projected during the task (**Figure 1**). In the VE, one route connected ten specific scenes. Each specific scene comprised different elements: one central element (e.g., a newsstand or a sandwich shop) and two or three secondary elements (e.g., a man or a bench). The order in which the ten scenes were encountered (identified by the main element in each scene) was the following: a train station, a newsstand, a post office, a roadworks zone, a fountain, an old building, a parking lot, a sandwich shop, a car accident and a set of shops. Specific areas were located at a turn (**Figure 1**), and a soundtrack of typical city noises (cars, people, birds, etc.) heard through speakers helped the participants to feel immersed in the environment. No other vehicles were presented in the environment and no specific traffic rules had to be respected because, as presented on **Figure 1**, the spatial environment did not contain decision points (i.e., deciding to turn left or right).

Garbage containers were placed on the sidewalks along the road for use in the secondary tasks. In the numerical secondary task, assumed to interfere with the phonological loop, the participants had to memorize the number of garbage containers. The containers were either green or yellow. The participants had to maintain the number of green and yellow garbage containers, respectively (there were six yellow and four green altogether). In the visuospatial secondary task, the participants had to memorize the spatial pattern composed by the garbage containers. They were displayed along a line in order to avoid the verbalization of visual forms such as "a square" or "a T." They had to maintain the spatial arrangement of five containers (e.g., first position: yellow/second position: green/third position: yellow/fourth position: green/fifth position: green) (see **Figure 2**). A total of four patterns was used.

# Procedure

The condition of encoding was manipulated between-subjects. The same VE was used for all conditions. In all three conditions the participants were asked to drive into the town, without stopping, and to memorize all the elements of various scenes encountered in the town (what), along with the associated spatial locations (where) and the temporal context (when). An example scene not actually shown in the experiment was presented as a picture before the exploration, to ensure that the participants understood what they had to memorize: "If you encounter this scene in the virtual town, you have to memorize that there is a bakery, in the beginning of the town, and that this scene was located on a right-hand turn."

Depending on the condition, while driving in the virtual town, the participants were disrupted by a secondary task that was either verbal or visuospatial. In the control condition, no secondary task was given. In the numerical condition, they were asked to memorize the total numbers of yellow and green garbage containers. Participants thus had to update the numbers each time a new garbage container was encountered. In the visuospatial condition, they had to memorize the spatial arrangement of each pattern. The immersion ended when participants reached the edge of the town, which took around 3 min. Participants were informed that in the primary task, which involved the memorization of all the elements of the town, it was not necessary to include the garbage containers. Participants were instructed that both tasks, the primary and the secondary, were of equal importance.

Immediately after the immersion, the participants performed a recall test that assessed their performance in the secondary task. The participants of the verbal condition had to recall the total number of green and of yellow garbage containers. Participants of the visuospatial condition had to draw on a blank sheet the specific patterns in which the green and yellow containers were arranged. All recall tasks took 3 min. During this time, the participants of the control condition chatted with the experimenter.

After this first recall, we evaluated the participant's performance in the EM test. In this test, we used a series of memory tests previously applied to assess EM with the same kind of paradigm (Plancher et al., 2010, 2012, 2013). Participants were required to perform a written free recall of all the elements they remembered, and when they remembered an element, they had to spell out the associated spatiotemporal context. The instructions were associated with an example as follows (each dependent variable is in brackets, associated with the maximum score):




The experimenter noted all recalls on a structured grid of responses. We did not take into account the recall of secondary elements (e.g., bench, tree, person) because they are too generic in a town and thus do not reflect EM; we focused only on central elements (e.g., newsstand, train station). There was no specific order in the recall of the components. Once the element had been recalled, the participants could then provide contextual recall in any order. In total, 5 min were allowed for the recall.

In addition, we computed a binding score. For each element recalled, we noted whether the participants recalled the associated components (when and/or where). For example, if they recalled "the post office," did they recall where and when it was presented (max by item = 2)? The binding score for a subject was the sum of all the contextual recalls (number of bindings correctly recalled; max = 20).
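The scoring rule described above can be illustrated with a minimal Python sketch. The `Recall` record and its field names are hypothetical and only serve the illustration; the sketch assumes that each recalled central element contributes one point for a correct "where" and one point for a correct "when", which matches the stated maximum of 2 per item.

```python
from dataclasses import dataclass
from typing import List

MAX_BINDINGS_PER_ITEM = 2  # one "where" plus one "when" per recalled element


@dataclass
class Recall:
    """One recalled central element (hypothetical record layout)."""
    element: str
    where_correct: bool
    when_correct: bool


def binding_score(recalls: List[Recall]) -> int:
    """Sum the contextual details correctly recalled across all elements."""
    return sum(int(r.where_correct) + int(r.when_correct) for r in recalls)


# Illustrative data, not taken from the study.
example = [
    Recall("post office", where_correct=True, when_correct=True),
    Recall("bakery", where_correct=True, when_correct=False),
    Recall("train station", where_correct=False, when_correct=False),
]
print(binding_score(example))
```

With a maximum of 2 per item, the overall maximum of 20 corresponds to ten scored elements.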

Performance in the secondary tasks was expressed as a percentage of correct responses (with 100% for all participants performing the control condition).

Prior to the beginning of the experiment, all participants underwent a training session in an empty environment (i.e., only streets) with a different spatial layout from that of the town subsequently used for the test. They were free to navigate anywhere on the training track. This training session provided the participants with an initial experience of a VE, and familiarized them with control of the virtual car. This session lasted until participants felt familiar with the equipment (on average around 4 min). After the training, the participants were immersed in the VE. The entire experiment lasted around 25 min, including the instructions.

We made the following hypotheses: while the number of what recalls should decrease with both secondary tasks, the number of where, when and binding should only decrease with the visuospatial secondary task.

# RESULTS

Analyses were performed on the recall of each EM score (what, when, where and binding) through a series of ANCOVAs with the condition (verbal, visuospatial, no secondary task) as a between-subjects factor and the performance in the secondary task as a covariate. We controlled for this performance in order to avoid any influence of task difficulty; the performance was expressed as a percentage of correct responses (with 100% for all participants performing the control condition). To determine the direction of the differences, we carried out post hoc Tukey tests. The following Tukey comparisons were analyzed: condition 1 (control) versus 2 (verbal secondary task) and condition 1 versus 3 (visuospatial secondary task). When the verbal or visuospatial secondary task condition led to poorer performance than the control condition, this was interpreted as reflecting an involvement of the corresponding component in the memorization of the episodic score. When both secondary task conditions differed statistically from the control condition, we performed the following Tukey comparison: condition 2 (verbal) versus condition 3 (visuospatial).
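The covariate coding and the order of the post hoc comparisons can be summarized in a short Python sketch. It does not run the ANCOVA or the Tukey tests themselves; the function names, the condition labels, and the significance threshold `alpha=0.05` are assumptions made for illustration and are not specified in the text.

```python
from typing import List


def secondary_task_score(condition: str, n_correct: int, n_total: int) -> float:
    """Covariate: percentage of correct responses in the secondary task.

    Control participants had no secondary task and are coded as 100%.
    """
    if condition == "control":
        return 100.0
    return 100.0 * n_correct / n_total


def planned_comparisons(p_control_vs_verbal: float,
                        p_control_vs_visuospatial: float,
                        alpha: float = 0.05) -> List[str]:
    """List the comparisons to report, given the two control contrasts.

    The verbal versus visuospatial contrast is added only when both
    secondary-task conditions differ from control (alpha is assumed).
    """
    comparisons = ["control vs verbal", "control vs visuospatial"]
    if p_control_vs_verbal < alpha and p_control_vs_visuospatial < alpha:
        comparisons.append("verbal vs visuospatial")
    return comparisons


# Illustrative p-values, not taken from the study.
print(planned_comparisons(0.01, 0.03))
print(planned_comparisons(0.40, 0.02))
```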

**Table 1** shows correct recall of the EM components with means and standard deviations by condition and the results of ANCOVAs and post hoc Tukey tests. A main effect of condition on the What recall was observed: as expected, participants performed better in the control condition (1) than in the other two (2 and 3) (**Figure 3**). This result suggests that memory traces related to central information can be maintained through verbal rehearsal or refreshing. Similarly, a significant effect of condition on the When recall was observed, but as expected only with the "visuospatial" condition (3) giving worse results than the control condition (1), suggesting that the temporal context is maintained through refreshing. Contrary to our predictions, no significant effect was observed on the Where score, which suggests that this score did not rely on WM maintenance. Finally, an effect of condition on Binding indicated that the "visuospatial" condition (3) gave worse results than the control condition (see **Figure 4**).

TABLE 1 | Means and standard deviations of episodic scores for the various experimental conditions and results of ANCOVAs and post hoc Tukey tests.<sup>1</sup>

# DISCUSSION

In daily life, we are continuously faced with a long list of cognitive demands that must often be met simultaneously. Often this means storing long-term memories while performing short-term tasks. Our aim in this experiment was to use a VE in order to test the role of WM while encoding episodic long-term memories in a naturalistic context. In particular, we tried to determine which component of WM is involved in the encoding of EM traces, distinguishing what, where, when and binding. Three main findings arose from our results. First, we observed that the memory of central information (what) was impaired by both numerical and visuospatial concurrent tasks. Second, the memory of temporal context and binding was impaired only when a visuospatial concurrent task was performed. Third, the spatial contextual recall was not influenced by any concurrent task.

If we assume that the concurrent verbal and visuospatial tasks we used mainly prevent rehearsal and refreshing, respectively, our results indicate that central information is likely maintained by both verbal rehearsal and refreshing, whereas temporal and binding information are mainly maintained by refreshing. According to Baddeley's model, the phonological loop is involved in the maintenance of verbal information (Baddeley, 1986). In most of the classical studies investigating the phonological loop, the material to be remembered in the primary task was letters or isolated words, items that are clearly verbal. In our study, the central information concerns objects, buildings and events encountered in our virtual city, which could be reactivated in WM either as images or as words. These items could be easily named (e.g., a train station, a post office) and thus maintained through verbal rehearsal. Participants were instructed to intentionally memorize items encountered in the virtual town, as well as their context. It is thus likely that participants verbally rehearsed the names of previously seen items as soon as they could, in order to prevent the traces from decaying. We also observed that memorization of the central information was negatively affected when participants performed the visuospatial secondary task. As central items were presented visually, it is not surprising that the memory of central items also relied on visuospatial maintenance. This is consistent with studies demonstrating that both maintenance mechanisms (verbal rehearsal and refreshing) can be run in parallel (e.g., Camos et al., 2009).

<sup>1</sup> Results of the covariate: for what, when, where, and binding all F were <1.

In addition, it is interesting to observe that the encoding of contextual long-term memories does not seem to rely heavily on verbal rehearsal, since reducing verbal rehearsal through a verbal memory task did not influence contextual memory performance. This was true even though participants were instructed to encode the context. They could have developed a verbal maintenance strategy (e.g., "the newsstand was on my left when I turned"), but apparently they did not. It seems that verbal strategies are not useful in the consolidation of contextual memories. Given that only the visuospatial secondary task prevented the encoding of contextual memory, refreshing seems to be the predominant mechanism of maintenance in that case. It is likely that verbal maintenance of contextual information is too costly and that maintaining different scenes through mental imagery is a more efficient strategy.

The memory of temporal and spatiotemporal binding information appears to be impaired only by visuospatial maintenance, but the memory of the spatial component itself was not influenced by the visuospatial secondary task. This component was assessed by asking the participants to remember whether they turned left or right after the element they recalled. This spatial recall is an egocentric one, given that participants probably called upon their own body to answer. Egocentric processes are known to be viewpoint dependent, and egocentric locations are updated by self-motion information (Burgess, 2006), and even across imagined self-motion (e.g., Burgess et al., 2004). In the present study, the visuospatial secondary task appeared rather to be allocentric, in that it required participants to call upon external elements of the environment to encode the positions of the garbage containers. This would explain why our spatial secondary task did not interfere with our primary task. This interpretation is consistent with the findings of Farrell and Thomson (1998) and Farrell and Robertson (2000), which suggest that spatial egocentric information is automatically encoded through displacement in the VE and does not need to be maintained in WM. However, the high standard deviation of the control group may explain why statistical differences between groups were difficult to detect. This result should therefore be interpreted with caution.

Nevertheless, spatiotemporal binding information was negatively affected by the concurrent visuospatial maintenance. Loaiza and McCabe (2012) showed that refreshing is important for content-context binding in WM, and observed that the more refreshing opportunities an item receives, the more likely it is to be recalled from EM. These results are consistent with our findings suggesting that refreshing promotes memory of context and binding of EM traces. In the study by Loaiza and McCabe (2012), the context associated with central information was temporal in nature. They concluded that an item would be more stably bound to a temporal context when it is refreshed. In addition, previous work demonstrated that the memory of temporal order in WM was maintained by using spatial mechanisms (e.g., Guida and Lavielle-Guida, 2014). For example, it seems that to-be-remembered items presented sequentially in the center of a screen acquire a spatial dimension: the first words of the sequence have a left spatial value while the last words have a right spatial value (van Dijck and Fias, 2011). Our present findings, which suggest that encoding of temporal memories was disrupted by a visuospatial concurrent task, are in accordance with these studies.

Our study presents some limitations, and these should be taken into account in future studies. In order to extend our knowledge of the mechanisms of maintenance involved in EM construction, in future work we could prevent and force rehearsal and refreshing more systematically. For example, we could continuously prevent verbal rehearsal by using articulatory suppression (e.g., repeating "babababa"), or we could prevent attentional refreshing with a continuous auditory detection task. In that way we could separate the primary from the secondary task. In the present study, both the primary and the secondary tasks involved items presented in the VE. We cannot exclude the possibility that participants combined the two tasks. In addition, because the recall for the secondary task was performed before that for the primary task, it could have influenced the recall of interest. In the future it would be important for the secondary task not to involve items of the primary task. In addition, to further improve the ecological validity of our assessment, the participants of the EM investigation should not receive any explicit instructions to memorize the episodic information (Pause et al., 2013). Finally, it could also be relevant to assess to what extent the degree of interaction between the VE and the participants, using more immersive virtual navigation and virtual embodiment (Kilteni et al., 2012), mediates the encoding mechanisms of EM. It is conceivable that greater immersion would give a stronger EM. Also, in our paradigm, participants drove a virtual car, which constituted a third task. It would be interesting to compare our results with those obtained when participants perform passive navigation in the VE. The negative impact of the concurrent task could be reduced in that condition.

# CONCLUSION

Using an original memory paradigm, our results demonstrate for the first time that preventing verbal maintenance through a concurrent task negatively impacts long-term memory of central information, while preventing visuospatial maintenance decreases central, temporal, and binding memory. WM thus appears central to consolidating EM reflecting everyday life, and this maintenance is suggested to occur predominantly through the episodic buffer. Finally, as already demonstrated in long-term memory (Plancher et al., 2008, 2010, 2012, 2013), the ecological feature of a paradigm developed using VEs provides an excellent opportunity for investigating EM in its complexity.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of APA, with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the committee of Paris Descartes University.

# AUTHOR CONTRIBUTIONS

Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work: GP, VG, and PP. Drafted the work or revised it critically for important intellectual content: GP, VG, and PP. Final approval of the version to be published: GP, VG, and PP. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: GP, VG, and PP.

# ACKNOWLEDGMENTS

The authors would like to thank Nayla Debs and Julie Santo for testing participants, as well as all the participants of this study.

# REFERENCES


Atkinson, R. C., and Shiffrin, R. M. (1968). "Human memory: a proposed system and its control processes," in The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 2, eds K. W. Spence and J. T. Spence (New York, NY: Academic Press), 89–195.

Baddeley, A. (1986). Working Memory. Oxford: Oxford University Press.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Plancher, Gyselinck and Piolino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Peripersonal Space: An Index of Multisensory Body–Environment Interactions in Real, Virtual, and Mixed Realities

*Andrea Serino<sup>1,2</sup>\*, Jean-Paul Noel<sup>3</sup>, Robin Mange<sup>1</sup>, Elisa Canzoneri<sup>1</sup>, Elisa Pellencin<sup>4</sup>, Javier Bello Ruiz<sup>1</sup>, Fosco Bernasconi<sup>1</sup>, Olaf Blanke<sup>1,5</sup> and Bruno Herbelin<sup>1</sup>*

*<sup>1</sup>Laboratory of Cognitive Neuroscience, Center for Neuroprosthetics and Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, <sup>2</sup>Department of Clinical Neuroscience, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland, <sup>3</sup>Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, United States, <sup>4</sup>Department of Cognitive Science and Psychology, University of Trento, Trento, Italy, <sup>5</sup>Department of Neurology, Université de Genève, Geneva, Switzerland*

#### *Edited by:*

*Mel Slater, University of Barcelona, Spain*

#### *Reviewed by:*

*Alessandro Farne, Institut National de la Santé et de la Recherche Médicale, France*

*Mar Gonzalez-Franco, Microsoft Research, United States*

#### *\*Correspondence:*

*Andrea Serino andrea.serino@unil.ch*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in ICT*

*Received: 31 October 2017; Accepted: 20 December 2017; Published: 22 January 2018*

#### *Citation:*

*Serino A, Noel J-P, Mange R, Canzoneri E, Pellencin E, Ruiz JB, Bernasconi F, Blanke O and Herbelin B (2018) Peripersonal Space: An Index of Multisensory Body–Environment Interactions in Real, Virtual, and Mixed Realities. Front. ICT 4:31. doi: 10.3389/fict.2017.00031*

Human–environment interactions normally occur in the physical milieu, and thus through the medium of the body and within the space immediately adjacent to and surrounding the body, the peripersonal space (PPS). However, human interactions increasingly occur with or within virtual environments, and hence novel approaches and metrics must be developed to index human–environment interactions in virtual reality (VR). Here, we present a multisensory task that measures the spatial extent of human PPS in real, virtual, and augmented realities. We validated it in a mixed reality (MR) ecosystem in which real environment and virtual objects are blended together in order to administer and control visual, auditory, and tactile stimuli in ecologically valid conditions. Within this mixed-reality environment, participants are asked to respond as fast as possible to tactile stimuli on their body, while task-irrelevant visual or audiovisual stimuli approach their body. Results demonstrate that, in analogy with observations derived from monkey electrophysiology and in real environmental surroundings, tactile detection is enhanced when visual or auditory stimuli are close to the body, and not when far from it. We then calculate the location where this multisensory facilitation occurs as a proxy of the boundary of PPS. We observe that mapping of PPS *via* audiovisual, as opposed to visual alone, looming stimuli results in sigmoidal fits (allowing for the bifurcation between near and far space) with greater goodness of fit. In sum, our approach is able to capture the boundaries of PPS on a spatial continuum, at the individual-subject level, and within a fully controlled and previously laboratory-validated setup, while maintaining the richness and ecological validity of real-life events. The task can therefore be applied to study the properties of PPS in humans and to index the features governing human–environment interactions in VR or MR. We propose PPS as an ecologically valid and neurophysiologically established metric in the study of the impact of VR and related technologies on society and individuals.

Keywords: virtual reality, mixed reality, peripersonal space, multisensory integration, body, self

# INTRODUCTION

The manner in which the brain integrates information from different senses in order to boost perception and guide actions is a major research topic in cognitive neuroscience (Calvert et al., 2004; Spence and Driver, 2004; Stein, 2012) and a topic of increasing interest in the design of virtual environments. Multisensory integration of bodily inputs, in particular, has been recently proposed as a key mechanism underlying the experience of oneself within a body, which is perceived as one's own (body ownership), which occupies a specific location in space (self-location), and from which the external world is perceived (first-person perspective), i.e., the different components of what has been called bodily self-consciousness (Blanke and Metzinger, 2009; Blanke, 2012; Blanke et al., 2015). Accordingly, the manipulation of bodily inputs has been used to induce the feeling that an artificial or virtual body is one's own and to generate the sensation of being located within a virtual environment (Tsakiris, 2010; Blanke, 2012; Ehrsson, 2012; Serino et al., 2013; Noel et al., 2015b; Salomon et al., 2017). These findings thus highlight the particularly relevant role of bodily inputs for virtual reality (VR) (Herbelin et al., 2016). Multisensory integration of bodily-relevant inputs naturally happens within a limited space immediately surrounding the body, where external stimuli can come into direct contact with the body, i.e., the peripersonal space (PPS; **Figure 1**; Rizzolatti et al., 1997; Ladavas, 2002; Graziano and Cooke, 2006). PPS has been suggested to index the self-space (Blanke et al., 2015; Noel et al., 2015b, 2017; Salomon et al., 2017) and to represent the space wherein the individual interacts with external stimuli. Evolutionarily, until very recently, all direct body–object interactions have been experienced within a physical PPS. However, as human interactions are increasingly occurring not only within the real, but also within virtual or mixed realities, it is interesting to study and characterize how PPS is represented in VR (see Iachini et al., 2016, for a recent delineation of interpersonal space in virtual and real environments). Here, we propose and demonstrate that it is possible to delineate and measure a representation of PPS within virtual and mixed reality (MR) environments.

Several lines of work in neurophysiology and neuroimaging have shown that PPS representation is implemented by specific neuronal populations, which selectively integrate tactile stimuli on the body with visual or auditory cues related to external objects as they approach the body (Ladavas and Serino, 2008; Macaluso and Maravita, 2010; Cléry et al., 2015, 2017). In this manner, the brain builds a representation of spatial locations in the environment where body–object points of contact may potentially occur (Cléry et al., 2015, 2017), a mechanism which is postulated to be fundamental for defensive as well as for approaching behaviors (Cléry et al., 2015; de Vignemont and Iannetti, 2015). In monkey neurophysiology, PPS has been studied by measuring the response properties of multisensory neurons, mainly located in the ventral premotor cortex (Rizzolatti et al., 1981; Graziano et al., 1997) and the posterior parietal cortex (Duhamel et al., 1998; Avillac et al., 2005). These neurons respond to tactile stimulation on a particular part of the animal's body (hand, face, and trunk most commonly), as well as to visual or auditory stimuli presented close to the same body part (see **Figure 1**). Importantly, these neurophysiological recordings suggest that neurons encoding PPS representations are responsive solely when the exteroceptive sensory stimulus is close to the body, but not when auditory or visual stimuli are presented far from it. Additionally, these neurons are most responsive to moving, as opposed to static, stimuli (Fogassi et al., 1996). The extent of PPS is defined by the size of the multisensory receptive fields of this particular class of multisensory neurons.

Figure 1 | [...] different parts of the animal's body and visual and/or auditory receptive fields extending for a few centimeters in space around the same body part (A). This way PPS neurons respond to an external stimulus as a function of its distance from the animal's body, depending on the extent of their multisensory receptive fields (B). In humans, an analogous multisensory system representing the PPS around different body parts has been described (C), so that visual and/or auditory stimuli more strongly interact with tactile processing depending on their distance from the stimulated body part (D). The farthest distance evoking significant multisensory interaction is considered a proxy of the boundaries of PPS in humans.

Directly inspired by the monkey neurophysiology work, we have developed a psychophysical experimental task to measure behaviorally in humans the extent of PPS around the different parts of the body (Canzoneri et al., 2012; Teneggi et al., 2013; Serino et al., 2015a). This approach has been extensively used in neuroscience research in order to investigate different properties of human PPS (Canzoneri et al., 2013a; Bassolino et al., 2014; Taffou and Viaud-Delmon, 2014; Ferri et al., 2015a,b; Galli et al., 2015; Noel et al., 2015b; Serino et al., 2015a; Kandula et al., 2017; Pellencin et al., 2017; Salomon et al., 2017). In this task, participants are requested to respond as fast as possible to a tactile stimulus administered on a given body part, while a task-irrelevant auditory or visual stimulus, which they are instructed to ignore, is presented approaching along the frontal plane at different distances from the participant's body. Taken together, the results of the abovementioned array of experiments demonstrate that tactile reaction times (RTs) speed up as sounds or visual stimuli are presented closer to the body. Further, and critically, the speeding up of tactile detection as a function of the distance of exteroceptive stimuli from the body is not linear, but sigmoidal. Thus, there is a veritable inflexion point: if auditory or visual stimuli are presented within a given spatial range, tactile detection is facilitated, and it is possible to identify the spatial range wherein this multisensory facilitation occurs. Since the main property of the PPS system is the integration of tactile processing with external stimuli when these occur within the PPS (Maravita et al., 2003), the critical distance at which the sound or visual stimuli speed up tactile RTs is taken as a proxy of PPS extension. This measure has been reliably used to study precisely the extent of an individual's PPS (Ferri et al., 2015a,b; Serino, 2016), its plastic and dynamic modification following different kinds of sensory manipulations (Canzoneri et al., 2013b; Ferri et al., 2015a,b; Noel et al., 2015a,b; Serino et al., 2015b; Patané et al., 2016) and following interactions, such as social interactions (Teneggi et al., 2013; Iachini et al., 2014; Pellencin et al., 2017).
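To make the logic of this measure concrete, the following Python sketch fits a sigmoid to mean tactile RTs as a function of stimulus distance and reads off its central point as a proxy of the PPS boundary. It is a minimal illustration under stated assumptions, not the authors' analysis: the four-parameter logistic form, the fixing of the two asymptotes to the fastest and slowest observed mean RTs, the coarse grid search, and all names and data values are assumptions introduced here.

```python
import math
from typing import Dict, Sequence


def sigmoid_rt(d: float, rt_near: float, rt_far: float,
               center: float, slope: float) -> float:
    """Assumed logistic: RT tends to rt_near close to the body, rt_far far away."""
    return rt_near + (rt_far - rt_near) / (1.0 + math.exp(-(d - center) / slope))


def fit_pps_boundary(distances: Sequence[float],
                     mean_rts: Sequence[float],
                     n_centers: int = 201,
                     n_slopes: int = 50) -> Dict[str, float]:
    """Grid-search the central point and slope that minimize squared error."""
    lo, hi = min(distances), max(distances)
    if hi <= lo:
        raise ValueError("at least two distinct distances are required")
    span = hi - lo
    # Asymptotes fixed to the observed extremes (a simplifying assumption).
    rt_near, rt_far = min(mean_rts), max(mean_rts)
    best = None
    for i in range(n_centers):
        center = lo + span * i / (n_centers - 1)
        for k in range(1, n_slopes + 1):
            slope = span * k / 100.0  # slopes scaled to the tested range
            sse = sum((sigmoid_rt(d, rt_near, rt_far, center, slope) - rt) ** 2
                      for d, rt in zip(distances, mean_rts))
            if best is None or sse < best[0]:
                best = (sse, center, slope)
    sse, center, slope = best
    mean_rt = sum(mean_rts) / len(mean_rts)
    sst = sum((rt - mean_rt) ** 2 for rt in mean_rts)
    r_squared = 1.0 - sse / sst if sst > 0 else float("nan")
    return {"center": center, "slope": slope, "r_squared": r_squared}


# Hypothetical mean RTs (ms) at seven distances (m) from the body.
distances = [0.2, 0.5, 0.8, 1.1, 1.4, 1.7, 2.0]
mean_rts = [352.0, 355.0, 371.0, 394.0, 399.0, 401.0, 400.0]
print(fit_pps_boundary(distances, mean_rts))
```

The returned `center` plays the role of the critical distance, and `r_squared` is one possible goodness-of-fit index for comparing conditions. In practice a nonlinear least-squares routine would usually replace the grid search.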

The PPS measurement task, originally developed to measure audiotactile interactions, has been adapted to a visuotactile version using 3D computer graphics and head-mounted displays (HMDs) in order to present dynamic visual stimuli (Herbelin et al., 2015; Serino et al., 2015a). Here, we describe the most recent evolution of the task based on MR where real environment and virtual objects are blended. This technology allows for the administration and control of visual, auditory, and tactile stimuli, while participants can see online their own body immersed in a highly realistic prerecorded panoramic capture of a real environment. MR provides us, and cognitive, social, and behavioral scientists generally, with the ability to empirically study the interface between the user's body and the environment. This technology equally permits the freedom to experimentally decide whether the utilization of a virtual or a real environment and/or body is most desirable, or even whether some mixture between the real and virtual is most appropriate.

In this document, after introducing the general setup of the PPS task in its visuotactile version, we present how the MR setup allows delineating the PPS of participants in a trimodal condition (audiovisuotactile). That is, we query whether the bifurcation of near and far space is better defined, in terms of goodness of fit, when further exteroceptive input is administered. In this manner, we query whether the representation of PPS may differentiate between the real environment (where all naturalistic sensory cues are presented), mixed realities (where some naturalistic sensory cues may be present), and virtual environments (where the sensory periphery has no access to the real world). In the experiments reported below, the real body is always rendered within an environment composed of real contextual cues and virtual objects. In a first experiment (Experiment 1), we present visual looming stimuli, while in the second experiment (Experiment 2), we use audiovisual dynamic stimuli. In both cases, stimuli are combined with tactile stimulation. The results show that the PPS task is able to capture the boundaries of the multisensory PPS at the individual level, in a fully controlled and previously laboratory-validated setup, and, for the first time, while maintaining the richness and ecological validity of real-life situations. In addition, results suggest that the utilization of the trimodal version of the task, as opposed to the bimodal, allows for the most reliable delineation of PPS (vis-à-vis goodness of fit), further highlighting the necessity to employ ecologically valid and multisensory scenarios, be it in the real or a virtual environment.

# MATERIALS AND METHODS

# Technological Components

We developed a mixed-reality technology for simulating real and/or virtual environments in first-person perspective based on the omnidirectional capture and recording of visual and auditory stimuli. This approach involves two phases: first the capture and then the re-experiencing. For capturing the scene, several cameras and microphones are assembled to cover the entire sphere of perception around a viewpoint (360° horizontal and vertical stereoscopic vision, horizontal panoramic binaural audio). The panoramic video environment is captured using seven pairs of GoPro Hero4 cameras placed in a spherical rig (3D 360hero 3DH3PRO14H, 6.3 cm intercamera distance per pair) and stitched into two large panoramic videos. Four pairs of binaural microphones (3DIO Omni Binaural Microphone) are used to capture binaural audio in four directions. Our in-house MR software (RealiSM, http://lnco.epfl.ch/realism) then aggregates all data into a single high-resolution panoramic and stereoscopic audiovisual custom format (one panorama per eye, acoustic interpolation of binaural audio between directions). For the re-experiencing phase, VR devices such as an HMD (Oculus Rift DK2; 960 × 1,080 per eye at 75 Hz, ~105° FOV diagonal, ~85° FOV horizontal) and stereophonic noise-canceling headphones (BOSE QC15) are used to immerse subjects into the scene. Importantly, a head-mounted stereoscopic depth camera (Duo3D MLX, 752 × 480 at 56 Hz) fixed on the HMD captures the user's body from a first-person perspective, and the stereoscopic image of the body is merged into the virtual scene in replacement of the real body (see **Figure 2**).

The resulting rendered scene highly resembles the recorded scene and the subjects experience seeing themselves (and not a 3D avatar) teleported there. Only head rotations around the captured viewpoint are however possible, and placing subjects in a sitting position is therefore preferable. Any kind of virtual multimedia object can also be merged into the scene, allowing fully controlled presentation of sensory stimuli.

The control of experimental flow, the synchronization between tactile, visual, and auditory stimuli, and the recording of responses were implemented in our custom software ExpyVR. This software provides a graphical user interface (Qt4) and scripting capabilities (Python 2.7) to drive all the equipment used in the PPS task. It is freely available online at http://lnco.epfl.ch/expyvr.

#### PPS Measurement Components: Tactile Stimuli

For all experiments presented here, tactile stimuli of 10 ms are delivered to the participant's right cheek by means of a mechanical solenoid controlled *via* a stimulator (MSTC-3 tappers, M&E Solve).

#### PPS Measurement Components: Acoustic Stimuli

Prior work from our group has extensively used 2-speaker (Canzoneri et al., 2012) and 16-speaker (Galli et al., 2015; Noel et al., 2015b; Serino et al., 2015a) setups expressly developed for the measurement of PPS boundaries. The 16-loudspeaker setup is an audio rendering system composed of two uniform linear arrays of eight loudspeakers each (JBL Control 1 Pro WH Pair, M-Audio FastTrack Ultra 8R), which simulates a white noise sound source perceived midway between the two loudspeaker rows and approaching from 2 m away up to the position of the participant (see **Figure 3A**). The dynamic nature, intensity, and origin of the sound are manipulated by software acoustic simulations, and the algorithm governing the placement of the virtual sound source has been previously detailed (Serino et al., 2015a). Here, in order to adapt the acoustic stimuli for a VR setup, we developed a headphones version of the task whereby sounds generated *via* the abovementioned 16-speaker setup are recorded with binaural microphones (3Dio Omni Binaural) and replayed with stereo headphones (see **Figure 3B**). This version of the setup, hence, allows for the measurement of PPS *via* an audiotactile paradigm without the necessity for the large array of speakers (see **Figure 1** in Galli et al., 2015).

Figure 2 | The RealiSM technology. Reality substitution combines the features of classical virtual reality with 360° video and audio capturing, thus offering extended capabilities: stereoscopic rendering, binaural panoramic audio, merging of virtual objects, and integration of first-person perspective stereoscopic video images of the body in the video environment. (Written and informed consent has been obtained from the depicted individual for the publication of their identifiable image.)

Figure 3 | Peripersonal space (PPS) experimental setup. (A) In the audiotactile version of the task, auditory looming sounds were presented by placing participants between two arrays of eight speakers (2 m of longitudinal distance, and 50 cm between participant midline and each array of speakers in the horizontal plane). The stimulus generated by the loudspeakers spatialized a broadband sound source moving at a constant speed. (B) In order to create a portable version of the audiotactile task, the RealiSM technology was used to prerecord the sounds from the location of an ideal participant, by means of the 360° audio-capturing system; those sound tracks were then played back to actual participants by means of stereophonic headphones. (C) In the visuotactile version of the task, looming virtual stimuli are presented visually by means of an HMD, superimposed on an online recording (or prerecorded video) of the external environment and of the participant's body within the scene. (D) In the trimodal, audiovisuotactile version of the task, both virtual visual (through the HMD) and auditory (by means of headphones) stimuli are simultaneously presented [combining (B,C)].

#### PPS Measurement Components: Visual Stimuli

The PPS task can equally be extended to the visual modality (Pellencin et al., 2017). In this case, a three-dimensional virtual tennis ball looming toward the participant's face is used as the visual stimulus (**Figure 3C**). This ball travels 2 m in virtual space at a velocity of 75 cm/s until making fictive contact with the participant's face. The virtual ball is superimposed on the recording of the environment, and the images are presented on the HMD.
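Because the ball moves at a constant velocity, the distance between ball and face at any moment follows directly from the elapsed time. The following is a minimal Python sketch of that relation, assuming a 200 cm starting distance measured along the approach path; the helper name `ball_distance_cm` and the sample times are illustrative and are not part of ExpyVR or of the study's actual delays.

```python
START_DISTANCE_CM = 200.0   # ball starts 2 m from the face (from the text)
VELOCITY_CM_PER_S = 75.0    # constant approach velocity (from the text)


def ball_distance_cm(elapsed_ms: float) -> float:
    """Distance between ball and face after `elapsed_ms` of approach.

    Hypothetical helper: clamps at 0 cm once the ball has reached the face.
    """
    travelled = VELOCITY_CM_PER_S * elapsed_ms / 1000.0
    return max(START_DISTANCE_CM - travelled, 0.0)


# Time needed to cover the full 2 m at 75 cm/s.
time_to_contact_ms = START_DISTANCE_CM / VELOCITY_CM_PER_S * 1000.0

for t in (0, 1000, 2000):  # illustrative times, not the study's delays
    print(t, "ms ->", ball_distance_cm(t), "cm")
print("time to contact:", round(time_to_contact_ms), "ms")
```

Later tactile delays therefore correspond to nearer ball positions, which is the negative linear mapping between time and distance that the task relies on.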

# Participants

Here, we report two datasets, respectively, from 27 students (11 females, mean age 24 years) and 26 students (12 females, mean age 23 years) from the Ecole Polytechnique Federale de Lausanne (EPFL), who participated in the visuotactile (Experiment 1) and in the trimodal audiovisuotactile (Experiment 2) versions of the experiment. Subjects were right-handed, with normal or corrected-to-normal eyesight, normal hearing, and no history of neurological or psychiatric disease. All participants received monetary compensation for their time (20 CHF/h) and gave their informed and written consent to take part in this study, which was approved by the ethics committee of the Brain and Mind Institute of the EPFL.

# Experimental Design

Seventy percent (70%) of trials are experimental multisensory trials (visuotactile in Experiment 1 and audiovisuotactile in Experiment 2) in which participants see a moving ball (and, in Experiment 2, also hear a sound) approaching them. At a given moment in time (hereafter *T*), they receive the tactile target stimulus. Participants are requested to respond to touch as rapidly as possible *via* button press. When subjects are presented with looming stimuli (which by definition start far away and come closer to the participant over time), the temporal and spatial dimensions of the stimuli map onto each other negatively and linearly: the longer the temporal delay, the shorter the distance. That is, D1 and D2, respectively, correspond to the last and penultimate temporal delays, and so forth.

Ten percent (10%) of trials are unimodal visual trials in which only the virtual ball is presented and no tactile stimulus is given. Given the task instructions, these are catch trials, and participants are to withhold their response. Catch trials are important in order to avoid entrainment of an automatic motor response and to ensure that participants are attentive to the task. They also allow measuring false positives and reducing temporal expectancy effects (Kandula et al., 2017).

Because the aim of the task is to identify the farthest distance from the body (D) at which visual stimuli significantly speed up tactile processing, that is, the distance at which visuotactile RTs become significantly faster than responses to tactile stimulation alone, the task also includes 20% of unimodal tactile trials in which a vibrotactile target stimulus is delivered in the absence of visual stimulation. Unimodal tactile trials are considered baseline trials and are used to show a multisensory facilitation effect on tactile RT due to visual or audiovisual stimuli presented within the PPS, as compared to the RT to unimodal tactile stimuli. Note that we denote the PPS effect (namely, the facilitation of tactile RTs *via* exteroceptive sensory modalities presented near the body) as a multisensory facilitation effect, and not as indexing multisensory integration, as statistical summation or sensory binding is not indexed here.

# Procedure

Upon arrival at the laboratory, a tactile stimulator is placed on the participant's face (right cheek). Subjects are informed that they will feel a tactile vibration and that their task is to respond as accurately and rapidly as possible to this tactile stimulation. Participants are equally informed that there will be task-irrelevant visual (Experiment 1) or audiovisual (Experiment 2) stimuli that will approach them. Finally, participants are informed that in some trials (catch trials) only visual stimuli without tactile stimulation will be presented, while in other trials (baseline trials) only a tactile vibration will be administered (see above for the breakdown of trials).

On each trial, the tactile target stimulus is delivered at a different delay from the moment the trial starts; thus, in the multisensory trials, the tactile stimulus is processed when the visual (or audiovisual) stimulus is perceived as being at different distances from the participant (see Serino et al., 2015a, **Figure 1**, for evidence that approaching auditory stimuli within this context are localized by participants as closer the longer the stimuli have loomed). In the case of Experiment 1, the visual stimulus approached the participant's face at a constant speed of 75 cm/s and was presented for 2,600 ms. Following the end of the visual stimulus movement, the ball remained on screen for 400 ms, followed by 500 ms of no stimulation. A fixation cross was presented for 1,200 ms in between trials, and the ball initiated its approach toward the participant's face 300 ms after offset of the fixation cross. Each trial lasted 5,000 ms. In this experiment, six different temporal delays were used for unimodal and bimodal conditions. In total, Experiment 1 consisted of 300 trials (36 trials per delay for the multimodal condition, randomly intermingled with 8 unimodal tactile trials per delay and 36 unimodal visual trials). Trials were equally divided into four blocks of 75 trials, lasting approximately 7 min each.
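The timing and trial counts given for Experiment 1 can be checked with a few lines of arithmetic. The sketch below only restates numbers from the paragraph above; the dictionary keys are illustrative labels, not identifiers from ExpyVR.

```python
# Durations of the phases of one Experiment 1 trial, in ms (from the text).
TRIAL_PHASES_MS = {
    "fixation_cross": 1200,
    "gap_before_ball": 300,
    "ball_approach": 2600,
    "ball_static": 400,
    "no_stimulation": 500,
}

N_DELAYS = 6
TRIALS = {
    "multisensory": 36 * N_DELAYS,       # 36 visuotactile trials per delay
    "unimodal_tactile": 8 * N_DELAYS,    # 8 baseline trials per delay
    "unimodal_visual_catch": 36,         # catch trials
}

trial_ms = sum(TRIAL_PHASES_MS.values())
total_trials = sum(TRIALS.values())
trials_per_block = total_trials // 4
block_minutes = trials_per_block * trial_ms / 1000 / 60

print(trial_ms, total_trials, trials_per_block, block_minutes)
```

The phases add up to the 5,000 ms trial duration and the counts to 300 trials; 75 trials of 5 s give 6.25 min of trial time per block, in line with the roughly 7 min per block reported above.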

In the trimodal version of the task (Experiment 2), the visual stimulation was the same as in Experiment 1, but in addition, dynamic sounds moving at the same velocity and in the same direction as the virtual ball were simultaneously presented. The pre-recorded binaural sounds were administered during the experiment by means of noise-canceling headphones (see PPS Measurement Components: Acoustic Stimuli). Five different temporal delays were used for unimodal and bimodal conditions in Experiment 2, which consisted of a total of 540 trials (12 trials per delay for each trimodal condition, 12 trials per delay for each unimodal condition and 60 catch trials), divided into four blocks of 135 trials, lasting about 12 min each.

We measure RTs to tactile stimuli at each temporal interval, i.e., each distance, and search for the critical distance at which the dynamic multisensory stimulus significantly speeds up RT to tactile stimuli, as compared to the unimodal tactile baseline condition. This distance indicates the spatial location where an external stimulus in space significantly interacts with tactile processing on the body and is taken as a proxy of individuals' PPS boundaries.

# Analysis

Analysis procedure is identical for both Experiment 1 and Experiment 2. Preliminary analyses are conducted on unimodal auditory/visual catch trials in order to test for accuracy in the tactile task. Due to the settings of the tactile target stimulation, participants are very accurate in the task and thus, performance is analyzed in terms of RT only.

We first search for a significant difference in the modulation of tactile RT depending on the distance of the auditory or visual stimuli. To this aim, we compare RT in the multisensory condition with RT in the baseline unimodal condition at the different temporal delays. Thus, we first run a repeated-measures ANOVA on RT with sensory condition (unimodal and bimodal) and distance as factors. We search for a significant interaction and then check that a significant effect of distance is present in the bimodal condition, and not (or much less) in the unimodal condition. In this way, baseline trials are used to control for spurious modulation in RT due to an expectancy effect (i.e., the fact that if a trial started a moment ago and no tactile vibration has been given yet, it becomes more and more likely that the tactile stimulus is about to occur).

At this point, in order to identify the location of the external visual or audiovisual stimulus in space leading to a significant modulation of tactile processing, RT in the multisensory stimulation conditions are corrected, on an individual basis, for baseline performance. That is, for each participant, we identify the baseline condition resulting in the fastest RT among the baseline unimodal tactile conditions. We calculate the mean raw RT for that condition, and this value is subtracted from the mean raw RT to tactile stimulus for each audiotactile or visuotactile condition. In this way, we adopt the most conservative criterion to show a facilitation effect on tactile RT due to visual or audiovisual presentation. Negative deviations from the baseline (which by definition is now zero) indicate a multisensory facilitation effect (visuotactile or audiovisuotactile RTs that are faster than the fastest unimodal response). In order to identify the boundaries of PPS representations, we search for the farthest point in space where either visual or audiovisual stimuli induce a significant facilitation effect as compared to baseline (i.e., the fastest unimodal tactile condition).
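The baseline correction described above can be sketched as follows for a single participant. This is a minimal illustration using only the Python standard library; the function name `baseline_correct`, the delay labels, and the RT values are made up for the example and are not data or code from the study.

```python
from statistics import mean


def baseline_correct(multisensory_rts, tactile_rts):
    """Subtract the fastest unimodal tactile mean RT from each multisensory mean RT.

    Both arguments map a delay label to a list of raw RTs (ms) for one
    participant. Returns {delay: corrected mean RT}; negative values indicate
    facilitation relative to the fastest baseline condition.
    """
    fastest_baseline = min(mean(rts) for rts in tactile_rts.values())
    return {
        delay: mean(rts) - fastest_baseline
        for delay, rts in multisensory_rts.items()
    }


# Made-up RTs for one hypothetical participant (not data from the study).
tactile = {"T1": [410, 430], "T2": [400, 420], "T3": [390, 410]}
visuotactile = {"T1": [420, 440], "T2": [395, 405], "T3": [360, 380]}

corrected = baseline_correct(visuotactile, tactile)
print(corrected)
```

Using the fastest baseline condition as the reference is what makes the criterion conservative: a multisensory condition counts as facilitated only if it beats the best unimodal performance of that participant.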

Finally, in order to extract unique parameters able to estimate the PPS boundary at the individual level, we equally fit the data to a sigmoidal function (Eq. 1),

$$y(x) = \frac{y_{\min} + y_{\max}\, e^{(x - x_c)/b}}{1 + e^{(x - x_c)/b}},\tag{1}$$

where *x* represents the independent variable (i.e., the distance of the sound or ball), *y* is the dependent variable (i.e., the RT), *y*min and *y*max represent the lower and upper saturation levels of the sigmoid, *x*c is the value of the abscissa at the central point of the sigmoid [i.e., the value of *x* at which *y* = (*y*min + *y*max)/2], and *b* establishes the slope of the sigmoid at the central point. The ideal sigmoidal function fitting RT in the multimodal condition is reported in **Figure 4** (top right panel). Two parameters are free to vary and are thus estimated: the central position of the sigmoid and the slope of the sigmoid at the central point. The root mean error and the coefficient of determination (*R*<sup>2</sup>) are equally extracted from the fitting procedure as goodness-of-fit measures. Each of these parameters gives specific information concerning the spatial modulation of multisensory interaction at the individual level. The *R*<sup>2</sup> is used to evaluate the goodness of fit of the function, i.e., how well the spatially dependent modulation of RT is described by a sigmoidal function. We have shown that a sigmoidal model explains RT modulation in the bimodal condition better than a linear model (Canzoneri et al., 2012). However, there are individual differences in the goodness of fit of the model. For individual data with *R*<sup>2</sup> < 0.50, no other parameters are considered, since a sigmoidal model does not adequately fit the data. For *R*<sup>2</sup> ≥ 0.50, the central point of the sigmoidal function indicates the middle point of the spatial range where the pattern of RT changes from slow to fast, typically corresponding, respectively, to far and near sound or ball locations. Thus, the function's central point can be considered a single-value proxy of the location where the multisensory facilitation effect occurs and therefore of the PPS boundary. 
Finally, the slope of the function reflects how quick the transition between slow and fast RT is. Thus, it can be considered a measure of how well defined the PPS boundary is (Noel et al., 2016). It is worth noting that according to the formula above, the larger the parameter *b*, the shallower the slope, and vice-versa.
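To make the fitting step concrete, the sketch below evaluates Eq. 1 and estimates the two free parameters (*x*c and *b*) by a brute-force least-squares search, reporting *R*<sup>2</sup> as goodness of fit. It is an illustration under stated assumptions rather than the procedure used in the study: *y*min and *y*max are fixed to the extremes of the data, the optimizer is a simple grid search, and the function names, parameter grids, and RT values are all hypothetical.

```python
import math


def sigmoid(x, y_min, y_max, x_c, b):
    """Eq. 1: y tends to y_min for x << x_c and to y_max for x >> x_c (b > 0)."""
    e = math.exp((x - x_c) / b)
    return (y_min + y_max * e) / (1.0 + e)


def r_squared(ys, preds):
    """Coefficient of determination of predictions `preds` for data `ys`."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_center_and_slope(xs, ys, centers, slopes):
    """Grid search over x_c and b; returns (x_c, b, R^2) of the best fit.

    Assumption of this sketch: y_min and y_max are fixed to the data
    extremes, so that only x_c and b vary.
    """
    y_min, y_max = min(ys), max(ys)
    best = None
    for x_c in centers:
        for b in slopes:
            preds = [sigmoid(x, y_min, y_max, x_c, b) for x in xs]
            r2 = r_squared(ys, preds)
            if best is None or r2 > best[2]:
                best = (x_c, b, r2)
    return best


# Synthetic RTs (ms) generated from Eq. 1 itself; not data from the study.
xs = [1, 2, 3, 4, 5, 6]
ys = [sigmoid(x, 350.0, 420.0, 3.5, 0.5) for x in xs]

centers = [1.0 + 0.5 * i for i in range(11)]   # candidate x_c: 1.0 ... 6.0
slopes = [0.25, 0.5, 1.0, 2.0]                 # candidate b values
x_c, b, r2 = fit_center_and_slope(xs, ys, centers, slopes)
print(x_c, b, round(r2, 3))
```

Because *b* divides the exponent, a larger *b* stretches the transition over a wider range of *x*, which is why larger values correspond to a shallower slope. In practice a nonlinear least-squares routine would replace the grid search.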

# RESULTS

# Experiment 1—Visuotactile PPS

Mean RTs to tactile stimuli were calculated for each temporal delay (from T1 to T6) and submitted to a 2 (Condition: Visuotactile, Tactile) × 6 (Distances of the ball) repeated-measures ANOVA. As illustrated in **Figure 4A**, the interaction was significant [*F*(5,130) = 6.796; *p* < 0.001], showing that tactile responses were more strongly modulated as a function of temporal delays in the visuotactile than in the unimodal tactile condition. The ANOVA run on visuotactile trials showed that RT became progressively faster at decreasing ball distances. In order to identify the location in space where the virtual ball made RT in the visuotactile condition significantly faster than unimodal responses, for each participant, we first identified the unimodal tactile condition resulting in the fastest RT. We compared these values with the mean RT at the different distances in the visuotactile conditions by means of one-sample *t*-tests, corrected for multiple comparisons (six comparisons) with the Bonferroni method. RTs in the visuotactile condition were faster than the fastest unimodal RT when tactile stimulation was associated with a virtual ball at D1, D2, and D3 (all *p*-values < 0.001), and not when the ball was at farther distances, i.e., D4, D5, and D6. Thus, the PPS boundary was located between D3 and D4.
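The decision rule used to locate the boundary (Bonferroni-corrected comparisons against the fastest baseline, then taking the farthest distance that still shows facilitation) can be sketched as below. The *p*-values are invented for illustration and the function name `pps_boundary` is hypothetical; the sketch assumes the one-sample tests have already been computed elsewhere.

```python
def pps_boundary(p_values, facilitated, alpha=0.05):
    """Return the farthest distance label with significant facilitation.

    p_values: {label: uncorrected p}, ordered from nearest (D1) to farthest.
    facilitated: {label: True if the multisensory RT is faster than baseline}.
    Bonferroni correction: each p is compared against alpha / number of tests.
    Returns None if no distance meets the criterion.
    """
    threshold = alpha / len(p_values)
    farthest = None
    for label, p in p_values.items():   # dicts preserve insertion order
        if facilitated[label] and p < threshold:
            farthest = label
    return farthest


# Invented p-values for six distances (not results from the study).
p_values = {"D1": 0.0001, "D2": 0.0002, "D3": 0.0005,
            "D4": 0.2, "D5": 0.4, "D6": 0.6}
facilitated = {label: True for label in p_values}

print(pps_boundary(p_values, facilitated))
```

With six comparisons the corrected threshold is 0.05/6, so a distance with an uncorrected *p* of, say, 0.02 would not count as facilitated.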

Figure 4 | Representative results from a visuotactile peripersonal space (PPS) task. (A) Averaged reaction times (RTs) (error bars represent SEM) to tactile stimulation as a function of temporal delays for unimodal tactile (gray) and visuotactile trials (red). Visuotactile stimuli induced a stronger modulation of tactile RT, as compared to unimodal tactile stimuli, depending on temporal delays, that is, on the position of the virtual ball in space at the time of tactile stimulation. The PPS boundary is identified as the distance at which the visual stimulus induced significantly faster RT as compared to the fastest unimodal tactile RT (as indicated by the dashed line). (B) Sigmoidal fitting of averaged raw RT in the visuotactile condition. (C) Ideal PPS curve from sigmoidal fitting: the central point of the curve is a single-data-point proxy of the transition between slow and fast RT, i.e., between extrapersonal space and PPS, where the slope of the function at the central point indicates how sharp this transition is. (D) RT and fitting for individual subjects (ordered as a function of the goodness of fit, based on individual *R*<sup>2</sup>).

In order to represent such differential modulation of tactile processing at the individual-subject level, we fit the relationship between tactile RTs and the timing at which tactile stimuli occurred with a sigmoidal function (Eq. 1), as described above. The averaged data fit is shown in **Figure 4B** (see **Figure 4C** for an idealized case). Importantly, at the individual level, the sigmoidal fitting was able to represent the distance-dependent modulation of tactile response with an *R*<sup>2</sup> higher than 0.10 in 20 out of 27 participants and higher than 0.50 in 16 participants, whose mean *R*<sup>2</sup> was equal to 0.83 (individual fitting data are shown in **Figure 4D**). From these data, we were able to estimate the average central point of the sigmoidal function at a distance equal to 123 cm (3.71/6.00 × 200 cm).

# Experiment 2—Audiovisuotactile PPS Task

Mean RTs to tactile stimuli were calculated for each temporal delay (from D1 to D5) and submitted to a 2 (Condition: Audiovisuotactile, Tactile) × 5 (Distances of the ball/sound) repeated-measures ANOVA. The interaction was significant [*F*(4,100) = 2.71; *p* = 0.037], showing that tactile responses were more strongly modulated as a function of temporal delays in the audiovisuotactile than in the unimodal tactile condition. The ANOVA run on audiovisuotactile trials showed that RTs became progressively faster at decreasing ball/sound distances (see **Figure 5A**). In order to identify the location in space where the virtual ball (and sound) made RT in the audiovisuotactile condition significantly faster than unimodal responses, for each participant, we used the same procedure as described above for Experiment 1. To this aim, we compared the unimodal tactile condition resulting in the fastest RT with the mean RT at the different distances in the audiovisuotactile conditions by means of one-sample *t*-tests, corrected for multiple comparisons (five comparisons) with the Bonferroni method. RT in the audiovisuotactile condition was faster than the fastest unimodal RT when tactile stimulation was associated with a virtual ball at D1, D2, and D3 (all *p*-values < 0.01), and not when the ball was at a farther distance, i.e., D4 and D5. Thus, the PPS boundary was located between D3 and D4.

Also in this case, in order to represent such differential modulation of tactile processing at the individual-subject level, we fit the relationship between tactile RTs and the timing at which tactile stimuli occurred with a sigmoidal function (Eq. 1), as described above. The averaged data fit is shown in **Figure 5B**. Importantly, at the individual level, the sigmoidal fitting was able to represent the distance-dependent modulation of tactile response with an *R*<sup>2</sup> higher than 0.10 in 24 out of 26 participants and higher than 0.50 in 18 participants (whose mean *R*<sup>2</sup> was equal to 0.97; individual fitting data are shown in **Figure 5C**). From these data, we were able to estimate the average central point of the sigmoidal function at a distance equal to 105 cm (2.63/5.00 × 200 cm).

# Contrast between Visuotactile and Audiovisuotactile Delineations of PPS

Comparing the results from Experiments 1 and 2, even if just qualitatively, seemingly indicates that the extent of PPS remains stable whether it is measured *via* a visuotactile or an audiovisuotactile paradigm. That is, in Experiment 1 the boundary of PPS was measured at 123 cm, while it was measured at 105 cm in Experiment 2. Indeed, these two average measurements were not statistically different from one another (independent-samples *t*-test, *p* = 0.21). Of note, however, it appears that the representation of PPS is more readily captured, in terms of goodness of fit, *via* the trimodal paradigm than *via* the bimodal one. While 74% of subjects' data in Experiment 1 fit the sigmoidal function with *R*<sup>2</sup> > 0.10, 92% of the data from Experiment 2 met this threshold. Similarly, 59% of subjects in Experiment 1 fit the sigmoidal function with *R*<sup>2</sup> > 0.50, a number that increased to 69% in Experiment 2. Finally, the average *R*<sup>2</sup> (after rejecting participants with *R*<sup>2</sup> < 0.60) in Experiment 1 was 0.83 (SEM = 0.05), lower than the 0.97 (SEM = 0.05) in Experiment 2 [unpaired *t*-test *t*(51) = 2.19, *p* = 0.032].
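The percentages quoted above follow from the participant counts reported in the Results. A short check, with illustrative dictionary keys:

```python
# Participants whose individual sigmoid fit exceeded each R^2 threshold
# (counts taken from the Results above).
FITS = {
    "Experiment 1": {"n": 27, "r2_gt_0.10": 20, "r2_gt_0.50": 16},
    "Experiment 2": {"n": 26, "r2_gt_0.10": 24, "r2_gt_0.50": 18},
}


def percent(count, total):
    """Share of participants, rounded to the nearest whole percent."""
    return round(100 * count / total)


summary = {
    name: {key: percent(value, data["n"])
           for key, value in data.items() if key != "n"}
    for name, data in FITS.items()
}
print(summary)
```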

# DISCUSSION

We present how the boundaries of PPS can be measured in terms of spatially dependent modulation of multisensory responses with a simple behavioral task that can be conducted with participants immersed in an MR environment. In this context, the PPS boundary is identified as the location in space where tactile processing is significantly boosted by the presentation of an external event, as signaled by visual or audiovisual stimulation. Further, we show that the delineation of PPS is accomplished more robustly (in terms of goodness of fit) *via* the presentation of approaching audiovisual stimuli than *via* visual stimuli alone. This latter finding seemingly implies that there is a graded relationship between the faithfulness or completeness of exteroceptive sensory representation and the delineation of PPS. That is, the near and far spaces are most clearly bifurcated when sensory information pertaining to the external environment is richer.

Our results indicate that the extent of the multisensory PPS assessed behaviorally in MR is comparable with the extent of multisensory receptive fields observed in neurophysiological studies (i.e., spatial modulations of tactile responses in **Figures 4** and **5** are similar to spatial modulation from PPS neurons shown for instance in **Figure 1B**). Furthermore, the measure of individuals' PPS is most robustly accomplished with a multimodal approach such as the MR technology presented here. By merging pre-recorded scenes with real-time input and computer graphics, our technology allows presenting multimodal stimuli while participants are immersed in a surrounding visual and acoustic environment. Importantly, the participants also see their own body acting within the same environment. The combination of the richness and ecological validity of the setup with the perfect control of the experimental apparatus allows us, on the one hand, to run the PPS task with the scientific rigor of previous laboratory setups and, on the other hand, to present ecologically valid and rich scenarios, close to real-life events. This represents an essential added value for cognitive science research, since PPS is a multisensory-motor representation of the body in interaction with its environment (Serino et al., 2015a). This approach opens new perspectives for studying cognitive foundations of human behavior in real-life contexts, while the subject is interacting with the environment and, maybe even more interestingly, when interacting with other people. Indeed, recent studies have shown that the nature of one's interaction with another agent (Teneggi et al., 2013) or even our social perception of the other (Pellencin et al., 2017) shapes our multisensory PPS.

The proposed analyses, based on standard analysis of variance but perhaps most importantly also on function fitting, allow for accurately measuring the PPS boundary at the group level, and also at the individual level for the majority of the participants when audiovisual multisensory stimuli approached participants. This sigmoidal fitting approach, and the observation that fits are robust under audiovisual multisensory conditions, have important implications for the study of individual differences in PPS representation at the neural (Ferri et al., 2015a,b), physiological (Sambo and Iannetti, 2013), and behavioral (Taffou and Viaud-Delmon, 2014) levels. Indeed, an array of recent observations indicates that PPS is heavily influenced not only by external or environmental conditions, but also by personality traits such as anxiety (Sambo and Iannetti, 2013) or claustrophobia (Lourenco et al., 2011). Similarly, theoretical postulations suggest that the representation of PPS may play an under-appreciated role in psychopathology (Candini et al., 2017; Noel et al., 2017). Thus, the results reported here suggest that function fitting coupled with immersion in a realistic environment and the presentation of multiple cues of information pertaining to the external environment may be best suited for future individual differences studies of PPS.

The empirical observation that a sigmoidal fitting allowing for the bifurcation between the PPS and extrapersonal space is most robust (e.g., fits raw data appropriately over a greater percentage of participants) when audiovisual (vs. visual alone) stimuli loom in a virtual environment has strong implications for the study of bodily self-consciousness and multisensory integration generally. The finding implies that the degree to which a virtual environment is rendered affects bodily representation, or the bifurcation between the external environment and the body. Interestingly, Samad et al. (2015) recently cast the rubber-hand illusion (RHI; Botvinick and Cohen, 1998), an illusion whereby participants feel ownership over a fake hand after congruent visuosomatosensory stimulation, in light of Bayesian Causal Inference (Körding et al., 2007; Shams and Beierholm, 2010). Under this framework, localization of an object/organism in the environment depends on the relative reliability of the sensory representation of that particular object/organism, as well as that of other objects/organisms present in the environment. Hence, in Samad et al. (2015), it is computationally predicted that the rubber-hand illusion would not occur after a visuosomatosensory displacement of approximately 30 cm, the sources of sensory information then being too far apart. This prediction is in line with empirical studies (Lloyd, 2007), and interestingly 30 cm equally corresponds with the approximate size of perihand representation (Serino et al., 2015a), implying that embodiment of a fake hand can occur solely within PPS. In turn, the current findings seemingly indicate that the faithfulness of rendering of virtual environments may affect the possibility for embodiment within that environment (see also Gonzalez-Franco and Lanier, 2017). It will be interesting in future studies to determine the interplay between the faithfulness of virtual environment renderings and bodily representations. 
For instance, does the ability to embody alternative bodies (such as a body with a very long arm; Kilteni et al., 2012) differ in unisensory and multisensory environmental conditions? Similarly, it has been suggested that embodiment relaxes the temporal constraint for multisensory integration (Maselli et al., 2012), while others have shown that audiovisual temporal acuity is impaired within PPS (Noel et al., 2016), hence raising the question: how do PPS and embodiment, as affected by virtual representations, interplay and interact with multisensory processes such as temporal binding or even cross-modal attention (e.g., Gonzalez-Franco et al., 2017)?

Lastly, in a context of growing interest for VR technologies, it is becoming essential to evaluate and to scientifically study human interactions in virtual and MR conditions (Herbelin et al., 2016). The measure of PPS presented here offers to scientists in the field of cognitive and behavioral sciences, as well as to researchers on the sense of presence and on interactivity in VR, an objective and easily-implemented assessment of basic neural responses to rich and immersive exposure to complex interactive scenarios. The delineation of PPS has a strong tradition within neurophysiology and a growing body of literature within psychophysics. Perhaps even more interestingly for its utility within the study of the impact of virtual environments on individuals and society, the PPS is taken to index human–environment interactions. In other words, the PPS has been shown to surround not the physical body but the perceived self-location (Noel et al., 2015b; Salomon et al., 2017), and as such it is seemingly a metric that can readily be utilized in characterizing presence or immersion in virtual environments. On the other hand, as we demonstrate here, the boundary of PPS is most readily delineated when a rich environmental context is administered (i.e., a multisensory delineation of the external environment). As such, the measure of PPS appears ideally suited to arbitrate the push and pull in mixed realities between administering a rich virtual experience leading to presence and place illusion, and administering sufficient real world environmental context in order to remain grounded in the physical milieu. Thus, the mixed-reality PPS task presented here might be particularly powerful to study social interactions at the individual subject level, allowing manipulating rich and complex social context (e.g., by presenting crowded environments), while preserving the sensitivity and the rigor of a proper experimental protocol.

Finally, as our societies become more accustomed and even entrenched in virtual environments, it may be interesting to chart how representations of environmental space, such as the PPS, become altered after long term VR experiences. Technological improvements should therefore be brought to the setup presented here. First and foremost, navigation in space is not supported by our panoramic capture, and other approaches for graphic (3D graphics or volumetric reconstruction) and audio rendering (HRTF and acoustic spatial audio simulation) should be used for enabling free navigation inside the scene. Second, the body integration would benefit from improvement of the field of view and the addition of visuotactile cues [such as those described in Gonzalez-Franco and Lanier (2017)] in order to strengthen the illusion of owning the body presented in the simulated environment and to better understand the dynamic changes of PPS when active during VR immersion.

# ETHICS STATEMENT

Studies were approved by the ethics committee of the Brain and Mind Institute of the EPFL.

# AUTHOR CONTRIBUTIONS

AS, OB, BH, and JN defined the scientific goals and designed the experiments. BH, RM, and JBR developed the technology and implemented the experimental platform. JN, AS, EC, EP, and FB conducted the experiments and analyzed the data. AS, JN, and BH wrote the manuscript. All authors contributed to and reviewed the manuscript.

# ACKNOWLEDGMENTS

ExpyVR is a multimedia and virtual reality platform for cognitive science experimentation developed at the EPFL Laboratory of Cognitive Neurosciences (http://lnco.epfl.ch) by Bruno Herbelin, Javier Bello Ruiz, Nathan Evans, and Tobias Leugger.

# FUNDING

RM, EC, and FB are supported by W-Science Investment, Zurich, within the Reality Substitution Machine project (RealiSM, http://lnco.epfl.ch/realism). AS, BH, and OB are supported by grants from the Swiss National Science Foundation. OB is supported by the Bertarelli Foundation. AS is supported by the Leenaard Foundation.

# REFERENCES


peripersonal space shapes bodily self consciousness. *Cognition* 166, 174–183. doi:10.1016/j.cognition.2017.05.028


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Serino, Noel, Mange, Canzoneri, Pellencin, Ruiz, Bernasconi, Blanke and Herbelin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

#### *Naor Demeter<sup>1</sup>\*, Dorit Pud<sup>2</sup> and Naomi Josman<sup>1</sup>*

*<sup>1</sup>Department of Occupational Therapy, Faculty of Social Welfare & Health Sciences, University of Haifa, Haifa, Israel, <sup>2</sup>Faculty of Social Welfare & Health Sciences, University of Haifa, Haifa, Israel*

Virtual reality (VR) is an advanced and useful technology for distraction from pain. The efficacy of VR for reducing pain is well established. Yet, the literature analyzing the unique attributes of VR which impact pain reduction is scarce. The present study evaluated the effect of two VR environments on experimental pain levels. Both VR environments are games used with an EyeToy application, which is part of the video capture VR family. The VR environments were analyzed by expert occupational therapists using a method of activity analysis, allowing for a thorough evaluation of the VR activity performance requirements. The VR environments were found to differ in the cognitive load (CL) demands they place upon subjects. Sixty-two healthy students underwent psychophysical thermal pain tests, followed by exposure to tonic heat stimulation under one of three conditions: low CL (LCL) VR, high CL (HCL) VR, and control. In addition, following participation in VR, the subjects completed a self-feedback inventory evaluating their experience in VR. The results showed significantly greater pain reduction during both VR conditions compared to the control condition (*p* = 0.001). Hierarchical regression revealed that cognitive components evaluated in the self-feedback inventory were predictive factors for pain reduction only during the HCL VR environment (explaining an additional 20.2% of the variance). CL involved in VR may predict the extent of pain decrease, a finding that should be considered in future clinical and laboratory research.

#### Keywords: cognitive load, environments, experimental pain, virtual reality, activity analysis

#### *Edited by:*

*Mel Slater, University of Barcelona, Spain*

#### *Reviewed by:*

*Tasha Stanton, University of South Australia, Australia; Regis Kopper, Duke University, United States*

> *\*Correspondence: Naor Demeter naor0506@gmail.com*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

*Received: 01 August 2017 Accepted: 06 December 2017 Published: 04 January 2018*

#### *Citation:*

*Demeter N, Pud D and Josman N (2018) Cognitive Components Predict Virtual Reality-Induced Analgesia: Repeated Measures in Healthy Subjects. Front. Robot. AI 4:70. doi: 10.3389/frobt.2017.00070*

**Abbreviations:** CL, cognitive load; LCL, low cognitive load; HCL, high cognitive load; CPT, cold pressor test; NPS, numerical pain scale; TSA, thermal sensory analyzer; CPM, conditioned pain modulation.

# INTRODUCTION

Distraction is a process in which attention is directed away from the nociceptive stimuli and changes the quality and quantity of pain (Van Damme et al., 2010; Cohen et al., 2014). Distraction can be achieved when attention is directed toward another sensory modality such as visual, auditory, or tactile stimuli (Miron et al., 1989) and is commonly evoked by various cognitive tasks (Eccleston and Crombez, 1999). Few previous studies have evaluated the influence of cognitive load (CL) on analgesia, and their findings have been inconsistent. Some studies have shown no interaction between the task load and the nociceptive stimuli (Seminowicz and Davis, 2007). Other evidence shows that the CL involved in the task does impact the level of pain decrease (Miron et al., 1989; Legrain et al., 2002; Buhle and Wager, 2010). This is based on the premise that a task that occupies a person's attention leaves fewer cognitive resources available to focus on the pain (McCaul and Malott, 1984). As an example, Romero et al. (2013) found that tasks demanding more attentional resources lead to a greater reduction in pain ratings. They found that there is an interaction between the intensity of nociceptive stimuli and the level of perceptual load of a task. Nevertheless, there is also evidence that pain can attract attention: even when the subject does not intend to focus on the pain, its threatening nature captures attention (Eccleston and Crombez, 1999; Legrain et al., 2009). Regardless, it is imperative to better understand the link between distraction-induced analgesia and CL.

Virtual reality (VR) is an advanced and useful technology that can be used to distract from pain (Mahrer and Gold, 2009; Kenny and Milling, 2016; Dascal et al., 2017). It is thought that VR distraction is effective because it immerses and engages the person in a way that involves many senses and therefore requires higher levels of attention (Mahrer and Gold, 2009). VR has been shown to be effective in pain relief both in clinical populations, such as burn pain patients, and in laboratory research with healthy subjects (Carrougher et al., 2009; Hoffman et al., 2011); such studies were conducted using diverse methodologies regarding both pain measures and VR paradigms (e.g., Hoffman et al., 2003). However, there is scarce literature examining the specific *attributes* of VR that reduce pain, or the cognitive effort that individuals need to invest in the environment in order to perform the task correctly. Examples of these attributes may be attention, memory, or executive functions. In the VR environments chosen for the current study, attention is needed in order to follow the paddles or light beams presented on the screen. The identification of specific attributes may serve as a key for implementing the approach as part of individualized medicine.

An assessment of the specific attributes of a task can be achieved using a method known as "activity analysis." Activity analysis is one of the earliest tools and a fundamental skill in occupational therapy. This process is a careful observation of an activity, game, or other therapeutic activity. It is intended to assess its features or characteristics for the purpose of identifying and defining the dimensions of the activity performance requirements. It allows for the comparison of activity A with activity B and for the understanding of its therapeutic potential (Kuhaneck et al., 2010; Crepeau et al., 2013; Thomas, 2015). Activity analysis is usually performed by a therapist in order to decide what kind of activity will suit the patient's need; however, it can be performed for research purposes by a few expert judges in the form of a questionnaire in order to decide what the activity requirements and attributes are. As far as we know, activity analysis has yet to be used to analyze VR environment characteristics.

Therefore, the current study aimed to: (1) investigate the effect of participation in two VR environments which differed in terms of CL demand on experimental evoked pain scores in healthy subjects; and (2) identify predictive factors affecting pain reduction during participation in VR.

The study hypotheses were that: (1) a significant reduction in pain ratings will be found following participation in both VR environments in comparison with the control condition; (2) the reduction in pain ratings will be significantly greater in the HCL (high cognitive load) environment than in the LCL (low cognitive load) environment; and (3) higher CL in the VR environment will predict pain reduction.

# MATERIALS AND METHODS

The current study is part of a project that has been divided into two separate publications: (1) the first focused on gender and conditioned pain modulation (CPM) as predictors of experimental pain reduction (see Demeter et al., 2014); (2) the current study focuses on the CL and adds a comparison with another VR environment. Therefore, the "Materials and Methods" section of this paper resembles that of our previous publication.

# Subjects

The current study recruited 62 healthy subjects (31 males, 31 females; mean age = 24.2, SD = 3.7 years) aged 18–35 years. All subjects met the inclusion criteria of being pain free, not taking any medication, and having the ability to communicate and understand all study objectives and instructions. Sample size was calculated based on a moderate effect size (*f* = 0.25), α = 0.05, and power = 0.80. The sample size was set to 48 subjects using G\*Power software (Faul et al., 2007).

# Experimental Pain Models

# Cold Pressor Test (CPT)—Cold Pain Threshold, Tolerance, and Intensity

The CPT apparatus (Heto CBN 8-30 Lab equipment, Allerod, Denmark) is a temperature-controlled water bath with a maximum temperature variance of ±0.5°C, which is continually stirred by a pump. Subjects were instructed to insert their right hand into the CPT and maintain a static position. After the simultaneous activation of a stopwatch, subjects were asked to keep their hand submerged in the cold water for as long as possible. A cutoff time of 180 s was set for the purpose of safety. Subjects were requested to indicate the exact point in time at which the cold sensation began to elicit pain. The time until pain was first perceived was defined as time to pain onset (seconds). In the current study, the water temperature of the CPT was 5°C. Immediately after hand withdrawal, subjects were asked to indicate their maximal pain intensity on a 0–100 numerical pain scale (NPS), from 0, which represented "no pain," to 100, which represented the "worst pain one can imagine." The latency of intolerability (spontaneous hand removal) was defined as pain tolerance (seconds). Tolerance for subjects who did not remove their hand from the water for the entire 180 s was recorded as 180 s.

# Thermal Sensory Analyzer (TSA)—Thermal Thresholds and Pain Intensity

Cold and heat pain thresholds were determined with the limits method on a Medoc TSA-2001 device (Medoc, Israel). A Peltier thermode (30 × 30 mm) was attached to the skin above the thenar eminence, and baseline (BL) temperature was set at 32.0°C and raised or reduced at a rate of 1°C/s. The stimulator temperature range was 0–50°C. Subjects were asked to press a switch when the stimulus was first perceived as painful heat or cold. Three readings were obtained for each thermal modality (cold and hot), and their averages were determined as pain threshold scores. The TSA was also used to determine sensitivity to noxious heat stimulation. Subjects were exposed to tonic heat stimulation (46.5°C, for 120 s) on the medial part of their left ankle and asked to provide feedback on their perception of pain intensity (NPS 0–100).

#### Assessment of CPM

Conditioned pain modulation is considered to be a manifestation of pain inhibition and describes a state whereby the response to a given noxious test stimulus is attenuated by another conditioning stimulus that is simultaneously administered to a remote area of the body (Yarnitsky et al., 2010). Phasic heat stimulations were given in order to induce a CPM effect and considered the "test stimulation," whereas cold stimulation was used as a "conditioning" stimulation. For further elaboration, see Demeter et al. (2014).

# EyeToy

The current study included two EyeToy environments. The EyeToy is a popular application from the video capture family, developed by the Sony Corporation for use with a PlayStation 2 platform (www.playstation.com). It is a low-cost, off-the-shelf game, allowing interaction with virtual objects presented on a standard television screen (Kushner, 2004). The EyeToy presents the user's image in real time, does not require a special environment, and therefore is easy to use in any location (Sveistrup et al., 2003). The user does not wear any equipment during participation in the EyeToy and can therefore move freely. The application includes competitive motivational environments allowing the participation of one or more users (Sveistrup et al., 2003). Both of the environments used in the current study were taken from "EyeToy Kinetic" (EyeToy games CD). The environments were chosen because of their similar motor requirements. In the first environment, named "Backlash," the subject is required to move the upper limbs and right leg to avoid contact with four paddles, two paddles on either side of the screen, with a central circle. In the second environment, named "Equilibrium," the subject is required to move the upper limbs and right leg and be precise in touching light beams appearing on the screen in different positions.

# Self-Feedback VR Inventory

The self-feedback inventory was prepared for the current study and included questions regarding participation in VR based on the Presence Questionnaire and Immersive Tendencies Questionnaire (ITQ) (Witmer and Singer, 1998). Both questionnaires are internally consistent measures with high reliability (Witmer and Singer, 1998). The purpose of the questions was to collect knowledge about the subjective responses of the participants to the VR experience in each environment. The inventory includes a Likert scale of 1–5 (1 = not at all, 5 = a lot) which evaluates aspects such as: (1) the ability to predict what will happen in response to the subject's action (anticipation); (2) the feeling of skilled movement and interaction with the VR environment (movement skills); (3) the ability to block external distraction and concentrate on a task (attention and cognitive inhibition); and (4) the extent of physical effort demand during a task (physical effort).

# Activity Analysis Form

In order to thoroughly analyze and identify different aspects of each VR environment, "inter-rater reliability" was tested with four experts, using an activity analysis form (Murphy and Davidshofer, 1994). Inter-rater reliability is a process in which two or more raters classify objects into predefined categories, examining the extent to which they agree (Anastasi and Urbina, 1997). Activity analysis is a tool frequently used by occupational therapists for analyzing different activities and identifying the skills and demands of a certain activity. This qualitative-based form includes 73 items which review general aspects of the activity (16 items such as activity description, required preparations, or activity structure) and activity performance components: motor (16 items), sensory (16 items), cognitive (14 items), psychological (19 items), and neuromuscular (8 items). For each item, the experts gave a qualitative evaluation regarding a specific VR environment to which she/he was exposed (Drake, 1991). There was a 100% agreement between raters on 66% of the items in LCL and 76.7% in HCL. For the rest of the items in both LCL and HCL, there was 50–90% agreement.
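The agreement figures reported above rest on a simple tally across raters. The following is a minimal sketch of such a tally, not the authors' procedure: the ratings are made-up categorical labels rather than study data, and `item_agreement` and `summarize` are hypothetical helpers that define agreement on an item as the share of raters who chose the most common answer.

```python
from collections import Counter

def item_agreement(ratings):
    """Share of raters who gave the most common rating for one item."""
    counts = Counter(ratings)
    return counts.most_common(1)[0][1] / len(ratings)

def summarize(items):
    """Return (fraction of items with full agreement, per-item agreement list)."""
    per_item = [item_agreement(r) for r in items]
    full = sum(1 for a in per_item if a == 1.0) / len(per_item)
    return full, per_item

# Hypothetical ratings from four experts on five form items (illustration only).
example_items = [
    ["high", "high", "high", "high"],
    ["low", "low", "low", "low"],
    ["high", "high", "low", "high"],
    ["yes", "yes", "yes", "yes"],
    ["yes", "no", "yes", "no"],
]

full_share, per_item = summarize(example_items)
print(f"Items with 100% agreement: {full_share:.0%}")
print("Per-item agreement:", per_item)
```

Simple percent agreement does not correct for chance; a chance-corrected statistic could be substituted if the rating categories are known in advance.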

# Study Procedure

#### Determining VR Environment Characteristics

Four experienced occupational therapists actively participated in each VR environment and completed the activity analysis form immediately afterward. The occupational therapists were experts in cognitive and motor intervention. Agreement among experts was calculated. According to the experts' evaluation, the main characteristics of each environment were identified and representative titles were given. Specifically, it was found that although both environments were based on a similar motor task, the "Equilibrium" environment involved a higher CL and demanded more cognitive resources (attention, accurate movement, and problem solving) than the other environment, "Backlash." Consequently, the "Backlash" VR environment was named low cognitive load virtual reality (LCL), whereas the "Equilibrium" environment was named high cognitive load virtual reality (HCL).

# Study Design

Study approval was provided by the Ethical Committee of the University of Haifa, Faculty of Social Welfare & Health Sciences. Every subject received an explanation of the study, signed an informed consent to participate in the study, and then underwent a set of pain training tests and an introduction to VR environments. After 10 min, a series of pain tests was performed to determine each participant's BL sensitivity to pain. The series of tests included measuring heat and cold pain thresholds (TSA), sensitivity to noxious cold (time to pain onset, tolerance, and intensity), and CPM, as explained above. All tests were conducted in random order with 5-min intervals between them. Immediately thereafter, each subject went through three separate experimental conditions in random order: (A) LCL; (B) HCL; and (C) heat stimulation without VR (the control condition). A 5-min break was provided between study conditions. The VR system (EyeToy) was turned off during the control condition.

During each condition, subjects were exposed to tonic noxious heat stimulation (46.5°C, for 140 s) applied to the medial part of the left ankle. Heat pain intensities (NPS 0–100) were reported at 10, 40, 70, 100, and 130 s from the initial heat stimulation, as well as 10 s after the stimulation was completed, for a total of six reports. The exposure to each VR environment lasted 120 s in parallel with the heat stimulation, starting 10 s after the initiation of the heat application (right after the first NPS report). Thus, four NPSs were measured *during* VR participation. During participation in VR, the user did not wear any equipment except for the Peltier thermode, which delivered the heat stimuli to the ankle. Immediately after participating in each VR environment, subjects filled out the self-feedback VR inventory, providing feedback regarding their experience in VR, as commonly done in other VR studies (Kizony et al., 2006).
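The timing described above can be laid out as a small schedule to check that it yields six reports, four of them during VR. This is an illustrative sketch only: the constant and function names are hypothetical, and treating the report at 130 s as falling inside the VR window (start exclusive, end inclusive) is an assumption chosen so that the count matches the four in-VR reports stated in the text.

```python
HEAT_DURATION_S = 140   # tonic heat stimulation length
VR_START_S = 10         # VR begins right after the first NPS report
VR_DURATION_S = 120     # VR runs in parallel with the heat stimulation

# Five reports during stimulation plus one 10 s after it ends.
report_times = [10, 40, 70, 100, 130, HEAT_DURATION_S + 10]

vr_end = VR_START_S + VR_DURATION_S

def label(t):
    """Classify a report time relative to the VR window.

    Assumption: the window is (VR_START_S, vr_end], i.e. the report at
    the moment VR starts is the baseline and the one at vr_end still counts.
    """
    if t <= VR_START_S:
        return "baseline (before VR)"
    if t <= vr_end:
        return "during VR"
    return "after VR"

schedule = [(t, label(t)) for t in report_times]
during_vr = [t for t, lab in schedule if lab == "during VR"]

for t, lab in schedule:
    print(f"{t:>3} s: {lab}")
print("Reports during VR:", len(during_vr))
```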

# Statistical Analyses

Descriptive statistics were used to describe the subjects and study variables. Repeated-measures ANOVA was performed to explore the differences in the extent of pain decrease between the three study conditions. In order to examine differences between the six measurements, a Bonferroni *post hoc* test was performed. Repeated contrast was conducted in order to examine the interaction effect. The maximal pain decrease from BL was calculated for each study condition separately (i.e., ΔLCL, ΔHCL, and ΔControl). A Spearman correlation test examined correlations between all pain measures taken before the three study conditions and pain decrease following VR. Hierarchical regression was used to examine the variables predicting pain decrease following VR. Results were considered significant at the 0.05 level and are presented as mean ± SEM.
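To make the repeated-measures design concrete, the sketch below computes a one-way repeated-measures ANOVA by hand. It is a simplified illustration rather than the analysis actually run: the data are synthetic, `rm_anova` is a hypothetical helper, and no sphericity correction or *post hoc* testing is included. With 62 subjects and six measurements it yields 5 and 305 degrees of freedom.

```python
import random

def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: list of rows, one per subject, each holding k scores
            (one per repeated measurement).
    Returns (F, df_effect, df_error).
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Residual variation after removing measurement and subject effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error

# Synthetic ratings (not study data): 62 subjects x 6 measurements.
rng = random.Random(0)
synthetic = []
for _ in range(62):
    offset = rng.gauss(0, 10)                                    # individual pain sensitivity
    row = [60 + offset + rng.gauss(0, 5)]                        # baseline
    row += [35 + offset + rng.gauss(0, 5) for _ in range(4)]     # during distraction
    row.append(48 + offset + rng.gauss(0, 5))                    # after distraction
    synthetic.append(row)

f_value, df_effect, df_error = rm_anova(synthetic)
print(f"F({df_effect}, {df_error}) = {f_value:.2f}")
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from a between-groups ANOVA: stable individual differences in pain sensitivity do not inflate the residual.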

# RESULTS

All the pain measures that were taken before the three study conditions are depicted in **Table 1**.

The mean (± SEM) scores of the self-feedback VR inventory (1–5) were as follows: (1) following LCL: anticipation, 3.8 ± 0.91; movement skills, 3.9 ± 0.80; attention and cognitive inhibition, 4.2 ± 0.64; physical effort, 3 ± 0.87; (2) following HCL: anticipation, 3.3 ± 1.11; movement skills, 3.1 ± 0.92; attention and cognitive inhibition, 4.1 ± 0.85; and physical effort, 1.8 ± 0.85. Significant differences between the two VR environments were found; in the LCL VR environment, the subjects reported that they could better anticipate what would happen in response to their action, that they felt more skilled in movement, and that the activity had a higher physical effort demand (**Figure 1**).

# Effect of VR Participation on Pain Intensity—Within-Session Results

#### LCL Environment

The mean BL heat pain score taken before exposure to VR was 63.6 ± 3.3; 30 s after the heat stimulus was administered, the mean pain score dropped to 32.8 ± 3 (test 1), 29.0 ± 2.7 (test 2), 30.0 ± 2.9 (test 3), and 33.0 ± 3.2 (test 4). In the last heat measurement, 120 s from the beginning of the stimulation and right after VR was discontinued (test 5), the mean pain score increased to 47.8 ± 3.5 [RM ANOVA, *F*(5, 305) = 73.54, *p* < 0.001, η<sup>2</sup> = 0.55]. The Bonferroni test revealed that in the LCL environment, all pain measurements (tests 1, 2, 3, 4, and 5) were significantly different from the BL measurement (*p* < 0.001).

#### HCL Environment

The mean BL heat pain score taken before exposure to VR was 65.6 ± 3.3; 30 s after the heat stimulus was administered, the mean pain score dropped to 33.2 ± 24.9 (test 1), 32.7 ± 3.2 (test 2), 35.4 ± 3.6 (test 3), and 33.6 ± 3.6 (test 4). In the last heat measurement, 120 s from the beginning of the stimulation and right after VR was discontinued (test 5), the mean pain score increased to 45.4 ± 3.9 [RM ANOVA, *F*(5, 305) = 58.92, *p* < 0.001, η<sup>2</sup> = 0.49]. The Bonferroni test revealed that in the HCL environment, all pain measurements (tests 1, 2, 3, 4, and 5) were significantly different from the BL measurement (*p* < 0.001).

#### Control Session

The mean BL heat pain score was 63.9 ± 3.2, which decreased to 48.4 ± 3.2 at test 1 [RM ANOVA, *F*(5, 305) = 17.26, *p* < 0.001, η<sup>2</sup> = 0.22]. During this session, across the following four measurements, pain ratings were similar: 48.0 ± 3.3, 52.6 ± 3.5, 56.4 ± 3.7, and 55.3 ± 4 (tests 2, 3, 4, and 5, respectively). The Bonferroni test revealed that in the control condition, pain measurements at tests 1, 2, 3, and 4 were significantly different from the BL measurement (*p* < 0.001), whereas the last measurement (test 5) was not significantly different from BL.

The maximal pain reduction was found to be between test 1 (BL) and test 3. Therefore, the difference between these two measures was calculated, and the resulting value, named ΔVR (ΔLCL = Δ low cognitive load VR; ΔHCL = Δ high cognitive load VR), was used for further statistical analyses.

# Effect of VR Participation on Pain Intensity—Sessions Comparison

No significant differences were identified between the three pain scores at BL [RM ANOVA, *F*(2, 122) = 0.64, *p* = 0.53]. However, the reduction in pain intensity across the entire 140 s was significantly different between the study conditions [*F*(10, 610) = 14.53, *p* < 0.001, η<sup>2</sup> = 0.19]. Repeated contrast tests showed a significantly greater reduction in pain in VR conditions compared with control conditions between BL and test 1 [*F*(2, 183) = 14.97, *p* < 0.001, η<sup>2</sup> = 0.14]. In addition, there was a significant increase in pain ratings in test 5 in VR conditions only [*F*(2, 183) = 21.92, *p* < 0.001, η<sup>2</sup> = 0.19] (**Figures 2** and **3**).
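The effect sizes reported in this Results section can be cross-checked against their *F* statistics. The sketch below assumes that the reported η<sup>2</sup> values are partial eta squared, which the text does not state explicitly; under that assumption, η<sup>2</sup> = *F*·df<sub>effect</sub> / (*F*·df<sub>effect</sub> + df<sub>error</sub>). The function name and the row labels are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (label, F, df_effect, df_error, effect size as reported in the text)
reported = [
    ("LCL within-session", 73.54, 5, 305, 0.55),
    ("HCL within-session", 58.92, 5, 305, 0.49),
    ("Control within-session", 17.26, 5, 305, 0.22),
    ("Conditions across time", 14.53, 10, 610, 0.19),
    ("Contrast, BL vs. test 1", 14.97, 2, 183, 0.14),
    ("Contrast, test 5", 21.92, 2, 183, 0.19),
]

recomputed = []
for name, f_value, df_effect, df_error, eta_reported in reported:
    eta = round(partial_eta_squared(f_value, df_effect, df_error), 2)
    recomputed.append(eta)
    print(f"{name}: recomputed {eta:.2f}, reported {eta_reported:.2f}")
```

Under this assumption, each recomputed value rounds to the reported one, which suggests the statistics are internally consistent. The degrees of freedom also fit the design: 62 subjects with six measurements give (5, 305), and 62 subjects with three conditions give (2, 122).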

# Correlations between the Battery of Pain Measures and Maximal Pain Decrease in the Three Study Conditions

In the LCL environment, the Spearman test resulted in a negative correlation between ΔLCL and heat pain threshold (*r* = −0.27, *p* = 0.03) and a positive correlation with heat pain intensity (*r* = 0.33, *p* = 0.01). In addition, a positive correlation was found between ΔLCL and CPM (*r* = 0.39, *p* = 0.002). In the HCL environment, only one correlation was found to be significant; this was between ΔHCL and CPM (*r* = 0.40, *p* = 0.001). All other correlations were not found to be significant. In the control condition, no significant correlations were found between ΔControl and the battery of pain measures.
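A Spearman coefficient is a Pearson correlation computed on ranks, which makes it robust to non-normal pain ratings. The sketch below is a plain-Python illustration with hypothetical values (not study data); the helper names are our own, and tied values receive the average of their rank positions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: heat pain threshold (deg C) and pain decrease (NPS points).
threshold = [42.1, 44.3, 45.0, 46.2, 47.5, 43.0]
pain_decrease = [40, 35, 30, 20, 25, 38]

rho = spearman(threshold, pain_decrease)
print(f"Spearman rho = {rho:.3f}")
```

In this made-up example the coefficient is negative, mirroring the direction of the threshold correlation reported for the LCL condition: subjects with lower heat pain thresholds show larger pain decreases.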

# Regression Analyses

Hierarchical regression analysis was conducted for each of the study conditions in order to identify predictive variables for pain reduction. The following variables were examined as possible predictors: gender, all pain measures, and four statements of the self-feedback VR inventory (anticipation, movement skills, attention and cognitive inhibition, and physical effort). In the LCL condition, hierarchical regression showed that 6.1% of the pain decrease variance was explained by gender, meaning that pain decreased more in men than in women. CPM accounted for another 7.5% of the explained variance, indicating that the extent of CPM predicted pain decrease (**Table 2**).

Hierarchical regression showed that gender predicted 10% of the explained variance in the HCL condition, as well, meaning that pain was less decreased in women than in men. CPM predicted 11.6% of the variance, meaning that the extent of CPM predicted decrease in pain. In addition, two statements of the self-feedback inventory (anticipation + attention and cognitive inhibition) added another 20.2% of the explained variance, meaning that the higher the score for abilities of anticipation, attention and cognitive inhibition, the more the pain decreased (**Table 3**).

No predictive variables were identified in the control condition [*F*(4, 56) = 1.89, *p* = 0.13].
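In a hierarchical regression, predictors are entered in blocks and each block's contribution is the increase in explained variance over the previous step. The sketch below simply accumulates the percentages quoted above; it assumes they are successive increments (ΔR²) that can be summed, which the text implies but which cannot be confirmed here because Tables 2 and 3 are not reproduced. The names are illustrative.

```python
# Incremental explained variance (percent) per step, as quoted in the text.
steps = {
    "LCL": [("gender", 6.1), ("CPM", 7.5)],
    "HCL": [
        ("gender", 10.0),
        ("CPM", 11.6),
        ("anticipation + attention and cognitive inhibition", 20.2),
    ],
}

def cumulative_r2(increments):
    """Running total of explained variance after each hierarchical step."""
    total = 0.0
    running = []
    for name, delta in increments:
        total += delta
        running.append((name, round(total, 1)))
    return running

totals = {cond: cumulative_r2(incs) for cond, incs in steps.items()}
for cond, running in totals.items():
    for name, total in running:
        print(f"{cond}: after adding {name}, {total}% of variance explained")
```

Read this way, the cognitive components contribute the largest single increment in the HCL condition, whereas no comparable step exists in the LCL model.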

# DISCUSSION

The current study had three hypotheses, of which only the following two were supported: (1) during VR with two different CL tasks, pain ratings were significantly reduced, with no difference in the extent of reduction between the two virtual environments; (3) attention and cognitive inhibition, as well as anticipation, predicted pain reduction in the HCL environment only (Demeter et al., 2016). The second hypothesis was not supported: (2) reduction in pain ratings will be significantly greater in the HCL (high cognitive load) environment than in the LCL (low cognitive load) environment.

Figure 2 | Heat pain intensity during three study conditions (mean ± SEM). Asterisks represent differences between the two virtual reality (VR) conditions and control within two adjacent time points. LCL, low cognitive load VR; HCL, high cognitive load VR. \*\**p* < 0.01, \*\*\**p* < 0.001.

Virtual reality has been shown to be effective as a pain relief technique in a variety of clinical pain conditions (e.g., Hoffman et al., 2004; Dascal et al., 2017) and in the laboratory setting, demonstrating its alleviating effect on experimentally evoked pain (e.g., Hoffman et al., 2003; Sil et al., 2014). Yet, the data examining the VR environment *attributes* that impact pain reduction are limited. One study compared the effects of two different environments (warm and cold) on thermal pain intensities in healthy volunteers (Mulhberger et al., 2007). The authors hypothesized that a cold environment would reduce heat pain and *vice versa*. Nevertheless, this hypothesis was refuted when no differences were found in the effect of each environment on pain in both models. Law et al. (2010) examined whether a higher level of central cognitive processing demand (e.g., working memory and emotional control) involved in a distraction task would increase tolerance for cold pressor pain. They compared interactive versus passive distraction tasks *via* a VR-type helmet and demonstrated that the effect of distraction on cold pain tolerance was significantly enhanced when the distraction task included greater demands for central cognitive processing. Loreto-Quijada et al. (2014) compared the effects of two VR environments on a set of pain-related and cognitive variables. One aimed to distract attention away from pain (VRD), and the other was designed to enhance pain control (VRC). It was shown that the VRD intervention significantly raised the pain threshold and increased pain tolerance, while VRC seemed to have a greater effect on the cognitive variables, such as the negative thoughts that commonly accompany pain problems. The current study focused only on VR environments that aimed to distract attention away from pain and showed similar results for pain reduction; as mentioned, pain intensity was reduced during both VR environments.

The fact that the settings of these laboratory studies are diverse in many aspects points to the barriers that limit the generalization of conclusions among different studies. The current study adds that participation in VR reduces experimental pain intensity regardless of a specific cognitive demand environment. While the fact that VR is an efficient pain distracter is not novel, the similarity of those two chosen environments in their ability to reduce pain was surprising. We believe that although a distinction in the CL between the two tasks was verified, the CL *per se* was not distinguished enough in this study. When the VR environments were first chosen, we wished to minimize bias as much as possible by choosing similar tasks through the means of general presentation and motor activity. Even though the main parameter that was identified as diverse was the amount of CL involved, it could be that the variation between environments was not sharp enough. Therefore, no difference in their impact on pain was found. Hence, the contribution of the CL to pain reduction, as was shown in previous studies (Eccleston and Crombez, 1999), cannot be ruled out due to the negative results of the present study; further studies are therefore warranted in order to address this issue.

Table 2 | Hierarchical regression for predicting variables of pain decrease in an LCL environment.

*\*p* ≤ *0.05.*

The present study also identified predictive factors affecting pain reduction during VR. Three predictors were identified. The first two predictors, including gender and CPM are discussed in a previous publication (see Demeter et al., 2014). The previous publication (Demeter et al., 2014) included the same sample of subjects, while the current research examined another VR condition. The last and best predictor identified in this study as an efficient pain reducer under VR included the following cognitive components: (1) attention and cognitive inhibition and (2) anticipation. These cognitive components made an impact only when a high cognitive effort was required within the HCL VR environment.

The link between pain and cognitive performance has been previously observed in experimental and clinical settings (e.g., Coen et al., 2008), and the findings are inconsistent. Attention constitutes the most studied cognitive component in relation to pain. Evidence shows that while attending to a painful stimulus generally increases perceived intensity (Van Damme et al., 2010), only a sufficiently attention-demanding cognitive task can shift attention away from pain (Eccleston and Crombez, 1999). Moreover, some evidence shows that pain can attract attention, causing the subject to focus on the pain stimuli rather than being distracted from the pain (Legrain et al., 2009). The current study identified not only attention but also cognitive inhibition and anticipation as possible predictors for pain reduction during a task with a high cognitive load. Cognitive inhibition represents the ability to suppress irrelevant information and is considered a component of executive functions. Other components of executive functions include the ability to formulate and maintain goals and strategies and to retain information for further processing (Connor and Maeir, 2011). To the best of our knowledge, there is sparse evidence relating to the link between executive functions and pain inhibitory control. One review (Solberg Nes et al., 2009) has proposed a relationship between self-regulation, a component of executive functions, and the ability to cope with and manage different aspects of chronic pain conditions. Although it is important and adds knowledge about the cognitive aspect, it focuses on studies of patients with chronic pain, unlike the current study, which involved experimental pain in healthy subjects. Another study evaluated these links with healthy volunteers exposed to a cold pain model (Oosterman et al., 2010); better cognitive inhibition (as measured by the Stroop test), but not other executive functions, was found to be associated with less sensitivity to pain. Similarly, the current study obtained evidence that high perceived cognitive inhibition, as reported by the participant, predicted pain reduction.

*\*p* < *0.05 \*\*p* < *0.01 \*\*\*p* < *0.001.*

*CPM, conditioned pain modulation; A* + *CI, attention* + *cognitive inhibition; ant, anticipation.*

Anticipation of action is another executive function component (Barkley, 1997). When a task is performed repeatedly, it is more likely to be automatically processed, which in turn reduces the accompanying CL. This renders the task less effective in competing with pain for attention resources (Eccleston and Crombez, 1999). This study revealed, using the self-feedback VR inventory, that the more the subjects participated in the VR environment, the more accurately they reported being able to anticipate the outcomes of their actions. Thus, when subjects anticipated the outcome of their actions, they were less distracted from pain. There was a significant difference between the two VR environments in the extent of anticipation. In the HCL environment, subjects reported that they were less able to anticipate the result of their actions. This finding further supports the notion that the HCL environment has a higher cognitive load. However, the lack of a significant difference between the environments in the extent of attention needed by the subjects may explain the lack of difference in pain reduction between the two VR environments.

The uniqueness of the current study also derives from the use of the "activity analysis" method for analyzing the VR environments' task requirements. This process, which is usually used by occupational therapists, allows the practitioner to understand the demands placed upon a person who engages in a certain task (Thomas, 2015) and can also be used for research purposes. Based on our literature review, the current study is the first to compare VR attributes using activity analysis.

# Study Limitations

The activity analysis is a basic, efficient tool used by occupational therapists. However, to the best of our knowledge, it has never been used before for detecting general differences between VR environments and specifically differences in CL. Therefore, further validation and future research are recommended for evaluating the activity analysis together with objective experimental measures of the effect of CL. Previous evidence (e.g., Pud et al., 2009) shows that sensitivity to pain may be affected by hand dominance. The current study did not evaluate hand dominance, and it is recommended to address the issue in future studies. In addition, the subjective experience of the subject in the VR environment was evaluated by the Self-Feedback VR Inventory, which included only a few questions taken from the presence and ITQ (Witmer and Singer, 1998). Because these questionnaires were not used in full, their reliability may have been affected.

In conclusion, this novel study identified evidence for significant pain reduction during immersion in two VR environments. This aspect needs to be considered when customizing pain treatment protocols for patients coping with pain. Further work is necessary in order to assess the benefits of CL in pain reduction.

# ETHICS STATEMENT

The study was approved and carried out in accordance with the recommendations of "the Ethical Committee of the University of Haifa, Faculty of Social Welfare & Health Sciences"; with written informed consent from all subjects. The protocol was approved by "the Ethical Committee of the University of Haifa, Faculty of Social Welfare & Health Sciences" (approval number-092/10).

# AUTHOR CONTRIBUTIONS

All authors of this article (ND, DP, and NJ) have substantially contributed to the conception of the work, its analysis, and interpretation of data and have approved the publication of this work.

# ACKNOWLEDGMENTS

This study was supported by a grant from the Dean of the Faculty of Social Welfare and Health Sciences, University of Haifa, Israel, for interdisciplinary research. Part of this study was submitted by the first author to the University of Haifa, Department of Occupational Therapy in partial fulfillment of the requirement for the master's degree. The current article elaborates on a previous publication in the proceedings of the 11th Intl Conf. Disability, Virtual Reality & Associated Technologies, Los Angeles, CA, USA, 20–22 September 2016.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Demeter, Pud and Josman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A Public Database of Immersive VR Videos with Corresponding Ratings of Arousal, Valence, and Correlations between Head Movements and Self Report Measures

Benjamin J. Li <sup>1</sup> \*, Jeremy N. Bailenson<sup>1</sup> , Adam Pines <sup>2</sup> , Walter J. Greenleaf <sup>1</sup> and Leanne M. Williams <sup>2</sup>

<sup>1</sup> Department of Communication, Stanford University, Stanford, CA, United States, <sup>2</sup> Department of Psychiatry and Behavioral Sciences, School of Medicine, Stanford University, Stanford, CA, United States

#### Edited by:

Albert Rizzo, USC Institute for Creative Technologies, United States

#### Reviewed by:

Andrej Košir, University of Ljubljana, Slovenia

Hongying Meng, Brunel University London, United Kingdom

> \*Correspondence: Benjamin J. Li benjyli@stanford.edu

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 08 September 2017 Accepted: 20 November 2017 Published: 05 December 2017

#### Citation:

Li BJ, Bailenson JN, Pines A, Greenleaf WJ and Williams LM (2017) A Public Database of Immersive VR Videos with Corresponding Ratings of Arousal, Valence, and Correlations between Head Movements and Self Report Measures. Front. Psychol. 8:2116. doi: 10.3389/fpsyg.2017.02116

Virtual reality (VR) has been proposed as a methodological tool to study the basic science of psychology and other fields. One key advantage of VR is that sharing of virtual content can lead to more robust replication and representative sampling. A database of standardized content will help fulfill this vision. There are two objectives to this study. First, we seek to establish and allow public access to a database of immersive VR video clips that can act as a potential resource for studies on emotion induction using virtual reality. Second, given the large sample size of participants needed to get reliable valence and arousal ratings for our videos, we were able to explore the possible links between the head movements of observers and the emotions they feel while viewing immersive VR. To accomplish our goals, we sourced and tested 73 immersive VR clips, which participants rated on valence and arousal dimensions using self-assessment manikins. We also tracked participants' rotational head movements as they watched the clips, allowing us to correlate head movements and affect. Based on past research, we predicted relationships between the standard deviation of head yaw and valence and arousal ratings. Results showed that the stimuli varied reasonably well along the dimensions of valence and arousal, with a slight underrepresentation of clips that are of negative valence and highly arousing. The standard deviation of yaw positively correlated with valence, while a significant positive relationship was found between head pitch and arousal. The immersive VR clips tested are available online as supplemental material.

Keywords: virtual reality, database, immersive VR clips, head movement, affective ratings

# INTRODUCTION

Blascovich et al. (2002) proposed the use of virtual reality (VR) as a methodological tool to study the basic science of psychology and other fields. Since then, there has been a steady increase in the number of studies that seek to use VR as a tool (Schultheis and Rizzo, 2001; Fox et al., 2009). Some studies use VR to examine how humans respond to virtual social interactions (Dyck et al., 2008; Schroeder, 2012; Qu et al., 2014) or as a tool for exposure therapy (Difede and Hoffman, 2002; Klinger et al., 2005), while others employ VR to study phenomena that might otherwise be impossible to recreate or manipulate in real life (Slater et al., 2006; Peck et al., 2013). In recent years, the cost of a typical hardware setup has decreased dramatically, allowing researchers to spend less than the typical price of a laptop to implement compelling VR. One of the key advantages of VR for the study of social science is that sharing of virtual content will allow "not only for cross-sectional replication but also for more representative sampling" (Blascovich et al., 2002). What is needed to fulfill this vision is a database of standardized content.

The immersive video (or immersive VR clip) is one powerful and realistic aspect of VR. It shows a photorealistic video of a scene that updates based on head orientation but is not otherwise interactive (Slater and Sanchez-Vives, 2016). When viewers watch an immersive VR clip, they see a 360° view from where the video was originally recorded, and while changes in head orientation are rendered accurately, these videos typically do not allow for head translation. A video is recorded using multiple cameras and stitched together through software to form a total surround scene. In this sense, creating content for immersive video is fairly straightforward, and consequently there is a wealth of content publicly available on social media sites (Multisilta, 2014).

To accomplish the goal of a VR content database, we sourced and created a library of immersive VR clips that can act as a resource for scholars, paralleling the design used in prior studies on affective picture viewing (e.g., International Affective Picture System, IAPS; Lang et al., 2008). The IAPS is a large set of photographs developed to provide emotional stimuli for psychological and behavioral studies on emotion and mood induction. Participants are shown photographs and asked to rate each on the dimensions of valence and arousal. While the IAPS and its acoustic counterpart, the International Affective Digital Sounds (IADS; Bradley and Lang, 1999), are well-established and used extensively in emotion research, to our knowledge no database of immersive VR content that can potentially induce emotions exists. As such, we set out to explore whether we could establish a database of immersive VR clips for emotion induction based on the affective responses of participants.

Most VR systems allow a user to have a full 360° head rotation view, such that the content updates based on the particular orientation of the head. In this sense, the so-called field of regard is higher in VR than in traditional media such as television, which does not change when viewers move their heads away from the screen. This often allows VR to trigger strong emotions in individuals (Riva et al., 2007; Parsons and Rizzo, 2008). However, few studies have examined the relationship between head movements in VR and emotions. Darwin (1965) discussed the idea of head postures representing emotional states. When people are happy, they hold their heads up high. Conversely, when they are sad, their heads tend to hang low. Indeed, more recent research has provided empirical evidence for these relationships (Schouwstra and Hoogstraten, 1995; Wallbott, 1998; Tracy and Matsumoto, 2008).

An early study which investigated the influence of body movements on presence in virtual environments found a significant positive association between head yaw and reported presence (Slater et al., 1998). In a study on head movements in VR, participants saw themselves in a virtual classroom and participated in a learning experience (Won et al., 2016). Results showed a relationship between lateral head rotations and anxiety, where the standard deviation of head yaw significantly correlated to the awareness and concern individuals had regarding other virtual people in the room. Livingstone and Palmer (2016) tasked vocalists to speak and sing passages of varying emotions (e.g., happy, neutral, sad) and tracked their head movements using motion capture technology. Findings revealed a significant relationship between head pitch and emotions. Participants raised their heads when vocalizing passages that conveyed happiness and excitement and lowered their heads for those of a sad nature. Understanding the link between head movements in VR and emotions may be key in the development and implementation of VR in the study and treatment of psychological disorders (Wiederhold and Wiederhold, 2005; Parsons et al., 2007).

There are two objectives of the study. First, we seek to establish and allow public access to a database of immersive VR clips that can act as a potential resource for studies on emotion induction using virtual reality. Second, given that we need a large sample of participants to get reliable valence and arousal ratings for our videos, we are in a unique position to explore the possible links between head movements and the emotions one feels while viewing immersive VR. To accomplish our goals, we sourced and tested 73 immersive VR clips, which participants rated on valence and arousal dimensions using self-assessment manikins. These clips are available online as supplemental material. We also tracked participants' rotational head movements as they watched the clips, allowing us to correlate the observers' head movements and affect. Based on past research (Won et al., 2016), we predicted significant relationships between the standard deviation of head yaw and both valence and arousal ratings.

FIGURE 1 | The experimental setup depicting a participant (A) wearing an Oculus Rift HMD to view the immersive VR clips, and (B) holding an Oculus Rift remote to select his affective responses to his viewing experience.

# METHODS


# Participants

Participants were undergraduates from a medium-sized West Coast university who received course credit for their participation. In total, 95 participants (56 female) between the ages of 18 and 24 took part in the study.

# Stimulus and Measures

The authors spent 6 months searching for immersive VR clips that they thought would effectively induce emotions. Sources included personal contacts and internet searches on websites such as YouTube, Vrideo, and Facebook. In total, more than 200 immersive VR clips were viewed and assessed. From this collection, 113 were shortlisted and subjected to further analysis. The experimenters evaluated the video clips, and a subsequent round of selection was conducted based on the criteria employed by Gross and Levenson (1995). First, the clips had to be relatively short. This is especially important as longer clips may induce fatigue and nausea among participants. Second, the VR clips had to be understandable on their own without the need for further explanation. As such, clips which were sequels or part of an episodic series were excluded. Third, the VR clips had to be likely to induce valence and arousal. The aim was to get a good spread of videos that vary across the dimensions. A final 73 immersive VR clips were selected for the study. They ranged from 29 to 668 s in length with an average of 188 s per clip.

Participants viewed the immersive VR clips through an Oculus Rift CV1 (Oculus VR, Menlo Park, CA) head-mounted display (HMD). The Oculus Rift has a resolution of 2,160 × 1,200 pixels, a 110° field of view and a refresh rate of 90 Hz. The low-latency tracking technology determines the relative position of the viewer's head and adjusts his view of the immersive video accordingly. Participants interacted with on-screen prompts and rated the videos using an Oculus Rift remote. Vizard 5 software (Worldviz, San Francisco, CA) was used to program the rating system. The software ran on a 3.6 GHz Intel i7 computer with an Nvidia GTX 1080 graphics card. The experimental setup is shown in **Figure 1**.

The Oculus Rift HMD features a magnetometer, gyroscope, and accelerometer which combine to allow for tracking of rotational head movement. The data were digitally captured and comprised the pitch, yaw, and roll of the head. These are standard terms for rotations around the respective axes, and are measured in degrees. Pitch refers to the movement of the head around the X-axis, similar to a nodding movement. Yaw represents the movement of the head around the Y-axis, similar to turning the head side-to-side to indicate "no." Roll refers to moving the head around the Z-axis, similar to tilting the head from one shoulder to the other. These movements are presented in **Figure 2**. As discussed earlier, Won et al. (2016) found a relationship between lateral head rotations and anxiety. They showed that scanning behavior, defined as the standard deviation of head yaw, significantly correlated with the awareness and concern people had of virtual others. In this study, we similarly assessed how much participants moved their heads by calculating the standard deviations of the pitch, yaw, and roll of their head movements while they watched each clip and included them as our variables.
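The head-movement variables described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the input layout (a list of `(pitch, yaw, roll)` tuples in degrees), the function names, the use of the sample rather than population standard deviation, and the unwrapping of 360° jumps are all assumptions made for illustration, since the text does not specify these details.

```python
import statistics


def unwrap_degrees(angles):
    """Remove artificial 360-degree jumps from a sequence of angles (assumed step)."""
    if not angles:
        return []
    unwrapped = [angles[0]]
    for current in angles[1:]:
        previous = unwrapped[-1]
        # Smallest signed difference between consecutive samples, in [-180, 180).
        step = (current - previous + 180.0) % 360.0 - 180.0
        unwrapped.append(previous + step)
    return unwrapped


def rotation_sd(samples):
    """Return the sample SD of pitch, yaw, and roll for one clip viewing.

    `samples` is a hypothetical list of (pitch, yaw, roll) tuples in degrees,
    one tuple per tracking frame.
    """
    result = {}
    for index, axis in enumerate(("pitch", "yaw", "roll")):
        series = unwrap_degrees([sample[index] for sample in samples])
        result[axis] = statistics.stdev(series)
    return result


# Toy viewing: the head turns steadily through the 0/360 degree boundary.
example = [(0.0, 350.0, 0.0), (1.0, 10.0, 0.0), (2.0, 30.0, 0.0)]
sd_by_axis = rotation_sd(example)
```

Unwrapping matters mainly for yaw, because a participant on a swivel chair can turn through the point where the reported angle jumps from 359° back to 0°; without it, a small head turn would look like a very large movement.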

Participants made their ratings using the self-assessment manikin (SAM; Lang, 1980). SAM shows a series of graphical figures that range along the dimensions of valence and arousal. The expressions of these figures vary across a continuous scale. The SAM scale for valence shows a sad and unhappy figure on one end, and a smiling and happy figure on the other. For arousal, the SAM scale depicts a calm and relaxed figure on one end, and an excited and interested figure on the other. A 9-point rating scale is presented at the bottom of each SAM. Participants selected one of the options while wearing the HMD, using the Oculus Rift remote control to scroll among options. Studies have shown that SAM ratings of valence and arousal are similar to those obtained from the verbal semantic differential scale (Lang, 1980; Ito et al., 1998). The SAM figures are presented in **Figure 3**.

# Procedure

TABLE 1 | Comprehensive list of all immersive VR clips in database.

Pretests were conducted to determine how long participants could comfortably watch immersive videos before experiencing fatigue or simulation sickness. Results revealed that some participants encountered fatigue and/or nausea if they watched for more than 15 min without a break. Most participants were at ease with a duration of around 12 min. The 73 immersive VR clips were then divided into clusters with an approximate duration of 12 min per cluster. This resulted in a total of 19 groups of videos. Based on the judgment of the experimenters, no more than two clips of a particular valence (negative/positive) or arousal (low/high) were shown consecutively (Gross and Levenson, 1995). This was to discourage participants from becoming too involved in any particular affect, which could influence their judgment in subsequent ratings. Each video clip was viewed by a minimum of 15 participants.
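The ordering rule in the procedure (no more than two consecutive clips of the same valence or arousal category) can be expressed as a simple check. The sketch below is illustrative only: the experimenters ordered the clips by judgment, and the label strings, function names, and data layout here are hypothetical.

```python
def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = 0
    current = 0
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        current = current + 1 if label == previous else 1
        longest = max(longest, current)
        previous = label
    return longest


def respects_run_limit(clips, limit=2):
    """Check a viewing order against the consecutive-clip rule.

    `clips` is a hypothetical list of (valence_label, arousal_label) pairs,
    e.g. ("positive", "low"), in presentation order.
    """
    valence = [clip[0] for clip in clips]
    arousal = [clip[1] for clip in clips]
    return longest_run(valence) <= limit and longest_run(arousal) <= limit


acceptable = [("positive", "low"), ("positive", "high"), ("negative", "high")]
too_repetitive = [("positive", "low"), ("positive", "high"), ("positive", "low")]
```

Here `respects_run_limit(acceptable)` holds because neither category repeats more than twice in a row, while `too_repetitive` fails on valence.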

When participants first arrived, they were briefed by the experimenter that the purpose of the study was to examine how people respond to immersive videos. Participants were told that they would be wearing an HMD to view the immersive videos, and that they could request to stop participating at any time if they felt discomfort, nausea, or some form of simulator sickness. Participants were then presented with a printout of the SAM measures for valence and arousal, and told that they would be rating the immersive videos based on these dimensions. Participants were then introduced to the Oculus Rift remote and shown how to use it to rate the immersive VR clips.

The specific procedure is presented here. Participants sat on a swivel chair which allowed them to turn around 360° if they wished to. They first watched a test immersive VR clip and did a mock rating to get accustomed to the viewing and rating process. They then watched and rated a total of three groups of video clips, with each group comprising between two and four video clips. A 5 s preparation screen was presented before each clip. After the clip was shown, participants were presented with the SAM scale for valence. After participants selected the corresponding rating using the Oculus Rift remote, the SAM scale for arousal was presented and participants made their ratings. Following this, the aforementioned 5 s preparation screen was presented to get participants ready to view the next clip. After watching one group of immersive VR clips, participants were given a short break of about 5 min before continuing with the next group of clips. This was done to minimize the chances of participants feeling fatigued or nauseous by allowing them to rest between groups of videos. With each group of videos having a duration of about 12 min, the entire rating process lasted around 40 min.

# RESULTS

# Affective Ratings

**Figure 4** shows the plots of the immersive video clips (labeled by their ID numbers) based on mean ratings of valence and arousal. There is a varied distribution of video clips above the midpoint (5) of valence that vary across arousal ratings. However, despite our efforts to locate and shortlist immersive VR clips for the study, there appears to be an underrepresentation for clips that both induce negative valence and are highly arousing. **Table 1** shows a list of all the clips in the database, together with a short description, length and their corresponding valence and arousal ratings.

The immersive VR clips varied on arousal ratings (M = 4.20, SD = 1.39), ranging from a low of 1.57 to a high of 7.4. This compares favorably with arousal ratings on the IAPS, which range from 1.72 to 7.35 (Lang et al., 2008). The video clips also varied on valence ratings (M = 5.59, SD = 1.40), with a low of 2.2 and a high of 7.7. This compares reasonably well with valence ratings on the IAPS, which range from 1.31 to 8.34.

# Head Movement Data

Pearson's product-moment correlations between observers' head movement data and their affective ratings are presented in **Table 2**. Most scores appear to be normally distributed as assessed by a visual inspection of Normal Q-Q plots (see **Figure 5**). Analyses showed that average standard deviation of head yaw significantly predicted valence [F(1, 71) = 5.06, p = 0.03, r = 0.26, adjusted R<sup>2</sup> = 0.05], although the direction was in contrast to our hypothesis. There was no significant relationship between standard deviation of head yaw and arousal [F(1, 71) = 2.02, p = 0.16, r = 0.17, adjusted R<sup>2</sup> = 0.01]. However, there was a significant relationship between average head pitch movement and arousal [F(1, 71) = 4.63, p = 0.04, r = 0.25, adjusted R<sup>2</sup> = 0.05]. Assumptions of the F-test for the significant relationships were met, with analyses showing homoscedasticity and normality of the residuals. The plots of the significant relationships are presented in **Figures 6**, **7**.
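For readers who want to see how the reported statistics relate to one another, the following Python sketch recovers r and adjusted R<sup>2</sup> from each reported F value. It assumes a simple linear regression with one predictor fitted across the 73 clips (consistent with the 71 residual degrees of freedom), in which R<sup>2</sup> = F / (F + df<sub>residual</sub>). The function and variable names are illustrative, and the square root yields only the magnitude of r; the sign comes from the slope, which is positive for all three relationships here.

```python
import math

N_CLIPS = 73            # unit of analysis: one data point per clip
DF_RESID = N_CLIPS - 2  # residual df for one predictor plus intercept


def r_squared_from_f(f_value, df_resid):
    """R^2 implied by the F statistic of a one-predictor regression."""
    return f_value / (f_value + df_resid)


def adjusted_r_squared(r_squared, n, predictors=1):
    """Standard adjustment of R^2 for sample size and number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - predictors - 1)


# F values as reported in the text.
reported_f = {
    "yaw_sd_vs_valence": 5.06,
    "yaw_sd_vs_arousal": 2.02,
    "pitch_vs_arousal": 4.63,
}

summary = {}
for name, f_value in reported_f.items():
    r2 = r_squared_from_f(f_value, DF_RESID)
    summary[name] = (
        round(math.sqrt(r2), 2),                    # |r|
        round(adjusted_r_squared(r2, N_CLIPS), 2),  # adjusted R^2
    )
```

Under these assumptions the computed pairs are (0.26, 0.05), (0.17, 0.01), and (0.25, 0.05), matching the values reported in brackets above.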

# DISCUSSION

The first objective of the study was to establish and introduce a database of immersive video clips that can serve as a resource for emotion induction research through VR. We sourced and tested a total of 73 video clips. Results showed that the stimuli varied reasonably well along the dimensions of valence and arousal. However, there appears to be a lack of representation for videos that are of negative valence yet highly arousing. In the IAPS and IADS, stimuli that belong to this quadrant tend to represent themes that are gory or violent, such as a victim of an attack whose face has been mutilated, or a woman being held hostage with a knife to her throat. The majority of our videos are in the public domain and readily viewable on popular websites such as YouTube, which have a strict policy on the types of content that can be uploaded. Hence, it is not surprising that stimuli of negative valence and high arousal were not captured in our selection of immersive videos. Regardless, the collection of video clips (available online as supplementary material) should serve as a good launching pad for researchers interested in examining the links between VR and emotion.

Although not a key factor of interest for this paper, we observed variance in the length of the video clips, which was confounded with video content. Long video clips in our database tend to be serious journalism content (e.g., nuclear fallout, homeless veterans, dictatorial regimes) and naturally evoke negative valence. Length is a factor that distinguishes videos from photographs, which are the standard emotional stimuli. Hence, while we experienced difficulty sourcing long video clips of positive valence, future studies should examine the influence of video clip length on affective ratings.

The second objective sought to explore the relationship between observers' head movements and their emotions. We demonstrated a significant relationship between the amount of head yaw and valence ratings, which suggests that individuals who displayed greater side-to-side head movement gave higher ratings of pleasure. However, the positive relationship shown here is in contrast to that presented by Won et al. (2016), who showed a significant relationship between the amount of head yaw and reported anxiety. It appears that content and context are important differentiating factors when it comes to the effects of head movements. Participants in the former study explored their virtual environment and may have felt anxious in the presence of other virtual people. In our study, participants simply viewed the content presented to them without the need for navigation. Although no significant relationship was present between standard deviation of head yaw and arousal ratings, we found a correlation between head pitch and arousal, suggesting that people who tend to tilt their head upwards while watching immersive videos reported being more excited. This parallels research conducted by Lhommet and Marsella (2015), who compiled data from various studies on head positions and emotion states and showed that tilting the head up corresponds to feelings of excitement such as surprise and fear. The links between head movement and emotion are important findings and deserve further investigation.

One thing of note is the small effect sizes shown in our study (adjusted R<sup>2</sup> = 0.05). While we tried our best to balance efficient data collection and managing participant fatigue, some participants may not be used to watching VR clips at length and may have felt uncomfortable or distressed without overtly expressing it. This may have influenced their ratings for VR clips toward the end of their study session, and may explain the small effect size. Future studies can explore when participant fatigue is likely to take place and adjust the viewing duration accordingly to minimize the impact on participant ratings.

Self-perception theory posits that people determine their attitudes based on their behavior (Bem, 1972). Future research can explore whether tasking participants to direct their head in certain directions or movements can lead to changes in their affect or attitudes. For example, imagine placing a participant in a virtual garden filled with colorful flowers and lush greenery. Since our study shows a positive link between amount of head yaw and valence ratings, will participants tasked with keeping their gaze on a butterfly fluttering around them (thereby increasing the amount of head movement) report stronger valence than those who see a stationary butterfly resting on a flower? Results from this and similar studies can possibly aid in the development of virtual environments that assist patients undergoing technology-assisted therapy.

Our study examined the rotational head movements enacted by participants as they watched the video clips. Participants in our study sat on a swivel chair, which allowed them to swing around to have a full surround view of the immersive video. Future studies can incorporate translational head movements, which refer to movements along the horizontal, lateral, and vertical (x-, y-, and z-) axes. This can be achieved by allowing participants to sit, stand, or walk freely, or even by programming depth field elements into the immersive videos, and then examining how participants' rotational and translational head movements correlate with their affect. Exploring the effects of the added degrees of freedom will contribute to a deeper understanding of the connection between head movements and emotions.

# ETHICS STATEMENT

This study was carried out in accordance with the recommendations of the Human Research Protection Program, Stanford University Administrative Panel on Human Subjects in Non-Medical Research with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Stanford University Administrative Panel on Human Subjects in Non-Medical Research.

# AUTHOR CONTRIBUTIONS

The authors worked as a team and made contributions throughout. BL and JB conceptualized and conducted the study. AP contributed in the sourcing and shortlisting of immersive VR clips and in revising the manuscript. WG and LW acted as domain consultants for the subject and contributed in writing and revisions.

# FUNDING

This work is part of the RAINBOW-ENGAGE study supported by NIH Grant 1UH2AG052163-01.

# SUPPLEMENTARY MATERIAL

The Supplementary videos are available online at: http://vhil.stanford.edu/360-video-database/


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Li, Bailenson, Pines, Greenleaf and Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Reporting Mental Health Symptoms: Breaking Down Barriers to Care with Virtual Human Interviewers

*Gale M. Lucas <sup>1</sup> \*, Albert Rizzo <sup>1</sup> , Jonathan Gratch <sup>1</sup> , Stefan Scherer <sup>1</sup> , Giota Stratou <sup>1</sup> , Jill Boberg <sup>1</sup> and Louis-Philippe Morency <sup>2</sup>*

*<sup>1</sup> Institute for Creative Technologies, University of Southern California, Los Angeles, CA, United States, <sup>2</sup> School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States*

A common barrier to healthcare for psychiatric conditions is the stigma associated with these disorders. Perceived stigma prevents many from reporting their symptoms. Stigma is a particularly pervasive problem among military service members, preventing them from reporting symptoms of combat-related conditions like posttraumatic stress disorder (PTSD). However, research shows increased reporting by service members when anonymous assessments are used. For example, service members report more symptoms of PTSD when they *anonymously* answer the Post-Deployment Health Assessment (PDHA) symptom checklist compared to the *official* PDHA, which is identifiable and linked to their military records. To investigate the factors that influence reporting of psychological symptoms by service members, we used a transformative technology: automated virtual humans that interview people about their symptoms. Such virtual human interviewers allow simultaneous use of two techniques for eliciting disclosure that would otherwise be incompatible; they afford *anonymity* while also building *rapport*. We examined whether virtual human interviewers could increase disclosure of mental health symptoms among active-duty service members who had just returned from a yearlong deployment in Afghanistan. Service members reported more symptoms during a conversation with a virtual human interviewer than on the official PDHA. They also reported more to a virtual human interviewer than on an *anonymized* PDHA. A second, larger sample of active-duty and former service members showed a similar effect that approached statistical significance. Because respondents in both studies shared more with virtual human interviewers than on an anonymized PDHA (even though both conditions control for stigma and ramifications for service members' military records), *virtual human interviewers that build rapport* may provide a superior option to encourage reporting.

Keywords: virtual humans, assessment, disclosure, psychological symptoms, anonymity

#### *Edited by:*

*Mel Slater, University of Barcelona, Spain*

#### *Reviewed by:*

*Lucia Valmaggia, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), United Kingdom*

*Xueni Pan, Goldsmiths, University of London, United Kingdom*

*\*Correspondence: Gale M. Lucas, lucas@ict.usc.edu*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

*Received: 26 July 2017 Accepted: 22 September 2017 Published: 12 October 2017*

#### *Citation:*

*Lucas GM, Rizzo A, Gratch J, Scherer S, Stratou G, Boberg J and Morency L-P (2017) Reporting Mental Health Symptoms: Breaking Down Barriers to Care with Virtual Human Interviewers. Front. Robot. AI 4:51. doi: 10.3389/frobt.2017.00051*

# INTRODUCTION

People are reluctant to disclose information that could be potentially stigmatizing. One area where this failure to disclose honest information has particularly large consequences is mental health. Due to the stigma associated with mental health problems (Link et al., 1991, 2001), people are reluctant to report symptoms of such disorders. The consequences are significant: mental health problems exact a significant toll on society (World Health Organization, 2004; Insel, 2008; National Institute of Mental Health, 2010; World Economic Forum, 2011).

The majority of individuals who seek mental health services report facing stigma and discrimination (Thornicroft et al., 2009). Accordingly, stigma and discrimination act as a significant barrier to care and honest reporting of symptoms, which individuals must overcome. They may try to deal with stigma using coping methods that are more or less effective (Isaksson et al., 2017); however, if they cannot successfully cope, stigma and the resultant unwillingness to report symptoms end up preventing people from accessing or receiving treatment, leaving the disorder unresolved. These barriers to care pose a large problem for society since mental health problems are costly, both in terms of money and social capital (Insel, 2008), and unresolved mental health problems continue to accrue increasing costs (World Health Organization, 2004; Insel, 2008; National Institute of Mental Health, 2010; World Economic Forum, 2011).

In the current work, we explore this problem among military service members, given that failure to disclose symptoms is often cited as a barrier to care in the military [Institute of Medicine (IOM), 2014; Rizzo and Shilling, in press]. Service members are reluctant to report symptoms of combat-related conditions like posttraumatic stress disorder (PTSD), which is typified by persistent mental, behavioral, and emotional symptoms as a result of exposure to physical or psychological trauma. Not only are service members more likely to have PTSD than civilians (Vincenzes, 2013; Schreiber and McEnany, 2015) but also as a result of the perceived stigma surrounding the condition (Hoge et al., 2004, 2006), they are particularly reluctant to report symptoms (Olson et al., 2004; Appenzeller et al., 2007; Warner et al., 2007, 2008, 2011; McLay et al., 2008; Fear et al., 2010; Thomas et al., 2010). The reluctance of service members in the United States Military to report PTSD symptoms is especially intensified when they are screened for mental health symptoms using the official administration of the Post-Deployment Health Assessment (PDHA; Hyams et al., 2002; Wright et al., 2005) since this information becomes documented in their military health records. Indeed, there are pragmatic military career implications (such as the perception of possible future restrictions from certain job placements and from obtaining future security clearances) for having been screened positive for mental health conditions.

To address this reluctance to disclose PTSD symptoms on the PDHA, we examine whether a new technology, namely virtual human interviewers, can be used to increase willingness of service members to report PTSD symptoms compared to the PDHA. Virtual humans are digital representations of humans that can portray human-like characteristics and abilities and can be used to interview people in a natural way using conversational speech (see **Figure 1**). We first describe empirical work that provides a theoretical basis for how virtual human interviewers might increase the willingness of service members to report PTSD symptoms, followed by the research questions addressed in the current work. We then describe and discuss results from two studies that examine the effectiveness of virtual human interviewers designed to foster service member reporting of mental health symptoms that might otherwise be withheld when using traditional self-report checklists (such as the PDHA).

FIGURE 1 | Interview with our virtual human interviewer.

# Related Work

Anonymity is theorized to support the potential effectiveness of virtual human interviewers for increasing reports of health-related symptoms. Previous research suggests that anonymized forms of assessment increase reporting. For example, respondents reveal more honest information during computerized self-assessments (Greist et al., 1973; Beckenbach, 1995; Weisband and Kiesler, 1996; van der Heijden et al., 2000; Joinson, 2001), and they appear to do so because they perceive these assessments to be more anonymous than non-computerized human interviewing methods (Sebestik et al., 1988; Thornberry et al., 1990; Baker, 1992; Beckenbach, 1995; Joinson, 2001). Although anonymized assessments can improve honest reporting for even mundane private information (Beckenbach, 1995; Joinson, 2001), these effects are especially strong when the information is illegal, unethical, or culturally stigmatized (Weisband and Kiesler, 1996; van der Heijden et al., 2000). As many behaviors that are harmful to mental and physical health fall into this category (e.g., drug use, unsafe sex, suicide attempts), anonymized forms of assessment can be especially important in healthcare assessment. For example, when asked to disclose information about suicidal thoughts using a computer-administered assessment, participants not only felt more positive about the assessment compared to traditional methods, but also gave more honest answers (Greist et al., 1973).

Relevant to the focus of the current work, anonymity has been shown to increase disclosure of PTSD symptoms among service members (Olson et al., 2004; McLay et al., 2008; Warner et al., 2011). One study indicated that following a combat deployment, the sub-sample of service members who *anonymously* answered the routine PDHA symptom checklist reported twofold to fourfold higher mental health symptoms and a higher interest in receiving care compared to the overall results derived from the standard administration of the PDHA, which is identifiable and linked to service members' military records (Warner et al., 2011).

Initial research on the use of virtual humans to conduct clinical interviews suggests that interviewees are indeed more open to virtual human interviewers than their human counterparts (Slack and Van Cura, 1968; Lucas et al., 2014; Pickard et al., 2016). Because a conversation with a virtual human interviewer may be viewed as more anonymous, users may be more comfortable disclosing about highly sensitive topics and on questions that could lead them to admit something stigmatized or otherwise negative. For example, during a clinical interview with a virtual human interviewer, participants disclose more personal details when they are told that the virtual human is autonomous than when they are told that the virtual human is operated by a person in another room (Lucas et al., 2014). Pickard et al. (2016) reported that individuals are more comfortable disclosing to an automated virtual human interviewer than its human counterpart.

While research has yet to establish that virtual human interviewers can increase reporting of PTSD symptoms specifically among service members, some research has considered the potential benefits of using virtual human interviewers and related technology for service members (Lewandowski et al., 2011; Rizzo et al., 2011; Serowik et al., 2014; Bhalla et al., 2016). Rizzo et al. (2011) developed a virtual human to interview service members about their PTSD symptoms. Advances in automation now allow virtual human interviewers to have more interactive conversations with users, in which questions about the PTSD symptoms can be embedded. Having such an interactive conversation is critical because, while anonymity is beneficial, building rapport with respondents can also increase reporting (Burgoon et al., 2016).

Indeed, a second theoretical basis behind the potential effectiveness of virtual human interviewers for increasing report of symptoms is rapport. Psychological theories of rapport (e.g., Tickle-Degnen and Rosenthal, 1990) have outlined verbal and non-verbal behaviors that help to build rapport; and subsequent research has shown that resultant rapport leads interlocutors to disclose more (Miller et al., 1983; Hall et al., 1996; Gratch et al., 2007, 2013; Burgoon et al., 2016). Differences in disclosure between assessment formats have also been found to be mediated by feelings of rapport; rapport leads individuals to disclose more personal information (Dijkstra, 1987; Gratch et al., 2007, 2013).

Because traditional computerized self-assessments and other anonymized forms lack any human element, these traditional assessments do not evoke the same feelings of rapport or social connection. Specifically, when there is not a human or human-like agent present in some way, shape, or form, people feel less socially connected during the assessment (Gratch et al., 2007, 2013).

Tickle-Degnen and Rosenthal (1990) suggest several features of "the human element" that are important in increasing rapport, including both verbal and non-verbal behavior. For example, listeners who are naturally more verbally receptive and attentive and who use more follow-up questions produce greater disclosure from reticent interviewees (Miller et al., 1983). Beyond the words uttered, non-verbal behaviors such as positive facial expressions, attentive eye gaze, welcoming gestures, and open postures have been reported to influence feelings of rapport (Hall et al., 1996; Burgoon et al., 2016). These features may allow virtual human interviewers to more effectively build rapport, in contrast to traditional computerized self-assessments and other anonymized forms. Indeed, research suggests that virtual human interviewers have the potential to build rapport as well as, or even better than, human interviewers (e.g., DeVault et al., 2014).

Researchers have attempted to translate these psychological theories of rapport into computational systems and studies have indicated that it is possible to capture these behaviors in various automated systems ranging from machine learning-based prediction models (e.g., Morency et al., 2009; Huang et al., 2010) to "chatbots" (e.g., Kerlyl et al., 2007) and virtual humans (e.g., Cassell and Bickmore, 2002; Bickmore et al., 2005; Haylan, 2005; Cassell et al., 2007; Gratch et al., 2007, 2013; Matsuyama et al., 2016; Zhao et al., 2016). For example, some virtual humans have been designed to utilize verbal (e.g., words uttered, prosody, intonation, etc.) and non-verbal behavior (e.g., positive facial expressions, gaze, gestures, and posture) to build rapport (Cassell and Bickmore, 2002; Bickmore et al., 2005; Haylan, 2005; Cassell et al., 2007; Gratch et al., 2007, 2013; Matsuyama et al., 2016; Zhao et al., 2016). Research has also established that virtual humans that employ such rapport-building behaviors are able to induce disclosure (Gratch et al., 2007, 2013).

While rapport-building seems contrary to anonymity, the use of virtual human interviewers may provide a solution that allows for both anonymity and rapport-building. Some virtual humans can be used to interview people in a natural way (i.e., *via* conversational speech). Akin to the "Rapport Agents" described above, these virtual human interviewers have been designed to *build rapport with users specifically during interviews* (e.g., Gratch et al., 2013; Qu et al., 2014), including clinical interviews (e.g., Bickmore et al., 2005; DeVault et al., 2014; Lucas et al., 2014; Rizzo et al., 2016). Interspersed *appropriately* during an interview, the virtual human interviewers use verbal and non-verbal backchannels (e.g., utterances of agreement such as "mhm" or head nods) to build rapport with the interviewee. Indeed, virtual human interviewers that employ such backchannels when appropriate to the conversation create greater feelings of rapport than virtual human interviewers that employ them at random during the interview (e.g., Gratch et al., 2013; Qu et al., 2014). As with Rapport Agents, when virtual human interviewers use these rapport-building behaviors in this way, they are able to prompt disclosure from interviewees (Gratch et al., 2013).

# The Current Research

Given that the experience of stigma can limit the reporting of PTSD symptoms, many service members with the disorder are not identified and do not have the opportunity to benefit from the evidence-based treatments that currently exist. By using a virtual human interviewer to increase self-disclosure of more accurate information, service members having such difficulties could be better encouraged to access potentially beneficial mental health care options. Although the prior research is suggestive, it has not sufficiently established that virtual human interviewers can be used to increase service members' willingness to endorse the presence of PTSD symptoms compared to self-report on the PDHA checklist items. Thus, the current study tested whether virtual human interviewers can encourage reporting of PTSD symptoms compared to the gold-standard PDHA. Accordingly, we hypothesize that service members will be more willing to disclose PTSD symptoms to a virtual human interviewer than on the official PDHA (H1). The study also examines the role of rapport *in addition to* anonymity for increasing disclosure. While Warner et al. (2011) demonstrated that service members who answered the PDHA symptom checklist anonymously were more willing to report mental health symptoms compared to the official PDHA, virtual human interviewers, with the added benefit of rapport-building, may have the capability to evoke higher levels of disclosure of symptoms. If rapport has an impact on self-disclosure in this context above and beyond anonymity, service members will be more willing to report symptoms to a virtual human interviewer than on an anonymized version of the PDHA (even though they are both equally anonymous). Thus, we hypothesized that service members would be more willing to report PTSD symptoms to a virtual human interviewer than on an anonymized version of the PDHA (H2).

In order to address these research questions, we conducted two studies to test whether service members (Studies 1 and 2) and Veterans (in Study 2) would be more willing to report PTSD symptoms when asked by a virtual human interviewer than when asked to report on the PDHA (either official or anonymized).

# STUDY 1—MATERIALS AND METHODS

# Study 1—Participants

In Study 1, 29 (2 female) active-duty Colorado National Guard service members volunteered to participate in the study during 2013. None of the service members in the unit declined to participate. After returning from a year-long deployment to Afghanistan, they completed the measures described below. The sample was diverse regarding age (*M* = 41.46, Range = 26–56) and previous number of combat deployments (*M* = 2.00, Range = 1–7). Due to technical failures, five participants (all male) were excluded from the analysis reported below.

# Study 1—Design and Procedure

This study compared reporting of PTSD symptoms in three formats: (1) standard administration of the PDHA upon return from deployment; (2) an anonymized version of the PDHA; and (3) PDHA questions on PTSD symptoms asked by a virtual human interviewer that were embedded in a longer set of general interview questions. All participants completed the official PDHA within 2 days of the other two assessments (either before or after these other two assessments) and signed releases to allow the research team to access their official PDHA responses gathered at post-deployment processing. Three questions on the PDHA assess whether the service member is experiencing the three core Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) diagnostic symptoms for PTSD (intrusive recollections; avoidance/numbing; hyperarousal).

On the official PDHA, participants were asked "*Have you ever had any experience that was so frightening, horrible, or upsetting that, in the past month, you:*


Participants selected "yes" or "no" on each of these three items, and their answers were submitted to the US Military as part of their official military health record. At the same time, we were granted access to these official PDHA responses for our study sample.

Next, participants arrived at the study site, gave consent, completed a demographic questionnaire, and rated their mood using items "I am happy" and "I worry too much" on 4-point scales from *almost never* to *almost always*. They then were escorted to a private room and completed the anonymized PDHA PTSD questions on a computer, selecting Yes or No responses to each item. The participants were verbally assured that their responses were confidential as they would be deidentified using a participant number code.

Participants were then engaged by a virtual human interviewer who conducted a semistructured screening interview *via* spoken language. Participants were still alone in the private room and were told they would not be observed by anyone during the interview and that the video recordings of their interview session would not be released to anyone outside the research team. The full interview was structured around a series of agent-initiated questions organized into three phases: Phase 1 was a rapport-building phase where the virtual human interviewer asked participants general introductory questions (e.g., "Where are you from originally?"); Phase 2 was the clinical phase where the virtual human interviewer asked a series of questions about symptoms (e.g., "How easy is it for you to get a good night's sleep?"), which included the naturally embedded PDHA questions; Phase 3 was the ending section of the interview where the virtual human interviewer asked questions designed to return the patient to a more positive mood (e.g., "What are you most proud of?"). Across the session, the virtual human interviewer built rapport using follow-up questions (e.g., "Can you tell me more about that?"), empathetic feedback (e.g., "I'm sorry to hear that"), and non-verbal behaviors (e.g., nods, expressions).
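The three-phase structure described above can be sketched as a small data structure. This is only an illustration, not the authors' system: the names `Phase`, `INTERVIEW_SCRIPT`, and `questions_in_order` are hypothetical, and only the example questions quoted in the text are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One phase of the semistructured interview (hypothetical representation)."""
    name: str
    purpose: str
    example_questions: tuple


# Only the example questions quoted in the text appear here;
# the full interview script is not reproduced.
INTERVIEW_SCRIPT = (
    Phase("rapport-building", "general introductory questions",
          ("Where are you from originally?",)),
    Phase("clinical", "questions about symptoms, with the PDHA items embedded",
          ("How easy is it for you to get a good night's sleep?",)),
    Phase("closing", "return the interviewee to a more positive mood",
          ("What are you most proud of?",)),
)


def questions_in_order(script):
    """Flatten the script into the order in which questions would be asked."""
    return [q for phase in script for q in phase.example_questions]
```

Keeping the phases in an ordered tuple makes the fixed sequence (rapport first, symptoms second, positive close last) explicit in the data rather than in control flow.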

The PDHA questions that were asked by the virtual human interviewer were slightly re-worded in order to embed them in the interview. In place of the three PDHA questions listed above, the virtual human interviewer asked participants these revised versions:


Participants' answers to the three PDHA questions during the interview were recorded and later coded by two blind coders as to whether the participant had this experience in the last month or not. While more nuanced than checking "yes" or "no" on PDHA, our coders dichotomized open-ended responses to parallel the PDHA. For example, one response to the intrusive recollections question that was coded as "no" was *"Um… haven't had any, you know, dreams or nightmares. No."* A response to the avoidance question that was coded as "yes" stated: *"Um… I try to leave early. I try to leave a situation. I try not to talk to those people. Um… That's the only time I really avoid a situation is avoiding those people."* Coders had 100% agreement, and codes served as "yes" or "no" answers.
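As a rough sketch of how agreement between two coders on dichotomized responses might be checked, the snippet below computes raw percent agreement and, as an addition not reported in the text, Cohen's kappa (a chance-corrected index). The function names and the example codes are hypothetical and are not the study's data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of responses on which the two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must rate the same non-empty set of responses")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement; NaN when chance agreement is already 1."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        return float("nan")  # both coders used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for six responses (illustrative only).
coder_1 = ["no", "yes", "no", "no", "yes", "no"]
coder_2 = ["no", "yes", "no", "no", "yes", "no"]
agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

With identical code lists, as in the hypothetical example, both indices equal 1, which corresponds to the 100% agreement case.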

# STUDY 1—RESULTS

In Study 1, three versions of the PDHA (official PDHA, anonymized PDHA, and virtual human interviewer) were administered to participants to determine whether manner of administration produced differing responses. Scores were created for each version of the PDHA by counting the number of "yes" answers to the three questions that assess the core DSM-IV-TR diagnostic symptoms for PTSD (intrusive recollection, avoidance/numbing, hyperarousal). To compare responding, we conducted a repeated-measures ANOVA using 24 participants from a sample of active-duty Colorado National Guard who completed all three measures. There was a significant effect of assessment type, *F*(2,23) = 4.29, *p* = 0.02 (**Figure 2**). Within-subject contrasts revealed that participants reported more symptoms of PTSD (responded "yes" on more questions) when asked by the virtual human interviewer (*M* = 0.79, SE = 0.23) than when reporting on the official PDHA (*M* = 0.25, SE = 0.15), *F*(1,23) = 7.38, *p* = 0.01, or even when reporting on the anonymized version of the PDHA (*M* = 0.33, SE = 0.16), *F*(1,23) = 4.84, *p* = 0.04. The difference between official and anonymized versions of the PDHA was not significant [*F*(1,23) = 0.19, *p* = 0.66].
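The scoring rule and the pairwise within-subject contrasts can be illustrated with a short sketch. It relies on the standard identity that a within-subject contrast between two conditions is a paired *t*-test whose squared statistic is an *F* with (1, *n* − 1) degrees of freedom; with 24 participants that gives 23 denominator degrees of freedom, consistent with the *F*(1,23) contrasts above. The function names and the scores are hypothetical, and the authors' statistical software is not specified in the text.

```python
import math
from statistics import mean, stdev


def symptom_score(answers):
    """Count 'yes' answers across the three PTSD items (score from 0 to 3)."""
    if len(answers) != 3:
        raise ValueError("expected answers to exactly three items")
    return sum(1 for a in answers if a == "yes")


def paired_contrast(scores_a, scores_b):
    """Return (t, F, df) for a within-subject contrast between two formats.

    F equals t squared, with 1 and n - 1 degrees of freedom.
    Raises ZeroDivisionError if all difference scores are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    t = mean(diffs) / standard_error
    return t, t * t, n - 1


# Hypothetical scores for six participants (not the study's data).
interviewer = [1, 0, 2, 1, 0, 3]
anonymized = [0, 0, 1, 1, 0, 2]
t_stat, f_stat, df = paired_contrast(interviewer, anonymized)
```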

Study 1 provided an initial test of our research hypotheses with results suggesting that service members are more willing to report PTSD symptoms to a virtual human interviewer than on the official PDHA (H1). The results also indicate that service members are more willing to report PTSD symptoms to a virtual human interviewer than on an anonymized version of the PDHA (H2). Indeed, because respondents in this study shared more with virtual human interviewers than on an anonymized PDHA (even though both conditions control for stigma and ramifications for service members' military records), *virtual human interviewers that build rapport* may provide a superior option for encouraging endorsement of these symptoms. This finding has important implications, suggesting that virtual human interviewers may help service members "open up" and report their psychological symptoms through rapport building. We then conducted a second study to replicate (and extend) this result in a larger, more diverse sample including both active-duty service members and retired military veterans. In this second study, we also ruled out the confound introduced by the wording differences between the virtual human interviewer's questions and the questions listed on the PDHA. In Study 2, the questions asked by the virtual human interviewer were worded identically to the questions on the anonymized PDHA.

FIGURE 2 | … interviewer during a postdeployment interview in Study 1. \**p* < 0.05.

# STUDY 2—MATERIALS AND METHODS

# Study 2—Participants

In Study 2, 132 (16 female) active duty service members and veterans were recruited (e.g., through *Craigslist*), and paid \$30 for their participation during 2014 and 2015. Only individuals who were enrolled as a part of the US military, either currently or in the past, were invited to participate. As in Study 1, this sample was diverse regarding age (*M* = 44.12, Range = 18–77), but information regarding number of deployments was not taken for this sample.

# Study 2—Design and Procedure

Participants completed the same procedures as Study 1 with a few exceptions. First, since this sample included veteran participants who had not just returned from a deployment, we did not collect the official PDHA for this study. Thus, we only compared responses to the anonymized PDHA with the same questions asked by the virtual human interviewer.

Second, after giving consent and completing demographic questions, participants also completed additional screening measures including the PTSD Checklist (PCL; Blanchard et al., 1996). The PCL is a self-report measure that evaluates PTSD using a 5-point Likert scale. It is based on the DSM-IV-TR. Scores range from 17 to 85 and symptom severity is reflected in the size of the score, with larger scores indicating greater severity of PTSD symptoms. The PCL is commonly used in clinical practice and in research studies on PTSD. Participants also completed additional individual difference questionnaires that were not relevant to the current research questions, but described elsewhere (DeVault et al., 2014; Gratch et al., 2014).
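The 17-to-85 score range follows from summing item ratings on a 1-to-5 scale. The sketch below assumes the 17-item format of the DSM-IV-based PCL; the item count is not stated in the text above and is taken from the standard instrument, and the names `PCL_ITEMS` and `pcl_total` are hypothetical.

```python
PCL_ITEMS = 17  # assumed item count of the DSM-IV-based PCL (not stated in the text)


def pcl_total(responses):
    """Sum 1-5 item ratings into a total severity score (17 to 85)."""
    if len(responses) != PCL_ITEMS:
        raise ValueError(f"expected {PCL_ITEMS} item responses")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    return sum(responses)
```

A respondent who selects the lowest option on every item scores 17, and one who selects the highest option on every item scores 85, which reproduces the stated range.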

Finally, and most importantly, Study 2 rules out the confound of question wording. While in Study 1 our anonymized version of the PDHA used the question wording from the official PDHA, in Study 2 our anonymized version of the PDHA used the exact wording employed by the virtual human interviewer (see above). Therefore, in this study, any differences observed between answers on the anonymized version of the PDHA and the interview led by a virtual human could not be due to question wording. Coders again dichotomized each response as "yes" or "no" for the PTSD symptom, and had 100% agreement.

# STUDY 2—RESULTS

A repeated-measures *t*-test among participants who successfully completed both assessments (*n* = 126) revealed an effect of assessment type that approached statistical significance [*t*(125) = 1.76, *p* = 0.08]; participants reported more PTSD symptoms when asked by the virtual human interviewer (*M* = 1.21, SE = 0.10) than on an anonymized version of the PDHA (*M* = 1.05, SE = 0.11; **Figure 3A**). There is a significant interaction with PCL score [*F*(1,124) = 4.38, *p* = 0.04] such that among those with subtler subthreshold PTSD symptoms (below median on the PTSD Checklist; PCL; Blanchard et al., 1996), the effect is significant [*M* = 0.53, SE = 0.10 vs. *M* = 0.17, SE = 0.07; *t*(63) = 3.77, *p* < 0.001; **Figure 3B**]. However, there is no significant difference among those already reporting higher symptoms on the PCL [*M* = 1.92, SE = 0.12 vs. *M* = 1.95, SE = 0.14; *t*(61) = −0.20, *p* = 0.84]. Likewise, entering PCL score as a covariate rendered the aforementioned effect of assessment type on number of reported symptoms significant [virtual human interviewer *M* = 1.21, SE = 0.08 vs. anonymized PDHA *M* = 1.05, SE = 0.08; *F*(1,124) = 5.78, *p* = 0.02]. Finally, while an ANOVA revealed a significant between-subjects main effect of active duty status on number of reported symptoms such that active duty subjects were overall less willing to report symptoms (*M* = 0.16, SE = 0.25) than veterans [*M* = 1.27, SE = 0.10; *F*(1,124) = 17.32, *p* < 0.001], there was no interaction between assessment type and active duty status [*F*(1,124) = 0.34, *p* = 0.56].

FIGURE 3 | Number of posttraumatic stress disorder (PTSD) symptoms reported out of three questions representing Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) categories. Participants reported fewer symptoms (1) on an anonymized version of the Post-Deployment Health Assessment (PDHA), than (2) to a virtual human interviewer in Study 2 (A) across all participants and (B) among those with subtler PTSD symptoms [below median on the PTSD Checklist (PCL); Blanchard et al., 1996]. \**p* < 0.05; † *p* < 0.09.
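The median-split comparison in the Study 2 results can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function names, and the rule that participants exactly at the median fall into the higher group are all hypothetical, since the text says only "below median". The records are invented for illustration and are not the study's data.

```python
from statistics import mean, median


def median_split(records, key="pcl"):
    """Split records into below-median and at-or-above-median groups.

    Placing ties at the median in the higher group is an assumption.
    """
    cutoff = median(r[key] for r in records)
    low = [r for r in records if r[key] < cutoff]
    high = [r for r in records if r[key] >= cutoff]
    return low, high


def mean_difference(group):
    """Mean of (interviewer score minus anonymized score) within a group."""
    return mean(r["interviewer"] - r["anonymized"] for r in group)


# Hypothetical records (illustrative only).
records = [
    {"pcl": 20, "interviewer": 1, "anonymized": 0},
    {"pcl": 25, "interviewer": 1, "anonymized": 0},
    {"pcl": 50, "interviewer": 2, "anonymized": 2},
    {"pcl": 60, "interviewer": 3, "anonymized": 3},
]
low_group, high_group = median_split(records)
low_diff = mean_difference(low_group)
high_diff = mean_difference(high_group)
```

In this invented example the format difference appears only in the below-median group, mirroring the pattern of the reported interaction without reproducing its values.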

Like Study 1, Study 2 demonstrates that service members are more willing to report PTSD symptoms to a virtual human interviewer than on an anonymized version of the PDHA (H2). Indeed, even though both conditions control for stigma and ramifications for service members' military records, participants are more willing to report PTSD symptoms to a virtual human interviewer than on an anonymous version of the PDHA.

# DISCUSSION

Across both studies, participants reported more PTSD symptoms when asked by a virtual human interviewer. Study 1 showed the effectiveness of virtual human interviewers in a sample of active duty service members reporting symptoms of mental distress. Supporting H1, service members reported more symptoms during a conversation with a virtual human interviewer than on the official PDHA. Our analysis of the small sample in Study 1 did not reveal differences between official and anonymized versions of the PDHA as was reported in Warner et al. (2011). However, in Warner et al., within-group results were not assessed and instead mean group differences between those who volunteered to fill out the anonymous version were compared with the mean of the larger official PDHA sample (1,712 out of 3,502). Service members in Study 1 also reported more PTSD symptoms to a virtual human interviewer than on an *anonymized* PDHA. In Study 2, we found a similar effect that approached statistical significance using a larger sample of active-duty service members and veterans. As in Study 1, participants in this study tended to report more symptoms when asked by a virtual human interviewer than on an anonymized PDHA. Thus, both reported studies support H2. Furthermore, the second study suggests that individuals falling under the radar in traditional assessments and scoring low on questionnaires like the PCL (e.g., possibly due to impression management, fear of stigmatization) could be detected by virtual human interviewers. Indeed, in this second study (where the sample has a broader range of distress), without taking into account PCL, the effect of assessment type on reporting of PTSD symptoms only approached statistical significance.

Although we showed that virtual human interviewers can increase service members' disclosure of mental health symptoms, further research is required to rule out alternative explanations concerning the mechanism behind this disclosure. For example, the open-ended nature of the questions asked by the virtual human interviewer could have contributed to encouraging service members to disclose. To see the extent to which this factor contributes, future research could—for example—compare an open-ended paper-and-pencil version of the PDHA questions to the official forced-choice version in the absence of rapport building. Likewise, in both studies, all participants completed the anonymized PDHA before the interview with the virtual human, leaving order as another possible alternative explanation. However, this is unlikely to explain our results because, in Study 1, some participants completed the official PDHA before the anonymized PDHA and the virtual human interview, whereas others completed the official PDHA *after* these other two assessments. Although we do not have access to the dates when specific service members in our study completed the official PDHA to further test this, if order made a significant contribution, we would not have found the strongest effect of assessment (1) in this study and (2) when comparing to this (official) assessment, which was completed last for some participants. While this may help to rule out order as an alternative explanation for the difference between the virtual human interviewer and the official PDHA, it does not preclude the possibility that an order effect contributed to the difference between the virtual human interviewer and the anonymized PDHA.

In line with previous studies (Slack and Van Cura, 1968; Lucas et al., 2014; Pickard et al., 2016), these results support the view that virtual human interviewers provide a safe, reduced-stigma context where users may reveal more honest information. However, our results also go beyond prior work in that the current study focused specifically on service members and veterans, rather than a general civilian population. Also, where other clinical interviews led by virtual humans are more general, the clinical interview in this work assessed responses to specific questions about symptoms of PTSD. Thus, the results of this study add to previous work on use of such technologies for service members by demonstrating that virtual human interviewers may have a role to play in enhancing military mental health assessment by encouraging service members to report more PTSD symptoms than the gold-standard PDHA.

Moreover, beyond effects of anonymity found previously (e.g., Warner et al., 2011), virtual human interviewers may help soldiers "open up" and report their psychological symptoms through rapport building. Given that service members were more willing to report symptoms to a virtual human interviewer than on an anonymized version of the PDHA, even though these assessments were equally anonymous, this work establishes the idea that rapport has an impact on self-disclosure *above and beyond anonymity*. Pragmatically, this finding makes the case for taking advantage of the value that rapport-building holds for honest reporting rather than just relying on anonymity. For example, just having an anonymous online form appears not to be a sufficient "technological leap" to maximize self-disclosure. Honest reporting of such symptoms can better inform accurate diagnosis and help service members and civilians to break down barriers to care and receive evidence-based interventions that could mitigate the serious consequences of having a chronic untreated health condition. As such, the benefits of virtual-human-administered mental health assessments could be substantial.

Finding that there is an impact of rapport for increasing disclosure *in addition to* anonymity has implications beyond just reporting of psychological symptoms. Building upon the established effect of anonymity on disclosure (Sebestik et al., 1988; Thornberry et al., 1990; Baker, 1992; Beckenbach, 1995; Joinson, 2001; Warner et al., 2011), rapport-building could be beneficial for honest disclosure of any kind of sensitive information. As reviewed by Weisband and Kiesler (1996), anonymous assessments are especially helpful for eliciting information that is illegal (such as crimes like sexual assault) or is largely considered unethical or at least taboo (like risky sexual activity); our work implies that adding rapport-building as a second technique to elicit disclosure would further increase honest reporting of such information. Because virtual humans can build rapport while maintaining anonymity, they could be particularly useful for encouraging these kinds of disclosures.

Future work should investigate the impact of virtual human interviewers on promoting honest disclosure in other such sensitive clinical domains (Rizzo and Koenig, in press). Additionally, virtual human interviewers could be considered as an assessment strategy in other areas (e.g., financial planning) where people may perceive at least *some* stigma, and therefore may be tempted to under-report certain values (such as debt) even though honest information is essential for practitioners to give clients sound advice. Virtual human interviewers might also be useful for gaining honest information that—while not particularly stigmatizing—is still uncomfortable to disclose. For example, in organizational contexts, virtual humans could be helpful in eliciting honest performance evaluations.

# ETHICS STATEMENT

The study was approved by University of Southern California's Institutional Review Board. Prior to participating, every participant provided written informed consent. No vulnerable populations were involved.

# AUTHOR CONTRIBUTIONS

GL made substantial contributions to the conception and design of the work as well as the analysis or interpretation of data. AR and JG made substantial contributions to the conception and design of the work as well as interpretation of data. SS and GS made substantial contributions to the conception of the work as well as analysis and interpretation of data. JB made substantial contributions to the acquisition, analysis, or interpretation of data. L-PM made substantial contributions to the conception and design of the work.

# FUNDING

This work was supported by DARPA under contract W911NF-04-D-0005 and the US Army. Any opinion, content or information presented does not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

# REFERENCES


mental health screening after combat deployment. *Arch. Gen. Psychiatry* 68, 1065–1071. doi:10.1001/archgenpsychiatry.2011.112


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2017 Lucas, Rizzo, Gratch, Scherer, Stratou, Boberg and Morency. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Virtual Reality as a Potential Tool to Face Frailty Challenges

Silvia Serino<sup>1,2</sup>, Serena Barello<sup>1</sup>, Francesca Miraglia<sup>3,4</sup>, Stefano Triberti<sup>1</sup> and Claudia Repetto<sup>1</sup>\*

*<sup>1</sup> Department of Psychology, Catholic University of the Sacred Heart, Milan, Italy, <sup>2</sup> Applied Technology for Neuropsychology Lab, Istituto Auxologico Italiano, Milan, Italy, <sup>3</sup> Department of Geriatrics, Neuroscience and Orthopedics, Institute of Neurology, Catholic University of the Sacred Heart, Rome, Italy, <sup>4</sup> Brain Connectivity Laboratory, IRCCS San Raffaele Pisana, Rome, Italy*

Keywords: virtual reality, frailty, rehabilitation, executive functions, gait, aging in place, patient engagement

The aging population and the corresponding increase in age-related diseases present the scientific community and public health authorities with imminent challenges. One of these challenges is to gain a deeper understanding of the functional status of older adults in order to prevent and/or delay the onset of late-life disability (Rodríguez-Artalejo and Rodríguez-Mañas, 2014). The syndrome of "frailty" has recently been introduced in the literature to specifically characterize the health of older individuals who deserve special attention because of their increased vulnerability to adverse health outcomes (Afilalo et al., 2010). Although there is no single definition of frailty (Morley et al., 2013), the majority of studies refer to five operational criteria (Fried et al., 2001): decreased gait speed, reduced grip strength, prolonged and unmotivated exhaustion, low physical activity, and unintended weight loss. The lack of a shared definition also leads to a large variation in reported prevalence rates, which range approximately from 5 to 60% (Collard et al., 2012). However, this multifaceted decline in different physiological systems makes frail older individuals progressively more exposed to stressors (Clegg et al., 2013), making the need for better care interventions urgent.
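The five operational criteria can be read as a simple checklist. The following is a minimal sketch of such a count-based classification; the class and function names are hypothetical, and the cut-offs (three or more criteria for "frail", one or two for "pre-frail") are the ones commonly attributed to the phenotype model rather than values stated in this article.

```python
from dataclasses import dataclass


@dataclass
class FrailtyCriteria:
    """Presence (True) or absence (False) of each of the five criteria."""
    slow_gait: bool
    weak_grip: bool
    exhaustion: bool
    low_activity: bool
    weight_loss: bool


def count_criteria(c: FrailtyCriteria) -> int:
    """Number of criteria that are present."""
    return sum([c.slow_gait, c.weak_grip, c.exhaustion,
                c.low_activity, c.weight_loss])


def classify(c: FrailtyCriteria) -> str:
    """Map the criteria count to a category (assumed cut-offs, see lead-in)."""
    n = count_criteria(c)
    if n >= 3:
        return "frail"
    if n >= 1:
        return "pre-frail"
    return "robust"


if __name__ == "__main__":
    person = FrailtyCriteria(slow_gait=True, weak_grip=True, exhaustion=False,
                             low_activity=True, weight_loss=False)
    print(count_criteria(person), classify(person))
```

How each criterion is measured (for example, the gait-speed threshold) differs between studies, which is one reason for the wide range of reported prevalence rates; the sketch deliberately takes the criteria as already-evaluated booleans.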

In parallel, some authors have suggested also introducing the phenotype of "cognitive frailty" to refer to older individuals who manifest a concurrent weakness in both physical and cognitive domains (Kelaiditi et al., 2013). In addition to the presence of physical frailty, Kelaiditi and colleagues (Kelaiditi et al., 2013) proposed that the key criterion of cognitive frailty is the presence of mild cognitive impairment in the absence of dementia. A recent large study (Delrieu et al., 2016), involving 1,617 participants, clarified that individuals with cognitive frailty showed a specific weakness in the executive domain (i.e., a wide range of high-level cognitive abilities including problem-solving, planning, and monitoring). These findings are in line with studies suggesting a crucial link between early cognitive decline in frontal areas and gait deficits (Montero-Odasso et al., 2011), since they share common brain networks.

Different treatment approaches have been investigated in clinical trials for reducing the functional decline of frail individuals: exercise interventions (Forster et al., 2009), nutritional programs (Fiatarone et al., 1994), and integrated approaches (Looman et al., 2016). Three main issues have emerged so far: the need for a more precise identification of markers of frailty; a call for innovative therapeutic strategies; and the importance of developing personalized and integrated care models and intervention approaches aimed at improving independence, preferably delivered in the patient's home setting (de Vos et al., 2012). In this context, we suggest that Virtual Reality can be an innovative tool potentially able to address these issues.

#### Edited by:

*Albert Rizzo, USC Institute for Creative Technologies, United States*

#### Reviewed by:

*Pascual Gonzalez, Universidad de Castilla-La Mancha, Spain*

#### \*Correspondence:

*Claudia Repetto claudia.repetto@unicatt.it*

#### Specialty section:

*This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology*

Received: *14 June 2017* Accepted: *24 August 2017* Published: *05 September 2017*

#### Citation:

*Serino S, Barello S, Miraglia F, Triberti S and Repetto C (2017) Virtual Reality as a Potential Tool to Face Frailty Challenges. Front. Psychol. 8:1541. doi: 10.3389/fpsyg.2017.01541*

# VIRTUAL REALITY FOR FRAIL PATIENTS: HOW AND WHY

Virtual Reality (VR) is a combination of technological devices that allows users to navigate and interact with three-dimensional computer-generated environments while having the subjective sensation of being there (Sanchez-Vives and Slater, 2005; Triberti and Riva, 2015). In recent decades, VR has been extensively used in neuroscience, and recent studies have investigated the role of VR in brain modulation via neuroimaging methods. These studies aimed to characterize brain activity in virtual environments in order to understand the neurophysiological correlates of virtual navigation (Pacheco et al., 2017). Neuroimaging evidence suggests that medial temporal lobe structures, the hippocampus in particular, as well as parietal and frontal regions are implicated in spatial navigation in humans (Iaria et al., 2008). Furthermore, elevated theta and gamma power associated with VR navigation, together with increased theta coherence between right parietal and temporal regions, has been observed (Cornwell et al., 2008). Theta and gamma oscillatory activity and cortico-hippocampal communication are part of a brain mechanism involved in the information transfer of different spatial representations required for successful navigation (White et al., 2012). The parietal lobe is involved in numerous aspects of visuo-spatial cognition, and theta and gamma activity in this region likely reflects the mechanism by which medial temporal and parietal brain regions communicate during navigation (Moser et al., 2008).

Therefore, experimental evidence underlined that during navigation in a virtual environment the brain activity is modulated (Weidemann et al., 2009), and paved the way to consider VR a potential tool for diagnosis and rehabilitation of motor and cognitive deficits.

Along this line of reasoning, in the following paragraphs we want to clarify how VR can provide interesting opportunities in order to deal with the challenges prompted by frailty: early symptoms identification, motor and cognitive rehabilitation, home-setting interventions.

First, as far as motor symptoms are concerned, an increasing number of studies have developed novel paradigms to model gait characteristics in depth (e.g., Martens et al., 2017), especially those that are difficult to analyze in clinical settings (e.g., intermittent gait freezing in Parkinson's Disease (PD) or, more closely related to frailty, a decrease in gait speed). For instance, Shine and co-workers developed a VR-based environment for investigating gait features (Shine et al., 2013). In this paradigm, participants, seated in front of a computer screen, were asked to navigate a realistic virtual environment (i.e., a corridor) via foot pedals. In addition to "simple commands" (such as "WALK" or "STOP" appearing on the screen), they were trained to respond to more complex stimuli entailing executive functions: a congruent color-word ("BLUE" written in blue, meaning "WALK") or an incongruent color-word ("RED" written in green, meaning "STOP"). Shine and co-workers (Shine et al., 2013) found that PD patients with freezing of gait had a higher frequency of motor arrests than "non-freezer" PD patients on this task, making it suitable for modeling gait behavior. This dual-task paradigm implemented in VR also appears particularly useful for the evaluation of the motor aspects of gait speed and related cognitive deficits. As far as cognitive symptoms are concerned, traditional paper-and-pencil tests are not reliable in capturing the "complexity" of executive functioning as it emerges in real-life situations (Shallice and Burgess, 1991; Goldstein, 1996). One attempt to overcome this issue is the development of tests evaluating executive functioning in real-life scenarios, such as the Multiple Errands Test (a shopping task in a supermarket, Shallice and Burgess, 1991) or the Executive Function Performance Test (simple cooking, telephone use, and medication management, Baum et al., 2008).
Given the difficulties in reproducing these tests in real-life situations (i.e., time consumption, high economic costs, patient safety, poor controllability of experimental conditions), VR technology has been increasingly used for the assessment of executive functions. Indeed, VR makes it possible to develop scenarios reproducing daily-life situations, allowing a safe and ecologically valid assessment of executive functions (Parsons, 2011). For example, Nir-Hadad and co-workers (Nir-Hadad et al., 2017) recently developed and tested, in a sample of 19 post-stroke patients, a virtual version of the original Four Item Shopping Task, which requires budget management as a functional test of executive functioning. A virtual version of the Multiple Errands Test has also been developed and tested in different clinical populations (Raspelli et al., 2012; Cipresso et al., 2014).
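The cue logic of the VR gait paradigm described above can be illustrated with a short sketch. It assumes, generalizing from the two examples given, that every congruent color-word means "WALK" and every incongruent one means "STOP"; the function names and the accuracy score are hypothetical illustrations, not part of the published task.

```python
from typing import Iterable, Optional, Tuple


def expected_action(word: str, ink_color: Optional[str] = None) -> str:
    """Return the action a cue calls for ("WALK" or "STOP")."""
    token = word.strip().upper()
    if token in {"WALK", "STOP"}:
        return token  # simple command: the word itself is the instruction
    if ink_color is None:
        raise ValueError("a color-word cue needs the ink color it is written in")
    # Assumed rule: congruent color-word -> WALK, incongruent -> STOP
    return "WALK" if token == ink_color.strip().upper() else "STOP"


def accuracy(trials: Iterable[Tuple[str, Optional[str], str]]) -> float:
    """Proportion of trials (word, ink_color, observed_action) answered correctly."""
    trials = list(trials)
    if not trials:
        raise ValueError("no trials given")
    correct = sum(
        1 for word, ink, observed in trials
        if expected_action(word, ink) == observed.strip().upper()
    )
    return correct / len(trials)


if __name__ == "__main__":
    print(expected_action("BLUE", "blue"))   # congruent cue
    print(expected_action("RED", "green"))   # incongruent cue
    print(accuracy([("WALK", None, "WALK"), ("RED", "green", "WALK")]))
```

A real implementation would also log pedal timing to detect motor arrests, which this sketch does not attempt because the source does not specify how arrests were defined.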

Second, VR could be a promising tool to enhance neuroplasticity in neurorehabilitation (Ng et al., 2013). The concept underlying VR-based therapy as a treatment for motor and cognitive dysfunction is to improve neuroplasticity of the brain by engaging users in multisensory training. The effectiveness of VR-based interventions has been demonstrated in chronic stroke patients (Lloréns et al., 2015), in vestibular rehabilitation (Alahmari et al., 2014), in sensorimotor rehabilitation (Fluet and Deutsch, 2013), and in the cognitive rehabilitation of neurological patients (Slobounov et al., 2015).

A recent systematic review found that VR-based training was more effective than conventional therapies in enhancing balance and gait ability in post-stroke patients (de Rooij et al., 2016). The advantages offered by VR over conventional approaches are multiple: within virtual environments, it is possible to develop repetitive and personalized motor training enriched by different kinds of feedback (proprioceptive, visual, auditory) able to maximize motor learning. In particular, the use of VR in combination with haptic devices (i.e., robotic systems able to give users tactile and force feedback when interacting with virtual objects) can enhance the realism of the environment (Hoffman et al., 1998), thus improving the efficacy of a rehabilitation program (Teruel et al., 2015). Moreover, VR-based stimulation can provide frail individuals with engaging and enriching environments, helping them to repeat the exercises harder and longer, thus exploiting the principle of motor learning (Kitago and Krakauer, 2013). Furthermore, although VR is currently in the developmental phase in terms of treatment of frailty (Mugueta-Aguinaga and Garcia-Zapirain, 2017), VR-based rehabilitation protocols have already been tested for training executive functions in other clinical populations (Faria et al., 2016). It is worth underlining that cognitive training might be particularly demanding for older adults, especially in cases of cognitive impairment. As previously explained, VR offers the chance to set up cognitive exercises within meaningful environments (Riva et al., 2006). Moreover, in virtual environments it is possible to reproduce real-life situations in a safer and more controlled setting: ecological validity is an important feature in neuropsychological assessment and remediation, even more so for executive function training.

Finally, as the old and frail population continues to grow, a great deal of attention has been dedicated to finding organizational solutions aimed at promoting aging-in-place policies, in order to help individuals live independently in their own homes for as long as possible (Stones and Gullifer, 2016). Indeed, aging-in-place is recognized as a crucial strategy to improve the quality of life of elderly citizens as well as the sustainability of social and welfare systems. Early evidence supports the advantages of structured programs to enable aging-in-place, as they enhance patient engagement in their own medical and rehabilitation processes (Kim et al., 2017), which is a crucial predictor of patients' quality of life and medication adherence (Barello and Graffigna, 2015). However, aging-in-place requires a reframing of the care models in terms of seamless transitions between hospitals, the welfare system and community-based care, along with all other physical and social contexts in an elderly citizen's life. Accomplishing this will require substantial innovation in the incorporation of advanced technologies in the process of care and cure. In this context, VR-based technologies might be a powerful tool to make the aging-in-place imperative a concrete reality (Lange et al., 2010). Indeed, this tool might allow older people to follow the rehabilitation process directly at home. Moreover, VR has the potential to sustain older people's active engagement in the course of care owing to its high level of customization according to the patient's unique expectations and care needs (Graffigna et al., 2014). According to these reflections, aging-in-place using VR-based technologies may be a promising solution for the upcoming aging society. However, the implementation of these solutions at home should also consider the introduction of specific systems that allow patient monitoring (e.g., intelligent systems for teletherapy, Rodríguez et al., 2016, or wearable devices for unobtrusive monitoring, Patel et al., 2012) to make VR-based training as controlled as in clinical settings.

Besides the numerous advantages VR offers for facing frailty challenges, potential limitations should also be taken into account. At a basic research level, it should be acknowledged that there is still little evidence about the brain modulation effects of VR in several domains: long-term outcomes, direct comparisons between commercial and customized modules, immersive vs. non-immersive VR, and augmented vs. fully virtual systems. At the applied level, frail older adults could show low degrees of technology acceptance, due to a general diffidence toward technological devices or to the discomfort elicited by the specific VR setup proposed. These limitations, though, should not prevent researchers and clinicians from carrying out projects that test the use of VR with elderly and frail patients; on the contrary, these issues should stimulate further basic and applied research in order to better exploit the capabilities of VR with frail individuals in the future.

# AUTHOR CONTRIBUTIONS

SS, SB, FM, and CR conceived the work. SS, SB, ST, and FM drafted the paper. CR and ST critically revised the work.

# ACKNOWLEDGMENTS

This work was partially supported by the Fondazione Cariplo within the project "Active Aging and Healthy Living."

# REFERENCES

for physical frailty in very elderly people. N. Engl. J. Med. 330, 1769–1775. doi: 10.1056/NEJM199406233302501


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Serino, Barello, Miraglia, Triberti and Repetto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Physiological Signal Analysis for Evaluating Flow during Playing of Computer Games of Varying Difficulty

Yu Tian<sup>1</sup>, Yulong Bian<sup>2</sup>, Piguo Han<sup>1</sup>, Peng Wang<sup>1</sup>\*, Fengqiang Gao<sup>1</sup>\* and Yingmin Chen<sup>1</sup>\*

<sup>1</sup> Department of Psychology, Shandong Normal University, Jinan, China, <sup>2</sup> Department of Computer Science and Technology, Shandong University, Jinan, China

#### Edited by:

Albert Rizzo, USC Institute for Creative Technologies, United States

#### Reviewed by:

Pascual Gonzalez, Universidad de Castilla-La Mancha, Spain Anna Pribilova, Slovak University of Technology in Bratislava, Slovakia

#### \*Correspondence:

Peng Wang pengsdnu@163.com Fengqiang Gao gaofengqiang@sdnu.edu.cn Yingmin Chen cc8030306@163.com

#### Specialty section:

This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology

Received: 03 January 2017 Accepted: 19 June 2017 Published: 04 July 2017

#### Citation:

Tian Y, Bian Y, Han P, Wang P, Gao F and Chen Y (2017) Physiological Signal Analysis for Evaluating Flow during Playing of Computer Games of Varying Difficulty. Front. Psychol. 8:1121. doi: 10.3389/fpsyg.2017.01121

Flow is the experience of effortless attention, reduced self-consciousness, and a deep sense of control that typically occurs during the optimal performance of challenging tasks. On the basis of the person–artifact–task model, we selected computer games (tasks) with varying levels of difficulty (difficult, medium, and easy) and shyness (personality) as flow precursors to study the physiological activity of users in a flow state. Cardiac and respiratory activity and mean changes in skin conductance (SC) were measured continuously while the participants (n = 40) played the games. Moreover, the associations between self-reported psychological flow and physiological measures were investigated through a series of repeated-measures analyses. The results showed that the flow experience is related to a faster respiratory rate, deeper respiration, moderate heart rate (HR), moderate HR variability, and moderate SC. The main effect of shyness was non-significant, whereas the interaction of shyness and difficulty influenced the flow experience. These findings are discussed in relation to current models of arousal and valence. The results indicate that the flow state is a state of moderate mental effort that arises through the increased parasympathetic modulation of sympathetic activity.

Keywords: flow experience, shyness, physiological signal, person–artifact–task model, computer game

# INTRODUCTION

Since Csikszentmihalyi's systematic description of flow in his 1975 work Beyond Boredom and Anxiety, this optimal experience has become a crucial concept in related research. He noted that artists become entirely immersed in their projects, working feverishly to complete them, and then lose all interest in their work after completion. In more recent research, this optimal experience has been investigated in various domains such as computer game playing (Rheinberg et al., 2003; Keller and Bless, 2008; Moller et al., 2010; Keller et al., 2011; Bressler and Bodzin, 2013; Peifer et al., 2014; Harmat et al., 2015), musical instrument performance (De Manzano et al., 2010), course learning (Engeser and Rheinberg, 2008; Schaik et al., 2011), mountain hiking (Wöran and Arnberger, 2012), and team performance (Aubé et al., 2014). In addition, studies have suggested that flow has positive effects on psychological well-being (Clarke and Haworth, 1994; Asakawa, 2004, 2009) and quality of life (Csikszentmihalyi, 1990). These observations motivated a scientific investigation of a flow assessment system.

# Mediation of Flow Precursors

Previous studies have suggested that computer-mediated environments (CMEs) are an appropriate context for assessing flow experience. Such environments are conducive to flow because they facilitate flow-precursor interactions through clear artifact goals, immediate feedback, and task characteristics (Finneran and Zhang, 2003; Mauri et al., 2011). To clarify the causes of the flow experience in CMEs, Finneran and Zhang (2003) developed the person–artifact–task (PAT) model (**Figure 1**) to evaluate constructs and ambiguous flow-related operationalizations. This model uses the following three stages of flow as a framework: flow precursors, flow experience, and flow consequences. The precursors of flow are person (P), artifact (A), and task (T). The flow experience itself is composed of concentration, loss of self-consciousness, time distortion, and telepresence. The consequences of flow are positive affect and an autotelic experience. The PAT model focuses on flow precursors, and the model structure suggests that flow experience is the consequence of interactions among these precursors. To psychometrically assess the viability of this model, Schaik et al. (2011) measured flow experience in an immersive virtual environment for collaborative learning. The results indicated that flow experience is mediated by its precursors.

A challenging task is a crucial precursor in the PAT model (Finneran and Zhang, 2003; Mauri et al., 2011). However, a challenging task can both facilitate and hinder flow (Fullagar et al., 2013). The challenge–skill balance model (**Figure 2**) posits that if the demands of a situation or task exceed a person's skills and coping resources, that person experiences anxiety (Csikszentmihalyi, 1975), which has also been referred to as "stress" (Lazarus et al., 1980); in the present paper, the term "anxiety" is employed. However, if a task is insufficiently challenging, the person experiences boredom or relaxation. Flow occurs when skills and demands are in balance (Csikszentmihalyi, 1975; Lazarus and Folkman, 1984). According to flow theory, boredom and anxiety are the antithesis of flow (Csikszentmihalyi, 1975; Fullagar et al., 2013). The states of low and excessive arousal generated by boredom and anxiety were found to be associated with disintegrated attention rather than focused attention, which is characteristic of flow (Izard, 1977). This indicates that optimal physiological arousal should facilitate flow, whereas low or excessive physiological arousal should hinder it (**Figure 2**; Bressler and Bodzin, 2013; Peifer et al., 2014; Harmat et al., 2015).
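The challenge–skill balance model can be expressed as a simple comparison. The sketch below is only illustrative: the function name, the shared numeric scale for challenge and skill, and the `tolerance` band that counts as "in balance" are all assumptions, since the model as described here does not quantify them.

```python
def predicted_state(challenge: float, skill: float, tolerance: float = 1.0) -> str:
    """Predict the experiential state from the challenge-skill gap.

    challenge and skill are assumed to be rated on the same scale;
    tolerance is a hypothetical half-width of the balanced zone.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    gap = challenge - skill
    if gap > tolerance:
        return "anxiety"   # demands exceed skills and coping resources
    if gap < -tolerance:
        return "boredom"   # task is insufficiently challenging
    return "flow"          # skills and demands are in balance


if __name__ == "__main__":
    for label, challenge in [("easy", 2.0), ("medium", 5.0), ("difficult", 8.0)]:
        print(label, predicted_state(challenge, skill=5.0))
```

With a fixed skill level, the three difficulty conditions map onto boredom, flow, and anxiety respectively, which mirrors the inverted-U relation between arousal and flow described in the text.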

Different personal characteristics can lead to substantially different flow experiences for the same activity (Csikszentmihalyi and Csikszentmihalyi, 1988; Asakawa, 2009). Such differences include not only personal skills but also a person's underlying life attitude and personality traits (Csikszentmihalyi and LeFevre, 1989; McCrae and Costa, 1990). Shyness is a personality trait that is characterized by anxiety toward meeting people and derives from a fear of being evaluated and rejected. Shy people tend to have less self-confidence, low self-esteem, and excessive self-focused attention and self-consciousness (Pilkonis, 1977). By contrast, flow experiences are characterized by undivided attention to a limited stimulus field and an absence of self-referential thoughts (Csikszentmihalyi and LeFevre, 1989). Accordingly, when shy people interact with a CME, they are likely to assess themselves regularly to ensure that they have performed adequately. Consequently, they cannot concentrate on a task and are likely to focus on their physical environment instead. In addition, the balance of perceived challenges and skills has a central role in facilitating optimal experience, enabling a person to meet an increase in demand with a sustained level of efficacy but without an increase in mental effort (Fullagar et al., 2013). In a CME, if the perceived difficulty of a challenge is increased, the participant's focus and information-processing speed should also increase (e.g., when developing a search strategy or using hardware and software). By contrast, excessively self-focused attention may cause shy participants to ignore information crucial to the completion of a task, to require additional effort to maintain their challenge–skill balance, and thus to experience higher levels of anxiety. Therefore, the anxiety caused by the difficulty of the perceived challenge causes them to feel disconnected from their environment and incapable of experiencing flow.
Thus, the interaction of shyness and the consequent difficulty of the task may constitute a predictor of flow experience. However, this has yet to be tested empirically. Hence, the following research hypotheses were proposed:


# Flow and Physiological Signals

Previous studies have failed to ascertain the features of the physiological responses of the flow state and have only used interviews and questionnaires, which are inherently retrospective (Tezuka et al., 2014). Crucially, flow occurs during an activity in which a person is fully immersed and self-referential thoughts are completely inhibited. When participants are asked to recount their experiences in interviews and questionnaires, they have already left the flow state and have begun self-reflection. Thus, the self-report method is subjective and performed after the activity. A solution to this problem is to adopt physiological flow indicators that are objective and can be measured during the activity without interrupting the participant (Engeser, 2012).

Researchers are increasingly focused on psychophysiological investigations of flow, with the objective of identifying physiological indicators for the development of a multidimensional psychophysiological evaluation system. Many studies have suggested that the components of flow are mediated by positive valence and high arousal (e.g., Nacke and Lindley, 2009; De Manzano et al., 2010; Mauri et al., 2011; Bressler and Bodzin, 2013; Peifer et al., 2014; Harmat et al., 2015). Furthermore, the flow state is experienced by people when they are deeply and actively involved in a task, in particular game playing, which involves performing at peak ability and applying high levels of concentration, thus indicating a state of heightened arousal. Flow's link to affect and arousal necessarily entails the flow experience having valenced content. Several components of the flow experience have been found to be dependent on positive valence and high arousal. Arousal stimuli can modulate attentional processes (i.e., concentration) (Brosch et al., 2008; Jefferies et al., 2008; Sheth and Pham, 2008); positive valence generally reduces self-awareness (Roy et al., 2008); and sense of time is altered such that positively valenced, highly arousing stimuli are perceived as being of shorter duration and are reproduced at a faster tempo than negative, less-arousing stimuli (Droit-Volet and Meck, 2007; Noulhiane et al., 2007). Thus, positive valence and high arousal are two potential physiological indicators of flow experience.

Cardiac and respiratory activity, and average changes of skin conductance (SC), are significant physiological predictors of flow experience (e.g., Uijtdehaage and Thayer, 2000; Nacke and Lindley, 2009; Mauri et al., 2011; Bressler and Bodzin, 2013; Peifer et al., 2014; Harmat et al., 2015). Typically, respiration (RSP) is rapid and shallow and has an increased minute volume (De Manzano et al., 2010), cardiovascular measures show an increase in heart rate (HR) (Uijtdehaage and Thayer, 2000; De Manzano et al., 2010), and SC (i.e., the ability of the skin to conduct an electrical current) is increased (Nacke and Lindley, 2009; Bruya, 2010; Mauri et al., 2011). These observations have been consistently associated with positive valence and high arousal. However, it should be noted that lower HR variability (HRV) during working memory- and attention-demanding tasks can be an indicator of lower mental effort, which relates to the effortless attention experienced during the flow state (Bruya, 2010; De Manzano et al., 2010; Keller et al., 2011).

We investigated the relationship between physiological signals and flow experience by examining the activation of the sympathetic and parasympathetic nervous systems. The sympathetic nervous system is the counterpart of the parasympathetic nervous system and upregulates physiological arousal (Porges, 1995). Both systems can be active simultaneously and influence the arousal process independently (Berntson et al., 1991). The interaction pattern of sympathetic and parasympathetic activation can be reciprocal, positively related (coactivation or coinhibition), or uncoupled (Berntson et al., 1991). The different possibilities of sympathetic and parasympathetic interaction provide the autonomic response with higher flexibility and precision for meeting anticipated or realized environmental challenges (Berntson et al., 1991; Thayer and Lane, 2000). Previous studies have reported that increased HR is associated with sympathetic activation (Porges, 1995) and is usually accompanied by low HRV, demonstrating variability in the length of the cardiac interbeat interval (Porges, 1995; Lehrer, 2003). Again, an elevated respiratory rate (RR) indicates increased sympathetic activity, which is indicative of high arousal; however, increased respiratory depth (RD) is an effective indicator of a more relaxed state and may reflect increased parasympathetic activity (Wientjes, 1992; Harmat et al., 2015). In addition, increased SC is associated with high task demand and is also indicative of sympathetic activation (Lacey et al., 1963; Hugdahl, 1995; Siddle et al., 1996; Dawson et al., 2007; Bruya, 2010).

We combined flow experience with physiological measurements in a CME. In this environment, participants engage in active cognitive processing (as opposed to simply recalling or memorizing information) and focus on relevant incoming information; they also engage in developing a performance strategy and using skills to complete tasks. Although physiological measurements have been demonstrated to be an effective predictor of flow experience, consistent physiological assessment systems for studying flow in the context of CMEs are lacking. For example, Harmat et al. (2015) found that higher flow was associated with lower levels of low-frequency HRV, whereas Peifer et al. (2014) reported that lower levels of low-frequency HRV were associated with lower levels of flow. Therefore, further research is required to clarify the relationship between flow experience and physiological signals. Moreover, CMEs are a crucial area requiring further exploration because of their excellent potential for promoting Web-based learning (Mauri et al., 2011), communication (Zhou, 2013), and game playing (Rheinberg et al., 2003; Keller and Bless, 2008; Moller et al., 2010; Keller et al., 2011; Bressler and Bodzin, 2013; Peifer et al., 2014; Harmat et al., 2015). Flow studies have typically centered on the balance of challenges and skills (De Manzano et al., 2010; Bressler and Bodzin, 2013; Harmat et al., 2015). However, to elucidate the causes of flow experience, we focused on the interaction of shyness and difficulty. Hence, the following hypotheses were proposed:


# The Present Study

A CME entailing a complex and demanding computer game (Blocmania 3D) that has varying difficulty levels (difficult, medium, and easy) was used as an experimental task for inducing flow according to the PAT model. During game playing, the physiological signals of the participants (shy and non-shy) were recorded continuously. The research had two aims: (1) to ascertain which physiological signals contribute to flow experience in CMEs, and (2) to ascertain whether flow experience is influenced by its precursors.

# MATERIALS AND METHODS

# Method

The present study was conducted in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, with the approval of the Human Research Ethics Committee of Shandong Normal University.

# Participants

A total of 350 undergraduates in China were assessed according to the College Students Shyness Scale (Wang et al., 2009). The 20 highest-scoring undergraduates were designated "shy" and the 20 lowest-scoring students were designated "non-shy." All 40 were recruited to participate in the experiment voluntarily. The tested students were aged 17–24 years (M = 19.06 ± 2.14, 27 females). They were all healthy and were requested not to smoke or drink alcohol for 48 h before the experiment because this would affect the central autonomic nervous system. Written informed consent was obtained from all adult participants and from the legal guardians of all non-adult participants. All participants were informed that they had the right to withdraw from the study at any time. After completing the experiment, they were given three small gifts.

# Experimental Materials and Task

For the experiment, a popular computer game, Blocmania 3D, was employed. The original source code was retrieved from http://www.verycd.com. The game is essentially a 3D version of Tetris, the idea for which was created by Golomb (1994) and entails geometrical objects called "tetrominoes," consisting of four squares that are joined edge-to-edge in different configurations, falling from the top of the computer screen vertically, one at a time. While a tetromino falls, the player can rotate it and move it sideways by using the up and down arrow keys on a computer keyboard. The task is to fit the pieces together and create complete horizontal rows of squares, which disappear when completed and earn the player points. In the present implementation of the game, the speed at which the pieces fall can be varied by 2 s and corresponds to the difficulty levels. We controlled the differences in difficulty among the experimental conditions by creating an additional fixed-effect task demand as a continuous variable; thus, according to the starting speed, each condition was coded as 3, 2, or 1 (difficult, medium, and easy, respectively).

To ensure variation in psychological flow during the experiment, the participants played three trials of Blocmania 3D, each with a different difficulty level. A preliminary experiment was conducted to ensure the difficulty of each condition matched the intended level of difficulty. Moreover, to provide a criterion for evaluating gameplay difficulty, the participants completed one questionnaire item, which was assessed using a 9-point Likert scale ranging from 1 (extremely easy) to 9 (extremely difficult). Analysis of the differences in perceived difficulty among the experimental conditions revealed significant differences [F(2,36) = 5.41, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.23]. Pairwise comparisons indicated that the difficulty of the medium sessions (M = 5.41, SEM = 0.82) was significantly higher than that of the easy sessions [M = 3.12, SEM = 0.17; t(18) = 4.38, p < 0.01] and lower than that of the difficult sessions [M = 8.79, SEM = 0.53; t(18) = 6.31, p < 0.01].
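The partial eta squared values reported with each F test can be recovered from the F ratio and its degrees of freedom using the standard identity η<sub>p</sub><sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The following is a minimal sketch of that arithmetic; the function name is hypothetical and not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    The function name is illustrative only.
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Perceived-difficulty test reported in the text: F(2,36) = 5.41
print(round(partial_eta_squared(5.41, 2, 36), 2))  # 0.23
```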

# Design

In accordance with Finneran and Zhang's (2003) elucidation of flow theory in their PAT model, the independent variables employed in the present study were the precursors of flow experience, namely shyness and task difficulty, and the dependent variables were the physiological signals and self-reports of flow experience. In this study, a two-factor mixed-design was employed. Specifically, the experiment involved three levels of within-subject task difficulty × two levels of between-subject shyness (shy and non-shy).

# Physiological Measures and Instruments

A Biopac MP 150 System (Biopac Systems Inc., Santa Barbara, CA, United States) using AcqKnowledge 4.3 was applied to continuously record the participants' physiological signals while they were in a flow state. A sampling rate of 1000 Hz was used for all channels. For all measurements, we used the BioNomadix system as the standard filter setting.

### Cardiovascular Activity


Cardiovascular activity was recorded by applying bipolar EL 504 Cloth Base Electrodes to the left and right sides of the participants' chests with a 3 × 30-cm Electro Lead (BN-EL30-LEAD3) connected to a Biopac BioNomadix RSP and electrocardiogram (ECG) amplifier. The participants' skin was cleaned using a low-alcohol detergent to minimize impedance. Generally, the quality of the recorded data was high and minimal interference was caused by movement. The recorded ECG data were imported into AcqKnowledge 4.3 to calculate the HR and associated HRV. HR was estimated from the cardiac interbeat intervals. An algorithm based on wavelet transforms was used to obtain the cardiac interbeat intervals from the ECG data. HRV refers to changes in the cardiac interbeat intervals over time. It is induced by autonomic activity, and the power spectral components of cardiac interbeat intervals provide a measurement of sympathetic and parasympathetic activity (Porges, 1995; Lehrer, 2003).
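As an illustration of how HR and a variability index follow from cardiac interbeat intervals, the sketch below computes mean HR and RMSSD from a short list of intervals. Note the assumptions: the study derived HRV from power spectral components in AcqKnowledge, whereas RMSSD is a simpler time-domain measure substituted here for brevity, and the interval values and function names are invented for the example.

```python
import math


def mean_heart_rate_bpm(ibi_ms):
    """Mean heart rate in beats per minute from interbeat intervals in ms."""
    if not ibi_ms:
        raise ValueError("need at least one interbeat interval")
    mean_ibi = sum(ibi_ms) / len(ibi_ms)
    return 60000.0 / mean_ibi  # 60,000 ms per minute


def rmssd_ms(ibi_ms):
    """Root mean square of successive interval differences (time-domain HRV).

    This is NOT the spectral measure used in the study; it is a stand-in
    that shows how variability is read off the same interval series.
    """
    if len(ibi_ms) < 2:
        raise ValueError("need at least two interbeat intervals")
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical interbeat intervals in milliseconds
intervals = [800, 810, 790, 800]
print(mean_heart_rate_bpm(intervals))  # 75.0
print(round(rmssd_ms(intervals), 2))   # 14.14
```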

## Respiration Activity

Thoracic RSP was measured using a piezoelectric respiratory belt transducer (MLT1133, AD Instruments) with an output range of 20–400 mV and a sensitivity of 4.5 ± 1 mV/mm. The belt was attached around the chest below the nipple line (or below the breast for women). RSP activity was characterized mainly by thoracic RR and RD. We measured RR in breaths per minute and RD from peak to peak.
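One way to derive RR and RD from a belt signal is to locate the inspiratory peaks, take the mean interval between them as the breath period, and take the peak-to-trough excursion within each breath as the depth. The sketch below does this on a synthetic, noise-free sine wave; the peak detector, the sampling rate, and the signal are assumptions for illustration, and real recordings would need filtering and a more robust detector.

```python
import math


def find_peaks(signal):
    """Indices of simple local maxima (adequate only for clean signals)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def respiration_metrics(signal, fs):
    """Return (rate in breaths per minute, mean peak-to-trough depth).

    signal: respiration belt samples in arbitrary units
    fs:     sampling rate in Hz
    """
    peaks = find_peaks(signal)
    if len(peaks) < 2:
        raise ValueError("need at least two breaths to estimate the rate")
    pairs = list(zip(peaks, peaks[1:]))
    periods = [(b - a) / fs for a, b in pairs]            # seconds per breath
    rate = 60.0 / (sum(periods) / len(periods))
    depths = [signal[a] - min(signal[a:b]) for a, b in pairs]
    return rate, sum(depths) / len(depths)


# Synthetic belt signal: 0.25 Hz breathing (15 breaths/min), amplitude 2
fs = 10
belt = [2.0 * math.sin(2 * math.pi * 0.25 * (i / fs)) for i in range(600)]
rate, depth = respiration_metrics(belt, fs)
print(rate, round(depth, 3))
```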

# Skin Conductance

Skin conductance was recorded using a BioNomadix Electrodermal Activity Transducer connected to a BioNomadix two-channel electrodermal activity amplifier. SC was recorded using two 30-mm unpolarizable round electrodes (Clark Electromedical Instruments) placed on the middle phalanx of the index and third digits of the non-dominant hand and secured with adhesive tape. Resistance was measured with a 15-µA direct current. Additionally, the participants' skin was cleaned using a low-alcohol detergent to minimize impedance (Rachow et al., 2011).
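Because conductance is the reciprocal of resistance, a constant-current measurement converts to SC through Ohm's law: G = 1/R = I/V. The sketch below assumes the recorded quantity is the voltage across the electrodes at the stated 15 µA; the example voltage and the function name are hypothetical.

```python
def skin_conductance_microsiemens(voltage_v: float, current_ua: float = 15.0) -> float:
    """Convert a voltage measured at constant current into conductance.

    G = I / V, and microamperes divided by volts gives microsiemens.
    The 15 uA default mirrors the current stated in the text; the voltage
    passed in is an assumed raw reading, not a value from the study.
    """
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    return current_ua / voltage_v


# Hypothetical reading: 1.5 V across the electrodes (100 kOhm skin resistance)
print(skin_conductance_microsiemens(1.5))  # 10.0
```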

# Flow Experience Measures

Flow experience was measured using the Flow Short Scale to evaluate the physiological signal measures. The scale has been shown to be a reliable measuring instrument (Engeser and Rheinberg, 2008; Engeser, 2012). The scale comprises 10 items (e.g., "I do not notice time passing" and "I have no difficulty concentrating") and a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The reliability of the scale is high (Cronbach's α = 0.90). Additionally, the results of a confirmatory factor analysis, which was performed using Mplus 7.0, were as follows: χ 2 /df = 2.46, p < 0.001; comparative fit index (CFI) = 0.97; non-normed fit index (NNFI) = 0.96; root mean square error of approximation (RMSEA) = 0.04; standardized root mean square residual (SRMR) = 0.05; and all factor loadings for indicators measuring the same construct were statistically significant. The participants were asked to complete the scale immediately after completing each level.
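For readers unfamiliar with the reliability coefficient, Cronbach's α for k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below applies that formula to a tiny invented response matrix; it is not the study's data, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented two-item example with four respondents
example = [[2, 3], [4, 3], [3, 5], [5, 5]]
print(round(cronbach_alpha(example), 3))  # 0.615
```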

# Experimental Procedure

The participants were asked not to consume alcoholic beverages or cigarettes for 48 h before the experiment. All experiments were conducted between 2 and 6 PM to minimize the effect of circadian variations on the physiological signals. The participants were tested individually. After arriving at the laboratory, they were briefed on the experimental procedures and fitted with the electrodes. During the experiment, a 4-min baseline measure was recorded for each participant. The participants were exposed to each experimental condition for 6 min. The Flow Short Scale was completed after each exposure. A Latin square experimental design was adopted to counterbalance sequence effects, and the entire experimental session was completed in approximately 45 min. Additionally, to provide a criterion for evaluating whether each participant was fond of the game, the participants completed one item after gameplay, which was assessed using a 7-point Likert scale ranging from 1 (extremely unenjoyable) to 7 (extremely enjoyable). An independent samples t test indicated no difference in self-reported scores between the shy and non-shy participants [t(38) = 0.24, p = 0.82].
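Latin square counterbalancing for three conditions can be sketched as a cyclic square in which every condition occupies every serial position exactly once across the rows. How the study assigned participants to rows is not stated, so the modulo assignment and the function names below are assumptions.

```python
def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r places."""
    k = len(conditions)
    return [[conditions[(row + col) % k] for col in range(k)] for row in range(k)]


def order_for(participant_index, conditions):
    """Condition order for a participant, cycling through the square's rows.

    The modulo assignment is an illustrative choice, not the study's procedure.
    """
    square = latin_square(conditions)
    return square[participant_index % len(conditions)]


levels = ["easy", "medium", "difficult"]
for row in latin_square(levels):
    print(row)
```

Each column of the printed square contains all three difficulty levels, so no level is systematically played first, second, or last.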

# Statistical Analysis

All data were stored on a disk after each trial and then analyzed offline. All participants performed the cover task accurately (i.e., all of them demonstrated competence at playing Tetris); therefore, none of them were excluded from the analysis. To investigate the hypotheses, the results were analyzed using the following strategies:


Thus, repeated measures analysis of variance (ANOVA) was applied to determine the effects of shyness and difficulty on self-reported flow and physiological flow. SPSS version 19.0 was used for the analysis. Moreover, mixed ANOVA for repeated measures was applied to determine the interaction effects between shyness and difficulty on self-reported flow and physiological flow.
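To make the within-subject part of the analysis concrete, the sketch below computes the F ratio of a one-way repeated measures ANOVA by partitioning the total sum of squares into condition, subject, and error components. It is only an illustration: the study ran its analyses in SPSS, the mixed design adds a between-subject factor that is omitted here, and the scores are invented.

```python
def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA.

    scores: list of subjects, each a list with one score per condition.
    Returns (F, df_condition, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_condition = n * sum((m - grand) ** 2 for m in condition_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_condition - ss_subject  # residual after subjects

    df_condition, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_condition / df_condition) / (ss_error / df_error)
    return f_value, df_condition, df_error


# Invented scores: three subjects under easy, medium, and difficult conditions
data = [[4, 6, 5], [5, 8, 5], [3, 7, 5]]
f_value, df1, df2 = repeated_measures_anova(data)
print(round(f_value, 2), df1, df2)  # 14.0 2 4
```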

# RESULTS

**Table 1** provides the descriptive statistics for both the psychological and physiological variables.

The repeated measures ANOVA results showed that flow experience (self-reported) differed significantly among the three task difficulty levels [F(2,36) = 5.87, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.25]. Pairwise comparisons indicated that the frequency of flow experience was significantly higher for the medium sessions (M = 54.53, SEM = 1.69) than for the easy sessions [M = 49.05, SEM = 1.71; t(18) = 4.56, p < 0.001] and difficult sessions [M = 49.78, SEM = 1.72; t(18) = 4.12, p < 0.001]. Thus, Hypothesis 1 was confirmed, indicating that experiencing a medium level of task difficulty contributes to flow. These results show that physiological activity induced by medium-difficulty sessions contributes to flow.

To investigate Hypothesis 4, we tested the relationships between cardiovascular activity, RR, SC, and flow experience for the three levels of task difficulty. The repeated measures ANOVA revealed significant differences in HR [F(2,36) = 10.59, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.31], HRV [F(2,36) = 5.33, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.23], RR [F(2,36) = 10.64, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.37], RD [F(2,36) = 4.62, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.21], and SC [F(2,36) = 13.62, p < 0.01, η<sub>p</sub><sup>2</sup> = 0.46] among the three levels of task difficulty. Pairwise comparisons indicated that the HR of the medium sessions (M = 78.52, SEM = 2.21) was significantly higher than that of the easy sessions [M = 76.26, SEM = 2.24; t(18) = 3.26, p < 0.01] and significantly lower than that of the difficult sessions [M = 79.27, SEM = 2.42; t(18) = −2.74, p < 0.05]. The HRV of the medium sessions (M = 0.06, SEM = 0.01) was significantly lower than that of the easy sessions [M = 0.10, SEM = 0.17; t(18) = −2.38, p < 0.05] and higher than that of the difficult sessions [M = 0.03, SEM = 0.01; t(18) = 2.31, p < 0.05]. The RR of the medium sessions (M = 21.31, SEM = 1.24) was significantly faster than that of the easy sessions [M = 19.79, SEM = 1.00; t(18) = 3.26, p < 0.01], but no difference in RR was observed between the medium and difficult sessions [M = 22.34, SEM = 1.16; t(18) = 1.94, p = 0.64]. RD during the medium sessions (M = 0.34, SEM = 0.04) was significantly deeper than that during the easy sessions [M = 0.28, SEM = 0.04; t(18) = 2.15, p < 0.05] and difficult sessions [M = 0.26, SEM = 0.03; t(18) = 3.12, p < 0.01]. The SC during the medium sessions (M = 8.25, SEM = 1.33) was significantly higher than that during the easy sessions [M = 5.91, SEM = 0.95; t(18) = 3.69, p < 0.01] and lower than that during the difficult sessions [M = 9.38, SEM = 1.50; t(18) = −2.64, p < 0.05]. Thus, Hypotheses 4(iii) and 4(iv) were confirmed, whereas flow experience was associated with moderate HR, HRV, and SC.

To investigate Hypotheses 2 and 5, we tested the effect of shyness (shy and non-shy) on physiological flow experience and self-reported flow experience. Repeated measures ANOVA showed no significant effects of shyness in predicting either physiological flow experience or self-reported flow experience.

To investigate Hypothesis 6, a series of mixed ANOVAs was conducted to examine the interactions between shyness and difficulty in predicting flow experience (physiological activity). Crucially, we observed a significant two-way interaction between shyness and difficulty in predicting HR [F(2,34) = 4.56, p < 0.05, η<sub>p</sub><sup>2</sup> = 0.23]. Simple main effect analysis revealed that the shy participants had a higher HR than the non-shy participants under the medium [M = 78.78, SEM = 3.14 vs. M = 77.53, SEM = 3.10; F(2,17) = 4.31, p < 0.01] and difficult conditions [M = 79.29, SEM = 3.12 vs. M = 77.84, SEM = 3.23; F(2,17) = 3.31, p < 0.05]; for the easy condition, the effect was non-significant [M = 75.40, SEM = 3.58 vs. M = 75.91, SEM = 3.14; F(2,17) = −1.31, p = 0.87] (**Figure 3A**). We also found a significant two-way interaction between shyness and difficulty [F(2,34) = 5.12, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28] in predicting the self-reported flow experience (Hypothesis 3). Simple main effect analysis revealed that the self-reported flow experience of the non-shy participants was higher than that of the shy participants under the medium condition [M = 56.78, SEM = 2.14 vs. M = 52.38, SEM = 2.58; F(2,17) = 4.31, p < 0.01]; the effects were non-significant under the easy condition [M = 49.59, SEM = 2.78 vs. M = 48.57, SEM = 2.63; F(2,17) = 1.43, p = 0.69] and difficult condition [M = 50.05, SEM = 3.12 vs. M = 52.38, SEM = 2.74; F(2,17) = 1.51, p = 0.43] (**Figure 3B**).

TABLE 1 | Descriptive statistics for self-reported flow experience and physiological variables for the three experimental conditions.

Values are means with standard deviations for the shy (n = 20) and non-shy (n = 20) participants. n, group size; HR, heart rate; HRV, heart rate variability; RR, thoracic respiratory rate; RD, thoracic respiratory depth; SC, skin conductance.

# DISCUSSION

# Flow and Physiological Signals

On the basis of the PAT model, we investigated the physiological activity of participants in a flow state while they played the game Blocmania 3D. Repeated measures ANOVA results show that self-reported data support the experimental manipulation (i.e., the speed of falling tetrominoes) under the different experimental conditions, and pairwise comparisons demonstrated that the optimal flow experience occurred during the medium session. Furthermore, pairwise comparisons of the physiological measurements demonstrate that moderate HR, moderate HRV, increased RR, moderate RD, and moderate SC were evident when the participants were in the optimal flow state.

During game playing, participants experience feelings of enjoyment that indicate a state of heightened arousal and positive affect (Harmat et al., 2015). This may be because more key nutrients are consequently required; therefore, HR increases to meet the metabolic demand and increased cardiac output. Furthermore, a rapid RR increases the efficiency of oxygenation (oxygen in the lungs has maximal access to the heart). Thus, flow experience may be associated with increased metabolism related to sympathetic nervous activity. However, the higher RD during high flow indicates a more relaxed state and elevated parasympathetic activity; thus, a non-reciprocal increase of activity in both branches of the autonomic nervous system may indicate increased parasympathetic modulation of sympathetic activity. This combination of cardiorespiratory patterns agrees with the findings of previous studies (De Manzano et al., 2010; Mauri et al., 2011; Peifer et al., 2014; Harmat et al., 2015). However, this does not explain why the HRV changes did not occur in the expected direction. HRV may be used as an indicator for mental effort in flow (Bruya, 2010; De Manzano et al., 2010; Keller et al., 2011). Relatively high HRV values (compared with HRV levels in a relaxed and stressful state) indicate that increasing task difficulty could result in greater mental effort and that flow experience is a state of moderate mental effort.

Moderate sympathetic activity during flow was indicated by moderate SC values relative to that under a relaxed or stressful state; however, comparably significant results were not observed for HR or RR. SC appears to be a physiologically sensitive indicator of sympathetic activity. Thus, SC patterns provide a specific signature of the state of flow, which is associated with moderate sympathetic activity. Additionally, because mental effort is positively associated with sympathetic activity (De Manzano et al., 2010; Harmat et al., 2015), the moderate SC values, which are related to moderate sympathetic activity, also indicate that flow experience is a state of moderate mental effort.

# Mediation of Flow Precursors

The physiological assessment system for studying flow experience in the context of CMEs exhibited favorable psychometric properties with respect to our first aim. This assessment system was used to test the second aim, namely whether flow experience is moderated by its precursors. Mixed ANOVA results demonstrate that the interaction of shyness and task difficulty can predict HR, and the simple main effect analysis results show that the HR of the shy participants was higher than that of the non-shy participants in the medium and difficult sessions. However, flow experience was related to moderate HR, indicating that the shy participants may have had a lower flow experience than the non-shy participants in the medium session. Additionally, the mixed ANOVA results for self-reported flow also show that the flow experience of the shy participants was lower than that of the non-shy participants in the medium session, and this result further verified the physiological flow (moderate HR). Thus, Aim 2 was verified, and the PAT model was found to be successful in measuring flow in CME.

The challenge–skill balance indicates that if a person's skills are too great for a given task, that person will experience boredom; by contrast, if a task is too challenging, the person will experience anxiety (Csikszentmihalyi, 1975). In Blocmania 3D, the participants had different experiences as the difficulty level increased. At the easy level, the speed of the falling pieces was slower and the participants could easily create complete horizontal rows of squares; consequently, they experienced boredom. By contrast, high levels of difficulty lead to anxiety. However, at the medium level, which was the optimal level of challenge, completing the tasks elicited feelings of competence (Deci and Ryan, 2000). Perceived competence tends to occur naturally in players when they are in a flow state.

As the speed of the falling pieces increased, we expected that the participants would be more focused on the task and process information more quickly; this required them to use the keyboard to rotate squares and develop a strategy for creating complete horizontal rows of squares. However, too much self-focused attention may have caused the shy participants to ignore critical information required for developing such a strategy, and they may have required additional mental effort to maintain the challenge–skill balance. Furthermore, significant differences were observed at the medium and difficult levels but not at the easy level. This could be because at the easy level, minimal mental effort was required and thus both the shy and non-shy participants experienced boredom rather than flow. At the medium and difficult levels, by contrast, the shy participants applied more mental effort, whereas flow experience is associated with moderate mental effort. Accordingly, the flow experience for shy people might be low at the medium and difficult levels.

In addition, shyness has been found to negatively influence the frequency of flow experience (Hirao et al., 2012). This is because flow is experienced when a high-level challenge is balanced with high-level skills and intrinsic rewards are generated, and it is negatively associated with subjective anxiety. However, shy people tend to have low confidence. Therefore, the authors of that study contended that it is possible that a lack of confidence can prevent shy people from reaching the flow state when faced with a highly challenging task. Accordingly, in the case of shy people, to reach the flow state, their ability to complete a task must be improved or the level of a challenge must be reduced.

Physiologically, HR, which is associated with sympathetic activity, was successful in measuring the mediation of flow precursors in CMEs. During gameplay, the HR of the shy participants was higher than that of the non-shy participants under the medium and difficult conditions. This suggests that the shy participants required more oxygen to meet their metabolic demand and that increased mental effort consumed more oxygen and nutrients. By contrast, no significant difference was observed for RR, which is another significant indicator of sympathetic activity. This discrepancy could be attributable to RR being subjectively self-regulated, with shy players possibly adjusting their behavior regularly to maintain a moderate RR, whereas HR is impossible to self-regulate and is thus objective. Therefore, flow experience is mediated by its precursors in CMEs (Guo and Poole, 2009; Mauri et al., 2011).

# Limitations and Future Directions

First, a conceivable limitation of the present study was that the level of difficulty was not adapted to the skill level of each participant individually. Despite the results of self-reporting indicating a significant difference among the three experimental difficulties, matching the speed to each participant's skill is difficult. Future studies should match the challenge to each participant's skill for participants to experience the optimal flow experience. Second, this study examined only the effect of person (P) and task (T) on flow experience in a CME context; future studies should consider investigating the influence of all factors (person, artifact, and task) of the PAT model on flow experience. In addition, the contributions of different physiological indicators require exploration, because different indicators may have different weights in predicting flow experience, such as event-related potentials and electroencephalograph signals, which could reveal brain activity.

# CONCLUSION

First, the results show that moderate HR, moderate HRV, increased RR, moderate RD, and moderate SC are related to flow experience. Second, flow experience during gameplay is the result of increased parasympathetic modulation of sympathetic activity. Third, flow experience is influenced by its precursors, and HR is an effective indicator for measuring the interaction of flow precursors in a CME context.

# AUTHOR CONTRIBUTIONS

YT contributed to writing, data analysis, and polishing the manuscript. PH conducted experiments and data analysis. YB, PW, FG, and YC also contributed to polishing the manuscript.

# REFERENCES

Zhou, T. (2013). The effect of flow experience on user adoption of mobile TV. Behav. Inf. Technol. 32, 263–272. doi: 10.1080/0144929X.2011.650711

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Tian, Bian, Han, Wang, Gao and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Model of Illusions and Virtual Reality

Mar Gonzalez-Franco\* and Jaron Lanier

*Microsoft Research, Redmond, WA, United States*

In Virtual Reality (VR) it is possible to induce illusions in which users report and behave as if they have entered into altered situations and identities. The effect can be robust enough for participants to respond "realistically," meaning behaviors are altered as if subjects had been exposed to the scenarios in reality. The circumstances in which such VR illusions take place were first introduced in the 1980s. Since then, rigorous empirical evidence has explored a wide set of illusory experiences in VR. Here, we compile this research and propose a neuroscientific model explaining the underlying perceptual and cognitive mechanisms that enable illusions in VR. Furthermore, we describe the minimum instrumentation requirements to support illusory experiences in VR, and discuss the importance and shortcomings of the generic model.

Keywords: virtual reality, embodiment, perception, cognition, avatars

#### Edited by:

*Albert Rizzo, USC Institute for Creative Technologies, United States*

#### Reviewed by:

*Farhan Mohamed, Universiti Teknologi Malaysia, Malaysia Marco Fyfe Pietro Gillies, Goldsmiths, University of London, United Kingdom*

> \*Correspondence: *Mar Gonzalez-Franco margon@microsoft.com*

#### Specialty section:

*This article was submitted to Human-Media Interaction, a section of the journal Frontiers in Psychology*

Received: *14 April 2017* Accepted: *19 June 2017* Published: *30 June 2017*

#### Citation:

*Gonzalez-Franco M and Lanier J (2017) Model of Illusions and Virtual Reality. Front. Psychol. 8:1125. doi: 10.3389/fpsyg.2017.01125*

# INTRODUCTION

As is the case with many technologies, the beginnings of VR are closely linked to industry and startups. It is the manufacturing of devices that popularizes technologies, making them available to others. In that regard, although the initial concept of VR was formulated in the 1960s by Dr. Ivan Sutherland, it was not until later that the first devices became available. One of the authors (Lanier) led the team that implemented the first experiences with avatars and social virtual reality (VR) (Lanier et al., 1988; Blanchard et al., 1990). This work occurred in the context of a 1980s technology startup (VPL Research), and while results were reported in the popular press (Lanier, 2001) and anecdotally, the context was not one in which rigorous experiments were undertaken, nor was research peer reviewed (Lanier, 1990). VPL Research provided initial VR instrumentation for many laboratories and pioneered a school of thought that described some of the many possibilities of avatars and VR for social and somatic interactions (Blanchard et al., 1990). Meanwhile, in the intervening decades, the original hypotheses have been refined and empirically formalized by the scientific community (Blascovich et al., 2002; Tarr and Warren, 2002; Sanchez-Vives and Slater, 2005; Yee et al., 2009; Bohil et al., 2011; Fox et al., 2012). Reflecting on this body of research, we can gain a general understanding of illusions that take place in VR. In this paper, we not only review a broad range of VR illusions, but also propose a comprehensive neuro-perceptual model to describe them.

Our proposal integrates and explains a wide variety of VR illusions that have been formally investigated through a combination of three classes of processes borrowed from established neuroscience models: bottom-up multisensory processing (Calvert et al., 2004; Blanke, 2012), sensorimotor self-awareness frameworks (Gallagher, 2000), and top-down prediction manipulations (Haggard et al., 2002). Using this model, we can understand the perceptual and cognitive mechanisms that trigger the great majority of illusions in the literature of VR.


# ILLUSIONS ENABLED BY VIRTUAL REALITY

While VR instrumentation varies, it always includes sensors to track and measure a set of the person's body motions, such as the motion of the head, and often a great deal more about the person's physiological state, including pose, force, metabolic, or interoceptive factors, and so on, as well as an equally variable set of actuation and display devices. VR could, at a hypothetical extreme, measure anything in the human body, and present a stimulus for any sensory modality of the human body. VR sensors are typically paired with VR displays or actuators. For instance, if a display device addresses a sensory modality located in the human head, such as the eyes or ears, then head tracking becomes relevant.

When these sensor-coupled stimuli match the brain's expectations of what the next moment will bring, then the brain will tend to treat the simulated reality as real, which in turn will engage additional neural mechanisms to further the veracity of the illusion. Indeed, the everyday perception of physical reality relies on a low-level, continuous calibration of raw data from biological sensors, which might be thought of as mild, continuous hallucinations, or imperfect implicit neural hypotheses of what to expect from the real world. These are constantly corrected based on new input to enhance the perceived veracity of a virtual world (Lecuyer, 2017).

The popular literature of the 1980s described a "conversion moment"—that took place a second or two after a user donned a headset—when a VR user stopped responding to the physical environment, and started to experience the virtual world as effectively real. It is possible that this sense of a slightly delayed conversion moment was more noticeable with the cruder equipment of that period. It continues to be the case that there is a transition during which a user shifts awareness and behavioral responses to the virtual world instead of the physical. This is not unexpected since other types of multisensorial illusions that do not require VR, such as the Rubber Hand Illusion, also take time to elicit (Botvinick and Cohen, 1998).

The effect has been compared—in popular culture—to a hallucinogenic drug experience. However, illusory states in VR don't directly alter higher cognitive functions, as happens when chemically stimulating the brain with hallucinogenic drugs. Nonetheless, VR users can feel that they have been transported to a new location (place illusion), that the events happening are real (plausibility illusion) (Sanchez-Vives and Slater, 2005), and even that their bodies have been substituted by an avatar (embodiment illusion) (Spanlang et al., 2014).

Indeed, it is because VR illusions are driven by the neurological mechanisms of everyday perception of the body in the world that participants often exhibit realistic responses to VR (Slater, 2009). For instance, participants prefer to take a longer path on (simulated) solid ground rather than walking over the famous illusion of a virtual pit (Meehan et al., 2002). The responses to the virtual pit are so realistic that increases in heart and respiratory rate are registered when approaching the void.

Human cognition is highly attuned to other people in the physical environment and this remains so in virtual environments. The study of avatars in VR is therefore central to the understanding of cognition and behavior in VR.

Participants not only respond realistically to the environment, but also behave genuinely when interacting with avatars. Despite the challenges of the uncanny valley, avatars are processed in the brain like people, and humans are able to recognize differential familiarity levels on avatar faces (Bailenson et al., 2006; Gonzalez-Franco et al., 2016). Hence, social norms, such as interpersonal distance, are kept when interacting with avatars (Bailenson et al., 2003; Sanz et al., 2015). In the same way, more complex social behaviors are also reproduced inside VR: shy males show higher anxiety when interacting with a virtual female than confident males (Pan et al., 2012), or self-similar avatars (Aymerich-Franch et al., 2014). And, people immersed as bystanders during violent incidents in VR are likely to intervene following realistic behavioral patterns (Rovira et al., 2009).

The full-body illusion is a phenomenon unique to VR (Lanier et al., 1988). It takes place when participants feel they inhabit a virtual body (Heydrich et al., 2013). This experience can be induced by presenting a virtual body co-located to the participant's body (Maselli, 2015).

The effect can be enhanced by presenting a VR mirror to the participants in which they can see their virtual body moving as they move from a first person perspective (Gonzalez-Franco et al., 2010), but also through passive visuo-tactile multisensory stimulation (Kokkinara and Slater, 2014). A virtual body (AKA an avatar body) enhances the exploration and interaction capabilities of VR in an ergo-centered fashion. Participants not only gain a visual representation so that they can socialize, but also have access to a wider set of methods of interacting with the simulated world.

Interestingly, changing the design of a virtual body can elicit behavioral changes (Bailenson and Segovia, 2010; Fox et al., 2012). For example, participants altered the way they play music depending on the embodied avatar, being less musical when the avatar was dressed as a business man (Kilteni et al., 2013). Test subjects also modified their behavior during psychological treatment when embodying an avatar representing Sigmund Freud (Osimo et al., 2015).

The link between avatar design and behavior is probably related to pre-conceived stereotypes and mimicry effects. Humans easily interiorize stereotypes associated with their life experiences and what they learn from the environment, producing unconscious biases that influence behavior when exposed to new situations (Bourgeois and Hess, 2008). Those mechanisms mix with non-conscious mimicry during social interactions. Mimicry is well-known to be elicited as an automatic behavior in response to social exclusion and to reduce outgroup effects (Lakin et al., 2008). Indeed, the human desire to fit in and be liked can not only alter personalities, but might be so profound as to alter one's own physiological interoceptive function to reflect an interlocutor during conversation (Durlik and Tsakiris, 2015).

The mimicry effect in VR and its relationship with preconceived stereotypes is well illustrated in the research of Prof. J. Bailenson, which investigates how participants assimilate nonverbal gestures and behaviors through imitation in immersive VR (Bailenson and Yee, 2005; Fox et al., 2009). Sometimes this effect can produce positive outcomes, such as increased empathy (Rosenberg et al., 2013), but on other occasions it might lead to self-objectification in a sexualized context (Fox et al., 2013). Through avatar design and virtual scene changes, VR enables the study of non-conscious mimicry and personality altering effects with a reduction of unknown environmental variables. For instance, an improvement of negotiation skills has been observed when a subject is embodied in a taller avatar (Yee and Bailenson, 2007). More mature financial decisions were evoked when subjects inhabited avatars that approximated aged versions of themselves (Hershfield et al., 2011).

Aside from behavioral changes, subjects can also accept substantial structural transformations to their virtual bodies, even temporarily altering self-body perception (Normand et al., 2011).

This effect was first observed in the 1980s, and was dubbed Homuncular Flexibility (Lanier, 2006). Formal study of Homuncular Flexibility has confirmed the earlier, informal observations (Won et al., 2015a). An example of this effect is that participants embodied in differently shaped avatars can overestimate their own body size (Normand et al., 2011; Piryankova et al., 2014).

A remarkable result is that subjects can be made to naturally accept supernumerary limbs (Won et al., 2015b). For instance, subjects can control tails on their avatars (Steptoe et al., 2013). Furthermore, being inside an avatar with a full-body ownership illusion, in which one feels that the virtual body is one's own body, might elicit self-attribution mechanisms. Those mechanisms allow an action of the avatar to be incepted in the brain as being originally intended by the participant, producing an illusory sense of agency. Test subjects can self-attribute small alterations in their motor trajectories (Azmandian et al., 2016) and even in the speech of an avatar (Banakou and Slater, 2014). However, sufficiently radical alterations of avatar actions cause semantic violations and are rejected by testers (Padrao et al., 2016). In this sense, VR can contribute to a better understanding of the brain's plasticity, and help explore how the brain and the body integrate by presenting scenarios beyond what would be physically feasible.

# MINIMUM INSTRUMENTATION REQUIREMENTS TO SUPPORT ILLUSORY MECHANISMS IN VIRTUAL REALITY

Illusory experiences are not only a consequence of using VR, but the very foundation of its operation. In VR, the participant is not merely an observer, but is the center of the system, both screen and viewer. In order to enable this self-centered experience, plausible sensory stimulation must persuade the brain that realism has not been lost when natural information derived from the physical environment is replaced by computer generated information. This process of successful substitution enables VR experiences to "feel real" (Brooks, 1999; Guadagno et al., 2007; Slater, 2009).

Complex VR systems incorporate congruent stimulation of multiple modalities such as vision, audition, and tactile/proprioception (the latter typically when participants are represented by a virtual body). Evidence shows that VR can successfully stimulate coordinated human perceptual modalities so that brain mechanisms which collect and process afferent sensory input will interpret the data coherently (Kilteni et al., 2015).

A useful definition of VR, which distinguishes VR from other complex media technologies, is that VR tends to avoid semantic violations as the brain and body interact in synch with the simulated environment. As an example, we can consider "spherical videos" which are commonly available on headsets that make use of smartphones which include sensors for rotation, but not for translation.

Despite the utility of stimulating multiple sensory modalities to engage the integration that enhances a fully ergo-centered experience, one particular sense has remained key to VR: vision. Visual dominance is a human characteristic (Posner et al., 1976); therefore it is not surprising that visual input is exceptionally important to VR.

In that sense, stereoscopic photography (dating to the 1840s) can be considered a precursor to VR. A pair of photographic prints aligned for typical human interocular distance, mounted on a stereoscope with a sufficient Field of View (FoV) and accommodation can create a minimal, self-experiential illusion capable of briefly transporting users to an alternate reality. Static stereoscopic photography has since evolved into spherical videos.

Illusory states can be convincing in spherical video technologies, but only provided that users do not try to interact with the environment. These relatively passive experiences (with no translational motion, very limited interactivity, and without body representation) can generate realistic brain responses; e.g., motor cortex activation is found even in static setups when a virtual object attacks a static participant in VR (González-Franco et al., 2014).

However, since there is no underlying dynamic simulation that can respond to variations in user behavior, this type of illusion breaks the moment users try to explore or interact with the virtual environment, constraining the veracity of the self-centered experience, and engendering a "body semantic violation" (Padrao et al., 2016). Therefore, the minimal instrumentation required to produce the illusion of entering VR without semantic violations (i.e., breaks on the illusion) must combine a continuously updated (head tracked, at a minimum) display with congruent sensorimotor contingencies (Spanlang et al., 2014).
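The difference between rotation-only tracking and full head tracking can be made concrete with a small sketch. The function `to_view_space`, the 2-D simplification, and all numeric values below are hypothetical illustrations, not part of any cited system; the sketch only shows that when translation is not measured, the rendered view stays the same while the user's body reports a lean.

```python
import math


def to_view_space(point, head_yaw, head_position=(0.0, 0.0)):
    """Express a 2-D world point (x, z) in head-centred coordinates."""
    dx = point[0] - head_position[0]
    dz = point[1] - head_position[1]
    cos_y, sin_y = math.cos(-head_yaw), math.sin(-head_yaw)
    return (cos_y * dx - sin_y * dz, sin_y * dx + cos_y * dz)


obj = (0.0, 1.0)   # hypothetical object one metre in front of the user
lean = (0.3, 0.0)  # the user leans 30 cm to the right

# Rotation-only tracking (as in spherical video): translation is never
# measured, so the rendered view is identical before and after the lean.
view_3dof = to_view_space(obj, head_yaw=0.0)

# Rotation plus translation tracking: the object shifts left in the view,
# which is what the brain predicts after leaning right.
view_6dof = to_view_space(obj, head_yaw=0.0, head_position=lean)
```

In the rotation-only case the expected parallax never appears, which is one simple way to picture the mismatch between prediction and display that the text calls a semantic violation.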

This principle can be generalized. We can evaluate whether a given media technology instrumentation can be understood as VR by how well it avoids semantic violations. While there might never be an instrumentation for VR that completely avoids semantic violations, there are many designs for VR hardware in which a user will typically not encounter a semantic violation for extended periods of time. The authors acknowledge that this is a subtle issue that might be understood somewhat differently in the future due to cultural change or shifting philosophical interpretations, but nonetheless, a practical difference between systems that display semantic violations almost immediately and those that largely avoid them has been demonstrated.

# TOWARD A COGNITIVE MODEL: WHICH BRAIN ACTIVITIES FACILITATE VIRTUAL REALITY ILLUSIONS?

The underlying brain mechanisms that enable users to "believe" that a computer-generated world is effectively real can be modeled through a combination of at a minimum three classes of processes: bottom-up multisensory processing (Calvert et al., 2004; Blanke, 2012), sensorimotor self-awareness frameworks (Gallagher, 2000), and top-down prediction manipulations (Haggard et al., 2002). We first consider bottom-up multisensory processing.

Bottom-up sensory processing is understood as an aggregated probabilistic cognitive strategy. The brain combines bodily signals subject to a degree of noisy variation in weighting and other parameters, and adapts those parameters continuously based on feedback. It can be framed as a natural analog to artificial sensor fusion. Signals arrive from different modalities with different temporal and spatial resolutions, different degrees of freedom, and presumably differences in coding, but the brain is able to integrate them effectively.
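The sensor fusion analogy can be illustrated with inverse-variance weighting, a standard textbook rule for combining noisy estimates. This is a minimal sketch under stated assumptions: the `Cue` class, the `fuse` function, and the numbers are hypothetical, and the rule is offered as one common formalization rather than as the model proposed in this paper.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One unimodal estimate of the same quantity (e.g., hand position in cm)."""
    value: float     # estimate reported by this modality
    variance: float  # noise of the modality; larger means less reliable


def fuse(cues):
    """Combine cues by inverse-variance weighting; returns (estimate, variance)."""
    weights = [1.0 / c.variance for c in cues]
    total = sum(weights)
    estimate = sum(w * c.value for w, c in zip(weights, cues)) / total
    return estimate, 1.0 / total


# Hypothetical values: vision is assumed precise, proprioception noisier.
vision = Cue(value=10.0, variance=1.0)
proprioception = Cue(value=14.0, variance=4.0)
estimate, variance = fuse([vision, proprioception])
```

With these assumed variances the fused estimate lies much closer to the visual cue, and the fused variance is lower than that of either cue alone, which is the sense in which congruent modalities reinforce one another.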

Bottom-up sensory processing implicitly infers the most effective ways to respond to the external world from moment to moment, but is also a key aspect of self-body consciousness (Blanke, 2012). Another analogy is to robot architecture; robots receive information through sensors and that data reflects both the status of the robot and the status of the world beyond the robot. Algorithms must integrate multiple data streams in order to both represent the world as accurately as possible and to control the robot's actuators as effectively as possible.

When multiple sensory modalities provide congruent data, the brain is more likely to "believe" the information to be true. Or, when asynchronous or ambiguous information is presented, the brain might reject the afferent information from one or more sensors as erroneous.

A common problem in navigational VR setups, simulator sickness, has its roots in discordant multisensory integration (Akiduki et al., 2003). When simulator sickness occurs, visual input might indicate movement, while the vestibular system does not. This mismatch in cross-modal sensory inputs generates a "Schrödinger cat situation" in the brain: the brain infers that the body is both static and moving. A clash of this kind must be resolved.

To tackle ambiguity in sensory information, the brain might seek higher probabilistic confidence in one interpretive state over the others; in this case between the person's location being in motion or stationary. For example, when subjects are seated, there is an increased number of skin pressure and proprioceptive sensors that add evidence that body position is static. The scales are tipped toward a fixed position interpretive state, so being seated can help reduce simulator sickness (Stoffregen and Smart, 1998). However, it also reduces the illusion of movement.
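The idea of tipping the scales toward one interpretive state can be sketched as evidence accumulation in log-odds form. The function `posterior_static` and every likelihood ratio below are hypothetical values chosen for illustration, and the cues are assumed to be independent; none of these numbers come from the cited studies.

```python
import math


def posterior_static(prior_static, likelihood_ratios):
    """Return P(static | evidence) given independent likelihood ratios.

    Each ratio is P(signal | static) / P(signal | moving); values above 1
    favour the static interpretation.
    """
    log_odds = math.log(prior_static / (1.0 - prior_static))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical conflict: vision suggests motion (0.25), while the
# vestibular system suggests the body is static (4.0).
standing = posterior_static(0.5, [0.25, 4.0])

# Being seated adds skin pressure and proprioceptive evidence for "static".
seated = posterior_static(0.5, [0.25, 4.0, 3.0, 3.0])
```

In this toy setting the standing case stays fully ambiguous (probability 0.5), while the two extra seated cues raise the probability of the static interpretation to 0.9.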

Similarly, visual experience can be modified, though that approach is usually less minimalist. For instance, VR headsets can be modified to optimize peripheral visual content in order to reduce simulator sickness (Xiao and Benko, 2016). Approaches to reduce simulator sickness can be invasive. One example is to stimulate the vestibular system directly with galvanic instrumentation (Lenggenhager et al., 2008).

In all these experiments, significant variation in individual responses has been observed. Cross-modal environmental and body interpretation varies from person to person.

Brains in subjects with extensive training in tasks that emphasize one modality will allocate more resources to that modality as a result of brain plasticity (Cotman and Berchtold, 2002). For instance, ballet dancers develop remarkable proprioceptive abilities, by which they are able to know very precisely where each limb of their body is, even with their eyes closed.

Internally, the sensory modalities are not exclusively pitted against one another. The brain also contains multisensory neurons that attend multiple inputs (Stein and Stanford, 2008). An audio-visual multisensory neuron, for example, will be more likely to fire when both excitatory stimuli are present and synchronous.
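A toy coincidence detector gives a feel for this behavior. The function `multisensory_response`, the 100 ms window, and the response values are hypothetical and are not fitted to any neural data.

```python
def multisensory_response(audio_onset, visual_onset, window=0.1):
    """Toy response of an audio-visual neuron; onsets in seconds, or None."""
    drive = (audio_onset is not None) + (visual_onset is not None)
    if drive == 2 and abs(audio_onset - visual_onset) <= window:
        return 3.0  # hypothetical boost for synchronous bimodal input
    return float(drive)  # 0.0 for no input, 1.0 unimodal, 2.0 asynchronous


synchronous = multisensory_response(0.0, 0.05)
asynchronous = multisensory_response(0.0, 0.5)
audio_only = multisensory_response(0.0, None)
```

The synchronous pair yields the largest response, the asynchronous pair a smaller one, and a single modality the smallest non-zero one.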

The multisensory system can enhance or depress the role of each unimodal stimulus exerting influence in a specific situation (Stein and Stanford, 2008). Typically, if one modality triggers more multisensory neurons than another, that modality is more likely to display dominance. In addition to cases of sensory modality dominance or suppression, cross-modal dynamics can help to explain synesthetic phenomena (Posner et al., 1976).

As noted earlier, visual-dominance is often associated with human cognition. This might be because, in addition to numerous unimodal visual neurons, many multimodal neurons are also influenced by visual stimulation: audio-visual, visuotactile, visuo-proprioceptive, visuo-vestibular (Bavelier and Neville, 2002; Shams and Kim, 2010).

Visual dominance enables bottom-up multisensory integration mechanisms that can be manipulated to generate body illusions leveraging visual stimulation. This is not only the basis of operation of many of the VR bodily illusions we have described (Normand et al., 2011; Piryankova et al., 2014), but it has also been shown to alter body perception in experiments that don't require VR. For instance, in the famous rubber hand illusion, participants believe that their hand has been replaced by a rubber hand through visuo-tactile synchronous stimulations (Botvinick and Cohen, 1998).

# BEYOND BOTTOM-UP PROCESSING AND A CANDIDATE FOR A THEORY OF ILLUSION IN VR

Multisensory integration alone cannot explain why VR illusions are so strong. It only relies on the input of the afferent sensors at a specific moment and does not consider the history of previous states, while interactions with the real and virtual worlds are continuous. More complex prediction mechanisms take place in our brain.

Sensorimotor frameworks can be useful as explanations for VR's effective illusions. These frameworks rely strongly on the comparison of internal representations of the actual, desired, and predicted states of the external world after a motor action has been executed (Gallagher, 2000). If the afferent sensory input (with multisensory integration) matches the predicted state, then the brain is more likely to infer that the afferent input is correct. A simple model (**Figure 1**) can describe the functioning of sensorimotor contingencies that enable VR illusions.
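The comparison between predicted and afferent states can be sketched as a forward model followed by a comparator. This is a minimal illustration under stated assumptions: the one-dimensional dynamics, the function names `forward_model` and `comparator`, and the tolerance value are all hypothetical and are not taken from the cited frameworks.

```python
def forward_model(state, motor_command):
    """Predict the next state from a copy of the motor command (toy 1-D dynamics)."""
    return state + motor_command


def comparator(predicted, observed, tolerance):
    """Return the prediction error and whether the input is accepted."""
    error = abs(observed - predicted)
    return error, error <= tolerance


state = 0.0    # e.g., hand position in cm
command = 5.0  # intended movement in cm
predicted = forward_model(state, command)

# A small visual offset stays within the assumed tolerance and is accepted.
small_error, small_ok = comparator(predicted, observed=5.5, tolerance=1.0)

# A radical alteration of the displayed action exceeds it and is rejected.
large_error, large_ok = comparator(predicted, observed=12.0, tolerance=1.0)
```

Accepted inputs correspond to the case in which the brain infers that the afferent input is correct; rejected inputs correspond to a break in the illusion.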

A model of this type can also be used to describe the foundations of motor learning and the self-awareness of voluntary actions. This approach not only accounts for more passive VR illusions (such as in the 1840s stereoscope or in modern 360° video), but also explains why these illusions are reinforced through intentional interaction and exploration of a virtual environment, and are even stronger when participants are embodied in an avatar.

When users move their head or limbs, through active, voluntary motor execution, and the predicted state in their brain matches the information that arrives through the sensory afferent modalities (e.g., vision, proprioception, audio...), then there is a strong VR illusion. The strength of the illusion ultimately derives from the powerful agency implications related to volition: "I am the initiator or source of the action" (Haggard et al., 2002).

This type of self-awareness model based on predictions can explain strong top-down manipulations of afferent feedback. An example is found in experiments with action binding mechanisms, where actions (such as pressing a button) and feedback (such as a delayed audio beep) can be perceived closer in time (Haggard et al., 2002). In these experiments, discordant afferent inputs are apparently recalibrated or suppressed in the brain in order to confirm a predicted state of the world (Haggard and Chambon, 2002): "I have a prediction, ergo this is my final state." The illusion illustrated in such experiments is related to the illusions created in VR. The brain can "decide" that there is an error in measurement in order to reinforce a preference for a predicted outcome.

These top-down agency mechanisms, which have been shown to increase tolerance to latencies in certain settings (up to 200 ms; Haggard and Chambon, 2002), have implications for VR experiences. Proprioceptive experiences can be manipulated in this way when reaching for objects in VR (Azmandian et al., 2016), producing self-attribution of retargeted motions that is strong enough for participants to ignore the associated proprioceptive drifts if the tactile feedback is coherent with the visual input (Kokkinara et al., 2015; Azmandian et al., 2016).

When does a top-down mechanism fail? The brain will reject an illusion when the discordance between afferent sensory inputs and the predicted/intended state becomes too extreme. This failure mode of VR can be described as a semantic violation (Padrao et al., 2016). The degree of failure can be measured as an increase in perceived latency between intention and a perceived action (Haggard et al., 2002).

In sum, sufficient results exist to describe the broad underlying mechanisms that enable VR experiences to be internalized as real. Continuous bottom-up multisensory integration is modulated by complex cognitive predictions (Slater, 2009; Blanke, 2012). Predictions can be reinforced through interactions so that the brain might even "correct" some sensory deficiencies in order to match its predicted states using top-down manipulations (Haggard et al., 2002). These corrections are so powerful that they can alter the sense of agency and lead participants to self-attribute the actions of their avatars (Banakou and Slater, 2014).

# PARTIAL AWARENESS OF ILLUSION

This model does not address varying levels of partial awareness that users report and demonstrate during their exposure to VR. Even though participants are aware at all times that they are in a computer simulation, evidence suggests that being exposed to certain scenarios—particularly when one's own sense of self is manipulated through altered avatar design—can produce nonconscious effects; these might be perceptual or behavioral (Yee and Bailenson, 2007).

We tentatively assemble several mechanisms related to levels of partial awareness in VR illusions.

One approach to understanding partial awareness of illusion in VR concerns the human capacity to enable automatic cognitive mechanisms. Once a task is well-trained, the brain becomes less consciously aware of performing that task, so it is able to focus on other mental activities. People can walk and talk on the phone at the same time, for instance, though the ability of an individual to accurately assess their own capacity for multitasking is imperfect. We might think of the general VR illusion as being similar to walking in the above example. The modifications that cognitive processes have taken on in order for the simulation to feel real have become unconscious background activities, as described by well-established theories (Haggard et al., 2002). Empirical examples of automatic mechanisms in VR have been found using EEG recordings, when participants activate their motor cortex as a response to a threat (González-Franco et al., 2014). They have also been found in behavioral responses, when participants interact as bystanders in a violent scenario (Slater et al., 2013) or face a moral dilemma (Pan and Slater, 2011).

Reducing semantic violations is an essential task in VR, but the softer quality of plausibility further strengthens illusions in VR (Slater, 2009). We can extrapolate that the more plausible an illusion is, the more likely it is to be processed unconsciously.

Based on the model, we hypothesize that cognitive and sensory saturation will change the level of awareness of some illusions, i.e., a sufficient quantity of "tricks" in VR, as described in the experiments referenced in this paper, might be compounded in order to overwhelm the ability of an individual to consciously keep track of some illusory aspects of an experience. Therefore, more illusions would be undetected and accepted as real than if they had been presented one at a time. This might happen particularly when performing a task requiring higher cognitive functions, in which the brain is so saturated that it has no more load to dedicate to the evaluation of basic perceptual information. Further experiments would be needed to validate this hypothesis.

We are not yet proposing a model to explain in a comprehensive way how brain tolerance, automatic processing, and saturation might trigger different levels of awareness of a VR illusion. Incorporating further awareness mechanisms to our current model would probably require a more complex approach including more recent ideas from machine learning.

However, our model, based on classical, established theories, is useful for describing how VR illusions come about in the first place.

# DISCUSSION

In this paper, we first reviewed illusions that can take place in VR and then presented a neuroscientific model able to describe why and how they take place. We suggest that VR illusions occur when media instrumentation stimulates neural bottom-up multisensory processing, sensorimotor self-awareness frameworks, and cognitive top-down prediction manipulations and furthermore allows these to reconcile in such a way that semantic violations are infrequent.

This model of illusion in VR summarizes how VR research has interacted with established human neuroscience theories, while also suggesting and requiring new ideas. For instance, VR enables unprecedented experiments that are both broadly multisensory, and yet with few uncontrolled variables, in order to investigate whole-body cognitive mechanisms (Kilteni et al., 2015). Indeed, the model accommodates a wide range of ergo-centered research in VR, including not only multisensorial illusions but also potentially illusory/false memories (Osimo et al., 2015), such as memories of agency (Guadagno et al., 2007), conversations with avatars (Pan et al., 2012), and engaging in plot interventions (Yee et al., 2009).

In all these cases VR presents an expanded experimental platform that can be interpreted using a model composed of previously-established theories—and yet, VR also presents new experimental design constraints, such as the avoidance of disabling, unintended semantic violations. Experiments taking place in physical reality avoid that problem, since physical reality is presumed to be well-ordered, complete, and consistent.

We discussed the question of partial awareness of VR illusions and some potentially relevant cognitive mechanisms, but we concluded that it is still premature to incorporate these elements into the model.

VR has recently become widely available, and it is ever more urgent for varied stakeholders to understand what illusions can be created in VR; those with ethical, legal, or compassionate concerns will benefit from a compact framework for understanding these illusions.

For instance, one worrying scenario is that in the future, if one is completing a work assignment within a virtual world, experiencing a degree of cognitive saturation, one's avatar might also be slightly altered in relation to incidental portrayals of a product or a political candidate, in order to achieve a change in behavior that would benefit a third party without the user's knowledge. While variants of this type of effect have been observed in prior media, the cited experiments show that manipulative illusions could be remarkably powerful in VR. Examples of such implicit behavioral manipulations through avatars include the increase in saving behaviors after being embodied in an older avatar, or the altered negotiation skills after being exposed to taller or shorter avatars (Yee et al., 2009).

The model suggests how the manipulative aspects of the VR illusion can be selectively weakened. It can also help to identify manipulation abatement strategies that are unlikely to work.

We hope that our model can be leveraged as a base to design future VR experiences. We expect that both scientists and creators will find it useful for understanding the implications of the VR scenarios that they design and the types of illusions they generate.

# AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contributions to the work, and approved it for publication.



**Conflict of Interest Statement:** The authors declare that the current manuscript presents a balanced and unbiased review on the field of Virtual Reality. The authors however report their affiliation to Microsoft, an entity with a financial interest in the subject matter or materials discussed in this manuscript. The authors have conducted the review following scientific research standards.

Copyright © 2017 Gonzalez-Franco and Lanier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Enhancing Our Lives with Immersive Virtual Reality

*Mel Slater1,2,3\* and Maria V. Sanchez-Vives1,2,4*

*1 Event Lab, Department of Clinical Psychology and Psychobiology, University of Barcelona, Barcelona, Spain, 2 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain, 3Department of Computer Science, University College London, London, UK, 4 Institut d'investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain*

Keywords: virtual reality, presence, immersion, place illusion, plausibility, moral dilemmas, embodiment, immersive journalism

# SUMMARY

Virtual reality (VR) started about 50 years ago in a form we would recognize today [stereo head-mounted display (HMD), head tracking, computer graphics generated images] – although the hardware was completely different. In the 1980s and 1990s, VR emerged again based on a different generation of hardware (e.g., CRT displays rather than vector refresh, electromagnetic tracking instead of mechanical). This reached the attention of the public, and VR was hailed by many engineers, scientists, celebrities, and business people as the beginning of a new era, when VR would soon change the world for the better. Then, VR disappeared from public view and was rumored to be "dead." In the intervening 25 years a huge amount of research has nevertheless been carried out across a vast range of applications – from medicine to business, from psychotherapy to industry, from sports to travel. Scientists, engineers, and people working in industry carried on with their research and applications using and exploring different forms of VR, not knowing that actually the topic had already passed away.

The purpose of this article is to survey a range of VR applications where there is some evidence for, or at least debate about, its utility, mainly based on publications in peer-reviewed journals. Of course not every type of application has been covered, nor every scientific paper (about 186,000 papers in Google Scholar): in particular, in this review we have not covered applications in psychological or medical rehabilitation. The objective is that the reader becomes aware of what has been accomplished in VR, where the evidence is weaker or stronger, and what can be done. We start in Section 1 with an outline of what VR is and the major conceptual framework used to understand what happens when people experience it – the concept of "presence." In Section 2, we review some areas where VR has been used in science – mostly psychology and neuroscience, the area of scientific visualization, and some remarks about its use in education and surgical training. In Section 3, we discuss how VR has been used in sports and exercise. In Section 4, we survey applications in social psychology and related areas – how VR has been used to throw light on some social phenomena, and how it can be used to tackle experimentally areas that cannot be studied experimentally in real life. We conclude with how it has been used in the preservation of and access to cultural heritage. In Section 5, we present the domain of moral behavior, including an example of how it might be used to train professionals such as medical doctors when confronting serious dilemmas with patients. In Section 6, we consider how VR has been and might be used in various aspects of travel, collaboration, and industry. In Section 7, we consider mainly the use of VR in news presentation and also discuss different types of VR. In the concluding Section 8, we briefly consider new ideas that have recently emerged – an impossible task since during the short time we have written this page even newer ideas have emerged! 
And, we conclude with some general considerations and speculations.

Throughout and wherever possible we have stressed novel applications and approaches and how the real power of VR is not necessarily to produce a faithful reproduction of "reality" but rather that it offers the possibility to step outside of the normal bounds of reality and realize goals in a totally new and unexpected way. We hope that our article will provoke readers to think as paradigm changers, and advance VR to realize different worlds that might have a positive impact on the lives of millions of people worldwide, and maybe even help a little in saving the planet.

#### *Edited by:*

*Massimo Bergamasco, Sant'Anna School of Advanced Studies, Italy*

#### *Reviewed by:*

*Daniel Thalmann, Nanyang Technological University, Singapore*

#### *\*Correspondence:*

*Mel Slater melslater@ub.edu*

#### *Specialty section:*

*This article was submitted to Virtual Environments, a section of the journal Frontiers in Robotics and AI*

*Received: 17 October 2016 Accepted: 15 November 2016 Published: 19 December 2016*

#### *Citation:*

*Slater M and Sanchez-Vives MV (2016) Enhancing Our Lives with Immersive Virtual Reality. Front. Robot. AI 3:74. doi: 10.3389/frobt.2016.00074*

# 1. VIRTUAL REALITY – FOUNDATIONS

# 1.1. Introduction – Now Is the Time

"It's a very interesting kind of reality. It's absolutely as shared as the physical world. Some people say that, well, the physical world isn't all that real. It's a consensus world. But the thing is, however real the physical world is – which we never can really know – the virtual world is exactly as real, and achieves the same status. But at the same time it also has this infinity of possibility that you don't have in the physical world: in the physical world, you can't suddenly turn this building into a tulip; it's just impossible. But in the virtual world you can …. [Virtual reality] gives us this sense of being able to be who we are without limitation; for our imagination to become objective and shared with other people." Jaron Lanier, SIGGRAPH Panel 1989, Virtual Environments and Interactivity: Windows to the Future.

Although said more than 25 years ago by the person who coined the term "virtual reality" (VR), this statement about the excitement and potentiality that was apparently just around the corner in the late 1980s really does apply today. The dream at the time was a VR that would be available cheaply on a mass scale worldwide. Expectations and hopes were very high. As Timothy Leary said in the following year's SIGGRAPH Panel, imagining a time when the cost of an HMD and body-tracking equipment would be at low-end consumer level, "… suddenly the barriers of class and linguistics and education and nationality are gone. The kid in the inner city can slip on the telepresence hardware and talk to young people in China or Russia. And have flirtations with kids in Japan. In other words, to me there is something wonderfully democratic about cyberspace. If it's virtual you can be anyone, you can be anything this time around. We are getting close to a place where that is feasible." Unfortunately, the feasibility was not there, or at least not realizable at that time or anywhere near it. Now, though, the possibility is real, and for whatever reason now is the time.

During the past 25 years, when VR was supposed to have "died,"<sup>1</sup> a large body of research into both the development of the technology and its application in a vast array of areas has continued. Scott Fisher, one of the VR pioneers, in a 1989 essay reported in Packer and Jordan (2002), set out a number of applications: telepresence, where VR provides an interface through which the participant operates in a distant place embodied in a robot located there; data visualization; applications in architectural visualization; medicine including surgical simulation; education and entertainment; remote collaboration. These were all applications that were being worked on at the time. In this article, we set out how VR has been used in these and in a variety of other applications, applications that have already shown results that may be of significant benefit for individuals and society. With VR available on a mass scale, the potential for these benefits to have significant impact is now all the greater. However, as Jaron Lanier also said in the 1990 panel, "… there's really a serious danger of expectations being raised too high." This remains true today, but we can be slightly less cautious since research in the intervening quarter of a century has demonstrated results that stand on a reasonably solid scientific basis.

For an overview of a range of applications of VR (not all considered in this article), see the paper by one of the pioneers of VR, Frederick Brooks (1999), with an updated discussion by Slater (2014). What follows is not meant to be a survey of all possible results in all possible applications. We have selected areas that we believe are particularly important for demonstrating how VR has been and might be used to improve the lives of people, and to help overcome some societal problems, or at the very least help in scientific understanding of problems and contribute toward solutions. Readers might find that their favorite topic, research result, or paper has not been mentioned. This is because we have focused on illustrative results and developments rather than attempting to be comprehensive. Indeed, to write comprehensively about every section in this article would require something like the whole article length devoted to it. Even so without trying to be comprehensive, we have found it necessary to cite many references. We have concentrated on scientific papers in peer-reviewed journals. Immersive VR has shown an extremely impressive array of applications over the years, but what is important now, given the lesson of what happened in its first phase, is that we emphasize results that have some level of scientific support. The scope of this article is on the uses of VR; we are not presenting techniques, methods, interfaces, algorithms, or any of the technical side, except where this is relevant to explain a particular application or results.

Our thesis is similar to that presented in the quote from Jaron Lanier above: VR offers us a way to simulate reality. We do not say that it is "exactly as real" as physical reality but that VR best operates in the space that is just below what might be called the "reality horizon." If a virtual knife stabs you, you are not going to be physically injured but nevertheless might feel stress, anxiety, and even pain. If a virtual human unexpectedly kisses you, you may blush with embarrassment, and your heart may start pounding, but it will be a virtual kiss only. On the other hand, as Lanier said, the real power of VR is to go beyond what is real: it is more than simulation, it is also creation, allowing us to step out of the bounds of reality and experience paradigms that are otherwise impossible.

<sup>1</sup>http://www.technologyreview.com/view/421293/whatever-happened-tovirtual-reality/ though see also http://science.nasa.gov/science-news/science-atnasa/2004/21jun\_vr/ from NASA Ames, 2004.

Virtual reality is "reality" that is "virtual." This means that, in principle, anything that can happen in reality can be programed to happen but "virtually," a point that we return to in Section 8, since, for example, this is not the case with touch and force feedback. Therefore, writing about the potentialities inherent in VR is a difficult task – since it encompasses what can be done in physical reality (for good or evil). But even more, since it is VR, we emphasize that we can break out of the bounds of reality and accomplish things that cannot be done in physical reality. Herein lies its real power. With VR we can, for example, simulate and improve traditional physiotherapy by changing the patient's apparent location and activity to something more engaging than what they are actually doing. In reality, a machine might be helping someone to move their legs for physiotherapy, but with VR they can be given the illusion that rather than just moving their legs for therapy they might be playing soccer in the World Cup. This type of approach augments current practices. But, VR can go way beyond this and introduce radical paradigm shifts.

In VR we are currently still at the stage similar to that of the transition between theater and movies as pointed out by Pausch et al. (1996). Movies were originally just another way to show theater. It took a while before moviemakers developed a new grammar, ways of presenting a story unique to this medium. So, the same will be true of VR. Nowadays, a computer game in VR is just a traditional computer game – but displayed in a different medium. Eventually there will be a paradigm shift, one that we cannot know at the time of writing. Putting this another way, VR is revolutionary, even though it has taken 50 years to get from the initial idea in the lab to becoming a mass consumer product. How this product might develop and change the world in which we live remains unknown. In this article, we try to set out some of what has been done with VR and to some extent what might be done. We address positive uses of VR, while recognizing from the outset that there will be, like with any technology, uses that are morally repugnant. For example, vehicles can do serious damage when used improperly, even though their designed purpose is to transport people or facilitate commercial activity.

# 1.2. Essential Concepts

The idea of immersive VR in the form that we think of it today was foreshadowed by Ivan Sutherland in 1965 (Sutherland, 1965) and then realized with the "Sword of Damocles" HMD described in a paper published 3 years later (Sutherland, 1968).<sup>2</sup> This was not the first ever HMD – see, for example, a collection of pictures compiled by Stephen R. Ellis of NASA Ames, which includes one dating back to 1613.<sup>3</sup> Nor was this the first ever virtual environment system – see the multisensory Sensorama system by Morton Heilig,<sup>4</sup> or Myron Krueger's pioneering work on Artificial Reality (Krueger et al., 1985; Krueger, 1991), or the years of work on flight simulators (Page, 2000). However, it was the first that, although using almost totally different technology than available today, introduced (and implemented) the concepts that make up a VR system. An HMD delivers two computer-generated images, one for each eye. The 2D images are computed and rendered with appropriate perspective with respect to the position of each eye in the three-dimensionally described virtual scene. Together, the images therefore form a stereo pair. The two small displays are placed in front of the corresponding eye, with some optics that enables the user to see the images. The displays are mounted in a frame, which additionally has a mechanism to continually capture the position and orientation of the user's head, and therefore gaze direction (assuming that the eyes are looking straight ahead). Hence, as the head of the user moves, turns, or looks up and down, this information is transmitted to the computer that recomputes the images and sends the resulting signals to the displays. From the point of view of the users, it is as if they are in an alternate life-sized environment, since wherever they look, in whichever direction, they see this surrounding computer-generated world in 3D stereo with movement and motion parallax. (The same can be done with specialized sound.) In fact, from this point on we drop the term "user" and refer to the "participant." VR is different from other forms of human–computer interface since the human *participates in the virtual world rather than uses it*.

<sup>2</sup>https://www.youtube.com/watch?v=NtwZXGprxag&feature=youtu.be

<sup>3</sup>http://humansystems.arc.nasa.gov/groups/acd/projects/hmd\_dev.php

<sup>4</sup>http://www.mortonheilig.com/InventorVR.html

In the 1980s, NASA Ames developed the VIEW system (Virtual Interface Environment Workstation) described by Fisher et al. (1987).<sup>5</sup> This was a full VR system with all components recognizable today: a head-tracked, wide field-of-view, relatively lightweight HMD, audio, tracking of the body, tracked gloves that allowed participants to interact with virtual objects, tactile and force feedback (haptics), and where the VR could be linked to a telerobotics system (Section 6.4).

Also in the 1980s, the company VPL, led by Jaron Lanier, became a driving force of VR developments, constructing the Eyephone HMD, tracked data gloves<sup>6</sup> for interaction, whole body tracking, and reality built for two (Blanchard et al., 1990).<sup>7</sup> They also developed a visual programming language that made it possible to build virtual environments with limited programming. It was a goal for people to be able to construct their virtual realities, while in VR, and immediately share these with multiple people. It was probably through the work of VPL that the idea of VR became widely publicized.

The degree of excitement, creativity, speculation, visions of a positive future, belief in the near-term mass availability of VR cannot be overemphasized. Indeed, the ideas and realizations that were around in the late 1980s and early 1990s can be read anew today and have a new freshness – and are especially important because what was hoped for then (VR for the mass of people at low cost) is now becoming a reality. Readers are urged to read the proceedings of two panels that occurred at the SIGGRAPH conference in 1989 (Conn et al., 1989) and 1990 (Barlow et al., 1990) to get an idea of the excitement and promise of the heady days of early VR.

<sup>5</sup>https://www.youtube.com/watch?v=3L0N7CKvOBA

<sup>6</sup>https://www.youtube.com/watch?v=fs3AhNr5o6o

<sup>7</sup>https://www.youtube.com/watch?v=ACeoMNux\_AU

Head-mounted display technology puts the displays close to the eyes. Another type of immersive VR system was developed by Cruz-Neira et al. (1992) referred to as a CAVE™ system (Cruz-Neira et al., 1993). Here, images are back-projected onto the walls of an approximately 3 m cubed room (front projected onto the floor by a projector mounted on the ceiling above the open-topped cuboid). Typically, three walls and the floor are screens. The images are projected interlaced at, e.g., 90 frames per second, 45 showing left eye images and the others the right eye images. Lightweight shutter glasses alternately have one eye lens opaque and the other transparent, in sync with the projected images. The brain fuses the two into one overall 3D stereo scene. Through head tracking mounted on the glasses, the image is correctly perspective computed for the head position, direction, and orientation of the participant. More than one person can be in the Cave simultaneously, and wearing the stereo shutter glasses, but the perspective is only correct for the one wearing the head-tracked glasses. Hence, such Cave-like systems, like HMDs, deliver a surrounding 3D world. Of course, such a system has been far more expensive than an HMD system, both in terms of the space required and the cost (high powered projectors, a multiprocessor computer system, complex software for lock-step stereo rendering across all the displays, equipment maintenance). Moreover, as the promise of HMD driven VR diminished in the 1990s through the failure to develop high quality displays at low enough cost, and with acceptable ergonomics (such as weight), Cave-like systems came to be used as an alternative. However, unlike HMDs, each Cave was typically tailor-made to order (it depended on available space apart from anything else) and never became a mass product. Caves became one of the mainstays of VR research and applications from the late 1990s and through the 2000s until recently. The applications we discuss below include both HMD and Cave systems.

Conceptually, a minimal VR system places a participant into a surrounding 3D world that is delivered to a display system by a computer. At the very least, the participant's head is tracked so that image and auditory updates depend on head-position and orientation. The computer graphics of the system delivers perspective-projected images individually to each eye, and the resulting scenario should be seen with correct parallax. Ideally, there should be a means whereby participants can effect changes in the virtual world. This may be accomplished by 3D tracked data gloves, or a handheld device such as a Wand (which is like a mouse or joystick but tracked in 3D space). Note that this says nothing about how the world is rendered. Even with the wire frame (lines only) images portrayed in Sutherland (1968), Ivan Sutherland noted that "An observer fairly quickly accommodates to the idea of being inside the displayed room and can view whatever portion of the room he wishes by turning his head …. Observers capable of stereo vision uniformly remark on the realism of the resulting images."
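The per-eye perspective computation described above can be illustrated with a minimal sketch. This is not the method of any particular VR system: the function names, the yaw-only head model, the axis convention (right-handed, y up, yaw 0 looking along -z), and the default interpupillary distance of 0.064 m are all assumptions made for illustration. It derives the two eye positions from a tracked head pose and projects a scene point through a simple pinhole model for each eye.

```python
import math


def eye_positions(head_pos, yaw_deg, ipd=0.064):
    """Return (left_eye, right_eye, forward, right) for a tracked head.

    Assumed convention: right-handed axes, y up, yaw 0 looks along -z,
    positive yaw turns the head to the left. Distances are in metres.
    """
    yaw = math.radians(yaw_deg)
    forward = (-math.sin(yaw), 0.0, -math.cos(yaw))
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    half = ipd / 2.0
    left_eye = tuple(h - half * r for h, r in zip(head_pos, right))
    right_eye = tuple(h + half * r for h, r in zip(head_pos, right))
    return left_eye, right_eye, forward, right


def project(point, eye, forward, right):
    """Pinhole projection of a world point onto an image plane at unit distance."""
    d = tuple(p - e for p, e in zip(point, eye))
    depth = sum(a * b for a, b in zip(d, forward))
    if depth <= 0:
        raise ValueError("point is behind the eye")
    x = sum(a * b for a, b in zip(d, right)) / depth
    y = d[1] / depth  # the up vector stays (0, 1, 0) because only yaw is modelled
    return x, y


def stereo_pair(point, head_pos, yaw_deg, ipd=0.064):
    """Project one scene point for the left and right eye of a tracked head."""
    left_eye, right_eye, forward, right = eye_positions(head_pos, yaw_deg, ipd)
    return (project(point, left_eye, forward, right),
            project(point, right_eye, forward, right))


# A point 2 m straight ahead of a head at standing height.
left_img, right_img = stereo_pair((0.0, 1.7, -2.0), (0.0, 1.7, 0.0), 0.0)
disparity = left_img[0] - right_img[0]
```

In this sketch the horizontal disparity between the two projections shrinks as the point moves farther away, which is the depth cue a stereo pair carries. Recomputing `stereo_pair` whenever the tracked head pose changes gives the head-coupled update and motion parallax that the text describes.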

# 1.3. Immersion and Presence

Consciousness of our immediate surroundings necessarily depends on the data picked up by our sensory systems – vision, sound, touch, force, taste, and smell. This is not to say that we simply reproduce the sensory inputs in our brains – far from it, perception is an active process that combines bottom-up processing of the sensory inputs with top-down processing (including prior experience, expectations, and beliefs) based on our previously existing model of the world. After a few seconds of walking into a room we think that we "know" it. In reality, eye scanning data show that we have foveated on a very small number of key points in the room, and then our eye scan paths tend to follow repeated patterns between them (Noton and Stark, 1971). The key points are determined by our prior model of what a room is. We have "seen" a small proportion of what there is to see; yet, our perceptual system has inferred a full model of the room in which we are located. In fact it has been argued that our model of the scene around us tends to drive our eye movements rather than eye movements leading to our perceptual model of the scene (Chernyak and Stark, 2001). It was argued by Stark (1995) that this is the reason why VR works, even in spite of relatively simplistic or even poor rendering of the surroundings. VR offers enough cues for our perceptual system to hypothesize "this is a room" and then based on an existing internal model infer a model of this particular room using a perceptual fill-in mechanism. Recall the quote from Sutherland above how people accommodated to and remarked on the realism of the wire frame rendered scene displayed in the "Sword of Damocles" HMD.

The technical goal of VR is to replace real sense perceptions by the computer-generated ones derived from a mathematical database describing a 3D scene, animations of objects within the scene – represented as transformations over sets of mathematical objects – including changes caused by the intervention of the participant. If sensory perceptions are indeed effectively substituted then the brain has no alternative but to infer its perceptual model from its actual stream of sensory data – i.e., the VR. Hence, consciousness is transformed to consciousness of the virtual scenario rather than the real one – in spite of the participant's sure knowledge that this is not real.

Effective substitution of real sensory data is an ideal. In practice, it depends on several factors, not least of which is – which sensory systems are included? Typically, vision, and often auditory, more rarely touch, more rarely force feedback, more rarely still smell, and almost unknown taste.<sup>8</sup> If we consider the typical VR system, it is primarily centered around vision, may have sound, and may have some element of tactile feedback. However, even vision alone is often enough for numerous applications, since anyway for many people it is perceptually dominant. So, participants in a VR typically encounter a situation where their visual system places them on say a roller coaster, but all other sense perceptions are from the surrounding physical environment. Nevertheless, they may scream and react as if they are on the roller coaster even while talking to a friend in reality standing nearby.

<sup>8</sup>Though see Project Nourished: http://www.projectnourished.com

Factors that are critical for effective sensory substitution have been known for several years (Heeter, 1992; Held and Durlach, 1992; Loomis, 1992; Sheridan, 1992, 1996; Steuer, 1992; Zeltzer, 1992; Barfield and Hendrix, 1995; Ellis, 1996; Slater and Wilbur, 1997): such as wide field-of-view vision, stereo, head tracking, low latency from head move to display, high-resolution displays, and of course the more sensory systems that are substituted the better. However, these types of technical factors (and there are others) are for one purpose *– to afford the participant to perceive using natural sensorimotor contingencies* (O'Regan and Noë, 2001a,b; Noë, 2004). What this means is that in order to perceive we use our bodies in a natural way. We turn our head, move our eyes, bend down, look under, look over, look around, reach out, touch, push, pull, and do all or some subset of these things simultaneously. Perception is a whole body action. *Hence, the primary technological goal of VR is to realize perception through such natural sensorimotor contingencies to the best extent possible*, and of course this continually comes up against limitations. For example, if while wearing an HMD or in a Cave we look very closely at an object, eventually we will see pixels. Or, in most existing VR systems, if we touch some arbitrary virtual object we will not feel it.

By an *immersive* VR system we mean one that delivers the ability to perceive through natural sensorimotor contingencies. This is entirely determined by the technology. Whether you can turn around 360°, all the while seeing a very low-latency continuous update of your visual field in correspondence with your gaze direction, is completely a function of the extent to which the system can do this. We can classify systems in this way as being more or less immersive. We say that system A is more immersive than system B if A can be used to simulate the perception afforded by B but not *vice versa*. Hence, in this sense an HMD is "more immersive" than a Cave, since there is something that can be represented in an HMD that cannot be represented in a Cave (even a six-sided Cave): the virtual representation of the participant's body. In a Cave when you look down toward yourself you will see your real body. In an HMD with head tracking you can see a virtual body substituting your own (if this has been programed). Moreover, the virtual body can be designed to look like the real one, or not, and certainly with body tracking can be programed to move with real body movements and so on. So, in this way an HMD-based system can (in an ideal sense) be set up to simulate a Cave, but not *vice versa*.
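The ordering defined above ("A is more immersive than B if A can be used to simulate the perception afforded by B but not vice versa") can be sketched as a strict partial order. The following is a deliberate simplification and not part of the original framework: it assumes each system can be summarized as a set of afforded perceptual capabilities, and the capability labels are hypothetical, chosen only for illustration.

```python
def more_immersive(a, b):
    """True if system a affords everything b affords, plus something more.

    Systems are modelled as frozensets of capability labels, so the
    'can simulate, but not vice versa' relation becomes a proper-superset test.
    """
    return b < a


# Hypothetical capability labels, for illustration only.
CAVE = frozenset({"stereo", "head_tracking", "surround_view"})
HMD = CAVE | {"virtual_body"}        # an HMD can also replace the seen body
DESKTOP = frozenset({"mouse_look"})  # shares no label with the other two

hmd_over_cave = more_immersive(HMD, CAVE)
cave_over_hmd = more_immersive(CAVE, HMD)
hmd_over_desktop = more_immersive(HMD, DESKTOP)
```

Under these assumptions the HMD ranks above the Cave but not the reverse, matching the virtual body example in the text. Two systems whose capability sets do not nest are simply incomparable, which is why the relation gives a partial rather than a total ordering.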

Immersion describes the technical capabilities of a system; it is the physics of the system. A subjective correlate of immersion is *presence*. If a participant in a VR perceives by using her body in a natural way, then the simplest inference for her brain's perceptual system to make is that what is being perceived is the participant's actual surroundings. This gives rise to the subjective illusion that is referred to in the literature as presence – the illusion of "being there" in the environment depicted by the VR displays – *in spite of the fact that you know for sure that you are not actually there*. This specific feeling of "being there" has also been referred to as "place illusion" (PI) (to distinguish it from the multiple alternative meanings that have been attributed to the term "presence") (Slater, 2009). The term "presence" derives from "telepresence," coined by Marvin Minsky (1980) to describe the similar feeling that can arise when embodying a remote robotic device in a teleoperator system.

Place illusion can occur in a static environment where nothing happens – just looking around a stereo-displayed scenario, for example, where nothing is changing. When there are events in the environment, events that respond to you, that correlate with your actions, and refer to you personally, then provided that the environment is sufficiently credible (i.e., meets the expectations of how objects and people are expected to behave in the type of setting depicted), this will give rise to a further and independent illusion that we refer to as "Plausibility" (Psi) that the events are really happening. Again, this is an illusion in spite of the sure knowledge that nothing real is happening. A virtual human approaches and smiles at you, and you find yourself smiling back, even though too late you may say to yourself – why did I smile back, there is no one there?

The real-time update of sensory perception as a result of movement (e.g., head turning) gives rise to the sense of "being there" – the illusory sensation of being in the computer-generated environment (Sanchez-Vives and Slater, 2005). The dynamic changes following events caused by or to the participants can give rise to the illusion that the events are really happening – "plausibility" (Slater, 2009). With a technically good VR system (wide field-of-view high-resolution stereo display, with low-latency head tracking at a minimum), the "being there" aspect is essentially determined for all but a few moments during an experience (Slater and Steed, 2000). Psi is much harder to attain, often requiring specific domain knowledge (e.g., the virtual representation of a doctor's surgery for the purposes of training had better be according to their expectations if doctors are to accept it). In this article, we use PI to refer to the illusion of being there, whereas presence refers to both PI and Psi. Following Sanchez-Vives and Slater (2005), the behavioral correlate of "presence" is that participants behave in VR as they would do in similar circumstances in reality. For a more formal treatment of PI, Psi, and presence, including experimental results, see Slater et al. (2010a).<sup>9</sup> These issues are taken up again in Section 8.

This fundamental aspect of VR, to deliver experience that gives rise to an illusory sense of place and an illusory sense of reality, is what distinguishes it fundamentally from all other types of media. It is true that in response to a fire in a movie scene, the viewers' hearts might start racing, with feelings of fear and discomfort. But, they will not run out of the cinema for fear of the fire. In VR, about 10% did run out when confronted by a virtual fire even though the fire did not look realistic (Spanlang et al., 2007). In a movie that includes a fight between two strangers in a bar, audience members will not intervene to stop the fight. In VR, they do – under the right circumstances – specifically when the victim shares some social identity with the participant (Slater et al., 2013), which itself is remarkable because obviously there is no one real there with whom to share social identity.

So, VR is a powerful tool for the achievement of authentic experience – even if what is depicted might be wholly imaginary and fantastic. In a scenario with dinosaurs such as that shown in "Back to Dinosaur Island – Jurassic World with Oculus Rift,"<sup>10</sup> of course participants know that the situation is not real. Nevertheless, they would typically have the illusion of being there and have the illusory sensation that the dinosaur's actions are really happening.

<sup>9</sup>https://www.youtube.com/watch?v=QEKxyhSPiVg

<sup>10</sup>https://www.youtube.com/watch?v=lmHEQRVJzBI

Evidence over the past 25–30 years shows that PI and Psi can occur even in quite low-level systems. This is because VR relies on the brain "filling in" detail in response to the apparent situation, so that *just like in physical reality* people find themselves responding with physiological and reflex actions before they consciously reason out the situation – in this case that in fact nothing real is happening. That reasoning or high-level cognitive processing occurs more slowly, after the autonomic bodily responses have already occurred. For example, put someone next to a virtual precipice and their heart will start pounding (Meehan et al., 2002), even though eventually of course they can say to themselves that it is not really there. VR effectively relies on this duality – between very rapid brain activation that causes the body to respond (by the body responding, we include autonomic responses and thoughts that are generated in response to an apparent situation) and the slower cognitive process that reasons things out, which is of course a vital mechanism for survival, and occurs normally in physical reality.

Since VR evokes realistic responses in people, it is fundamentally a "reality simulator." By this we mean that participants can be placed in a scenario that depicts potentially real events, with the likelihood that they would act and respond quite realistically. This can obviously be exploited for many applications including rehearsal for the actual events, planning, training, knowledge dissemination, and so on. However, VR is also an *unreality simulator!* The events that it depicts may be ones that are highly unlikely to happen or cannot happen because they violate fundamental laws of physics, such as defying the laws of gravity. In VR, the physical laws can be simulated to the limit that computational power supports, or they can be changed or violated. Similarly, social conventions can be violated. A person might one day participate in a world that has never existed, such as Pandora from James Cameron's movie Avatar.<sup>11</sup> But still, provided some fundamental principles are adhered to, giving rise to the illusions of being in the virtual place where real events are taking place – participants can nevertheless demonstrate realistic responses. At the simplest level your heart is likely to race equally being faced with a realistic depiction of a precipice (something that could happen) or being chased by otherworld monsters. In this way, VR dramatically extends the range of human experiences way beyond anything that is likely to be encountered in physical reality. Hence, the amazing capability of VR not just as a reality simulator but as an unreality simulator that can paradoxically give rise to realistic behavior.

In this article, we will outline some of the applications that have been developed that show the positive use of VR for the potential benefit of society and individuals – how VR can be used to enhance well-being across a vast range of aspects of life. VR as a reality simulator has its uses in various forms of training, for education, for travel, some of which are discussed in the sections below. Moreover, VR as an unreality simulator can be used for many different types of entertainment – that extend from passive to active. It should also be noted that VR as an unreality simulator can also be used to solve "real" problems – as we will indicate later.

In each of the sections below, we will tackle a different domain of application. We will show in each section what has been done at the time of writing and give some indication of the degree to which it has been successful (i.e., its scientific validation). Additionally, where relevant, we will discuss ideas and proposals indicating what could be done in this domain.

# 2. SCIENCE, EDUCATION, AND TRAINING

# 2.1. Psychology and Neuroscience

### 2.1.1. The Virtual Body

In Franz Kafka's Metamorphosis,12 Gregor Samsa woke up one morning lying in bed and found himself transformed into a horrible insect-like creature. The body felt like his own, but he had to learn how to move himself in new ways, and of course it had an impact on his attitudes and behaviors and those of others who saw him. Using VR, it has been shown to be possible to actually experiment with these types of body transformations, though rather more pleasant ones, and in the early days at the VPL company, there was experimentation by Jaron Lanier with embodiment in a virtual lobster body.

The question of how the brain represents the body is fundamental in cognitive neuroscience. How does the brain distinguish that this object is "my" hand and part of my body, but that object, a cup, is not part of my body, or that other object is your hand and not part of me? Common sense would have us believe that our own internal body representation is stable, something that changes only slowly through time, but experiments have shown that it is quite easy to shift the illusion of body ownership to objects that are not part of the body at all, or to a radically transformed body, so that our body representation is highly malleable.

A classic and very simple experiment to show this is the rubber hand illusion (RHI), presented by Botvinick and Cohen (1998) in a one-page Nature paper that has had an enormous impact on the field (over 1800 citations – Google Scholar – at the time of writing). It has led to a vast literature that exploits these illusions to understand how the brain represents the body. Recent reviews are provided in Blanke (2012); Ehrsson (2012); and Blanke et al. (2015). In the RHI, the subject sits by a table onto which a rubber hand is placed in an anatomically plausible position, approximately parallel to the subject's corresponding real hand. The real hand is hidden behind a partition. The experimenter, sitting opposite the subject, taps and strokes the seen rubber hand and the hidden real hand synchronously in time and, as far as possible, at the same locations on the two hands. From the subject's point of view, there is a rubber hand seen on the table in front, arranged so that it could be the subject's own hand, and this hand is seen to be tactilely stimulated. But, corresponding to the seen stimulation, there is actually felt stimulation on the real hand. The brain's perceptual system resolves this conflict by integrating the two separate but synchronous inputs into one, resulting in the perceptual and proprioceptive illusion that the rubber hand is the subject's hand.13,14 This feeling, just like PI or Psi, is impossible to describe – it has to be experienced. If the visual and tactile stimulation are asynchronous, then the illusion does not occur, or occurs to a much lesser extent. To elicit a behavioral measure of the illusion, the idea of "proprioceptive drift" was introduced in Botvinick and Cohen (1998). Before the stimulation, participants with eyes closed had to point to their hand under the table on which their arm was resting. After the stimulation, participants were asked to repeat the pointing procedure. The distance between the post- and pre-measures is called the proprioceptive drift, where greater values indicate that participants pointed more toward the rubber hand after than before. Indeed, it was found that the drift was on average positive for those in the synchronous condition and zero for those in the asynchronous condition.

<sup>11</sup>http://www.avatarmovie.com/index.html

<sup>12</sup>http://www.gutenberg.org/files/5200/5200-h/5200-h.htm

<sup>13</sup>https://youtu.be/x5-TPXIzKuI

<sup>14</sup>https://www.youtube.com/watch?v=TCQbygjG0RU
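The proprioceptive drift measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pointing positions are taken to be recorded in centimeters along a single axis that runs through the real and rubber hands, and the function names and example numbers are hypothetical rather than taken from Botvinick and Cohen (1998).

```python
from statistics import mean


def proprioceptive_drift(pre_cm, post_cm, rubber_hand_cm, real_hand_cm):
    """Return post minus pre pointing position, signed so that a positive
    value means the pointing moved toward the rubber hand.

    All positions are coordinates on one axis (an assumption of this sketch).
    """
    # +1 if the rubber hand lies at a larger coordinate than the real hand,
    # -1 otherwise, so that "toward the rubber hand" is always positive.
    direction = 1.0 if rubber_hand_cm >= real_hand_cm else -1.0
    return direction * (post_cm - pre_cm)


def mean_drift(trials, rubber_hand_cm, real_hand_cm):
    """Average drift over (pre_cm, post_cm) pairs, one pair per participant."""
    return mean(
        proprioceptive_drift(pre, post, rubber_hand_cm, real_hand_cm)
        for pre, post in trials
    )


if __name__ == "__main__":
    # Hypothetical pointing data, not real experimental results.
    synchronous = [(20.0, 23.5), (19.0, 21.0), (21.0, 24.0)]
    asynchronous = [(20.0, 20.5), (19.5, 19.0), (20.5, 20.5)]
    print("synchronous mean drift:", mean_drift(synchronous, 35.0, 20.0))
    print("asynchronous mean drift:", mean_drift(asynchronous, 35.0, 20.0))
```

The sign convention is the only design choice here: it keeps "toward the rubber hand" positive regardless of which side of the real hand the rubber hand is placed on.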

Armel and Ramachandran (2003) went on to show that subjects also respond physiologically to a threat to the rubber hand. They argued that our internal body representation is updated moment to moment based on the stimulus contingencies received. Synchronous multisensory perception suggests the hypothesis that the rubber hand might be our real hand; the brain takes on this hypothesis and very quickly generates the corresponding illusion as a way to resolve the contradiction between the seen and felt synchronous stimulation. There are limitations: for example, the rubber hand needs to look like a human hand, and its position must be plausible. But the fundamental result, that we can have strong feelings of ownership over an object that we know for certain is not part of our body, is clearly demonstrated by this illusion.

Lenggenhager et al. (2007)15 and Ehrsson (2007)16 went on to show how similar multisensory techniques could be used to induce out-of-body illusions. Each of these used an HMD *via* which subjects saw a distant body. The HMD received video signals from cameras pointing toward the body. In the case of Lenggenhager et al. (2007), the distant body was a manikin with its back to the subject. The manikin was seen to be stroked on the back, which was felt on the subject's back through synchronous stimulation by the experimenter. Subjects then had the strange illusion of being located at or drawn toward the manikin body to their front. In the case of Ehrsson (2007), the video cameras were pointed to the back of the subject's own seated body. So from the perspective of the subject, they saw their own body from behind themselves. The experimenter synchronously stroked the subject's real chest (out-of-sight) and visibly made similar strokes under the cameras. From the point of view of the subjects, they saw and felt stroking toward themselves (since their viewpoint was that of the stereo cameras), but they were apparently located behind their real body. Here, the visual and tactile information cohered to generate the illusion of being behind their own body. When the space under the camera was attacked with a hammer, participants responded physiologically (since the hammer would seem to be coming toward the illusory location of their chest). When the visual and tactile stimulation was asynchronous neither the illusion nor the physiological response occurred to the same extent.

Following this, a form of VR to study body ownership with respect to the whole body (full body ownership) was achieved by Petkova and Ehrsson (2008) through the use of video cameras mounted on top of a manikin that fed a stereo HMD worn by the participant, so that when participants looked down toward their real body, they would see the manikin body instead of their own. This was accompanied by *visuotactile synchrony*, induced by applying tactile stimulation to the real body synchronized with a corresponding visual stimulation to the manikin body. The result was a subjective illusion of ownership over the manikin body, demonstrated also by a physiological response when a knife threatened that body. The illusion diminished when visuotactile asynchrony was applied.

The use of VR to transform the body was first realized by Jaron Lanier, in the late 1980s. The importance of this work for cognitive neuroscience was not realized at the time, and it was never published scientifically, although see Lanier (2006); it is also referred to in Lanier (2010). Lanier used the term "homuncular flexibility" to refer to the finding that the brain can adapt to different body configurations and learn how to manipulate such an alien body – for example, manipulating end-effectors of a body representation as a lobster by learning to use muscles in the stomach, or through combinations of different muscle activations. The extreme flexibility of the body representation had been studied in the 1980s by Lackner (1988). It was found that applying vibrations of around 100 Hz to a muscle tendon on the biceps leads the forearm to move in flexion, but if the movement is resisted, then there will be an illusion of movement of the forearm in the opposite direction (extension). Now suppose that both hands are holding the waist and such muscle spindle vibrations are applied. There is an illusion that both arms are extending, but since the hands are attached to the waist this is impossible. The way that the brain resolves this is to give the illusion of an expanding waistline! By vibrating on the other side of the muscle tendons the arms can be given the illusion of flexing – which will result in a shrinking waist illusion. Ehrsson et al. (2005) used these illusions with brain imaging to capture brain activation changes associated with these radical changes in the body. Tidoni et al. (2015) used these vibratory techniques in conjunction with VR as part of a developing program for the rehabilitation of disabled patients. This followed earlier work by Leonardis et al. (2012), who used such vibrations to induce illusory movements but in conjunction with a brain–computer interface (BCI) motor-imagery paradigm, i.e., the participant imagines moving their arm, feels their arm moving through application of the vibrations technique, and then sees the corresponding virtual arm move. This was part of an Embodiment Station (discussed in Section 6.5).

Regarding non-human body configurations, Ehrsson (2009) and Guterstam et al. (2011) showed, for example, that using the multisensory techniques associated with the RHI, it is possible to give participants the illusion of owning additional arms. Regarding body shape, Kilteni et al. (2012)17 showed that it is possible to have an illusion of ownership over an asymmetric human body, where one arm is three times as long as the other, and where the participant responds by automatically withdrawing the arm when there is a threat to the distant hand. This illusion had first been implemented and experienced at VPL in the 1980s, although not published. Steptoe et al. (2013) showed how humans could adapt to having a tail, through embodiment using a Cave-like system, but seeing the virtual body from behind. Participants learned how to use the tail in order to avoid harm to the body. More recently, Won et al. (2015a) have continued to study homuncular flexibility, showing that people can learn to control virtual bodies through mappings that are different from the usual ones. Some implications of this across a range of fields have been discussed in Won et al. (2015b).

<sup>15</sup>https://www.youtube.com/watch?v=4PQAc\_Z2OfQ

<sup>16</sup>https://www.youtube.com/watch?v=ee4-grU\_6vs

<sup>17</sup>https://www.youtube.com/watch?v=EyujFtuFWvo

Returning to the RHI, Ijsselsteijn et al. (2006) found that an illusion of ownership can be attained over a 2D projection of an arm on a table top when the visuotactile synchronous stimulation is applied as in the RHI. Although the subjective illusion was reported, the proprioceptive drift effect did not occur. Using VR, Slater et al. (2008) showed that a virtual arm could be felt as owned by participants when seen to be stroked synchronously with the corresponding hidden real arm. This was achieved by a virtual arm being displayed on a powerwall as projecting (in stereo) out of the real shoulders of participants. A tracked wand was used to tap and stroke the participant's hidden real hand, which was shown on the display as a virtual ball tapping the virtual hand. This was done synchronously in which case the full illusion of ownership occurred including proprioceptive drift, or asynchronously, which typically did not result in the illusion.

In the full body illusion setup of Petkova and Ehrsson (2008), there was no head tracking so that participants had to be looking down in a fixed orientation toward their body, in order to see the manikin body as substituting their real body. Slater et al. (2010b) carried out the first study of full body ownership using VR where participants saw a virtual body that was spatially coincident with their own and which they saw through a wide field-of-view stereo and head-tracked Fakespace Wide5 HMD.18 Hence, when they looked down toward themselves they saw a virtual body that substituted their actual (hidden) body and from the viewpoint of the eyes of that virtual body (coincident with their own). We refer to this as *first-person perspective* (1PP). The experiment also included visuotactile synchrony (they felt their arm being stroked in synchrony with seeing their corresponding virtual arm stroked) or visuotactile asynchrony. There was also a condition where the virtual body was seen from a third-person perspective (3PP) (i.e., the virtual body was not spatially coincident with the real body, but to the left of the participant's location). In this setup, it was found that 1PP was clearly the dominant factor, although visuotactile synchrony had some contribution. Remarkably, the illusion occurred in spite of the fact that all the participants were adult males but were embodied in a young female body.19 The difference between the results of Petkova and Ehrsson (2008) and Slater et al. (2010b) was taken up by Maselli and Slater (2013). The vital importance of 1PP for body ownership was also emphasized by Petkova et al. (2011) and considered further by Maselli and Slater (2014).

One of the major advantages of VR in this context compared to using rubber hands or manikin bodies is that virtual limbs or the whole virtual body can be moved. Sanchez-Vives et al. (2010) exploited this to show that the illusion of ownership over a virtual arm can be induced by synchrony between real and virtual hand movements (visuomotor synchrony). Participants wearing a data glove that tracked the movements of their hand and fingers saw a virtual hand (projected in stereo 3D on a powerwall) move in synchrony or asynchrony with their real hand movements. This resulted in an illusion of ownership just as with visuotactile stimulation.

The same can be done for the body as a whole. Through real-time motion capture mapped onto the virtual body, when people move their real body they see the virtual body move correspondingly. Participants can see their virtual body moving by directly looking toward themselves and in virtual mirror reflections (and shadows) (Slater et al., 2010a). Kokkinara and Slater (2014) showed in later work that when there is a 1PP view of the virtual body, visuomotor synchrony is a more powerful inducer of the body ownership illusion than visuotactile synchrony.

We use the term *virtual embodiment* (or just embodiment) to refer to the process of replacing a person's body by a virtual one. At the minimum, this requires a stereo HMD with a wide field-of-view (so that the person can actually see their virtual body) and head tracking. Additional multisensory correlations such as visuotactile and visuomotor synchrony may be included. A technical setup to achieve this is described in Spanlang et al. (2014). Virtual embodiment may give rise under the right multisensory conditions (such as 1PP, visuotactile, and/or visuomotor synchrony) to the *illusion of body ownership*, which is a perceptual illusion that the virtual body feels as if it is the person's own body (even though it may look nothing like their real body).

There has been a lot of work on building virtual embodiment technology (Spanlang et al., 2013, 2014), studying the conditions that can lead to such body ownership illusions (Slater et al., 2008, 2009, 2010b; Sanchez-Vives et al., 2010; Borland et al., 2013; González-Franco et al., 2013; Llobera et al., 2013; Maselli and Slater, 2013, 2014; Pomes and Slater, 2013; Blom et al., 2014; Kokkinara and Slater, 2014) and exploring the effects of distortions away from the normal form of a person's actual body (Slater et al., 2010b; Normand et al., 2011; Kilteni et al., 2012; Steptoe et al., 2013). There have also been studies on how illusions of body ownership might result in various changes to the real body.

For example, it had previously been shown that the RHI leads to a cooling of the real hand (Moseley et al., 2008) – though see also Rohde et al. (2013) – as well as an increase in its histamine reactivity (Barnsley et al., 2011). Cooling of several points on the body has also been reported in a 3PP full body illusion (Salomon et al., 2013). There is also evidence using VR suggesting that the 1PP full body ownership illusion can result in changes in temperature sensitivity (Llobera et al., 2013). It has also been shown that when the virtual hand is attacked in the full body illusion, there is an electrical brain response (EEG) that corresponds to what would be expected to occur when a real hand is attacked (González-Franco et al., 2013). Banakou and Slater (2014) showed that embodiment in a virtual body that is perceived from 1PP and that moves synchronously with the real body can result in illusory agency over an act of speaking. The virtual body was seen directly and in a virtual mirror. Participants spent a few minutes simply moving, with the virtual body moving synchronously with their movements in the experimental condition or asynchronously in another. At some moment, the virtual body unexpectedly uttered some words (45 in total) with appropriate lip sync. Those in the visuomotor synchronous condition later reported a subjective illusion of agency over the speaking – as if they had been the ones who had been speaking rather than only the virtual body. Moreover, when participants were asked to speak after this exposure, the fundamental frequency of their own voice shifted toward that of the higher frequency voice of the virtual body. Thus embodiment resulted in the preparation of a new motor plan for speaking, which was exhibited by participants in the synchronous condition changing the way that they spoke after compared to before the experiment. This did not happen for those in the asynchronous condition.

<sup>18</sup>http://www.fakespacelabs.com/Wide5.html

<sup>19</sup>https://www.youtube.com/watch?v=3wg14z5O9Ug

Thus, VR offers a very powerful tool for the neuroscience of body representation. For a recent review of this field, see Blanke et al. (2015). It can be used to do effectively and relatively simply what is impossible by any other means – instantly produce an illusion of change to a person's body. In the next section, we consider some of the consequences of changing representations of the self.

### 2.1.2. Changing the Body Can Change the Self

"… one of the fundamental differences between virtual reality and other forms of user-interface is that you're really present in it, your body is represented and you can react with it as you, … And the fact that you're in it, and that you define yourself is really fascinating. Oftentimes, being able to change your own definition is actually part of a practical application. Like in the world we did last year, where an architect was designing a day care center and could change himself into a child and use it with a child's body and run faster and have different proportions and all that." Jaron Lanier (Barlow et al., 1990).

This quote is another illustration that much of what is being discussed today was already thought of and even implemented in the heady days of early VR. If VR can endow someone with a different body, what consequences does this have? We have already mentioned above that ownership over a rubber hand can lead to physiological responses, and there is some evidence that the real arm in the experiment can experience (very small) drops in temperature, that the same can occur over different parts of the body in a virtual whole body illusion, and that in the virtual arm illusion there may be a change in temperature sensitivity. But are there higher-level changes to attitudes, behaviors, even cognition?

Yee and Bailenson (2007) introduced a paradigm called the "Proteus Effect," where it was argued that the digital self-representation of a person could influence their attitudes and behaviors in online and virtual environments. Essentially, the personality or type of body or the actions associated with the digital representation would influence the actual real-time behaviors of the participant, both in the VR and later outside it. In their 2007 paper, they showed that being embodied in an avatar with a face that was judged as more attractive than their actual one led participants to move closer to someone else displayed in a collaborative virtual environment than those participants whose avatar face was judged less attractive. Similarly, being embodied in taller avatars led to more aggressive behaviors in a negotiation task than being embodied in shorter avatars. These results also carried over to representations in online communities (Yee et al., 2009). Groom et al. (2009) embodied White or Black people in a Black or White virtual body, in the context of a scenario in which they were in an interview applying for a job. The embodiment was through an HMD with head tracking, with the body seen in a mirror, and lasted for just over 1 min. Using a racial Implicit Association Test (IAT) (Greenwald et al., 1998), they found that after the exposure there was greater bias in favor of White for those embodied in the Black virtual body. This difference did not occur when participants simply imagined being in a White or Black body. Hershfield et al. (2011) studied the effect of embodying people in aged versions of themselves on their savings behavior. They embodied people in a virtual body that either had a representation of their own faces, or their faces aged by about 20 years. The virtual body was shown in a virtual mirror. They found some modest evidence in favor of the hypothesis that being confronted with their future selves influenced their behavior toward greater savings for the future. See also the example concerned with fostering exercise (Fox and Bailenson, 2009) in Section 3.1.2.

The theoretical basis of the Proteus Effect (Yee and Bailenson, 2007) is Self-Perception Theory [e.g., Bem (1972)], which suggests that people infer their attitudes by observing their own behaviors and the context in which these occur, and almost all the examples above do put people into behavioral situations. It has also been argued that attitudinal and behavioral correlates of transformed body ownership can be explained as people behaving according to how others would expect someone with that type of body to behave (Yee and Bailenson, 2007). Essentially, this comes down to stereotyping. For example, in the case of the racial bias study of Groom et al. (2009), participants were put into precisely a situation that is known to be one where there is implicit bias against Black people compared to White.

In an experiment by Kilteni et al. (2013), people were embodied either in a dark-skinned casually dressed (Jimi Hendrix-like) body or in a light-skinned virtual body. The body moved with visuomotor synchrony, but also there was synchronous visuotactile feedback through a drumming task, so that participants saw their virtual hands hit a virtual hand drum that was coincident in space with a real hand drum. Hence, when they hit the virtual drum they would also feel it.20 In this experiment, those embodied in the dark-skinned casual body expressed significantly greater body movement while drumming than those embodied in the light-skinned body that was wearing a formal suit. This result occurred, in the view of the stereotype theory, because there is greater expectation that people who look more like Jimi Hendrix would be more bodily expressive. However, self-perception theory and stereotyping cannot account for attitudinal changes that have been observed in experiments where only the body changes, and there are no particular behavioral demands within the study. These results are better explained within the multisensory perception framework based on the research that has stemmed from the RHI.

<sup>20</sup>https://www.youtube.com/watch?v=ydzSgLim5Y4

Peck et al. (2013) carried out a racial bias study where participants were embodied for 12 min in either a Black body, a White one, a purple one, or no body at all. The body moved synchronously with real body movements of the participants through real-time motion capture and was seen directly by looking toward the self with the head-tracked HMD and in a mirror.21 Those in the "no body" condition saw a mirror reflection of a Black body, but which moved asynchronously to their own movements. A racial IAT was applied some days before the experience and then immediately after. It was found that average implicit racial bias significantly decreased only for those who had the Black embodiment. During the 12 min of exposure, the participant did not have any task except to move and to look toward themselves and in the mirror while doing so. The only events that occurred were that 12 virtual characters walked by, 6 of them Black and the others White. It is likely that the results are different from Groom et al. (2009) because of the much longer exposure time, the full body synchronous movement, and the fact that there was no task, so that this was only based on body ownership through multisensory perception. Given the contrary earlier result of Groom et al. (2009), it was hard to believe that just 12 min of this experience could apparently reduce implicit racial bias. However, independently it was shown by Maister et al. (2013) that the RHI over a black rubber hand also leads to a reduction of implicit racial bias in light-skinned people. For a review of this area of research see Maister et al. (2015). Recent results demonstrate that the decrease in implicit bias lasts for at least 1 week after the exposure (Banakou et al., 2016).

van der Hoort et al. (2011) showed, using the multisensory techniques of Petkova and Ehrsson (2008), that when average-sized adults have an illusion of body ownership over smaller or larger manikin bodies, this results in changes in their perception of object sizes (objects seem larger in a small body, but smaller in a large body). Banakou et al. (2013) reproduced this result in immersive VR.22 They showed that the illusion of body ownership of adults over a small body leads to overestimation of object sizes. However, if the form of the body represented that of a (4-year-old) child then the size overestimation was approximately double that compared to when the form of the body was an adult body but shrunk down to the same size as the child. Moreover, in the child embodiment case, there were changes in implicit attitudes about the self toward being child-like substantially beyond changes induced by the illusion of ownership of the adult-shaped body of the same size. In other words, only the *form* of the body (child-like compared to adult-like) has this effect.

The child and racial bias studies relied on an IAT – e.g., Greenwald et al. (1998) – a reaction time measure where participants have to quickly associate between two target concepts (e.g., Black and White people) and an attribute (e.g., Positive and Negative). When the concept and attributes must be simultaneously selected (e.g., when deciding if a stimulus matches White or Black but where each is also associated with Positive or Negative), then a faster choice in pairing, say, Black with Negative and White with Positive, compared to Black with Positive and White with Negative, would indicate an implicit racial bias. Such implicit bias is found notwithstanding the explicit attitudes of people, which may not be discriminatory; there is a dissociation between implicit and explicit bias (Greenwald and Krieger, 2006). Indeed, in the explicit racial attitudes test in Peck et al. (2013) there was no evidence of explicit racial bias, although there was implicit racial bias shown in the pre-experiment IAT. When it comes to discriminatory behavior, the IAT results have better predictive power in social interaction than explicit measures (Greenwald et al., 2009) – for example, with respect to eye contact, proxemics, and hiring practice (Ziegert and Hanges, 2005; Rooth, 2010). Even though the use and interpretation of the IAT may be controversial, there is evidence supporting its explanatory and predictive power (Jost et al., 2009).
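The reaction time logic of the IAT described above can be illustrated with a simplified score: the difference between mean response latencies in the two pairing conditions, scaled by the standard deviation of all latencies. This is a sketch under stated assumptions and not the scoring procedure of the studies cited here; full IAT scoring procedures typically add steps such as trial filtering and error handling, which are omitted. The function name and the latency values are hypothetical.

```python
from statistics import mean, stdev


def simplified_iat_score(congruent_ms, incongruent_ms):
    """Standardized latency difference between two IAT pairing blocks.

    congruent_ms:   latencies (ms) when the pairing matches the stereotype
                    (e.g., Black with Negative, White with Positive).
    incongruent_ms: latencies (ms) for the opposite pairing.

    A positive score means faster responses in the stereotype-congruent
    block, which the text above interprets as implicit bias.
    """
    all_latencies = list(congruent_ms) + list(incongruent_ms)
    pooled_sd = stdev(all_latencies)
    if pooled_sd == 0:
        raise ValueError("latencies have zero variance; score is undefined")
    return (mean(incongruent_ms) - mean(congruent_ms)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical latencies for one participant, before and after exposure.
    before = simplified_iat_score([600, 650, 700], [800, 850, 900])
    after = simplified_iat_score([640, 690, 740], [700, 750, 800])
    print("score before:", round(before, 2))
    print("score after:", round(after, 2))
```

Comparing such a score before and after an embodiment session mirrors the pre/post design described for Peck et al. (2013), where a decrease would indicate reduced implicit bias.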

With respect to embodiment in a child body, it is known that perception from the perspective of a smaller body results in size overestimations (van der Hoort et al., 2011), and indeed this occurred for both the adult and child conditions in Banakou et al. (2013). However, this does not explain why the overestimation in the child condition was almost double that of the adult condition. Since we have all been children it is possible that the brain relies on autobiographical memory thus making the world appear larger, and more rapidly finds associations between the self and child-like categories. However, with respect to the racial bias study (Peck et al., 2013), none of the participants had ever had dark skin, and yet 12 min of exposure was enough to significantly change their IAT score away from indications of bias. How is this possible? Our answer suggests that the body ownership and agency over the virtual body is more than a superficial illusion, and that it goes beyond the perceptual to influence cognitive processing. It was argued in Banakou et al. (2013); Llobera et al. (2013) that a fundamental mechanism may be through the postulated "cortical body matrix" (Moseley et al., 2012), which maintains a multisensory representation of the space immediately around the body in a body-centered reference frame. The system is responsible for homeostatic regulation of the body, and for dynamically reconstructing the body representation moment to moment based on current multisensory information. It was argued that if, as seems likely, such a system exists, it then operates globally in a hierarchical top-down fashion, so that attribution of the whole body to the self leads to attribution of the body parts to the self. Moreover, it was proposed that it also maintains an overall consistency between the multifaceted aspects of self (personality, attitudes, and behaviors) and the body representation. 
We can view IAT changes as direct evidence of this – changing the body apparently leads to changes in implicit attitudes. We can say that as well as body ownership over a different body leading to changes in implicit attitudes, the documented changes in implicit attitudes are a very strong signal that in fact there has been a change in body ownership. A further study also hints at the likelihood that a change in body ownership can also result in cognitive changes (Osimo et al., 2015), where it was shown that swapping bodies with (virtual) Sigmund Freud led to an improvement in mood after a self-counseling process.23

Frontiers in Robotics and AI | www.frontiersin.org December 2016 | Volume 3 | Article 74

<sup>21</sup>https://www.youtube.com/watch?v=HliN3iOX090

<sup>22</sup>https://www.youtube.com/watch?v=8Oy83OVgbSM

<sup>23</sup>https://www.youtube.com/watch?v=sn-UNGcbi2Q

The use of embodiment, and the transformative power that it seems to have, is a fundamental feature that separates immersive VR from other types of system, and recent scientific results do back up the statement by Jaron Lanier in the quote at the head of this section, made a quarter of a century ago.

#### 2.1.3. Spatial Representation and Navigation

Virtual reality is especially suitable for the study of spatial representation and spatial navigation. This is at the core of the use of VR: to break down the walls of our room, to transport us to another space, a space that we can explore with or without moving (see Section 6). Spatial navigation is useful for a number of areas and purposes: for learning to navigate a certain model space such as a foreign city to be visited, for rehabilitation of spatial abilities after a neurological disorder or brain injury that affected this function, for neuroscience research (to understand the basis of spatial cognition, memory, and sensory processing), for city design, or to treat post-traumatic stress disorder (PTSD) associated with a space, among others.

We may want to move around the city of Paris and to become oriented before we travel to the real city. Or we do not plan to go, and we just want to visit virtual Paris. First of all, how do we move around the city? We can move with a joystick. This allows us to navigate easily from our couch, for example. However, this method may not be optimal if we are planning to internalize, to "learn" the spatial map of Paris, which is better achieved if we move our bodies, since this then enhances theta frequencies in the hippocampus (Kahana et al., 1999). We can also navigate by walking-in-place (Slater et al., 1995; Usoh et al., 1999). Another technique for moving through distances that are greater than the physical space in which the participant can move is called "redirected walking," where, for example, the system takes advantage of participant head turns to rotate the environment more than the head turn – in this way giving people the impression that they had walked in a long straight line when in reality they had walked in a curve or *vice versa* (Razzaque et al., 2001, 2002), research that is ongoing, e.g., Suma et al. (2015). Or, we could eventually navigate by thought alone if the VR is connected to a BCI (Pfurtscheller et al., 2006). This is an excellent possibility for patients who are completely immobilized since they can feel the freedom of navigating by thought, an experience very positively evaluated by users (Friedman et al., 2007; Leeb et al., 2007) (see Section 6.5).
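The rotation-gain idea behind redirected walking can be illustrated with a minimal sketch. It is not the implementation used by Razzaque et al.; the function names and the gain value of 1.2 are assumptions chosen purely for illustration. Each tracked change in real head yaw is multiplied by a gain before being applied to the virtual camera, so that the environment turns slightly more than the head does.

```python
import math


def redirected_yaw(virtual_yaw, real_yaw_delta, rotation_gain=1.2):
    """Return the new virtual camera yaw (radians) after a real head turn.

    rotation_gain is a hypothetical parameter: values above 1 rotate the
    virtual scene more than the real head turn, values below 1 less.
    """
    return (virtual_yaw + rotation_gain * real_yaw_delta) % (2 * math.pi)


def injected_rotation(real_yaw_delta, rotation_gain=1.2):
    """Extra rotation (radians) added on top of the real head turn."""
    return (rotation_gain - 1.0) * real_yaw_delta


if __name__ == "__main__":
    real_turn = math.radians(90)
    virtual = redirected_yaw(0.0, real_turn)
    extra = injected_rotation(real_turn)
    print(f"virtual turn: {math.degrees(virtual):.1f} deg, "
          f"injected: {math.degrees(extra):.1f} deg")
```

With a gain of 1.2, a real head turn of 90° is rendered as a virtual turn of 108°, and the accumulated 18° difference is what steers the walker's physical path away from the virtual one. A practical system would also need to keep the gain small enough to go unnoticed, which this sketch does not model.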

Understanding the brain mechanisms that underlie the generation of internal maps of the external world, the storage (or memory) of these maps, and their use in the form of navigation strategies is an important field in neuroscience (note that the Nobel Prize in Physiology or Medicine 2014 was shared, one-half awarded to John O'Keefe, the other half jointly to May-Britt Moser and Edvard I. Moser "for their discoveries of cells that constitute a positioning system in the brain," known as "place cells" and "grid cells"). Many of the associated studies have been carried out in rodents that were navigating in laboratory mazes. But how can we study navigation in humans? VR navigation has been found to provide a consistent, sensitive method for the study of hippocampal function (Gould et al., 2007). The hippocampus is the main brain structure supporting spatial representation, a structure that is larger than average in London taxi drivers, who are famous for learning the map of London in great detail (Maguire et al., 2000). Virtual cities have been used to determine, for example, that we activate different parts of the brain when we do wayfinding versus route following (Hartley et al., 2003), and to identify spatial cognition deficits in disorders such as depression (Gould et al., 2007) or Alzheimer's disease (Cushman et al., 2008).

Even though the brain processes underlying spatial navigation in rodents used to be studied in real mazes, in recent years VR for rodents has also become a valuable tool in basic research in neuroscience. This technique allows navigation of virtual spaces while the animals walk in place on a rotating ball, such that their head is stable and their brain can be visualized while they do spatial tasks (Harvey et al., 2009). Even more recent VR systems for rodents allow 2D navigation including head rotations, resulting in the activation of all the same brain mechanisms that had been identified for freely moving animals, while the animals remain static and walking-in-place (Aronov and Tank, 2014). This approach allows detailed observation of specific brain cells during navigation.

Since navigation in virtual space can activate the same brain mechanisms as navigation in the real world, spatial "presence" can be successfully generated (Brotons-Mas et al., 2006; Wirth et al., 2007). The illusory sensation of spatial presence allows the recreation of all the sensations associated with a particular place by using VR, which is useful in order to treat PTSD associated with a space. This has been widely used with soldiers who had been in Iraq and Afghanistan (Rizzo et al., 2010). Virtual spaces such as virtual Iraq, and in particular virtual navigation, have also been used for assessment and rehabilitation following traumatic brain injury, a lesion also frequent in soldiers (Reger et al., 2009). Assessment tasks and training tasks for rehabilitation often go hand in hand, and thus retraining in topographical orientation, wayfinding, and spatial navigation in VR is often used in cognitive rehabilitation following traumatic brain injury or neurological disorders (Bertella et al., 2001; Koenig et al., 2009; Kober et al., 2013). Furthermore, it has been proposed that sustained experiential demands on spatial ability carried out in VR protect hippocampal integrity against age-related decline (Lovden et al., 2012).

Virtual reality can be used to study the strategies that humans use for spatial navigation, which reveals the underlying geometry of cognitive maps. These maps could have a Euclidean structure preserving metrics and angles or a topological graph structure. To study this, experiments in the VENLab<sup>24</sup> (Rothman and Warren, 2006; Schnapp and Warren, 2007) included a large area that allowed tracked displacements while in VR. A virtual environment representing a virtual hedge maze allowed identification of the location of certain landmarks. By creating two "wormholes" that rotate and/or translate a walker between remote places in the virtual hedge maze, they made the space non-Euclidean, in order to explore the navigational strategies used by different subjects. This is a good example of how VR can be used in this domain to achieve things that are impossible in reality.
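To make the "wormhole" idea concrete, the following is a minimal sketch of how a walker's tracked pose might be rotated and translated on entering a wormhole region. It is not the VENLab implementation: the `Pose` and `Wormhole` classes, the circular entry region, and all parameter values are hypothetical and serve only to show that the mapping is an ordinary rigid transform applied to the virtual pose.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float        # virtual position, metres
    y: float
    heading: float  # radians


@dataclass
class Wormhole:
    # Hypothetical description: a circular entry region and a rigid
    # transform (rotation plus translation) that maps it to a remote exit.
    entry_x: float
    entry_y: float
    radius: float
    exit_x: float
    exit_y: float
    rotation: float  # radians

    def apply(self, pose: Pose) -> Pose:
        """Return the pose after the wormhole, or the pose unchanged if outside."""
        dx, dy = pose.x - self.entry_x, pose.y - self.entry_y
        if math.hypot(dx, dy) > self.radius:
            return pose
        c, s = math.cos(self.rotation), math.sin(self.rotation)
        # Rotate the offset from the entry centre, then re-anchor at the exit.
        return Pose(
            self.exit_x + c * dx - s * dy,
            self.exit_y + s * dx + c * dy,
            (pose.heading + self.rotation) % (2 * math.pi),
        )


if __name__ == "__main__":
    wormhole = Wormhole(0.0, 0.0, 1.0, 10.0, 5.0, math.pi / 2)
    print(wormhole.apply(Pose(0.5, 0.0, 0.0)))
```

Because the walker's physical position is unaffected, the virtual path between two landmarks no longer obeys Euclidean distances and angles, which is the property such experiments exploit.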

<sup>24</sup>http://www.cog.brown.edu/research/ven\_lab/research.html

The study of navigation and wayfinding in VR has a long history. A good starting point for those interested in following this up is the special issue of the journal Presence – Teleoperators and Virtual Environments, edited by Darken et al. (1998). There is a difference between techniques for navigating effectively within a virtual environment, and the extent to which learning wayfinding through a space in a virtual environment transfers to real-world knowledge. Darken and Goerger (1999) pointed out that while the use of VR seems to produce the best results in terms of acquiring spatial knowledge of a terrain, when it comes to actual performance VR training often does not transfer, and can even make the situation worse. The authors, based on a number of studies, concluded that using specific VR techniques (e.g., a virtual compass) and relying on specific virtual imagery during the learning process does not transfer well to real-world wayfinding. However, those who use the VR to rehearse what they will later do in reality, to make a plan, without relying on detailed cues but rather transferring their experience into more abstract spatial knowledge, do a lot better. Ruddle et al. (1999) carried out a direct comparison between navigation on a desktop system and navigation with a head-tracked HMD. They found that although there were no differences in task performance between the two systems in the sense of measuring the distance traveled, the HMD users stopped more frequently to look around the scene and were able to better estimate straight line paths between waypoints. On the other hand, those using the desktop system seemed to develop a kind of tunnel vision. This difference between the two illustrates that immersive VR generates the types of kinesthetic and proprioceptive cues, i.e., body-centered perception – contributing to what we referred to earlier as natural sensorimotor contingencies for perception – that improve the chance of transfer of knowledge to real-world task behavior.
Ruddle and Lessels (2009) carried out a further study where they compared navigation task performance in a virtual environment under three different conditions: (1) a desktop interface, (2) an HMD that was tethered, so that although participants could look around, they could not walk, and (3) a wide area tracking system that allowed participants to really walk. They found in both of their reported experiments (which differed in the rendering style of the environment) that those who were able to really walk outperformed the other two groups. See also Ruddle et al. (2011b). In fact, it was later found that walking (in this case enabled through an omnidirectional treadmill) clearly resulted in improved cognitive maps of the space compared to other methods (Ruddle et al., 2011a, 2013) as predicted by Brotons-Mas et al. (2006). In this context, it is worth noting that when comparing presence in a virtual environment through a head-tracked HMD, using (1) point-and-click techniques, (2) walking-in-place where the body moves somewhat like walking but not actually walking, and (3) real walking using wide area tracking, Usoh et al. (1999) found that subjectively reported PI (the component of presence referring to the sense of "being there") was greater for both types of walking compared to the point-and-click technique. On some presence measures, real walking was preferred to walking-in-place, and as would be expected, real walking was the most efficient form of navigation.

A recent study by Sauzéon et al. (2015) used a powerwall-based VR system to test the effect on episodic memory of a virtual apartment. Participants had two methods for navigation through the apartment, either passively watching or using a joystick to actively explore. It was found that episodic memory was superior in the active condition. A similar setup using a virtual model of the city of Tübingen was shown to be advantageous in helping stroke patients to recover some wayfinding ability (Claessen et al., 2015).

In a very famous experiment, Held and Hein (1963) took 10 pairs of neonatal kittens and arranged that one kitten of each pair navigated an environment by actively moving around it, while the second was carried along passively in a basket by the movements of the first. They found that the kittens that were passively moved around, although in principle subject to the same visual stimuli as the active ones, developed significant visual-motor deficits. The authors concluded that "self produced movement with its concurrent visual feedback is necessary for the development of visually-guided behavior." A similar observation was obtained in rats that were walking versus being driven in a toy car (Terrazas et al., 2005) while simultaneous brain recordings were obtained: the spatial information carried per neuronal spike in place cells was found to be smaller during passive navigation. This type of finding fits very well with findings in human studies in virtual environments. The conclusion from these studies is that simply putting someone in a VR in order to learn a particular environment can be effective provided that the form of locomotion includes active control by the participant. Consistent with our view that the most important factor behind PI is the affordance by the system of perception through natural sensorimotor contingencies, the more that the whole body can be involved in the process of locomotion, the better the result in transfer to the real world, and the formation of cognitive maps.
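For readers unfamiliar with the measure, "spatial information per spike" can be sketched as follows. We assume here the widely used information-theoretic definition usually attributed to Skaggs and colleagues, in which the firing rate in each spatial bin is compared with the mean rate and weighted by the time spent in that bin; this may differ in detail from the analysis in the cited study, and the function name and example values are our own.

```python
import math


def spatial_info_per_spike(occupancy, rates):
    """Spatial information in bits per spike.

    occupancy: time spent in each spatial bin (any consistent unit).
    rates: mean firing rate of the cell in each bin (spikes per second).
    Assumed definition: sum_i p_i * (r_i / r) * log2(r_i / r), where p_i is
    the occupancy probability and r the occupancy-weighted mean rate.
    """
    total_time = sum(occupancy)
    probs = [t / total_time for t in occupancy]
    mean_rate = sum(p * r for p, r in zip(probs, rates))
    if mean_rate == 0:
        return 0.0
    info = 0.0
    for p, r in zip(probs, rates):
        if r > 0:  # bins with zero rate contribute nothing
            ratio = r / mean_rate
            info += p * ratio * math.log2(ratio)
    return info


if __name__ == "__main__":
    # A cell firing in only one of four equally visited bins.
    print(spatial_info_per_spike([1, 1, 1, 1], [8.0, 0.0, 0.0, 0.0]))
    # A cell firing equally everywhere carries no spatial information.
    print(spatial_info_per_spike([1, 1, 1, 1], [2.0, 2.0, 2.0, 2.0]))
```

Under this definition a sharply tuned place field yields a high value and spatially uniform firing yields zero, so a smaller value during passive transport indicates less spatially selective firing.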

This is a vitally important area of research, and above we have only scratched the surface. As VR becomes used on a mass scale, one of its most frequent uses will probably be for virtual travel. If people simply use VR to observe an environment then the form of interface for navigation does not matter much, other than adhering to excellent user interface principles suitable for VR; of greater interest are the sights and sounds encountered. However, if people want to use it for rehearsal, to learn about how to get from A to B, then they had better use a form of body-centered interface, at least equivalent to walking-in-place, but preferably one of the new generation of treadmill interfaces that are currently in development.

# 2.2. Scientific and Data Visualization

Immersive VR visualization and interaction with data is relevant for scientific evaluation and also in the fields of training and education. It also allows an active interaction with the representations, e.g., in drug design (see below). We can walk through brains<sup>25,26</sup> or molecules, and we can fly through galaxies. The requirements and level of interaction will vary depending on whether this "walk" is for professional use, for students, or for the general public. Immersion in the data could take place alone or in a shared environment, where we explore and evaluate with others.

<sup>25</sup>http://www0.cs.ucl.ac.uk/research/equator/projects/escience/

<sup>26</sup>https://www.youtube.com/watch?v=tFtpmOBt7jY

The data could be static, or we could be immersed in dynamic processes. The data should be viewable in multiscale form.

Three-dimensional representation of real or modeled data is important for understanding data and for decision-making following this understanding, a relevant topic for a number of fields, especially at this time of exponentially growing datasets. Even when most of the analysis tools are computer-run algorithms, human vision is highly sensitive to patterns, trends, and anomalies (van Dam et al., 2002). There is a substantial difference between looking at 3D data representations on a screen and being immersed in the data, navigating through it, interacting with it with our own body, and exploring it from the outside and the inside. It is logical to expect that when VR commercial systems are pervasive, there will be a trend for currently used 3D data representations on a flat screen to be visualized in immersive media. This, along with the body-tracking systems, will allow a more natural interaction with the data. The extent to which this interaction with data goes further than the "cool" effect and adds real value to the comprehension, evaluation, and subsequent decisions taken as a result is an important issue to explore. It is also important to identify ways to maximally exploit the potential of this data immersion capability.

Specific examples of VR for data visualization include molecular visualization and chemical design. In a recently described system called the "Molecular Rift," the immersive 3D visualization of molecules is combined with interaction with molecules based on gesture recognition (Norrby et al., 2015). In this version, participants were immersed into protein–ligand complexes. The system was evaluated by groups with experience in medical chemistry and drug design, and the study focused on improving gesture-based user interaction with the molecules rather than on evaluating improved performance in drug design or specific tasks. All 14 users found the system potentially useful for drug design and enjoyed using it, while none experienced motion sickness.

A more specific task in interaction with molecules was tested by Leinen et al. (2015). In this study, a task of manipulating nanometer-sized molecular compounds on surfaces was tested under usual scanning probe microscopy versus immersive visualization through an Oculus Rift HMD. The hand-controlled manipulation for extracting a molecule from a surface was improved by the visual feedback provided by immersive VR visualization: preestablished 3D trajectories were followed with higher precision, and deviations from them were better controlled in the immersive than in the non-immersive system (Leinen et al., 2015).

Moving from the nanoscale to the microscale, a specific task consisting of the evaluation of the spatial distribution of glycogen granules in astrocytes (glial cells, a type of brain cell) was evaluated in an immersive environment in a Cave-like system (Cali et al., 2015). A section of the hippocampus of 226 μm<sup>3</sup> at a voxel resolution of 6 nm was 3D reconstructed based on electron microscopy image stacks. A set of procedures and software was developed to allow such immersive reconstruction. The glycogen granules initially appeared to be randomly distributed, but they were discovered to be grouped into clusters of various sizes with particular spatial relationships to specific tissue features. The authors found the immersive evaluation of the 3D structure to be pivotal in identifying this non-random distribution (Cali et al., 2015). The use of an interactive VR room also allowed multiple users to share and discuss the evaluation of the cellular details. In this study there were, however, no comparisons between task performance across different display media.

A comparison across three different media – 3D reconstructions rendered on (1) a monoscopic desktop display, (2) a stereoscopic visual display on a computer screen (fishtank), and (3) a Cave-like system – was carried out by Prabhat et al. (2008). In this study, confocal images of Drosophila data (the egg chamber, the brain, and the gut) were evaluated by subjects who had to describe or quantify specific features mostly related to spatial distribution, colocalization, and geometrical relationships. A more immersive environment was preferred qualitatively by subjects, and task performance was also superior.

Immersive VR is of great value for surgery training, an aspect that is developed in Section 2.4 where specific examples are described. Visualization of the human body from an immersive perspective can provide medical students an unprecedented understanding of anatomy, being able to explore the organs from micro to macro scales. Furthermore, immersive dynamic models of body processes in physiological and pathological conditions would result in an experience of "immersive medicine."

Large-scale coordinated efforts to understand the brain are under way in projects such as the European Human Brain Project<sup>27,28</sup> and the BRAIN<sup>29</sup> Initiative of the United States. These projects are generating detailed multiscale and multidimensional information about the brain. Immersive VR will have a role in the visualization of these brain reconstructions or of the simulations built based on the experimental data. The Blue Brain Project (predecessor of the Human Brain Project) has already generated a full digital reconstruction of a slice of rat somatosensory cortex with 31,000 neurons based on real neurons, and 37 million synapses (Markram et al., 2015). This simulation generates patterns of neuronal activity that reproduce those generated in the brain and is amenable to immersive exploration of the structure and function of the brain.

Considering now a larger spatial scale, astronomical visualization in immersive VR has also been explored, both for professional and educational purposes (Schaaff et al., 2015). These authors represented high-resolution simulations of re-ionization of an Isolated Milky Way-M31 Galaxy Pair, with various different representations. It is interesting for education that information can be added to the immersive displays.

There is an exciting perspective in the scientific and data visualization area that will open new doors to our understanding. It will be important to evaluate the extent to which immersion and interaction with data results in a more thorough, intuitive, and profound understanding of structures and processes. But in any event, once this route is open, visualization of 3D models on a flat screen will feel like watching Star Wars on a small black and white TV (see Presentation S1 in Supplementary Material).

<sup>27</sup>https://www.youtube.com/watch?v=\_UFOSHZ22q4

<sup>28</sup>https://www.youtube.com/watch?v=ldXEuUVkDuw

<sup>29</sup>https://www.youtube.com/watch?v=\_N-BAv3Hz8k

# 2.3. Education

Isaac Asimov's novels Fantastic Voyage (1966),<sup>30</sup> based on the movie of the same name,<sup>31</sup> and Fantastic Voyage II: Destination Brain (1987)<sup>32</sup> portrayed a situation with humans shrunk to microscopic scale entering into the body of a patient. VR and the detailed human body scans that now exist make this possible (of course in *virtual* reality). McGhee et al. (2015) have used the "fantastic voyage" approach to support education of stroke patients about their condition by allowing them to move through a brain representation using the Oculus Rift HMD.

The area of application of VR in education is vast. For recent reviews, see Abulrub et al. (2011), Mikropoulos and Natsis (2011), Merchant et al. (2014), and Freina and Ott (2015). There are several reasons why VR is an excellent tool for education. First, it can change the abstract into the tangible. This could be especially powerful in the teaching of mathematics. For example, Hwang and Hu (2013) suggest that the use of a collaborative virtual environment has advantages for students learning geometrical concepts compared to traditional paper and pencil learning. However, it is not completely clear which type of VR system was used, although it appears to be of the desktop variety. Kaufmann et al. (2000) describe an HMD-based augmented reality system that provides a learning environment for spatial abilities including concepts from vector algebra. They provide anecdotal evidence for the effectiveness of the method. Roussou (2009) reviews the teaching of mathematics in VR using a "virtual playground"<sup>33,34</sup> and in particular describes an experiment on learning how to compare fractions by 50 children aged between 8 and 12 years in a Cave-like system (Roussou et al., 2006). In a between-groups experiment, there were three conditions – children who learned using active exploration of the scenario (*n* = 17), those who used the virtual playground but who learned by passively observing a friendly virtual robot (*n* = 14), and another group who did not use VR but rather a Lego-based method (*n* = 19). Quantitative analysis of the results found no advantage to any system. A detailed qualitative analysis, however, suggested that the passive VR condition tended to foster a reflective process among the children, and great enjoyment in interacting with the robot, associated with better understanding.

The second advantage of VR in education is, notwithstanding the results of the virtual playground experiment, that it supports "doing" rather than just observing. One example of this is surgical training (see Section 2.4): one review emphasizes how VR is increasingly used in neurosurgery training (Alaraj et al., 2011), ideally in conjunction with a haptic interface (Müns et al., 2014). Indeed, a European consensus program for endoscopic surgery VR training has been designed and agreed (van Dongen et al., 2011). For an example in engineering learning see Ewert et al. (2014).

The third advantage is that it can substitute for methods that are desirable but practically infeasible even if possible in reality. For example, if a class needs to learn about Niagara Falls one week, the Grand Canyon the next, and Stonehenge<sup>35</sup> the week after, it is infeasible for the class to visit all of those places. Yet, virtual visits are entirely possible, and such environments have been under construction (Lin et al., 2013) including the idea of virtual field trips (Çaliskan, 2011). It has certainly been suggested that immersive VR will change the nature of field trips,<sup>36</sup> and although there have been plenty of inventive demonstrations<sup>37,38,39</sup> it seems that as yet there have been no studies of the effectiveness of this, although perhaps it is so obviously advantageous that formal studies may be unnecessary.

<sup>32</sup>http://www.goodreads.com/book/show/83545.Fantastic\_Voyage\_II

The fourth advantage of VR in education involves breaking the bounds of reality as part of exploration. For example, one could explore how activities such as juggling would change if there were a small change in gravity, how it would be to ride on a light beam, or what a universe would be like if the speed of light were different. These ideas were envisaged and implemented for VR by Dede et al. (1997); however, there has been no more recent follow-up, which could now occur given the greater availability of VR equipment.

In this article, we have emphasized that the real power of VR is that it enables approaches that go beyond reality in a very fundamental way – more than just exploring strange physics. An example of this in the field of education was provided by Bailenson et al. (2008), concerned with the delivery of teaching rather than the content. In a collaborative virtual environment, it is possible to arrange the virtual classroom so that every student is at the center of attention of the teacher, and where the teacher has feedback about which students are not receiving enough eye gaze contact. Additionally, virtual colearners who could be either model students or distracting students can influence learning, and the results overall showed that these techniques do improve educational outcomes. Bailenson and Beall (2006) referred to this type of technique as "transformed social interaction."

Overall, for the reasons we have given, and no doubt others, VR is an extremely promising tool for the enhancement of learning, education, and training. We have not mentioned other possibilities such as music or dance, or various dexterous skills, but for these areas VR clearly has great potential.

# 2.4. Surgical Training

Within the area of VR for training, surgical training has been a thoroughly investigated field (Alaraj et al., 2011). The use of simulations in surgical planning, training, and teaching is highly necessary. To give an illustrative example of why VR is necessary for surgery: interventional cardiology currently has no satisfactory training strategy other than learning on patients (Gallagher et al., 2005). It seems that acquiring such training on a virtual human body would be a better option.

In the training of medical students and in particular of surgeons, there is a relevant potential role for VR as a tool to learn anatomy through virtual 3D models. Even though there are studies trying to evaluate how useful VR can be to improve the learning of anatomy (Nicholson et al., 2006; Seixas-Mikelus et al., 2010; Codd and Choudhury, 2011) – including studies proposing that VR could replace the use of corpses in medical school – fully immersive and interactive systems have hardly been used up to now. Most of the 3D models used so far are for screen displays. Still, even the visualization of non-immersive 3D body models to study anatomy yields good results for learning, and therefore this is an area that should expand in the future, integrating fully immersive systems and different forms of manipulation and interaction of the trainees with the body models.

<sup>30</sup>http://www.goodreads.com/book/show/83539.Fantastic\_Voyage

<sup>31</sup>http://www.imdb.com/title/tt0060397/

<sup>33</sup>https://www.youtube.com/watch?v=PLqlTaT3Bgk

<sup>34</sup>https://www.youtube.com/watch?v=UxUZIHAJ2H4

<sup>35</sup>https://www.youtube.com/watch?v=iiGzNGlnYJ4

<sup>36</sup>https://www.youtube.com/watch?v=sSRzeGkhUic

<sup>37</sup>https://www.youtube.com/watch?v=iK3GsAcwKaI

<sup>38</sup>https://www.youtube.com/watch?v=JEsV5rqbVNQ

<sup>39</sup>https://www.youtube.com/watch?v=mlYJdZeA9w4

One of the first publications on VR in the field of surgery was on VR-hepatic surgery training, and the words "Surgical simulation and virtual reality: *the coming revolution*" were in the title of both the article (Marescaux et al., 1998) and the editorial (Krummel, 1998) in the *Annals of Surgery* nearly 20 years ago. However, the revolution has not happened yet, although the field is now ready for this possibility.

Surgical training in VR requires a combination of haptic devices and visual displays. Haptic devices transmit forces consisting of both the forces exerted by the surgeon and a simulation of the forces and resistances of the various body tissues. A critical question is whether the skills acquired in virtual training are successfully transferred to the real world of surgery. Seymour et al. (2002), in a highly cited article, provide one of the first demonstrations that this is the case. The performance of laparoscopic cholecystectomy gallbladder dissection was found to be 29% faster for VR-trained versus classically trained surgeons, while errors were six times less likely to occur in the VR-trained group. The system used, though (Minimally Invasive Surgical Trainer-Virtual Reality – MIST VR system – Mentice AB, Gothenburg, Sweden), was a 2D representation on a screen of a haptic system used for simulated surgery. These results are likely to improve with a more immersive system. To illustrate the value given to surgical training in VR, an FDA panel voted in August 2004 to make VR simulation of carotid stent placement an important component of training. In the same month, the Society for Cardiovascular Angiography and Interventions, the Society for Vascular Medicine and Biology, and the Society for Vascular Surgery all publicly endorsed the use of VR simulation in carotid stent training (Gallagher and Cates, 2004).

The most common uses so far of VR for surgical training have been those of laparoscopic procedures (Seymour et al., 2002), carotid artery stenting (Gallagher and Cates, 2004; Dawson, 2006), and ophthalmology [Eyes Surgical, based on Jonas et al. (2003)]. In general terms, a large number of studies – out of which only a few seminal ones are cited here – agree in finding positive results of VR training.

Most of the systems mentioned above concentrate on the local surgical procedure, e.g., how to place a stent or dissect the gallbladder. However, the reality in an operating room is more complex, and the surgery may need to be performed in situations where the patient's physiological variables are not stable, or there can be a hemorrhage, or even a fire in the surgical theater. The response of the surgical team to these situations will be critical for the well-being of the patient, and immersive VR should be an optimal frame for such training. VR can embed the specific surgical procedure, for example, the placement of the carotid stent, into various contexts and under a number of emergency situations. In this way, during training, not only the contents but also the skills and the experience of being in an operating room for many years can be transmitted to the trainees, who can include not only surgeons but all the health-care personnel, each in their specialized roles.

There is a huge explosion of research in the effectiveness of VR-based training for surgery including meta-analyses and reviews (Al-Kadi et al., 2012; Zendejas et al., 2013; Lorello et al., 2014), transfer of training (Buckley et al., 2014; Connolly et al., 2014), and many specialized applications (Arora et al., 2014; Jensen et al., 2014; Singh et al., 2014). This is likely to be a field that expands considerably.

# 3. PHYSICAL TRAINING AND IMPROVEMENT

Here, we broadly address issues relating to physical training and improvement through sports and exercise, an area of growing interest to professional sports.

# 3.1. Sports

In the 1990 SIGGRAPH Panel (Barlow et al., 1990), Jaron Lanier mentioned the idea of being able to play table tennis (ping-pong) with a remote player using networked VR. Of course this is now possible<sup>40</sup> and is certain to be readily available in the near future. For example, a version has been implemented using two powerwall displays plus tracking for each player (Li et al., 2010). However, the opponent need not be a remote player in a shared VR but may be a virtual character. Immersive VR, at least with hand tracking if not full body tracking, has ideal characteristics for playing table tennis or other competitive sports, with the possible advantage of not having to spend time traveling to the gym.

There are several areas where VR can provide a useful advantage for sport activities. First, for leisure and entertainment reasons – such as the table tennis example above. Second, for learning, training, and rehearsal. To the extent that VR supports natural sensorimotor contingencies at high enough precision, it could be used for these purposes. However, here it would be important to carry out rigorous studies to check whether small differences between the VR version and the real version might lead to poor skills transfer, or incorrect learning. For example, learning to spin or slam in table tennis requires very fine motor control depending on vision, proprioception, vestibular feedback, tactile feedback, force feedback, even the movement of air, and the sound of the ball hitting the table and the bat. Hence, building a virtual table tennis system that is useful for skill acquisition or improvement requires taking into account all of these factors, or the critical ones if these are known. On the other hand, virtual table tennis could be thought of as a game in its own right and nothing much to do with the real thing. In this case, virtual table tennis would fall under the first category – entertainment and leisure. Additionally, as we will see in Section 6.3 in the context of acting rehearsal, although VR misses fine detailed facial expression that is critical for successful acting, it is nevertheless useful for that aspect of rehearsal known as "blocking," which is concerned more with overall spatial configuration of the actors in the scenario. Similarly, even without being able to reproduce all the fine detail necessary for the transfer of training skills to reality, VR may be useful in team sports to plan overall strategy and tactics. A third utility of VR in sports is for rehabilitation following injury. We will briefly consider some of these areas.

<sup>40</sup>https://www.youtube.com/watch?v=m4Oeu4SLCgY

In a comprehensive review of VR for training in ball sports, Miles et al. (2012) analyze eight challenges: effective transfer of training; the types of skills best learned in VR; the technologies that result in the best quantifiable performance measures; the conditions under which stereoscopic displays, which have both advantages and disadvantages (e.g., vision is not the same as in real life), should be used; the role of fidelity, and to what extent and under what conditions it is important; what kind of feedback should be delivered to the learner, and how and when it is appropriate; the effectiveness of teaching motor skills in the inevitable presence of latency and inaccuracies of representation; and finally, cost. The review points out several hurdles that must be overcome. For example, in training for field games such as American Football or soccer, the area of play is huge compared to the effective space in which someone in a VR system can typically move. A play on a field may involve running 25 m, whereas the effective area of tracking is, say, 2 m around a spot where the participant in VR must stand. Clearly, using a Wand to navigate or even a treadmill may miss critical aspects of the play (see also Section 2.1.3 for a brief discussion of different methods of moving through a large virtual environment). The paper reports many such pitfalls and points out that studies have been inconclusive and that there is therefore a need for more research.

Craig (2013) reviews how VR might be used to understand perception and action in sport. She argues that VR offers some clear advantages for this and gives a number of examples where it has been successful, as well as pointing out problems. However, she asks why, if it is successful, it has not been widely used in training up to now, with reliance instead on alternatives such as video. She points out that one problem has been cost, though this is likely to be ameliorated in the near term. A second problem is to meet the differing needs of players and coaches effectively; she notes that VR action replays could be seen from many different viewpoints, including those of the player and of the coach, so that different relevant learning would be possible. Another advantage of VR would be to train players to notice deceptive movements in opponents, by directing attention to specific moves or body parts that signal such intentions. However, as mentioned above, she points out that it is critical to provide appropriate cues to avoid mislearning.

Ruffaldi et al. (2011) examined the theoretical requirements for successful training transfer in the context of rowing and described a haptic-enabled VR system with a single large screen for visual feedback. Rauter et al. (2013) described a different VR simulator for rowing. This was a Cave-like system enhanced with auditory and haptic capabilities, an earlier version of which was described in von Zitzewitz et al. (2008). Their study, carried out with eight participants, compared skill acquisition in conventional training on water with that in the simulator. Examining the differences between the two, they concluded that, with respect to both questionnaire and biomechanical responses, the methods were similar enough for the simulator to be used as a complementary training tool, since there was sufficient and appropriate transfer of training. Wellner et al. (2010b) described an experiment where 10 participants took part in simulated rowing. The novelty was that they added a virtual audience to test the idea that the presence of an audience would encourage the rowers in a competitive situation. They did not find a notable outcome in this regard, only a relatively high degree of presence felt by the participants. On similar lines, Wellner et al. (2010a) examined whether the presence of virtual competitors in a rowing competition would boost performance. No definite results were found, but according to the authors, the study had some flaws, and in any case the sample size was small (*n* = 10). In spite of the null results, it is important to note how VR affords the possibility of experimenting with factors that would be possible, but logistically very difficult, to manipulate in reality.

Another example of a use of VR that is logistically very difficult to achieve otherwise is allowing spectators to attend sports matches when they cannot be physically present (e.g., someone in the US who is a fan of English soccer). Instead, they can view the matches as if they were there, with the excitement of seeing the game life-sized, first hand, and among a crowd of enthusiasts. Kalivarapu et al. (2015) implemented a system to display American Football in a high-resolution, six-sided, Cave-like system and also in an Oculus DK2 HMD. They carried out a study with 60 participants who were divided into three conditions: Cave (*n* = 20), HMD (*n* = 20), and video (*n* = 20), where the game and associated events were shown on video. They concluded that the Cave and HMD experiences gave the participants greater opportunity to interact (i.e., view from different vantage points) compared to the video. Participants nevertheless experienced a greater degree of realism in the Cave, perhaps not surprising because of its greater resolution (and several orders of magnitude greater cost). On the whole, the HMD and Cave produced similar results across a number of aspects of presence. There is a growing interest in the use of VR for sports viewing and other events, mainly using 360° video. See also the "Wear the Rose" system that gives fans the chance to experience rugby games first hand,<sup>41,42,43,44</sup> and an example of its use in American Football.<sup>45</sup>

There have been many other applications of VR in sports – impossible to cover all of them here – for example, a baseball simulator,<sup>46</sup> for handball goalkeeping (Bideau et al., 2003; Vignais et al., 2009), skiing (Solina et al., 2008), detecting deceptive movements in rugby (Brault et al., 2009; Bideau et al., 2010), and pistol shooting<sup>47</sup> (Argelaguet Sanz et al., 2015), among others. A special issue of Presence – Teleoperators and Virtual Environments was devoted to VR and sports (Vignais et al., 2009; Multon et al., 2011), which would be a good starting point for readers wishing to follow up this topic in more detail (see Presentation S2 in Supplementary Material).

<sup>41</sup>http://www.o2.co.uk/sponsorship/rugby/wear-the-rose

<sup>42</sup> http://news.sky.com/story/1222817/oculus-rift-headset-may-helpsports-training

<sup>43</sup>http://www.telegraph.co.uk/technology/news/10621480/Virtual-realityheadset-recreates-England-rugby-squad-training-experience.html

<sup>44</sup>http://www.telegraph.co.uk/technology/technology-topics/10681570/Virtualreality-training-session-with-England-rugby-squad.html

<sup>45</sup>http://bleacherreport.com/articles/2563010-stanfords-new-virtual-realitysystem-is-changing-sports-forever

<sup>46</sup>https://www.youtube.com/watch?v=hXOQsXFcWnk

# 3.2. Exercise

It is well known that aerobic exercise is extremely good for us, especially as we age. A meta-analysis of research relating to older adults carried out by Colcombe and Kramer (2003) showed that there is a clear benefit for certain cognitive functions. A more recent survey by Sommer and Kahn (2015) again showed the benefits of exercise for cognition for a variety of conditions. Yu et al. (2015) showed its utility for patients with Alzheimer's disease and Tiozzo et al. (2015) for stroke patients. However, repetitive exercise with aerobic benefits can be boring; indeed, Hagberg et al. (2009) found that enjoyment is important in increasing physical exercise.

Virtual reality opens up the possibility of radically altering how we engage in exercise. Instead of just being on a stepping machine watching a simple 2D representation of a terrain, we can be walking up an incline on the Great Wall of China, or walking up the steps in a huge auditorium where we are excitedly going to watch a sports game, or even walking up steps to a fantasy castle in a science fiction scenario. Instead of just riding an exercise bike, we can be cycling through the landscape of Mars.<sup>48,49,50</sup>

One use of VR for exercising would be an extension of approaches that have already been tested, normally referred to as "exergaming." This involves, for example, connecting an exercise bike to a display, so that the actions of the rider affect what is displayed, e.g., faster pedaling leads to a corresponding increase in the optic flow depicted on the display. Moreover, other motivational factors can be introduced such as virtual competitors (as we saw in the rowing example above). Anderson-Hanley et al. (2011) carried out a study with *n* = 14 older adults using a cybercycle (an exercise bike with a screen in front) and competitive avatars as in a race.<sup>51</sup> Their evidence suggested that this social factor tended to increase participants' effort. Finkelstein and Suma (2011) used a three-walled stereoscopic display and upper body tracking of participants who had to dodge virtual planets flying toward them. Their experiment included *n* = 30 participants who played for 15 min. They found that the method produces increased heart rate (i.e., is aerobic) and motivates children and adults to exercise. Mestre et al. (2011) had *n* = 12 participants in an experiment that used an exerbike (with a large screen) where they compared video feedback with video and music feedback. They found that the addition of music was beneficial both psychologically (for motivation and pleasure) and behaviorally. Anderson-Hanley et al. (2012) carried out a formal clinical trial where they used "cybercycling," as above, stationary cycling tied to a screen display, with older people (*n* = 102). They were interested in testing, among other things, whether such cycling would improve executive function. They found that cognitive function was improved among the cybercyclers, and that it was likely that it would help to prevent cognitive decline compared to traditional exercise. Overall, while there has been significant work in this area, a systematic review carried out by Bleakley et al. (2013) found that although these types of approach are safe and effective, there is limited high-quality evidence currently available.
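The basic exergaming coupling described above, in which faster pedaling produces faster apparent motion on the display, can be sketched in a few lines of Python. This is only an illustrative assumption and not the implementation used in any of the cited studies: the class name `VirtualBike`, the parameter `metres_per_rev`, and its default value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualBike:
    """Maps measured pedal cadence to motion of the virtual viewpoint."""

    metres_per_rev: float = 6.0  # hypothetical distance covered per crank revolution
    position_m: float = 0.0      # distance traveled along the virtual track

    def speed_mps(self, cadence_rpm: float) -> float:
        # Faster pedaling gives proportionally faster virtual motion,
        # and therefore more optic flow when the scene is rendered.
        return max(cadence_rpm, 0.0) * self.metres_per_rev / 60.0

    def update(self, cadence_rpm: float, dt_s: float) -> float:
        # Advance the viewpoint by one display frame of duration dt_s.
        self.position_m += self.speed_mps(cadence_rpm) * dt_s
        return self.position_m


if __name__ == "__main__":
    bike = VirtualBike()
    for _ in range(90):                  # one second at an assumed 90 Hz frame rate
        bike.update(cadence_rpm=60.0, dt_s=1.0 / 90.0)
    print(f"distance after 1 s: {bike.position_m:.2f} m")
```

A linear mapping is the simplest choice; a real system might add smoothing of the cadence signal or a model of virtual terrain, neither of which is shown here.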

It is one thing to cycle or walk on a treadmill or exercise steps while looking at a screen; this is already the case with most exercise machines, even though the display may be very simplistic. Since the exerciser is not actually moving through space, looking at a screen should be harmless. However, it is not obvious that the same activities could be safely or successfully carried out while people are wearing an HMD, which not only obscures their vision of the real world but may also lead to a degree of nausea – which is all the more likely to occur while moving through virtual space. Shaw et al. (2015b) discussed five major design challenges in this field: first, to overcome the problem of possible sickness; second, to have reliable tracking of the body; third, to deal with health and safety aspects; fourth, the choice of player visual perspective; and fifth, the problem of latency. They described a system that was designed to overcome these problems, that used an Oculus DK2 HMD, and which was evaluated in an experimental study (Shaw et al., 2015a). This had *n* = 24 participants (2 females, ages between 20 and 24). They compared three setups: a standard exercise bike with no feedback, the exercise bike with an external display, and the bike with the HMD. The fundamental findings were that on several measures (calories burned, distance traveled) the two feedback systems outperformed the bike-only condition but did not differ from each other. The two systems with feedback were also evaluated as more enjoyable than the bike-only condition, and the HMD was more enjoyable and was associated with greater motivation than the external display system. Only 4 out of 26 reported some minor symptoms of simulator sickness. As the authors pointed out, the study was limited, since the participants were almost all males and within a limited age range, and it is not known how well these results would generalize. Bolton et al. (2014) also described a system that combined an Oculus Rift HMD<sup>52</sup> with an exercise bike that was designed to reduce the possibility of motion sickness; however, no experimental results were given. There are several other applications without associated papers, such as RiftRun,<sup>53</sup> where participants run on the spot to virtually run through an environment.

Overall, as in other fields, there are promising but far from conclusive results, but irrespective of scientific studies it is highly likely that immersive VR will be combined with personal exercise systems, since the relatively low cost now makes this possible, and some sports providers may decide that the "cool" factor makes such an enterprise worth the economic risk. Whether these are successful or not will obviously depend on consumer uptake.

<sup>47</sup>https://www.youtube.com/watch?v=RM9IT\_N6jFE

<sup>48</sup>https://archive.org/details/SciterianTechnologiesMars3D\_CahokiaPanorama-VirtualReality

<sup>49</sup>https://www.youtube.com/watch?v=xDqYz5pKA\_o

<sup>50</sup>An online search of "Oculus" and "Mars" will find many "prototype" examples of people experimenting with rendering and walking through a Mars terrain in VR.

<sup>51</sup>https://www.youtube.com/watch?v=sKz0FVIeEFI

<sup>52</sup>https://www.youtube.com/watch?v=Wy4Ku2iZjQM

<sup>53</sup>https://www.youtube.com/watch?v=cN7W0VBi0jo

Finally, as in other applications, we emphasize that VR allows us to go beyond what is possible in reality. Even cycling through Mars is just cycling. It is physically possible, if highly unlikely to be realized. Perhaps though there are fundamentally new paradigms that can really exploit the power of VR – the virtual *unreality* that we mentioned in the opening of this article. One approach is to use VR to implicitly motivate people toward greater exercise rather than as a means to carry out the exercise itself. Fox and Bailenson (2009) carried out a study where participants using a head-tracked HMD-based VR saw a virtual character from 3PP (i.e., across the room and looking toward them) with a face that was based on a photograph of their own face and that therefore had some likeness to themselves. Participants at various points were required to carry out physical exercises or not. While they did not carry out these exercises the body of their virtual doppelganger became fatter, and while they did the exercises the virtual body became thinner. There were *n* = 22 participants in this reinforcement condition, *n* = 22 in another condition where the virtual body did not change, and *n* = 19 in another condition where there was just an empty virtual room with no character. The dependent variable was the amount of voluntary exercise that participants carried out in a final phase of the experiment (during which there was also positive and negative reinforcement). It was found that the greatest exercise was carried out by the group that had the positive and negative reinforcement. In order to check that it was the facial likeness that accounted for this result, a second experiment introduced another condition, which was that the face of the virtual body was that of someone else. Here, the result only occurred for the condition of the virtual doppelganger. 
Finally, it could be argued that the participants in the voluntary exercise phase only exercised to avoid the unpleasant sensation of seeing their virtual doppelganger "gaining weight." A third study examined participants' level of exercise during a 24-h period after the conclusion of the study, through a questionnaire returned online. The setup was that they saw their doppelganger exercising on a treadmill, or a virtual character that did not look like themselves exercising, or a condition where their doppelganger was not doing any exercise but just standing around. The results suggested that those who saw their virtual look-alike exercising did carry out significantly more exercise in the real world in a period after the experiment than the other two conditions.

A second approach might be to use VR to provide a surrogate for exercising, rather than providing a motivation to exercise physically in reality. Kokkinara et al. (2016) illustrated what might be possible. Participants who were seated wearing an HMD and unmoving (except for their head) saw from 1PP their virtual body standing and carrying out walking movements across a field. They saw this when they looked down directly toward their legs that would be walking, and also in a shadow. In another condition they saw the body from a 3PP. After experiencing this virtual walking for a while they approached a hill, and the body walked up the hill. In the embodied 1PP condition participants had a high level of body ownership and agency over the walking, compared with the 3PP condition. More importantly, for this discussion, while walking up the hill participants had stronger skin conductance responses (more sweat) and greater mean heart rate in the embodied condition, compared to a period before the hill climbing, which did not occur for those in the 3PP. There were 28 participants each of whom experienced both conditions (there was another factor, but it is not relevant to this discussion).

Although there are caveats for both of these studies, the important aspect for our present purpose is that they illustrate how VR might be used to break out of the boundaries of physical reality and achieve useful results through quite novel paradigms. Of course it must always be better to carry out actual physical exercise rather than relying on your virtual body to do it for you. Yet sometimes, for example, on a long flight, virtual exercise might be the only possibility. Indeed, in this context, it has been found that participants who perceive their virtual body from 1PP in a comfortable posture are more likely to feel actual comfort than those who see their body in an uncomfortable posture (Bergström et al., 2016).<sup>54</sup> The point is that VR has the power to go beyond what we can do in physical reality, even in principle, and become a radically new medium with different ways of thinking and novel ways of accomplishing life-changing goals.

# 4. SOCIAL AND CULTURAL EXPERIENCES

There are many areas of social interaction between people where it is important to have good scientific understanding. What factors are involved in aggression of one group against another, or in various forms of discrimination? Which factors might be varied in order to decrease conflict and improve social harmony? It is problematic to carry out experimental studies in this area for reasons discussed below. However, immersive VR provides a powerful tool for the simulation of social scenarios, and due to its presence-inducing properties can be effectively used for laboratory-based controlled studies. Similarly, away from the domain of experiments, there are many aspects of our cultural heritage that people cannot experience – how an ancient site might have looked in its day, the experience of being in a Roman amphitheater as it might have been at the time, and so on. Again, VR offers the possibility of direct experience of such historical and cultural sites and events. In this section, we consider some examples of the application of VR in these fields, starting first with social psychology.

Loomis et al. (1999) pointed out how VR would be a useful tool for research in psychology and Blascovich et al. (2002) in social psychology. Here, the potential benefits are enormous. First, studies that are impossible in reality for practical or ethical reasons are possible in VR. Second, VR allows exact repetition of experimental conditions across all trials of an experiment. Moreover, virtual human characters programmed to perform actions in a social scenario can do so multiple times. This is not possible with confederates or actors, who can become tired and also have to be paid. Although it is costly to produce a VR scenario, once it is done, it can be used over and over again. Also, the scenarios can be arbitrary rather than restricted to laboratory settings. Rovira et al. (2009) pointed out how the use of VR in social science allows for both internal and ecological validity. The first refers to the possibility of valid experimental designs including issues such as repeatability across different trials and conditions, the precision at which outcomes can be measured, and so on. The second refers to generalizability. For example, in a study of the causes of violence, VR can place people in a situation of violence, which cannot be done in a real-life setting. This means that there is the possibility of generalization of results out of the laboratory to what may occur in reality. In particular, VR can be used to study extreme situations that are ethically and practically impossible in reality. This relies on presence – PI and Psi – leading to behavior in VR that is sufficiently similar to what would be expected in real life under approximately the same conditions. In the sections below, we briefly review examples of research in this area.

<sup>54</sup>https://www.youtube.com/watch?v=P9OXRDc3flU

# 4.1. Proxemics

How do you feel when a stranger approaches you and stands very close? The answer may vary from culture to culture, but at least in the "Anglo-Saxon" world you are likely to back away. Proxemics is the study of interpersonal distances between people, discussed in depth by Hall (1969). He defined intimate, personal, social, and public distances that people maintain toward each other (and these distances may be culturally dependent). An interesting question is the extent to which these findings also occur in VR. If a virtual human character approaches and stands close to you, in principle this is irrelevant since nothing real is happening – there is no one there. Even if the character represents a physically remote actual person who is in the same shared virtual environment as you, they are not really in the same space as you, and therefore not close. We briefly consider proxemics behavior in VR because it is a straightforward but fundamental social behavior, and finding that the predictions of proxemics theory hold true for VR is a foundation for showing that VR could be useful for the study of social interaction.

There has not been a great deal of work on this topic that has exploited VR. Bailenson et al. (2001) showed that people tend to keep greater distances from virtual representations of people than from cylinders in an immersive VR. This work was continued in Bailenson et al. (2003) where it was shown that participants maintain greater distances from virtual people when approaching them from the front than from the back, and also greater distances when there is mutual eye gaze. Participants also moved away when virtual characters approached them. Readers might be wondering – so what? This is obvious. It has to be remembered though that these are *virtual* characters; no real social interaction is taking place at all. Further studies have shown that proxemics behavior tends to operate in virtual environments (Guye-Vuilleme et al., 1999; Wilcox et al., 2006; Friedman et al., 2007).

McCall et al. (2009) showed that proxemics behavior can be used as a predictor of aggression. Proxemics distances of *n* = 47 (mainly self-identified as White) participants were measured from two White or two Black virtual characters. Subsequently, participants engaged in a shooting game with those virtual characters. It was found that there was a positive correlation between the distance maintained from the characters in the first phase and the degree of aggression exhibited toward them in the second phase but only for the condition where both virtual characters were Black.

Llobera et al. (2010) examined proxemics in immersive VR by measuring how skin conductance response varied with the approach of one or multiple virtual characters toward the participant, to different interpersonal distances. This was to test the finding of McBride et al. (1965) of a relationship between proximity and heightened skin conductance. It was found that there was a greater skin conductance response as a function of the closeness to which the characters approached participants and the number of characters simultaneously approaching. However, it was found that there was no difference in these responses when cylinders were used instead of characters. It was suggested that skin conductance cannot differentiate between the arousal caused by characters breaking social distance norms and the arousal caused by fear of collision with a large object (the cylinder) moving close to the participants.

Kastanis and Slater (2012) showed how a reinforcement learning (RL) agent controlling the movements of a virtual character could essentially learn proxemics behavior in order to realize the goal of moving the participant to a specific location in the virtual environment. Participants in an immersive VR saw a male humanoid virtual character standing at a distance and facing them. Every so often the character would walk varying distances toward the participant, walk away from the participant, or wave for the participant to move closer to him.<sup>55</sup> The RL agent behind the character gained a positive reward every time the participant stepped backwards toward a target position. The long-run aim was to get the participant to move far back to this target, unknown to the participant herself. The RL agent eventually learned that if its character went very close to the participant, then the participant would step backwards. Moreover, if the character was far away then it sacrificed short-term reward by simply waving toward the participant to come closer to itself, because then its moving-forward action would be effective in moving the participant backwards. Hence, the RL agent relied on presence (the participant moving back when approached too closely – from the prediction of proxemics theory) and learned how to exploit this proxemics behavior to achieve its task. For all participants, the RL agent learned to get the participant back to the target within a short time. This method could not have worked unless proxemics occurred in the VR. Having shown that this is the case, we move on to more complex social interaction.
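The reward structure described above can be illustrated with a toy tabular Q-learning sketch in Python. Everything here is an assumption made for illustration: the one-dimensional state (participant position and a coarse character-to-participant gap), the deterministic simulated participant, the reward values, and all names and parameters are hypothetical and are not taken from Kastanis and Slater (2012). In particular, this toy does not attempt to reproduce the finding that waving can pay off in the long run.

```python
import random

ACTIONS = ("approach", "retreat", "wave")
TARGET = 5                  # backward steps needed to reach the target (hypothetical)
CLOSE, MID, FAR = 0, 1, 2   # coarse gap between character and participant


def step(state, action):
    """Toy participant model. Returns (next_state, reward)."""
    pos, gap = state
    reward = 0.0
    if action == "approach":
        gap = max(gap - 1, CLOSE)
        if gap == CLOSE:
            # Proxemics assumption: the participant steps back when crowded.
            pos, gap, reward = pos + 1, MID, 1.0
    elif action == "retreat":
        gap = min(gap + 1, FAR)
    elif action == "wave" and gap == FAR:
        # The participant comes closer, i.e., moves away from the target.
        pos, gap, reward = max(pos - 1, 0), MID, -1.0
    return (pos, gap), reward


def greedy(q, state):
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning on the toy model."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (0, FAR)
        for _ in range(50):
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy(q, state)
            nxt, reward = step(state, action)
            done = nxt[0] >= TARGET
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned policy; returns (final position, steps taken)."""
    state, steps = (0, FAR), 0
    while state[0] < TARGET and steps < max_steps:
        state, _ = step(state, greedy(q, state))
        steps += 1
    return state[0], steps


if __name__ == "__main__":
    print(greedy_rollout(train()))
```

The point of the sketch is only that an agent rewarded for the participant's backward steps ends up preferring to move its character close to the participant, which works only if the simulated participant, like the real ones, respects interpersonal distance.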

# 4.2. Discrimination

Research suggests that VR can provide insights into discrimination by affording the opportunity for people to have simulated experiences of the world through another group's perspective even if only briefly. For example, we saw earlier how simply placing White people in a Black body in a situation known to be associated with race discrimination led to an increase in implicit racial bias (Groom et al., 2009). On the other hand, virtual body representation has been shown to be effective with respect to racial bias, where White people embodied in a Black-skinned body show a reduction in implicit racial bias (Peck et al., 2013)<sup>56</sup> in a neutral social situation as we saw in Section 2.1.2.

<sup>55</sup>https://youtu.be/D4KgWpta7YI

More generally, the method of virtual embodiment has also been used to give adults the experience of being a child (Banakou et al., 2013), has been shown to affect motor behavior while playing the drums (Kilteni et al., 2013), and has been used to give people the illusory sensation of having carried out an action that they had in fact not carried out (Banakou and Slater, 2014). Some of the work in the area of body representation applied to implicit bias is reviewed in Slater and Sanchez-Vives (2014) and Maister et al. (2015).

A further question is whether embodied experiences as an "outgroup" member will actually translate into different behavior toward members of the group. Although not in the context of discrimination there is some evidence from the work of Ahn et al. (2013) that this might be the case. They immersed people with normal vision into an HMD-delivered VR where they experienced certain types of color blindness. In three experiments (*N* = 44, *N* = 97, and *N* = 57), they compared the effects of perspective taking where participants simply imagined being color blind to a condition where the display actually made them color blind in the virtual environment. They found that indeed the VR experience did result in greater helping behavior of participants toward color blind people both within the experiment and in their behavior after the experiment (with a moderate effect size of the squared multiple correlation of around 10%). It illustrates how VR might be used to put people experientially in situations and how this may influence their behavior compared with only imaginal techniques.

# 4.3. Authoritarianism

Stanley Milgram carried out a number of experiments in the 1960s designed to address the question of how events such as the Holocaust could have occurred (Milgram, 1974). He was interested in finding explanations of how ordinary people can be persuaded to carry out horrific acts. The type of experiments that he conducted involved experimental subjects giving apparently lethal electric shocks to strangers. These are very famous experiments that are as topical today as in the 1960s, and barely a week goes by when there is not some mention of them in news media,<sup>57</sup> or further research relating to them is reported.<sup>58</sup> There were several different variants of the experiment that Milgram designed. Typically, the experimental subject, normally recruited from the local town (near Yale University) rather than from among psychology students, was invited to the laboratory where he or she met another person, also supposedly recruited in the same way. The other person was in fact a confederate of the experimenter, an actor hired for the purpose, this being unknown to the subject. The experimenter invited the subject and the actor to draw lots to determine their respective roles in the experiment. It turned out that the subject was to play the role of Teacher, and the actor the role of Learner, but the outcome of this draw was fixed in advance. Then both the Teacher (subject) and Learner (actor) were taken to another room, where the Learner had electrodes placed on his body connected to an electric shock machine. It was explained that the idea was to examine how punishment might aid in learning. The Learner was to learn some word-pair associations, and whenever he gave a wrong answer he was to be shocked. The Learner, acting in a jovial manner, explained that he had a mild heart condition, and the experimenter assured both Learner and Teacher that "Although the shocks may be painful they are not dangerous." There are online videos showing the original experiment.<sup>59</sup>

The Learner was left in the room, and the experimenter took the Teacher back into the main laboratory, closing the door to that room. He explained to the Teacher that he had to read out cues for the word-pair tests and whenever the Learner gave the wrong answer the Teacher should increase the voltage on a dial and administer an electric shock at that voltage. The voltages were labeled from 15 V (slight shock) to 375 V (danger: severe shock) to 450 V (marked "XXX"). During the course of the experiment, a tape was played giving the responses of the Learner. With the low voltage shocks there was no response. After a while though the Learner could be heard saying "ouch!" and as the voltage increased further he complained more and more vociferously, eventually saying that he had the heart condition and that his heart was starting to bother him. He shouted that he wanted to be let out of the experiment, and finally with the strongest shocks he became completely silent. If at any point the Teachers said that they felt uncomfortable or that they wanted to stop, the experimenter would say one of "The experiment requires that you continue," "It is absolutely essential that you continue," or "You have no other choice, you must go on" in a prescribed sequence. Participants generally found that the experience was extremely stressful, and even if they continued through to lethal voltages they were clearly very upset.

Prior to the experiment, Milgram had asked a number of psychologists how many people would go all the way and administer even lethal voltages to the Learner. The view was that only a tiny minority of people, those with psychopathic tendencies, would do so. In the version of the experiment described above, about 60% of subjects went all the way to administer the most lethal shocks. The results stunned the world since they apparently showed that ordinary people could be led to administer severe pain to another at the behest of an authority figure. There is a wealth of data and analysis and a description of many different versions of this experiment in Milgram (1974), but the basic conclusion was that people will tend to obey authority figures. Here, ordinary people were being asked to carry out actions in a lab in a prestigious institution (Yale University) and in the cause of science. They tended to obey even if they found that doing so was extremely uncomfortable. Although this is not the place for discussion of this interpretation, interested readers can find

<sup>56</sup>https://www.youtube.com/watch?v=NrRRKZRGZbE ("Can virtual reality be used to tackle racism?" Report by Melissa Hogenboom, BBC Click).

<sup>57</sup>E.g., http://nymag.com/scienceofus/2015/10/theres-a-new-film-about-the-milgram-experiment.html

<sup>58</sup> In the period of January 1 to May 2, 2016 there were more than 100 articles published that reference the Milgram work.

<sup>59</sup>E.g., https://www.youtube.com/watch?v=fCVlI-\_4GZQ

alternative explanations for the results in, for example, Burger (2009); Miller (2009); Haslam and Reicher (2012); and Reicher et al. (2012).

Participants in these experiments were deceived – they were led to believe that the Learner was really just another subject, a stranger, and that he was really receiving the electric shocks. The problem was not so much the stress, but the fact that participants were not informed about what might happen, were not aware that they might be faced with an extremely stressful situation, and were ordered to continue participating even after they had clearly expressed the desire to stop. These and other issues led to strong criticism from within the academic community that eventually led to a change in ethical standards – informed consent, the right to withdraw from an experiment at any moment without giving reasons, and care for the participants including debriefing. See also a discussion of these issues as they relate to VR in Madary and Metzinger (2016). Hence, these experiments on obedience cannot be carried out today for research purposes, no matter how valuable they might seem to be scientifically. Yet, the questions addressed are fundamental since it appears that humans may be too ready to obey the authority of others even to the extent of committing horrific acts.

In 2006, a virtual reprise of one version of the Milgram experiments was carried out (Slater et al., 2006), with full ethical approval. The approval was given because participants were warned in advance about possible stress, could leave the experiment whenever they wanted, and of course they knew for sure that no one in reality was being harmed because in this experiment the Learner was a (poorly rendered) virtual female character displayed in a Cave-like VR setting.60 The participants (Teachers) sat in the Cave system by a desk on which there was an electric shock machine. They saw the virtual Learner on the other side of a (virtual) partition, projected in stereo on the front wall of the Cave. They went through the same routine with the virtual Learner as in Milgram's experiment, reading out cue words, and administering "electric shocks" to the virtual Learner whenever she answered with an incorrect word-pair association. Just as in the original experiment, after a while she began to complain and demanded to be let out of the experiment, and eventually seemed to faint. However, if participants expressed a wish to stop, no argument against this was given, and they stopped immediately.

Even though carried out in VR, many of the same results as the original were obtained, though at a lower level of intensity of stress. There were *n* = 34 participants, 23 of whom saw and heard the virtual Learner throughout the experiment, and 11 who saw and spoke to her initially but then a curtain descended, and they only communicated with her through text once the question and answer session began. All those who communicated by text gave all of the shocks. However, 6 of the 23 who saw and heard the Learner withdrew from the experiment before giving all shocks. In other words, 74% continued to the end, in spite of the fact of feeling uncomfortable, as was shown by their physiological responses (skin conductance and electrocardiogram responses).
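The 74% figure follows directly from the counts reported above. As a minimal sketch of that arithmetic (the variable and function names are illustrative only and are not taken from the original study), the completion rates per condition can be computed as follows; the overall rate across both conditions is derived here from the same counts rather than quoted from the paper.

```python
from fractions import Fraction

# Counts as reported in the text for the virtual reprise (Slater et al., 2006).
n_total = 34          # all participants
n_visible = 23        # saw and heard the virtual Learner throughout
n_text = 11           # communicated with the Learner by text only
withdrew_visible = 6  # withdrew before giving all shocks
withdrew_text = 0     # all text-condition participants gave all shocks


def completion_rate(n, withdrew):
    """Exact proportion of participants who continued to the end."""
    return Fraction(n - withdrew, n)


visible_rate = completion_rate(n_visible, withdrew_visible)
text_rate = completion_rate(n_text, withdrew_text)
# Derived figure (not stated in the text): both conditions pooled.
overall_rate = completion_rate(n_total, withdrew_visible + withdrew_text)

print(f"visible condition: {float(visible_rate):.0%}")
print(f"text condition:    {float(text_rate):.0%}")
print(f"overall:           {float(overall_rate):.0%}")
```

Using exact fractions keeps the intermediate values free of rounding, so the percentage is only rounded when it is displayed.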

In the paper, it was argued that the gap between reality and VR makes these types of experiments possible. Presence (PI and Psi) leads to participants tending to respond to virtual stimuli as if they were real. But, on the other hand, they know that it is not real, which can also dampen down their responses. In debriefing, when participants were asked why they did not stop even though they felt uncomfortable, a typical answer was "Since I kept reminding myself that it wasn't real." From the original experiments of Stanley Milgram we know (at least for the 1960s around Yale in the US) how people actually responded. In VR, we see that they responded similarly, though not with the very strong and visible stress that many of the original participants displayed. Using VR, we can study these types of events, and how people respond to them, and construct predictive theory that may help us understand how people might respond in reality. The predictions can then be tested against what happens in naturally occurring events and the theory examined for its viability. This type of approach can also be used to gather real-time data about brain activity of people when faced with such a situation (Cheetham et al., 2009).

# 4.4. Confronting Violence

You are in a bar or other public place and suddenly a violent argument breaks out between two other people there. It seems to be about something trivial. One man is clearly the perpetrator, and the victim is trying to calm down the situation, but his every attempt at conciliation is used by the perpetrator as a cue for greater belligerence. Eventually the perpetrator starts to physically assault the victim. What do you do? Suppose you are alone there? Suppose there are other people? Perhaps the victim shares some social identity with you, such as being a member of the same club or same ethnic group different to that of the aggressor. How do you respond? Do you try to intervene to stop the argument? Or walk away? How is your response influenced by these factors such as number of other bystanders or shared social identity with the victim or aggressor?

This area of research was initiated in the late 1960s, provoked by a specific incident in which apparently 38 bystanders observed a woman being murdered and did nothing to help.61 Latane and Darley (1968) introduced the notion of the "bystander effect," which postulates that the more bystanders there are at an emergency event such as this, the less likely it is that anyone would intervene, due to diffusion of responsibility; see also Darley and Latané (1968). However, other researchers have also suggested the importance of social identity as a factor, that is, the perceived relationships between the people involved; for example, see Reicher et al. (2006); Hopkins et al. (2007); Manning et al. (2007); and Levine and Crowther (2008). There is a meta-analysis and review of the field by Fischer et al. (2011).

As pointed out by Rovira et al. (2009), one of the problems in this area of research is that for ethical and practical reasons it is not possible to actually carry out controlled experimental studies that depict a violent incident such as that described in

<sup>61</sup>https://en.wikipedia.org/wiki/Murder\_of\_Kitty\_Genovese. See also a recent New York Times article following the death in prison of the murderer http://www.nytimes.com/2016/04/05/nyregion/winston-moseley-81-killer-of-kitty-genovese-dies-in-prison.html?\_r=0

<sup>60</sup>https://youtu.be/RjUNg3pkEag

the opening paragraph of this section. This is very similar to the situation of the Obedience studies discussed above. Instead, researchers have to study surrogates such as the responses of people to someone falling (Latane and Rodin, 1969) or responses to an injured person lying on the ground (Levine et al., 2005). However, these are not violent emergencies, so it may not be valid to extrapolate results from such scenarios to what might happen in actual violent emergencies. In VR it is possible to set up simulated situations, where we know from presence research that people are likely to react realistically to the events portrayed. King et al. (2008) suggested the use of Second Life to provide a non-immersive simulation of the bystander situation and described a case study where a particular person was victimized to examine how the presence of bystanders mediated the level of helping offered. It was concluded that one reason that people did not intervene was that they thought that this should be the responsibility of the Second Life monitors rather than the ordinary "citizens." In another video-game setting, Kozlov and Johansen (2010) found that participants were less prone to helping behavior in the presence of larger groups of virtual characters. A possible problem though with using video games is that they do not mobilize the body – there are no natural sensorimotor contingencies so that PI becomes something at best imaginal. In some applications this may not be important. However, when studying people's responses to emergency situations it may be prudent to have whole body engagement, some illusion that the body itself is present and at risk. Garcia et al. (2002) showed that only imagining the presence of other bystanders results in a bystander effect, to the extent that participants are less likely to help others after the end of the study if they had been primed to think about being in a group rather than being alone.
Hence, it might be the case that video games are mainly aids to imagination and that results obtained from video games might be the same as those from imagination. Indeed, a result from Stenico and Greitemeyer (2014) suggests that this might be the case. This is not to say that such results are invalid but that by themselves they are not convincing enough, and some experimental evidence is needed that does place participants into the midst of a violent emergency so that various factors influencing their responses can be investigated. But, as we have said, this cannot be done for practical and, above all, ethical reasons.

Slater et al. (2013) used immersive VR (a Cave-like system) to study the social identity hypothesis: that participants who share social identity with the victim are more likely to intervene to help than if they do not share social identity. The method to foster social identity with a virtual human character was through the use of soccer club affiliation. All of the *n* = 40 participants were fans of the English soccer team Arsenal. They were in a virtual bar where they had an initial conversation with a life-sized male virtual character (V). This character was either an Arsenal supporter depicted through his shirt and his enthusiastic conversation about Arsenal (*n* = 20, "ingroup" condition), or a generic football fan, not a supporter of Arsenal (*n* = 20, "outgroup" condition). After some time in this conversation another character (P) – also wearing a generic soccer shirt but not Arsenal – butted in and started to attack V especially because of his support of Arsenal. This attack increased in ferocity until after about 2 min it became a physically violent attack.62 The main response variable was the number of times that the participant intervened on the side of V. It was found, in accordance with social identity theory, that those in the group where V was an enthusiastic Arsenal supporter intervened much more than those in the other group. There was a second factor, which was whether or not V occasionally looked toward the participant during the confrontation, but this had no effect. However, there was a positive correlation between the number of interventions and the extent to which participants believed that V was looking toward them for help – but only in the ingroup condition.

Since it is impossible to compare these results with any study in real life, of course their validity in the sense of how much they would generalize to real-life behavior cannot be known. However, experiments such as these generate data and concomitant theory, which can be compared in a predictive manner with what happens in real-life events. In fact, there is no other way to do this other than the use of actors – which as mentioned earlier can run into ethical and practical problems. Moreover, the knowledge gained from such experiments can be used also in the policy field, for example, providing advice to victims on how to maximize the chance that other people might intervene to help them, or of use to the emergency or security services on how to defuse such a situation.63 It is a way to provide evidence-based policy, and if the evidence is not generalizable to real situations then with proper monitoring, the policy will ultimately be changed.

# 4.5. Cultural Heritage

"In today's interconnected world, culture's power to transform societies is clear. Its diverse manifestations – from our cherished historic monuments and museums to traditional practices and contemporary art forms – enrich our everyday lives in countless ways. Heritage constitutes a source of identity and cohesion for communities disrupted by bewildering change and economic instability." (Protecting Our Heritage and Fostering Creativity, UNESCO).64

The preservation of the cultural heritage of a society is considered as a fundamental human right, and there is a Hague Convention on the protection of cultural property in the event of armed conflict.65 As we have seen tragically in recent years, there has been massive and deliberate destruction of cultural heritage, two well-known examples being the Buddhas of Bamiyan66 and the partial destruction of Palmyra.67 UNESCO maintains a country-by-country world heritage list.68

<sup>62</sup>https://www.youtube.com/watch?v=yspbUFhzGC0 (experiment scenario – bleeped out swearing).

<sup>63</sup>https://www.youtube.com/watch?v=11NH0K23nEM (BBC TV report about bystander experiment).

<sup>64</sup>http://en.unesco.org/themes/protecting-our-heritage-and-fostering-creativity

<sup>65</sup>http://portal.unesco.org/en/ev.php-URL\_ID=13637&URL\_DO=DO\_TOPIC&URL\_SECTION=201.html

<sup>66</sup>http://whc.unesco.org/en/list/208

<sup>67</sup>http://whc.unesco.org/en/list/23

<sup>68</sup>http://whc.unesco.org/en/list/

The ideal way to preserve cultural heritage is physical protection, preservation, and restoration of the sites. There has also been significant work over many years concerned with digital capture and visualization of such sites, which of course can be displayed in VR (Ch'ng, 2009; Rua and Alvito, 2011). The first and obvious application of VR in this field is to allow people all over the world to virtually visit such sites and interactively explore them. This is no different from virtual travel or tourism, except for the nature of the site visited. This is also possible through museums that have VR installations. The second is digitization of sites for future generations, and especially those that are in danger of destruction either through factors such as environment change or conflict. The third type of application is to show how these sites might have looked fully restored in the past and under different conditions such as lighting conditions. For example, it is quite different to see the interior of a building or a cave with electric lighting than under the original conditions that the inhabitants of that time would have seen them – by candlelight or fire. The fourth is to see how sites, both cultural heritage and non-cultural heritage sites, might look in the future under different conditions, such as different global warming scenarios.

This is a massive field and mainly concerned with digitization, computer vision, reconstruction, and computer graphics techniques. Here, we give a few examples of some of the virtual constructions that have been done and that potentially could be experienced immersively in VR.

An example of one type of application is described by Gaitatzes et al. (2001) who show how museum visitors can walk through various ancient sites visualized in a Cave-like system, in particular through the ancient Greek city of Miletus.69 Carrozzino and Bergamasco (2010) give various examples of museum installations.70,71 Interestingly, they speculate on a number of reasons why the use of VR in museum settings may not have been taken up so much in recent years: (1) cost; (2) it requires a team to be able to do this; (3) lots of space is needed for the installation; (4) visitors do not want to wear VR equipment; (5) it is a single person experience; and (6) VR might be thought to be not serious enough to include in such august settings as museums. Apart possibly from the last issue, each of these problems is largely overcome with the advent of low-cost, high-quality HMDs with built-in head tracking. Of course it is still true that an interdisciplinary team is required to create the environments, although see Wojciechowski et al. (2004) and Dunn et al. (2012) for examples of how to do this. In particular, digital acquisition and rendering of cultural heritage sites requires a huge amount of data to be processed. An example of how this was handled for the site of the Monastery of Santa Maria de Ripoll in Catalonia, Spain, is presented in Besora et al. (2008) and Callieri et al. (2011) and an example of a user interface for virtually navigating this site in Andújar et al. (2012). A famous example of the virtual recreation of world heritage is the digitization and rendering of Michelangelo's statue of David plus several statues and other artifacts of ancient Rome (Levoy et al., 2000). The David statue72 required 2 billion polygons for its representation, and the software is available as freeware from Stanford.73

Sometimes a digital reconstruction is the only way to view a site. The ancient Egyptian temple of Kalabsha was physically moved from its original location to preserve it from rising flood waters. Sundstedt et al. (2004) digitally reconstructed it to show it in its original site, and also how it may have looked two millennia earlier, including illuminating it with simulations of the type of lighting that may have been used at that time. Gutierrez et al. (2008) describe highly accurate illumination methods for heritage sites. Happa et al. (2010) review various examples of illuminating the past, together with descriptions of the methodology used.

Many examples of virtual cultural heritage in the past have been implemented for desktop or projection systems – though of course they could always be displayed immersively in HMDs. However, this raises other issues such as appropriate tracking, interfaces, and so on. A joystick for navigation, for example, is not always appropriate for an HMD (especially bearing in mind that movement without body action can sometimes be a cause of simulator sickness). Also a screen display has the advantage that typically it can be much higher resolution than what is possible in an HMD, where all the detailed lighting and detail rendering might not even be perceivable. Webel et al. (2013) describe their experience with a number of the newer technologies for display and tracking in the virtual construction of four different sites for display in a museum. They point out how traditional systems, such as tracking that requires the wearing of devices and expensive Caves, are not always suitable for busy environments such as museums. However, low-cost, camera-based tracking systems do not require physical contact with visitors, and the use of the Oculus Rift HMD (in their application) allowed visitors to look around the virtual environment simply by turning their head rather than learning a joystick type of navigation method. In other words, these systems provide a natural means of interaction. As the authors wrote: "With the Oculus Rift as a display and head-tracking device, the user's immersion can be extremely increased. The natural camera control just by turning the head, like one would do in the real world, lets users control this aspect without even thinking about it. The combination with natural interaction inputs with the Kinect or the Leap Motion enables the user to directly interact with the virtual world."

Kateros et al. (2015) review the use of Oculus HMDs for cultural heritage and show how they were used in a number of applications and give insight into their ideas for preparing a user study. Casu et al. (2015) carried out such a study comparing the viewing of art masterpieces in the classroom through a non-immersive multimedia white board display and the Oculus Rift. Their experiment had *n* = 23 students in a between-groups design (12 saw the non-immersive display) and found that the HMD method was superior across a range of subjective questionnaire-based factors including motivation. Such studies, while useful, do not address the problem of the "wow factor," i.e., using the

<sup>69</sup>http://www.tholos254.gr/projects/miletus/index-en.html. (This also links to a 360° virtual tour).

<sup>70</sup>https://www.youtube.com/watch?v=U00bmFyipNw

<sup>71</sup>https://www.youtube.com/watch?v=DZx8NqjIgF4

<sup>72</sup>https://www.youtube.com/watch?v=e-l2BMStRcg

<sup>73</sup>https://graphics.stanford.edu/software/scanview/

HMD is novel, and it certainly provides a quite different experience than the multimedia white board. However, maybe once such systems become commonplace, the same results might not be obtained. There are no clear-cut answers, and it is not easy to establish criteria for the success or otherwise in comparing such systems (since there are many factors that vary between them). For example, Loizides et al. (2014) compared a powerwall with an Oculus Rift HMD for virtual visits to cultural heritage scenarios in Cyprus. They found that participants appreciated both types of display and especially the presence-inducing capabilities of the HMD. However, the HMD also led to greater nausea. As mentioned though, it is very difficult to make such comparisons because on the one hand the HMD had the natural interface for viewing (head tracking) but on the other hand much lower resolution. Moreover, the price ratio between powerwall and HMD was (at that time) 40 to 1, a factor not reflected in the difference in participant evaluation.

Finally, it should be noted that cultural heritage is not only buildings and statues. There are rich traditions in societies that are passed down the generations that are certainly no less important to preserve for the future – intangible heritage. An obvious example is folklore stories, but the medium for the ultimate representation of these for preservation through the generations is in written form. However, there are other examples, such as folk dances – which can be preserved through younger generations learning these from their elders – but this does not provide a form for others to experience. Aristidou et al. (2014) show how folk dancing can be digitally captured and represented.74 They concentrate on the technical aspects, but clearly such efforts can be portrayed immersively (see Presentation S3 in Supplementary Material).

# 5. MORAL BEHAVIOR

Sometimes in our professional and personal lives we are faced with problems that cannot be answered by any kind of evidence-based scientific reasoning. The science can provide information, but it cannot determine what *should* be done. Imagine that there is a nuclear reactor providing power for millions of people, and that the science determines that in the next 10 years there is a 5% chance that it will explode causing massive contamination. There are no resources to repair it and no alternatives. It can be decommissioned, and in the short to medium term this will cost many lives and great suffering. It can be left to run, with the corresponding risk. The science can determine the level of risk, but it cannot determine the action. In military or police action, there is the issue of "collateral damage." Action to resolve one kind of threat that might save many lives may indeed cost many lives in its execution. The science can inform about relative risks and costs, but it cannot determine what is the right thing to do.

How people "should" and do make decisions under such conditions of moral uncertainty are subjects for study in moral philosophy and neuroscience. Normally, abstract situations are used for reasoning or gathering evidence about the responses of people. A famous example is the "trolley problem,"75 where you have to make a choice between allowing a runaway trolley (or tram or train…) to run over and kill five unaware people in its path or diverting it to kill another person (Foot, 1967; Thomson, 1976). What do you do? Suppose the trolley were running toward the one person, but there were five others on another track. Would you divert the train to save the one but kill the five? According to survey evidence (Hauser et al., 2007), most people will choose the action that saves the greatest number – five rather than one.76 Suppose to save the five, however, you have to push someone else onto the track to divert the train. In this case, few people will choose to take that action.

Philosophers distinguish between utilitarian and deontological principles. The first states that it is best to take the action that maximizes the greatest good, i.e., is concerned with consequences (the end justifies the means). The second emphasizes rather that an action in itself must be ethical, based on universal maxims. For example, if it is wrong to steal then it is wrong to steal in any circumstances, irrespective of possible beneficial outcomes. See Hauser (2006) for an exposition of these various principles in the context of psychology and neuroscience. Although sacrificing one person to save five is the utilitarian solution, people also do act out of deontological principles – which is why few support actively pushing someone onto the track even though the outcome is exactly the same in utilitarian terms. Moreover, choosing to take the *action* of diverting the train to save five rather than one has the same outcome as not choosing to divert the train when it is running toward one with five on another track (*omission*). However, omission could be argued to be both utilitarian (five are saved rather than one) and deontological (not personally taking an action that would kill).

These discussions have been going on for centuries. But, how can we know what people would actually do? As we saw in the example of the Stanley Milgram Obedience experiments (Section 4.3), what people say they would do and what they actually do when faced with a situation are not necessarily the same. Below we give some examples where VR has been used, relying on its presence-inducing capabilities, to face people with such dilemmas and where their behavior can be observed. Of course, this does not solve the moral problem of what the "right" behavior should be, but rather can inform about what people actually do, and ultimately the factors and brain activity behind this.

# 5.1. Virtual Representations of Moral Dilemmas

Transforming a short verbal description of a scenario such as the trolley problem into VR is non-trivial. There are "five people" – which people? Gender? Age? Ethnicity? Social class? How do they look? What are they doing? Why are they there? There is a trolley or train – exactly how does it look? How fast is it going? What is the surrounding scenery? The experimental subject can

<sup>74</sup>https://www.youtube.com/watch?v=iiuZznpHyPs&feature=youtu.be

<sup>75</sup>https://www.youtube.com/watch?v=bOpf6KcWYyw (a cartoon exposition of the trolley problem).

<sup>76</sup>http://www.moralsensetest.com/experiment/originaldilemmas.html (a survey at Harvard University).

divert the train – exactly how? Which action needs to be taken? How can the designer be sure that the subject will even be looking in the necessary direction? How can it be set up so that the subject sees the five and also sees the one? Doing something in VR means making it concrete and specific, obviously changing the scenario – which in one case is dependent on the imagination of the subject in response to a statement in a questionnaire, but in the other is there to be seen and heard.

Navarrete et al. (2012) implemented a version of the trolley problem, making all of the above choices but staying true to the story line, and they carried out an experiment where participants were faced with the choice between saving five or one.77 There were *n* = 293 participants who experienced the scenario in an HMD-based system (NVIS). This was a between-groups experiment where one group experienced the action condition (they could act to save five) and the other group the omission condition (if they did not act five would be saved). Just over 90% of subjects chose the utilitarian solution in line with questionnaire-based results. However, those who had to actively save the five showed greater arousal (skin conductance levels) than those who could save the five by doing nothing. Moreover, the greater level of arousal was associated with a lower propensity to take the utilitarian outcome. This could indicate that following the utilitarian path leads to greater internal conflict within participants, but following it without simultaneously violating deontological principles is a less stressful choice. Ideally, in order to rule out the effect on arousal simply of carrying out the action there should be a condition that equalizes the level of physical action across the conditions. However, the important point is that such studies can be carried out at all.

Pan and Slater (2011) portrayed a dilemma equivalent to the trolley problem. Participants were taught how to control a platform that operated as an elevator in an art gallery. The gallery consisted of two floors, ground and upper level. Virtual characters entered and could ask to be taken to the upper level to view the paintings there or remain on the ground floor. At one point – in the Action condition – there were five characters on the upper level and one on the ground level. A seventh person entered and asked to be taken to the upper level. While still on the elevator, that character raised a gun and started to shoot toward all those on the upper level. The participant could leave the shooter there (risking the five) or bring the elevator down (risking the one). The Omission condition was similar except that at the critical moment there was one character upstairs and five downstairs. To avoid the problem that the types of people represented by the virtual visitors might influence the results, they were portrayed as stick figures, so that characteristics such as those mentioned above – age, gender, etc. – could not be inferred. This was a between-groups experiment with 36 participants and 2 factors. The first factor was the display: the situation was portrayed either in a 4-screen Cave-like system or on a single PC screen. The second factor was the Action and Omission conditions. Running such an experiment in VR really illustrates how different it is from telling people a story and asking for their response. For those in the Cave their fundamental reaction was confusion or panic, illustrated by the fact that 61% of them carried out multiple actions in response to the shooting compared to 33% of those in the desktop condition. However, taking into account the final resting point of the platform, 89% of those in the Action condition in the Cave brought the lift down, whereas 22% did so in the Omission condition. For those in the desktop condition the equivalent proportions are 67 and 22%.
The differences between Cave and desktop were not significant, although being a pilot experiment the sample sizes were small. This experiment was featured in a BBC Horizon documentary "Are You Born Good or Evil?" where people naïve to the experiment were filmed. More than the statistics, their reactions pointed to the fact that they did actually experience a genuine dilemma.78,79 A more sophisticated version of this setup was repeated in an HMD-based study (Friedman et al., 2014) concerned with embodiment and time travel, where realistic virtual characters were portrayed. In terms of responses to the dilemma they were similar to the other studies. In these studies it has been found that people become more utilitarian in VR compared to what they will say in response to a questionnaire – i.e., they are more likely to adopt a decision depending on the outcome (saving five rather than one). In another study that used desktop VR the same was found. Specifically, subjects were more likely to make utilitarian decisions in VR compared to the same scenario described textually. In other words, although participants judged it less acceptable to sacrifice one person to save five when this dilemma was presented verbally, when it came to their actual action in VR they were more likely to do so. There is therefore a division between what people will say they would do and what they would actually do faced with the situation. This illustrates what VR is useful for in these types of context.
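As a rough check on the percentages reported for Pan and Slater (2011), the following sketch converts them back into whole numbers of participants. It assumes, which the text above does not state, that the 36 participants were divided evenly (9 per cell of the 2 × 2 design, 18 per display type); the names `implied_count` and `is_consistent` are illustrative only.

```python
# Percentages as quoted in the text for Pan and Slater (2011).
# ASSUMPTION (not stated in the source): 36 participants split evenly,
# i.e. 9 per (display, condition) cell and 18 per display type.
TOTAL = 36
PER_CELL = TOTAL // 4      # assumed 9 per cell
PER_DISPLAY = TOTAL // 2   # assumed 18 per display type

# Percentage whose platform came to rest in the "lift down" position.
reported_lift_down = {
    ("Cave", "Action"): 89,
    ("Cave", "Omission"): 22,
    ("Desktop", "Action"): 67,
    ("Desktop", "Omission"): 22,
}
# Percentage who carried out multiple actions during the shooting.
reported_multiple_actions = {"Cave": 61, "Desktop": 33}


def implied_count(percent, group_size):
    """Nearest whole number of participants for a rounded percentage."""
    return round(percent * group_size / 100)


def is_consistent(percent, group_size):
    """True if the implied whole count rounds back to the same percentage."""
    count = implied_count(percent, group_size)
    return round(100 * count / group_size) == percent


lift_down = {cell: implied_count(p, PER_CELL)
             for cell, p in reported_lift_down.items()}
multiple_actions = {display: implied_count(p, PER_DISPLAY)
                    for display, p in reported_multiple_actions.items()}
```

Under the equal-split assumption every reported percentage corresponds to a whole number of participants, which illustrates how small the cells in this pilot were: a single participant shifts a cell proportion by about 11 percentage points.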

Finally, Skulmowski et al. (2014) used a screen-based system to situate participants in a trolley that they could control and avoid colliding with people standing on branching tracks. They investigated a number of hypotheses relating to specific types of potential victims (male, female), the number balanced against each other (e.g., 10 people rather than 5 against 1, or 1 against 1), ethnicity, altogether with 11 different hypotheses. They found that there were different response times depending on gender of the potential virtual victims, with a greater tendency to sacrifice males. In this study, arousal was estimated by measuring pupil dilation (see Presentation S4 in Supplementary Material).

# 5.2. Doctor/Patient Interaction

One area in which VR is likely to flourish in the coming years, as its cost comes down and it becomes more ubiquitous, is for the training of professionals. In many professions, people make fundamental ethical decisions – not so dramatic as the trolley problem, but nevertheless often very important. How does a lawyer act knowing for certain that a client has committed a horrific crime? Does a health inspector close down a factory putting at risk hundreds of jobs or allow the factory to continue with unsanitary practices – when it is clear after several warnings that there will be no significant improvement? With limited resources should an agency responsible for deciding which medicinal drugs should be available on prescription go for the cheaper one that has been shown to have limited success, or the vastly superior one that is also vastly more expensive? Choosing the latter might disadvantage the greater number of people due to restrictions on other drugs, yet also save the lives of a few.

<sup>77</sup>https://www.youtube.com/watch?v=yk\_hftGBHy4

<sup>78</sup>http://www.bbc.co.uk/programmes/p00k9drg

<sup>79</sup>https://www.youtube.com/watch?v=M2aorOAY8o8

Frontiers in Robotics and AI | www.frontiersin.org December 2016 | Volume 3 | Article 74

Sometimes, these issues are covered by law and sometimes not. We consider one example. How do medical professionals learn to interact with their patients in such circumstances? Of course they observe their supervisors and teachers, and they read and learn about this in medical school. However, there is no substitute for experience. But, experience requires that prior to interacting with patients the doctors have already learned to interact with patients. Hence, VR can provide training and many different scenarios that will help toward gaining experience (Cook et al., 2010).

The idea of using virtual patients has been very thoroughly studied for many years80 (Cendan and Lok, 2012). For example, Kleinsmith et al. (2015) have investigated empathy training with virtual patients. Here, though, we consider only ethical problems in dealing with patients, where, contrary to medical advice, a patient demands a certain medicine. The first time that a doctor confronts this problem would typically be with a real patient. A case in point is the overprescription of antibiotics. This is a balance between the needs of society as a whole (to avoid enhanced bacterial resistance to antibiotics) and the needs of the individual. If a patient demands antibiotics but the medical evidence suggests that these would not be appropriate, does the doctor prescribe in order to have a quieter life, or perhaps avoid being sued should the decision ultimately have been a wrong one, or follow the higher principle that not prescribing unless clearly necessary may be the best thing to do for the greater good? Pan et al. (2016) carried out an experiment with *n* = 21 medical doctors (general practitioners; 9 being trainees with limited experience and the remainder with an average of about 6 years' experience). The experiment was carried out using an Oculus DK2 through which each doctor had a consultation with a virtual mother and her daughter. The mother had a small cough, and the daughter demanded that the mother be given antibiotics because when faced with the same problem a year before, the antibiotics had cured the problem immediately. Since the medical indications were that this was probably a viral infection, the participants (GPs) resisted the demand for antibiotics, which unleashed a torrent of complaints and anger from the virtual daughter.81 Finally, 8 out of the 9 trainees prescribed the antibiotics, whereas 7 out of the 12 experienced doctors did so.
The results also suggested that for those in the experienced group, the greater their reported level of presence the lower the probability that they would administer the antibiotic. The use of this and many other types of scenario in the medical and other professions could be of great utility in training, and in preparing people for situations that they are almost bound to face eventually. Just as airline pilots first learn on simulators, so the same is likely to be true across a range of professions.

# 6. TRAVEL, MEETINGS, AND INDUSTRY

# 6.1. Virtual Travel

Using VR, it is possible that you may not need to have physically gone to a place to say that you have visited it. Sitting in your home you can be navigating the streets and shopping in Hong Kong, ascending Mount Everest, visiting the Taj Mahal, exploring the Forbidden City in Beijing, or even the landscape of Mars. You can watch at first hand ceremonies and customs from Polynesia to Greenland.82 This is an obvious and long-discussed application. There are various possibilities: to visit a place virtually before going there, to visit the place instead of going there, to have a business meeting virtually with remote partners, meeting in a shared virtual environment, have a break on a beach in the middle of the day in winter during your coffee break in the office; the possibilities are limited only by imagination and what technology can deliver at the time (which of course is always changing).

This is far from a new idea. Already two decades ago people in the travel industry were considering the "virtual threat to travel and tourism" (Cheong, 1995), arguing that "the perceived threat of virtual reality becoming a substitute for travel is not unfounded and should not be ignored. Virtual reality offers numerous distinct advantages over the actual visitation of a tourist site … that could result in the eventual replacement of travel and tourism by virtual reality." The advantages of VR suggested were (1) technology could eventually support "the perfect virtual experience" where the sun never stops shining (for one kind of holiday), or the snow is perfect (for another kind), there are no unruly (real) people around, and so on. (2) It is convenient – there is not the stress of traveling, it is significantly cheaper, there are no inconveniences. (3) Places could be visited that are not easily accessible (Mars is an extreme example). One could even travel in the past or to fantasy worlds. (4) People who are unable to travel because of illness or disability would easily be able to do so. (5) There are no risks – tropical diseases, accidents, and food poisoning. (6) There is no damage to the places visited. (7) Business travel could be simplified. However, Cheong (1995) goes on to discuss the reasons why this might not really be a threat – virtual immersion is not the same as really being there; it would be difficult in VR to engage in exchanges with the locals (like discussions in a market, learning to dance the Hula); there is a level of complexity and randomness in the real world that cannot be reproduced in VR; people might confuse reality and VR; and there would be problems with countries whose revenues depend greatly on tourism.

On the one hand, of course since 1995 tourism has not been replaced by VR (on the contrary – see the next section), but on the other hand, none of the objections above seem insurmountable (even revenue from tourism could be protected by some kind of royalty system). Moreover, as global warming becomes an increasingly serious prospect and threat, VR could provide a way of lessening some of the negative impact of travel. An article by Guttentag (2010) suggested that VR could be useful for tourism for planning, management, marketing, entertainment, education, providing accessibility to inaccessible places such as archeological sites (see Section 4.5) with consequent heritage preservation. However, Guttentag wondered whether VR could ever provide an alternative to real travel, emphasizing a point made in Cheong (1995) that VR may never be able to substitute basic sensory experiences – "the smell of ocean spray" or make virtual surfing feel like the real thing. In other words, at the end of the day will VR ever be technically up to the mark in providing a genuine substitute for the real experience?

<sup>80</sup>https://www.youtube.com/watch?v=05jSp63-W7c&list=PLjjzAm1HXwJOFD6aG9vCYHL4cFoYef6ya

<sup>81</sup>https://www.youtube.com/watch?v=KhcnvdKbHrM&feature=youtu.be

<sup>82</sup>See an example from Marriott https://travel-brilliantly.marriott.com/our-innovations/oculus-get-teleported

In this section, we do not attempt to answer this question, since the answer cannot be known. Rather, we describe what has already been accomplished in this realm across a variety of applications that require some kind of travel. Perhaps, VR is not meant to be a substitute for real travel but just another form of travel, no less valid in its own terms than all that physically boarding the real aeroplane entails.

# 6.2. Remote Collaboration

The contribution of travel to the world economy is colossal. According to the World Travel and Tourism Council (WTTC, 2015), travel and tourism generated \$7.6 trillion in 2014, amounting to 10% of global GDP. It also accounted for 10% of all jobs (277 million), with the travel economy growing faster than other sectors such as health, financial services, and automotive. See also the extensive statistics produced by the World Tourism Organization UNWTO.83 On the other side, travel comes with significant costs (Reford and Leston, 2011). The first obvious one is the potentially disastrous impact on the planet's environment (Zhou and Levy, 2007) including the negative impact on health of air pollutants – e.g., Curtis et al. (2006) and Kampa and Castanas (2008) – see, for example, a meta-analysis by Mustafić et al. (2012) that reports a clear relationship between many of the associated pollutants and the near-term risk of heart disease. A second problem is especially in regard to business travel. In the US alone, \$283B was spent on business travel in 2014.84 However, such travel can be disruptive both to the business and the personal life of the traveler (Gustafson, 2012) including contributing to family conflict and burnout (Jensen, 2014). Nevertheless, for business (let alone personal and family relationships) face-to-face contact is thought to be essential. Even if face-to-face meetings can be substituted by one of the various forms of teleconferencing systems available, it has been suggested that these types of virtual meetings may even generate greater physical travel (Gustafson, 2012).

In an analysis of the relationship between air travel and the possibilities offered by videoconferencing in the past four decades Denstadli et al. (2013) did not find any clear picture and certainly not the case that videoconferencing might substitute air travel. Based on the analysis by Jones (2007), it is argued that face-to-face meetings are important for completing projects across international sites, maintaining commitment to strategic plans and shared organizational culture, knowledge sharing, creativity, and new services. There are of course related issues such as trust, using business meetings to get away from the office from time to time, taking the opportunity to meet friends or relatives in remote locations, and so on. Hence, face-to-face meetings seem to be essential, and interestingly it is precisely those who travel the most who engage in most videoconferencing meetings. Hence, there is a complex relationship between the two. Nevertheless, in the study of Denstadli et al. (2013) (*n* = 1413), of those who had access to videoconferencing tools one-third said that they believed that some air travel could be replaced by videoconferencing. For example, probably some readers of this article would have experienced the situation of several hours of travel to attend or speak at a 1-h meeting and then to travel home shortly afterward – sometimes wondering what the point of it all might have been. Can VR be of benefit in this domain?

<sup>83</sup>http://www2.unwto.org

In this section, we briefly review the possibilities offered by immersive VR as a means for enabling remote communication and collaboration. We consider a virtual environment that is shared between multiple participants. Each participant is represented by a virtual body (an "avatar") and can see the representations of the others. Ideally participants' movements are tracked, they can move through the virtual environment, and can talk to one another. Hence, they are in a 3D stereo surrounding space along with others. Of course, there are several technical issues involved in how to realize such a system (Steed and Oliveira, 2009), such as how and where to distribute the computation (one master machine broadcasting to all the others or a distributed network?), how to keep the various participant environments synchronized with one another so that they are all able to perceive the same consistent environment etc., but these issues are not considered here. In its *ideal form*, such a system must be superior to videoconferencing – since for example, the latter cannot display spatial relationships, eye contact, and so on. However, an ideal form of a shared VR would require real-time full facial capture, eye tracking, real-time rendering of subtle emotional changes such as blushing and sweating, subtle facial muscle movements such as almost imperceptible eyebrow raising, the possibility of physical contact such as the ability to shake hands, or embrace, or even push, and so on. Such a system does not exist today, though it is one to strive for. Some of these capabilities might be realized with the type of VR referred to as 360° surround, but we defer the discussion of this to Section 7.2. In the following section, we review some of what has been achieved and what the likely prospects are.
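To make the first of the design options mentioned above concrete (one master machine broadcasting to all the others), the following is a minimal in-memory sketch, not a description of any of the cited systems. All class and field names (`AvatarState`, `Client`, `Master`, `seq`) are hypothetical, and no real networking is involved; sequence numbers stand in for the handling of late or duplicated messages.

```python
from dataclasses import dataclass, field


@dataclass
class AvatarState:
    """Pose of one participant's avatar (hypothetical minimal fields)."""
    position: tuple = (0.0, 0.0, 0.0)
    head_yaw_deg: float = 0.0
    seq: int = 0  # sequence number of the last update applied


@dataclass
class Client:
    """A participant's machine holding a local copy of the shared world."""
    name: str
    world: dict = field(default_factory=dict)

    def receive(self, snapshot):
        # Replace the local copy with a copy of the master's snapshot.
        self.world = {name: AvatarState(s.position, s.head_yaw_deg, s.seq)
                      for name, s in snapshot.items()}


class Master:
    """Single authoritative machine that broadcasts to every client."""

    def __init__(self):
        self.world = {}
        self.clients = []

    def join(self, client):
        self.clients.append(client)
        self.world[client.name] = AvatarState()
        self.broadcast()

    def submit(self, name, position, head_yaw_deg, seq):
        """Apply a tracking update unless it is stale or duplicated."""
        if seq <= self.world[name].seq:
            return False
        self.world[name] = AvatarState(position, head_yaw_deg, seq)
        self.broadcast()
        return True

    def broadcast(self):
        for client in self.clients:
            client.receive(self.world)
```

Because every change passes through the master, all clients hold the same world state after each broadcast. A distributed design would avoid this single point of failure at the cost of a more complex consistency protocol.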

# 6.3. Shared Virtual Environments

Probably, the first published work where more than one person could simultaneously inhabit the same virtual environment was presented by Blanchard et al. (1990). This was the VPL system that allowed two people each with their own HMD (Eye Phone) and data glove to be simultaneously copresent in a virtual environment. Over the next few years, many systems provided this capability, typically extending it to multiple participants rather than two (Greenhalgh and Benford, 1995; Frécon and Stenius, 1998; Frecon et al., 2001). Today it is a matter of course that VR systems support this capability (Bierbaum et al., 2001; Tecchia et al., 2010), and recent VR development platforms of choice such as Unreal Engine or Unity3D are also multi-participant systems.

<sup>84</sup>https://www.ustravel.org/research/travel-industry-answer-sheet

So, the capability for virtual environments shared by multiple participants has been around for a long time, supported by many platforms, and realized in massive online systems such as Second Life, although typically non-immersively. The work by Apostolellis and Bowman (2014) is a good recent illustration of collaboration in a learning context that was realized with screen-based displays. The early days of research in this area, apart from the technical issues of how to build systems, concentrated on exploiting the capabilities of VR to improve remote collaboration beyond what might be possible even in face-to-face communications – for example, the type of work reported in Benford and Fahlén (1993) and Koleva et al. (2001). However, the primitive representations of people (very crude block-like characters) due to the relatively limited graphics and processing power at the time made this of interest only in a research context.

Later work concentrated on exploring social dynamics within shared virtual environments. For example, the research described in Tromp et al. (1998); Steed et al. (1999); Sadagic and Slater (2000) and Slater et al. (2000) had three-person groups carry out a task together although they were physically in different places (including even different countries). This also compared the group dynamics in VR to real encounters and found that the dynamics were greatly influenced by the computational power and type of immersion. For example, the group leader that would emerge in VR was the one with an HMD rather than those interacting with the others on screen, but this same person was less likely to be the leader when the group met for real. Also, people were quite respectful of each other's avatars, notwithstanding their extreme simplicity – for example, avoiding collisions and apologizing when collisions invariably happened. Steed et al. (2003) carried this further by having pairs of people, one in London, UK, and the other in Gothenburg, Sweden, each in a Cave-like system, spend around 3.5 h working together. Some of the pairs were friends, and some were strangers. They found that the partners could collaborate well on spatial tasks, where the avatars representing their whole bodies played an important role. However, on other negotiation tasks, where facial expression would be quite important to gauge the intentions of the other, the friends did better together than the strangers. A review of this type of avatar-mediated communications can be found in Schroeder (2011).

Although during the 2000s the graphics power to display more realistic human avatars in real time and in large numbers became available, the type of "ideal" system mentioned earlier still was far from possible. Nevertheless, researchers began to address critical aspects of non-verbal communications that can make remote face-to-face interactions in virtual environments effective, such as shaking hands (Giannopoulos et al., 2011; Wang et al., 2011). Steptoe et al. (2008) introduced eye tracking as a way to determine the gaze of each individual avatar in virtual meetings between three remote participants (one in London, one in Salford, and the other in Reading, UK) each in a Cave-like system. Analysis suggested that participants automatically used gaze direction much as they would in a similar conversation in reality. This was followed up by Steptoe et al. (2010) who showed that eye tracking data that allowed avatars to be rendered showing gaze direction, blinking, and pupil size resulted in participants being able to better detect one another telling lies compared to a video conferencing system. This was between two participants in different physical places one using a Cave-like system and the other a power wall. Another recent idea for remote collaborative working is for each party to use a whiteboard, where they would see a silhouette of the remote person, like a shadow, on the white board. It was found that participants tended to act as if they were in the presence of the remote person (Pizarro et al., 2015). Although a lot of work on such avatar-mediated communication during this period took place using projection systems such as Caves, Dodds et al. (2011) used HMDs to embody two remote people in the same environment. They found that body tracking, in particular showing arm gestures, played an important role in bidirectional communication between the partners. 
When, for example, the gestures of the avatar of one of the partners were replaced by prerecorded animations then the communication was not as successful in task achievement.

A combination of HMD and Cave system was used for a case study of remote acting, where two actors rehearsed a short scene using a script from The Maltese Falcon movie85 (Normand et al., 2012a). One actor was in Barcelona wearing a full motion capture suit and a wide field-of-view high-resolution HMD. The other actor was in a Cave in London and had some level of body tracking (arm gestures). The two were in the same virtual environment, and each could see and hear the avatar representing the other. A director was in a separate room in London. He could see and hear the scenario on screen, and video of the director's face was streamed in real time to both actors. Therefore, the director could communicate to the actors and tell them where to stand, what to say, how to improve their performance – generally act like a director.86 The professional actor involved in London concluded that such a system could be used for remote acting rehearsal, especially for aspects such as blocking, which concerns the spatial locations and movements of actors, lines of sight, and so on. This work was followed up by Steptoe et al. (2012), who again had an actor in Barcelona in VR, where she saw a virtual representation of the remote London scenario. She was represented to the other actor as a wall-screen avatar with a spherical display for her head, and the director was in the Cave. See also Steed et al. (2012) for a description of the technology. Observers from the Royal Academy of Dramatic Art commented on the positive potential uses of such a system for rehearsal and blocking, that is, the arrangements and lines of sight of actors at the different stages of a play. Of course, again the lack of facial expression shown on the avatars is a drawback in these types of system.

Another drawback is the lack of touch – if one participant touches the avatar of another then typically nothing would be felt. Bourdin et al. (2013) set up an application where two remote people wearing an HMD and body-tracking suit interacted with a third person (an experimenter) who was in a Cave, so that all three saw representations of one another in a shared virtual environment. The experimenter had the task of persuading the other two to sing together. As part of the persuasion, she could touch the avatars of the two participants on the shoulder, upon which they could feel a vibration from a small actuator located on their shoulder. Thus touch was used as part of the persuasion.87 Earlier Bailenson et al. (2007) carried out experiments using haptic-only virtual environments where they showed that touch helped in the communication of emotions between people, both with respect to recognizing emotions recorded as haptics earlier by others, and with respect to simultaneous communications between remote partners. Their paper also contains a review of the field and a theoretical model. Basdogan et al. (2000) using a haptic-only environment carried out a series of experiments, which also found that haptic feedback could impart critical information in remote communications. This work culminated in a "hands across the Atlantic" experiment where remote participants, one in London, UK, and the other in Cambridge, MA, USA, carried out joint tasks together such as lifting an object that they saw on screen and using haptics to help in the communication between them (Kim et al., 2004). Apart from describing the technological issues involved in setting up such a system, the results showed that the haptic feedback improved the sense of copresence, that is, that the remote participants felt that they were together.

<sup>85</sup>http://www.imdb.com/title/tt0033870/

<sup>86</sup>https://www.youtube.com/watch?v=c9bLWQhbJz0

# 6.4. Virtual Beaming

One obvious way to introduce haptics into remote VR-enabled communication is to actually use physical representations of people in the form of remotely controlled robots. This was envisaged and implemented in the very early days of VR. Fisher et al. (1987) described a telerobotic control system developed at NASA Ames (CA, USA), where the participant wearing a head-tracked HMD and other tracking, audio, and tactile feedback equipment received visual input from the cameras mounted on a remote robot. The robotic body essentially visually substituted the person's own body, therefore appearing to be colocated somewhat like the discussion of embodiment in Section 2.1.1. Recently, this idea of the symbiosis between a person in VR being represented remotely as a humanoid robot has seen some new applications as a particularly exciting form of remote collaboration where the participants are given physical form in the remote place. Here, the participant uses VR to perceive the remote location in full stereo with head- and body-tracking but is represented as a humanoid robot in the remote location. The humanoid robot moves as a function of the real-time body tracking of the participant, who can speak (through the robot) to local people in the remote location. It is a further and up-to-date realization of what was presented in Fisher et al. (1987) except now for the purposes of remote collaboration.

An example was shown in a BBC interview.88 The BBC interviewer in London (Technology Correspondent Rory Cellan-Jones) interviewed a scientist in Barcelona who was fitted with a wide field-of-view head-tracked HMD and a body-tracking suit. She was represented as a humanoid robot that was in the same room as the journalist in London. Her movements captured by the motion capture suit were transmitted across the Internet to the robot and applied to it so that it moved almost synchronously and in correspondence with her. A Skype connection allowed her to speak through the robot, whose mouth opened and closed in sync with her speech. Cameras fitted as the eyes of the robot transmitted video back to the HMD, so that she saw the surrounding London environment in stereo. Since the HMD head tracking data were transmitted and applied to the robot head, she could look around the room in London and converse with the BBC interviewer. The technology used was described in Spanlang et al. (2013). The same technology was used to beam journalist Nonny de la Peña from Los Angeles (CA, USA) to Barcelona. In Los Angeles, she wore the body-tracking suit and HMD. She was represented as the humanoid robot in Barcelona. Embodied as the robot, she conducted a debate between three students on the issue of Catalan independence from Spain and also interviewed a scientist about his research on HIV.89
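The outbound half of the data flow described above (tracked movements and speech applied to the robot) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation described in Spanlang et al. (2013): the joint names, the joint limits, and the `speech_level` threshold are all hypothetical, and a real humanoid robot would need a full retargeting step rather than simple clamping.

```python
from dataclasses import dataclass

# HYPOTHETICAL joint limits in degrees; a real robot's limits would come
# from its own documentation.
ROBOT_LIMITS_DEG = {
    "head_yaw": (-80.0, 80.0),
    "head_pitch": (-30.0, 30.0),
    "left_shoulder_pitch": (-90.0, 90.0),
    "right_shoulder_pitch": (-90.0, 90.0),
}


@dataclass
class TrackingFrame:
    """One sample from the participant's HMD and body-tracking suit."""
    joint_angles_deg: dict  # joint name -> angle from motion capture
    speech_level: float     # 0.0 (silent) to 1.0 (loud), from the audio link


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def to_robot_command(frame, speech_threshold=0.1):
    """Map a tracking frame onto what the robot can physically do."""
    targets = {}
    for joint, (low, high) in ROBOT_LIMITS_DEG.items():
        # Joints the robot does not have are silently dropped.
        if joint in frame.joint_angles_deg:
            targets[joint] = clamp(frame.joint_angles_deg[joint], low, high)
    return {
        "joint_targets_deg": targets,
        # The mouth opens whenever the participant is speaking.
        "mouth_open": frame.speech_level > speech_threshold,
    }
```

The return path (stereo video from the robot's cameras to the HMD) is not shown. Clamping is one simple way to handle the mismatch between human and robot ranges of motion, which is one reason the robot can only move "almost" in correspondence with the participant.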

The idea is reminiscent of "beaming" in Star Trek. Instead of a person being physically decomposed, transmitted to a remote place, and then recomposed there, a person in VR has their movements and speech transmitted to the remote place and applied to a humanoid robot. Sensory data – vision, sound, and touch – are transmitted back from the robot's sensory apparatus to the person, who perceives them in VR. The locals in the remote place interact with the robot that is embodied by the beamer. The beamer, however, through the VR becomes present in the remote place. This has also been used by journalist Nonny de la Peña to beam from London, UK, to Barcelona to interview neuroscientist Dr. Perla Kaliman about food for the brain.90 This journalism resulted in a news article about the results of the interview itself, rather than about the system used to realize it91 (Kishore et al., 2016).

The same kind of beaming setup has been used to create a shared environment between a small animal and a human. Normand et al. (2012b) showed a human participant in VR interacting with a virtual human, which in fact was a tracked rat in a cage 12 km away. Simultaneously, the rat interacted with a rat-sized robot, whose movements were in fact determined by the tracked movements of the remote human. Hence, each interacted with an entity at its own scale (the rat with a small robot, the human with a human-sized avatar), leading to interspecies communication. This type of setup is of value in ethology. In an article on animal geography and related issues, Hodgetts and Lorimer (2015) wrote in reference to this work that "… it is claimed that the human and the rat were able to participate in a purportedly playful meeting of species that seems straight from the pages of science fiction. Such experiments in adjusting scale do little to shift power dynamics in interspecies communication. Nor does the lab maze create anything more than a novel environment for encounter. Yet the prospect of engaging with animal worlds in more embodied, interactive and exploratory ways opens new avenues for developing richer accounts of animal lifeworlds."

The issue of non-verbal communications is critical for face-to-face communications, and as we have mentioned above there are attempts to overcome this problem, for example, using eye tracking to animate the eyes of avatars. Telerobotics enables physical presence and to some extent the conveyance of body language, depending on the extent of body tracking and the capabilities of the robot; however, facial expression remains a problem, even though some robots can do this. Nevertheless, the subtle cues of which we are not consciously even aware in communication are not rendered. One way out of this problem has been explored through the combination of animatronics and "shader lamp" technology. Shader lamps project computer-generated images onto neutral objects so that observers would see the simple object as animated. In particular, an animated human face can be projected onto, for example, a spherical or egg-shaped object, thus making it appear as if the physical object were an animated face. Moreover, the face could be one that is captured by face-tracking or video from a remote person. Lincoln et al. (2009) proposed and implemented shader lamps for the faces of remote people projected onto animatronic puppets. The participant could be far away seeing the real surroundings of the puppet through a VR, and his or her face back-projected onto a shell, so that an observer of the puppet would see video of the real face of the distant person, and be able to interact with that person.92 Some research has suggested that this type of technology, where faces are displayed on physical objects, in this case a spherical display, can improve the aspects of trust in remote communications (Pan et al., 2014) (see Presentation S5 in Supplementary Material).

<sup>87</sup>https://www.youtube.com/watch?v=gc8ySZHZLC0

<sup>88</sup>http://www.bbc.com/news/technology-18017745

<sup>89</sup>https://www.youtube.com/watch?v=FFaInCXi9Go (in Catalan and English).

<sup>90</sup>https://www.youtube.com/watch?v=I58wF9f3\_a0

<sup>91</sup>The news article was published in Latino LA and focused solely on the substantive issue of food for the brain, rather than the system that was used for the interview: http://latinola.com/story.php?story=12654

Slater and Sanchez-Vives Enhancing Our Lives with VR

# 6.5. Interacting by Thought

The descriptions above of embodiment in remote robots through which social interaction can take place with distant people are reminiscent of movies such as *Avatar* (see text footnote 11) and *Surrogates*.93 The fundamental difference is that whereas in the systems above people move their remote robotic bodies through their own deliberate movement (realized through real-time motion capture), in the vision presented in these movies, the remote representation is moved through a brain interface. The participant only has to think or imagine moving the remote body, and the corresponding cyborg or robot body moves (in the movies perfectly) just as if they were moving their own real body. To a limited extent, this has been achieved today. For example, Millan et al. (2004) were able to control a mobile robot through non-invasive brain recordings or BCIs. Leeb et al. (2006) described their research with a tetraplegic patient who was able to use a BCI to navigate through a virtual environment presented in a Cave. He triggered his movement entirely by the voluntary production or halting of a specified electrical brain signal (EEG pattern).94 The same motor-imagery paradigm was used for the voluntary control of an arm belonging to the participant's virtual body (Perez-Marcos et al., 2009), resulting in an illusion of ownership over the virtual arm. BCI was used in a telepresence application for disabled patients by Tonin et al. (2011), although the patients did not see the remote environment *via* VR but rather video on a PC display. Nevertheless, this demonstrated the possibility. A survey of the use of BCI in VR and games was presented by Lécuyer et al. (2008).

Martens et al. (2012) demonstrated that a number of whole body tasks could be realized by a participant wearing an HMD embodied in a remote robot controlled through various BCI paradigms. Participants could pick and place objects, and engage in a game. This study also illustrated how the BCI could be used to recognize the intentions of the participant (for example, pick up a glass) and the robot would execute and complete the intention (since non-invasive BCI today simply does not permit the fine control necessary).

The lack of fine motor control results from the fact that most BCI systems use non-invasive scalp electrodes and therefore record brain signals of low spatial resolution. For patients who cannot otherwise move, acting in the world through the motor control of a robot is a possibility that may justify (invasive) brain implants. Small electrodes placed in the cortical tissue record the activity of groups of neurons with higher spatial resolution, allowing the control of finer movements. Wessberg et al. (2000) first showed that direct recording from the neurons in monkeys enables them to control quite sophisticated movements of a remote robot arm without using their own real arm. A similar approach has been used in people with tetraplegia who could successfully control robotic arms through brain implants (Hochberg et al., 2006, 2012). Moreover, depending on what the actuators may encounter, feedback can be used to stimulate appropriate groups of neurons that cause different tactile sensations. This was realized by O'Doherty et al. (2011) in monkeys that were able to move a virtual arm that touched virtual objects distinguished only by their texture. Such technology could be used to drive prostheses that replace missing limbs, or exoskeletons that move actual but paralyzed limbs, or virtual bodies experienced in immersive VR or remote physical robots or cyborgs.

The latter possibility is the vision of *Avatar* and *Surrogates*. In each case, people perceive through the senses of their remotely embodied cyborg or robot and act in the world through those bodies. In John Scalzi's novel *Lock In*95 people suffering from "locked in syndrome" are present in the world through such robot embodiment. Although these are works of science fiction they are beginning now to be technically feasible and almost surely are going to be realized with the advance of neuroscience, VR, and robotic technology. For example, Kishore et al. (2014) showed how BCI could be used to embody people in a remote robot through which they could gesture and maintain a conversation with the people there.96,97

The "Embodiment Station" reported by Leonardis et al. (2014) was inspired by the setup in *Surrogates*. The Embodiment Station is a large chair that is a mobile platform that can induce force feedback (see text footnote 97 from minute 2:50). The participant is fitted with an HMD and has a multitude of physiological responses recorded and various different types of stimulation

<sup>92</sup>https://www.youtube.com/watch?v=eQLr83Co-GI

<sup>93</sup>https://www.youtube.com/watch?v=UGwQ74cH5O0

<sup>94</sup>https://www.youtube.com/watch?v=cu7ouYww1RA

<sup>95</sup>http://us.macmillan.com/lockin/johnscalzi

<sup>96</sup>https://www.youtube.com/watch?v=iGurLgspQxA

<sup>97</sup>https://www.youtube.com/watch?v=XUg990uZjEo

applied to his or her body. The participant may be embodied in a virtual body or remote physical body.

People in *Avatar* are shut into a tubular structure that monitors their brain and provides feedback so that they become embodied into a remote genetically engineered cyborg body. Cohen et al. (2014b) [see also Cohen et al. (2012)] show how to use real-time fMRI to decode particular thoughts of participants so that they are able to embody a virtual character98 and control a remote robot thousands of kilometers away (Cohen et al., 2014a).99 Although of course the degree of control and the level of embodiment are generations away from what is depicted in Avatar, it is nevertheless a clear step along the road toward this vision (see Presentations S6 and S7 in Supplementary Material).

# 6.6. Industrial Applications and Design

During the 25 years when VR was supposedly dead, or at best confined to University laboratories, industry was busy using it to develop products, inventing new methods of manufacturing, assembly and training, maintenance, and shopping. We briefly review some work in this area.

In a major review of the use of VR in car manufacture, Lawson et al. (2016) pointed out that VR can be used for design, avoiding the complex and expensive procedure of building physical mockups. With a mockup, any small change can result in major new work, whereas VR is far more flexible in this regard. VR is also used for virtual manufacturing, that is, as part of the preparation, planning, and risk assessment in the manufacturing process, and it is clearly also invaluable for training. VR can be used for learning the assembly and disassembly of parts. Data from an in-depth survey revealed that VR was being used for a number of aspects of design, manufacture, and evaluation: to examine the look of the vehicle (including product reviews with clients), for motion capture of manufacturing procedures, and for reviews relating to ergonomic use of the vehicle.

There has been significant work on industrial assembly, training for maintenance and remote maintenance – for example, Gavish et al. (2011, 2015) and Seth et al. (2011). This is also enhanced by the possibility of mixed reality where a participant in a VR can see their own hands incorporated into the virtual environment (Tecchia et al., 2014; Sportillo et al., 2015).100 Immersive VR is also being used for automobile testing.101

In another context, Tiainen et al. (2014) found that customers were equally at home in evaluating furniture presented virtually as physically. Indeed, they made more suggestions for design improvements in evaluations of the virtual products. Customers designing aspects of the interior of automobiles is also being prototyped using HMD-based VR.102

Virtual reality has also been used in the clothing industry where powerful computer graphics-based cloth simulators are used to allow customers to virtually try on clothes on virtual representations of their own bodies (Hauswiesner et al., 2011; Magnenat-Thalmann et al., 2011; Sun et al., 2015). Although not yet used in an immersive way, such systems are bound eventually to be a normal part of shopping – as we will have our own body representations, trying on clothing in the comfort of our homes without the inconvenience of traveling, queues, and fitting rooms would be a possible major application.

A final example is a highly innovative potential application in the food industry. Ruppert (2011) describes how VR is used to study the behavior of shoppers in response to different kinds of packaging and layout in supermarkets. It is suggested that, where consumers want to buy healthier products, experimentation with different types of presentation could result in knowledge about how best to present such products so that they stand out for these types of consumer.

As argued by Lawson et al. (2016), VR can improve the prototyping, production, and evaluation processes in manufacture; it can also be part of the design process and, ultimately, of marketing. It also offers the possibility of consumers being involved in design and even designing aspects of the products that they will buy. In fact, VR combined with 3D printing could totally revolutionize how products are designed, manufactured, and delivered, giving enormous new power and possibilities to consumers103 (see Presentations S8 and S9 in Supplementary Material).

# 7. NEWS AND ENTERTAINMENT

We have already mentioned the potential benefits of VR for travel, for visiting remote relatives, and so on. Moreover, the use of VR in games is obviously going to be a huge area of application and one of the driving forces of the industry.104,105 There is a clear role also for immersive movies, where the participant plays a role within the story, somewhere between a game and a movie. These are such obvious applications of VR we are not going to discuss them further here. The chances are that any person first learning of VR in 2016 will do so because of a game or movie. In this section, we therefore concentrate on a quite novel field that VR opens up, which is the immersive presentation of news. This is usually called "immersive journalism." However, it is important to note that it is not the journalism that is immersive but the presentation of its results through immersive media, leading to the creation of a genuine new type of media for news reporting. We will consider the issues involved, including ethical issues, and finally discuss the differences between computer graphics-based VR and 360° video.

# 7.1. News and Immersive Journalism

The idea of immersive journalism is "the production of news in a form in which people can gain first-person experiences of the events or situation described in news stories" (de la Peña et al., 2010). Let's consider the main headlines (online) of the *Los Angeles Times* on January 23, 2016 and see what this might mean.

<sup>98</sup>https://www.youtube.com/watch?v=PeujbA6p3mU

<sup>99</sup>https://www.youtube.com/watch?v=pFzfHnzjdo4

<sup>100</sup>https://www.youtube.com/watch?v=3Q3ZC124Qbc

<sup>101</sup>https://www.youtube.com/watch?v=EP0olmaL4Xs

<sup>102</sup>https://www.youtube.com/watch?v=TOx4q711dY8

<sup>103</sup>https://www.youtube.com/watch?v=6nHw4RsNJ3Q

<sup>104</sup>http://www.cnet.com/news/virtual-reality-is-taking-over-the-video-game-industry/

<sup>105</sup>https://storystudio.oculus.com/en-us/henry/

# 7.1.1. Los Angeles Times January 23, 2016


If we compare the report with the VR version we can see that they reflect quite different purposes. In each row, the left side is the reporting of "news" ("Newly received or noteworthy information, especially about recent events," Oxford English Dictionary). There are masses of academic research studies and theories of what makes it into "The News" (as reported by newspapers, radio, TV, and of course now myriad online outlets). Interested readers could read, for example, a classic analysis by Galtung and Ruge (1965) who identify a number of factors that influence what events typically get into the news, and a follow-up study by Harcup and O'Neill (2001) who examined the earlier theory in the light of a content analysis of stories in three British newspapers. One factor in the theory is that events involving elite nations or persons are more likely to be newsworthy than those involving non-elites. For example, news in Western media is more likely to report on events in the USA, Europe, China, and Russia than in the Seychelles, except, for example, when events in other places directly affect those countries (e.g., events in the Middle East). The divorce of a movie star is far more likely to make it into the news than the divorce of your next-door neighbor (unless you happen to live next to a movie star). However, who decides what is important? This reflects another aspect of news: events are not just "out there" floating around, happening and then being selected by journalists according to some criteria and reported factually. Rather, it is an active process in which what is news is defined by journalists and by the multifarious interests and ideologies that make up particular media cultures (O'Neill and Harcup, 2008). For example, a President attends an important international event. If the President is a man, the reporting may focus on the event and its background. If the President is a woman, a great deal of attention may be instead paid to her clothing.106,107 News values can differ enormously between different organizations. What makes it into the equivalent of the left side of each row in the table above, and how it is reported, are not simply matters of fact.

Now considering the possible immersive VR versions there is quite a difference – the goal is not so much the presentation of "what happened" but to give people experiential, non-analytic insight into the events, to give them the illusion of being present in them. That presence may lead to another understanding of the events, perhaps an understanding that cannot be well expressed verbally or even in pictures. It reflects the fundamental capability of what you can experience in VR – to be there and to experience a situation from different perspectives. This is no more or less "objective" than news in traditional forms – what is selected, and how it is presented inevitably will reflect the interests, culture, political views of the journalists involved, and perhaps even more importantly their news organizations. There is no way around that, since what might be "news" is infinite, and *something* has to be selected.

<sup>106</sup>http://www.telegraph.co.uk/news/worldnews/europe/germany/9427863/Double-take-Angela-Merkel-steps-out-in-same-dress-she-wore-to-same-event-four-years-ago.html

<sup>107</sup>http://www.ft.com/intl/cms/s/2/10369810-aeaf-11e3-aaa6-00144feab7de.html#slide0

Moreover, how news in VR will be understood will also be actively shaped by the participant. Recall that in VR there are neither "users" nor "observers" but *participants* or *consumer-participants*. Even if you are just an observer without the actual ability to intervene, presence in VR is such that you will likely have the perception that ongoing events could affect you. Hence, the consumer of a news story in one medium becomes a participant in the virtual story in the other, the "immersive journalism" that creates a scenario to represent aspects of the news story in VR. However, there is a difference. Let's go back to the woman President attending an event. A VR rendition of this puts you in the scene in the 1PP of someone who attended and who was greeted by the President. She moves over to you, smiles, and says some words of greeting: to you. Assuming that the journalist had made every effort in visual reconstruction to be faithful to the original event, whether the clothes that the President is wearing stand out or not depends wholly on you, the perceiver. You may pay attention to them or not, you may see them as remarkable or not. If the journalist wanted to really point out to participants the clothing worn by the President, this is of course entirely possible in VR – whether openly or surreptitiously. However, if the goal is to try to be objective, then how certain aspects of the events are interpreted will depend more on the perceiver than on the designer. We will come back to some of these points later.

The first immersive journalism piece was developed in 2010 in Barcelona, Spain, and directed by journalist Nonny de la Peña with the help of digital artist Peggy Weil. It followed on from the idea of their 2009 interactive Second Life piece that portrayed a virtual Guantánamo Bay prison.108 The immersive news story was displayed in a Wide5 HMD by Fakespace (see text footnote 18) and incorporated body tracking. It established a pattern that was to be used by Nonny de la Peña in later productions, which was to use a mix of data from actual events combined with a computer graphics-based reconstruction. It relied on transcripts of the interrogation of Detainee 063, Mohammed Al Qahtani, at Guantánamo Bay Prison in 2002–2003. The scenario was in a single cell-like room, and the participant was embodied in a virtual character wearing an orange "jump-suit." From a 1PP, the participant's virtual body posture was shown in a stress position – one reportedly used for "harsh interrogations." The participant could see the virtual body either directly, by looking toward his or her own body, or in a virtual mirror. In fact, however, the participant was seated comfortably in a chair. The participant would hear an interrogation as if coming from a cell next door.109 In a case study (de la Peña et al., 2010), three participants were interviewed after their experience. All reported that even though they were seated comfortably, they felt discomfort, even pain, from the posture of their virtual body. This result, that the posture of the virtual body can actually influence participants' feelings of comfort or discomfort, has recently received further support (Bergström et al., 2016) (see text footnote 54). The three participants felt a foreboding that the interrogation in the next cell would soon shift to them. Although the participants had not been given any forewarning of the meaning of the event that they were to experience, one of them said: "During the experience I was kind of reminded of the news that I heard about the Guantánamo prisoners and how they feel and I really felt like if I were a prisoner in Iraq or some… war place and I was being interrogated." This illustrates the difference between the left column (traditional reporting of news) and right column (news in VR) in the table above. The left column might be a written piece about harsh interrogation methods, or a TV news piece illustrating aspects of this. On the right-hand side, however, there is experience. Of course, this is not the real experience, but it may give participants insight into how some aspects of the situations depicted might have been.

"Hunger in Los Angeles"110 was a subsequent piece by Nonny de la Peña. This puts participants in a food line in Los Angeles where one of the people in the queue faints due to diabetes, and the various characters around react. It was based on an actual event and blended real sound recordings with computer graphics. The virtual characters in the food line were animated through the motion capture of actors. It was experienced by hundreds of people at the Sundance Film Festival in 2012. The 2014 World Economic Forum featured "Project Syria" by de la Peña, which depicted a bomb explosion in a Syrian town and its aftermath (see text footnote 110). This followed the same pattern of being based on an actual event and starting from video and audio from the real scenario. Further pieces on the same lines are "One Dark Night"111 about the shooting of teenager Trayvon Martin and "Kiya" about an incident of domestic violence and murder112 (recall the fifth item in the table above).

An alternative to using computer graphics to reconstruct events is the use of 360° video. A scenario is captured using a special camera, and software subsequently stitches the video together to form a completely surrounding scene that can be displayed in an HMD. Due to head tracking, the viewer can look all around the scene, and depending on how it has been captured, it can also be displayed in stereo. We will return to the technology in Section 7.2.

"Waves of Grace"113 by Gabo Arora (Senior Advisor and Filmmaker, United Nations) and Chris Milk (Vrse.works) uses 360° video to recreate the true story of a survivor of Ebola in Liberia. They also created "Clouds over Sidra," a documentary about a child refugee in the Syrian war.114 Louis Jebb, founder, and Edward Miller, head of visuals, of Immersiv.ly use 360° video to create immersive news events. Some examples have been the coverage of unrest in Hong Kong115 and a 360° VR experience of the paintings of the artist Gretchen Andrew on a self-guided interactive tour of a computer-generated recreation of the De Re Gallery in Los Angeles.116 The Des Moines Register working with

<sup>110</sup>https://www.youtube.com/watch?v=SSLG8auUZKc

<sup>111</sup>http://www.emblematicgroup.com/#/one-dark-night/

<sup>112</sup>http://www.emblematicgroup.com/#/kiya/

<sup>113</sup>http://vrse.works/creators/chris-milk/work/waves-of-grace/

<sup>114</sup>https://www.youtube.com/watch?v=FFnhMX6oR1Q

<sup>115</sup>http://www.hongkongunrest.com/vr-player.html

<sup>116</sup>http://virtualrealityderegallery.com

<sup>109</sup>https://www.youtube.com/watch?v=\_z8pSTMfGSo

Dan Pacheco produced a documentary called "Harvest of Change,"117 which combines computer graphics-generated VR and 360° video, can be viewed in an Oculus HMD, and provides an in-depth study of the situation of farmers in Iowa. The New York Times has started VR news based on 360° video, using Google Cardboard as the means of display, and has created a number of stories with this technology.118 The BBC is also experimenting with 360° HMD-based news,119 for example, providing experience of the refugee crisis.120

At the same time as the great enthusiasm for VR in this domain,121 there are also warnings about its ethics. For example, in an excellent and comprehensive article on potential problems, Tom Kent (Standards Editor, Associated Press and Columbia University) urges "an ethical reality check for virtual reality journalism."122 The first point concerns the depiction of reality. For example, "Hunger in Los Angeles" was a reconstruction using computer graphics for the display. It was not the real thing. It is important for consumer-participants to always be made aware of this, and it should form part of the ethics code being devised by digital journalists.123 However, it is important to note that all journalistic reporting necessarily involves transformation and cannot possibly ever depict every aspect of reality. At the moment that the news camera focuses on the face of a politician, it of course misses everything else that is happening at the same time, some of which may change the meaning of the facial expression. Depicting any event with its infinite aspects and nuances in any media whatsoever necessarily involves a transformation. As we argued above, everything from what is selected to how it is portrayed involves myriad choices. VR is no different in this regard. It can be argued that in VR a journalist could, for example, change the facial expression of a protagonist from a friendly smile (as it was in reality) to an arrogant grin. This could happen deliberately or by accident. However, how different is this from taking a small sentence in a speech of a politician out of context, thus distorting its meaning away from that intended? The use of VR requires ethical standards no more or less than conventional news reporting.

Another point relates to 360° video-based pieces, where there is an issue of image integrity. Since the Associated Press does not allow manipulation of images, should particularly disturbing parts of a scene on a battlefield or bomb site be left in or not? Again, this is nothing special for VR. Of course a 360° view is less selective than a single camera shot or normal video shot. There are conventions where images are "distorted" though – such as blurring the faces of vulnerable people in order to protect them. It is not clear why such conventions could not be applied in the same way. This has nothing really to do with VR. As we argued in Section 1.1, VR is a medium where conventional approaches will eventually be overtaken by a new paradigm. Today, shooting a 3D movie inevitably draws on the conventions of traditional movie making, so that problems of inclusion are paramount, since 360° in principle shows "everything." New paradigms will eventually overcome this problem.

The third point is that there may be competing views of what happened in any event, so VR portraying one version may not reflect the diversity of views. This also has nothing to do with VR. In fact, VR may have an advantage that it is possible to relive a scenario from multiple points of view – from the viewpoints of different protagonists, which may sometimes even explain why they describe an event quite differently. The 1950 Japanese movie "Rashomon"124 received international acclaim for doing this – depicting a story from the multiple points of view of the characters involved. Another version was released in 1964 called "The Outrage."125 VR could excel in such multi-viewpoint recreations.

Tom Kent argues that since VR is excellent for producing empathy and identification with characters who may be experienced as being physically close to consumer-participants, journalists have a special responsibility to make sure that their piece is balanced. For example, if they have the goal of producing sympathy toward particular people or situations they could emphasize aspects that provoke empathy or leave out balancing information that could be inconvenient to their story. This is of course true but again it applies no less to conventional media. It could be argued though that VR is particularly adept at raising emotions and therefore unwitting consumer-participants might be more easily manipulated. This may be true. For example, we have seen in Section 2.1.2 how embodying White people in a Black body appears to reduce their implicit racial bias against Black people (Peck et al., 2013). However, we also saw in Section 4.4 that in a fight between two virtual characters about soccer teams, only participants who supported the same team as the victim tended to try to intervene to stop the fight (Slater et al., 2013). People did not change their behavior simply as a result of being near a virtual character that was attacked by another. In other words, people are not like sponges that just soak up whatever emotion is poured into them. In the racial bias example, participants were generally not explicitly biased, so in reducing their implicit (i.e., largely non-conscious) bias perhaps they were being helped toward realizing their own non-biased preferences. Imagine a VR scenario that placed a United States Democrat supporter into a Republican rally or an English vociferously anti-European voter into the heart of the Brussels decision-making community. Is either of these likely to change their views as a result? 
Of course, research is needed on this issue, but people should not be considered as empty vessels ready to be filled by whatever propaganda comes along. At the end of the day if a journalist wants to present a particular viewpoint they will do so with whatever means they have, so that the critical requirement is

<sup>117</sup>http://www.desmoinesregister.com/pages/interactives/harvest-of-change/

<sup>118</sup>http://www.nytimes.com/newsgraphics/2015/nytvr/

<sup>119</sup>http://bbcnewslabs.co.uk/projects/360-video-and-vr/

<sup>120</sup>http://www.bbc.co.uk/taster/projects/we-wait

<sup>121</sup>http://www.nytimes.com/2016/01/21/opinion/sundance-new-frontiers-virtual-reality.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=mini-moth&region=top-stories-below&WT.nav=top-stories-below&\_r=0 (NYT Feature "Where Virtual Reality Takes Us").

<sup>122</sup>https://medium.com/@tjrkent/an-ethical-reality-check-for-virtual-reality-journalism-8e5230673507#.ftgz6i1v3

<sup>123</sup>https://ethics.journalism.wisc.edu/resources/digital-media-ethics/

<sup>124</sup>http://www.imdb.com/title/tt0042876/

<sup>125</sup>http://www.imdb.com/title/tt0058437/

openness, information about potential distortions, and appropriate ethical standards.

The final main point made in the article by Tom Kent is that the virtual environment is a circumscribed world, and of course the scenario is embedded in a wider world in which other related events may be happening. On the one side, the VR gives the impression to participants that they can freely go wherever they want, but of course the specific virtual environment has boundaries outside of which nothing can be perceived. This is a problem of selection, applying no less to other news media. When you are reading a story in a newspaper is it the whole story? Of course not, and it never can be.

Arguments about the ethics of VR miss the point that it is not the only way or even the "best" way to deliver news (or indeed any story at all, whether supposedly real or fictional). Just as VR is not going to replace novels in the form of books, it is not going to replace traditional media. It is another medium, another method for the production and display of narrative, providing a different kind of "information," providing a different kind of emotional engagement. These are not "better" or "worse" but just different. You can read about the refugee camp at Calais in France full of people wanting to enter the UK, or you can visit there virtually,126 or really go there. Each of these will provide quite different information and responses. One may give facts and figures and talk about policy and implications for the future of the European Union, another may show the physical and emotional plight of particular people in that camp. Visiting the camp virtually might lead someone already so inclined to do something to try to help the individuals concerned, but not necessarily result in a change in their political convictions about immigration. What is important is that all types of journalism follow ethical standards, and this applies no matter what the medium (see Presentation S10 in Supplementary Material).

# 7.2. 360° and VR

There is some discussion about whether 360° video, as used in some of the pieces described above, is "really" VR. For example, Will Smith in an article in Wired127 argued that systems such as 360° video as might be seen through Google Cardboard should not be called "VR," the main argument being that the relationship between head movements and image changes is more likely to lead to simulator sickness in 360°. However, this battle has already been lost. Mainstream media are already referring to 360° video as VR, and that is not going to change.

In order to consider this question, we return to the concept of "immersion" discussed in Section 1.3. Immersion refers exclusively to the technical affordances of a system. Different types of immersion may give rise to different types of subjective experience, but this is a different issue. One system is "more immersive" than another if the first can be used to simulate the second. This can classify all systems into what mathematicians call a "partial order." It is partial because not all pairs can be classified in this way – there may be two systems where neither can be used to simulate the other.128 Now, if we consider 360° VR as video captured in a real setting and displayed in a head-tracked HMD then that can, in principle, be entirely simulated by a computer graphics rendering of the same scene, but not *vice versa*. By a graphics rendering of the scene we mean one based on a computer model (the model ultimately describes all the geometry, material properties, lighting, and dynamics of objects in the scene). Since there is a model, participants can change their point of view to anywhere within the scene. For example, they can move close to any object and then circle around it while observing it. If the viewpoint is restricted to only a few specific points, from each of which the viewer can turn around and look 360°, then this is equivalent to 360° VR. However, 360° VR cannot allow participants the full range of movement through the scene, to be able to observe any object arbitrarily from any angle.
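The ordering described above can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: each system is reduced to a set of capability labels, and one system is taken to be able to simulate another when its capabilities include all of the other's. The system names and capability labels are hypothetical and are not taken from the text.

```python
# Hypothetical capability sets (illustrative labels, not a taxonomy from the text).
SYSTEMS = {
    "model_based_vr": frozenset({"head_rotation", "head_translation", "stereo"}),
    "video_360_hmd": frozenset({"head_rotation", "stereo"}),
    "mono_hmd_with_haptics": frozenset({"head_rotation", "haptics"}),
}


def can_simulate(a: str, b: str) -> bool:
    """Assumption: a can simulate b when a offers every capability b offers."""
    return SYSTEMS[b] <= SYSTEMS[a]


def compare_immersion(a: str, b: str) -> str:
    """Place a pair of systems in the partial order."""
    a_sim_b = can_simulate(a, b)
    b_sim_a = can_simulate(b, a)
    if a_sim_b and b_sim_a:
        return "equivalent"
    if a_sim_b:
        return "more immersive"
    if b_sim_a:
        return "less immersive"
    return "incomparable"


if __name__ == "__main__":
    print(compare_immersion("model_based_vr", "video_360_hmd"))
    print(compare_immersion("video_360_hmd", "mono_hmd_with_haptics"))
```

In this toy setting the model-based system sits above 360° video, while the two systems in the second comparison cannot be ranked against each other, which is what makes the order partial rather than total.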

In normal vision based on natural sensorimotor contingencies, when we see one object obscuring another, we can move our head and in principle see completely behind the obscuring object. This can be done with correct perspective and head movement parallax in graphics-based VR. It cannot be done, or can be done only to a very limited extent, in 360° video. Graphics-based VR can be restricted to simulate 360° video, but not *vice versa*. Therefore, there is a fundamental technical difference that will, by definition, always persist between 360° and model-based VR, and technically model-based VR has greater immersion in this classification of systems.

Ultimately, this means that they are useful for different purposes. If the VR is meant to depict something up-close and personal, such as interaction with a virtual character where the participant and virtual character might be arbitrarily changing their positions in the space, then this cannot be accomplished by 360°, since this type of parallax effect (e.g., just moving the head to see behind the character) is simply not possible unless every possible move that the participant was going to make was determined in advance and camera data made available for these possibilities. On the other hand, for a large-scale scene such as witnessing street protests, as in Immersiv.ly's Hong Kong protests mentioned above, 360° is sufficient. Provided that the designers did not intend a participant to be able to move up close to any arbitrary protestor for unplanned one-on-one interaction, this is fine.
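A rough calculation illustrates why the missing parallax matters close up but much less at a distance. The sketch below assumes a simplified geometry that is not taken from this article: an object straight ahead at a given distance, a sideways head movement, and a 360° video captured from a single fixed point so that head translation does not change the image. Under these assumptions the direction to the object ought to shift by atan(shift / distance), and that shift is the angular change the 360° video cannot reproduce. The function name and the numerical values are hypothetical.

```python
import math


def parallax_shift_deg(head_shift_m: float, distance_m: float) -> float:
    """Angle in degrees by which the direction to an object straight ahead
    changes when the head moves sideways by head_shift_m metres."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(math.atan2(head_shift_m, distance_m))


if __name__ == "__main__":
    head_shift = 0.10  # assumed 10 cm sideways head movement
    for distance in (0.5, 2.0, 10.0, 50.0):
        shift = parallax_shift_deg(head_shift, distance)
        print(f"{distance:5.1f} m -> {shift:6.2f} deg")
```

With these assumed values the expected shift is about 11° for a virtual character half a metre away, but only about a tenth of a degree for a scene 50 m away, which is consistent with the argument that 360° video is adequate for distant, large-scale scenes.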

Therefore, we would conclude that model or graphics-based VR and 360° VR are different possibilities in the domain that is referred to as "virtual reality," and designers and application builders will use the type of system that fits best with their goals. For close-up interaction, 360° will quickly break the natural sensorimotor contingencies that are necessary for the generation of presence. On the other hand, for large-scale scenes looking at objects far enough away, 360° is not only the simpler form of construction and rendering, but it is good enough in terms of sensorimotor contingencies. It is not either one or the other; both have their role. A major worry of Will Smith is that one would be confused with the other, and that people with poor experiences in 360° will therefore label "virtual reality" as poor. Sensible and careful use of both types of technology where they are most appropriate would avoid this possibility.

<sup>126</sup>http://www.fastcompany.com/3053219/fast-feed/virtual-reality-journalismis-coming-to-the-associated-press

<sup>127</sup>http://www.wired.com/2015/11/360-video-isnt-virtual-reality

<sup>128</sup>For example, we can say that coordinate (*x*, *y*) is "less than" (*z*, *w*) if *x* < *z* and *y* < *w*. This defines a partial order over the set of all such coordinates. (1, 2) is less than (3, 4), but there is no order between (1, 2) and (0, 3).

It should be noted that it is not the model-based solution in itself that is important here, but what it offers in terms of natural sensorimotor contingencies for perception. There will eventually be other solutions that are not model-based but offer the same. One likely solution will be based on light fields (Levoy and Hanrahan, 1996; Ng et al., 2005), which attempt to fully simulate the propagation of light through an environment, and therefore allow a viewer to dynamically move anywhere within a scene. The problem is that dynamic changes to objects, and especially changing lights, cannot easily be supported. Some recent developments for HMDs based on light field displays are discussed in Lanman and Luebke (2013).

# 8. CONCLUSION

# 8.1. Recent Novel Ideas and Applications

In this article, we have mainly reviewed developments in VR that have taken place since its origins in the 1980s, focusing on applications, and especially those with outcomes that have some level of research support. The field is changing extremely rapidly, and the inventiveness of people is amazing, with new ideas and projects emerging daily. Here, we briefly list some recent ideas that have caught our attention (as of May 2016). Mostly, these are ideas in progress, with no results, or maybe not even any level of implementation. They are presented in random order.

Mark Zuckerberg: Virtual Reality Might Be Coming to Your Baby Photos

https://www.youtube.com/watch?v=rACZOac1w8w The idea that VR may be used to share photos immersively.

Dreams of Dali http://thedali.org/dreams-of-dali/ A VR experience based on Dali's 1935 painting Archeological Reminiscence of Millet's "Angelus."

Visualizing Big Data

http://www.mastersofpie.com/project/winners-of-the-big-datavr-challenge-set-by-epic-games-wellcome-trust/ How "big data," in particular a longitudinal social survey, can be explored in HMD-based VR.

Topshop – London Fashion Week https://www.inition.co.uk/case\_study/virtual-reality-catwalk-show-topshop/ Attend the show using VR.

A History of Cuban Dance http://with.in/watch/a-history-of-cuban-dance/ A 360° VR documentary.

Second Life in VR

http://www.bizjournals.com/sanfrancisco/blog/techflash/2016/01/second-life-second-act-virtual-reality-sansar.html San Francisco Business Times reports "In virtual reality, Second Life prepares for its second act."

Megadeth in VR

https://www.youtube.com/watch?v=PnQAz8jWAh0 A YouTube documentary about Megadeth bringing heavy metal to VR.

In the Eyes of the Animal

http://www.sundance.org/projects/in-the-eyes-of-the-animal A Sundance Festival winner showing views of how the world might look to various animals.

Virtual Reality in Court

http://www.popsci.com/jurors-may-one-day-visit-crime-scenesusing-forensic-holodecks

A Popular Science report "Scientists Want To Take Virtual Reality To Court – Jurors May One Day Visit Crime Scenes Using Forensic Holodecks."

Project Nourished – A Gastronomical Virtual Reality Experience

http://www.projectnourished.com

"You can eat anything you want without regret."

Curing Cataract Blindness

http://www.ndtv.com/world-news/virtual-reality-could-be-thenext-big-thing-in-curing-cataract-blindness-1269591 NDTV report "Virtual Reality Could Be The Next Big Thing In Curing Cataract Blindness."

Oculus Quill https://www.youtube.com/watch?v=kPHWHJNTlkg Drawing in VR.

Producer of Acclaimed "First" Sets Sights on Anne Frank VR Experience

http://www.roadtovr.com/producer-of-acclaimed-first-setssights-on-anne-frank-vr-experience/

Plans for a historical VR reconstruction of aspects of the life of Anne Frank.

Step inside the Large Hadron Collider (360 video)—BBC News https://www.youtube.com/watch?v=d\_OeQxoKocU&index=1&list=PLS3XGZxi7cBXqnRTtKMU7Anm-R-kyhkyC "A 360 tour of CERN that takes you deep inside the Large Hadron Collider—the world's greatest physics experiment with BBC Click's Spencer Kelly."

And so on…

# 8.2. General Considerations

We have reviewed numerous applications of VR, many of which were already envisioned or developed in its earlier forms in the 1980s–1990s and have been more extensively developed and tested in the last 25 years. In most cases, the societal reach has been restricted given that the VR systems used (in combination or not with robotics, tracking, etc.) were too costly to move out of the research laboratories and reach consumers. There has nevertheless been significant testing and validation of potential applications in many different areas.

This article has shown that the applications of VR are very extensive and range across numerous domains of knowledge. This means that even though most people are likely to experience VR as a consumer product for games and entertainment, all advances and developments in VR will also have an impact on more specialized research and professional fields. More affordable systems will extend the reach of VR not only to final consumers but also to more developers and research groups, resulting in a much wider range of applications and content for VR in the near future.

Even though applications in psychology, medicine, education, or research will reach many, there are some sectors of the population that may also benefit directly from VR: those with reduced mobility for any reason, lesions, neurological disorders, or aging. To such people VR may provide a new space to move freely, interact, or work. This could be achieved by acting in VR through various means including motor action, BCIs, eye tracking, or physiological responses.

Finally, we also point out that, since the use of VR in these many application realms should be evidence-based, scientific papers should adhere to the highest standards of rigor and reporting. In the hundreds of papers we have reviewed in the preparation of this article, there are many that do not even say what type of equipment was being used. The term "virtual reality" has been overused: scientific papers are often simply talking about a PC display with a mouse, and the reader has to look very hard through the paper in order to discover that – if it is stated at all.

# 8.3. Speculations – "I've seen things …"

"I've seen things you people wouldn't believe; attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost, in time, like tears in rain. Time to die." (Replicant Roy Batty, near the closing scene of the movie Blade Runner).<sup>129</sup>

In the introduction to this article, we defined our notion of "immersion" as the "physics" of a system – how well it can afford people real-world sensorimotor contingencies for perception and action. We pointed out that this also offers a way of ordering systems – where one system is "more immersive" than a second if the first can be used to simulate experiences on the second, but not *vice versa*. We used this classification, for example, to show that model-based VR is "more immersive" than 360° VR, so that these have different functionality and uses.

Yet, this raises a paradox. Immersive VR simulates experiences of physical reality. Does that mean that VR is more "immersive" than reality? Like any paradox, this helps us to understand the underlying concepts. *There must always be some aspect of the VR that does not conform with reality.* This is certain. Why? Because were it not the case then what the participant experiences would be his or her reality! This is not word play but rather illustrates a fundamental aspect of VR. The reader may respond – "Yes, but it is only a matter of time before the graphics, sound, tracking, haptics, etc. become so advanced that people will not be able to distinguish a VR experience from a real one, just like nowadays it is becoming difficult to distinguish pictures or videos that are photographs of real world scenes from those that are wholly generated by graphics." However, in order for the VR to be indistinguishable from reality, the participant *would have to not remember that they had* "*gone into*" *a VR system*. Even if the devices become almost completely transparent and just a part of normal clothing, still the participant has to not know, in other words, has to forget that this is VR, has to forget pressing the button, or having the right thought in a BCI that commands: "Now put me into VR." If it goes so far that they do not remember getting into VR and they consider that they are directly perceiving physical reality, then *they are perceiving their own physical reality*.

When we think of VR we are typically thinking about experiences in the visual and auditory domains, rather than haptics (touch and force feedback). The field of haptics has excellent solutions for specific types of interaction, such as pushing a needle through soft tissue (as in medical applications), or using an exoskeleton to apply force feedback to an arm. However, unlike the visual and auditory fields, there is no *generalized* solution. By a generalized solution we mean a single device whereby participants in a VR can feel *anything* (just as a display can be programmed to display anything), for example, feel something when their virtual body accidentally brushes against a virtual wall or fall backwards when hit by a tidal wave of virtual water. As argued by Slater (2014), solutions to such issues may well have to go down the route of direct brain interfaces to solve such fundamental problems in a general way that can never be solved with external devices, which in the haptics domain always provide very specific stimuli. VR would become an applied branch of neuroscience in this view. Since, as we and others have argued before, our notion of reality is a constructed one, by activating the appropriate brain areas our perception in this type of VR based on direct neural intervention would be indistinguishable from perception of "reality." As the philosopher Thomas Metzinger has pointed out,<sup>130</sup> we are about to embark on an enormous process of new learning through mass availability of VR: "The real news, however, may be that the general public will gradually acquire a new and intuitive understanding of what their very own conscious experience really is and what it always has been" – that our conscious experience is one possible model – an interpretation – of the world.

Now, let us imagine the perfect VR system with *perfect immersion*, so perfect that for most people it is completely indistinguishable from reality – it *is* their reality (recall that they must not remember that they "went into VR" and likewise they must not know when they "come out of VR"). Again, seemingly paradoxically, in such a situation the notion of *presence* vanishes. There is no sense of presence in physical reality. Presence is the feeling of being transported to another place. This is why our notion of "place illusion" as "being there" includes the rider "…*in spite of the fact that you know for sure that you are not actually there.*" It contains an element of surprise: "I know I am at home wearing a HMD, but I feel as if I am in the Himalayas." In physical reality, there is no perceptual surprise, no feeling "Wow! Look at that, it is amazing that I am here!" (except, for example, as a way of expressing good fortune at being in a fabulous place). We are just "here." We do not comment on it or think about it from the perceptual point of view – only sometimes at the content of our perception – the scenery or surprising events. There is no special or remarkable feeling associated with being in a place. It is how things always are. The only time we might feel something unusual is when some aspect of our perception breaks – for example, through mental illness, hallucinogens, the aftermath of an injury – where we find ourselves outside of the reference frame of our normal perception. In the movie The Matrix,<sup>131</sup> almost everyone was living in perfect immersion, perfect VR. They only became aware of "presence" (i.e., that their world was illusory) at moments when the system failed.

<sup>129</sup>http://www.warnerbros.com/blade-runner

<sup>130</sup>http://edge.org/response-detail/26699 Edge "Virtual Reality Goes Mainstream: A Complex Convolution."

Hence, the illusion of presence actually represents the nonperfection of immersion. On one side, as we improve immersion more and more through technical advances, what this means in terms of "presence" is that the "wow" factor, the sensation of the difference between where we know ourselves to be and where we feel ourselves to be, i.e., the level of illusion, will become stronger and stronger. The shock of putting on the HMD and seeing an alternate reality in high-resolution, all around, with fantastic vision, sound, haptics, smell, taste, and full body tracking will become overwhelming. But, on the other side, when immersion becomes perfect – to the point that we do not in any way distinguish between perception of reality and VR even to the extent of not knowing when we are perceiving from one rather than the other – then presence will disappear.

However, it is also possible that the surprise element of "presence" will disappear for another reason. Imagine the generation that grows up where VR is just as much part of their lives as cell phones are today. Although they will distinguish reality from VR, their illusion of presence may diminish because the surprise element will disappear through acclimatization. Older generations today still marvel at being able to have real-time video connections at virtually zero added cost with people half way around the world, but a younger generation that is growing up with this finds it completely unremarkable. So, this new generation that grows up with VR will of course have the illusion of "being there" in VR, but it will be nothing special, and therefore there will be all the more reason that they will tend to behave the same in VR as they do in reality in similar circumstances. It will be like: Now I am at home. Now I am at school. Now I am in place X in VR. They will become equivalent perceptually, cognitively, and behaviorally. But, just as kids learn "Don't run in the school corridor," "Don't shout in the classroom," so they will learn different forms of behavior that apply to different places in different modes of reality. VR will have its own customs, norms of behavior, and politeness. Today all we can say is that however we imagine this might be – it won't be like that, since it will be the result of an unpredictable and complex product of technological advance and social evolution.

We have used the term "presence" slightly loosely here. Recall that there are two components: PI (resting as a necessary condition on sensorimotor contingencies) and Psi (the illusion that events are real). The latter is just as critical and maybe more difficult to get right in many applications. For example, in a real street we might avoid parking our car because we see a police officer standing nearby. On closer inspection we realize that the police officer is actually a manikin dummy. So we park. This is a failure of Psi of the dummy. In VR, we are enjoying talking to a very nice virtual person. Eventually, we realize that the virtual person is going through some repetitive actions and is not actually aware of what we are doing. We move away. This is a failure of Psi, even though our illusion of being in the place is intact. Both PI and Psi are critical components of successful VR applications.

Virtual reality, however, can deliver forms of Psi that have never existed in reality and yet still lead to the illusion of these happening. In Slater et al. (1996), we put people in a VR where they could play 3D chess (like in Star Trek). Not one person was shocked or made any comment about the fact that when they touched the chess pieces these would float in the virtual space to their next location. When asked about this one participant said: "Oh that's just how things behave in this reality." So Psi is a difficult concept. In some circumstances, expectations cannot be broken. In others VR can create new expectations that seem completely natural even though they could never happen in physical reality. This is something really worth understanding, and it is connected to our final point.

Virtual Reality encompasses virtual *un*reality. Almost all the applications we have reviewed, and a lot of what we see, translate something from reality into VR. A fear of heights application puts people … on a height. A fear of public speaking application puts people … in front of an audience. These are fine. However, maybe there are completely new ways to think about these types of applications that make use of the *amazing power to put people outside of the bounds of reality and have a positive effect*. Even though VR has been around for half a century, still not enough is known about it. The goal is to shape it to create moments that enhance the lives of people and maybe help secure the future of the planet.

And those moments need not be lost.<sup>132</sup>

# AUTHOR CONTRIBUTIONS

All the authors listed have made substantial, direct, and intellectual contribution to the work and approved it for publication.

# ACKNOWLEDGMENTS

Thanks to James Hairston of Oculus for his support of this work. In addition, the authors thank the following people who have provided images or video that appear in the Supplementary Presentations: Abderrahmane Kheddar, Aitor Rovira, Albert 'Skip' Rizzo, Anatole Lécuyer, Angus Antley, Anthony Steed, Antonio Frisoli, Barbara Rothbaum, Christoph Guger, Daniel Freeman, Doron Friedman, Emmanuele Tidoni, Ferran Argelaguet, Franck Multon, Franco Tecchia, Greg Welch, Henry Fuchs, Henry Markram, Hunter Hoffman, Jeremy Bailenson, Jordi Moyes Ardiaca, Larry Hodges, Louis Jebb, Lucia Valmaggia, Mark Huckvale, Nonny de la Peña, Pablo Bermell, Pere Brunet, Rafi Malach, Robert Riener, Salvatore Aglioti, Stephen Ellis, Sylvie Delacroix, Will Steptoe, Xueni (Sylvia) Pan, Yiorgos Chrysanthou, and Zillah Watson.

<sup>131</sup>http://www.warnerbros.com/matrix

<sup>132</sup>https://www.youtube.com/watch?v=NoAzpa1x7jU&feature=youtu.be ("I've seen things …" Blade Runner).

# REFERENCES


# FUNDING

This work was funded by Oculus VR, LLC, a Facebook Company.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/frobt.2016.00074/full#supplementary-material.


interaction patterns? *Transport. Res. C Emerg. Technol.* 29, 1–13. doi:10.1016/j.trc.2012.12.009


reality platforms," in *Proc. SPIE 9392, The Engineering Reality of Virtual Reality 2015* (San Francisco, CA: International Society for Optics and Photonics), 939202–939214.


Lanier, J. (2006). Homuncular flexibility. *Edge* 26, 2012. Available at: https://www.edge.org/response-detail/11182

Lanier, J. (2010). *You Are Not a Gadget: A Manifesto*. New York: Random House.


*Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection* (Limassol: Springer), 572–579.


Noë, A. (2004). *Action in Perception*. Cambridge, MA: MIT Press.


exposure therapy system for combat-related PTSD. *Ann. N. Y. Acad. Sci.* 1208, 114–125. doi:10.1111/j.1749-6632.2010.05755.x


**Conflict of Interest Statement:** The authors were approached by the company Facebook to write an article on potential applications of VR. After completion, the article was subject to a review by the Facebook legal team. There was neither implicit nor explicit encouragement to promote or favor any Facebook products or services. The authors were free to write about virtual reality as they wished. The work is a review of virtual reality in general and not related to any particular products, software, or services.

*Copyright © 2016 Slater and Sanchez-Vives. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*