# VIRTUAL AND AUGMENTED REALITY METHODS IN NEUROSCIENCE AND NEUROPATHOLOGY

EDITED BY: Valerio Rizzo, Thomas D. Parsons and Pietro Cipresso

PUBLISHED IN: Frontiers in Human Neuroscience

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-299-9 DOI 10.3389/978-2-88966-299-9

Topic Editors:

Valerio Rizzo, University of Palermo, Italy

Thomas D. Parsons, University of North Texas, United States

Pietro Cipresso, Istituto Auxologico Italiano (IRCCS), Italy

Citation: Rizzo, V., Parsons, T. D., Cipresso, P., eds. (2020). Virtual and Augmented Reality methods in Neuroscience and Neuropathology. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-299-9

# Immersive Virtual Reality as an Adjunctive Non-opioid Analgesic for Pre-dominantly Latin American Children With Large Severe Burn Wounds During Burn Wound Cleaning in the Intensive Care Unit: A Pilot Study

Hunter G. Hoffman<sup>1</sup>\*, Robert A. Rodriguez<sup>2,3</sup>, Miriam Gonzalez<sup>2,3</sup>, Mary Bernardy<sup>3</sup>, Raquel Peña<sup>2,3</sup>, Wanda Beck<sup>3</sup>, David R. Patterson<sup>4</sup> and Walter J. Meyer III<sup>2,3</sup>

<sup>1</sup> Department of Mechanical Engineering, College of Engineering, University of Washington, Seattle, WA, United States, <sup>2</sup> Psychiatry and Behavioral Sciences, University of Texas Medical Branch at Galveston, Galveston, TX, United States, <sup>3</sup> Shriners Hospitals for Children, Galveston, TX, United States, <sup>4</sup> Department of Rehabilitation Medicine, University of Washington, Seattle, WA, United States

#### Edited by:

Valerio Rizzo, University of Palermo, Italy

#### Reviewed by:

Marco Fyfe Pietro Gillies, Goldsmiths University of London, United Kingdom

Daniel Simon Harvie, Griffith University, Australia

\*Correspondence: Hunter G. Hoffman, hoontair@gmail.com

Received: 09 April 2019 Accepted: 11 July 2019 Published: 08 August 2019

#### Citation:

Hoffman HG, Rodriguez RA, Gonzalez M, Bernardy M, Peña R, Beck W, Patterson DR and Meyer WJ III (2019) Immersive Virtual Reality as an Adjunctive Non-opioid Analgesic for Pre-dominantly Latin American Children With Large Severe Burn Wounds During Burn Wound Cleaning in the Intensive Care Unit: A Pilot Study. Front. Hum. Neurosci. 13:262. doi: 10.3389/fnhum.2019.00262

Background/Aim: Using a within-subjects, within-wound care design, this pilot study tested, for the first time, whether immersive virtual reality (VR) can serve as an adjunctive non-opioid analgesic for children with large severe burn wounds during burn wound cleaning in the ICU. The study was conducted in a regional burn center in the United States between 2014 and 2016.

Methods: Participants included 48 children aged 6 to 17 years with burn injuries covering >10% of total body surface area (TBSA) who reported moderate or higher worst pain during No VR on Day 1. Forty-four of the 48 children were from developing Latin American countries. Patients played adjunctive SnowWorld, an interactive 3D snowy canyon in virtual reality, during some portions of wound care, vs. No VR during comparable portions of the same wound care session (initial treatment condition randomized). Using Graphic Rating Scales, children's worst pain ratings during "No VR" (treatment as usual pain medications) vs. their worst pain ratings during "Yes VR" were measured during at least 1 day of wound care, and for up to 10 study days on which the patient used VR.

Results: VR significantly reduced children's "worst pain" ratings during burn wound cleaning procedures in the ICU on Day 1. Worst pain during No VR = 8.52 (SD = 1.75) vs. during Yes VR = 5.10 (SD = 3.27), t(47) = 7.11, p < 0.001, SD = 3.33, CI = 2.45–4.38, Cohen's d = 1.03 (indicating large effect size). Patients continued to report the predicted pattern of lower pain and more fun during VR, during multiple sessions.

Conclusion: Immersive virtual reality can help reduce the pain of children with large severe burn wounds during burn wound cleaning in the Intensive Care Unit. Additional research and development are recommended.

Keywords: virtual reality, pain, pediatric burn injuries, analgesia, critical care, burn, opioid, developing countries

## INTRODUCTION

Acute pain is a frequent medical problem worldwide, but children with large severe burn injuries (e.g., 40% TBSA) experience some of the most painful procedures in medicine. During the course of their weeks in the hospital burn center's intensive care unit, children with large severe burns must have their wounds cleaned/scrubbed frequently to prevent infection and speed up healing. Opioid analgesics are widely regarded as effective and essential tools for acute pain management (Malchow and Black, 2008; Vijayan, 2011; McIntyre et al., 2016; Ballantyne, 2018; Krane, 2019). According to Berterame et al. (2016, p. 1664) "In developing countries, access to opioids is very limited. In 2009, more than 90% of worldwide use of opioid analgesics occurred in the USA, Canada, Australia, New Zealand, and several European countries. Use in that year was deemed low in 21 countries and very low in more than 100." Patients in Latin America often have limited access to opioids for pain control (used for both analgesia and anesthesia). Yet even in the U.S.A., there are currently shortages of pharmaceutical medical opioid analgesics needed for acute pain control during medical procedures (Davis et al., 2018). And because of a large increase in opioid related overdose deaths unrelated to burn patients (Chen et al., 2019), there is growing political and legal pressure to further reduce reliance on opioids for pain control in the U.S.A.

For patients treated with opioid pain medications (e.g., patients treated in regional hospital burn centers in the United States), opioid side effects (Dunwoody and Jungquist, 2018) limit dose levels, limiting analgesic effectiveness (Cherny et al., 2001; Malchow and Black, 2008; Clark et al., 2017; Ballantyne, 2018). And opioid tolerance/habituation is a challenge for patients with large severe burns (Bittner et al., 2015), who typically receive the same painful procedures over and over, several times per week, often daily, during several weeks of hospitalization. Excessive pain and/or repeated high opioid doses can pathologically alter the patient's pain perception system, disrupting the patient's natural endogenous opioid analgesia system (Schwaller and Fitzgerald, 2014; Ballantyne, 2018; Chambers, 2018), and can increase the patient's risk of developing chronic pain, anxiety disorders, and/or Post-Traumatic Stress Disorder (McGhee et al., 2011; Rosenberg et al., 2015, 2018; Pardesi and Fuzaylov, 2017; Peña et al., 2017).

Psychological factors such as fear, anxiety, and depression can increase or amplify how much pain patients subjectively experience during painful medical procedures (Hemington et al., 2017; Nitzan et al., 2019), making pain management even more challenging. What patients are thinking about during wound care, and where they direct their attention during medical procedures, can influence pain intensity (Melzack and Wall, 1965). For example, if patients predict wound care is going to be painful, that can make their pain worse. According to Fields (2018, p. S8) ". . . expectation of pain becomes a self-fulfilling prophecy through top down amplification of the pain signal," and memories of previous painful procedures can also increase pain intensity (Noel et al., 2015).

Fortunately, just as psychological factors can make pain worse, psychological treatments can help reduce acute pain during medical procedures. For example, distraction techniques (e.g., music) are widely used in clinical practice, and can be used in addition to traditional pain medications to help control pain during burn wound care. Some studies show strong benefits of music therapy during burn wound care in patients (e.g., Rohilla et al., 2018). But in other studies the benefits of listening to music during burn wound care had small effect sizes and/or nonsignificant results (Fratianne et al., 2001; Bellieni et al., 2013; van der Heijden et al., 2018), and/or involved patients with small burn wounds (e.g., 5% TBSA, Hsu et al., 2016).

For the extreme pain levels experienced by children with large severe burn wounds during burn wound debridement in the intensive care unit, creating stronger non-pharmacologic pain control techniques is a national and international priority (Keefe et al., 2018).

Immersive virtual reality is a promising new non-opioid psychological pain distraction technique. There is growing evidence that adjunctive immersive virtual reality distraction can significantly reduce how much pain patients experience during a growing number of different painful medical procedures, e.g., during urological endoscopies, physical therapy after surgery for cerebral palsy, venipuncture for onco-therapy, and pediatric dental procedures (Hoffman et al., 2011; Garrett et al., 2014; Scheffler et al., 2017; Atzori et al., 2018a,b; Indovina et al., 2018; Honzel et al., 2019).

Brain scan studies provide converging evidence that VR reduces acute pain. A laboratory functional magnetic resonance imaging (fMRI) study found that, in addition to reducing subjective pain ratings, VR reduced pain-related brain activity (Hoffman et al., 2004b). In a second fMRI brain scan study, the amount of pain reduction from VR alone was comparable to the amount of pain reduction from a moderate dose of hydromorphone, and "VR + opioids" combined resulted in the largest pain reductions (Hoffman et al., 2007).

The logic for why VR would reduce pain is based on an attentional mechanism (Hoffman, 1998; Hoffman et al., 2000, 2006). The essence of immersive virtual reality analgesia is the patient's illusion of going to a different place, the subjective experience of "feeling present" in the computer generated world, as if the virtual reality world is a place they are visiting (Slater and Wilbur, 1997). Human brains are limited in how much information they can process (Kahneman, 1973). Pain requires attention. Researchers argue that the illusion of "being there" in virtual reality is unusually attention grabbing, reducing the amount of attentional resources the patient's brain has available for pain perception (Hoffman, 1998; Hoffman et al., 2000, 2003; Hoffman et al., 2004a).

According to a gate control theory explanation of psychological analgesia (Melzack and Wall, 1965, p. 978), ". . . psychological factors such as past experience, attention, and emotion can influence pain response and perception. . . ." Melzack and Wall proposed that the brain may inhibit nociceptive signals.

Regardless of the mechanism, several small clinical studies have shown encouraging preliminary evidence that adjunctive VR can help reduce pain during burn wound care in adults (Hoffman et al., 2004a, 2011; van Twillert et al., 2007; Maani et al., 2011a,b; McSherry et al., 2018), and in children with small burns (Hoffman, 1998; Hoffman et al., 2000; Faber et al., 2013; Jeffs et al., 2014; Khadra et al., 2018). There is also preliminary evidence that VR is more effective than conventional distractions such as video games or movies. In the first study to report using immersive virtual reality for pain control during a medical procedure, two adolescent boys with large burn injuries underwent staple removals from healing burn skin grafts during immersive VR vs. while playing a Nintendo video game (no VR). Both patients reported large reductions in pain during staple removal in immersive virtual reality compared to their pain during staple removal while playing the (no VR) traditional Mario Kart Nintendo video game during the same wound care session (Hoffman et al., 2000). More recently, in a study by Jeffs et al. (2014), adolescent burn patients with small burns (5% TBSA) treated in an outpatient clinic reported significantly lower pain during virtual reality compared to a group that watched a movie during wound cleaning.

There are a number of barriers to using VR in the ICU tub room. The patients in the current study had a mean burn size of 40% Total Body Surface Area (TBSA). As is often the case for patients with such unusually large severe burn injuries, most of the burn patients in our study had head and face burns, preventing them from wearing a conventional commercially available head mounted VR helmet. Furthermore, even when treated with powerful pain medications, pain during burn wound care procedures in the ICU hydrotank is often "severe to excruciating," which may make it harder for children to concentrate enough to play in VR during wound care. In theory, pain may become so attention grabbing that psychological distraction techniques cannot compete with pain for the patient's limited attention (Eccleston and Crombez, 1999; Eccleston, 2001). In other words, some patients may not benefit from VR if their acute procedural pain becomes too intense. Similarly, traditional distraction may fail if patients feel threatened during the wound care (McCaul and Malott, 1984; Crombez et al., 1998). High catastrophizers (people who have unusually negative emotions and pessimistic beliefs about their ability to deal with the upcoming pain) may have difficulty disengaging attention from pain information (Verhoeven et al., 2012; Van Loey et al., 2018).

To address these challenges, using a custom water-friendly VR system, the current pilot study tests, for the first time, whether adjunctive virtual reality can reduce the acute procedural pain of children with large severe burn injuries during burn wound debridement/cleaning in the pediatric intensive care unit, in an understudied patient population: critically injured pediatric patients.

We hypothesize that compared to standard of care (standard pain medications + No VR), during adjunctive Yes VR, children will report significant reductions in worst pain ratings. Our secondary hypothesis is that during VR, children will report significant reductions in pain unpleasantness, and will spend less time thinking about pain during burn wound debridement in the ICU hydrotank. We further hypothesize that VR will increase how much fun patients have during wound care, and that patients will be more satisfied with their pain management during VR.

## MATERIALS AND METHODS

This research was conducted between Jan 2014 and Dec 2016, in accordance with the Declaration of the World Medical Association (www.wma.net). The studies were approved by the IRB from UTMB, and all participants and their parents/legal guardians provided written informed consent/assent in accordance with the Declaration of Helsinki.

Most of the children in the current study were transported from Latin America to Shriners Hospitals for Children in Galveston, Texas, U.S.A., where they were hospitalized and treated, and were returned to their country of origin post-discharge.

## Inclusion Criteria

Children were included in the study if they (1) were compliant and able to complete subjective evaluations, (2) had no history of previous psychiatric (DSM-III-R Axis I) disorder(s), (3) were not demonstrating delirium, psychosis, or any form of organic brain disorder, (4) were able to communicate verbally in English or Spanish, (5) had moderate or higher worst pain during No VR on Day 1, and (6) were admitted to Shriners Hospitals for Children, Galveston, Texas/University of Texas Medical Branch.

Children were excluded from the study if they (1) had a burn size <10% TBSA, (2) were not capable of completing the study measures, (3) required no wound cleaning sessions, (4) had a history of previous psychiatric (DSM-III-R Axis I) disorder(s), (5) were demonstrating delirium, psychosis, or organic brain disorder, (6) were unable to communicate verbally in English or Spanish, (7) had a history of significant cardiac, endocrine, neurologic, metabolic, respiratory, gastrointestinal, or genitourinary impairment, (8) were receiving prophylaxis for alcohol or drug withdrawal, (9) had a developmental disability, (10) were younger than 6 years old, (11) were older than 17 years old, (12) had burns of the eyes, eyelids, or face so severe that they precluded the use of VR equipment, or (13) reported a previous history of severe motion sickness.
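The screening rules above can be read as a simple checklist. The following is a minimal sketch of that checklist, not the authors' screening tool. The `Candidate` fields, the function name, and the numeric cutoff of 5 for "moderate" pain are illustrative assumptions; the text does not state a numeric threshold.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "moderate" worst pain is taken as 5 on the 0-10 scale;
# the paper does not give a numeric cutoff.
MODERATE_PAIN_CUTOFF = 5.0


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative."""
    age_years: int
    tbsa_percent: float               # burn size, % total body surface area
    day1_worst_pain_no_vr: float      # 0-10 rating during No VR on Day 1
    communicates_in_english_or_spanish: bool = True
    can_complete_measures: bool = True
    needs_wound_cleaning: bool = True
    psychiatric_history: bool = False
    delirium_psychosis_or_organic_disorder: bool = False
    significant_organ_impairment: bool = False
    withdrawal_prophylaxis: bool = False
    developmental_disability: bool = False
    face_burns_preclude_goggles: bool = False
    severe_motion_sickness_history: bool = False


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return every failed criterion; an empty list means eligible."""
    checks = [
        (c.tbsa_percent < 10, "burn size <10% TBSA"),
        (not 6 <= c.age_years <= 17, "age outside 6-17 years"),
        (c.day1_worst_pain_no_vr < MODERATE_PAIN_CUTOFF,
         "Day 1 No VR worst pain below moderate"),
        (not c.communicates_in_english_or_spanish,
         "cannot communicate in English or Spanish"),
        (not c.can_complete_measures, "cannot complete study measures"),
        (not c.needs_wound_cleaning, "no wound cleaning sessions required"),
        (c.psychiatric_history, "previous psychiatric disorder"),
        (c.delirium_psychosis_or_organic_disorder,
         "delirium, psychosis, or organic brain disorder"),
        (c.significant_organ_impairment, "significant organ system impairment"),
        (c.withdrawal_prophylaxis, "alcohol or drug withdrawal prophylaxis"),
        (c.developmental_disability, "developmental disability"),
        (c.face_burns_preclude_goggles, "face burns preclude VR equipment"),
        (c.severe_motion_sickness_history, "history of severe motion sickness"),
    ]
    return [reason for failed, reason in checks if failed]
```

Returning the full list of reasons, rather than a single yes/no, makes it easier to report why screened patients were not enrolled.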

## Equipment

The current study introduced, for the first time, a new portable water-friendly VR system customized for the unique needs of pediatric patients with large severe burn injuries during wound care in the intensive care unit hydrotank. As shown in **Figure 1**, a custom robot-like articulated arm goggle holder was used in the current study to hold a pair of VR goggles near the patient's eyes, so patients did not have to wear a VR helmet on their head. This "Magula arm" robot-like goggle holder minimized or ideally eliminated contact between the patient and the VR goggles. The VR goggles largely blocked the patient's view of the Intensive Care Unit hydrotank room. The goggles were MX90 VR goggles, from NVIS.com, with a 90-degree diagonal field of view per eye and 1,280 × 1,024 pixels resolution per eye. All of the VR equipment in the current study was battery powered. A battery powered laptop and battery powered audio-visual unit were used with the MX90 VR goggles. The 90-degree diagonal field of view goggles increased the amount of peripheral vision stimulated. During the VR condition, patients were encouraged to interact with the virtual environment via a wireless computer mouse. Stereo speakers helped isolate patients from hearing hospital sounds. The custom robot-like articulated arm goggle holder was securely mounted to the frame of the Anthro medical cart. The VR goggles orientation could be adjusted and locked into position for a patient who was sitting up during wound care, or the goggles could be rotated and locked into position for a patient who was lying on their back during wound care (see **Figure 2**). The goggles stayed in one position, and the patient used their wireless mouse to look around, aim and shoot snowballs in SnowWorld (mouse-tracking instead of head tracking).

FIGURE 1 | A patient playing SnowWorld during burn wound debridement in the ICU tankroom. Photo and copyright Hunter Hoffman, www.vrpain.com.

The portable robot-like arm goggle holder was designed by Hoffman and Magula and built by Jeff Magula, an advanced instrument maker at the University of Washington in Seattle. Once finished, the water-friendly VR system was safety inspected by Clinical Engineering at the University of Washington, and was inspected again by Clinical Engineering at Shriners Hospitals for Children. The equipment was also approved for use in the Intensive Care Unit and the equipment cleanliness was monitored by infection control at Shriners Hospitals for Children. After each use, the VR cart/portable VR system was returned to the Psychology Department at Shriners Galveston, where it was plugged in to recharge the batteries. As shown in **Figure 2**, the goggles were partially covered with disposable plastic, which was discarded after each use. The equipment was systematically disinfected after each use using chemical disinfectants, and was periodically supercleaned using ultraviolet radiation (using a portable UV lamp wand, for example, the UVC Blade Handheld Germicidal Fixtures by American Ultraviolet, with UV protective glasses and latex gloves). The VR system was periodically tested for pathogens, using swabs that were then analyzed by Shriners infection control, to test for the presence of bacteria. Culture samples (swabs) were sent to the microbiology laboratory at Shriners hospital in Galveston for immediate analysis. The post-cleaning tests all came back as "safe" (no pathogens). There was no significant problem with infection, using the current VR system, which minimized or eliminated physical contact between the patient and the VR goggles.

FIGURE 2 | A patient looking into VR goggles during burn wound debridement in the ICU tank room. Photo and copyright Hunter Hoffman, www.vrpain.com.

## MEASURES

After each wound care session, subjects received the following instructions once prior to answering each of five separate questions. "Please indicate how you felt during wound care today by making a mark anywhere on the line. Your response doesn't have to be a whole number."

For the primary dependent measure, pain was measured after the wound care session using Graphic Rating Scales (GRS) (Jensen and Karoly, 2001; Jensen, 2003). In the current study, the GRS tool was used to assess three reports of the pain experience ("worst pain," "pain unpleasantness," and "time spent thinking about pain") that correspond to three separable components of the pain experience: sensory pain, affective pain, and cognitive pain, respectively. The GRS is a 10-unit horizontal line labeled with number and word descriptors. Descriptor labels were associated with each mark to help the respondent rate pain magnitude in each domain. For worst pain, the GRS descriptors were no pain at all, mild pain, moderate pain, severe pain, and excruciating pain. For pain unpleasantness, the GRS descriptors were not unpleasant at all, mildly unpleasant, moderately unpleasant, severely unpleasant, and excruciatingly unpleasant. For time spent thinking about pain, the GRS descriptors were none of the time, some of the time, half of the time, most of the time, all of the time.

The Graphic Rating Scale has previously been used to assess pain intensity in children eight and older and has been documented to be the preferred report method for young children (Tesler et al., 1991). The GRS is more sensitive than simple descriptive pain scales and patients can easily answer these pain ratings despite having no previous experience. Visual Analog Scales have been validated for use in children aged 7 and higher (Bringuier et al., 2009).

A single rating, "to what extent did you feel like you 'went into' the virtual world," adapted from Slater et al. (1994), was also used in the present study to assess user presence in the virtual world. Descriptor labels were I did not feel like I went inside at all, mild sense of going inside, moderate sense of going inside, strong sense of going inside, I went completely inside the computer generated world. Hendrix and Barfield (1995) showed the reliability of a similar VR presence rating. The measure's ability to detect treatment effects (Hoffman et al., 2004c) is preliminary evidence of our VR presence measure's validity. Patients also rated how real the objects seemed in virtual reality; descriptors were completely fake, somewhat real, moderately real, very real, indistinguishable from a real object. Patients rated how satisfied they were with their pain management during No VR vs. during VR, with descriptors completely unsatisfied, mostly unsatisfied, half satisfied, mostly satisfied, completely satisfied. Patients also rated nausea as a result of VR, using a graphic rating scale with descriptors no nausea at all, mild nausea, moderate nausea, severe nausea, vomit. All text was translated into Spanish for Spanish-speaking participants by an official translator (90% of the participants in this study spoke only Spanish).

To assess whether patients in the upper quartile on catastrophizing showed pain reduction during immersive virtual reality, we administered the Pain Catastrophizing Scale for Children (PCS-C) (Sullivan et al., 1995; Crombez et al., 2003). The PCS total score is calculated by summing the responses to the 13 items, and provides a good index of the catastrophizing construct through the inclusion of highly correlated subscales of helplessness, rumination, and magnification. Higher scores on the PCS-C are indicative of greater pain-related catastrophizing. The PCS-C has been validated for use with children (Crombez et al., 2003).
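The catastrophizing total score and the upper-quartile grouping described above could be computed along the following lines. This is a hedged sketch rather than the authors' procedure: the 0 to 4 item range is an assumption about the questionnaire's response format (the text only says 13 items are summed), and the quartile method (Python's default "exclusive" method, with scores at or above the cutoff counted as upper quartile) is likewise an assumption. Function names are hypothetical.

```python
import statistics
from typing import List, Sequence

N_ITEMS = 13
# Assumption: each item is rated on a 0-4 scale, giving totals of 0-52.
ITEM_MIN, ITEM_MAX = 0, 4


def pcs_total(items: Sequence[int]) -> int:
    """Sum the 13 item responses into a total catastrophizing score."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(items)}")
    if any(not ITEM_MIN <= x <= ITEM_MAX for x in items):
        raise ValueError("item response outside the assumed 0-4 range")
    return sum(items)


def upper_quartile_flags(totals: Sequence[float]) -> List[bool]:
    """Flag patients whose total is at or above the 75th percentile.

    Assumption: the cutoff uses statistics.quantiles' default method;
    the paper does not say how the upper quartile was defined.
    """
    q3 = statistics.quantiles(totals, n=4)[2]
    return [t >= q3 for t in totals]
```

Pain ratings for the flagged subgroup can then be compared between No VR and Yes VR in the same way as for the full sample.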

## Experimental Design

There is high variability in the analgesic effectiveness of any given dose of pharmacologic analgesia from one burn wound care session to the next (Khadra et al., 2018). Furthermore, pain medication dose levels can also vary from day to day. For these reasons, in the current preliminary study, a statistically powerful within-subjects, within-wound care design was used (Maani et al., 2011a). Patients played SnowWorld, an interactive 3D snowy canyon in virtual reality, during some portions of wound care, vs. No VR during comparable portions of the same wound care session. Children's worst pain during "No VR" (treatment as usual pain medications) vs. their worst pain during "Yes VR" was measured during at least 1 day of wound care, and for up to 10 study days on which the patient used VR. Initial treatment order was randomized using blocked randomization, based on random number sequences generated using www.random.org. All patients received their usual pain medications on all study days, i.e., VR was always used adjunctively, in addition to usual traditional pain medications.

FIGURE 3 | SnowWorld. An icy 3D canyon in virtual reality. Image by Ari Hollander and Howard Rose, copyright Hunter Hoffman, www.vrpain.com.

During wound care, the nurses cut off and removed the patient's gauze bandages, and began cleaning the patient's burn wounds, using warm wet washcloths and a hand held warm water shower hose to scrub and rinse dead tissue and debris out of the burn wound. During wound debridement, patients received No VR and Yes VR during approximately equally painful portions of the same wound care session. Wound care proceeded in 5 min segments: for example, 5 min with Yes VR, then 5 min with No VR, then Yes VR for 5 more minutes, and so on, alternating between No VR and Yes VR every 5 min. Whether patients received Yes VR or No VR during the first 5 min treatment segment was randomized (blocked randomization using a random sequence generated at random.org). During the portions of their burn wound care in which they received VR, the research staff positioned the VR goggles weightlessly near the patient's eyes, with little or no physical contact between the VR goggles and the patient, using a robot-like-arm goggle holder (Maani et al., 2008). The patient looked into the VR goggles, and interacted with the virtual reality world.
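The session structure described above (a randomized first condition, then alternation every 5 min) can be sketched as follows. This is an illustration only. The study drew its random sequences from random.org, whereas this sketch substitutes Python's `random` module. The block size of 4, the function names, and the session length in the usage note are assumptions not given in the text.

```python
import random
from typing import List, Optional, Tuple

CONDITIONS = ("Yes VR", "No VR")


def blocked_first_conditions(n_patients: int, block_size: int = 4,
                             seed: Optional[int] = None) -> List[str]:
    """Assign each patient's first condition using blocked randomization.

    Each block holds equal numbers of both conditions in shuffled order,
    so the two starting conditions stay balanced as enrollment proceeds.
    Assumption: block_size=4 is illustrative; the paper gives no block size.
    """
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    order: List[str] = []
    while len(order) < n_patients:
        block = list(CONDITIONS) * (block_size // 2)
        rng.shuffle(block)
        order.extend(block)
    return order[:n_patients]


def segment_schedule(first: str, session_minutes: int,
                     segment_minutes: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start_min, end_min, condition) segments that alternate."""
    if first not in CONDITIONS:
        raise ValueError(f"first must be one of {CONDITIONS}")
    other = CONDITIONS[1] if first == CONDITIONS[0] else CONDITIONS[0]
    schedule = []
    for i, start in enumerate(range(0, session_minutes, segment_minutes)):
        end = min(start + segment_minutes, session_minutes)
        schedule.append((start, end, first if i % 2 == 0 else other))
    return schedule
```

For example, `segment_schedule("No VR", 20)` yields four 5 min segments that start with No VR and alternate with Yes VR, so both conditions cover comparable portions of the same session.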

All patients used SnowWorld (see **Figure 3**) during all VR sessions. SnowWorld is a non-profit VR world specifically designed for pain distraction of immobilized severe burn patients, including children. SnowWorld is designed to give burn patients the illusion of going inside a snowy 3D canyon (Hoffman et al., 2001, 2004b,c; see also Bloemink et al., 2006, pp. 104–106). In SnowWorld (www.vrpain.com), patients interacted with snowmen, igloos, penguins, wooly mammoths, and flying fish by throwing snowballs, using a wireless computer mouse to aim and trigger snowballs while keeping their heads and bodies motionless. During VR, patients heard music (e.g., Paul Simon's song Graceland, and several Spanish songs), and 3D sound effects, e.g., ice breaking when a snowball hits a snowman. Mammoths trumpeted angrily when pelted.

After the wound care session was over, patients briefly rated how much pain they had experienced during No VR vs. during Yes VR using graphic rating scales. The patient's burns were rebandaged, the patient was wheeled back to their hospital room and returned to their hospital beds, and the research staff thoroughly cleaned and disinfected the VR equipment.

## Statistical Analyses

IBM SPSS (2018) statistical analyses of the primary and secondary hypotheses involved an a priori two-tailed within-subjects paired t-test, with alpha = 0.05.

## RESULTS

Patients participated between January 2014 and December 2016. Out of the 62 patients initially screened, 48 pediatric patients met our a priori inclusion criterion of having a moderate or higher "worst pain" rating during No VR on Day 1 (33 Hispanic male children and 11 Hispanic female children from developing Latin American countries, and also three non-Hispanic female children and one non-Hispanic male child from the United States). The mean size of the patients' severe burn injuries was 40 percent Total Body Surface Area (TBSA) burned, 28% third degree burns. Patients' ages ranged from 6 to 17 years at time of enrollment (mean age was 12 years old). Seventy-seven percent of the patients had hand burns, 85% had arm burns, 44% had foot burns, 79% had leg burns, 71% had neck/head burns, 79% had trunk/torso burns, and 23% had groin burns. Regarding the (sometimes overlapping) etiology of their burns, 81% had burns involving flame, 6% scalds, 25% electrical, and zero patients had chemical burns.

## Test of Our Primary Hypothesis

The patients' GRS pain ratings on Day 1 are shown in **Table 1** and **Figure 4**. On Day 1, on a zero to 10 graphic rating scale, using a paired t-test, VR significantly reduced children's "worst pain" ratings during burn wound cleaning procedures in the ICU. On Day 1, worst pain during No VR = 8.52 (SD = 1.75) vs. during Yes VR = 5.10 (SD = 3.27), t(47) = 7.11, p < 0.001, SD = 3.33, CI = 2.45–4.38, Cohen's d = 1.03, indicating a large effect size.
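As a rough consistency check, the reported test statistic and effect size can be recomputed from the summary values above. The sketch below assumes that the reported SD = 3.33 is the standard deviation of the paired differences and that Cohen's d was computed as the mean difference divided by that SD; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
import math


def paired_t_from_summary(mean_a, mean_b, sd_diff, n):
    """Recompute a paired t statistic and Cohen's d from summary values.

    Assumptions: sd_diff is the SD of the paired differences, and
    d = mean difference / SD of the differences.
    """
    mean_diff = mean_a - mean_b
    standard_error = sd_diff / math.sqrt(n)
    t_stat = mean_diff / standard_error
    cohens_d = mean_diff / sd_diff
    return t_stat, cohens_d


if __name__ == "__main__":
    # Day 1 worst pain: No VR = 8.52, Yes VR = 5.10, SD = 3.33, n = 48.
    t_stat, cohens_d = paired_t_from_summary(8.52, 5.10, 3.33, 48)
    print(f"t({48 - 1}) = {t_stat:.2f}, Cohen's d = {cohens_d:.2f}")
```

Under these assumptions the recomputed values agree with the reported t(47) = 7.11 and d = 1.03 to within the rounding of the published means and SD.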

## Descriptive Statistics About "Worst Pain" Ratings

On Day 1, the number of patients reporting excruciating pain (worst pain = 10) during wound care was 22 patients, which dropped to only five patients reporting excruciating pain (worst pain = 10) during Yes VR, and Cohen's d showed a strong effect size of VR analgesia. However, many of those patients with pain of 10 during No VR only dropped to 8 during VR (i.e., they dropped from excruciating pain during No VR down to severe pain during VR, but still reported severe pain during adjunctive VR).

On Day 1, 40% of the 48 patients still reported pain of 7 or higher (severe to excruciating) during VR, despite receiving powerful traditional pharmacologic pain medications combined with immersive virtual reality.

On average, patients spent 16.56 min of wound care during No VR vs. 12.89 min during VR, t(44) = 2.47, p < 0.05, SD = 9.97, CI = 0.67–6.66 (e.g., patients could not use VR while having their faces or heads cleaned). On Day 1, 14 of the 48 patients spent exactly the same amount of time during No VR (13.21 min) and during VR (13.21 min). These 14 patients also reported large and statistically significant reductions in pain during VR, worst pain during No VR = 8.50 (SD = 1.83), VR = 4.43 (SD = 3.08), t(13) = 4.56, p < 0.005, SD = 3.34, CI = 2.14–6.00.

The mean number of days that patients rated their pain during Yes VR vs. during No VR was 4 study days. Collapsed across days, VR significantly reduced worst pain: Worst pain during "No VR" (Mean = 7.09, SD = 2.10) vs. worst pain during "Yes virtual reality" (Mean = 4.29, SD = 2.55), t(47) = 7.32, p < 0.001, SD = 2.65, CI = 2.01–3.57, Cohen's d = 1.06, large effect size.

Consistent with the prediction that VR would continue to reduce pain when used day after day, a one-way within-subjects ANOVA comparing worst pain during "No VR" minus worst pain during "Yes VR" difference scores for days 1–7 showed no significant difference in the size of the VR analgesia effect over days 1–7, Wilks' Lambda F(4,6) = 1.50, p = 0.36, NS.

In exploratory analyses, patients scoring in the upper quartile on the children's pain catastrophizing score (PSC-C) in the current sample showed significant VR analgesia. For patients scoring in the upper quartile on catastrophizing, mean worst pain during No VR = 7.00 (SD = 3.56), vs. VR = 2.86 (SD = 3.63), t(6) = 2.80, p < 0.05, SD = 2.56, CI = 0.34–5.09. Patients scoring in the lower quartile in the current sample also showed significant VR analgesia, mean worst pain ratings during No VR = 6.00 (SD = 4.04), and during VR = 2.86 (SD = 2.85), t(6) = 4.01, p < 0.01, SD = 1.03, CI = 1.61–6.67.

## Test of Secondary Hypotheses

As shown in **Table 1** and **Figure 4**, on secondary GRS measures, on Day 1, pediatric burn patients reported large and significant reductions in pain on secondary measures of "pain unpleasantness" and "time spent thinking about pain during wound care." Although children reported having 27% more fun during VR, the increase in fun on Day 1 was not statistically significant in the paired t-test. The children were significantly more satisfied with their pain management during VR, on average. Patients reported only a moderate illusion of "being there" inside the 3D computer generated world as if it was a place they visited. VR nausea was nearly zero (<1 on a 10 point scale).

The current study included 48 pediatric patients total. As shown in **Table 2**, in an exploratory analysis, to see if children from developing countries show VR analgesia, the subset (sub-analysis) of 44 patients from developing Latin American countries was analyzed separately from the four patients from the United States. As predicted, children from developing countries showed significant reductions in worst pain during VR, as well as significant reductions in pain unpleasantness (the emotional component of pain) and significant reductions in time spent thinking about pain during wound care (the cognitive component of pain). Encouragingly, analyzed separately, the four participants from the United States also showed the predicted patterns of large reductions of pain during VR.

TABLE 1 | Means (Standard Deviation) in "No-VR" condition vs. "Yes-VR" condition.

All 48 patients (44 children from developing countries and also 4 children from the USA).

## DISCUSSION

This pilot study provides preliminary evidence that immersive virtual reality can help reduce the pain of children with large severe burn wounds during burn wound cleaning in the Intensive Care Unit. Although using VR in the ICU hydrotank room was challenging and required creating custom equipment, in the current study, on Day 1, patients reported significant reductions in worst pain (pain intensity), children spent less time thinking about their pain during VR, children reported significant reductions in pain unpleasantness, and the children reported 27% higher ratings of fun during wound care during virtual reality. In addition, these pediatric patients were also significantly more satisfied with their pain management during virtual reality, they reported a moderate illusion of presence in VR (i.e., a moderately strong illusion of "being there" in the VR computer generated world during wound care), and VR nausea was nearly zero (<1 on a 10 point scale). Patients who received VR during more than 1 day of wound care continued to report the predicted pattern of reductions in worst pain during multiple wound care sessions. And patients with a tendency toward negative emotions and pessimistic beliefs about their ability to deal with the upcoming pain (i.e., patients in the upper quartile on catastrophizing), still benefitted from virtual reality distraction.

The reductions in worst pain ratings in the current study are similar to the pattern of VR analgesia reported in previous studies of 12 U.S. soldiers with combat-related burn injuries (TBSA of 21%) during wound care in their hospital beds. The soldiers spent 6 min in No VR vs. 6 min of wound care during VR (Maani et al., 2011a). In the current study the mean burn size was over 40%, the patients were all children, and the sample size was larger (n = 48 patients). Furthermore, in the current study, on average, patients spent over 12 min in VR and over 12 min in No VR, the wound care was conducted in the ICU instead of in the patients' hospital beds, and the current study is the first to use a portable water-friendly VR system.

TABLE 2 | Means (Standard Deviation) in "No-VR" condition vs. "Yes-VR" condition.

Sub-analysis of only the 44 children from developing Latin American countries (excluding the four children from the USA).

## LIMITATIONS

The demographics and characteristics of the participants of this pediatric pain study may limit generalization of the findings of this study to other populations. Of interest is that 44 of the 48 patients were Spanish speaking patients from developing Latin American countries. As predicted, in an exploratory sub-analysis, the 44 children from developing Latin American countries showed statistically significant reductions in pain during VR. Encouragingly, analyzed separately, the four participants from the United States also showed the predicted patterns of large reductions of pain during VR. The VR system used in the current study was customized for use in the ICU hydrotank room, for patients with head and facial burns. Future randomized controlled trials are needed to determine whether the current results replicate, and generalize to other VR systems.

Despite these limitations, the current study makes several important original contributions to the literature, and the results of the current study could have important implications for clinical practice: (a) this is the first study ever to attempt to use virtual reality during burn wound care in the intensive care unit, (b) the patients had unusually severe burn injuries much larger than burn injuries treated in any previous burn debridement VR analgesia study, (c) all of the patients were children, and 44 out of the 48 patients were Spanish speaking children from developing Latin American countries, (d) the current study shows for the first time that children with large severe burns were generally able to play SnowWorld during severely painful medical procedures, and (e) playing SnowWorld in virtual reality significantly reduced worst pain ratings during wound care.

In the current study, a custom portable water-friendly VR system was used that did not have to physically contact the patient. The equipment was carefully cleaned with sterilizing cloths after each use, and the equipment was periodically swabbed/tested by the hospital's infection control team to test for the presence of any bacterial or viral pathogens. There was no problem with infection in the current study, using the custom VR system, which minimized or eliminated physical contact between the patient and the VR goggles. For patients with limited ability to wear VR helmets, modified VR systems that reduce contact surfaces (Hoffman et al., 2014) are highly recommended for use of VR during burn wound care for patients with severe unbandaged head and/or face burns. We also recommend discarding disposable foam liners that touch the patient's face after each use. Burn patients are especially vulnerable to infections when unbandaged (during wound care), and VR equipment should be monitored by infection control, especially when used in the Intensive Care Unit.

## CONCLUSION

The results from the current pilot study support our hypothesis that immersive virtual reality can significantly reduce acute pain during burn wound care, even in pediatric patients with large severe burn wounds treated in the hydrotanks in the Intensive Care Unit. And VR continued to reduce pain when used day after day.

## FUTURE DIRECTIONS

Virtual reality (VR) may eventually prove to be "opioid sparing" during hospitalization (Kipping et al., 2012; McSherry et al., 2018). Additional research and development is needed on how to make VR analgesia more powerful (Wender et al., 2009), how to make pharmacologic pain medications more effective (McIntyre et al., 2016), and how to best combine pharmacologic pain medications and VR analgesia, to maximize total pain control. Development of more powerful new non-pharmacologic pain management techniques is a national and international priority (Keefe et al., 2018), and Virtual Reality has strong potential as a new direction for behavioral medicine (Keefe et al., 2012).

Fortunately, VR analgesia is not limited to severe burn patients, but could potentially be used for a wide range of painful medical procedures, and could be especially valuable for highly populated, lower income developing countries (4/5ths of the World's population), where large severe burns and other serious injuries are more common, and powerful pharmacologic analgesics are more scarce or unavailable. Additional research and development of VR analgesia is recommended.

## DATA AVAILABILITY

The datasets for this study will not be made publicly available because of IRB restrictions.

## ETHICS STATEMENT

This research was conducted in accordance with the Declaration of the World Medical Association (www.wma.net). The studies were approved by the IRB from UTMB, and all participants provided written informed consent/assent in accordance with the Declaration of Helsinki.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## FUNDING

This research was financed by Shriners Hospitals for Children, Tampa Florida (award ID #71011-GAL, PI Walter Meyer), with help from a charitable donation from the MayDay Fund (PI Walter Meyer). The portable water friendly VR system was developed via NIH grant R01GM042725 to DP.

## ACKNOWLEDGMENTS

Thanks to the patients and their parents for volunteering to participate in this study. Thanks to the wound care staff, and to Drs. Laura and Marta Rosenberg at Shriners Hospitals for Children in Galveston TX. Thanks to Jeff Magula (and also Eric J. Seibel and Bill Russell) for the custom VR system, and thanks to Bill Russell and Maribel Ramirez, for early pilot research on water-friendly VR in the ICU at Shriners Galveston. We would also like to thank Kristen Darken and all of the several teams of VR worldbuilders at Multigen-Paradigm, SimWright, Howard Abrams, and Duff Hendrickson, who have created the original SnowWorld 2001 and SnowWorld 2003, and thanks to Ari Hollander and Howard Rose for the latest version of the University of Washington's SnowWorld (2006) used in the current study (www.vrpain.com). Thanks to the Ladies of the Nile, and Shriners Hospitals for Children Board of Directors, and to Dr. Steve Wolf, and to the Mexican Michou y Mao Foundation for their generous encouragement and support. Special thanks to singer/songwriter Paul Simon for suggesting we use Graceland with SnowWorld.

## REFERENCES

during medical procedures: a comprehensive literature review. Clin. J. Pain 34, 858–877. doi: 10.1097/AJP.0000000000000599


**Conflict of Interest Statement:** All authors have completed the ICMJE uniform disclosure and declare support for the submitted work.

Copyright © 2019 Hoffman, Rodriguez, Gonzalez, Bernardy, Peña, Beck, Patterson and Meyer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Technological Competence Is a Pre-condition for Effective Implementation of Virtual Reality Head Mounted Displays in Human Neuroscience: A Technological Review and Meta-Analysis

Panagiotis Kourtesis<sup>1,2,3,4</sup>\*, Simona Collina<sup>3,4</sup>, Leonidas A. A. Doumas<sup>2</sup> and Sarah E. MacPherson<sup>1,2</sup>

*<sup>1</sup> Human Cognitive Neuroscience, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom, <sup>2</sup> Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom, <sup>3</sup> Lab of Experimental Psychology, Suor Orsola Benincasa University of Naples, Naples, Italy, <sup>4</sup> Interdepartmental Centre for Planning and Research "Scienza Nuova", Suor Orsola Benincasa University of Naples, Naples, Italy*

Immersive virtual reality (VR) emerges as a promising research and clinical tool. However, several studies suggest that VR induced adverse symptoms and effects (VRISE) may undermine the health and safety standards, and the reliability of the scientific results. In the current literature review, the technical reasons for the adverse symptomatology are investigated to provide suggestions and technological knowledge for the implementation of VR head-mounted display (HMD) systems in cognitive neuroscience. The technological systematic literature review indicated features pertinent to display, sound, motion tracking, navigation, ergonomic interactions, user experience, and computer hardware that should be considered by researchers. Subsequently, a meta-analysis of 44 neuroscientific or neuropsychological studies involving VR HMD systems was performed. The meta-analysis of the VR studies demonstrated that new generation HMDs induced significantly less VRISE and marginally fewer dropouts. Importantly, the commercial versions of the new generation HMDs with ergonomic interactions had zero incidents of adverse symptomatology and dropouts. HMDs equivalent to or greater than the commercial versions of contemporary HMDs accompanied with ergonomic interactions are suitable for implementation in cognitive neuroscience. In conclusion, researchers' technological competency, along with meticulous methods and reports pertinent to software, hardware, and VRISE, are paramount to ensure the health and safety standards and the reliability of neuroscientific results.

Keywords: virtual reality, VRISE, HMD, cybersickness, neuroscience, neuropsychology, psychology, VR

#### Edited by:

*Valerio Rizzo, University of Palermo, Italy*

#### Reviewed by:

*Camila Rosa De Oliveira, Faculdade Meridional (IMED), Brazil; Uri Maoz, Chapman University, United States*

#### \*Correspondence:

*Panagiotis Kourtesis pkourtes@exseed.ed.ac.uk*

#### Specialty section:

*This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience*

Received: *27 June 2019* Accepted: *18 September 2019* Published: *02 October 2019*

#### Citation:

*Kourtesis P, Collina S, Doumas LAA and MacPherson SE (2019) Technological Competence Is a Pre-condition for Effective Implementation of Virtual Reality Head Mounted Displays in Human Neuroscience: A Technological Review and Meta-Analysis. Front. Hum. Neurosci. 13:342. doi: 10.3389/fnhum.2019.00342*

## INTRODUCTION

In recent years, virtual reality (VR) technology has attracted attention, demonstrating its utility and potency in the field of neuroscience and neuropsychology (Rizzo et al., 2004; Bohil et al., 2011; Parsons, 2015). Traditional approaches in human neuroscience involve the utilization of static and simple stimuli which arguably lack ecological validity (Parsons, 2015). VR offers the usage of dynamic stimuli and interactions with a high degree of control within an ecologically valid environment which enables the collection of advanced cognitive and behavioral data (Rizzo et al., 2004; Bohil et al., 2011; Parsons, 2015). VR can be combined with non-invasive imaging techniques (Bohil et al., 2011; Parsons, 2015) and has been effective in the assessment of cognitive and affective functions and clinical conditions (e.g., social stress disorders) which require ecological validity (Rizzo et al., 2004; Parsons, 2015) for their assessment, rehabilitation and treatment (e.g., post-traumatic stress disorder) (Rizzo et al., 2004; Bohil et al., 2011).

However, researchers and clinicians have reported caveats with the implementation of immersive VR interventions and assessments, particularly when head mounted display (HMD) systems are utilized (Sharples et al., 2008; Davis et al., 2015; de França and Soares, 2017; Palmisano et al., 2017). A predominant concern is the presence of adverse physiological symptoms (i.e., cyber/simulation-sickness which includes nausea, disorientation, instability, dizziness, and fatigue). These undesirable effects are categorized as VR Induced Symptoms and Effects (VRISE) (Sharples et al., 2008; Davis et al., 2015; de França and Soares, 2017; Palmisano et al., 2017), and are evaluated by using questionnaires such as the Simulator Sickness Questionnaire (Kennedy et al., 1993) and the Virtual Reality Sickness Questionnaire (Kim et al., 2018).

VRISE may risk the health and safety of participants or patients (Kane and Parsons, 2017; Parsons et al., 2018), which raises ethical considerations for the adoption of VR HMDs as research and clinical tools. Additionally, the presence of VRISE has been linked to a substantial decline in reaction times and overall cognitive performance (Plant and Turner, 2009; Nalivaiko et al., 2015; Plant, 2016; Nesbitt et al., 2017; Mittelstaedt et al., 2018), as well as increases in body temperature and heart rate (Nalivaiko et al., 2015). Also, the presence of VRISE robustly increases cerebral blood flow and oxyhemoglobin concentration (Gavgani et al., 2018), the power of brain signals (Arafat et al., 2018), and the connectivity between stimulus response brain regions and nausea-processing brain regions (Toschi et al., 2017). Thus, VRISE could be considered confounding variables, which significantly undermine the reliability of neuropsychological, physiological, and neuroimaging data.

VRISE are predominantly mediated by an oculomotor discrepancy between what is being perceived through the oculomotor (optic nerve) sensor and what is being sensed via the rest of the afferent nerves in the human body (Sharples et al., 2008; Davis et al., 2015; de França and Soares, 2017; Palmisano et al., 2017). Nevertheless, technologically speaking, VRISE are derivatives of hardware and software inadequacies, i.e., the type of display screen, resolution, and refresh rate of the image, the size of the field of view (FOV) as well as non-ergonomic movements within an interaction in the virtual environment (VE; de França and Soares, 2017; Palmisano et al., 2017). Notably, VR HMDs have substantially evolved during the last two decades. Important differences may be seen between the HMDs released before 2013 (old generation) and those released from 2013 onwards (new generation). While the last old generation HMD was released in 2001 (i.e., nVisor SX111), the year 2013 is used to distinguish between old and new generation HMDs, since it is the year that the first new generation HMD prototype (i.e., Oculus Development Kit 1) was released. This systematic review attempts to clarify the technological etiologies of VRISE and provide pertinent suggestions for the implementation of VR HMDs in cognitive neuroscience and neuropsychology. In addition, a meta-analysis of the neuroscience studies that have implemented VR HMDs will be conducted to elucidate the frequency of VRISE and dropout rates as per the VR HMD generation.

## TECHNOLOGICAL SYSTEMATIC REVIEW

In **Table 1**, a glossary of the key terms and concepts is provided to assist with comprehension of the utilized terminology. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines using a decremental stepwise method to perform the literature review (see **Figure 1**). The selected papers and book chapters included an explicit explanation and discussion of VRISE and users' experiences pertinent to the specified technological features of the VR hardware and software. Digital databases specialized in technologies were used: (1) IEEE Xplore Digital Library; (2) ACM Digital Library; (3) ScienceDirect; (4) MIT CogNet; and (5) Scopus. Two categories of keywords were used, where each category had three or more keywords and each paper had to include at least one keyword from each category in the main body of the text. The categories were: (1) "virtual reality" OR "immersive virtual reality" OR "head-mounted display"; AND (2) "VRISE" OR "motion sickness" OR "cyber sickness" OR "simulation sickness." Finally, the extracted information from the identified papers was clustered together under common features (i.e., display, sound, motion tracking, navigation, ergonomic interactions, user experience, and computer hardware).
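The keyword criterion described above (at least one keyword from each category in the main body of the text) can be expressed as a simple Boolean filter. The following is a minimal sketch, not the screening tool used in the review: the function name `meets_keyword_criteria` is hypothetical, and plain case-insensitive substring matching is an assumption about how keywords were matched.

```python
# Keyword categories as listed in the search strategy.
CATEGORY_1 = ("virtual reality", "immersive virtual reality", "head-mounted display")
CATEGORY_2 = ("vrise", "motion sickness", "cyber sickness", "simulation sickness")


def meets_keyword_criteria(body_text):
    """Return True if the text contains at least one keyword per category.

    Assumption: case-insensitive substring matching on the main body text.
    """
    lowered = body_text.lower()
    has_vr_term = any(keyword in lowered for keyword in CATEGORY_1)
    has_sickness_term = any(keyword in lowered for keyword in CATEGORY_2)
    return has_vr_term and has_sickness_term


if __name__ == "__main__":
    sample = "Immersive virtual reality can induce motion sickness in some users."
    print(meets_keyword_criteria(sample))
```

A real screening pipeline would also need to handle spelling variants (e.g., "cybersickness" written as one word), which this sketch deliberately leaves out.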

## Technological Etiologies of VRISE

### Display

VR HMDs use the following three types of screens: Cathode Ray Tubes (CRT); Liquid Crystal Display (LCD); and organic light emitting diode (OLED). LCD screens replaced CRT ones due to VRISE (Costello, 1997). LCD, in comparison to CRT, alleviated the probability of visual complications and physical burdens (e.g., fatigue) (Costello, 1997). However, the suitability of LCD was challenged by the emergence of OLED screens. While old generation VR HMDs mainly utilize LCD screens (Costello, 1997), the commercial versions of new generation VR HMDs predominantly use OLED screens (Kim J. W. et al., 2017). The OLED screens have been found to be better than LCD screens for general implementation in VR, because of their faster response times, lighter weight, and better color quality (Kim J. W. et al., 2017). OLED screens decrease the likelihood of VRISE and offer an improved VR display (Kim J. W. et al., 2017).

Three more factors related to display type are crucial for the avoidance of VRISE: the width of the FOV (Rakkolainen et al., 2016; Kim J. W. et al., 2017); the resolution of the image per eye (Hecht, 2016; Rakkolainen et al., 2016; Kim J. W. et al., 2017; Brennesholtz, 2018); and the refresh rate of the images (frames per second) (Hecht, 2016; Rakkolainen et al., 2016; Kim J. W. et al., 2017; Brennesholtz, 2018). A wider FOV significantly decreases the chance of VRISE and increases the level of immersion (Rakkolainen et al., 2016; Kim J. W. et al., 2017). The canonical guidelines suggest a lowest threshold of 110° FOV (diagonal) (Hecht, 2016; Rakkolainen et al., 2016; Kim J. W. et al., 2017; Brennesholtz, 2018). In addition, an increased refresh rate and resolution alleviate the danger of discomfort or VRISE (Hecht, 2016; Rakkolainen et al., 2016; Kim J. W. et al., 2017; Brennesholtz, 2018). The refresh rate should be ≥75 Hz (i.e., ≥75 frames per second) (Goradia et al., 2014; Hecht, 2016; Brennesholtz, 2018), while the resolution is required to be higher than 960 × 1,080 sub-pixels per eye (Goradia et al., 2014).
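The display thresholds above lend themselves to a simple screening check when choosing an HMD. The sketch below is illustrative only: the class `HmdSpec`, the function `unmet_criteria`, and the two example devices are hypothetical, and interpreting "higher than 960 × 1,080" as a comparison of total pixel count per eye is an assumption.

```python
from dataclasses import dataclass

MIN_FOV_DEG = 110                 # diagonal FOV must be at least this
MIN_REFRESH_HZ = 75               # refresh rate must be at least this
MIN_PIXELS_PER_EYE = 960 * 1080   # resolution must exceed this (assumed: total pixels)


@dataclass
class HmdSpec:
    name: str
    fov_diagonal_deg: float
    refresh_hz: float
    width_px_per_eye: int
    height_px_per_eye: int


def unmet_criteria(spec):
    """Return the names of the display criteria that the HMD does not meet."""
    problems = []
    if spec.fov_diagonal_deg < MIN_FOV_DEG:
        problems.append("fov")
    if spec.refresh_hz < MIN_REFRESH_HZ:
        problems.append("refresh_rate")
    if spec.width_px_per_eye * spec.height_px_per_eye <= MIN_PIXELS_PER_EYE:
        problems.append("resolution")
    return problems


if __name__ == "__main__":
    # Hypothetical devices with made-up specifications.
    for hmd in (HmdSpec("hmd_a", 110, 90, 1080, 1200),
                HmdSpec("hmd_b", 100, 60, 640, 800)):
        print(hmd.name, unmet_criteria(hmd) or "meets all display criteria")
```

Returning the list of unmet criteria, rather than a single pass/fail flag, makes it easier to report which hardware feature disqualifies a given device.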

### Sound

A second important consideration for a user's experience in VR is the sound quality. The integration of spatialized sounds (e.g., ambient and feedback sounds) in the VE may increase the level of immersion, pleasantness of the experience, and successful navigation (Vorländer and Shinn-Cunningham, 2014), while they significantly decrease the likelihood of VRISE (Viirre et al., 2014). However, the volume and localization of sounds need to be optimized in terms of audio spatialization to ensure a user's experience is pleasant without adverse VRISE (Viirre et al., 2014; Vorländer and Shinn-Cunningham, 2014).

### Motion Tracking

Motion tracking in VR is a pre-condition for naturalistic movement within an immersive VE (Slater and Wilbur, 1997; Stanney and Hale, 2014). Motion tracking allows the precise tracking of the user's physical body within the VE (i.e., it allows the computer to provide accurate environmental feedback, which modulates and consolidates the awareness of the position and movement of the user's body). This phenomenon is called proprioception or kinesthesia (Slater and Wilbur, 1997) and is linked with vestibular and oculomotor mediated VRISE (Slater and Wilbur, 1997; Plouzeau et al., 2015; Caputo et al., 2017). Hence, motion tracking should be adequately rapid and accurate to facilitate ergonomic interactions in the VE (Caputo et al., 2017).

### Navigation

A highly important factor for the quality of VR software and the avoidance of VRISE is the movement of the user in the VE (Porcino et al., 2017). New generation HMDs deliver an adequate play area for interactions to facilitate ecologically valid scenarios (Porcino et al., 2017; Borrego et al., 2018). However, there are restrictions in the size of the play area, which does not permit navigation solely by physical walking (Porcino et al., 2017; Borrego et al., 2018). Teleportation allows movement beyond the play area size and elicits a high level of immersion and a pleasant user experience, whilst alleviating VRISE (Bozgeyikli et al., 2016; Frommel et al., 2017; Porcino et al., 2017). In contrast, movement dependent on a touchpad, keyboard, or joystick results in high occurrences of VRISE (Bozgeyikli et al., 2016; Frommel et al., 2017; Porcino et al., 2017). Therefore, teleportation in conjunction with physical movement (i.e., free movement of the upper limbs and walking in a small-restricted area) is the most suitable method for movement in VR (Bozgeyikli et al., 2016; Frommel et al., 2017; Porcino et al., 2017). Yet, there are additional factors such as external hardware (i.e., controllers and wands), which are needed to facilitate optimal ergonomic interactions in VR.

TABLE 2 | Minimum hardware criteria: old and new generation VR HMDs.

*MEMS, Microelectromechanical systems.*

TABLE 3 | Criteria for suitable VR software in cognitive neuroscience and neuropsychology.

### Ergonomic Interactions

Ergonomic and naturalistic interactions are essential to minimize the risk of VRISE, while non-ergonomic and non-naturalistic interactions increase their occurrence (Slater and Wilbur, 1997; Stanney and Hale, 2014; Plouzeau et al., 2015; Caputo et al., 2017; Porcino et al., 2017). Importantly, controllers, joysticks, and keyboards do not support ergonomic and naturalistic interactions in VR (Plouzeau et al., 2015; Bozgeyikli et al., 2016; Caputo et al., 2017; Frommel et al., 2017; Porcino et al., 2017; Sportillo et al., 2017; Figueiredo et al., 2018). Instead, wands with 6 degrees of freedom (DoF) of movement (e.g., Oculus Rift and HTC Vive wands), and realistic interfaces with direct hand interactions (e.g., Microsoft's Kinect) facilitate naturalistic and ergonomic interactions (Sportillo et al., 2017; Figueiredo et al., 2018). Both hardware systems facilitate easy familiarization with their controls and their utilization (Sportillo et al., 2017; Figueiredo et al., 2018). However, direct hand interactions are easier than 6DoF controller-wands in terms of familiarization with their controls and efficiency (Sportillo et al., 2017; Figueiredo et al., 2018). Direct hand interactions were also found to offer more pleasant user experiences (Sportillo et al., 2017; Figueiredo et al., 2018), although they are substantially less accurate than 6DoF controller-wands (Sportillo et al., 2017; Figueiredo et al., 2018).

### User Experience

Notably, ergonomic interactions might be available to the user; however, the user is required to learn the necessary interactions and how the VE functions to facilitate a pleasant user experience (Gromala et al., 2016; Jerald et al., 2017; Brade et al., 2018). The inclusion of comprehensible tutorials where the user may spend an adequate amount of time acquiring the necessary skills (i.e., navigation, use and grab of items, two-handed interactions) and knowledge of the VE (i.e., how it reacts to your controls) is crucial (Gromala et al., 2016; Jerald et al., 2017; Brade et al., 2018). Additionally, in-game instructions and prompts should be offered to the user through interactions in the VE (e.g., directional arrows, non-player characters, signs, labels, ambient sounds, audio, and videos) (Gromala et al., 2016; Jerald et al., 2017; Brade et al., 2018).

### Computer Hardware

The computer hardware (i.e., the processor, graphics card, sound card) should at least meet the minimum requirements of the VR software and HMD (Anthes et al., 2016). The performance of VR HMDs depends on the computing power and the quality of the hardware (Stanney and Hale, 2014; Anthes et al., 2016; Borrego et al., 2018). The processor, graphics card, sound card, and operating system (e.g., Windows) need to be considered and reported because they modulate the performance of the software (Plant and Turner, 2009; Plant, 2016; Kane and Parsons, 2017; Parsons et al., 2018). Research software developers and researchers are required to be technologically competent in order to opt for the appropriate hardware and software to achieve their research and/or clinical aims (Plant, 2016; Kane and Parsons, 2017; Parsons et al., 2018).

## Conclusions

Based on the outcomes of the above technological review, VR HMDs should have a good quality display-screen (i.e., OLED or upgraded LCD), an adequate FOV (i.e., diagonal FOV ≥ 110°), adequate resolution per eye (i.e., resolution > 960 × 1,080 sub-pixels per eye), and an adequate image refresh rate (i.e., refresh rate ≥ 75 Hz) to safeguard the health and safety of the participants and the reliability of the neuroscientific results (see **Table 2**). Also, the VR HMD should have external hardware which offers an adequate VR area, fast and accurate motion tracking, spatialized audio, and ergonomic interactions. The computer's processor, graphics card, and sound card should meet the minimum requirements of the VR software and HMD too. New generation VR HMDs appear to have all the necessary hardware characteristics (i.e., graphics, level of immersion, and sound) to be used in ecologically valid research and clinical paradigms (Borrego et al., 2018; see **Table 2** for a comparison between old and new generation HMDs). New generation VR HMDs have the required hardware to support and produce high-quality spatialized sounds in VEs (Borrego et al., 2018). Additionally, new generation VR HMDs have integrated rapid and precise motion tracking which facilitates naturalistic and ergonomic interactions within the VE (Borrego et al., 2018).
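The minimum display criteria listed above can be expressed as a simple screening check. The following is a minimal sketch, not part of the original review: the `HmdSpec` class, the field names, and the two example headsets are hypothetical, and comparing resolution as a total pixel count per eye is an assumption about how the "> 960 × 1,080" criterion might be applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HmdSpec:
    """Hypothetical container for the headset properties named in the text."""
    name: str
    diagonal_fov_deg: float
    width_px_per_eye: int
    height_px_per_eye: int
    refresh_rate_hz: float


# Thresholds quoted in the paragraph above.
MIN_FOV_DEG = 110
MIN_PIXELS_PER_EYE = 960 * 1080  # assumption: compared as a total pixel count
MIN_REFRESH_HZ = 75


def failed_criteria(spec: HmdSpec) -> list[str]:
    """Return the names of the criteria that the headset does not satisfy."""
    failures = []
    if spec.diagonal_fov_deg < MIN_FOV_DEG:
        failures.append("fov")
    # The text states a strict inequality for resolution (> 960 x 1,080).
    if spec.width_px_per_eye * spec.height_px_per_eye <= MIN_PIXELS_PER_EYE:
        failures.append("resolution")
    if spec.refresh_rate_hz < MIN_REFRESH_HZ:
        failures.append("refresh_rate")
    return failures


# Made-up example values, not measurements of any real product.
headset_a = HmdSpec("headset_a", 110, 1080, 1200, 90)
headset_b = HmdSpec("headset_b", 100, 960, 1080, 75)

print(failed_criteria(headset_a))
print(failed_criteria(headset_b))
```

An empty list means the headset passes all three display criteria; the check does not cover tracking, audio, or interaction hardware, which the review treats separately.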

Neither the Oculus development kit (DK) 1 nor the DK2 meets the minimum hardware features highlighted by the technological review, despite both being new generation VR HMDs (see **Table 2**). The DK1 has substantially lower resolution per eye and image refresh rates, while the DK2 has marginally acceptable refresh rates, yet a slightly lower resolution per eye. These DKs are not available for general use but are used by professional developers to produce beta (early) versions of their games or apps (Goradia et al., 2014; Suznjevic et al., 2017). Moreover, they were removed from the market after the release of the Oculus Rift CV. VR HMDs should have hardware characteristics equal to or better than the commercial versions (CV) of the Oculus Rift and HTC Vive in order to ensure the health and safety of the participants, as well as the reliability of the neuroscientific results (i.e., physiological, neuropsychological, and neuroimaging data). Researchers and clinicians should have the technological competence to choose an HMD which is equal to or better than the CVs of the Oculus Rift and HTC Vive (e.g., Valve Index, HTC Vive Pro, Oculus Quest, Pimax VR, and StarVR).

However, the VR software's features are equally important. The VR software should include an ergonomic interaction and navigation system, as well as tutorials, in-game instructions, and prompts. A suitable navigation system should combine teleportation and physical movement, while ergonomic interactions should include those that simulate real-life interactions by using a direct hands system or 6DoF controllers. Also, the tutorials, in-game instructions, and prompts should be informative and easy to follow, especially for experimental or clinical purposes where users should be equally able to interact with the VE (Plant and Turner, 2009; Plant, 2016; Kane and Parsons, 2017; Parsons et al., 2018). The criteria for effective VR software are displayed in **Table 3**. These criteria should be met before implementing VR software for research and/or clinical purposes. Otherwise, researchers or clinicians may compromise the reliability of their study's results (Plant and Turner, 2009; Plant, 2016; Kane and Parsons, 2017; Parsons et al., 2018), and/or jeopardize the health and safety of their participants/patients (Kane and Parsons, 2017; Parsons et al., 2018).

The above features enable researchers or clinicians to administer a sophisticated and pleasant VR experience, which substantially alleviates or eradicates adverse VRISE. Therefore, the technological competency of neuroscientists and neuropsychologists is a pre-condition for the efficient adoption and implementation of innovative technologies like VR HMDs in cognitive neuroscience or neuropsychology.

## META-ANALYSIS OF VR STUDIES IN COGNITIVE NEUROSCIENCE

## Literature Research and Inclusion Criteria

We followed the PRISMA guidelines to conduct the literature research using a decremental approach, where the selection commenced with a relatively vast accumulation of abstracts and concluded with a diminished list of full papers that comprise standardized and detailed VR research paradigms. The procedure is described in **Figure 2**. The following databases were used for the literature research: (1) PsycInfo; (2) PsycArticles; (3) PubMed; and (4) Medline. Two categories of keywords were used, with three keywords in each category. The minimum threshold for each study was the inclusion of at least one keyword from each category in the main body of text. The keywords for each category were: (1) "virtual reality" OR "Immersive" OR "Head Mounted Display"; AND (2) "Psychology" OR "Neuropsychology" OR "Neuroscience". Additional filters and criteria were: (1) chronological specification (2004 and later); and (2) a comprehensive description of the VR research methods in conjunction with the research aims and results. Finally, the selected studies were allocated into two groups according to the generation of the implemented VR HMD. Two tables display the studies that utilize old generation (**Table 4**) and new generation (**Table 5**) HMDs.
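The keyword threshold and the chronological filter described above can be sketched as a small screening function. This is an illustrative sketch only: the function names are hypothetical, case-insensitive substring matching is an assumption, and the actual screening was performed through the database search interfaces and by manual review.

```python
# Keyword categories as listed in the inclusion criteria.
CATEGORY_1 = ("virtual reality", "immersive", "head mounted display")
CATEGORY_2 = ("psychology", "neuropsychology", "neuroscience")


def meets_keyword_threshold(text: str) -> bool:
    """True if the text contains at least one keyword from each category."""
    lowered = text.lower()  # assumption: matching is case-insensitive
    has_category_1 = any(keyword in lowered for keyword in CATEGORY_1)
    has_category_2 = any(keyword in lowered for keyword in CATEGORY_2)
    return has_category_1 and has_category_2


def passes_filters(text: str, year: int, describes_methods: bool) -> bool:
    """Combine the keyword threshold with the two additional criteria."""
    return year >= 2004 and describes_methods and meets_keyword_threshold(text)


print(passes_filters("An immersive task for neuroscience research", 2017, True))
print(passes_filters("An immersive gaming study", 2017, True))
```

The `describes_methods` flag stands in for the judgment of whether a paper gives a comprehensive description of its VR methods, which cannot be automated by keyword matching.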

## Data Collection and Coding Target Variables

The principal aim of the meta-analysis was to measure the frequency of VRISE in neuroscience or psychology studies using a VR HMD. However, only six studies reported VRISE quantitatively (i.e., using a questionnaire). For this reason, we considered only the presence or absence of VRISE. The dichotomous VRISE variable (i.e., presence or absence of VRISE) was quantified (i.e., absent VRISE = 0; present VRISE = 1) to facilitate a comparison (i.e., Bayesian t-tests) between the studies that used old generation HMDs, new generation DK HMDs, and new generation CV HMDs, as well as the examination of potential correlations with other variables (i.e., Bayesian Pearson's correlation analysis).

#### TABLE 4 | Neuroscience studies employing old generation VR HMDs.


*HMD, Head-Mounted Display; VRISE, VR induced adverse symptoms and effects; YA, Young Adults; MA, Middle-Aged Adults; OA, Older Adults; C, Children; VRET, VR Exposure Therapy; PTSD, Post-Traumatic Stress Disorder; ADHD, Attention Deficit Hyperactivity Disorder.*

A secondary aim of the meta-analysis was to inspect the dropout rates in neuroscience or psychology studies that used VR HMDs. However, as the vast majority of studies had no dropouts, studies with some dropouts (e.g., 3, 5, 6) were statistically considered as outliers. For this reason, we considered the existence of dropouts in each study. The dropout variable was dichotomized as presence = 1 and absence = 0. This dichotomous dropout variable was used to investigate whether using a certain generation HMD (i.e., old generation HMDs, new generation DK HMDs, or new generation CV HMDs) could increase/decrease the dropout size. We compared (i.e., Bayesian t-tests) the dropout rate across studies that used old generation HMDs, new generation DK HMDs, and new generation CV HMDs. We also inspected whether the dropout rates correlated with other variables by using Bayesian Pearson's correlation analysis.

#### TABLE 5 | Neuroscience studies employing new generation VR HMDs.


*HMD, Head-Mounted Display; VRISE, VR induced adverse symptoms and effects; YA, Young Adults; MA, Middle-Aged Adults; OA, Older Adults; PD, Parkinson's disease; AD, Alzheimer's disease; DK, Development Kit; CV, Commercial Version; B-P, Body Perception; DBS, Deep Brain Stimulation.*
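The dichotomous coding of the two target variables (VRISE and dropouts) can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name and the input fields are hypothetical, and the example inputs are made up rather than taken from the coded studies.

```python
def code_target_variables(vrise_reported: bool, n_dropouts: int) -> dict:
    """Dichotomize the two target variables for one study.

    VRISE: absent = 0, present = 1.
    Dropouts: absence = 0, presence (one or more dropouts) = 1.
    """
    if n_dropouts < 0:
        raise ValueError("n_dropouts must be non-negative")
    return {
        "vrise": 1 if vrise_reported else 0,
        "dropout": 1 if n_dropouts > 0 else 0,
    }


# Made-up inputs for illustration only.
print(code_target_variables(vrise_reported=True, n_dropouts=0))
print(code_target_variables(vrise_reported=False, n_dropouts=5))
```

Coding dropouts as presence or absence discards the dropout count, which is the trade-off accepted above to avoid treating the few studies with dropouts as outliers.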

#### Grouping Variables

We subdivided studies into groups based on the HMD generation they used. Hence, two groups of studies were created and compared by using Bayesian t-tests; the first group included studies that utilized old generation HMDs, while the second group included studies which utilized new generation HMDs (i.e., both DKs and CVs).

The new generation studies were further distinguished and compared by using Bayesian t-tests based on the type of new generation HMDs adopted (i.e., DK or CV). Two sub-groups were formed; the first group included studies that utilized DK HMDs, and the second group included studies that utilized CV HMDs.

Furthermore, the recency of the HMD technology was compared by using an ordinal variable where 1 indicated old generation HMDs, 2 indicated new generation DKs, and 3 indicated new generation CVs. This ordinal variable allowed us to inspect whether the HMD generation correlated with other variables by using Bayesian Pearson's correlation analysis.

Lastly, we considered the type of interactions, which was expressed in a binary form (i.e., non-ergonomic interactions = 0 and ergonomic interactions = 1). This allowed a comparison between the VR studies which had ergonomic interactions and the VR studies which had non-ergonomic interactions by using a Bayesian t-test. It also allowed us to inspect whether the interaction type correlated with other variables (i.e., Bayesian Pearson's correlation analysis).
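The grouping variables described in this subsection can be sketched as a small coding function. The dictionary keys and the function name are hypothetical labels chosen for illustration; only the numeric codes come from the text.

```python
# Ordinal code for HMD recency: 1 = old generation, 2 = new generation DK,
# 3 = new generation CV.
GENERATION_CODE = {"old": 1, "dk": 2, "cv": 3}

# Binary code for interaction type.
INTERACTION_CODE = {"non-ergonomic": 0, "ergonomic": 1}


def code_grouping_variables(generation: str, interaction: str) -> dict:
    """Return the grouping variables for one study."""
    ordinal = GENERATION_CODE[generation]
    return {
        "generation_ordinal": ordinal,
        # Old vs. new generation comparison: DKs and CVs are both "new".
        "is_new_generation": 1 if ordinal >= 2 else 0,
        # DK vs. CV comparison applies to new generation studies only.
        "new_generation_type": generation if ordinal >= 2 else None,
        "ergonomic": INTERACTION_CODE[interaction],
    }


print(code_grouping_variables("dk", "ergonomic"))
```

Keeping the ordinal code separate from the two binary splits mirrors the analysis plan: the splits feed the Bayesian t-tests, while the ordinal code feeds the correlation analysis.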

### Definition of Ergonomic Interactions

In line with the definition of ergonomic interactions in our technological review, we considered interactions to be ergonomic or non-ergonomic based on their proximity to real-life interactions. We provide some examples below to clarify our criteria:

Example 1 (Ergonomic Interaction): if the VR software required the participant to look around by moving his or her head.

Example 2 (Non-Ergonomic Interaction): if the VR software required the participant to look around by using a joystick or mouse.

Example 3 (Ergonomic Interaction): if the VR software required the user to interact with objects (e.g., pushing a button, holding an item) in the VE or to navigate within the VE by using either 6DoF controllers or direct-hand interactions.

Example 4 (Non-Ergonomic Interaction): if the VR software required the user to interact with objects (e.g., pushing a button, holding an item) in the VE or to navigate within the VE by using a keyboard or joystick (e.g., Xbox controller).
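The four examples above amount to a lookup from input method to the binary interaction code. The sketch below is illustrative only: the input-method labels are hypothetical, and the handling of unlisted input methods (raising an error so they are coded by hand) is an assumption, since the text does not say how other devices were classified.

```python
# Input methods named in Examples 1 and 3.
ERGONOMIC_INPUTS = {"head_movement", "6dof_controller", "direct_hand"}

# Input methods named in Examples 2 and 4.
NON_ERGONOMIC_INPUTS = {"joystick", "mouse", "keyboard", "gamepad"}


def interaction_code(input_method: str) -> int:
    """Return 1 for ergonomic and 0 for non-ergonomic input methods."""
    if input_method in ERGONOMIC_INPUTS:
        return 1
    if input_method in NON_ERGONOMIC_INPUTS:
        return 0
    # Assumption: anything not covered by the examples needs manual coding.
    raise ValueError(f"unclassified input method: {input_method!r}")


print(interaction_code("6dof_controller"))
print(interaction_code("joystick"))
```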

## Statistical Analyses

Bayesian statistics were preferred over null hypothesis significance testing (NHST). The Bayesian factor (BF<sub>10</sub>) was therefore used instead of p-values for statistical inference, although we do report both BF<sub>10</sub> and p-values. P-values measure the difference between the data and the null hypothesis (H<sub>0</sub>) (e.g., the assumption of no difference or no effect), while the BF<sub>10</sub> calibrates p-values by converting them into evidence in favor of the alternative hypothesis (H<sub>1</sub>) over the H<sub>0</sub> (Cox and Donnelly, 2011; Bland, 2015; Held and Ott, 2018). BF<sub>10</sub> is considered substantially more parsimonious than the p-value in evaluating the evidence against the H<sub>0</sub> (Cox and Donnelly, 2011; Bland, 2015; Held and Ott, 2018). Also, the difference between BF<sub>10</sub> and the p-value in evaluating the evidence against H<sub>0</sub> is even greater in small sample sizes (Held and Ott, 2018). A BF<sub>10</sub> threshold of ≥ 10 was set for statistical inference in all analyses, which indicates strong evidence in favor of the H<sub>1</sub> (Rouder and Morey, 2012; Wetzels and Wagenmakers, 2012; Marsman and Wagenmakers, 2017), and corresponds to a p-value < 0.01 (e.g., BF<sub>10</sub> = 10) (Cox and Donnelly, 2011; Bland, 2015; Held and Ott, 2018). JASP software was used to perform the statistical analyses (JASP Team, 2018). Bayesian independent samples t-tests were conducted to investigate the difference in VRISE frequency and dropout occurrence between old and new generation HMDs, as well as between new generation DKs and CVs. A Bayesian Pearson's correlation analysis examined the possible statistical relationships amongst the HMD generations, VRISE presence, the type of interactions, and dropout occurrences.
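For readers less familiar with the quantity used as the inference criterion here, the standard definition of the Bayes factor may help. The symbol D (the observed data) is introduced for illustration and does not appear in the original text; the prior settings behind the reported values are not restated here.

```latex
\mathrm{BF}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)},
\qquad
\frac{p(H_1 \mid D)}{p(H_0 \mid D)}
  = \mathrm{BF}_{10} \times \frac{p(H_1)}{p(H_0)}
```

Under this definition, the threshold of 10 means that the observed data are at least ten times more likely under the alternative hypothesis than under the null hypothesis.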

## Results

### The Implementation of Old and New Generation HMDs in Cognitive Neuroscience

The studies that utilized old generation HMDs are displayed in **Table 4** and recruited 1,200 participants in total. Nine out of 22 studies examined stress disorders: 7 of these were VR exposure therapy (VRET) studies either for phobias or posttraumatic stress disorder (PTSD), while 2 studies attempted to assess stress levels in context (e.g., assessment of social stress during a job interview). In 9 studies, there were VR assessments of cognitive functions: 2 studies assessed memory, 3 studied attention, 3 examined executive functions, and 1 examined visuospatial ability. Two of the studies involved social cognition while only one involved paranoid thinking. Lastly, only one study provided rehabilitation sessions in VR for patients with spinal injuries. The targeted age groups were young adults in 18 studies, middle-aged adults in 8 studies, older adults in one study, and children in one study.

The studies that utilized new generation HMDs are displayed in **Table 5** and recruited 982 individuals in total. Specifically, 376 individuals were recruited in 10 studies where new generation DKs were used, while 606 individuals were recruited in 12 studies where new generation CVs were used. Nine out of the 22 studies attempted to assess cognitive functions (i.e., memory, attention, visuospatial ability, executive functions), 4 investigated anxiety disorders (i.e., fear of death, social stress, general anxiety disorder), 3 provided sensorimotor rehabilitation interventions, 3 studies examined the effects of presence in specific VEs, 2 assessed social cognition and 1 study offered a psychoeducational session to patients with motor-related disorders. Lastly, the targeted age groups were young adults in 18 studies, middle-aged adults in 6 studies, and older adults in 4 studies.

FIGURE 3 | VRISE per HMD generation and ergonomic interactions. ABSENT, Absence of VRISE; PRESENT, Presence of VRISE; OLD, Old Generation HMD; DK, New Generation HMD—Development Kit; CV, New Generation HMD—Commercial Version; Ergonomic, Ergonomic Interactions; Non-Ergonomic, Non-Ergonomic Interactions.

### Meta-Analysis

The descriptive statistics are presented in **Figures 3**, **4**. In **Figure 3**, the number of studies with VRISE is displayed according to their HMD generation and interaction type. In **Figure 4**, the dropouts and sample sizes are presented according to their HMD generation, VRISE presence, and interaction type. The presence of VRISE becomes substantially less frequent when new generation HMDs are implemented (**Figure 3**). In new generation HMDs, VRISE are present in only 4 out of 22 studies, while across 982 participants, there are only 11 dropouts. In contrast, in old generation HMDs, VRISE are present in 14 out of 22 studies, while in a total sample size of 1,200 participants, there are 58 dropouts.
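The counts reported above can be converted into proportions with a few lines of arithmetic. The sketch below only re-expresses the figures given in the text; the dictionary layout and function name are illustrative choices.

```python
# Counts taken from the descriptive statistics reported in the text.
GROUPS = {
    "old generation": {"studies": 22, "vrise_studies": 14,
                       "participants": 1200, "dropouts": 58},
    "new generation": {"studies": 22, "vrise_studies": 4,
                       "participants": 982, "dropouts": 11},
}


def summarize(group: dict) -> dict:
    """Proportion of studies with VRISE and proportion of participants lost."""
    return {
        "vrise_study_rate": group["vrise_studies"] / group["studies"],
        "dropout_rate": group["dropouts"] / group["participants"],
    }


for label, counts in GROUPS.items():
    rates = summarize(counts)
    print(f"{label}: VRISE in {rates['vrise_study_rate']:.1%} of studies, "
          f"dropout rate {rates['dropout_rate']:.1%}")
```

Expressed this way, VRISE were reported in roughly 64% of old generation studies against roughly 18% of new generation studies, and the participant-level dropout rate falls from about 4.8% to about 1.1%.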

In the 14 old generation HMD studies where VRISE are present, half of them involved ergonomic interactions and the other half involved non-ergonomic interactions. Similarly, there is an equal distribution of dropouts (29 in each) between the old generation HMD studies that had ergonomic and non-ergonomic interactions. When only old HMDs with ergonomic interactions are considered, VRISE are present in 7 out of 12 studies, while in an entire sample size of 598, the dropouts are 29. In the studies with new generation DKs, those with non-ergonomic interactions had a higher presence of VRISE than the ones with ergonomic interactions. Also, in the studies which used DKs with non-ergonomic interactions, 7 participants out of 158 dropped out, while in studies with ergonomic interactions, only one participant out of 218 dropped out. Importantly, when new generation CVs with ergonomic interactions are exclusively considered, there are no VRISE or dropouts in any of the 11 studies with 546 participants. Finally, VRISE were only present in one study using a new generation CV HMD, where 3 participants dropped out. This study was the only one with a new generation CV HMD that did not involve ergonomic interactions.

The Bayesian independent samples t-test highlighted that studies involving new generation VR HMDs have significantly less frequent VRISE (BF<sub>10</sub> = 144.68; p < 0.001). The difference in the existence of dropouts was not substantial, yet the studies with new generation HMDs have less frequent dropouts (BF<sub>10</sub> = 4.69; p < 0.05) than studies with old HMDs. Notably, the studies which used a new generation CV HMD have significantly less frequent VRISE (BF<sub>10</sub> = 46.39; p < 0.001) but not less frequent dropouts (BF<sub>10</sub> = 1.66; p = 0.16) than the studies which used a new generation DK HMD. Finally, the studies which implemented VR software with ergonomic interactions had substantially less frequent VRISE (BF<sub>10</sub> = 19.54; p < 0.001) and dropouts (BF<sub>10</sub> = 16.01; p < 0.001) than studies which used VR software with non-ergonomic interactions.

The Bayesian Pearson's correlations demonstrated a substantial negative correlation between the presence of VRISE and the HMD generation [BF<sub>10</sub> = 328.03; r(44) = −0.56, p < 0.001], while the presence of VRISE robustly demonstrated a positive correlation with the existence of dropouts [BF<sub>10</sub> = 83510.53; r(44) = 0.68, p < 0.001]. Also, the utilization of ergonomic interactions was significantly negatively correlated with VRISE [BF<sub>10</sub> = 20.11; r(44) = −0.42, p < 0.001] and the existence of dropouts [BF<sub>10</sub> = 16.11; r(44) = −0.41, p < 0.001].
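To make the correlation coefficients above more concrete, the following sketch computes a plain Pearson's r between an ordinal generation code and a dichotomous VRISE code. The six data points are made up for illustration and are not the study-level data of the meta-analysis; the sketch also covers only the coefficient r, not the Bayes factor, which was obtained with JASP.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy data: generation code (1 = old, 2 = DK, 3 = CV) and VRISE (0/1).
generation = [1, 1, 2, 2, 3, 3]
vrise = [1, 1, 1, 0, 0, 0]

print(round(pearson_r(generation, vrise), 3))
```

A negative value indicates that higher generation codes (newer HMDs) co-occur with the absence of VRISE, which is the direction of the relationship reported above.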

## Discussion

The results of the meta-analysis indicated that VR HMDs have been implemented in diverse clinical conditions and age groups, and revealed a clear difference between old generation and new generation HMDs. There were significantly more frequent VRISE in the studies involving old generation VR HMDs compared to studies with new generation HMDs. Additionally, the frequency of VRISE correlated negatively with the HMD generation. Hence, the older the utilized HMD, the higher the VRISE frequency. Moreover, the existence of dropouts significantly and positively correlated with the presence of VRISE.

Nevertheless, one potential reason for the higher dropouts in old generation studies is that several studies included follow-up sessions (e.g., VRET) and participants may have opted not to return to complete the remaining sessions for reasons other than the presence of VRISE. However, in the old generation studies, the dropout rates were low in relation to the size of the population, even though VRISE were present. The low dropout rates in the old generation HMD studies may be due to the fixed intervals between the VR sessions where the participants were able to rest and obtain relief from any adverse effects they were experiencing.

Furthermore, the incidence of VRISE in old generation HMD studies may be due to anxiety levels (Bouchard et al., 2011) or be self-induced (Almeida et al., 2017) as several of the studies had either stress-related aims or included participants with stress disorders. However, several of the new generation HMD studies also had comparable aims and/or populations and included patients with clinical conditions (e.g., Alzheimer's disease, Parkinson's disease, stroke, and movement disorders) which have high comorbidity with stress and anxiety (Factor et al., 1995; Smith et al., 2000; Jenner, 2003; Allen and Bayraktutan, 2009). Also, the rates of self-induced VRISE are expected to be equal in both new and old generation HMD studies. In addition, the reporting of VRISE may be for reasons unrelated to the quality of the hardware or software (e.g., subjectivity in the reporting of VRISE, individual differences in the experience of VRISE) (Kortum and Peres, 2014; Almeida et al., 2017). However, this modulation is again expected to have affected both new and old generation HMD studies in a similar way.

Beyond the difference between old and new generation HMDs, a substantial difference is observed between DK and CV new generation HMDs. Significantly fewer VRISE were present in the studies that used a CV, indicating the superiority of new generation CV HMDs compared to new generation DK HMDs. Furthermore, the studies (i.e., both old and new generation studies) which utilized VR software with ergonomic interactions had robustly less frequent VRISE and dropouts than the studies which implemented VR software with non-ergonomic interactions. However, the ergonomic interactions do not appear to mitigate the dropout frequency and the incidence of VRISE in old generation HMDs. In contrast, VRISE were present in more DK studies with non-ergonomic interactions compared to DK studies with ergonomic interactions. Similarly, more participants dropped out from DK studies with non-ergonomic interactions. Notably, there were no VRISE or dropouts in CV studies with ergonomic interactions. Therefore, the contribution of ergonomic interactions in the reduction of VRISE increases when newer and better HMDs are utilized. To conclude, the findings of the meta-analysis are aligned with the outcomes of the technological review.

## GENERAL DISCUSSION

## Technological Competence in VR Neuroscience and Neuropsychology

The findings of our technological literature review suggest that the hardware features of old generation HMDs and new generation DKs do not meet the minimum hardware features that alleviate or eradicate VRISE. Instead, the technological literature review postulates the suitability of new generation CVs which have specific hardware capabilities to alleviate VRISE. However, VR software attributes (e.g., ergonomic interactions) are equally vital.

Secondly, the findings of our meta-analysis of 44 neuroscientific or neuropsychological studies using VR are aligned with the outcomes of our technological review, where VRISE were substantially less frequent in studies which utilized new generation VR HMDs. In particular, the studies which used new generation CVs accompanied by ergonomic interactions did not have any VRISE or dropouts. Therefore, the combined outcomes of the technological review and the meta-analysis indicate that the appropriate VR HMDs are those with hardware characteristics equal to or greater than the HTC Vive and Oculus Rift, though the VR HMD should be implemented in conjunction with VR software which offers ergonomic interactions.

However, researchers may have to opt for an HMD based on their available budget. For example, the Oculus Rift costs around \$400, while the HTC Vive costs around \$500. Moreover, the majority of HMDs also require a VR-ready desktop PC or a laptop to be operated, so a researcher needs to additionally spend around \$500–\$1,500 for a desktop computer or laptop to utilize these HMDs. Hence, the combined cost would be between roughly \$900 and \$2,000. The cost of VR equipment (e.g., both HMD and computer) may lead researchers to use HMDs that are cheaper, albeit that they are more likely to result in VRISE. However, in the market, there are plenty of cost-effective alternatives that meet the minimum hardware criteria. For example, the Oculus Quest is a standalone HMD (i.e., it does not require a PC, a laptop, or a smartphone to be operated) and it costs approximately \$400. Hence, a researcher can spend the equivalent of the price of a neuropsychological test or a smartphone to acquire and use an HMD that meets the minimum hardware criteria to lower the presence of VRISE.

Nonetheless, the selection of an appropriate VR HMD and software requires technological competency from the researchers, clinicians, and/or research software developers. Unfortunately, the meta-analysis results do not indicate that technological knowledge of VR has been well-established in neuroscience. Of course, the utilization of old generation HMDs and new generation DKs pre-2016 is justified as the new generation CVs were not available. However, in our meta-analysis, 25 studies were conducted between 2016 and 2019, where half of these studies (13/25) implemented an inappropriate HMD (i.e., old generation HMD or new generation DK). However, 10 studies used a DK2 which has a marginally lower resolution than the minimum hardware criteria, while our meta-analysis results indicated that its utilization in conjunction with ergonomic interactions appears to alleviate the frequency of VRISE, but not as effectively as the CV HMDs. Furthermore, one fifth of the studies did use a new generation HMD, but they did not have ergonomic interactions in their VR software. Therefore, at this time, VR technological competence does not seem to have been well-established in neuroscience. As a result, in the studies since 2016, the health and safety of the participants may not be substantially guaranteed, and the reliability of the results may be questionable, as VRISE substantially decrease reaction times and overall cognitive performance (Plant and Turner, 2009; Nalivaiko et al., 2015; Plant, 2016; Nesbitt et al., 2017; Mittelstaedt et al., 2018), and confound neuroimaging and physiological data (Toschi et al., 2017; Arafat et al., 2018; Gavgani et al., 2018). The selection of an appropriate HMD is paramount for successfully implementing VR HMDs in cognitive neuroscience and neuropsychology.

However, the implementation of the currently available and appropriate HMDs in neuroscience and neuropsychology should be compatible with the research aims. For example, in research designs where the user should be active (i.e., navigating, walking, and interacting within the VE) instead of being idle, or in a standing or a seated position, the researcher should opt for the best HMD that permits intense body movement and activity. In this setting, the Oculus Rift was found to be inferior to the HTC Vive on pick-and-place (i.e., relocating objects) tasks, whilst the HTC Vive also provided a substantially superior VR experience for users compared to the Oculus Rift (Suznjevic et al., 2017). Moreover, the HTC Vive provides an interactive area that is twice the size (25 m<sup>2</sup>) of the Oculus Rift, albeit that both are very accurate in tracking (Borrego et al., 2018). Nevertheless, the HTC Vive was found to lose motion-tracking and the ground level becomes slanted when the user goes out of bounds (Niehorster et al., 2017). This shortcoming solely affects studies where the participant needs to go out of the tracking area. In most neuroscientific designs, the recommended maximum play area by HTC (6.25 m<sup>2</sup>) or by Borrego et al. (2018) (25 m<sup>2</sup>) are both substantially adequate for conducting ecologically valid experiments (Borges et al., 2018; Borrego et al., 2018). Nonetheless, the slanted floor or lost tracking is not a hardware problem but a software one (Borges et al., 2018). In cases where the participant is required to go out of the tracking area, the tracking problem or the slanted floor may be easily corrected by adding 3 additional trackers (Peer et al., 2018), using software with an improved algorithm (freely distributed by NASA Ames Research Center) (Borges et al., 2018), or by simply updating the firmware of the lighthouse base stations.
In summary, the researchers should be technologically competent to not only identify and implement a safe HMD and software, but an HMD and software that facilitate the optimal research methods pertinent to their research needs and aims.

As discussed in our technological review, the quality of the implemented VR software is equally important to avoid VRISE. Our meta-analysis of VR studies indicated that the utilization of ergonomic interactions is crucial even when an appropriate HMD is used. For example, Detez et al. (2019) used the HTC Vive to investigate physiological arousal and behavior during gambling. However, the interactions and navigation within the VE were facilitated by using a typical controller (Detez et al., 2019). Hence, their VR software did not support the utilization of the ergonomic 6DoF controllers (both hands) of the HTC Vive, which facilitate naturalistic navigation (e.g., teleportation) and interaction within the VE. Consequently, Detez et al.'s (2019) participants experienced VRISE and 3 of their participants discontinued their sessions and so their data were discarded (Detez et al., 2019). Importantly, Detez et al. (2019) only reported the presence of VRISE and dropout size. They did not provide any quantitative data on the intensity of VRISE, or the quality of their software attributes (e.g., graphics, sound, tutorials, in-game instructions and prompts) (Detez et al., 2019). Indeed, only six of the studies in the meta-analysis provided adequately explicit reports on VRISE and VR software. Since Detez et al. (2019) assessed reaction times and heart rates, these data are likely to be affected by VRISE, despite the study having a rigorous experimental design and using the HTC Vive. Therefore, it is important to use appropriate VR software and external hardware to prevent risks to the health and safety of the participants as well as the reliability of the results.

## Limitations and Future Studies

The above technological review and meta-analysis of VR studies evidenced the importance of technological and methodological features in VR research and clinical designs. However, our meta-analysis of VR studies has some limitations. The metaanalysis considered VR studies with diverse populations and designs, which may have affected the frequency of VRISE and the existence of dropouts. Uniformity across studies (e.g., considering only VRET, assessments or a specific clinical population) was not possible due to the scarcity of neuroscience studies involving VR, especially using new generation HMDs. Moreover, the review did not consider any software details due to the scarceness of such descriptions in published studies. Future VR studies should report software and hardware features to allow an in-depth meta-analysis. Equally, only six studies provided quantitative reports of VRISE intensity, consequently, only the presence or absence of VRISE was considered. The dichotomous consideration of VRISE is susceptible to reports based on subjective criteria and individual differences, but this is likely to have affected the VRISE rates in both old and new generation studies. Future studies should aim to appraise the quality of the software and intensity of VRISE (e.g., using questionnaires). Studies should also attempt to clarify the acceptable duration of immersive VR sessions, which will aid researchers in designing their studies appropriately. Importantly, the cost of the VR software development should also be considered. Finally, studies should attempt to provide software development guidelines that enable researchers and/or research software developers

## REFERENCES


to develop VR research software without depending on third parties (e.g., freelance developers or software development companies) and these guidelines should embed suggestions and instructions for VR software development, which meet the criteria discussed above.

## Conclusion

The use of VR HMDs is becoming more popular in neuroscience for both clinical and research purposes, and VR technology and methods have been well accepted by diverse populations in terms of age groups and clinical conditions. A more pleasant VR experience and a reduction in VRISE symptomatology have been found with new generation CV HMDs, which deliver an adequately high display resolution, a rapid image refresh rate and an ergonomic design, and which have controllers that allow naturalistic navigation and movement within the VE, especially when there is restricted teleportation. The outcomes of the current technological review and meta-analysis support the feasibility of implementing new generation VR CV HMDs in cognitive neuroscience and neuropsychology. The findings of the technological review suggest methods that should be considered in the development or selection of VR research software, as well as hardware and software features that should be included in the research protocol. The selected VR HMD and the VR research software should enable suitable ergonomic interactions, locomotion techniques (e.g., teleportation), and kinetic mechanics that ensure VRISE are reduced or completely avoided. A meticulous approach and technological competence are required to consolidate the viability of VR research and clinical designs in cognitive neuroscience and neuropsychology.

## DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the manuscript/supplementary files.

## AUTHOR CONTRIBUTIONS

PK had the initial idea and contributed to every aspect of this study. SC, LD, and SM contributed to the methodological aspects and the discussion of the results.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Kourtesis, Collina, Doumas and MacPherson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Dynamics of Attention Shifts Among Concurrent Speech in a Naturalistic Multi-speaker Virtual Environment

Keren Shavit-Cohen and Elana Zion Golumbic\*

The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan, Israel

Focusing attention on one speaker against the background of other irrelevant speech can be a challenging feat. A longstanding question in attention research is whether and how frequently individuals shift their attention towards task-irrelevant speech, arguably leading to occasional detection of words in a so-called unattended message. However, this has been difficult to gauge empirically, particularly when participants attend to continuous natural speech, due to the lack of appropriate metrics for detecting shifts in internal attention. Here we introduce a new experimental platform for studying the dynamic deployment of attention among concurrent speakers, utilizing a unique combination of Virtual Reality (VR) and Eye-Tracking technology. We created a Virtual Café in which participants sit across from and attend to the narrative of a target speaker. We manipulated the number and location of distractor speakers by placing additional characters throughout the Virtual Café. By monitoring participants' eye-gaze dynamics, we studied the patterns of overt attention-shifts among concurrent speakers as well as the consequences of these shifts on speech comprehension. Our results reveal important individual differences in the gaze-pattern displayed during selective attention to speech. While some participants stayed fixated on a target speaker throughout the entire experiment, approximately 30% of participants frequently shifted their gaze toward distractor speakers or other locations in the environment, regardless of the severity of audiovisual distraction. Critically, performing frequent gaze-shifts negatively impacted the comprehension of target speech, and participants made more mistakes when looking away from the target speaker. 
We also found that gaze-shifts occurred primarily during gaps in the acoustic input, suggesting that momentary reductions in acoustic masking prompt attention-shifts between competing speakers, in line with "glimpsing" theories of processing speech in noise. These results open a new window into understanding the dynamics of attention as they wax and wane over time, and the different listening patterns employed for dealing with the influx of sensory input in multisensory environments. Moreover, the novel approach developed here for tracking the locus of momentary attention in a naturalistic virtual-reality environment holds high promise for extending the study of human behavior and cognition and bridging the gap between the laboratory and real-life.

#### Edited by:

Pietro Cipresso, Italian Auxological Institute (IRCCS), Italy

#### Reviewed by:

Kristina C. Backer, University of California, Merced, United States Carlo Sestieri, Università degli Studi G. d'Annunzio Chieti e Pescara, Italy

> \*Correspondence: Elana Zion Golumbic elana.zion-golumbic@biu.ac.il

#### Specialty section:

This article was submitted to Speech and Language, a section of the journal Frontiers in Human Neuroscience

> Received: 02 May 2019 Accepted: 16 October 2019 Published: 08 November 2019

#### Citation:

Shavit-Cohen K and Zion Golumbic E (2019) The Dynamics of Attention Shifts Among Concurrent Speech in a Naturalistic Multi-speaker Virtual Environment. Front. Hum. Neurosci. 13:386. doi: 10.3389/fnhum.2019.00386

Keywords: speech processing, auditory attention, eye-tracking, virtual reality, cocktail party effect, distractability

## INTRODUCTION

Focusing attention on one speaker in a noisy environment can be challenging, particularly in the background of other irrelevant speech (McDermott, 2009). Despite the difficulty of this task, comprehension of an attended speaker is generally good and the content of distractor speech is rarely recalled explicitly (Cherry, 1953; Lachter et al., 2004). Preferential encoding of attended speech in multi-speaker contexts is also mirrored by enhanced neural responses to attended vs. distractor speech (Ding and Simon, 2012b; Mesgarani and Chang, 2012; Zion Golumbic et al., 2013b; O'Sullivan et al., 2015). However, there are also indications that distractor speech is processed, at least to some degree. Examples for this are the Irrelevant Stimulus Effect, where distractor words exert priming effect on an attended task (Treisman, 1964; Neely and LeCompte, 1999; Beaman et al., 2007), as well as occasional explicit detection of salient words in distractor streams (Cherry, 1953; Wood and Cowan, 1995; Röer et al., 2017; Parmentier et al., 2018). These effects highlight a key theoretical tension regarding how processing resources are allocated among competing speech inputs. Whereas Late-Selection models of attention posit that attended and distractor speech can be fully processed, allowing for explicit detection of words in so-called unattended speech (Deutsch and Deutsch, 1963; Duncan, 1980; Parmentier et al., 2018), Limited-Resources models hold that there are inherent bottlenecks for linguistic processing of concurrent speech due to limited resources (Broadbent, 1958; Lachter et al., 2004; Lavie et al., 2004; Raveh and Lavie, 2015). The latter perspective reconciles indications for occasional processing of distractor speech as stemming from rapid shifts of attention toward distractor speech (Conway et al., 2001; Escera et al., 2003; Lachter et al., 2004). 
Yet, despite the parsimonious appeal of this explanation, to date, there is little empirical evidence supporting and characterizing the psychological reality of attention switches among concurrent speakers.

Establishing whether and when rapid shifts of attention towards distractor stimuli occur is operationally challenging since it refers to individuals' internal state that researchers do not have direct access to. Existing metrics for detecting shifts of attention among concurrent speech primarily rely on indirect measures such as prolongation of reaction times on an attended task (Beaman et al., 2007) or subjective reports (Wood and Cowan, 1995). Given these limitations, the current understanding of the dynamics of attention over time, and the nature and consequences of rapid attention-shifts among concurrent speech is extremely poor. Nonetheless, gaining insight into the dynamics of internal attention-shifts is critical for understanding how attention operates in naturalistic multispeaker settings.

Here, we introduce a new experimental platform for studying the dynamic deployment of attention among concurrent speakers. We utilize Virtual Reality (VR) technology to simulate a naturalistic audio-visual multi-speaker environment, and track participants' gaze-position within the Virtual Scene as a marker for the locus of overt attention and as a means for detecting attention-shifts among concurrent speakers. Participants experienced sitting in a "Virtual Café" across from a partner (avatar; animated target speaker) and were required to focus attention exclusively towards this speaker. Additional distracting speakers were placed at surrounding tables, with their number and location manipulated across conditions. Continuous tracking of gaze-location allowed us to characterize whether participants stayed focused on the target speaker as instructed or whether and how often they performed overt glimpses around the environment and toward distractor speakers. Critically, we tested whether shifting one's gaze around the environment and away from the target speaker impacted comprehension of target speech. We further tested whether gaze-shifts are associated with salient acoustic changes in the environment, such as onsets in distractor speech that can potentially grab attention exogenously (Wood and Cowan, 1995) or brief pauses that create momentary unmasking of competing sounds (Lavie et al., 2004; Cooke, 2006).

Gaze-shifts are often used as a proxy for attention shifts in natural vision (Anderson et al., 2015; Schomaker et al., 2017; Walker et al., 2017); however, this measure has not been utilized extensively in dynamic contexts (Marius't Hart et al., 2009; Foulsham et al., 2011). This novel approach enabled us to characterize the nature of momentary attention-shifts in ecological multi-speaker listening conditions, as well as individual differences, gaining insight into the factors contributing to dynamic attention shifting and its consequences on speech comprehension.

## MATERIALS AND METHODS

## Participants

Twenty-six adults participated in this study (ages 18–32, median 24; 18 female, three left-handed), all fluent in Hebrew, with self-reported normal hearing and no history of psychiatric or neurological disorders. Signed informed consent was obtained from each participant prior to the experiment, in accordance with the guidelines of the Institutional Ethics Committee at Bar-Ilan University. Participants were paid for participation or received class credit.

## Apparatus

Participants were seated comfortably in an acoustic-shielded room and viewed a 3D VR scene of a café through a head-mounted device (Oculus Rift Development Kit 2). The device was custom-fitted with an embedded eye-tracker (SMI, Teltow, Germany; 60 Hz monocular sampling rate) for continuous monitoring of participants' eye-gaze position. Audio was presented through high-quality headphones (Sennheiser HD 280 pro).

## Stimuli

Avatar characters were selected from the Mixamo platform (Adobe Systems, San Jose, CA, USA). Soundtracks for the avatars' speech were 35–50 s long segments of natural Hebrew speech taken from podcasts and short stories<sup>1</sup>. Avatars' mouth and articulation movements were synced to the audio to create a realistic audio-visual experience of speech (LipSync Pro, Rogo Digital, England). Scene animation and experiment programming was controlled using an open-source VR engine (Unity Software<sup>2</sup>). Speech loudness levels (RMS) were equated for all stimuli, in 10-s long bins (to avoid biases due to fluctuations in speech time-course). Audio was further manipulated within Unity using a 3D sound algorithm, so that it was perceived as originating from the spatial location of the speaking avatar, with overall loudness decreasing logarithmically with distance from the listener. Participants' head movements were not restricted, and both the graphic display and 3D sound were adapted on-line in accordance with head-position, maintaining a spatially-coherent audio-visual experience.

<sup>1</sup>www.icast.co.il
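The bin-wise loudness equalization described above can be illustrated with a minimal sketch. The text does not specify how the equalization was implemented, so this assumes that each consecutive 10-s bin is scaled independently to a common target RMS. The function names, the `target_rms` parameter and the handling of silent bins are illustrative assumptions, not the authors' code.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equate_rms_in_bins(signal, sample_rate, target_rms, bin_seconds=10.0):
    """Scale each consecutive bin of `signal` so its RMS equals `target_rms`.

    Assumption: bins are non-overlapping and scaled independently.
    Silent bins (RMS of zero) are left unchanged to avoid dividing by zero.
    """
    bin_len = max(1, int(round(bin_seconds * sample_rate)))
    out = []
    for start in range(0, len(signal), bin_len):
        chunk = signal[start:start + bin_len]
        level = rms(chunk)
        gain = target_rms / level if level > 0 else 1.0
        out.extend(s * gain for s in chunk)
    return out
```

Independent per-bin gains can introduce small level steps at bin boundaries; a production pipeline would probably smooth the gain curve, which this sketch omits.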

## Experiment Design

In the Virtual Café setting, participants experienced sitting at a café table facing a partner (animated speaking avatar) telling a personal narrative. They were told to focus attention exclusively on the speech of their partner (target speaker) and to subsequently answer four multiple-choice comprehension questions about the narrative (e.g., ''What computer operating system was mentioned?''). Answers to the comprehension questions were evenly distributed throughout the narrative, and were pre-screened in a pilot study to ensure accuracy rates between 80% and 95% in a single-speaker condition. The time-period containing the answer to each question was recorded and used in subsequent analysis of performance as a function of gaze-shift behaviors (see below). Additional pairs of distracting speakers (avatars) were placed at surrounding tables, and we systematically manipulated the number and location of distractors in four conditions: No Distraction (NoD), Left Distractors (LD), Right Distractors (RD), Right and Left Distractors (RLD; **Figure 1**). Each condition consisted of five trials (∼4 min per condition) and was presented in random order, which was different for each participant. The identity and voice of the main speaker were kept constant throughout the experiment, with different narratives in each trial, while the avatars and narratives serving as distractors varied from trial to trial. The allocation of each narrative to the condition was counter-balanced across participants, to avoid material-specific biases. Before starting the experiment itself, participants were given time to look around and familiarize themselves with the Café environment and the characters in it. During this familiarization stage, no audio was presented and participants terminated it when they were ready. They also completed two training-trials, in the NoD and RLD conditions, to familiarize them with the stimuli and task as well as the type of comprehension questions asked. 
This familiarization and training period lasted approximately 3 min.

## Analysis of Eye-Gaze Dynamics

Analysis of eye-gaze data was performed in Matlab (Mathworks, Natick, MA, USA) using functions from the fieldtrip toolbox<sup>3</sup> as well as custom-written scripts. Eye-gaze position in virtual space coordinates (x, y, z) was monitored continuously throughout the experiment. Periods surrounding eye-blinks were removed from the data (250 ms around each blink). Clean data from each trial were analyzed as follows.

<sup>2</sup>unity3d.com

<sup>3</sup>fieldtriptoolbox.org
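As an illustration of the blink-removal step, the sketch below builds a mask of samples to keep, given a per-sample blink flag. It assumes that "250 ms around each blink" means 250 ms on either side of every blink sample, and it uses the 60 Hz sampling rate of the eye-tracker. The function name and the input format are hypothetical.

```python
def blink_exclusion_mask(is_blink, sample_rate_hz=60.0, pad_ms=250.0):
    """Return a list of booleans, True where a gaze sample should be kept.

    Assumption: every sample within `pad_ms` before or after a blink
    sample is discarded together with the blink itself.
    """
    pad = int(round(pad_ms / 1000.0 * sample_rate_hz))
    n = len(is_blink)
    keep = [True] * n
    for i, blink in enumerate(is_blink):
        if blink:
            lo = max(0, i - pad)
            hi = min(n, i + pad + 1)
            for j in range(lo, hi):
                keep[j] = False
    return keep
```

At 60 Hz a 250 ms margin corresponds to 15 samples, so a single-sample blink removes 31 samples in total under this reading.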

First, we mapped gaze-positions onto specific avatars/locations in the 3D virtual scene. For data reduction, we used a spatial clustering algorithm (k-means) to combine gaze data-points associated with similar locations in space. Next, each spatial cluster was associated with the closest avatar, by calculating the Euclidean distance between the center of the cluster and the center of each avatar presented in that condition. If two or more clusters were associated with looking at the same avatar, they were combined. Similarly, clusters associated with the members of the distractor avatar-pairs (left or right distractors) were combined. If a cluster did not fall within a particular distance-threshold from any of the avatars, it was associated with looking at "The Environment." This resulted in a maximum of four clusters capturing the different possible gaze locations in each trial: (1) Target Speaker; (2) Left Distractors (when relevant); (3) Right Distractors (when relevant); and (4) Rest of the Environment. The appropriateness of cluster-to-avatar association and distance-threshold selection was verified through visual inspection.
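The cluster-to-avatar assignment can be sketched as a nearest-neighbor lookup with a distance cut-off. The example below takes the cluster centers as given (the k-means step itself is not shown) and maps each center to a location label. The avatar coordinates, the threshold value and all names are hypothetical, since the text does not report them.

```python
import math


def label_cluster(center, avatars, max_distance):
    """Assign a cluster center to the gaze location of the nearest avatar.

    `avatars` maps an avatar id to a (location_label, (x, y, z)) pair, so
    that both members of a distractor pair share one location label.
    Centers farther than `max_distance` from every avatar are labeled
    "Environment".
    """
    best_label, best_dist = "Environment", float("inf")
    for label, position in avatars.values():
        d = math.dist(center, position)  # Euclidean distance in 3D
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else "Environment"


# Hypothetical scene layout, for illustration only.
example_avatars = {
    "target": ("Target Speaker", (0.0, 0.0, 1.0)),
    "left_a": ("Left Distractors", (-2.0, 0.0, 1.0)),
    "left_b": ("Left Distractors", (-2.0, 0.0, 2.0)),
}
```

Because labels are shared within a distractor pair, clusters that land on either member are merged simply by receiving the same label.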

Based on the clustered data, we quantified the percent of time that participants spent focusing at each location (Percent Gaze Time) in each trial, and detected the times of Gaze-Shifts from one cluster to another. Gaze-shifts lasting less than 250 ms were considered artifacts and removed from the analysis, as they are physiologically implausible (Bompas and Sumner, 2009; Gilchrist, 2011). The number of Gaze-shifts as well as the Percent Gaze Time spent at each of the four locations—Target Speaker, Left Distractors, Right Distractors and Environment—were averaged across trials, within condition. Since conditions differed in the type and number of distractors, comparison across conditions focused mainly on metrics pertaining to gazing at/away-from the target speaker.
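To make these two metrics concrete, the sketch below computes Percent Gaze Time and the number of gaze-shifts away from the target, given a sequence of per-sample location labels. It assumes that a visit to a location lasting less than 250 ms is treated as an artifact and absorbed into the preceding location; the text does not state how such samples were handled. All function names are illustrative.

```python
def run_lengths(labels):
    """Collapse a label sequence into [label, count] runs."""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    return runs


def clean_labels(labels, sample_rate_hz=60.0, min_ms=250.0):
    """Relabel runs shorter than `min_ms` with the preceding run's label.

    The first run is always kept because it has no predecessor.
    """
    min_samples = min_ms / 1000.0 * sample_rate_hz
    cleaned = []
    for lab, n in run_lengths(labels):
        if cleaned and n < min_samples:
            lab = cleaned[-1]
        cleaned.extend([lab] * n)
    return cleaned


def gaze_summary(labels, target="Target Speaker"):
    """Return (percent gaze time per location, number of shifts away from target)."""
    percent = {lab: 100.0 * labels.count(lab) / len(labels) for lab in set(labels)}
    shifts_away = sum(
        1 for a, b in zip(labels, labels[1:]) if a == target and b != target
    )
    return percent, shifts_away
```

A typical call would chain the two steps, for example `gaze_summary(clean_labels(trial_labels))`, and then average the results across trials within each condition.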

Mixed linear regression models were used in all analyses to fit the data and test for effects of Condition on gaze patterns (both Percent Gaze-Time Away and Gaze-Shifts), as well as possible correlations with speech comprehension accuracy measures. These analyses were conducted in R (R Development Core Team, 2012) and we report statistical results derived using both regular linear (lme4 package for R; Bates et al., 2015) and robust estimation approaches (robustlmm package for R; Koller, 2016), to control for possible contamination by outliers. The advantage of mixed-effects models is that they account for variability between subjects and correlations within the data, as well as possible differences in trial numbers across conditions (Baayen et al., 2008), which makes them particularly suitable for the type of data collected here.

## Analysis of Speech Acoustics Relative to Gaze-Shifts

A key question is what prompts overt gaze-shifts away from the target speakers, and specifically whether they are driven by changes in the acoustic input or if they should be considered more internally-driven. Two acoustic factors that have been suggested as inviting attention-shifts among concurrent speech are: (a) onsets/loudness increases in distractor speech that can potentially grab attention exogenously (Wood and Cowan, 1995); and (b) brief pauses that create momentary unmasking of competing sounds (Lavie et al., 2004; Cooke, 2006). To test whether one or both of these factors account for the occurrence of gaze-shifts away from the target speaker in the current data, we performed a gaze-shift time-locked analysis of the speech acoustics of target speech (in all conditions) and distractor speech (in the LD, RD and RLD conditions).

To this end, we first calculated the temporal envelope of the speech presented in each trial using a windowed RMS (30 ms smoothing). The envelopes were segmented relative to the times where gaze-shifts away from the target speaker occurred in that particular trial (−400 to +200 ms around each shift). Given that the initiation-time for executing saccades is ∼200 ms (Gilchrist, 2011), the time-window of interest for looking at possible influences of the acoustics on gaze-shifts is prior to that, i.e., 400–200 ms prior to the gaze-shift itself.
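A minimal sketch of the envelope extraction and gaze-shift-locked segmentation follows. It assumes a centered moving-RMS window of 30 ms and drops any gaze-shift whose −400 to +200 ms window extends beyond the trial. The function names and the edge handling are assumptions for illustration, not a description of the original scripts.

```python
import math


def rms_envelope(signal, sample_rate, window_ms=30.0):
    """Moving-RMS envelope with a centered window, same length as `signal`.

    The window is truncated at the edges of the signal.
    """
    half = int(round(window_ms / 1000.0 * sample_rate)) // 2
    env = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        env.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return env


def epochs_around(envelope, sample_rate, event_times_s, pre_s=0.4, post_s=0.2):
    """Cut envelope segments from `pre_s` before to `post_s` after each event.

    Events whose window does not fit entirely inside the envelope are skipped.
    """
    pre = int(round(pre_s * sample_rate))
    post = int(round(post_s * sample_rate))
    segments = []
    for t in event_times_s:
        center = int(round(t * sample_rate))
        if center - pre >= 0 and center + post <= len(envelope):
            segments.append(envelope[center - pre: center + post])
    return segments
```

The loop-based RMS is quadratic in the window length and is written for readability; a cumulative-sum implementation would be preferable for full-length audio.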

Since the number of gaze-shifts varied substantially across participants, we averaged the gaze-shift-locked envelope segments across all trials and participants, within condition. The resulting average acoustic-loudness waveform in each condition was compared to a distribution of non-gaze-locked loudness levels, generated through a permutation procedure as follows: the same acoustic envelopes were segmented randomly into an equal number of segments as the number of gaze-shifts in each condition (sampled across participants with the same proportion as the real data). These were averaged, producing a non-gaze-locked average waveform. This procedure was repeated 1,000 times and the real gaze-shift-locked waveform was compared to the distribution of non-gaze-locked waveforms. We identified time-points where the loudness level fell within the top or bottom 5% of the non-gaze-locked distribution, signifying that the speech acoustics were particularly loud or quiet relative to the rest of the presented speech stimuli. We also quantified the signal-to-noise ratio (SNR) between the time-resolved spectrograms of target and distractor speech surrounding gaze-shifts, according to: $\mathrm{SNR}(f,t) = \log\left(\frac{P_{\mathrm{target}}(f,t)}{P_{\mathrm{distractor}}(f,t)}\right)$, with $P(f,t)$ depicting the power at frequency $f$ at time $t$. This was calculated for target-distractor combinations surrounding each gaze-shift, and averaged across shifts and trials.
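The permutation procedure can be sketched as follows. For simplicity, the example draws random segments from a single envelope, whereas the analysis described above sampled across trials and participants in the same proportions as the real data. The function names, the seed and the index-based percentile cut-offs are illustrative assumptions.

```python
import random


def mean_waveform(segments):
    """Point-by-point average of equally long segments."""
    n = len(segments)
    return [sum(column) / n for column in zip(*segments)]


def permutation_bounds(envelope, n_segments, seg_len, n_perm=1000, pct=5.0, seed=0):
    """Approximate lower/upper percentile bounds of non-event-locked averages.

    Each permutation averages `n_segments` randomly placed segments of
    length `seg_len`; the bounds are taken per time-point across permutations.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        starts = [rng.randrange(0, len(envelope) - seg_len + 1) for _ in range(n_segments)]
        null.append(mean_waveform([envelope[s:s + seg_len] for s in starts]))
    k = int(n_perm * pct / 100.0)
    lower, upper = [], []
    for column in zip(*null):
        ordered = sorted(column)
        lower.append(ordered[k])        # roughly the bottom `pct` percent
        upper.append(ordered[-k - 1])   # roughly the top `pct` percent
    return lower, upper
```

Time-points at which the real gaze-shift-locked average falls below `lower` or above `upper` would then be flagged as unusually quiet or loud, respectively.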

## RESULTS

## Gaze-Patterns and Speech Comprehension

On average, participants spent ∼7.6% of each trial (∼3 s in a 40-s-long trial) looking at locations other than the target speaker and they performed an average of 2.5 gaze-shifts per trial. **Figure 2A** shows the distribution of eye-gaze location in two example trials taken from different participants, demonstrating that sometimes gaze was fixated on the target-speaker throughout the entire trial, and sometimes shifted occasionally towards the distractors. The distribution of Gaze-shifts was relatively uniform over the course of the entire experiment (**Figure 2B**, left). Twenty-three percent of gaze-shifts were performed near the onset of the trial; however, the majority of gaze-shifts occurred uniformly throughout the entire trial (**Figure 2B**, right).

**Figures 3A,B** show how the average Gaze Time Away from the target speaker (i.e., time spent looking at distractor avatars or other locations in the Environment) and the number of Gaze-Shifts away from the target speaker varied across the four conditions. To test whether gaze patterns (number of Gaze-Shifts and/or proportion Gaze-Time Away) differed across conditions, we estimated each of them separately using a linear mixed-effects model with the factor Condition as a fixed effect (Gaze-Shifts ∼ Condition and Gaze-Time ∼ Condition), where each of the three distraction conditions (RD, LD and RLD) was compared to the NoD condition. By-subject intercepts were included as random effects. No significant effects of Condition were found on Gaze-Time; however, participants performed significantly more Gaze-Shifts in the RLD condition relative to the NoD condition (lmer: β = 0.8, t = 2.5, p = 0.01; robustlmm: β = 0.54, t = 2.5).
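For readers less familiar with this model family, the random-intercept model described above can be written out explicitly. The notation below is ours rather than the authors': $y_{ij}$ is the outcome (number of Gaze-Shifts or Gaze-Time Away) for trial $i$ of participant $j$, NoD is the reference level, and the distributional assumptions are the standard ones for a linear mixed model.

```latex
y_{ij} = \beta_0
       + \sum_{c \in \{\mathrm{LD},\,\mathrm{RD},\,\mathrm{RLD}\}} \beta_c \,\mathbb{1}\!\left[\mathrm{Condition}_{ij} = c\right]
       + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $\beta_0$ is the mean outcome in the NoD condition, each $\beta_c$ is the difference between condition $c$ and NoD, and $u_j$ is the by-subject intercept. In lme4 syntax a model of this form would typically be specified as `Outcome ~ Condition + (1 | Subject)`, although the exact call used in the study is not reported.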

Of critical interest is whether the presence of distractors and gaze-shifts towards them impacted behavioral outcomes of speech comprehension. Accuracy on the multiple-choice comprehension questions of the target speaker was relatively good in all conditions (mean accuracy 82% ± 3; **Figure 3C**). A mixed linear model estimating Accuracy ∼ Condition did not reveal any significant differences in Accuracy between conditions (lmer: all t's < 0.199, p > 0.6; robustlmm: all t's < 0.05). However, adding Percent Gaze-Time as a second fixed effect to the Accuracy ∼ Condition model improved the model significantly (χ<sup>2</sup> = 9.14, p < 10<sup>−3</sup>), with Percent Gaze-Time showing a significant correlation with Accuracy (lmer: β = −0.19, t = −3.13, p = 0.001; robustlmm: β = −0.23, t = −3.77; **Figure 3D**). Adding Number of Shifts to the Accuracy ∼ Condition model, however, did not yield any additional significant advantage (likelihood ratio test χ<sup>2</sup> = 2.4, p > 0.1; **Figure 3E**), suggesting that the number of gaze-shifts performed per se did not affect speech comprehension.

To further assess the link between performance on the comprehension questions and gaze-shifts, we tested whether participants were more likely to make mistakes on specific questions if they happened to be looking away from the target-speaker when the critical information for answering that question was delivered. Mistake rates were slightly lower when participants fixated on the target speaker when the critical information was delivered (16% miss-rate) vs. when they looked away (18% miss-rate). To evaluate this effect statistically, we fit a linear mixed model to the accuracy results on individual questions testing whether they were mediated by the presence of a gaze-shift when the answer was given, as well as the condition [Accuracy ∼ Shift (yes/no) + Condition as fixed effects], with by-subject intercepts included as random effects. This analysis demonstrated a small yet significant effect of the presence of a gaze-shift during the period when the answer was given (lmer β = −0.05, t = −2.16, p < 0.04; robustlmm t = −3; **Figure 3F**), however there was no significant effect of Condition (all t's < 0.5).

## Individual Differences in Gaze Patterns

When looking at gaze-patterns across participants, we noted substantial variability in the number of gaze-shifts performed and percent time spent gazing away from the target speaker. As illustrated in **Figures 2A**, **4**, some participants stayed completely focused on the main speaker throughout the entire experiment, whereas others spent a substantial portion of each trial gazing around the environment (range across participants: 0–18 gaze-shifts per trial on average; 0–34.52% of each trial spent looking away from the target speaker on average). This motivated further inspection of gaze-shift behavior at the individual level. Specifically, we tested whether individual behavior of performing many or few gaze-shifts away from the target was stable across conditions. We calculated Cronbach's α between conditions and found high internal consistency across conditions in the number of gaze-shifts performed as well as in the percent of gaze-time away from the target speaker (α = 0.889 and α = 0.832, respectively). This was further demonstrated by strong positive correlations between the percent time spent gazing away from the target speaker in the No Distraction condition vs. each of the Distraction conditions (lmer: all r's > 0.5; robustlmm: all r's > 0.6) as well as the number of gaze-shifts (lmer and robustlmm: all r's > 0.5; **Figures 4C,D**). This pattern suggests that individuals have characteristic tendencies to either stay focused or gaze around the scene, above and beyond the specific sensory attributes or degree of distraction in a particular scenario.

FIGURE 3 | Summary of gaze-shift patterns and behavioral outcomes across conditions. (A,B) Proportion of Gaze-Time and Number of Gaze-Shifts Away from target speaker, per trial and across conditions. Results within each condition are broken down by gaze-location (Right Distractors, Left Distractors or Environment in blank, left and right diagonals, respectively). There was no significant difference between conditions in the total Gaze-Time away from the target speaker. Significantly more Gaze-Shifts were performed in the RLD condition relative to the NoD condition. No other contrasts were significant. (C) Mean accuracy on comprehension questions, across condition. Difference between conditions was not significant. (D,E) Analysis of Accuracy as a function of Gaze-Shift Patterns, at the whole trial level. Trials where participants spent a larger proportion of the time looking away from the target-speaker were associated with lower accuracy rates. No significant correlation was found between accuracy rates and the number of Gaze-Shifts performed. (F) Analysis of Accuracy on single question as a function of Gaze-Shift Patterns. Mistake rates were significantly higher if participants were looking away from the target speaker vs. fixating on the target speaker during the time-window when the information critical for answering the question was delivered. Error bars indicate Standard Error of the Mean (SEM). <sup>∗</sup>p < 0.05.
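The internal-consistency measure reported above, Cronbach's α, follows the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed scores). The sketch below treats each condition as an "item" and each participant as a row. The data layout and function name are assumptions for illustration; the values used in the study are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items table.

    `scores` is a list of rows, one per participant, each holding one value
    per item (here, one gaze metric per condition). Sample variances with
    n - 1 in the denominator are used throughout.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

With four conditions, each row would hold a participant's average number of gaze-shifts (or percent gaze-time away) in NoD, LD, RD and RLD; values of α close to 1 indicate that participants keep their relative ranking across conditions.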

## Gaze-Locked Analysis of Speech Acoustics

Last, we tested whether there was any relationship between the timing of gaze-shifts and the local speech-acoustics. To this end, we performed a gaze-shift-locked analysis of the envelope of the target or distractor speech (when present). Analysis of distractor speech envelope consisted only of eye-gaze shifts toward that distractor (i.e., excluding shifts to other places in the environment). **Figure 5** shows the average time-course of the target and distractor speech envelopes relative to the onset of a gaze-shift. For both target speech (top row) and distractor speech (bottom row), gaze-shifts seem to have been preceded by a brief period of silence (within the bottom 5% of the non-gaze-locked distribution; red shading) between 200 and 300 ms prior to the shift.

Frequency-resolved analysis of the SNR between target and distractor speech similarly indicates low SNR in the period preceding gaze-shifts. A reduction in SNR prior to gaze-shifts was primarily evident in the 3–8 kHz range (sometimes considered the "unvoiced" part of the speech spectrum; Atal and Hanauer, 1971), whereas SNR in the lower part of the spectrum (0–2 kHz) was near 1 dB both before and after gaze-shifts. Although SNR does not take into account the overall loudness-level of each speaker but only the ratio between the speakers, the observed SNR modulation is consistent with momentary periods of silence/drops in the volume of both concurrent speakers.

This pattern is in line with an acoustic release-from-masking account, suggesting that gaze-shifts are prompted by momentary gaps in the speech, and particularly when gaps in concurrent speech coincide temporally (as seen here in the Single and Two Distractor conditions). Conversely, the suggestion that attention-shifts are a product of exogenous capture by salient events in distracting speech does not seem to be supported by the current data, since the acoustics of the distractor speech that participants shifted their gaze towards did not seem to contain periods with consistently loud acoustics. We did, however, find increases in loudness of the target speech acoustics near gaze-shift onset (within the top 5% of the non-gaze-locked distribution; red shading between −100 and +50 ms).

extreme conditions: NoD vs. RLD. Correlations were significant in both cases (r > 0.5).

## DISCUSSION

The current study is a first and novel attempt to characterize how individuals deploy overt attention in naturalistic audiovisual settings, laden with rich and competing stimuli. By monitoring eye-gaze dynamics in our Virtual Café, we studied the patterns of gaze-shifts and their consequences for speech comprehension. Interestingly, we found that the presence and number of competing speakers in the environment did not, on average, affect the amount of time spent looking at the target speaker, nor did it impair comprehension of the target speaker, although participants did perform slightly more gaze-shifts away in the two-distractor RLD condition. This demonstrates an overall resilience of the attention and speech-processing systems for overcoming the acoustic-load posed by distractors in naturalistic audio-visual conditions. This ability is of utmost ecological value, and likely benefits both from the availability of visual and spatial cues (Freyman et al., 2004; Zion Golumbic et al., 2013a) and from the use of semantic context to maintain comprehension despite possible reductions in speech intelligibility (Simpson and Cooke, 2005; Vergauwe et al., 2010; Ding and Simon, 2012a; Calandruccio et al., 2018). At the same time, our results also suggest that the ability to maintain attention on the designated speaker under these conditions is highly individualized. Participants displayed characteristic patterns of either staying focused on a target speaker or sampling other locations in the environment overtly, regardless of the severity of the so-called sensory distraction. Critically, the amount of time that individuals spent looking around the environment and away from the target speaker was negatively correlated with speech comprehension, directly linking overt attention to speech comprehension. 
We also found that gaze-shifts away from the target speaker occurred primarily following gaps in the acoustic input, suggesting that momentary reductions in acoustic masking can prompt attention-shifts between competing speakers, in line with "glimpsing" theories of processing speech in noise. These results open a new window into understanding the dynamics of attention as they wax and wane over time, and the listening patterns exhibited by individuals for dealing with the influx of sensory input in complex naturalistic environments.

## Is Attention Stationary?

An underlying assumption of many experimental studies is that participants allocate attention solely to task-relevant stimuli, and that attention remains stationary over time. However, this assumption is probably unwarranted (Weissman et al., 2006; Esterman et al., 2013) since sustaining attention over long periods of time is extremely taxing (Schweizer and Moosbrugger, 2004; Warm et al., 2008; Avisar and Shalev, 2011), and individuals spend a large proportion of the time mind-wandering or "off-task" (Killingsworth and Gilbert, 2010; Boudewyn and Carter, 2018; but see Seli et al., 2018). Yet, empirically studying the frequency and characteristics of attention shifts is operationally difficult since it pertains to participants' internal state, to which experimenters do not have direct access. The use of eye-gaze position as a continuous metric for the locus of momentary overt attention in a dynamic scene in the current study contributes to this endeavor.

Here, we found that indeed, in many participants eye-gaze was not maintained on the target speaker throughout the entire trial. Roughly 30% of participants spent over 10% of each trial looking at places in the environment other than the to-be-attended speaker, across all conditions. Interestingly, this proportion is similar to that reported in previous studies for the prevalence of detecting one's own name in a so-called unattended message (Cherry, 1953; Wood and Cowan, 1995), an effect attributed by some to rapid attention shifts (Lachter et al., 2004; Beaman et al., 2007; Lin and Yeh, 2014). Although in the current study we did not test whether these participants also gleaned more information from distractors' speech, we did find that comprehension of the target speaker was reduced as a function of the time spent looking away from the target speaker. Participants were also more likely to miss information from the target-speech during gaze-shifts away, yielding slightly higher mistake-rates. These results emphasize the dynamic nature of attention and attention-shifts, and demonstrate that brief overt attention-shifts can negatively impact speech processing in ecological multi-speaker and multisensory contexts.
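The link between looking away and comprehension can be illustrated with a minimal sketch. It assumes that gaze is sampled at a fixed rate and that every sample has already been labeled with the object being looked at. The label strings, function names, and toy numbers below are hypothetical; they are not the study's data or analysis code.

```python
from math import sqrt


def proportion_off_target(gaze_labels, target="target"):
    """Fraction of gaze samples in a trial that fall away from the target speaker."""
    if not gaze_labels:
        raise ValueError("trial contains no gaze samples")
    off = sum(1 for label in gaze_labels if label != target)
    return off / len(gaze_labels)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up example: one trial per participant, ten gaze samples each.
trials = [
    ["target"] * 10,
    ["target"] * 8 + ["distractor"] * 2,
    ["target"] * 6 + ["distractor"] * 2 + ["elsewhere"] * 2,
]
time_away = [proportion_off_target(t) for t in trials]
comprehension = [0.9, 0.8, 0.7]  # hypothetical proportion of correct answers
r = pearson_r(time_away, comprehension)
```

In this toy data set comprehension falls as the time spent looking away rises, so `r` is negative, which is the direction of the relationship reported above.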

They also highlight the importance of studying individual differences in attentional control. In the current study set, we did not collect additional personal data from participants which may have shed light on the source of the observed variability in gaze-patterns across individuals. However, based on previous literature, individual differences may stem from factors such as susceptibility to distraction (Ellermeier and Zimmer, 1997; Cowan et al., 2005; Avisar and Shalev, 2011; Bourel-Ponchel et al., 2011; Forster and Lavie, 2014; Hughes, 2014), working memory capacity (Conway et al., 2001; Kane and Engle, 2002; Tsuchida et al., 2012; Sörqvist et al., 2013; Hughes, 2014; Naveh-Benjamin et al., 2014; Wiemers and Redick, 2018) or personality traits (Rauthmann et al., 2012; Risko et al., 2012; Baranes et al., 2015; Hoppe et al., 2018). Additional dedicated research is needed to resolve the source of the individual differences observed here.

## Is Eye-Gaze a Good Measure for Attention-Shifts Among Concurrent Speech?

One may ask, to what extent do the current results fully capture the prevalence of attention-shifts, since it is known that these can also occur covertly (Posner, 1980; Petersen and Posner, 2012)? This is a valid concern and indeed the current results should be taken as representing a lower-bound for the frequency of attention-shifts and we should assume that attention-shifts are probably more prevalent than observed here. This motivates the future development of complementary methods for quantifying covert shifts of attention among concurrent speech, given the current absence of reliable metrics.


Another concern that may be raised with regard to the current results is that individuals may maintain attention to the target speaker even while looking elsewhere, and hence the gaze-shifts measured here might not reflect true shifts of attention. Although in principle this could be possible, previous research shows that this is probably not the default mode of listening under natural audiovisual conditions. Rather, a wealth of studies demonstrate a tight link between gaze-shifts and attention-shifts (Chelazzi et al., 1995; Deubel and Schneider, 1996; Grosbras et al., 2005; Szinte et al., 2018) and gaze is widely utilized experimentally as a proxy for the locus of visuospatial attention (Gredebäck et al., 2009; Linse et al., 2017). In multi-speaker contexts, it has been shown that participants tend to move their eyes towards the location of attended speech sounds (Gopher and Kahneman, 1971; Gopher, 1973). Similarly, looking towards the location of distractor-speech significantly reduces intelligibility and memory for attended speech and increases intrusions from distractor speech (Reisberg et al., 1981; Spence et al., 2000; Yi et al., 2013). This is in line with the current finding of a negative correlation between the time spent looking away from the target speaker and speech comprehension, and higher mistake-rates during gaze-shifts, which further link overt gaze to selective attention to speech. Studies on audiovisual speech processing further indicate that looking at the talking face increases speech intelligibility and neural selectivity for attended speech (Sumby and Pollack, 1954; Zion Golumbic et al., 2013a; Lou et al., 2014; Crosse et al., 2016; Park et al., 2016), even when the video is not informative about the content of speech (Kim and Davis, 2003; Schwartz et al., 2004), and eye-gaze is particularly utilized for focusing attention to speech under adverse listening conditions (Yi et al., 2013).
Taken together, current findings support the interpretation that gaze-shifts reflect shifts in attention away from the target speaker, in line with the limited resources perspective of attention (Lavie et al., 2004; Esterman et al., 2014), making eye-gaze a useful and reliable metric for studying the dynamics of attention to naturalistic audio-visual speech. Interestingly, this metric has recently been capitalized on for use in assistive listening devices, utilizing eye-gaze direction to indicate the direction of a listener's attention (Favre-Felix et al., 2017; Kidd, 2017). That said, gaze-position is likely only one of several factors in determining successful speech comprehension in multi-speaker environments (e.g., SNR level, audio-visual congruency, engagement in content etc.), as suggested by the significant yet still moderate effect-sizes found here.

## Listening Between the Gaps—What Prompts Attention Shifts Among Concurrent Speech?

Besides characterizing the prevalence and behavioral consequences of attention-shifts in audio-visual multi-talker contexts, it is also critical to understand what prompts these shifts. Here we tested whether there are aspects of the scene acoustics that can be associated with attention-shifts away from the target speaker. We specifically tested two hypotheses: (1) that attention is captured exogenously by highly salient sensory events in distracting speech (Wood and Cowan, 1995; Itti and Koch, 2000; Kayser et al., 2005); and (2) that attention-shifts occur during brief pauses in speech acoustics that momentarily unmask the competing sounds (Lavie et al., 2004; Cooke, 2006).

Regarding the first hypothesis, the current data suggest that distractor saliency is not a primary factor in prompting gaze-shifts. Since gaze-shifts were just as prevalent in the NoD condition as in conditions that contained distractors and since no consistent increase in distractor loudness was observed near gaze-shifts, we conclude that the gaze-shifts performed by participants do not necessarily reflect exogenous attentional capture by distractor saliency. This is in line with previous studies suggesting that sensory saliency is less effective in drawing exogenous attention in dynamic scenarios relative to the stationary contexts typically used in laboratory experiments (Smith et al., 2013).

Rather, our current results seem to support the latter hypothesis that attention-shifts are prompted by momentary acoustic release-from-masking. We find that gaze-shifts occurred more consistently ∼200–250 ms after instances of low acoustic intensity in both target and distractor sounds and low SNR. This time-scale is on par with the initiation time for saccades (Gilchrist, 2011), and suggests that momentary reductions in masking provide an opportunity for the system to shift attention between speakers. This pattern fits with accounts for comprehension of speech-in-noise, suggesting that listeners utilize brief periods of unmasking or low SNR to glean and piece together information for deciphering speech content ("acoustic glimpsing"; Cooke, 2006; Li and Loizou, 2007; Vestergaard et al., 2011; Rosen et al., 2013). Although this acoustic-glimpsing framework is often used to describe how listeners maintain intelligibility of target-speech in noise, it has not been extensively applied to studying shifts of attention among concurrent speech. The current results suggest that brief gaps in the audio or periods of low SNR may serve as triggers for momentary attention shifts, which can manifest overtly (as demonstrated here), and perhaps also covertly. Interestingly, a previous study found that eye-blinks also tend to occur more often around pauses when viewing and listening to audio-visual speech (Nakano and Kitazawa, 2010), pointing to a possible link between acoustic glimpsing and a reset in the oculomotor system, creating optimal conditions for momentary attention-shifts.
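One generic way to look for acoustic events that precede gaze-shifts is event-locked averaging: cut a window of the loudness envelope around every gaze-shift onset and average across events. The sketch below illustrates that idea only. It is not the analysis pipeline used in the study, and the function name, sampling rate, window lengths, and toy signal are all assumptions made for illustration.

```python
def event_locked_average(envelope, fs, onsets_s, pre_s=0.5, post_s=0.5):
    """Average an acoustic envelope in a window around each gaze-shift onset.

    envelope : sequence of loudness values sampled at `fs` Hz
    onsets_s : gaze-shift onset times, in seconds
    Returns (lags_s, mean_envelope). Events whose window would extend
    beyond the recording are skipped.
    """
    pre = round(pre_s * fs)
    post = round(post_s * fs)
    segments = []
    for onset in onsets_s:
        center = round(onset * fs)
        start, stop = center - pre, center + post + 1
        if start < 0 or stop > len(envelope):
            continue
        segments.append(envelope[start:stop])
    if not segments:
        raise ValueError("no gaze-shift has a complete window")
    n = len(segments)
    mean = [sum(column) / n for column in zip(*segments)]
    lags = [(i - pre) / fs for i in range(pre + post + 1)]
    return lags, mean


# Toy signal (hypothetical): a 10 Hz envelope with a silent gap placed
# 200 ms before each of two gaze-shifts.
fs = 10
envelope = [1.0] * 50
onsets = [1.0, 3.0]
for t in onsets:
    envelope[round(t * fs) - 2] = 0.0

lags, mean = event_locked_average(envelope, fs, onsets, pre_s=0.5, post_s=0.2)
quietest_lag = lags[mean.index(min(mean))]  # lag of the lowest average loudness
```

A negative `quietest_lag` means that the quietest point of the average envelope precedes gaze-shift onset, which is the kind of pattern described in the paragraph above.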

## CONCLUSION

There is growing understanding that in order to really understand the human cognitive system, it needs to be studied in contexts relevant for real-life behavior, and that tightly constrained artificial laboratory paradigms do not always generalize to real-life (Kingstone et al., 2008; Marius't Hart et al., 2009; Foulsham et al., 2011; Risko et al., 2016; Rochais et al., 2017; Hoppe et al., 2018). The current study represents an attempt to bridge this gap between the laboratory and real-life, by studying how individuals spontaneously deploy overt attention in a naturalistic virtual-reality environment. Using this approach, the current study highlights the characteristics and individual differences in selective attention to speech under naturalistic listening conditions. This pioneering work opens up new horizons for studying how attention operates in real-life and understanding the factors contributing to success as well as the difficulties in paying attention to speech in noisy environments.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The study was approved by the Institutional Ethics Committee at Bar-Ilan University, and the research was conducted according to the guidelines of the committee. Signed informed consent was obtained from each participant prior to the experiment.

## AUTHOR CONTRIBUTIONS

EZG designed the study, oversaw data collection and analysis. KS-C collected and analyzed the data. Both authors wrote the article.

## FUNDING

This work was supported by the Israel Science Foundation I-Core Center for Excellence 51/11, and by the United States–Israel Binational Science Foundation grant #2015385.


**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Shavit-Cohen and Zion Golumbic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Validation of the Virtual Reality Neuroscience Questionnaire: Maximum Duration of Immersive Virtual Reality Sessions Without the Presence of Pertinent Adverse Symptomatology

#### Panagiotis Kourtesis<sup>1,2,3,4</sup>\*, Simona Collina<sup>3,4</sup>, Leonidas A. A. Doumas<sup>2</sup> and Sarah E. MacPherson<sup>1,2</sup>

#### Edited by:

Valerio Rizzo, University of Palermo, Italy

#### Reviewed by:

Eugene Nalivaiko, University of Newcastle, Australia; Mark Dennison, United States Army Research Laboratory, United States; Justin Maximilian Mittelstädt, Institute of Aerospace Medicine, German Aerospace Center (DLR), Germany

> \*Correspondence: Panagiotis Kourtesis pkourtes@exseed.ed.ac.uk

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 12 August 2019 Accepted: 11 November 2019 Published: 26 November 2019

#### Citation:

Kourtesis P, Collina S, Doumas LAA and MacPherson SE (2019) Validation of the Virtual Reality Neuroscience Questionnaire: Maximum Duration of Immersive Virtual Reality Sessions Without the Presence of Pertinent Adverse Symptomatology. Front. Hum. Neurosci. 13:417. doi: 10.3389/fnhum.2019.00417

<sup>1</sup> Human Cognitive Neuroscience, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom, <sup>2</sup> Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom, <sup>3</sup> Lab of Experimental Psychology, Suor Orsola Benincasa University of Naples, Naples, Italy, <sup>4</sup> Interdepartmental Centre for Planning and Research "Scienza Nuova", Suor Orsola Benincasa University of Naples, Naples, Italy

There are major concerns about the suitability of immersive virtual reality (VR) systems (i.e., head-mounted display; HMD) to be implemented in research and clinical settings, because of the presence of nausea, dizziness, disorientation, fatigue, and instability (i.e., VR induced symptoms and effects; VRISE). Research suggests that the duration of a VR session modulates the presence and intensity of VRISE, but there are no suggestions regarding the appropriate maximum duration of VR sessions. The implementation of high-end VR HMDs in conjunction with ergonomic VR software seems to mitigate the presence of VRISE substantially. However, a brief tool does not currently exist to appraise and report both the quality of software features and VRISE intensity quantitatively. The Virtual Reality Neuroscience Questionnaire (VRNQ) was developed to assess the quality of VR software in terms of user experience, game mechanics, in-game assistance, and VRISE. Forty participants aged between 28 and 43 years were recruited (18 gamers and 22 non-gamers) for the study. They participated in three different VR sessions until they felt fatigue or discomfort and subsequently filled in the VRNQ. Our results demonstrated that VRNQ is a valid tool for assessing VR software as it has good convergent, discriminant, and construct validity. The maximum duration of VR sessions should be between 55 and 70 min when the VR software meets or exceeds the parsimonious cut-offs of the VRNQ and the users are familiarized with the VR system. Gaming experience does not seem to affect how long VR sessions should last. Moreover, while the quality of VR software substantially modulates the maximum duration of VR sessions, age and education do not. Finally, deeper immersion, better quality of graphics and sound, and more helpful in-game instructions and prompts were found to reduce VRISE intensity. The VRNQ facilitates the brief assessment and reporting of the quality of VR software features and/or the intensity of VRISE, while its minimum and parsimonious cut-offs may appraise the suitability of VR software for implementation in research and clinical settings. The findings of this study contribute to the establishment of rigorous VR methods that are crucial for the viability of immersive VR as a research and clinical tool in cognitive neuroscience and neuropsychology.

Keywords: virtual reality, VRISE, VR sickness, cybersickness, neuroscience, neuropsychology, psychology, motion sickness

## INTRODUCTION

Immersive virtual reality (VR) has emerged as a novel tool for neuroscientific and neuropsychological research (Bohil et al., 2011; Parsons, 2015; Parsons et al., 2018). Nevertheless, there are concerns pertinent to implementing VR in research and clinical settings, especially regarding the head-mounted display (HMD) systems (Sharples et al., 2008; Bohil et al., 2011; de França and Soares, 2017; Palmisano et al., 2017). A primary concern is the presence of adverse physiological symptoms (i.e., nausea, dizziness, disorientation, fatigue, and postural instability), which are referred to as motion-sickness, cybersickness, VR sickness or VR induced symptoms and effects (VRISE) (Sharples et al., 2008; Bohil et al., 2011; de França and Soares, 2017; Palmisano et al., 2017).

Longer durations in a virtual environment have been associated with a higher probability of experiencing VRISE, while the intensity of VRISE also appears to increase proportionally with the duration of the VR session (Sharples et al., 2008). However, extensive linear and angular accelerations provoke intense VRISE, even in a short period of time (McCauley and Sharkey, 1992; LaViola, 2000; Gavgani et al., 2018). VRISE may place the health and safety of the participants or patients at risk of experiencing adverse physiological symptoms (Parsons et al., 2018). Research has also shown that VRISE induce significant decreases in reaction times and overall cognitive performance (Nalivaiko et al., 2015; Nesbitt et al., 2017; Mittelstaedt et al., 2019), as well as substantially increasing body temperatures and heart rates (Nalivaiko et al., 2015), which may compromise physiological data acquisition. Furthermore, the presence of VRISE has been found to significantly augment cerebral blood flow and oxyhemoglobin concentration (Gavgani et al., 2018), electrical brain activity (Arafat et al., 2018), and the connectivity between stimulus-response regions and nausea-processing regions (Toschi et al., 2017). Thus, VRISE appear to confound the reliability of neuropsychological, physiological, and neuroimaging data (Kourtesis et al., 2019).

To our knowledge, there do not appear to be any guidelines as to the appropriate maximum duration of VR research and clinical sessions to evade or alleviate the presence of VRISE. Recently, our work has suggested that VRISE are substantially reduced or prevented by VR software that facilitates ergonomic navigation (e.g., physical movement) and interaction (e.g., direct-hand tracking), supported by the hardware capabilities (e.g., motion tracking) of commercial, contemporary VR HMDs comparable to or more advanced than the HTC Vive and/or Oculus Rift (Kourtesis et al., 2019). However, there are other factors such as the type of display and its features that may also induce or reduce VRISE (Mittelstaedt et al., 2018; Kourtesis et al., 2019). Nevertheless, we note that adequate technological competence is required to be able to implement appropriate VR hardware and/or software. In an attempt to reach a methodological consensus, we have proposed minimum hardware and software features, which appraise the suitability of VR hardware and software (see **Table 1**; Kourtesis et al., 2019).

While VRISE may occur for various reasons, they are predominantly the undesirable outcomes of hardware and software insufficiencies (e.g., low resolution and refresh rates of the image, a narrow field of view, non-ergonomic interactions, and inappropriate navigation modes) (de França and Soares, 2017; Palmisano et al., 2017; Kourtesis et al., 2019). In terms of hardware, the technical specifications of the computer (e.g., processing power and graphics card), and VR HMD (e.g., the field of view, refresh rate, and resolution) suffice to appraise their suitability (Kourtesis et al., 2019). However, there is not a tool to quantify the software's recommended features, as well as the intensity of VRISE (Kourtesis et al., 2019). Currently, the most frequently used measure of VRISE is the simulator sickness questionnaire (SSQ), which only considers the symptoms pertinent to simulator sickness (Kennedy et al., 1993). However, the SSQ does not assess software attributes (Kennedy et al., 1993), and there is an argument that simulator sickness symptomatology may not be identical to VRISE (Stanney et al., 1997). There is thus a need for a tool, which will enable researchers to assess both the suitability of VR software, as well as the intensity of VRISE.

Our recent technological literature review of VR hardware and software pinpointed four domains that should be considered in the development or selection of VR research/clinical software (Kourtesis et al., 2019). The domains are user experience, game mechanics, in-game assistance, and VRISE. Each domain has five criteria that should be met to ensure the appropriateness of the software (see **Table 1**). Also, in the same study, the meta-analysis of 44 VR neuroscientific studies revealed that most of the studies did not quantitatively report the quality of the VR software and/or VRISE intensity (Kourtesis et al., 2019). In an attempt to provide a brief tool for the appraisal of VR research/clinical software features and VRISE intensity, we developed the virtual reality neuroscience questionnaire (VRNQ), which includes twenty questions that address five criteria under each domain. This study aimed to validate the VRNQ and provide suggestions for the duration of VR research/clinical sessions. We also considered the gaming experience of the participants to examine whether this may affect the duration of the VR sessions. Lastly, we investigated the software predictors of VRISE as measured by the VRNQ.

## MATERIALS AND METHODS


## Participants

Forty participants (21 males) aged between 28 and 43 years (M = 32.08; SD = 3.54) and an educational level between 12 and 16 full-time years of education (M = 14.25; SD = 1.37) were recruited for the study. Eighteen participants (10 males) identified themselves as gamers through self-report and 22 as non-gamers (11 males). The gamer experience was a dichotomous variable (i.e., gamer or non-gamer) based on the participants' response to a question asking whether they played games on a weekly basis. The participants responded to a call disseminated through mailing lists at the University of Edinburgh and social media. The study was approved by the Philosophy, Psychology and Language Sciences Research Ethics Committee of the University of Edinburgh. All participants provided written informed consent prior to taking part.

## Material

### Hardware

An HTC Vive HMD with two lighthouse-stations for motion tracking was used with two HTC Vive wands with 6 degrees of freedom (DoF) to facilitate navigation and interactions within the environment (Kourtesis et al., 2019). The VR area where the participants were immersed and interacted with the virtual environments was 4.4 m<sup>2</sup>. Additionally, the HMD was connected to a laptop with an Intel Core i7 7700HQ processor at 2.80 GHz, 16 GB RAM, a 4095 MB NVIDIA GeForce GTX 1070 graphics card, a 931 GB TOSHIBA MQ01ABD100 (SATA) hard disk, and Realtek High Definition Audio.

### Software

Three VR games were selected, which included ergonomic navigation (i.e., teleportation and physical mobility) and interactions (i.e., 6 DoF wands simulating hand movements) with the virtual environment. In line with Kourtesis et al. (2019), the VR software inclusion criteria (see **Table 1**) were: (1) ergonomic interactions which simulate real-life hand movements; (2) a navigation system which uses teleportation and physical mobility; (3) comprehensible tutorials pertinent to the controls; and (4) in-game instructions and prompts which assist the user in orientating and interacting with the virtual environment. The suitability of the VR software for both gamers and non-gamers was also considered. The selected VR games which met the above software criteria were: (1) "Job Simulator" (Session 1)<sup>1</sup>; (2) "The Lab" (Session 2)<sup>2</sup>; and (3) "Rick and Morty: Virtual Rick-ality" (Session 3)<sup>3</sup>. In "Job Simulator," the participant becomes an employee who has several occupations, such as a cook (preparing simple recipes), a car mechanic (doing rudimentary tasks, e.g., replacing faulty parts), and an office worker (making calls and sending emails). In "The Lab," the participant needs to complete several mini-games like slingshot (shooting down piles of boxes), longbow (shooting down invaders), xortex (spaceship-battles), postcards (visiting exotic places), human medical scan (exploring the human body), solar system (exploring the solar system), robot repair (repairing a robot), and secret shop (exploring a magical shop). In "Rick and Morty: Virtual Rick-ality," the participant needs to complete several imaginary home-chores as in "Job Simulator," though, in this case, the participant is required to follow a sequence of tasks according to a fictional storyline.

<sup>1</sup>https://store.steampowered.com/app/448280/Job\_Simulator/

<sup>2</sup>https://store.steampowered.com/app/450390/The\_Lab/

<sup>3</sup>https://store.steampowered.com/app/469610/Rick\_and\_Morty\_Virtual\_Rickality/

Derived from Kourtesis et al. (2019).

### Virtual Reality Neuroscience Questionnaire (VRNQ)

The VRNQ measures the quality of user experience, game mechanics, and in-game assistance, as well as the intensity of VRISE. The VRNQ involves 20 questions where each question corresponds to one of the criteria for appropriate VR research/clinical software (e.g., the level of immersion; see **Table 1**). The 20 questions are grouped under four domains, where each domain encompasses five questions. Hence, the VRNQ produces a total score corresponding to the overall quality of VR software, as well as four sub-scores (i.e., user experience, game mechanics, in-game assistance, VRISE). The user experience score is based on the intensity of the immersion, the level of enjoyment, as well as the quality of the graphics, sound, and VR technology (i.e., internal and external hardware). The game mechanics score depends on the ease of navigating, physically moving, and interacting with the virtual environment (i.e., use, pick and place, and hold items; two-handed interactions). The in-game assistance score appraises the quality of the tutorial(s), in-game instructions (e.g., description of the aim of the task), and prompts (e.g., arrows showing the direction). The VRISE are evaluated by the intensity of primary adverse symptoms and effects pertinent to VR (i.e., nausea, disorientation, dizziness, fatigue, and instability). VRNQ responses are indicated on a 7-point Likert style scale, ranging from 1 = extremely low to 7 = extremely high. The higher scores indicate a more positive outcome; this also applies to the evaluation of VRISE intensity. Hence, the higher VRISE score indicates a lower intensity of VRISE (i.e., 1 = extremely intense feeling, 2 = very intense feeling, 3 = intense feeling, 4 = moderate feeling, 5 = mild feeling, 6 = very mild feeling, 7 = absent). The VRNQ also includes space under each question, where the participant may provide optional qualitative feedback. For further details, please see the VRNQ in **Supplementary Material**.
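The scoring scheme just described (20 items, four domains of five items each, ratings from 1 to 7) can be sketched as follows. The ordering of the items, the domain keys, and the function name are assumptions made for illustration; the authoritative item list is the VRNQ in the Supplementary Material.

```python
# Hypothetical domain keys; items are assumed to be ordered domain by domain.
DOMAINS = ("user_experience", "game_mechanics", "in_game_assistance", "vrise")
ITEMS_PER_DOMAIN = 5


def score_vrnq(responses):
    """Return the four domain sub-scores and the total for one completed VRNQ.

    responses : 20 Likert ratings (integers 1-7). Higher is better in every
    domain, including VRISE, where 7 means the symptom was absent.
    """
    expected = len(DOMAINS) * ITEMS_PER_DOMAIN
    if len(responses) != expected:
        raise ValueError(f"expected {expected} responses, got {len(responses)}")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("each response must be an integer from 1 to 7")
    scores = {}
    for i, domain in enumerate(DOMAINS):
        start = i * ITEMS_PER_DOMAIN
        scores[domain] = sum(responses[start:start + ITEMS_PER_DOMAIN])
    scores["total"] = sum(responses)
    return scores


# Made-up answers from a single participant.
example = [6, 6, 6, 6, 6] + [7, 7, 7, 7, 7] + [6, 6, 7, 7, 7] + [7, 7, 7, 6, 7]
result = score_vrnq(example)
```

With five items rated 1 to 7, each sub-score ranges from 5 to 35 and the total from 20 to 140.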

## Procedure

The participants individually attended three separate VR sessions; in each session, they were immersed in different VR software. The period between each session was 1 week for each participant (i.e., 3 weeks in total). The participants went through an induction pertinent to the VR software for that session and the specific HMD and controllers used (i.e., HTC Vive and its 6DoF wands-controllers) before being immersed. Subsequently, the participants were asked to play the respective VR game until they completed it, or they felt any discomfort or fatigue. The duration of each VR session was recorded from the time the software was started until the participant expressed that they wanted to discontinue. At the end of each session, participants were asked to complete the VRNQ. The "Job Simulator" was always used in the 1st session, "The Lab" was always used in the 2nd session, and "Rick and Morty: Virtual Rick-ality" was always used in the 3rd session.

## Statistical Analyses

A reliability analysis of the VRNQ was conducted to calculate Cronbach's alpha and inspect whether the items have adequate internal consistency for research and clinical purposes. A Cronbach's alpha of 0.70–1.00 indicates good to excellent internal consistency (Nunally and Bernstein, 1994). A confirmatory factor analysis (CFA) was performed to examine the construct validity of the VRNQ in terms of convergent and discriminant validity (Cole, 1987). The reliability analysis and CFA were conducted using AMOS (version 24) (Arbuckle, 2014), and IBM Statistical Package for the Social Sciences (SPSS) 24.0 (Ibm Corp, 2016). Several tests for goodness of fit were implemented to allow the evaluation of VRNQ's structure. The comparative fit index (CFI), Tucker–Lewis index (TLI), standardized root mean square residual (SRMR), and the root mean squared error of approximation (RMSEA) were used to assess model fit. A CFI and TLI equal to or greater than 0.90 indicate good structural model fit to the data (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010). An SRMR and RMSEA less than 0.08 indicate a good fit to the data (Hu and Bentler, 1999; Hopwood and Donnellan, 2010). Lastly, the variance of the results was assessed by dividing the χ<sup>2</sup> by the degrees of freedom (df), which is an indicator of the sample distribution (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010).
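For readers unfamiliar with Cronbach's alpha, the following sketch computes it from the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed score), where k is the number of items. The study itself used SPSS and AMOS; this is an independent illustration, and the function name and ratings are made up.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of ratings.

    scores : list of rows, one per respondent, each holding k item ratings.
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four respondents answering the five items of one domain.
ratings = [
    [7, 6, 7, 6, 7],
    [5, 5, 6, 5, 5],
    [6, 6, 6, 7, 6],
    [4, 5, 4, 4, 5],
]
alpha = cronbach_alpha(ratings)
meets_threshold = alpha >= 0.70  # cut-off for adequate internal consistency
```

The sample variance (n − 1 denominator) is used for both the items and the summed score, so the ratio inside the formula is unaffected by that choice.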

The reliability and confirmatory factor analyses were conducted based on 120 observations (40 participants × 3 sessions with different software). The a priori sample size calculator for structural equation models was used to calculate the minimum sample size for model structure. This calculator uses the error function formula, the lower bound sample size formula for a structural equation model, and the normal distribution cumulative distribution function (Soper, 2019a), which are in perfect agreement with the recommendations for statistical power analysis for the behavioral sciences (Cohen, 2013). A sample size of 100 observations was suggested as the minimum for conducting CFA to examine the model structure with statistical power equal to or greater than 0.80. Hence, the 120 observations in our sample appear adequate to conduct a CFA with statistical power equal to or greater than 0.80.

Bayesian Pearson correlation analyses were conducted to examine whether any of the demographic variables were significantly associated with the VRNQ total score and sub-scores, or the length of the VR sessions. Bayesian paired samples t-tests were performed to investigate possible differences between each session's duration, as well as the VRNQ results for each VR game. Also, a Bayesian independent samples t-test examined whether there were any differences between gamers and non-gamers in the duration of the session. Lastly, a Bayesian linear regression was performed to examine the predictors of VRISE, where the Jeffreys–Zellner–Siow (JZS) mixed g-prior was used for the selection of the best model. JZS has the computational advantages of a g-prior in conjunction with the theoretical advantages of a Cauchy prior, which are valuable in variable selection for the best model (Liang et al., 2008; Rouder and Morey, 2012). For all the analyses, a Bayes Factor (BF<sub>10</sub>) ≥ 10 was set for statistical inference, which indicates strong evidence in favor of the alternative hypothesis (Rouder and Morey, 2012; Wetzels and Wagenmakers, 2012; Marsman and Wagenmakers, 2017). All the Bayesian analyses were performed using JASP (Version 0.8.1.2) (Jasp Team, 2017). The Bayesian Pearson correlation analyses and Bayesian linear regression analysis were conducted based on 120 observations (40 participants × 3 different software sessions). The post hoc statistical power calculator was used to calculate the observed power of the best model using Bayesian linear regression analysis (Soper, 2019b).

## RESULTS

## Reliability Analysis and CFA

The reliability analysis demonstrated good to excellent Cronbach's α for each domain of the VRNQ (i.e., user experience – α = 0.89, game mechanics – α = 0.89, in-game assistance – α = 0.90, VRISE – α = 0.89; see **Table 2**), which indicates very good internal reliability (Nunally and Bernstein, 1994). The VRNQ's fit indices are displayed in **Table 2** with their respective thresholds. The χ<sup>2</sup>/df was 1.61, which indicates good variance in the sample (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010). Both CFI and TLI were close to 0.95, which suggests a good fit for the VRNQ model (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010). Comparably, SRMR and RMSEA values were between 0.06 and 0.08, which also supports a good fit (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010). The VRNQ's path diagram is displayed in **Figure 1**, which depicts, from left to right, the correlations among the factors/domains of the VRNQ, the correlations between each factor/domain and its items, and the error terms for each item. The VRNQ items/questions are strongly associated with their respective factor/domain, which shows good convergent validity (Cole, 1987). Furthermore, there was no significant correlation amongst the factors/domains, which indicates good discriminant validity (Cole, 1987).

VRNQ Domains: USER, user experience; GM, game mechanics; GA, in-game assistance; VR, VRISE.
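As a rough illustration of how a Cronbach's α like those reported above is obtained, the sketch below applies the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of summed scores) to one domain. The function name and the demo responses are hypothetical; they are not the study's data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    n_items = len(rows[0])
    # Sample variance of each item (column) across respondents
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's summed score
    total_variance = variance([sum(row) for row in rows])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 6 respondents x 5 items of one domain (1-7 scale)
demo_responses = [
    [7, 6, 7, 6, 7],
    [6, 6, 6, 5, 6],
    [5, 5, 6, 5, 5],
    [7, 7, 7, 7, 6],
    [4, 5, 4, 4, 5],
    [6, 5, 6, 6, 6],
]
print(f"alpha = {cronbach_alpha(demo_responses):.2f}")
```

Values closer to 1 indicate that the items of a domain vary together, which is the sense in which the α values above are read as internal reliability.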

## Descriptive Statistics of Sessions' Duration and VRNQ Scores

The descriptive statistics for the sessions' durations and the VRNQ scores are displayed in **Table 3**. In session 1, the participants were immersed for 59.65 (8.42) minutes, and the average time of gamers appears longer than the average time of non-gamers (**Table 3**). In session 2, the participants spent 64.72 (6.24) minutes (**Table 3**). In session 3, gamers spent 70.44 (7.78) minutes, while non-gamers spent 65.73 (6.75) minutes (**Table 3**). The average total score of the VRNQ for all software was 126.30 (7.55) (maximum score is 140), where gamers' and non-gamers' scores did not appear to differ. Similarly, the median scores for each domain were 30–32 out of 35, where again gamers' and non-gamers' scores did not appear to differ. Importantly, all the VRISE scores (per item) for both gamers and non-gamers were equal to 5 (i.e., mild feeling), 6 (i.e., very mild feeling), or 7 (i.e., absent feeling). The vast majority of scores were equal to 6 (i.e., very mild feeling) or 7 (i.e., absent feeling) (see **Figure 2**).

## Minimum and Parsimonious Cut-Off Scores of VRNQ

Cut-off scores were calculated for the VRNQ total score and sub-scores to inspect the suitability of the assessed VR software (see **Table 4**). In the VRNQ, the ordinal 1–3 responses are paired with negative qualities, response 4 is paired with neutral/moderate qualities, and 5–7 responses are paired with positive qualities (see **Supplementary Material**). The minimum cut-offs suggest that if the median of the responses is 25 for every sub-score, and 100 in the total score (i.e., at least a median of 5 for every item), then the VRNQ outcomes indicate that the evaluated VR software is of an adequate quality not to cause any significant VRISE. Furthermore, the parsimonious cut-offs suggest that, if the median of the responses is 30 for every sub-score, and 120 for the total score (i.e., at least a median of 6 for every item), then the VRNQ outcomes more robustly support the suitability of the VR software. The minimum and parsimonious cut-offs hence appear adequate to guarantee the safety, pleasantness, and appropriateness of the VR software for research and/or clinical purposes.
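The cut-off rule described above can be sketched as a short check on sample medians. This is a minimal illustration, assuming that "meeting" a cut-off means a median greater than or equal to it; the domain keys, function name, and example scores are hypothetical and are not taken from the study's data.

```python
from statistics import median

DOMAINS = ("user_experience", "game_mechanics", "in_game_assistance", "vrise")

# Cut-offs as stated in the text: per-domain sub-score and total score
CUTOFFS = {
    "minimum": {"domain": 25, "total": 100},
    "parsimonious": {"domain": 30, "total": 120},
}


def vrnq_suitability(participants):
    """participants: list of dicts mapping each domain to a sub-score (5-35)."""
    medians = {d: median(p[d] for p in participants) for d in DOMAINS}
    medians["total"] = median(sum(p[d] for d in DOMAINS) for p in participants)
    verdicts = {}
    for level, cut in CUTOFFS.items():
        # Assumption: a cut-off is met when the median is >= the cut-off
        verdicts[level] = (
            all(medians[d] >= cut["domain"] for d in DOMAINS)
            and medians["total"] >= cut["total"]
        )
    return medians, verdicts


# Hypothetical sub-scores for three participants
sample = [
    {"user_experience": 31, "game_mechanics": 30, "in_game_assistance": 32, "vrise": 33},
    {"user_experience": 28, "game_mechanics": 29, "in_game_assistance": 30, "vrise": 31},
    {"user_experience": 30, "game_mechanics": 31, "in_game_assistance": 31, "vrise": 32},
]
medians, verdicts = vrnq_suitability(sample)
print(medians)
print(verdicts)
```

Medians are used rather than means because the VRNQ responses are ordinal.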

## Bayesian T-Tests

The Bayesian independent samples t-test between gamers and non-gamers indicated that the former spent significantly more time in VR across the total duration for the 3 sessions (BF<sub>10</sub> = 14.99), as well as the duration of the 1st session (BF<sub>10</sub> = 2,532; see **Table 5**) (Wetzels and Wagenmakers, 2012; Marsman and Wagenmakers, 2017). The difference in the total duration is much smaller than the difference in the 1st session. Thus, the difference between the gamers and non-gamers in the total duration appears to be driven by the substantial difference in the 1st session's duration (see **Table 5**). Conversely, the Bayesian paired samples t-test (i.e., differences between the VR games) indicated significant differences in the total score and every subscore of VRNQ (see **Table 6**) between the VR software. The VR software in the 3rd session was evaluated higher than the VR software in the 1st and 2nd sessions, while the VR software in the 2nd session was rated better than the VR software in the 1st session. There was also an important difference between the duration of the 3rd session (longer) and the duration of the 1st session (shorter; BF<sub>10</sub> = 103,568), while there was not a substantial difference between the duration of the 2nd and 3rd sessions (BF<sub>10</sub> = 2.78), or between the duration of the 1st and 2nd sessions (BF<sub>10</sub> = 7.05; see **Table 6**) (Wetzels and Wagenmakers, 2012; Marsman and Wagenmakers, 2017).
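To make the inference rule concrete, the sketch below applies the study's stated criterion (BF<sub>10</sub> ≥ 10) and the star thresholds from the table notes (> 10, > 30, > 100) to the duration comparisons reported above. The function names and the labels are hypothetical helpers for illustration only.

```python
def bf10_stars(bf10):
    """Star annotation from the table notes: * > 10, ** > 30, *** > 100."""
    if bf10 > 100:
        return "***"
    if bf10 > 30:
        return "**"
    if bf10 > 10:
        return "*"
    return ""


def meets_criterion(bf10, threshold=10.0):
    """Inference criterion stated in the Methods: BF10 >= 10."""
    return bf10 >= threshold


# Bayes factors reported in the text for session durations
reported = {
    "gamers vs non-gamers, total duration": 14.99,
    "gamers vs non-gamers, session 1": 2532,
    "session 3 vs session 1": 103568,
    "session 2 vs session 3": 2.78,
    "session 1 vs session 2": 7.05,
}
for label, bf in reported.items():
    verdict = "difference supported" if meets_criterion(bf) else "not supported"
    print(f"{label}: BF10 = {bf:g} {bf10_stars(bf)} ({verdict})")
```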

## Bayesian Pearson Correlation Analyses and Regression Analysis

The Bayesian Pearson correlation analyses did not show any significant correlation between age and any of the VRNQ scores, between age and duration of the sessions, between education and any of the VRNQ scores, or between education and duration of the sessions. However, the duration of the session was positively correlated with the total VRNQ score [BF<sub>10</sub> = 81.54; r(120) = 0.310, p < 0.001]. Furthermore, the VRISE score substantially correlated with the following VRNQ items: immersion, pleasantness, graphics, sound, pick and place, tutorial's difficulty, tutorial's usefulness, tutorial's duration, instructions, and prompts (see **Table 7**). In contrast, VRISE did not significantly correlate with the following VRNQ items: VR tech, navigation, physical movement, use items, or two-handed interactions (see **Table 7**). Moreover, the Bayesian regression analysis indicated the five best models that predicted the VRNQ's VRISE score (see **Table 8**). The best model includes the following items from the VRNQ: immersion, graphics, sound, instructions, and prompts. All the predictors exceeded the prior inclusion probabilities (see **Figure 3**). The best model showed a BF<sub>M</sub> = 117.42, whereas the second-best model displayed a BF<sub>M</sub> = 56.40 (see **Table 8**); hence, the difference between the best model and the second-best model was robust (Rouder and Morey, 2012; Wetzels and Wagenmakers, 2012; Marsman and Wagenmakers, 2017). Also, the best model has an R<sup>2</sup> = 0.324 (see **Table 8**), which indicates that the model explains 32.4% of the variance of the VRISE score (Rouder and Morey, 2012; Wetzels and Wagenmakers, 2012). Lastly, the post hoc statistical power analysis for the best model indicated an observed statistical power of 0.998, p < 0.001, which suggests high efficiency, precision, reproducibility, and reliability of the regression analysis and results (Button et al., 2013; Cohen, 2013).
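For readers who want to relate the reported R<sup>2</sup> to an effect size, a worked conversion to Cohen's f<sup>2</sup> is given below. The text does not state which effect size entered the post hoc power calculation, so this is only an illustrative assumption; the formula itself is the standard one for multiple regression.

$$
f^2 = \frac{R^2}{1 - R^2} = \frac{0.324}{1 - 0.324} = \frac{0.324}{0.676} \approx 0.479
$$

By Cohen's conventional benchmarks (0.02 small, 0.15 medium, 0.35 large), a value of about 0.48 would count as a large effect, which is in line with the high observed power reported for the best model.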

## DISCUSSION

## The VRNQ as a Research and Clinical Tool

The VRNQ is a short questionnaire (5–10 min administration time) which assesses the quality of VR software in terms of user experience, game mechanics, in-game assistance, and VRISE. The values of the fit indices of CFA (i.e., CFI, TLI, SRMR, and RMSEA) indicated that the VRNQ's structure was a good fit to the data, which supports good construct validity for the VRNQ (Hu and Bentler, 1999; Jackson et al., 2009; Hopwood and Donnellan, 2010). In addition, the construct validity of the VRNQ was supported by its convergent and discriminant validity (Cole, 1987). VRNQ items were strongly correlated with their grouping factor, which indicates robust convergent validity, while the correlations between the factors were weak, which suggests very good discriminant validity (Cole, 1987). Furthermore, the Cronbach's α for each VRNQ domain (i.e., user experience – α = 0.89, game mechanics – α = 0.89, in-game assistance – α = 0.90, VRISE – α = 0.89; see **Table 2**) suggests very good internal reliability (Nunally and Bernstein, 1994). Hence, the VRNQ emerges as a valid and suitable tool to evaluate the quality of VR research/clinical software as well as the intensity of adverse VRISE.

Furthermore, minimum and parsimonious cut-off scores were calculated for the VRNQ total score and sub-scores to inspect the suitability of the assessed VR software. The minimum cut-offs indicate the lowest acceptable quality for VR research/clinical software, while the parsimonious cut-offs are offered for more robust support of the VR software's suitability, which may be required in experimental and clinical designs with more conservative standards. However, the individual scores from the VRNQ may be modulated by individual differences and preferences unrelated to the quality of the software (Kortum and Peres, 2014). In addition, the VRNQ produces ordinal data; therefore, the median is the appropriate measure for their analysis (Harpe, 2015). Hence, the median VRNQ scores for the whole sample should be used to assess the VR software's quality effectively. Also, the medians of the VRNQ total score and sub-scores allow the generalization of the results and comparison between different VR software (Kortum and Peres, 2014; Harpe, 2015). Researchers, clinicians, and/or research software developers should use the medians of the VRNQ total score and sub-scores to assess whether the implemented VR software exceeds the minimum or parsimonious cut-offs. Hence, if the medians of the VRNQ sub-scores and total score for VR research software meet the minimum cut-offs, then these results support the VR software's suitability. Likewise, if the medians of the VRNQ sub-scores and total score for VR research software meet the parsimonious cut-offs, then these results provide even stronger support for its suitability. However, median scores below these cut-offs suggest that the suitability of the VR software is questionable, but they do not indicate that this VR software is certainly unsuitable.

Also, the VRNQ appears to be an appropriate tool to measure both VRISE and VR software features compared to other questionnaires. The SSQ is the most implemented questionnaire in VR studies. However, the SSQ only considers the symptoms pertinent to simulator sickness and it does not assess software attributes (Kennedy et al., 1993), while it is disputed whether simulator sickness symptomatology is the same as VRISE (Stanney et al., 1997). Alternatively, the Virtual Reality Sickness Questionnaire (VRSQ) was recently developed (Kim et al., 2018). The development of the VRSQ was based on the SSQ, where the researchers attempted to isolate the items which are pertinent to VRISE (Kim et al., 2018). However, their sample size was relatively small (i.e., 24 participants × 4 sessions = 96 observations) (Kim et al., 2018). Notably, the factor analyses of Kim et al. (2018) accepted only items pertinent to the oculomotor and disorientation components of the SSQ, and rejected all the items pertinent to nausea (i.e., 7 items), although nausea is the most frequent symptom in VRISE (Stanney et al., 1997; Sharples et al., 2008; Bohil et al., 2011; de França and Soares, 2017; Palmisano et al., 2017). Also, comparable to the SSQ, the VRSQ does not consider software features. Hence, the VRNQ appears to be the only valid and suitable tool to evaluate both the intensity of predominant VRISE and the quality of VR software features.

The VRNQ allows researchers to report the quality of VR software and/or the intensity of VRISE in their VR studies. However, an in-depth assessment of the numerous software features requires a questionnaire with more than the 20 questions of the VRNQ (Zarour et al., 2015). For an in-depth software analysis, questionnaires with more questions pertinent to the whole spectrum of software features should be preferred (Zarour et al., 2015). Additionally, the VRNQ has only five items pertinent to VRISE. Hence, it does not offer an exhaustive assessment of VRISE. Studies that aim to investigate VRISE in depth should opt for a tool which contains more items pertinent to VRISE than the VRNQ (e.g., SSQ). The VRNQ is a brief questionnaire (5–10 min administration time) including 20 items, which enables researchers, clinicians, and research software developers to evaluate and report the quality of the VR software and the intensity of VRISE for research and clinical purposes.

## Maximum Duration of VR Sessions

The duration of the VR session is a crucial factor in research and/or clinical design. In our sample, the participants discontinued the VR session due to loss of interest, while none discontinued due to VRISE. In the 1st session, gamers spent significantly more time immersed than the non-gamers, a difference which drove the difference between the two groups in the summed duration across all sessions. However, it is worth noting that there was not a significant difference between the two groups in the time spent in VR for the 2nd and 3rd sessions. The observed difference in the 1st session and the absence of a difference in the later sessions' durations suggest that, once users are familiarized with the VR technology, the influence of their gaming experience on the session's duration becomes insignificant. In support of this, a recent study showed that user gaming experience does not affect the perceived workload of the users in VR (Lum et al., 2018). Hence, the level of familiarization of the participants with the VR technology appears to affect substantially the duration of the VR session.

#### TABLE 4 | VRNQ cut-offs.

The median of each sub-score and of the total score should meet the suggested cut-offs to support that the evaluated VR software has an adequate quality without any significant VRISE. The utilization of the parsimonious cut-offs more robustly supports the suitability of the VR software.

TABLE 5 | Bayesian independent samples t-test: gamers against non-gamers.

BF<sub>10</sub> = Bayes Factor; <sup>∗</sup> BF<sub>10</sub> > 10, ∗∗∗ BF<sub>10</sub> > 100.

Nevertheless, in the whole sample, irrespective of participants' gaming experience, the durations of the 2nd and 3rd sessions are longer than the duration of the 1st session, while the duration of the 3rd session is not significantly longer than the duration of the 2nd session. Furthermore, given that a different VR software was administered in each session, the VRNQ correspondingly pinpointed significant differences in the quality of the implemented VR software. All the VRNQ scores for the 3rd session's VR software are greater than the 2nd session's VR software scores. Similarly, all the VRNQ scores for the 2nd session's VR software are greater than the 1st session's VR software scores. Also, the duration of the VR session was positively correlated with the total score of the VRNQ. Thus, the quality of the VR software as measured by the VRNQ seems to be significantly associated with the duration of the VR session.

Overall, in every session, the intensity of VRISE was reported as very mild to absent by the vast majority of the sample. However, comparable to the rest of the VRNQ scores, the VRISE score for the 3rd VR session was significantly higher (i.e., milder feeling) than for the 1st and 2nd sessions. Similarly, the VRISE score for the 2nd session's VR software was substantially higher than the 1st session's VR software score. Notably, there was not any difference between gamers and non-gamers in the VRNQ scores across the three sessions. Equally, the age and education of participants did not correlate with any of the VRNQ scores or the duration of sessions. Thus, the age, education, and gaming experience of the participants did not affect the responses in the VRNQ. Therefore, the observed differences in the VRISE scores between the VR sessions suggest that the quality of the VR software as measured by the VRNQ and the level of familiarization of the participants with the VR technology also affect the intensity of VRISE.

The findings suggest that the implementation of VR software with a maximum duration between 55 and 70 min is feasible. However, long exposures in VR have been found to increase the probability of experiencing VRISE and the intensity of VRISE (Sharples et al., 2008). In our sample, especially in the 3rd session, which was substantially longer than the 1st session, the intensity of VRISE was significantly lower than in the other sessions. As discussed above, the substantially lower intensity of VRISE in the 3rd session appears to be a result of increased VR familiarity and the better quality of the implemented VR software as measured by the VRNQ. Hence, researchers and/or clinicians should consider the quality of their VR software to define the appropriate duration of their VR session. In research and clinical designs where the duration of the VR session is required to be between 55 and 70 min, the researchers and/or clinicians should opt for the parsimonious cut-offs of the VRNQ to ensure adequate quality of their VR software to facilitate longer sessions without significant VRISE. Additionally, an extended introductory tutorial which allows participants to familiarize themselves with the VR technology and mechanics would assist with the implementation of longer (i.e., 55–70 min) VR sessions, where the presence and intensity of VRISE would not be significant.

## The Quality of VR Software and VRISE

The VRISE score substantially correlated with almost every item under the sections of user experience and in-game assistance (see **Table 7**). However, the VRISE score did not correlate with VR tech (an item under the user experience domain) or most of the items under the section of game mechanics. The quality of VR hardware (i.e., the HMD and its controllers) and interactions (i.e., ergonomic or non-ergonomic) with the virtual environment are crucial for the alleviation or evasion of VRISE (Kourtesis et al., 2019). Nevertheless, in this sample, the VR tech item (i.e., the quality of the internal and external VR hardware) was not expected to correlate with the VRISE score, because the HMD and its 6DoF controllers were the same for all 3 VR software versions and sessions. Hence, the variance in the responses to this item was limited. Also, the three VR software games share common game mechanics, especially the same navigation system (i.e., teleportation) and a similar amount of physical mobility. Likewise, apart from some controls (i.e., the button to grab items), the interaction systems of the implemented VR software were very similar. Therefore, the absence of a correlation between VRISE scores and most of the items in the game mechanics section was also an expected outcome. Nonetheless, the VRISE score was strongly associated with the level of immersion and enjoyment, the quality of graphics and sound, the comfort to pick and place 3D objects, and the usefulness of in-game assistance modes (i.e., tutorials, instructions, and prompts).

TABLE 6 | Bayesian paired samples t-tests: differences between the VR software.

BF<sub>10</sub> = Bayes Factor; <sup>∗</sup> BF<sub>10</sub> > 10, ∗∗ BF<sub>10</sub> > 30, ∗∗∗ BF<sub>10</sub> > 100; S1, Session 1; S2, Session 2; S3, Session 3.

TABLE 7 | Bayesian Pearson correlations analyses: VRISE score with VRNQ items.

BF<sub>10</sub> = Bayes Factor; <sup>∗</sup> BF<sub>10</sub> > 10, ∗∗ BF<sub>10</sub> > 30, ∗∗∗ BF<sub>10</sub> > 100.

The items which correlated with the VRISE score were also included in the best models for predicting its value (see **Table 8**). Importantly, the best model includes as predictors of VRISE the level of immersion, the quality of graphics and sound, and the helpfulness of in-game instructions and prompts (see **Table 8**). The higher scores for prompts and instructions indicate that users were substantially assisted by the in-game assistance (e.g., an arrow showing the direction that the user should follow) to orient and guide themselves from one point of interest to the next in accordance with the scenario of the VR experience. This may be interpreted as ease of orienting and interacting with the virtual environment, as well as a significant decrease in confusion (Brade et al., 2018). The quality of the in-game assistance methods is essential for the usability and enjoyment that VR software offers (Brade et al., 2018). Equally, the quality of the graphics is predominantly dependent upon rendering, which encompasses the in-game quality of the image, known as perceptual quality, and the exclusion of redundant visual information, known as occlusion culling (Lavoué and Mantiuk, 2015). The improvement of these two factors not only results in improved quality of the graphics but also in improved performance of the software (Brennesholtz, 2018). Furthermore, the spatialized sound of VR software assists users to orient themselves (Ferrand et al., 2017), deepens the experienced immersion (Riecke et al., 2011), and enriches the geometry of the virtual space without affecting the performance of the software (Kobayashi et al., 2015). Lastly, the level of immersion appears to be negatively correlated with the frequency and intensity of VRISE (Milleville-Pennel and Charron, 2015; Weech et al., 2019). The best model hence aligns with the relevant literature and provides further evidence in support of the utility of the VRNQ as a valid and efficient tool to appraise the quality of the VR software and intensity of VRISE.

TABLE 8 | Models' comparison: predictors of VRISE score.

P, Probability; M, Model; BF<sub>M</sub>, Model's Bayes Factor; <sup>∗</sup> BF<sub>M</sub> > 10, ∗∗ BF<sub>M</sub> > 30, ∗∗∗ BF<sub>M</sub> > 100; BF<sub>10</sub> = BF against null model.

## Limitations and Future Studies

This study also has some limitations. In this study, construct validity for the VRNQ is provided. However, future work should endeavor to provide convergent validation of the VRNQ with tools that measure VRISE symptomatology (e.g., SSQ) and/or VR software attributes. Moreover, the sample size was relatively small, but it offered an adequate statistical power for the conducted analyses. Also, the VRNQ does not directly quantify linear or angular accelerations, which may induce intense VRISE in a relatively short period of time (McCauley and Sharkey, 1992; LaViola, 2000; Gavgani et al., 2018). However, the VRNQ quantifies the effect(s) of linear and angular accelerations (i.e., VRISE), where VR software with a highly provocative content (e.g., linear and angular accelerations) would fail to meet or exceed the VRNQ cutoffs for the VRISE domain. Furthermore, the study utilized only one type of VR hardware, which did not allow us to inspect the effect of VR HMD's quality on VRISE presence and intensity. Similarly, our VR software did not allow us to compare different ergonomic interactions or levels of provocative potency pertaining to VRISE. Future studies with a larger sample, various types of VR hardware, and VR software with substantially more diverse features will offer further insights on the impact of software features on VRISE intensity, as well as provide additional support for the VRNQ's structural model. Lastly, neuroimaging (e.g., electroencephalography) and physiological data (e.g., heart rates) may correlate, classify, and predict VRISE symptomatology (Kim et al., 2005; Dennison et al., 2016, 2019). Hence, future studies should consider collecting neuroimaging and/or physiological data that could further elucidate the relationship between VRNQ's VRISE score(s) and brain region activation or cardiovascular responses (e.g., heart rate).

## CONCLUSION

This study showed that the VRNQ is a valid and reliable tool which assesses the quality of VR software and intensity of VRISE. Our findings support the viability of VR sessions with a duration up to 70 min, when the participants are familiarized with VR tech through an induction session, and the quality of the VR software meets the parsimonious cut-offs of the VRNQ. Also, our results offered insights on the software-related predictors of VRISE intensity, such as the level of immersion, the quality of graphics and sound, and the helpfulness of in-game instructions and prompts. Finally, the VRNQ enables researchers to quantitatively assess and report the quality of VR software features and intensity of VRISE, which are vital for the efficacious implementation of immersive VR systems in cognitive neuroscience and neuropsychology. The minimum and parsimonious cut-offs of the VRNQ may appraise the suitability of VR software for implementation in research and clinical settings. The VRNQ and the findings of this study contribute to the endeavor of establishing thorough VR research and clinical methods that are crucial to guarantee the viability of implementing immersive VR systems in cognitive neuroscience and neuropsychology.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.


## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Philosophy, Psychology and Language Sciences Research Ethics Committee of the University of Edinburgh. The patients/participants provided their written informed consent to participate in this study.

## AUTHOR CONTRIBUTIONS

PK had the initial idea and contributed to every aspect of this study. SC, LD, and SM contributed to the methodological aspects and the discussion of the results. The VRNQ may be downloaded from **Supplementary Material**.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2019.00417/full#supplementary-material



**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Kourtesis, Collina, Doumas and MacPherson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Virtual Reality Analgesia With Interactive Eye Tracking During Brief Thermal Pain Stimuli: A Randomized Controlled Trial (Crossover Design)

Najood A. Al-Ghamdi<sup>1</sup>, Walter J. Meyer III<sup>2,3</sup>, Barbara Atzori<sup>4</sup>, Wadee Alhalabi<sup>5,6,7</sup>, Clayton C. Seibel<sup>8</sup>, David Ullman<sup>8</sup> and Hunter G. Hoffman<sup>8,9</sup>\*

<sup>1</sup> Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia, <sup>2</sup> Shriners Hospitals for Children, Galveston, TX, United States, <sup>3</sup> Department of Psychiatry, The University of Texas Medical Branch at Galveston, Galveston, TX, United States, <sup>4</sup> Department of Health Sciences, School of Psychology, University of Florence, Florence, Italy, <sup>5</sup> Department of Computer Science, College of Engineering, Effat University, Jeddah, Saudi Arabia, <sup>6</sup> The Virtual Reality Research Group, King Abdulaziz University, Jeddah, Saudi Arabia, <sup>7</sup> Department of Computer Science, Dar Al-Hekma University, Jeddah, Saudi Arabia, <sup>8</sup> Virtual Reality Research Center, Human Photonics Lab, University of Washington, Seattle, WA, United States, <sup>9</sup> Department of Mechanical Engineering, College of Engineering, University of Washington, Seattle, WA, United States

#### Edited by:

Valerio Rizzo, University of Palermo, Italy

#### Reviewed by:

Luana Colloca, University of Maryland, Baltimore, United States Karel Allegaert, University Hospitals Leuven, Belgium Paula Goolkasian, University of North Carolina at Charlotte, United States Elisa Pedroli, Italian Institute for Auxology (IRCCS), Italy

> \*Correspondence: Hunter G. Hoffman hoontair@gmail.com

#### Specialty section:

This article was submitted to Sensory Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 24 April 2019 Accepted: 19 December 2019 Published: 23 January 2020

#### Citation:

Al-Ghamdi NA, Meyer WJ III, Atzori B, Alhalabi W, Seibel CC, Ullman D and Hoffman HG (2020) Virtual Reality Analgesia With Interactive Eye Tracking During Brief Thermal Pain Stimuli: A Randomized Controlled Trial (Crossover Design). Front. Hum. Neurosci. 13:467. doi: 10.3389/fnhum.2019.00467

In light of growing concerns about opioid analgesics, developing new non-pharmacologic pain control techniques has become a high priority. Adjunctive virtual reality can help reduce acute pain during painful medical procedures. However, for some especially painful medical procedures such as burn wound cleaning, clinical researchers recommend that more distracting versions of virtual reality are needed, to further amplify the potency of virtual reality analgesia. The current study with healthy volunteers explores for the first time whether interacting with virtual objects in Virtual Reality (VR) via "hands free" eye-tracking technology integrated into the VR helmet makes VR more effective/powerful than non-interactive/passive VR (no eye-tracking) for reducing pain during brief thermal pain stimuli.

Method: Forty-eight healthy volunteers participated in the main study. Using a within-subject design, each participant received one brief thermal pain stimulus during interactive eye tracked virtual reality, and each participant received another thermal pain stimulus during non-interactive VR (treatment order randomized). After each pain stimulus, participants provided subjective 0–10 ratings of cognitive, sensory and affective components of pain, and rated the amount of fun they had during the pain stimulus.

Results: As predicted, interactive eye tracking increased the analgesic effectiveness of immersive virtual reality. Compared to the passive non-interactive VR condition, during the interactive eye tracked VR condition, participants reported significant reductions in worst pain (p < 0.001) and pain unpleasantness (p < 0.001). Participants reported a significantly stronger illusion of presence (p < 0.001), and significantly more fun in VR (p < 0.001) during the interactive condition compared to during passive VR. In summary, as predicted by our primary hypothesis, in the current laboratory acute pain analog study with healthy volunteers, increasing the immersiveness of the VR system via interactive eye tracking significantly increased how effectively VR reduced worst pain during a brief thermal pain stimulus. Although attention was not directly measured, the pattern of pain ratings, presence ratings, and fun ratings is consistent with an attentional mechanism for how VR reduces pain. Whether the current results generalize to clinical patient populations is another important topic for future research. Additional research and development is recommended.

Keywords: virtual reality, analgesia, pain, distraction, non-pharmacologic analgesic techniques, opioid analgesia

## INTRODUCTION

Excessive pain during medical procedures is a frequent problem, worldwide, and recovering from a severe burn is often unusually painful. As part of the natural healing process, severe burns exfoliate dead skin cells. To prevent infection and speed up healing, wound care nurses remove the bandages, and wipe/scrub the burn wound with a wet washcloth to remove/clean off the thin layer of sloughed off dead skin cells, and other debris.

Despite giving patients powerful analgesia pain medications shortly before wound care, patients typically remain conscious during burn wound care, pain during non-surgical wound debridement of severe burn wounds is often severe to excruciating, and wound care procedures are repeated frequently. Children with large burns typically have their burn wounds cleaned several days per week, during several weeks of hospitalization (Hoffman et al., 2019).

Psychological factors such as fear, anxiety and depression can increase or amplify how much pain patients subjectively experience during painful medical procedures (Hemington et al., 2017), making pain management even more challenging. What people are thinking about, and where patients direct their attention during medical procedures (Heathcote et al., 2017), expectations of pain, and memories of previous painful procedures can increase pain intensity (Melzack and Wall, 1965; Noel et al., 2012, 2015a,b; Fields, 2018; Fischer et al., 2018).

Opioid analgesics are widely regarded as effective and essential tools for acute pain management such as burn wound care (Malchow and Black, 2008; McIntyre et al., 2016; Ballantyne, 2018) but opioid side effects such as nausea, constipation, urinary retention, drowsiness, and lack of appetite (Mendez-Romero et al., 2018) limit opioid dose levels (Cherny et al., 2001). In addition, there are growing concerns about overprescription of opioids in the United States and Europe (Krane, 2019; Yaster et al., 2019). There are also growing concerns about the opposite problem of limited availability of opioids in under-developed countries (Vijayan, 2011), recent shortages of medical opioid analgesics in the United States, and possible large reductions in availability in the future, in light of stricter laws and large lawsuits against pharmaceutical companies that sell opioid analgesics. In light of growing problems with opioid analgesics, developing effective new non-pharmacologic pain control treatments has become a national and international priority (Keefe et al., 2012, 2018).

Psychological pain control techniques can help reduce reliance on opioids, and may help compensate for undermedication. There is growing evidence that adjunctive immersive virtual reality distraction can help reduce the suffering of patients during medical procedures with little or no side effects from the VR (Hoffman, 1998; Hoffman et al., 2000, 2011; Garrett et al., 2014; Atzori et al., 2018a,b; Indovina et al., 2018). Virtual reality goggles with eye tracking technology embedded into the goggles, have recently become commercially available, and could potentially help make VR more distracting. However, to date, one important gap in the scientific VR analgesia literature is that there have been no PubMed indexed studies using eye tracked Virtual Reality to treat pain.

In previous clinical studies, virtual reality has typically been used in addition to traditional pain medications to help reduce the pain experienced by patients during painful severe burn wound cleaning sessions (Hoffman et al., 2000, 2004a, 2019; Maani et al., 2011a,b; Kipping et al., 2012; Faber et al., 2013; Jeffs et al., 2014; Khadra et al., 2018; McSherry et al., 2018)<sup>1</sup>. In a military study, while in virtual reality, soldiers with combat-related burn injuries spent less time thinking about their pain, patients reported reductions in worst pain intensity and reductions in the emotional component of pain (pain unpleasantness) during VR, and patients reported having more fun when they went into virtual reality during medical procedures, compared to standard of care pain medications alone (Maani et al., 2008, 2011a,b).

The essence of immersive virtual reality analgesia is the patient's illusion of going to a different place, the subjective experience of "feeling present" in the computer generated world, as if the virtual world is a place the patient is visiting (Slater et al., 1994; Slater and Wilbur, 1997). Researchers argue that the illusion of "being there" in virtual reality is unusually attention grabbing (Hoffman et al., 2000, 2001, 2003b, 2004a,b,c, 2007; Law et al., 2011). For example, people perform more poorly on a divided attention task while in virtual reality (Hoffman et al., 2003b). The perception of pain requires attentional resources. Researchers (e.g., Hoffman, 1998; Hoffman et al., 2000; Birnie et al., 2017), propose that VR uses up so much attention, that the patient's brain has less attention available to process incoming nociceptive signals traveling from the burn wound to the brain while the wound is being cleaned.

One clinical research study recently explored the use of virtual reality to help reduce the pain of pediatric patients with large severe burn injuries during burn wound cleaning sessions in the intensive care unit of a regional pediatric burn center (Hoffman et al., 2019). Although patients reported large and significant reductions in pain during burn wound care, the researchers recommended that stronger versions of virtual reality need to be developed, in order to better distract patients experiencing such high levels of pain during burn wound cleaning.

<sup>1</sup>https://www.nationalgeographic.com/magazine/2020/01/scientists-areunraveling-the-mysteries-of-pain-feature/

Researchers have described and tested design guidelines for how to make VR more effective. Several analog laboratory thermal pain studies have shown that more immersive VR systems designed to elicit a stronger illusion of feeling present in the virtual world are more effective at reducing pain (Hoffman et al., 2004c, 2006, 2014; Dahlquist et al., 2007; Wender et al., 2009; Law et al., 2011; Zeroth et al., 2019). Interacting with the virtual world increases the immersiveness of the VR system, potentially increasing the amount of attention drawn into the virtual world (e.g., Wender et al., 2009). The results of these laboratory studies have helped guide the design of effective VR analgesia systems, e.g., fMRI magnet-friendly wide field of view VR helmets (Hoffman et al., 2004b, 2007) and the development of robot-like arm VR goggle holders for severe burn patients (Maani et al., 2008, 2011a,b; Hoffman et al., 2019).

Unfortunately, increasing the immersiveness of VR systems used in the ICU for children with large severe burn injuries is both highly recommended and technically challenging. Burns on their heads and faces often preclude burn patients from wearing a traditional VR helmet, so head tracking is not possible. Children with large severe burn injuries also often have severe burns on their hands/fingers, making it difficult for them to use mouse tracking to interact with the virtual world during burn wound care.

In the current pilot laboratory study, we used a new VR helmet that allows participants to use their eye movements as a "hands free" input device to interact with the virtual world. We predicted that adding interactivity via an eye tracking system embedded into the VR goggles would increase the immersiveness of the VR system, and would increase the participants' illusion of "being there" in the virtual world, making VR more attention grabbing and more effective at reducing the acute pain of healthy volunteers during brief thermal pain stimuli. Several large computer companies are developing and marketing new virtual reality and augmented reality eye tracking technologies (https://www.forbes.com/sites/solrogers/2019/02/05/seven-reasons-why-eye-tracking-will-fundamentallychange-vr/#16bfb2c83459). For example, Apple Computers recently purchased SMI, the company that made the eye tracking technology used in the current study.

The current analog laboratory pain study with healthy volunteer participants is the first controlled study in the PubMed literature to explore whether interactive eye tracking can enhance the analgesic effectiveness of virtual reality distraction. The SMI eye tracked HTC VIVE VR helmet starts with a standard HTC VIVE helmet. In addition, each eyepiece of the goggles is trimmed with a small ring of eye tracking technology<sup>2</sup>. Six low energy infrared lights are positioned in a circle around each eye, and miniature infrared cameras mounted onto the same ring record the pattern of infrared lights (see **Figure 3**). These miniature cameras produce real time digital video streams of the six small dots of infrared light reflected off the outer surface of the participant's eyes (the cornea). As the participants look at different objects in the computer generated world, the pattern of infrared dots changes shape. The VR computer can tell from the pattern of dots where the participant is looking (search www.youtube.com for "SMI eyetracked virtual reality" for related informational videos). Because the eye tracking system only uses light in the narrow bandwidth of infrared, the video camera is able to ignore confusing reflection noise from the visible spectrum, which improves eye tracking accuracy.

The information from the miniature infrared cameras mounted in the VR helmet is transmitted to the VR software program in the VR computer. In the current study, participants interact with the virtual world by aiming virtual snowballs at virtual objects in the 3D virtual canyon. Just as a computer mouse input device can be used to move a computer cursor around a computer screen, using eye tracking technology embedded into the VR goggles, in the current study, the participant in virtual reality can aim snowballs at objects in virtual reality by simply looking at the virtual objects. Essentially, the "cursor" or reticle crosshair, follows the patient's eye fixations. So if the patient looks at a Snowman in virtual reality, the virtual snowballs hit the Snowman, and the virtual snowman reacts (with special animated effects) when hit by a snowball.
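The gaze-as-cursor idea described above can be illustrated with a minimal sketch. It is not the SMI SDK or the SnowCanyon code: the function name `gaze_target`, the bounding-sphere representation of objects, and the coordinates are all assumptions made for illustration. Given a gaze ray from the eye, it returns the nearest virtual object whose bounding sphere the ray passes through.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def gaze_target(origin, direction, objects):
    """Return the name of the nearest object hit by the gaze ray, or None.

    objects: iterable of (name, center, radius) bounding spheres
    (a hypothetical representation, chosen only for this sketch).
    """
    d = normalize(direction)
    best_name, best_dist = None, math.inf
    for name, center, radius in objects:
        # Vector from the eye to the sphere centre.
        oc = tuple(c - o for c, o in zip(center, origin))
        # Distance along the ray to the point of closest approach.
        t = sum(a * b for a, b in zip(oc, d))
        if t <= 0:
            continue  # object is beside or behind the viewer
        # Squared distance from the sphere centre to the ray.
        closest_sq = sum(a * a for a in oc) - t * t
        if closest_sq <= radius * radius and t < best_dist:
            best_name, best_dist = name, t
    return best_name


# Hypothetical scene: positions and sizes are illustrative only.
scene = [
    ("snowman", (0.0, 0.0, 10.0), 1.0),
    ("igloo", (5.0, 0.0, 10.0), 1.5),
    ("penguin", (0.0, 0.0, 20.0), 1.0),
]
print(gaze_target((0, 0, 0), (0, 0, 1), scene))  # snowman
```

In a real system the gaze ray would come from the eye tracker at each frame, and the returned object would become the snowball target; the sketch only shows the selection step.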

The current laboratory thermal pain study with healthy volunteers explores for the first time, whether interactive eye tracking can enhance the analgesic effectiveness of virtual reality distraction.

## MATERIALS AND METHODS

## Subjects

Forty-eight female college student volunteers from Effat University (age range 18–30 years old, mean = 21.77, SD = 2.07) participated in the main study, and an additional 24 students (from the identical context as the main study participants) were randomized to participate in a small pilot side study, to test our pain paradigm assumption that pain ratings were stable over repeated stimulations for people who received two test pain stimuli with No VR. Effat University is an all-female institution of higher education in Jeddah, Saudi Arabia. All data were collected by female research assistants and all participants were female, an understudied gender. This research was conducted in accordance with the Declaration of the World Medical Association<sup>3</sup>. All subjects gave written informed consent in accordance with the Declaration of Helsinki. Both written and verbal informed consent were obtained using a protocol approved by Effat University's Human Subjects Review Committee.

<sup>2</sup>https://en.wikipedia.org/wiki/SensoMotoric\_Instruments

<sup>3</sup>www.wma.net

## Within-Subjects Design

fnhum-13-00467 January 20, 2020 Time: 18:50 # 4

Each of the 48 participants who received VR rated their pain during "eye tracked VR" during one thermal pain stimulus (e.g., Test 1), and rated their pain again during a second thermal stimulus during "passive VR" (e.g., Test 2). Treatment order of passive VR vs. interactive eye tracked VR was randomized using random number sequences from www.random.org. Some people received "eye tracked VR" first and "passive VR" second, and some people received "passive VR" first and "eye tracked VR" second. Each individual participant's pain during passive VR was compared to that same participant's pain during interactive eye tracked VR.

## Measures and Procedures

#### Experimental Thermal Pain Model

Controlled thermal pain stimulation was applied using a commercially available computerized Medoc thermal pain stimulator<sup>4</sup> (Medoc Q-Sense Ramp and Hold program). During the first phase of the study, each participant selected the temperature they would use in this study. The temperature of each 10 s heat stimulus (range = 44–48.5°C in the present study) was individually determined for each subject using the psychophysical method of ascending levels (Hoffman et al., 2004b, 2007). A 10-s heat stimulus (always 44°C for the first stimulus) was delivered via a thermode attached to the participant's forearm (by a female researcher), and the subject was asked to rate their pain during the stimulus using a 0–10 graphic rating scale. With the subject's permission, the temperature for the next stimulus was then increased by 1°C (or less, if the participant was approaching their maximum) and participants again rated their pain. This sequence was repeated until the subject reported a stimulus that was "painful but tolerable" for the brief stimulus duration, and that the subject was willing to receive for two additional 10-s thermal pain stimuli. This final stimulus temperature that the participant selected for the baseline pain condition (10 s thermal stimulus with no distraction) also served as the pain stimulus temperature used during the subsequent VR interventions (10 s of thermal pain during passive VR distraction, and 10 s of thermal pain during interactive eye-tracked VR, VR treatment order randomized). Allowing participants to select the temperature they would use in this experiment was popular with the participants.
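The ascending-levels procedure can be summarized as a simple loop. The sketch below is only an illustration under stated assumptions: the callback `rate_stimulus` stands in for delivering a stimulus and asking the participant, and the 48.5°C cap is taken from the upper end of the range reported in this study, not from a documented device or protocol limit.

```python
def select_temperature(rate_stimulus, start_c=44.0, step_c=1.0, max_c=48.5):
    """Raise the stimulus temperature until the participant accepts it.

    rate_stimulus(temp_c) is a hypothetical callback returning
    (rating_0_to_10, accepted), where accepted means the participant
    judged the stimulus "painful but tolerable" and agreed to repeat it.
    max_c is an assumed ceiling (the top of the range reported here).
    """
    temp = start_c
    history = []
    while True:
        rating, accepted = rate_stimulus(temp)
        history.append((temp, rating))
        if accepted or temp >= max_c:
            return temp, history
        # Step up by step_c, or by less when approaching the ceiling.
        temp = min(temp + step_c, max_c)


# Simulated participant (illustrative only) who accepts 47 degrees or more.
def simulated_participant(temp_c):
    return min(10.0, (temp_c - 43.0) * 2.0), temp_c >= 47.0


chosen, log = select_temperature(simulated_participant)
print(chosen)  # 47.0
```

The selected temperature would then be reused unchanged for the baseline and both VR conditions, as described above.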

The VR system ran on a gaming laptop (MSI, GeForce GTX 1080 with 8 GB, Intel Core i7 7th generation at 2.80 GHz, 16 GB RAM, Windows 10 operating system) connected to an SMI HTC VIVE VR helmet from HTC with a 110° field of view, a resolution of 1080 × 1200 pixels per eye, and a refresh rate of 90 Hz. The head mounted display VR helmet, integrated with 250 Hz SMI eye tracking, works with a C++/C# SDK for various VR engines such as Unity. A new VR world, SnowCanyon<sup>5,6</sup>, was integrated with the eye tracking hardware, enabling participants to use the eye-tracker to select a virtual object by simply looking at the virtual object target (e.g., a virtual snowman) in the VR goggles. The SnowCanyon virtual environment<sup>5</sup> presents a virtual arctic canyon to the user, complete with flowing river below, blue sky above, and terraced canyon walls to the sides containing virtual penguins, igloos, and snowmen (see **Figure 1**). Subjects in both treatment conditions wore an HTC VIVE VR helmet head-mounted display with an integrated SMI eye tracking system (see **Figures 2**, **3**). For all participants in the current study, sound was muted and the helmets were immobilized (no head tracking), as an analog to the eye-tracked VR goggles held by a robot-like articulated arm that will eventually be used with actual burn patients in future clinical studies. The eye-tracking interactions in the VR game were designed so that participants could easily understand how to interact with objects in the game environment.

FIGURE 1 | A still shot from SnowCanyon (image by bigenvironments.com, copyright Hunter Hoffman, www.vrpain.com).

FIGURE 2 | Researcher with a student volunteer participant during the laboratory pain study (photo and copyright Hunter Hoffman, UW, www.vrpain.com).

FIGURE 3 | An artist's rendition of an eye-tracked eye. Photo, image and copyright Hunter Hoffman, www.vrpain.com.

<sup>4</sup>www.medoc-web.com

<sup>5</sup>bigenvironments.com

<sup>6</sup>www.vrpain.com

In both VR treatment conditions, each subject "glided" through the virtual world along a pre-determined path. During the interactive VR treatment condition, participants in SnowCanyon could target and shoot virtual objects in the virtual world by moving their eyes (i.e., simply looking at an object to aim snowballs at that object). The Passive VR treatment condition consisted of the identical SnowCanyon software and VR system, but with no eye-tracking and no interactivity/no snowballs. Subjects passively glided through the snowy 3D canyon in the 110 degree field of view HTC VIVE VR goggles.

#### Measures

After each pain stimulus, subjects received the following instructions prior to answering six separate subjective questions: "Please indicate how you felt during the most recent 10-s pain stimulus by making a mark anywhere on the line. Your response does not have to be a whole number."

After each pain stimulus, participants rated their pain using Graphic Rating Scales (GRS). Such pain rating scales have been shown to be valid through their strong associations with other measures of pain intensity, as well as through their ability to detect treatment effects (Jensen and Karoly, 2001; Jensen, 2003; Hoffman et al., 2004c). The GRS is a 10-unit horizontal line labeled with number and word descriptors. In the current study, the tool was used to assess three reports of the pain experience ("worst pain," "pain unpleasantness," and "time spent thinking about pain") that correspond to three separable components of the pain experience: sensory pain, affective pain, and cognitive pain, respectively.

Descriptor labels were associated with each mark to help the participant rate their pain magnitude in each domain. For pain intensity, the GRS descriptors were no pain at all, mild pain, moderate pain, severe pain, and excruciating pain. For pain unpleasantness, the GRS descriptors were not unpleasant at all, mildly unpleasant, moderately unpleasant, severely unpleasant, and excruciatingly unpleasant. For time spent thinking about pain, the GRS descriptors were none of the time, some of the time, half of the time, most of the time, all of the time.

The Graphic Rating Scale has previously been used to assess pain intensity in children eight and older and has been documented to be the preferred report method for young children (Tesler et al., 1991). The GRS is more sensitive than simple descriptive pain scales and participants can easily answer these pain ratings despite having no previous experience. Visual Analog Scales have been validated for use in children aged 7 and higher (Bringuier et al., 2009).

A single rating "to what extent did you feel like you 'went into' the virtual world," adapted from Slater et al. (1994) was also used in the present study to assess user presence in the virtual world. Descriptor labels were I did not feel like I went inside at all, mild sense of going inside, moderate sense of going inside, strong sense of going inside, I went completely inside the computer generated world. Hendrix and Barfield (1995) showed the reliability of a similar VR presence rating. The measure's ability to detect treatment effects (Hoffman et al., 2003a, 2004c) is preliminary evidence of our VR presence measure's validity. Participants also rated how real the objects seemed in virtual reality, descriptors were completely fake, somewhat real, moderately real, very real, indistinguishable from a real object. Participants rated nausea as a result of VR, using a graphic rating scale with descriptors no nausea at all, mild nausea, moderate nausea, severe nausea, vomit. As a surrogate measure of positive affect, participants rated how much fun they had during the painful stimulus (Hoffman et al., 2004a). The verbal descriptors associated with the fun rating were no fun at all (0), mildly fun (1–4), moderately fun (5–6), pretty fun (7–9), and extremely fun (10). Previous studies indicate that these secondary measures are sensitive to manipulations of the immersiveness of the VR system (Hoffman et al., 2004a,c; Wender et al., 2009).

The specific questions used in the current study were designed to assess the cognitive component of pain (amount of time spent thinking about pain), the affective component of pain (pain unpleasantness), and the sensory component of pain (worst pain). Nausea/Dizziness was assessed in an effort to identify the incidence of this component of simulator sickness sometimes associated with VR use.

Participants individually identified and pre-approved a baseline thermal pain stimulus temperature they found "painful but tolerable for 10 s, that they were willing to tolerate for two more 10 s thermal pain stimuli at that same temperature." The mean thermal stimulation temperature was 47.18°C (SD = 0.93).

Subjective pain ratings were obtained from each healthy volunteer participant after brief thermal pain stimuli at three time points, with an interstimulus interval of approximately 4 min, using the same temperature each time: (a) baseline, (b) Test 1, and (c) Test 2.

## RESULTS

As mentioned earlier, treatment order was randomized, using random number sequences from random.org. Twenty-four participants received passive VR first and interactive eye tracked VR second (treatment order 1) and the other 24 participants received interactive eye tracked VR first and passive VR second (treatment order 2).
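A balanced random assignment of this kind can be sketched as follows. The study drew its random sequences from random.org; the version below instead uses Python's standard `random` module with a seed, and the function name, order labels, and the forced 24/24 split are assumptions for illustration rather than the authors' procedure.

```python
import random


def assign_orders(participant_ids, seed=None):
    """Randomly assign half of the participants to each treatment order."""
    ids = list(participant_ids)
    if len(ids) % 2:
        raise ValueError("need an even number of participants for a balanced split")
    half = len(ids) // 2
    # Order 1: passive VR first; order 2: interactive eye tracked VR first.
    orders = ["passive_first"] * half + ["eye_tracked_first"] * half
    random.Random(seed).shuffle(orders)
    return dict(zip(ids, orders))


assignment = assign_orders(range(1, 49), seed=0)
counts = {label: list(assignment.values()).count(label)
          for label in ("passive_first", "eye_tracked_first")}
print(counts)  # {'passive_first': 24, 'eye_tracked_first': 24}
```

Shuffling a pre-balanced list guarantees equal group sizes, which simple coin-flip assignment would not.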

A two way Mixed ANOVA was conducted to test for undesired treatment order effects. The repeated measures ANOVA factor was VR condition (passive VR vs. interactive eye tracked VR), and the between groups factor was treatment order 1 vs. treatment order 2. Mixed ANOVAs showed there was no significant interaction between treatment order and worst pain ratings (i.e., no significant treatment order effects for worst pain), F(1,46) = 1.81, p = 0.19 ns, MS = 2.04, partial η<sup>2</sup> = 0.04. There was no significant interaction between treatment order and pain unpleasantness, F(1,46) < 1, p = 0.69 ns, MS = 0.26, partial η<sup>2</sup> = 0.004. There was no significant interaction between treatment order and participants' ratings of time spent thinking about pain during the thermal stimulus, F(1,46) = 1.00, p = 0.32 ns, MS = 0.51, partial η<sup>2</sup> = 0.02. And there was no significant interaction between treatment order and participants' ratings of fun during the thermal stimulus, F(1,46) = 1.45, p = 0.23 ns, MS = 1.76, partial η<sup>2</sup> = 0.03.

Because no significant order effects were found, the results were collapsed across treatment order for all of the analyses below.

## Statistical Analyses Collapsed Across Treatment Order

One way repeated measure ANOVAs were performed to test if there were significant main effects of No VR vs. passive VR vs. interactive eye-tracked VR. Paired t-test analyses were performed for the primary outcome measure (worst pain), as well as for the secondary pain measures (i.e., pain unpleasantness, and time spent thinking about pain). For these three pain ratings, alpha was conservatively set at 0.05/3 = 0.017. Any p-value less than 0.017 was considered significant (Bonferroni corrected for familywise error). Additional paired t-test analyses were performed for the other secondary graphic rating scale measures (fun, nausea, presence, real) with α = 0.05.
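The Bonferroni threshold and the paired comparison statistic can be sketched in a few lines. The ratings below are made-up numbers for illustration only, not study data, and the function names are hypothetical; obtaining a p-value from t would additionally require a t-distribution routine, which is omitted here.

```python
import math
from statistics import mean, stdev


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


alpha = bonferroni_alpha(0.05, 3)
print(round(alpha, 3))  # 0.017

# Hypothetical 0-10 ratings for four participants (illustrative only).
passive_vr = [5, 6, 4, 7]
eye_tracked_vr = [3, 5, 2, 6]
t_stat, df = paired_t(passive_vr, eye_tracked_vr)
print(round(t_stat, 3), df)  # 5.196 3
```

Each participant serves as their own control, so the test operates on within-participant differences rather than on the two sets of ratings separately.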

## Worst Pain

A one way repeated measure ANOVA indicated a significant main effect of No VR vs. passive VR vs. interactive eye-tracked VR for worst pain, F(2,94) = 61.41, p < 0.001, MS = 92.15, partial η<sup>2</sup> = 0.57. As predicted, compared to No VR, worst pain ratings were significantly lower during passive VR. Compared to No VR, worst pain was significantly lower during interactive eye tracked VR. And compared to passive VR, worst pain was significantly lower during interactive eye tracked VR.
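As a consistency check, partial η<sup>2</sup> can be recovered from a reported F statistic and its degrees of freedom using the standard identity partial η<sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). The short sketch below (the function name is an assumption) reproduces the effect sizes reported above for worst pain and for the treatment order interactions.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of VR condition on worst pain, F(2,94) = 61.41.
print(round(partial_eta_squared(61.41, 2, 94), 2))  # 0.57

# Treatment order interaction for worst pain, F(1,46) = 1.81.
print(round(partial_eta_squared(1.81, 1, 46), 2))  # 0.04
```

The same identity applies to any of the F tests reported in this section.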



## Pain Unpleasantness

A one way repeated measure ANOVA showed a significant main effect of No VR vs. passive VR vs. interactive eye tracked VR for pain unpleasantness, F(2,94) = 44.33, p < 0.001, MS = 68.92, partial η<sup>2</sup> = 0.49.

Post hoc paired comparisons (paired t-tests) are shown below. As predicted, compared to No VR, pain unpleasantness ratings were significantly lower during passive VR. Compared to No VR, pain unpleasantness was significantly lower during interactive eye tracked VR. And compared to passive VR, pain unpleasantness was significantly lower during interactive eye tracked VR.

[Table: Pain unpleasantness, post hoc paired comparisons (paired t-tests)]


## Time Spent Thinking About Pain

A one way repeated measure ANOVA (using Greenhouse–Geisser) showed a significant main effect of No VR vs. passive VR vs. interactive eye tracked VR for "time spent thinking about pain," F(2,94) = 28.83, p < 0.001, MS = 79.49, partial η<sup>2</sup> = 0.38.

Post hoc paired comparisons (paired t-tests) are shown below. As predicted, compared to No VR, participants' ratings of time spent thinking about pain during the thermal stimulus were significantly lower during passive VR. Compared to No VR, time spent thinking about pain was significantly lower during interactive eye tracked VR. But contrary to predictions, compared to passive VR, time spent thinking about pain was NOT significantly lower during interactive eye tracked VR.

[Table: Time spent thinking about pain, post hoc paired comparisons (paired t-tests)]



## Fun During the Thermal Stimulus

A one way repeated measure ANOVA showed a significant main effect of No VR vs. passive VR vs. interactive eye tracked VR for "fun," F(2,94) = 107.40, p < 0.001, MS = 234.65, partial η<sup>2</sup> = 0.70.

Post hoc paired comparisons (paired t-tests) are shown below. As predicted, compared to No VR, participants' ratings of fun during the thermal stimulus were significantly higher during passive VR. Compared to No VR, fun was significantly higher during interactive eye tracked VR. And compared to passive VR, fun was significantly higher during interactive eye tracked VR.

[Table: Fun, post hoc paired comparisons (paired t-tests)]


## Nausea From Virtual Reality

No significant difference in "nausea during VR" was found between passive VR and interactive eye tracked VR.


## Presence

Compared to their illusion of presence during passive VR, participants reported having a significantly stronger illusion of presence in virtual reality (being there), during interactive eye tracked VR.

[Table: Presence in virtual reality]


## Real

Participants rated the virtual objects as significantly more real during interactive eye tracked VR, compared to during passive VR.

[Table: How real were the objects in virtual reality]


To test an important assumption of our thermal pain paradigm, pilot data were collected from an additional 24 participants in the same context, who received No VR during baseline, No VR during Test 1, and No VR again during Test 2 (see **Table 1**). As predicted, participants who received No VR did not habituate to the thermal pain stimuli. In other words, the pain ratings from the thermal pain stimulations were stable over repeated pain stimulations for people who received one baseline pain and two test pain stimuli with No VR, using the same thermal pain paradigm as the main study. As shown in **Table 1**, three separate Bonferroni corrected one-way repeated measures ANOVAs indicated no significant main effect of stimulus repetition (baseline vs. Test 1 vs. Test 2, all with No VR) for worst pain, pain unpleasantness, or time spent thinking about pain.

## DISCUSSION

In the current study, we measured the brief acute pain of healthy volunteers during 10 s thermal pain stimuli to test whether increasing the immersiveness of the VR system increased how effectively VR reduces acute pain during brief thermal stimulations. As predicted, compared to passive VR, interactive eye tracked VR was significantly more effective at reducing worst pain (sensory pain), more effective at reducing pain unpleasantness (the emotional component of pain), and interactive eye tracked VR was more fun than passive VR. Furthermore, as predicted, compared to the passive VR condition, participants rated their illusion of presence significantly higher during the interactive eye tracked VR condition, and virtual objects seemed significantly more real during interactive VR, compared to passive VR.

TABLE 1 | Results for the control group: one-way repeated measures ANOVAs for worst pain, pain unpleasantness, and time spent thinking about pain.

In addition, the current study also tested an important assumption of our thermal pain paradigm. An additional 24 pilot participants in the same context received No VR during baseline, No VR during Test 1, and No VR again during Test 2. As predicted, the 24 participants who received No VR reported no reduction in pain. In other words, the pain ratings from the thermal pain stimulations were stable over repeated pain stimulations for people who received one baseline pain and two test pain stimuli with No VR, using the same thermal pain paradigm as the main study.

## Limitations

The within-subjects study design used in the current study reduces noise variance and increases statistical power. However, one limitation of our current study is that researchers and subjects were not blinded to the treatment conditions (Campbell and Stanley, 1963; Schulz and Grimes, 2002). In the current study, VR was "visual only" with no sound effects. Having sounds muted may have exaggerated the benefit from the eye movement interactivity. One of the advantages of VR is the multimedia exposure. Sounds of rivers running and birds singing enhance the illusion of presence and heighten the participant's sensitivity to the 3D environment. On the other hand, the lack of sound effects may have led the current study to underestimate the benefits of eye movement interactivity, because with sound enabled there would be more sound effects in the interactive VR condition, which could make interactive VR more distracting and make the advantage of interactive VR over passive VR even more pronounced. Another limitation is that this study did not measure participants' attention level (e.g., Heathcote et al., 2017). In the current study, eye tracking technology was simply used as a naturalistic human computer interface. Future, more advanced versions of our new VR analgesia system may use eye blinks and duration of fixation to help define interactions, and further increase interactivity. In the current study, "hands free" eye tracked interactive VR was compared to passive non-interactive VR. Whether eye tracking increases analgesia compared to other "hands free" interactive VR systems (e.g., voice controlled) remains a possible topic for future research. In the current study, each individual selected a temperature they found "painful but tolerable for 10 s." Clinical research is needed to determine whether the current results generalize to clinical pain settings (e.g., for 20 min wound cleaning procedures of severe burn patients that often involve severe to excruciating pain). Encouragingly, however, our first pediatric burn patient pilot subject, using the same eye tracked VR analgesia system, reported unusually large reductions in pain during burn wound care in the intensive care unit, during eye-tracked VR vs. No VR.

Despite these limitations, the current study is the first PubMed indexed VR analgesia study to involve eye tracking technology embedded into the VR goggles. Although there is a large scientific literature on traditional eye tracking spanning more than 40 years of research (Kredel et al., 2017), there are very few studies combining eye tracking and immersive virtual reality, a very recent innovation/combination.

## Why Does VR Reduce Pain? Possible Mechanisms of VR Analgesia

Although there is growing evidence that VR can be effective for reducing acute pain during painful medical procedures (Hoffman et al., 2019), the non-pharmacologic mechanism of how VR reduces pain is not fully understood and is an important research topic. According to Ballantyne (2018, p. s25) "The conscious perception of pain depends on the conversion of nociception to perception...". The authors of the current study speculate that VR interferes with the conversion of nociception to conscious pain perception by inserting a powerful perceptual illusion into the painful experience. Instead of directing most of their attention toward converting nociceptive signals into pain perception during No VR, we speculate that during virtual reality, the patient's brain is pre-occupied with converting neural signals from the visual, auditory and other sensory systems into a multisensory perceptual illusion of "presence" in virtual reality.

In the current study, we predicted that adding hands free interactive eye tracking would make VR that much more attention grabbing, and thus more effective at reducing pain, compared to passive VR. As predicted, participants in the current study also reported a significantly stronger illusion of presence, and VR was significantly more fun during the eye tracked VR. Although the current study did not directly measure attention, the pattern of results is consistent with an attentional mechanism of how VR reduces pain (see also Hoffman, 1998, 2004; Gold et al., 2006, 2007; Dahlquist et al., 2007; Birnie et al., 2017; Gold and Mahrer, 2017; Zeroth et al., 2019).

## Future Directions

Whether the current results generalize to clinical patient populations is an important topic for future research (e.g., whether eye tracking increases VR analgesia effectiveness for pediatric burn patients during burn wound care). In the current laboratory thermal pain study with healthy volunteers, eye movements were used to tell the computer what virtual objects the participant was looking at in virtual reality. In future studies on VR analgesia (e.g., virtual reality pain distraction), eye tracking technology can also be used to collect data about the patient's current mental state. For example, pupil size, and patterns of eye movements may correlate with how much pain patients are consciously experiencing. When a burn patient's pain becomes so extreme that the patient's attention shifts away from VR and onto their pain, we predict large reductions in successful eye fixations on target objects in SnowCanyon.

Immersive Virtual Reality with eye tracking has wide potential clinical uses beyond pediatric burn patients. For example, VR has recently been used with spinal injury patients (Flores et al., 2018). Most paralyzed patients are able to move their eyes, and can thus use eye movements to interact with objects in the virtual world. In the future, eye-tracked virtual reality and also augmented reality glasses may allow people to quantify and improve the efficiency of information processing/learning, etc. And there is also growing interest in using eye tracking to help improve social skills (e.g., helping autistic patients learn to make more natural patterns of eye contact with other humans). Additional research and development is recommended.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

All subjects gave written informed consent in accordance with the Declaration of Helsinki. Both written and verbal informed consent were obtained using a protocol approved by the Effat University's Human Subjects Review Committee.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## FUNDING

This research was supported by the King Abdulaziz City for Science and Technology (KACST) grant no. 1425-37- طأ . This research was also supported by the Effat University Research and Consultancy Institute, in Jeddah, Saudi Arabia. HH's salary (via the University of Washington in the United States) was covered by generous support of the Mayday Fund, by NIH grant R01GM042725 to David R. Patterson, and by Shriners Hospital for Children grant (award ID #71011-GAL, PI WM).

## ACKNOWLEDGMENTS

This study was completed in partial fulfillment of the requirements for the degree of Ph.D. in Computer Science by NA-G. We would like to thank Annechien Helsdingen, Ph.D., research assistants Ahad Alhudali, Nada Aljohani, Maya Akbar, Maryam Daffa and the students at Effat University who volunteered to participate in this study.

## REFERENCES

Ballantyne, J. C. (2018). The brain on opioids. Pain 159(Suppl. 1), S24–S30. doi: 10.1097/j.pain.0000000000001270


morphine: an evidence-based report. J. Clin. Oncol. 19, 2542–2554. doi: 10.1200/jco.2001.19.9.2542



**Conflict of Interest:** After the current study was completed, HH joined the Scientific Advisory Board of BehaVR.com, but no products or funding from this source were involved in the current study.

All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Al-Ghamdi, Meyer, Atzori, Alhalabi, Seibel, Ullman and Hoffman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Adoption of New Treatment Modalities by Health Professionals and the Relative Weight of Empirical Evidence in Favor of Virtual Reality Exposure Versus Mindfulness in the Treatment of Anxiety Disorders

#### Edited by:

Pietro Cipresso, Italian Auxological Institute (IRCCS), Italy

#### Reviewed by:

Daniel David, Babeș-Bolyai University, Romania

Julia Elisabeth Diemer, kbo Inn-Salzach-Klinikum, Germany

> \*Correspondence: Stéphane Bouchard stephane.bouchard@uqo.ca

#### Specialty section:

This article was submitted to Health, a section of the journal Frontiers in Human Neuroscience

Received: 16 September 2019 Accepted: 27 February 2020 Published: 25 March 2020

#### Citation:

Nolet K, Corno G and Bouchard S (2020) The Adoption of New Treatment Modalities by Health Professionals and the Relative Weight of Empirical Evidence in Favor of Virtual Reality Exposure Versus Mindfulness in the Treatment of Anxiety Disorders. Front. Hum. Neurosci. 14:86. doi: 10.3389/fnhum.2020.00086

Kevin Nolet<sup>1</sup>, Giulia Corno<sup>1,2</sup> and Stéphane Bouchard<sup>1</sup>\*

<sup>1</sup> Cyberpsychology Laboratory of UQO, Department of Psychoeducation and Psychology, Université du Québec en Outaouais, Gatineau, QC, Canada, <sup>2</sup> LabPsiTec, Departamento de Personalidad, Evaluación y Tratamientos Psicológicos, Universitat de València, Valencia, Spain

Anxiety disorders are among the most prevalent mental disorders, and cognitive-behavioral therapy (CBT) with exposure exercises is considered the gold-standard psychological intervention. New psychotherapeutic modalities have emerged in the last decade and, among them, mindfulness has been rapidly adopted by therapists. The adoption rate is slower for the use of virtual reality (VR) to conduct exposure. The goal of the present position paper is to contrast, for the treatment of anxiety disorders, the weight of empirical evidence supporting the use of exposure in VR with that supporting the use of mindfulness-based therapy (MBT). Based on the most recent meta-analyses, we found that CBT with exposure conducted in VR was more thoroughly researched and supported than MBT, receiving validation from roughly twice as many studies with high control (i.e., randomized, active controls with clinical samples). However, this conclusion is nuanced by reviewing gaps in the literature for both therapies. Potential factors influencing clinicians' choice of treatment and suggestions for future research directions are proposed.

Keywords: anxiety disorders, exposure therapy, cognitive behavioral therapy, virtual reality, mindfulness

## INTRODUCTION

Anxiety disorders are highly prevalent (Bandelow and Michaelis, 2015). They involve dysfunctional information processing from the limbic system. As such, cognitive-behavioral therapy (CBT) is recognized as the treatment of choice (Katzman et al., 2014; Nathan and Gorman, 2015; David et al., 2018). CBT is based on the premise that hyperactivation of the amygdala is maintained by the interplay between environmental, biological, cognitive, and behavioral factors and that psychological interventions lead to changes in information processing of threat-related cues from the limbic system through active cognitive and behavioral changes. Techniques based on exposure and its variations, such as behavioral experiments or response prevention, are considered as the key strategies leading to significant clinical improvement (Craske et al., 2014). However, although CBT with exposure has been considered as the gold-standard psychosocial intervention for treating anxiety disorders (Hofmann and Smits, 2008; Otte, 2011), it is not without limitations.

Exposure is usually combined with other cognitive behavior techniques, including anxiety management and cognitive restructuring of dysfunctional cognitions (Abramowitz et al., 2011). A successful exposure allows the client to learn to tolerate her/his fear and anxiety while developing a new behavioral repertoire rather than relying on threat avoidance, leading to new mental associations in the limbic system with lack of threat and with stronger perceived self-efficacy for managing emotions and previously avoided situations. While exposure-based treatments are listed among the evidence-based treatments (EBT) for anxiety disorders by the Division 12 of the American Psychological Association (Chambless and Ollendick, 2001) and the National Institute for Health and Care Excellence (2014) guidelines, their dissemination is confronted with numerous barriers (Hembree and Cahill, 2007). For example, when surveyed about their practice with patients suffering from anxiety disorders, practitioners report opting for alternative therapies with less empirical support than CBT with exposure. Negative beliefs about exposure, namely, in terms of its safety and tolerability for the patient, and the impracticability of its implementation have been found predictive of the lack of use among therapists (Pittig et al., 2019).

Several therapeutic alternatives to standard exposure have emerged, notably conducting exposure in virtual reality (VR; Wiederhold and Bouchard, 2014). CBT with exposure conducted in VR (CBT-VRexp) has been developed to counter some of the limitations of in vivo exposure. By offering a standardized, controlled, and replicable environment that can elicit emotions for therapeutic purposes, VR is a medium that could be more practical and effective for exposure therapy.

Although extensively studied in the scientific community, the use of CBT-VRexp is not widespread among clinicians who tend to favor interventions from other paradigms, such as mindfulness-based therapy (MBT).

Documenting which forms of psychotherapy and their variations are adopted by therapists, and at which rate, is challenging. Not only are those data scarce in the literature, but they can also vary between mental disorders and countries, inducing biases that further limit the comparisons. In addition to the difficulty in recruiting a representative sample of therapists using probabilistic techniques, generalizing the results on the therapists' adoption of various forms of psychotherapy could be problematic. Keeping the above limitations in mind, in a survey of German behavioral therapists working in the healthcare system, exposure was used in only 46.8% of treatments focusing on anxiety disorders (Pittig and Hoyer, 2018). Although not specifically focusing on anxiety disorders, Michalak et al. (2020) found, again in a German sample, that up to 82% of licensed therapists integrate MBT in their clinical practice, most of them (80%) using it at least occasionally (fewer than one out of two sessions). However, only 10% of those used a manualized group-based MBT. In their samples, the therapists preferred to integrate in their treatment plans stand-alone interventions such as body scan, breathing meditation, self-soothing with the five senses, or other informal practices. In a sample of practicing CBT therapists attending a European clinical conference, with the majority working with anxiety disorders, only 13.67% reported using CBT-VRexp occasionally or frequently with their patients (Lindner et al., 2019).

Informal observations rapidly show that the number of training sessions, workshops, and classes on MBT clearly outweighs the number on VR or CBT-VRexp. A search on Google<sup>1</sup> with the keywords "anxiety" and "VR workshops" yielded 230 results, whereas the keywords "anxiety" and "mindfulness workshops" yielded 23,000 results. The specific numbers vary when other keywords are used, but the ratios remain in the order of 1 to 100 in favor of MBT. Finally, although MBT has experienced a marked increase in scientific and popular interest in the past two decades, recent commentaries (e.g., Farias et al., 2016) have raised questions regarding the evidence base for this family of therapies.

The current paper was motivated by the apparent enthusiasm of mental health professionals to embrace some variations of CBT for anxiety disorders more than others. The aim of this review is to contrast the body of evidence supporting the efficacy of treatment of anxiety disorders using CBT-VRexp versus using MBT. The goal is not to compare the relative efficacy of both forms of CBT but specifically to compare the amount, or weight, of empirical evidence supporting each of them and contrast it with the therapists' enthusiasm to adopt each of them. The weight was defined here as the number, and relative efficacy, of randomized controlled trials (RCTs) conducted with clinical samples comparing a treatment with at least one other treatment, ideally a treatment considered as an established standard.

## METHODS

The general methodology will follow three steps: (a) define CBT-VRexp and MBT, (b) provide a brief overview of their relevance for the treatment of anxiety and its disorders [i.e., as defined in DSM-5, with the addition of obsessive-compulsive disorder and post-traumatic stress disorder (PTSD)], and (c) review and contrast the relative weight of empirical support for both techniques based on already published meta-analyses.

## Defining CBT-VRexp and Mindfulness-Based Interventions

VR has been defined in different ways, but the practical definition from Schultheis and Rizzo (2001) will be used here: VR is an advanced form of human–computer interface that allows the user to "interact" with and become "immersed" in a computer-generated environment in a naturalistic fashion. Three main features differentiate VR systems from other technologies: immersion, presence (the impression of really being in the environment), and interaction with that environment (e.g., Biocca, 1997; Lombard and Ditton, 1997; Slater, 2009; Fuchs, 2011; Wiederhold and Bouchard, 2014; Cipresso et al., 2018). Computer-generated virtual environments allow clinical assessment, treatment, and rehabilitation, providing interactive ecologically valid scenarios designed to target specific needs.

<sup>1</sup>On July 5, 2019.

CBT-VRexp refers to the use of VR to conduct exposure (Bouchard and Rizzo, 2019). CBT rarely relies only on exposure, although it clearly is the main component of CBT for phobias. Additional therapeutic ingredients include working alliance, case conceptualization, psychoeducation, cognitive reframing, and relapse prevention. For more complex anxiety disorders, the treatment always includes the aforementioned ingredients, plus a stronger involvement of cognitive techniques, and may also involve other techniques, such as problem solving or assertiveness training.

In traditional CBT, many strategies are designed to change internal experiences, such as emotional states (e.g., reducing negative moods), bodily sensations (e.g., reducing pain), and the content of thoughts (e.g., from irrational and/or distorted to rational, realistic, and/or balanced) (Harrington and Pickles, 2009). In contrast, mindfulness-based approaches teach an alternative way of relating to such experiences. Bishop et al. (2004) identified two basic components of mindfulness: one involves self-regulation of attention and the other involves an orientation toward the present moment in a way characterized by openness, curiosity, and acceptance (Hofmann et al., 2010). In other words, the essential premise underlying mindfulness practices is that experiencing the present moment in a non-judgmental and open way can effectively counter the effects of stressors, as excessive orientation toward the past or future when dealing with stressors can be related to depressive and anxious feelings (e.g., Kabat-Zinn, 2003; Hofmann et al., 2010).

Therefore, mindfulness practice encourages cultivating a new relationship with internal experiences that involves directing attention so that it is maintained on immediate experience, without avoiding, over engaging, or elaborating the experience (Kumar et al., 2008). More specifically, it is believed that, through training focused on approaching stressful situations reflectively rather than reflexively, mindfulness-based interventions (MBI) can effectively counter the use of avoidance strategies, which attempt to alter the intensity or frequency of unwanted internal experiences (Hayes et al., 2006; Hofmann et al., 2010). These maladaptive strategies are believed to contribute to the maintenance of many, if not all, emotional disorders (Bishop et al., 2004; Hayes, 2004; Hofmann et al., 2010). An important and contrasting feature of MBT is how cognitions are handled. Instead of trying to restructure them, MBT focuses on accepting them and letting them go. However, when it comes to being in contact with feared stimuli, acceptance and orienting attention to fully experience the moment share many similarities with exposure in terms of opportunities to build new mental associations with lack of threat.

## Relevance of VR and MBT in Psychotherapy

VR is used in a wide range of fields, such as physical and neurological rehabilitation (e.g., Schultheis et al., 2002; Holden, 2005; Lam et al., 2006), neuropsychological evaluation (e.g., Rizzo and Buckwalter, 1997; Rizzo et al., 2000), education, and cognitive neuroscience (e.g., Tarr and Warren, 2002). VR started to be used in clinical psychology in the early 1990s. The most common application of VR in clinical psychology has been the treatment of phobias and anxiety-related disorders (i.e., anxiety disorders as defined in the DSM-IV). For example, in the early 1990s, Hodges et al. (1995) reported using virtual environments to provide acrophobic patients with fear-producing experiences of heights in a safe situation. Since that time, VR has been proposed as a new medium for conducting exposure. The rationale behind its use is that the exposure can be conducted with more control from the therapist. CBT-VRexp offers several other advantages over in vivo or imaginal exposure (see Côté and Bouchard, 2008 for a detailed list), such as increased attractiveness for patients, greater cost-effectiveness, and better protection of confidentiality and patient safety.

The research in this field shows that VR is able to reduce the anxiety symptoms significantly in different anxiety disorders: social anxiety (SA) disorder (Bouchard et al., 2017), generalized anxiety disorder (GAD) (e.g., Repetto et al., 2013), phobias (e.g., Garcia-Palacios et al., 2002; Parsons and Rizzo, 2008), PTSD (e.g., Gonçalves et al., 2012), panic disorder (PD) and agoraphobia (Botella et al., 2007), and psychological stress (Gaggioli et al., 2014). Studies show that the clinical outcome is superior to waitlist control conditions and comparable to in vivo exposure-based interventions. During the last decade, clinicians extended this field to more complex disorders, for instance, eating disorders and body image disturbance (e.g., Ferrer-Garcia et al., 2013; Corno et al., 2018), schizophrenia (e.g., da Costa and de Carvalho, 2004; Freeman, 2008; Kim et al., 2008), and building resilience and post-traumatic growth (e.g., Corno and Bouchard, 2015).

In terms of criticisms, although it is a promising therapeutic medium, adding VR to CBT may not always provide additional benefit to exposure-based therapy (e.g., McLay et al., 2017) and adds costs and complexity to an already effective treatment. The exact role of some psychological factors involved in exposure conducted in VR also needs to be clarified. While the sense of presence, the feeling of being inside the virtual environment, has been considered as relevant for treatment success, studies about its actual impact on treatment outcomes have produced mixed results (Botella et al., 2017).

Mindfulness-based therapy has been defined as comprising the third wave of CBT because of its differences with the first two waves, behavior therapy and cognitive therapy (Hayes, 2004; Baer and Sauer, 2009). Having its origins in Eastern Buddhist tradition that is over 2,500 years old, MBT includes mindfulness-based cognitive therapy (MBCT; e.g., Segal et al., 2002) and mindfulness-based stress reduction (MBSR; e.g., Kabat-Zinn, 1982). MBT has become a very popular form of treatment in contemporary psychotherapy (e.g., Kabat-Zinn, 1994; Bishop, 2002; Baer, 2003; Hayes, 2004). For instance, both MBCT and MBSR have demonstrated significant clinical efficacy in the treatment of mood disorders (e.g., Segal et al., 2010), resistant depression (e.g., Kenny and Williams, 2007; Eisendrath et al., 2008), and anxiety disorders (e.g., Evans et al., 2008; Kim et al., 2009; Hofmann et al., 2010). Other clinical applications include pain management, substance use, attention disorders, PTSD, and eating disorders [see Wielgosz et al. (2019) for a review of these applications].

Despite the popularity of MBT, a limited number of clinical trials have specifically examined this treatment in anxiety disorders. More specifically, while the empirical support for the treatment of recurrent depression seems to be strong, the same cannot be as easily said for other clinical conditions such as anxiety disorders. Questions about the methodological qualities of the literature have also been raised, ranging from a lack of active control groups (Farias et al., 2016) to problems in operationalization and measurements (Grossman, 2019).

## Contrasting the Relative Weight of Empirical Support

Previous researchers have gone to great lengths to find all available outcome studies on CBT-VRexp and MBT in order to publish meta-analyses, and contrasting the adoption of treatment modalities by therapists based on information already available to them leads to a fairer comparison. Therefore, our source of information to balance the weight of evidence providing empirical support for both techniques is based on the most recent and comprehensive meta-analyses for each treatment modality. We searched the Scopus database for meta-analyses on clinical trials for MBT, CBT-VRexp, and also CBT with in vivo exposure as a gold-standard comparison. We complemented this search with Google Scholar to ensure that all relevant papers were found. We used the following terms in the title, abstract, and keywords: "meta-analysis" and "anxiety", combined with "mindfulness", "virtual reality" or "VR" or "VRET", and "CBT" or "exposure therapy" or "cognitive behavioral therapy". We limited our search to papers written in English and published in peer-reviewed journals. To ensure the longest possible coverage, we aimed for the most recent meta-analysis, thus limiting our search to papers published between 2018 and July 15, 2019. The papers were reviewed by the first author following specific inclusion and exclusion criteria.
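The combination of search terms described above can be illustrated with a short sketch. It assumes a generic Boolean form (shared terms joined with AND, synonyms for each modality joined with OR); the exact query syntax submitted to Scopus is not reported in the text, so `build_query` and the dictionary layout are hypothetical names used for illustration only.

```python
# Hypothetical reconstruction of the Boolean search strings. The exact
# syntax submitted to Scopus is not reported; only the terms come from the text.
BASE_TERMS = ['"meta-analysis"', '"anxiety"']

MODALITY_TERMS = {
    "MBT": ['"mindfulness"'],
    "CBT-VRexp": ['"virtual reality"', '"VR"', '"VRET"'],
    "CBT": ['"CBT"', '"exposure therapy"', '"cognitive behavioral therapy"'],
}


def build_query(modality: str) -> str:
    """Join the shared terms with AND and one modality's synonyms with OR."""
    base = " AND ".join(BASE_TERMS)
    synonyms = " OR ".join(MODALITY_TERMS[modality])
    return f"({base}) AND ({synonyms})"


if __name__ == "__main__":
    for name in MODALITY_TERMS:
        print(f"{name}: {build_query(name)}")
```

Each string would still need to be restricted to title, abstract, and keywords, to English-language peer-reviewed papers, and to the 2018 to July 15, 2019 window, using whatever filters the database interface provides.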

The following criteria were used to identify the meta-analyses that responded to our needs: (1) the longest coverage possible (i.e., the most recent papers going as far back as possible in publication history), (2) information available on the randomization procedure used for the studies included, (3) effect sizes (ES) for each control type separately (i.e., inactive, active, and evidence-based), (4) ES calculated on an anxiety measure, and (5) preferably with patients diagnosed with an anxiety- or stress-related disorder or with a score above the cutoff on a clinical measure. When available, we also used results pertaining to attrition and deterioration rates within these meta-analyses.

To make sure that the meta-analyses included were representative of a larger part of the literature, we excluded papers if (1) they were limited to a specific population (e.g., youth or elderly), (2) they were limited to one particular therapy or modality (e.g., self-compassion for MBT and online interventions), or (3) they were limited to one diagnostic category (e.g., phobias for CBT-VRexp) or did not provide information about each included category individually. To make sure that the ES were observed specifically for anxiety, we excluded meta-analyses if (4) they aggregated heterogeneous outcomes (e.g., psychological distress). Papers were also excluded if (5) the treatment modality was not objectively isolated (e.g., by adding a new modality to the basic treatment; see **Supplementary Material** for a complete list of the papers reviewed).

## RESULTS

For MBT, the search yielded 76 hits. An initial screening eliminated 39 papers because they were limited in their population scope (e.g., children, youth, and cancer survivors), 15 were not meta-analyses, seven were not about anxiety disorders, three were not about MBT, two were limited to online interventions, and four were limited to self-compassion, self-help, or stand-alone interventions. Of the remaining six papers, two were further discarded because they aggregated heterogeneous outcomes (e.g., "internalizing symptoms", de Abreu Costa et al., 2019; "negative affectivity", Schumer et al., 2018), two did not report ES separately for each intervention and/or each control category (Bandelow et al., 2018; Hedman-Lagerlof et al., 2018), and one raised many methodological concerns<sup>2</sup> (Singh and Gorey, 2018). Thus, the meta-analysis from Goldberg et al. (2018) was retained for our study (see **Supplementary Material 1**).

For CBT-VRexp, the search yielded 12 hits. Three papers were removed because they were not meta-analyses, three were not about CBT-VRexp or specific to this intervention, one was not specific to anxiety disorders, and one was limited in population scope (children). Of the four remaining papers, the meta-analysis from Carl et al. (2019) was retained for our study, supplemented with the one by Benbow and Anderson (2019) for attrition data. The remaining two publications were rejected because they were either limited to a specific diagnostic category (SA; Chesham et al., 2018) or focused on deterioration data (Fernández-Álvarez et al., 2019) (see **Supplementary Material 2**).

<sup>2</sup>The Singh and Gorey (2018) paper is of special note. It directly addresses our subject matter, yet we have doubts about the robustness of its methodology. While they reported 9 RCT studies directly comparing MBI and CBT, they included 2 studies with the same sample, 2 studies on MBI with added exposure, 1 with CBT as a control group without clear definition (TAU with or without medication, not on the whole sample) and with depressive and/or anxious participants (again without stats on each diagnostic category included), 1 study with a 1-h intervention, and 1 with an error on the ES reported (advantage for MBI that was originally found for CBT). This leaves a total of 3 valid studies, which is more in line with Goldberg et al. (2018) and Hedman-Lagerlof et al. (2018). For these reasons, we opted to exclude this paper and report results from the former.

For CBT, a total of 96 hits were obtained from our search. Of these, eight papers were rejected because they were not about CBT, 11 were not meta-analyses, 25 were not about anxiety disorders, 31 were limited in their population scope, two were not about treatment efficacy, five were limited to Internet or computer-based intervention, and six were about CBT-VRexp or MBT. Of the eight remaining papers, one was rejected because it was limited in scope (group therapy for PTSD; Schwartze et al., 2019), one was too restrictive on the measure of outcome to allow comparisons (remission rate; Springer et al., 2018), two studied the effect of added interventions to CBT (Bernard et al., 2018; Marker and Norton, 2018), and two were limited to primary care settings without information about specific anxiety disorders (Zhang et al., 2019a,b). Since Barry et al. (2018) did not provide information about RCTs, we favored Carpenter et al. (2018) for our study (see **Supplementary Material 3**).
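The screening counts reported for the three searches can be checked with a short tally. The sketch below only transcribes the numbers given in the text; the dictionary layout and the `tally` helper are illustrative assumptions, not part of the original analysis.

```python
# Screening counts transcribed from the text. "screened_out" lists the papers
# removed at initial screening; "rejected_later" lists those rejected among
# the papers that remained after screening.
SCREENING = {
    "MBT": {
        "hits": 76,
        "screened_out": [39, 15, 7, 3, 2, 4],
        "rejected_later": [2, 2, 1],
    },
    "CBT-VRexp": {
        "hits": 12,
        "screened_out": [3, 3, 1, 1],
        "rejected_later": [1, 1],
    },
    "CBT": {
        "hits": 96,
        "screened_out": [8, 11, 25, 31, 2, 5, 6],
        "rejected_later": [1, 1, 2, 2],
    },
}


def tally(entry):
    """Return (papers left after screening, papers left after later rejections)."""
    after_screening = entry["hits"] - sum(entry["screened_out"])
    left = after_screening - sum(entry["rejected_later"])
    return after_screening, left


if __name__ == "__main__":
    for modality, entry in SCREENING.items():
        after_screening, left = tally(entry)
        print(f"{modality}: {after_screening} after screening, {left} left")
```

Under these counts, the tally leaves one paper for MBT (Goldberg et al., 2018), two for CBT-VRexp (Carl et al., 2019, and Benbow and Anderson, 2019), and two for CBT (Barry et al., 2018, and Carpenter et al., 2018, of which the latter was favored), which matches the text.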

A summary of the information extracted from published meta-analyses documenting the efficacy of CBT-VRexp and MBT is reported in **Table 1**, with the meta-analysis comparing CBT with in vivo exposure to other active control treatments reported as a reference for comparison. For manualized MBT, Goldberg et al. (2018) identified 22 RCTs with clinical samples of anxiety disorders and PTSD, totaling 26 ES. Of those, nine used an active control and only seven compared its efficacy with that of an EBT, such as CBT with in vivo exposure. In comparison, of the 30 RCT studies identified for CBT-VRexp (Carl et al., 2019) totaling 40 ES, twice as many (14) used CBT with in vivo exposure as a control intervention.

The nature of clinical samples also differs between the MBT and the CBT-VRexp studies retained in the various meta-analyses. For MBT, eight studies used SA samples, seven for PTSD, five for GAD, two for obsessive-compulsive disorder, and three mixed samples. For CBT-VRexp, the bulk of evidence pertains to specific phobias (SP) with 17 studies, followed by 13 for SA. Less frequent evidence was documented for PTSD and PD, with five studies each.

The number of clinical trials reporting follow-up data in the meta-analyses is much smaller and not very different between MBT and CBT-VRexp. For MBT, only two studies fitting the criteria of the meta-analyses documented the long-term efficacy compared to an EBT (CBT with in vivo exposure), compared to seven for CBT-VRexp. The average effect size of the comparisons with inactive or active control conditions was consistently lower in MBT compared to that in CBT-VRexp. The attrition rate reported in studies on MBT and CBT-VRexp was very similar. Odds of dropping out of MBT and CBT-VRexp were not different from other EBT.

Overall, the amount of information documenting the efficacy of using CBT-VRexp for anxiety disorders is about twice as much as for MBT. Note that our analysis is about the relative amount of evidence; the comparative efficacy has not yet been empirically tested, and comparing the pooled ES in **Table 1** may be hazardous. Nevertheless, these observations about available evidence from the published literature cannot justify the disproportionately larger acceptance and enthusiasm of MBT over CBT-VRexp.
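To make the "about twice" comparison concrete, the counts reported above for Goldberg et al. (2018) and Carl et al. (2019) can be set side by side. This is a minimal sketch using only the numbers quoted in the text; the `COUNTS` layout and `vr_to_mbt_ratio` helper are hypothetical, and the ratios describe the amount of evidence, not comparative efficacy.

```python
# Counts quoted in the text: Goldberg et al. (2018) for MBT and
# Carl et al. (2019) for CBT-VRexp.
COUNTS = {
    "RCTs with clinical samples": {"MBT": 22, "CBT-VRexp": 30},
    "effect sizes": {"MBT": 26, "CBT-VRexp": 40},
    "comparisons against an EBT": {"MBT": 7, "CBT-VRexp": 14},
}


def vr_to_mbt_ratio(row):
    """Ratio of CBT-VRexp to MBT counts, rounded to two decimals."""
    return round(row["CBT-VRexp"] / row["MBT"], 2)


RATIOS = {label: vr_to_mbt_ratio(row) for label, row in COUNTS.items()}

if __name__ == "__main__":
    for label, ratio in RATIOS.items():
        print(f"{label}: {ratio}")
```

With these inputs the ratio is exactly 2 for comparisons against an EBT and smaller for the total number of RCTs and ES, which is in line with the abstract's statement that the factor of roughly two concerns studies with high control.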

## DISCUSSION

Although CBT with exposure exercises has been considered the gold-standard treatment for anxiety disorders, researchers and clinicians in mental health have embraced and combined different approaches to overcome some limits of CBT and exposure. In this article, we focused on two forms of CBT: CBT-VRexp and MBT. Specifically, this study was driven by the wish to document and reflect on the apparent widespread scientific and popular interest and preference in using MBT over the use of CBT-VRexp in the treatment of anxiety disorders. Therefore, the aim of this study was to contrast the body of evidence supporting the efficacy, specifically for the treatment of anxiety disorders, of using CBT-VRexp versus using MBT. Faced with the growing hype around mindfulness, among both the general population and the clinicians, our question was: is this hype empirically supported? Reviewing studies gathered in meta-analyses, we found twice as many studies supporting CBT-VRexp as supporting MBT. When looking at comparisons with CBT plus in vivo exposure, twice as many studies were published in support of CBT-VRexp (14) compared to MBT (seven). Strength is in the numbers: with more studies, the pooled ES are more robust and less likely to be artificially inflated by publication bias. We can also note that these ES are higher in favor of CBT-VRexp compared to MBT. The available information in meta-analyses reported large pooled ES favoring CBT-VRexp over active control conditions at post-treatment and over inactive control conditions (e.g., waiting list or no treatment) at follow-up. However, the pooled ES favoring MBT over active controls were small at post-treatment and even smaller when compared with inactive controls at follow-up. Overall, while expected changes are still clinically significant for MBT, stronger effects with more empirical support are found for CBT-VRexp for treating anxiety disorders.

However, nuances exist when looking at specific diagnostic categories. SA is by far the most studied diagnosis, both for CBT-VRexp and MBT, with six and four published RCTs, respectively, against EBT<sup>3</sup> in the meta-analyses that we used. For PD and SP, only papers for CBT-VRexp were found. For GAD and PTSD, the studies included in the meta-analyses were only for MBT. Thus, based on the evidence, the relevance of MBT or CBT-VRexp was also carefully considered given the target disorder.

In all cases, studies with follow-up data are few and their follow-up periods are short. Not enough data are available to draw conclusions about specific disorders. Nonetheless, clinicians may choose CBT-VRexp over MBT with some confidence in its long-term effect, as more studies in the meta-analyses found no long-run difference from participants treated with EBT.

Another important limitation in the literature that practitioners should keep in mind is that deterioration data are rarely reported for both CBT-VRexp and MBT. This is not surprising as it is a relatively new line of inquiry for clinical efficacy studies. It is also challenging as it requires monitoring of individual data compared to group-level analysis. Yet more papers in the literature report such data for CBT-VRexp (around 40%; Fernández-Álvarez et al., 2019) compared to MBT (around 15%; Wong et al., 2018). Deterioration rates reported in the literature were lower for both patients receiving CBT-VRexp (4.0%) and other forms of treatment (2.8%) compared to the wait-listed control (15%; Fernández-Álvarez et al., 2019), while deterioration was reported for only 1% of the participants in both the MBT and the control groups [although only three studies included samples with an anxiety disorder or PTSD (Wong et al., 2018)]. While practicing meditation could be thought of as relatively harmless, this might not be the case in patients suffering from a diagnosed mental disorder. Also, mindfulness practice can be unpleasant and challenging without causing harm. As suggested by Baer et al. (2019), systematic research is needed to address this question, which would require monitoring individual data like what Fernández-Álvarez et al. (2019) did in their analysis. While more studies reported such data for CBT-VRexp, there is still room for improvement. Adverse effects can come in many forms that are not constantly measured (Fernández-Álvarez et al., 2019), such as cybersickness symptoms (i.e., feelings of nausea, dizziness, and discomfort) when using VR technology. While these symptoms do not deteriorate the condition of the patients, they might hinder their capacity to profit from the intervention.

<sup>3</sup>Pooled effect size for MBT was recalculated from Goldberg et al. (2018) with SA studies only: MBT was found to be significantly less effective than EBT, −0.31 (−0.60, −0.03).

TABLE 1 | Documenting the relative weight of evidences, based on the number of clinical trials on MBT and CBT-VRexp and on in vivo exposure as a reference, and the pooled effect sizes.

Data are based on meta-analyses on anxiety-related disorders. CBT, cognitive-behavioral therapy; EBT, evidence-based treatment; in vivo exp, real-life exposure; ES, effect sizes; MBT, mindfulness-based therapy; VRexp, virtual reality-based exposure. Goldberg et al. (2018) did not pool PTSD studies with those of anxiety disorders in their analysis, contrary to the other meta-analyses presented here. After contacting the main author, who accepted to share his database, we performed a re-analysis of the data (Mantel–Haenszel odds ratio) with the PTSD and the anxiety disorder studies pooled together. Sources of information: <sup>1</sup>Goldberg et al., 2018. <sup>2</sup>Carl et al., 2019 for all data except for attrition rates. <sup>3</sup>Benbow and Anderson, 2019 for attrition rates. <sup>4</sup>Carpenter et al., 2018 for a meta-analysis including only comparisons with active controls.

The results of the present study lead to an interesting question: given the higher frequency of support found for CBT-VRexp over MBT for treating anxiety disorders, why has MBT attracted a much wider scientific and popular interest compared to CBT-VRexp? We have formulated some tentative answers.

One possible answer to our question could be the cost and apparent complexity of using VR technologies. While high-end technologies are costly and thus more suitable for research purposes, head-mounted display systems are increasingly suitable for the general public. Indeed, 3 years ago affordable headsets became available, and now the technology is becoming even more affordable and more immersive<sup>4</sup>. Yet in order to use VR, researchers and clinicians need virtual environments (software) and some hardware, which may be seen as cumbersome and represent additional costs. Thus, applying MBT, which "only" requires training from the health professional, could be seen as more affordable and more attractive than using technologies.

A likely answer could also be that, unfortunately, some professionals do not rely on empirical data to choose their therapeutic interventions but rather on their preferences, the appeal of the model, and the current trends in clinical orientations. Adopting an intuitive thinking style is predictive of more negative attitudes toward EBT requiring exposure and more positive ones toward the adoption of alternative therapeutic interventions (Gaudiano et al., 2011). MBT is part of the current zeitgeist (Michalak and Heidenreich, 2018): psychological stress and its reduction are major concerns in modern societies, and MBT offers a solution appealing for both scientific and spiritual reasons. At the same time, government agencies start to adopt MBT as a first-line treatment for anxiety and depression, such as the National Health Service in the United Kingdom (Mindfulness All Party Parliamentary Group [MAPPG], 2015). Thus, both psychological and sociological factors would influence clinicians in their adoption of MBT in their practice. CBT-VRexp might not have the same appeal.

There are still worries over the use of CBT-VRexp. One which is frequently cited is that VR technology could hinder the therapeutic relationship. This is not the case, as shown by Ngai et al. (2015). In addition, therapists from Lindner et al. (2019) rated "making exposure less stressful" as an advantage of using VR and "patients experiencing the VR environment as too real" as a disadvantage. Endorsements of these items by therapists might be indicative of a misunderstanding of exposure and a reluctance to induce anxiety in one's patients, even for their own benefit. This could lead to the adoption of alternative clinical interventions, such as MBT, not because it would benefit the patient but because it is easier on the therapist. In fact, novice therapists tend to have comparable stress levels to their patients, both subjectively and physiologically, right before engaging in exposure compared to a control therapy session (Schumacher et al., 2014). Also, novice therapists experimentally led by researchers to hold negative beliefs about exposure did deliver the treatment sub-optimally by being overly cautious compared to those in the positive beliefs condition (Farrell et al., 2013). Many therapists deliver exposure at lower doses (i.e., for a shorter period of time and in less threatening situations) and in conjunction with controlled breathing strategies without clear empirical evidence of its added clinical efficacy (Deacon et al., 2013). In sum, CBT-VRexp may be less attractive essentially because it is a form of exposure, and therapists may not like to do exposure with their patients. Achieving suboptimal results with their patients could reinforce their negative beliefs about exposure and further justify the choice of alternative interventions. Finally, if therapists avoid using exposure because of fear of patients' discomfort, they may also favor MBT in ways that reduce its usefulness to develop new associations with lack of threat and that reinforce avoidance. When facing the avoided stimuli, MBT can be used constructively to foster exposure (e.g., "Let us be fully aware that there is a live spider crawling on the table, that it is disgusting and looking at you, and embrace the situation and how you can remain in it") or less constructively to foster avoidance (e.g., "Although there is a spider here, let us focus on your respiration and your body, let go of your worries about the spider, and pace your breathing to slowly calm down").

Doing mindful breathing exercises in conjunction with CBT-VRexp could diminish its effectiveness by reducing the anxiety of the patient, thus acting as a form of avoidance. Exposure requires the patient to experience the threatening stimuli to learn through experience that they are safe. To do so, therapists must tolerate the idea of being "responsible" for "inflicting" anxiety on their patients, which can prompt them to seek out alternative interventions or to supplement exposure with tempering ones. There is a thin line that is very easy to cross in favor of helping patients develop avoidance behaviors that will either be detrimental in the long term or will need to always be used by patients as safety-seeking behaviors or neutralization in order to cope.

<sup>4</sup>Retrieved from: https://www.wired.com/story/oculus-rift-s-vr-headset/ and http://time.com/4169430/oculus-rift-price-release-date-2016/

## Future Directions

First, researchers should begin gathering data on the fit between patients and treatment modalities. While efficacy studies are important, their focus is at the group level. Even if we can show that two treatments are equally effective for a disorder at the group level, nothing tells us that two randomly selected individuals will respond equally well to both treatments. For example, some people have a hard time feeling immersed in a VE (e.g., they have cybersickness symptoms) and thus do not respond to the stimuli used in the exposure therapy. Others do not adhere to the mindfulness philosophy and will not meditate or practice acceptance at home. To better inform professionals about choosing therapeutic approaches, future research should include measures of potential predictors of treatment efficacy. With such information, clinicians could make the best decision for their patient based not only on what science tells them is effective but also on why it works that way for some people and not for others.

To date, research on CBT-VRexp has mostly been about replicating in virtuo what can normally be done in vivo. In doing so, researchers were able to demonstrate that VR is useful for conducting exposure with hard-to-access stimuli in a controlled and secure environment. Thus, their line of scientific inquiry was mostly focused on phobias. As a result, VR might have been less attractive to professionals. The field of VR is now addressing the more complex anxiety disorders to provide solutions for patients who may be more frequent in therapists' caseloads (e.g., OCD, PTSD, GAD). Still, we feel that this is a missed opportunity. VR could be used to improve exposure therapy by pushing its limits far beyond what can be done in vivo, thus allowing stronger associations with lack of threat to be built. For example, VR is not limited to visual and auditory stimuli. Studies have successfully integrated olfactory (Baus and Bouchard, 2014), haptic (Tremblay et al., 2016), and thermal (Shaw et al., 2019) stimuli into VEs. This could potentially lead to a multisensory exposure therapy for PTSD, bringing recollection of the traumatic experience and reprocessing to a new level. VR can also be used to expose patients to situations that are hard or impossible to arrange in vivo. For acrophobia, a therapist using CBT-VRexp could ask his/her patient to dance on the edge of a virtual cliff, test his/her balance, and actually jump at will over the cliff to confront his/her fear of falling. For social phobia, it is possible in VR to ask people on dates or set up social blunders that would be delicate to do with actual people.

Stand-alone MBI, as opposed to full MBT treatment programs, are of special note here as they are the most commonly reported form of mindfulness intervention used by clinicians (Michalak et al., 2020). This is not surprising and probably not limited to MBT, with less than 2% of psychotherapists reporting adopting only one practice orientation (Cook et al., 2010). For our review on studies with anxiety disorders, our comparisons were with studies using some form of MBI. The meta-analysis from Blanck et al. (2018), which was not used in our study because it had fewer RCTs than that of Goldberg et al. (2018), focused on studies where only MBI were used as a stand-alone treatment. Of the 21 studies identified by Blanck et al. (2018), only five were RCTs. None compared the efficacy of MBI as a stand-alone treatment to an EBT and, most importantly, none used clinical samples. No data were available on long-term or adverse effects. Moreover, no study tested the impact of integrating MBI into other validated intervention protocols. This lack of empirical support could be problematic if the intervention in the integrated stand-alone treatment does not fully address the therapeutic goals of a full MBT program and does not include adequate exposure strategies. The question remains: does the professional choose a treatment strategy to avoid discomfort in their patients? Given that the therapist's experiential avoidance is a significant negative predictor of choosing exposure as a treatment option (Scherr et al., 2015), the question deserves an empirical answer.

Among the limits of the current paper, the first that comes to mind is the reliance on meta-analyses. Meta-analyses are imprecise and limited by design. Variations in inclusion criteria, search terms, and search engines can yield important differences in the results. Given the publication process, papers published in 2018 or 2019 have a coverage running up to 2017; thus, important articles could have been left out of this review. The goal of this review was not to be exhaustive but to contrast the state of the literature. It would be quite surprising if the gap in the number of RCTs between MBT and CBT-VRexp had been filled in the last year. Conducting our own systematic search of the literature, including unpublished reports and theses, might have provided more precise numbers, but the ratio of evidence would have remained in the same range.

Another problem of reviewing and comparing meta-analyses is that we had no control over how the results were reported, namely, which parameters were used. For example, we were unable to report on the methodological quality evaluation of the studies, as different indexes were used across meta-analyses or simply not reported. Most biases affecting the CBT-VRexp and MBT literatures are the same as those found in the literature on clinical efficacy: selective reporting, small sample sizes, no intent-to-treat analyses, and no deterioration analyses or adverse effects reporting.

## CONCLUSION

The objective of this paper was to contrast, specifically for the treatment of anxiety disorders, the weight of evidences supporting the use of exposure in VR versus the use of MBT. Overall, the results of the comparisons have shown that CBT with exposure conducted in VR was more thoroughly researched and supported than MBT. Nevertheless, this conclusion is nuanced by reviewing several gaps in the literature for both therapies, and much more research is required to establish which therapies for the treatment of anxiety disorders are suitable, how they should be carried out, and for whom.

## AUTHOR CONTRIBUTIONS

KN, GC, and SB designed the study. KN and GC collected and analyzed the data and drafted the manuscript. SB facilitated the study execution and aided in the interpretation of findings. All the authors critically reviewed the draft and made significant contributions to the final version.

## FUNDING

This work was supported by the Canada Research Chairs (#950-231039) awarded to SB and a FRQSC postdoctoral research grant (#255365) awarded to KN.

## ACKNOWLEDGMENTS

The authors wish to thank the reviewers for their thoughtful comments.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2020.00086/full#supplementary-material





**Conflict of Interest:** SB is the President of, and owns equity in, Cliniques et Développement In Virtuo, a spin-off company from the university that uses virtual reality and distributes virtual environments designed for the treatment of mental disorders. The terms of these arrangements have been reviewed and approved by the Université du Québec en Outaouais in accordance with its conflict of interest policies.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Nolet, Corno and Bouchard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Factors Associated With Virtual Reality Sickness in Head-Mounted Displays: A Systematic Review and Meta-Analysis

Dimitrios Saredakis<sup>1</sup>\*†, Ancret Szpak<sup>1</sup>†, Brandon Birckhead<sup>2</sup>, Hannah A. D. Keage<sup>1</sup>, Albert Rizzo<sup>3</sup> and Tobias Loetscher<sup>1</sup>

<sup>1</sup> Cognitive Ageing and Impairment Neurosciences Laboratory, School of Psychology, Social Work and Social Policy, University of South Australia, Adelaide, SA, Australia, <sup>2</sup> Division of Health Services Research, Department of Medicine, Cedars-Sinai Health System, Los Angeles, CA, United States, <sup>3</sup> Institute for Creative Technologies, University of Southern California, Los Angeles, CA, United States

#### Edited by:

Pietro Cipresso, Italian Auxological Institute (IRCCS), Italy

#### Reviewed by:

Bernhard E. Riecke, Simon Fraser University, Canada; Hai-Ning Liang, Xi'an Jiaotong-Liverpool University, China; Frederic Merienne, ParisTech École Nationale Supérieure d'Arts et Métiers, France

> \*Correspondence: Dimitrios Saredakis dimitrios.saredakis@mymail.unisa.edu.au

†These authors share first authorship

#### Specialty section:

This article was submitted to Health, a section of the journal Frontiers in Human Neuroscience

Received: 18 November 2019 Accepted: 02 March 2020 Published: 31 March 2020

#### Citation:

Saredakis D, Szpak A, Birckhead B, Keage HAD, Rizzo A and Loetscher T (2020) Factors Associated With Virtual Reality Sickness in Head-Mounted Displays: A Systematic Review and Meta-Analysis. Front. Hum. Neurosci. 14:96. doi: 10.3389/fnhum.2020.00096

The use of head-mounted displays (HMD) for virtual reality (VR) application-based purposes including therapy, rehabilitation, and training is increasing. Despite advancements in VR technologies, many users still experience sickness symptoms. VR sickness may be influenced by technological differences within HMDs such as resolution and refresh rate; however, VR content also plays a significant role. The primary objective of this systematic review and meta-analysis was to examine the literature on HMDs that report Simulator Sickness Questionnaire (SSQ) scores to determine the impact of content. User factors associated with VR sickness were also examined. A systematic search was conducted according to PRISMA guidelines. Fifty-five articles met inclusion criteria, representing 3,016 participants (mean age range 19.5–80; 41% female). Findings show gaming content recorded the highest total SSQ mean, 34.26 (95%CI 29.57–38.95). VR sickness profiles were also influenced by visual stimulation, locomotion and exposure times. Older samples (mean age ≥35 years) scored significantly lower total SSQ means than younger samples; however, these findings are based on a small evidence base as a limited number of studies included older users. No sex differences were found. Across all types of content, the pooled total SSQ mean was relatively high, 28.00 (95%CI 24.66–31.35), compared with recommended SSQ cut-off scores. These findings are of relevance for informing future research and the application of VR in different contexts.

Keywords: cybersickness, simulator sickness, head-mounted display, virtual reality, virtual environment

## INTRODUCTION

Despite advancements in virtual reality (VR) technology, many people still report experiencing simulator sickness symptoms from its use (Rebenitsch and Owen, 2016; Gavgani et al., 2017; Duzmanska et al., 2018; Guna et al., 2019). Characterizing and quantifying these symptoms is challenging, as several factors are at play including a diverse range of technologies; the use of inconsistent terminology for sickness from using virtual environments; little consensus on the biological mechanisms of symptoms; the diverse range of VR content; along with user characteristics such as age and sex (Hale and Stanney, 2014). Identifying factors that increase the occurrence of simulator sickness becomes necessary with the increased use of VR for rehabilitation, industry training and gaming/entertainment consumers (Gallagher and Ferrè, 2018; Powell et al., 2018; Wang et al., 2018).

Side effects from virtual environment usage have been referred to by many terms including simulator sickness (Kennedy et al., 1993), cybersickness (LaViola, 2000) and VR sickness (Kim et al., 2018). The term simulator sickness originated from the early use of flight simulators in the military (Kennedy et al., 1993), and is still currently used in research using modern HMD technology (Tyrrell et al., 2018; Ziegler et al., 2018). Cybersickness, originally used to describe side effects from use of virtual environments (McCauley and Sharkey, 1992), has often been mentioned in studies using a variety of technologies including flat screen displays and head-mounted displays (HMD) (Rebenitsch and Owen, 2016). The term VR sickness has typically been used in studies using HMDs (Cobb et al., 1999; Kim et al., 2018). Thus, diverse terminology is often used interchangeably across the virtual environments literature.

The current review focuses on adverse symptoms from HMD use; hence, the term "VR sickness" will refer to the symptoms (and their severity) typically reported in the literature from HMD use. The term "motion sickness" will be used to refer to more general reporting of symptoms from motion environments (e.g., air, land, or sea travel), not specific to HMDs, where symptoms can differ. For example, nausea can be more severe in seasickness, compared with simulator use (Kennedy et al., 2010). Symptomatology of sickness also differs between technologies. Compared with simulators, HMDs have been reported to produce higher symptoms related to nausea, dizziness and blurred vision (Kennedy et al., 2003).

Measures of VR sickness are a fundamental part of establishing prevalence and symptomatology in virtual environments. The Simulator Sickness Questionnaire (SSQ) (Kennedy et al., 1993), originally developed for measuring motion sickness in simulators, is the most commonly used measure of sickness in virtual environments (Rebenitsch and Owen, 2016). Alternate measures, such as the Virtual Reality Symptom Questionnaire, which was specifically developed for HMDs (Ames et al., 2005) or the Virtual Reality Sickness Questionnaire (Kim et al., 2018) have yet to be widely adopted. Single item assessments that are easy to administer and monitor symptoms during VR exposure (Bos et al., 2005; Keshavarz and Hecht, 2011) are commonly used, but do not provide comprehensive measurements of the symptoms of VR sickness. Very few studies report on the use of objective physiological measures (e.g., heart rate, skin conductance, electroencephalograms, eye blink rate, and electrogastrogram) that do not rely on individual self-report data (Kim et al., 2005; Dennison et al., 2016).

Recent advances in HMD technology (field of view, resolution, framerate, and ergonomic factors) have increased the levels of immersion and realism that may have an influence on the occurrence of VR sickness (Nichols, 1999; Lee et al., 2017; Kourtesis et al., 2019). For example, if an image is clear and tracking of movement is accurate, there may be fewer sensory conflicts, and that could lead to a reduction in VR sickness symptoms (White et al., 2015; Shin et al., 2016; Ray et al., 2018). However, an increase in the field of view may also increase risk of VR sickness (Fernandes and Feiner, 2016). Despite the improvements in HMD technology, a recent review suggests that the prevalence of VR sickness is still problematic (Rebenitsch and Owen, 2016). In addition to this, Kourtesis et al. (2019), in their review found that although recent hardware features have been an important factor in reducing VR sickness, software features also need to be taken into consideration.

The VR content delivered to users can induce or even reduce VR sickness. A rollercoaster ride may be more likely to induce VR sickness to the level of severity where users will request to discontinue the experience. For example, almost 67% of participants in a study using a rollercoaster virtual environment were unable to complete an exposure time of 14 min (Nesbitt et al., 2017). In contrast, content consisting of low amounts of motion may be less likely to induce VR sickness (Guna et al., 2019), as well as in cases where head movement in a fixed position is concordant with what the user would experience in the real world (Rizzo and Koenig, 2017).

Length of time exposed to a virtual environment may also influence likelihood and severity of VR sickness (Duzmanska et al., 2018). Significant correlations have been found between exposure time and VR sickness, with longer exposure times increasing risk of VR sickness (Stanney et al., 2003). For example, research measuring VR sickness at multiple time points found symptoms increased at 2-min increments, with the highest VR sickness scores measured in the final trial at 10 min (Moss and Muth, 2011). In contrast, a recent review has found that some people may build up a resistance or adapt over time to VR sickness, particularly over multiple sessions (Duzmanska et al., 2018). Although content and duration are significant contributing factors that may increase the likelihood of sickness symptoms, the user also needs to be taken into consideration.

User characteristics add another layer of complexity in understanding the relationship between hardware, content and VR sickness. Research on sex and age has generated mixed findings when it comes to the likelihood of sickness from VR (Cheung and Hofer, 2002; Benoit et al., 2015; Munafo et al., 2017; Arcioni et al., 2018). In reference to age, physiological differences over the lifespan (i.e., visual, vestibular senses) (Bermúdez Rey et al., 2016) may influence the occurrence of VR sickness and symptom profiles. In reference to sex, hormonal differences in females have been reported to influence, and are likely to be a factor in, increased rates of VR sickness (Clemes and Howarth, 2005). Moreover, females can have a smaller interpupillary distance (Fulvio et al., 2019), and some HMDs may not be able to be adjusted accordingly, therefore creating eye strain and general discomfort. Thus, it is important to increase the understanding of the relationship between these user characteristics and VR sickness.

Previous reviews (Rebenitsch and Owen, 2016; Duzmanska et al., 2018; Kourtesis et al., 2019) have focused on temporal or technological aspects of VR sickness. To date, none of the reviews on VR sickness have systematically evaluated VR content and user characteristics in a meta-analysis. The primary aim of this systematic review is to examine if VR sickness symptoms measured with the SSQ using HMDs are influenced by different factors. More specifically, factors that will be examined in this review are content, the amount of visual stimulation (motion of virtual environment), whether a person is stationary or moving in the virtual environment and time. As the SSQ consists of three grouped factors (nausea, oculomotor, and disorientation), a summary of the most common symptoms using HMDs will be provided. Studies with the intention of inducing or not inducing VR sickness will also be compared. A secondary aim is to examine the influence of user characteristics (i.e., age and sex) on SSQ scores and dropout rates.

## METHODS

## Search Strategy

In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Liberati et al., 2009), a systematic literature search was conducted to identify journal and conference papers related to VR sickness from using HMDs. This review included the following search terms: virtual reality OR virtual environment\* OR VR OR VR headset OR virtual reality headset OR head-mounted display OR HMD OR helmet mounted display AND cybersickness OR motion sickness OR simulator sickness OR visually induced motion sickness OR virtual reality induced motion sickness OR virtual reality induced symptoms and effects OR virtual reality sickness OR visual-vestibular OR nausea OR aftereffect\* OR after effect\* OR VIMS. No limiters were inserted in the database searches.

This search was carried out on 10 October 2018 in six databases: Cochrane Library, IEEE Xplore, Medline, PsycINFO, Scopus, and Web of Science. Terms were mapped to subject headings. Both journal and conference articles were included in this review if: participants used a head-mounted display (HMD); VR sickness was measured using the SSQ; articles were peer-reviewed and complete (i.e., included a full paper, not just an abstract or poster presentation); and the text was in English or had been translated for publication. Papers were excluded if they: used augmented reality (AR) or see-through displays; were reviews, dissertations, abstracts or poster presentations; used prototype HMD devices; or were case studies. Papers that included clinical samples were also excluded; however, if the study included a healthy control group, these data were included. Eligibility of studies was assessed by two independent reviewers (DS and AS).

Papers were included if they supplied mean data for the SSQ (either subscales or total scores). Papers that supplied no mean data were still included in the dropout analysis if they reported dropout rates. If papers supplied mean scores without standard deviations, authors were contacted to supply the standard deviations. Current contact details were searched for online in each case. A follow-up email was sent to authors who did not respond to the initial email. If the authors did not respond to the second email, the paper was excluded. The calculation of subscale and total SSQ scores required weighting. Subscales are weighted as follows: nausea, 9.54; oculomotor, 7.58; and disorientation, 13.92. Total scores are calculated by multiplying the sum of the unweighted subscale scores by 3.74 (Kennedy et al., 1993). This can create some confusion, and there were instances where researchers calculated the scores differently, for example by multiplying the weighted subscale scores by 3.74, thereby producing inflated total scores. There were also instances where the total SSQ scores did not match the subscale scores; the same contact procedure was followed for these papers as for those with missing standard deviations.
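The weighting described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the example raw sums are hypothetical, the total is taken to be the summed unweighted subscale scores times 3.74 as in Kennedy et al. (1993), and the "inflated" variant assumes the miscalculation consisted of summing the already weighted subscales before multiplying by 3.74.

```python
# Subscale and total weights as quoted in the text (Kennedy et al., 1993).
NAUSEA_W = 9.54
OCULOMOTOR_W = 7.58
DISORIENTATION_W = 13.92
TOTAL_W = 3.74


def ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation):
    """Return weighted SSQ subscale scores and the total score.

    Inputs are the unweighted (raw) subscale sums for one participant.
    """
    raw_sum = raw_nausea + raw_oculomotor + raw_disorientation
    return {
        "nausea": raw_nausea * NAUSEA_W,
        "oculomotor": raw_oculomotor * OCULOMOTOR_W,
        "disorientation": raw_disorientation * DISORIENTATION_W,
        # The total uses the raw sums, not the weighted subscale scores.
        "total": raw_sum * TOTAL_W,
    }


def inflated_total(raw_nausea, raw_oculomotor, raw_disorientation):
    """Assumed form of the miscalculation: weighted subscales times 3.74."""
    s = ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation)
    return (s["nausea"] + s["oculomotor"] + s["disorientation"]) * TOTAL_W


if __name__ == "__main__":
    # Hypothetical raw subscale sums for a single participant.
    print(ssq_scores(2, 3, 1))
    print(inflated_total(2, 3, 1))
```

Comparing the two totals for the same raw sums shows how far the inflated value can drift from the intended one, which is why mismatched subscale and total scores were followed up with authors.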

**Figure 1** shows the results of the electronic search and article selection as per PRISMA guidelines (Liberati et al., 2009).

## Statistical Approach

Comprehensive Meta-Analysis (CMA) Version 3 (Borenstein et al., 2013) was used to conduct meta-analyses. A random effects model was used to calculate pooled effect estimates with 95% confidence intervals. In studies reporting multiple experiments within groups, these means were merged in CMA to produce one mean per study. In studies reporting multiple experiments between groups, these means were calculated separately for each experiment. Pooled means were calculated for all factors separately on each subscale of the SSQ. Pooled means were also calculated for all factors separately for the total SSQ score. Differences between sub-factors within each factor were assessed using the Q-test based on analysis of variance (Borenstein et al., 2011). The Q-value for the between group analyses corresponded to the weighted sum of squared deviations of the subgroup means about the grand mean. P-values were obtained by comparing the Q-values with a chi-squared distribution with degrees of freedom equal to the number of subgroups minus one (Borenstein et al., 2013). A p-value lower than 0.05 was assumed to indicate a significant statistical difference of SSQ scores between the subfactors. A correlation was performed between the percentage of females in studies and total SSQ scores as breakdowns for sex of means for the SSQ scores were not supplied in most studies.
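The between-subgroup Q-test described above can be illustrated with a short sketch. It is not the CMA implementation. It assumes inverse-variance weights for each subgroup's pooled mean, and it computes the chi-squared tail probability with a simple series expansion so that it needs only the standard library. The names `q_between` and `chi2_sf` and the example numbers are hypothetical.

```python
import math


def q_between(means, variances):
    """Weighted sum of squared deviations of subgroup means about the grand mean."""
    weights = [1.0 / v for v in variances]  # assumed inverse-variance weights
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    q = sum(w * (m - grand) ** 2 for w, m in zip(weights, means))
    return q, len(means) - 1  # Q-value and degrees of freedom


def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical pooled subgroup means and their variances (squared standard errors).
q, df = q_between([34.0, 28.0, 20.0], [4.0, 6.0, 5.0])
p = chi2_sf(q, df)
significant = p < 0.05  # threshold used in this review
```

The series in `chi2_sf` is adequate for moderate Q-values; for very large Q-values a dedicated statistics library would be the safer choice.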

## Operationalisation of Factors Being Examined

All factors were operationalised and independently reviewed by DS and AS. Any disagreements were resolved by discussion.

## Content

Four types of content were categorized in the studies included for analysis: 360 videos, gaming content, minimalist content, and scenic content. User interaction and environmental features differed for each category. The 360 videos included content captured with a 360 camera or video that allowed a 360 view of the virtual environment. Gaming included highly detailed content where the user could actively interact and perform tasks in the virtual environment, including off-the-shelf games and content developed by researchers. Minimalist content consisted of basic shapes or minimal textures, typically with simple interactions. Scenic content included detailed environments (for example, a landscape or cityscape) with no or only simple interaction by the user. See **Figure 2** for a summary of content characteristics.

## Visual Stimulation

All studies were categorized based on the amount of visual movement within the content regardless of user-directed movement, such as locomotion and head movement. Low visual stimulation included content with slow visual changes, while high visual stimulation included content with fast visual changes.

## Locomotion

Locomotion refers to how a user navigates in the virtual environment. For the analysis in this review, locomotion was classified as either stationary, controller-based movement, or physically walking. With stationary content, the user does not move in the virtual environment. Two moving categories were included: controller and walking. Controller-based movement included any user-controlled navigation method: flying, controller-based walking, teleporting, and driving. Walking included the following physical movements: walking, walking in place, and walking on a treadmill. Two moving categories were used because physically walking has been found to reduce the incidence of VR sickness compared to controller-based navigation (Chance et al., 1998).

## Time

Sickness in virtual environments has been found to increase after 10 min in HMDs and simulator studies (Min et al., 2004; Moss and Muth, 2011). Thus, time was categorized into three intervals of 10 min: <10 min, ≥10 min, or ≥20 min.

## VR Sickness Condition

Studies that explicitly set out to increase or decrease the occurrence of VR sickness, or that measured VR sickness as a secondary aim, were categorized into two conditions: induce and not induce.

## User Characteristics

The user characteristic of age was categorized into a mean age of <35 years old and ≥35 years old. This cut-off was chosen to correspond with theories of both sensory conflict and postural instability. For example, the vestibular function involved in the sensory conflict theory starts to decline around the age of 40 (Bermúdez Rey et al., 2016). With relevance to the postural instability theory, changes in postural balance have been reported to commence at the ages of 30–39 (Era et al., 2006).

Mean breakdowns by sex were not supplied in most SSQ studies. Therefore, a correlational analysis was performed between the proportion of female participants in each study and the total SSQ mean scores. Given the lack of available data, this approach can only give an approximation; a positive correlation in this analysis would indicate higher susceptibility to VR sickness in females.
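As a rough illustration of this study-level analysis, the sketch below computes a correlation coefficient between the percentage of female participants and the total SSQ mean across studies. It assumes a Pearson coefficient, which the text does not specify. The function name `pearson_r` and the data are hypothetical and do not reproduce the values analysed in this review.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical study-level data: one entry per study.
pct_female = [30.0, 45.0, 50.0, 62.0, 70.0]
ssq_total = [31.2, 25.4, 29.8, 22.1, 27.5]

r = pearson_r(pct_female, ssq_total)
```

Because each data point is a whole study rather than a participant, a coefficient obtained this way describes an association between study composition and mean scores, not a sex difference measured within individuals.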

## Dropouts

Dropouts in this review refer to participants that exited an experiment due to VR sickness.

## RESULTS

A total of 2,654 publications were identified through the search. A snowballing strategy was used to identify an additional 15 articles for inclusion. These publications were imported into EndNote where 1,045 duplicates were removed. The remaining 1,609 articles were sent to Covidence systematic review management software (Covidence, 2019) for title and abstract screening, which identified 292 articles for full-text screening. A further 237 articles were excluded as outlined in **Figure 1**. Authors were contacted for 15 papers as per the procedure described in the methods section if mean scores were supplied without standard deviations (10), or if scores did not appear to be weighted correctly (5). A total of 54% of authors replied with 20% supplying raw data to enable calculation of SSQ scores. Hence, 55 publications were identified through the systematic review process and listed in **Table 1**.

## Dropouts

The mean dropout rate reported across 46 experiments due to VR sickness was 15.6%. If studies did not report dropouts, they were not included in this analysis as it was unknown whether there were no instances of dropouts or whether they were just not reported.

#### TABLE 1 | Summary of included articles.





F, Female; M, Male; HMD, Head-mounted display; DoF, Depth of field; VR, Virtual reality.

#### Saredakis et al. Factors Associated With VR Sickness



SSQ, Simulator Sickness Questionnaire; VR, Virtual reality.

### Description of Studies

Out of the 55 papers included in this review, 20 papers reported both subscale scores and total SSQ scores, 7 papers reported subscale SSQ scores only, and 16 papers reported total SSQ scores only. Twenty papers that reported SSQ scores also reported dropout rates. A further 12 papers that used the SSQ but only reported dropout rates were also included. The total number of experiments from these papers included 54 that reported the total SSQ scores and 38 that reported the subscale SSQ scores (these numbers include between-group studies from the same paper). A total of 3,016 participants were included across all experiments. Heterogeneity was consistently high for all analyses (I<sup>2</sup> > 90%).

Studies came from: Australia (n = 3), Canada (n = 1), Colombia (n = 1), Cyprus (n = 1), Finland (n = 1), Germany (n = 11), Greece (n = 1), Japan (n = 1), Korea (n = 4), Netherlands (n = 3), New Zealand (n = 1), Portugal (n = 1), Slovenia (n = 1), Spain (n = 1), United Kingdom (n = 2), United States of America (n = 22).

The pooled mean age of participants was 24 years (across the 45 studies that reported mean age), with the youngest sample having a mean age of 19.5 years and the oldest having a mean age of 80 years. Fifty-one studies included both female and male participants, 4 studies did not report sex distributions, and 41% of participants were female. The bivariate correlation between total SSQ scores and the percentage of females in studies was not significant (r = −0.172, p = 0.170).

See **Table 2** for a summary of results showing factors associated with both total and subscale SSQ scores.

## DISCUSSION

The aim of the review was to synthesize the literature on VR sickness symptoms in HMDs as measured with the SSQ. The primary aim was to examine whether VR sickness symptoms are influenced by content (four categories), the amount of visual stimulation, how a person moves in the virtual environment, and exposure times. A secondary aim was to examine the influence of user characteristics (i.e., age and sex).

TABLE 3 | Simulator sickness questionnaire total scores.


VS, Visual stimulation; VR, Virtual reality.

## SSQ Scores Interpretation

In this review, total SSQ mean scores ranged from 14.30 to 35.27. Pooled total SSQ scores were relatively high across all studies and content types (M = 28.00), with high levels of heterogeneity. Historically, the SSQ was intended for military personnel using simulators; however, the applications and interpretation of the scores have changed with increased use of VR and advancements in technology. When interpreting total SSQ scores, according to Kennedy et al. (2003), scores between 10 and 15 indicate significant symptoms, scores between 15 and 20 are a concern, and scores over 20 indicate a problem simulator. These cut-off scores were established from military personnel using flight simulators and may differ in the general population. Additionally, SSQ scores tend to be higher in other virtual environments than in flight simulators (Stanney and Kennedy, 1997; Kennedy et al., 2003). According to the Kennedy et al. (2003) categories, even the lowest total SSQ mean score in this review (14.30, found in studies including older adults) would be regarded as indicating significant symptoms. All remaining classifications displayed higher means, with the highest total SSQ score found in studies that set out to induce motion sickness.
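The cut-off categories quoted above can be expressed as a small lookup function. This is only a sketch. The text does not say how scores falling exactly on a boundary (15 or 20) are treated, so the handling below is an assumption, and scores under 10 are simply labelled as falling below the quoted cut-offs. The function name `interpret_total_ssq` is hypothetical.

```python
def interpret_total_ssq(score):
    """Map a total SSQ score to the categories quoted from Kennedy et al. (2003)."""
    if score > 20:
        return "problem simulator"
    if score >= 15:  # assumption: a score of exactly 15 or 20 counts as "concern"
        return "concern"
    if score >= 10:
        return "significant symptoms"
    return "below the cut-offs quoted in the text"


# Total SSQ means reported in this review.
lowest = interpret_total_ssq(14.30)   # older-adult studies
pooled = interpret_total_ssq(28.00)   # pooled across all studies
highest = interpret_total_ssq(35.27)  # highest mean in the reported range
```

As noted in the text, these categories come from military flight-simulator data, so the labels are a reference point rather than a validated classification for HMD users.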

## VR Sickness Symptom Profiles

Across all studies, this review found the highest pooled SSQ subscale scores for disorientation (M = 23.50), followed by oculomotor (M = 17.09) and nausea (M = 16.72). This subscale distribution differs from the symptom profile of motion sickness, where nausea typically has the highest rating, followed by oculomotor and disorientation (Rebenitsch and Owen, 2016). These findings increase awareness of symptoms that may be more likely to develop when using HMDs (e.g., dizziness, blurred vision and difficulty focusing). However, the weighting of these subscales makes it unclear to what degree these symptoms differ.

## VR Content

The content characteristics in **Figure 2** highlight the distinguishing features of the four content types that may account for the distribution of SSQ scores in this review. SSQ scores were significantly influenced by content type, with gaming content displaying the highest total SSQ mean (M = 34.26). This effect was also seen for subscale SSQ scores, with nausea, oculomotor and disorientation all highest for gaming content compared to other types of content (see **Table 4**). Consistent with these results, previous studies using gaming content reported the highest dropout rates, ranging from 44 to 100% (Merhi et al., 2007; Dennison et al., 2016; Munafo et al., 2017). The second highest total SSQ means were found in studies using 360 videos. This was followed by minimalist content, with scenic content producing the lowest total SSQ mean. The total SSQ means did not always correspond with dropout rates; for example, higher dropout rates were found for scenic content than for 360 videos. This discrepancy highlights the variability in how users tolerate HMD use, which could be due to other factors. Exposure time, user characteristics and the amount of visual stimulation may all have contributed to the high heterogeneity found in this review. Thus, a limitation of this meta-analysis, and of meta-analyses in general, is that methodological differences between studies are collapsed when pooling results.

## Influence of Visual Stimulation on Sickness

Content varies not only by type but also by the amount of visual stimulation offered. For example, all four types of content examined in this review may provide varying degrees of visual movement to the user. Oculomotor subscale SSQ mean scores were significantly higher for high visual stimulation compared with low visual stimulation. Symptoms in the oculomotor subscale include eyestrain, difficulty focusing, difficulty concentrating and blurred vision. Despite recent improvements in display technology, stereoscopic HMDs may produce more side effects due to the vergence-accommodation conflict. Vergence refers to the rotation of the eyes toward or away from each other to fixate objects at different distances, and it is normally coupled with the process of focusing (accommodation). These visual processes do not occur naturally in a stereoscopic display, as accommodation occurs at a fixed screen depth (Terzić and Hansard, 2016). This conflict may be a reason for the higher SSQ means for high visual stimulation in the oculomotor SSQ subscale. When there is a high level of visual stimulation, there are more changes in the stimulus distance compared to content with low visual stimulation. The level of visual stimulation is meaningful, as research examining rapid vs. slow changes in stimulus distance found that rapid changes increase visual discomfort (Kim et al., 2014). In a virtual environment, a conflict may be created due to the differences between what a person sees and what their body experiences. With the emergence of new VR technologies, high-quality stereoscopic HMDs are now capable of simulating the visual and spatial properties of the real world. Despite improvements, current technology still falls short of replicating how humans see and perceive depth under natural viewing conditions (Howarth and Costello, 1997).
There are software solutions that can help to reduce discomfort by introducing blurring during motion (Budhiraja et al., 2017); however, this technique may not be effective for everyone. The shortcomings of current HMDs can produce unnatural visual conflicts, which have been shown to play a role in VR sickness (Carnegie and Rhee, 2015), especially when they are combined with visually stimulating VR environments (Kim et al., 2014).

## Locomotion Type in Virtual Environment

SSQ scores were significantly influenced by locomotion type, with controller-based movement displaying the highest total SSQ mean (M = 32.55). Nausea and oculomotor subscale SSQ score means were also significantly influenced by locomotion type, with higher scores when stationary than with either controller-based movement or walking (see **Table 4**); high heterogeneity between studies has contributed to these differences. Several other factors can account for the differences between total and subscale SSQ scores for controller-based and stationary content. These include differences in the number of studies, with seven stationary and five walking studies that reported subscale SSQ data, compared with 12 studies that reported total SSQ data for these locomotion categories. Additionally, relatively high total SSQ scores were reported for controller-based studies (Merhi et al., 2007; Budhiraja et al., 2017; Ragan et al., 2017) that did not report any subscale scores. Finally, the differences between SSQ totals and subscales may arise because certain methods of locomotion have a greater impact on specific symptoms, an effect that would appear in the subscale scores but not in the total SSQ scores. For example, being stationary in the real world may induce a greater conflict in a virtual environment where there is movement and hence may increase nausea symptoms. This is consistent with research that indicates a reduction in symptoms when user-initiated movement is matched to the environment (Lee et al., 2017; Misha et al., 2018). These findings also support the sensory conflict theory relating to a visual-vestibular conflict (Reason and Brand, 1975). Thus, the visual-vestibular conflict may be exacerbated by the type of content (moving vs. static) being viewed combined with the locomotion method.
A reduction in visual-vestibular conflict may be the reason that the lowest total and subscale SSQ scores for locomotion were consistently reported in studies that included physical walking. More research is needed to increase the understanding of how the type of locomotion can influence specific symptoms of VR sickness.

## VR Exposure Time on VR Sickness

Both nausea and disorientation SSQ subscale scores were lower in studies with exposure times of <10 min than in those with exposure times of ≥10 min. Interestingly, scores were lower for studies with exposure times of ≥20 min than for those with exposure times of ≥10 min (see **Figure 3**). This contradicts a recent review suggesting that longer exposure times are more likely to increase VR sickness (Duzmanska et al., 2018). Content may have been a factor contributing to this pattern of results within each of the time categories. In examining the distribution of content among the time breakdowns, the ≥10 min studies had the highest percentage of gaming content (62%) compared to studies with the shortest (<10 min) and longest (≥20 min) exposure times. In addition, 50% of studies with the longest exposure times (≥20 min) consisted of minimalist or scenic content. More research is needed to determine the relationship between content and exposure time. Within-subject designs with different exposure times and controlled content may help to answer questions about safe exposure times. This information is important when planning clinical trials, both to avoid VR sickness and dropouts and to establish safe use procedures.

## Age and VR Sickness

Four studies that reported total SSQ scores included older samples (studies with a mean age ≥35 years; n = 64). Not only did these studies report lower total SSQ scores for older samples (M = 14.30) compared to younger samples (M = 28.44), they also reported the lowest SSQ scores when compared with all other examined factors (see **Table 3**). Two of the four studies with older samples also included subscale SSQ scores, with 37 participants in total. The disorientation subscale recorded significantly lower SSQ scores for the older samples compared with the younger samples. While scores for the nausea and oculomotor subscales were higher for the older adult samples compared with younger samples, the differences were not statistically significant. Previous research has reported inconsistent findings for older samples (Kennedy et al., 2010; Benoit et al., 2015). Even though age has been reported as a user characteristic likely to predict motion sickness (Golding, 2006), the results from this review support previous research indicating that there may be a decline in susceptibility to VR sickness as a person ages (Paillard et al., 2013). However, as there are a limited number of studies including older samples, these results should be interpreted with caution. Additionally, three of the studies used scenic content and one study used gaming content. What also needs to be considered is that the VR content for the studies including older adults may be assessing specific symptoms, and the virtual environments may be designed to reduce the likelihood of side effects. For example, two of the studies (Parijat and Lockhart, 2011; Kim et al., 2017) involved walking on a treadmill to assess gait or balance and used the content type with the lowest total SSQ mean scores in this review (scenic content). It is also possible that older adults experience symptoms that differ from those of younger adults, as indicated by the lower disorientation subscale SSQ scores found in the older samples (symptoms related to dizziness, vertigo, blurred vision, nausea and difficulty focusing). With many companies offering VR services to aged care facilities (Aged Care Virtual Reality, 2018; Reminiscience, 2018; Rendever, 2018), use by older adults will continue to increase. Moreover, VR delivered in HMDs is being widely used for rehabilitation, assessment and even prediction of cognitive impairments in older adults (Optale et al., 2010; Corriveau Lecavalier et al., 2018; Howett et al., 2019). Therefore, more research is needed to evaluate the safety of using HMD-delivered VR with older adults who have cognitive decline or other age-related health conditions.

## Sex and VR Sickness

Sex differences were analysed with a correlation between the percentage of females in studies and total SSQ scores. Sex breakdowns were not supplied in studies when reporting total SSQ scores; this was therefore the only way that sex could be analyzed, which is a limitation of this analysis. The results indicated no significant association. This is not consistent with research indicating that females are at higher risk of VR sickness (Lawson et al., 2004). Whether studies find that females are more susceptible than males to VR sickness depends on which study is examined, with many confounding variables not taken into account (Lawson, 2015). More research is needed to better understand the incidence of VR sickness based on sex differences. Age and sex have been stated as being the most common user characteristics likely to predict motion sickness (Golding, 2006), highlighting a need for further research. Other user characteristics, including ethnicity, motion sickness susceptibility, fitness, and prior experience of VR, may provide deeper insight into how symptoms vary between users and assist in developing a more targeted approach to dealing with VR sickness.

## Strengths and Limitations

This is the first study to pool estimates of VR sickness symptoms measured with the SSQ using HMDs, with a pooled sample size of 3,016; however, the study is not without limitations. Although the most commonly used measure of VR sickness (the SSQ) was used, many studies (112) were excluded because they did not use the SSQ. As the SSQ is a self-report measure, participants may under- or over-report symptoms. Physiological measures can assist with overcoming this limitation; however, a consensus is yet to be reached on the best physiological response for assessing VR sickness (Duzmanska et al., 2018). The scoring system for the SSQ can create some confusion, and this was seen in this review, with some authors incorrectly calculating total scores. Another limitation of the SSQ is the relevance of its symptoms for HMD use. For example, the Virtual Reality Symptom Questionnaire (Ames et al., 2005) increased the focus on oculomotor symptoms, while Kim et al. (2018) removed the symptom of nausea from the Virtual Reality Sickness Questionnaire because it did not contribute to motion sickness compared with other symptoms. Both of these studies were HMD specific. For a more detailed discussion of alternative measures, see Hale and Stanney (2014).

Additionally, all analyses had high heterogeneity, demonstrating large variation across the included studies. As well as individual differences of age and sex, susceptibility to VR sickness can also vary between individuals and therefore influence results. Gaming or VR experience is another individual difference that can influence the likelihood of side effects and needs to be both reported and taken into account during analysis of results. The small number of studies including older adults and the lack of reporting of sex differences and dropouts are also limitations, and are areas requiring further research or improved reporting in future VR studies using HMDs. As 22 studies did not report dropout rates, the rate of 15.6% may be inflated if many of these studies did not have dropouts; however, we cannot assume there were no dropouts if they were not reported. This highlights the need to make reporting of dropout rates a standard in VR research.

Finally, another limitation involves the varied nature of the HMDs used across these studies. HMDs can differ in terms of field of view, use of stereo, resolution, framerate, availability of inter-pupillary distance controls/adjustment, and other technical display factors. Modern HMDs from the last 5 years differ fundamentally from the more limited display technology that was available before these recent advances (Kourtesis et al., 2019). Since 35% of papers included in this analysis used these older HMDs, it is difficult to know how well those findings predict the occurrence of symptoms with currently available HMDs. Moving forward, there is an obvious need for more controlled laboratory research with standard reference VR environments that are adjustable in terms of content, movement, user interaction, etc. With such specifically created environments, one would be able to test the incidence of side effects across different display types with varied hardware capabilities. This will be essential for promoting parametric research that creates a database of known properties for different types of virtual environments delivered across varied hardware types. Such a database would provide the baseline normative data needed to enable better research on how to mitigate or eliminate these use-limiting side effects.

## Conclusion

Previous research has focused on the influence of technological aspects on VR sickness. This review advances this knowledge by examining content as a major contributing factor to VR sickness, which will remain a problem despite future technological advances. Our findings show that content significantly influences VR sickness symptoms. Recent HMD technology can provide a better experience (Kourtesis et al., 2019), and if this is combined with careful selection of content, the risk of VR sickness can be reduced and those symptoms that do occur can be easily managed. In this review, we compared our total SSQ scores with the cut-off scores suggested by Kennedy et al. (2003); however, what these scores mean in relation to HMDs and how they relate to the general population remains unclear. Nevertheless, comparing total scores between studies shows that content is a major contributing factor. This review also highlights the need for a further understanding of the influence of user characteristics such as age and sex, as there is a lack of studies including older samples and sex differences are often not reported. Increasing our understanding of VR sickness could be particularly valuable to researchers and practitioners, as there may be ethical and liability implications in research, training and clinical applications.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## AUTHOR CONTRIBUTIONS

DS, AS, and TL conceived the work and analyzed and interpreted the results. DS and AS performed article selection and screening. DS wrote the manuscript. All authors revised the work critically for important intellectual content and have read and approved the manuscript. AS and DS created **Figure 2**; certain 3D models for this figure were sourced from cadnav.com and modified.

## FUNDING

DS was supported by the Australian Government Research Training Program Scholarship. TL and HK were funded by National Health and Medical Research Council (NHMRC) Dementia Research Leadership Fellowships (GNT1136269 & GNT1135676 respectively).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Saredakis, Szpak, Birckhead, Keage, Rizzo and Loetscher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Application of Supervised Machine Learning for Behavioral Biomarkers of Autism Spectrum Disorder Based on Electrodermal Activity and Virtual Reality

Mariano Alcañiz Raya<sup>1</sup> \*, Irene Alice Chicchi Giglioli<sup>1</sup> , Javier Marín-Morales<sup>1</sup> , Juan L. Higuera-Trujillo<sup>1</sup> , Elena Olmos<sup>1</sup> , Maria E. Minissi<sup>1</sup> , Gonzalo Teruel Garcia<sup>1</sup> , Marian Sirera<sup>2</sup> and Luis Abad<sup>2</sup>

<sup>1</sup> Instituto de Investigación e Innovación en Bioingeniería, Universitat Politécnica de Valencia, Valencia, Spain, <sup>2</sup> Red Cenit, Centros de Desarrollo Cognitivo, Valencia, Spain

## Edited by:

Valerio Rizzo, University of Palermo, Italy

#### Reviewed by:

Dayi Bian, Vanderbilt University, United States

Máté Aller, University of Cambridge, United Kingdom

> \*Correspondence: Mariano Alcañiz Raya malcaniz@i3b.upv.es

#### Specialty section:

This article was submitted to Sensory Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 15 November 2019 Accepted: 27 February 2020 Published: 03 April 2020

#### Citation:

Alcañiz Raya M, Chicchi Giglioli IA, Marín-Morales J, Higuera-Trujillo JL, Olmos E, Minissi ME, Teruel Garcia G, Sirera M and Abad L (2020) Application of Supervised Machine Learning for Behavioral Biomarkers of Autism Spectrum Disorder Based on Electrodermal Activity and Virtual Reality. Front. Hum. Neurosci. 14:90. doi: 10.3389/fnhum.2020.00090

Objective: Sensory processing is the ability to capture, elaborate, and integrate information through the five senses and is impaired in over 90% of children with autism spectrum disorder (ASD). The ASD population shows hyper- and hypo-sensitivity to sensory stimuli, which can alter information processing and affect cognitive and social responses to daily life situations. Structured and semi-structured interviews are generally used for ASD assessment, and the evaluation relies on the examiner's subjectivity and expertise, which can lead to misleading outcomes. Recently, there has been a growing need for more objective, reliable, and valid diagnostic measures, such as biomarkers, to distinguish typical from atypical functioning and to reliably track the progression of the illness, helping to diagnose ASD. Implicit measures and ecologically valid settings have shown high accuracy in predicting outcomes and correctly classifying populations into categories.

Methods: Two experiments investigated whether sensory processing can discriminate between ASD and typical development (TD) populations using electrodermal activity (EDA) in two multimodal virtual environments (VE): forest VE and city VE. In the first experiment, 24 children with an ASD diagnosis and 30 TD children participated in both virtual experiences, and changes in EDA were recorded before and during the presentation of visual, auditive, and olfactive stimuli. In the second experiment, 40 children were added to test the model of experiment 1.

Results: The first exploratory results on EDA comparison models showed that the integration of visual, auditive, and olfactive stimuli in the forest environment provided higher accuracy (90.3%) in discriminating sensory dysfunction than specific stimuli. In the second experiment, 92 subjects experienced the forest VE, and results on 72 subjects showed that stimuli integration achieved an accuracy of 83.33%. The final confirmatory test set (n = 20) achieved 85% accuracy, simulating a real application of the models. A further relevant result concerns the visual stimuli condition in the first experiment, which achieved 84.6% accuracy in recognizing ASD sensory dysfunction.

Conclusion: According to the results of our studies, implicit measures, such as EDA, and ecologically valid settings can represent valid quantitative methods, along with traditional assessment measures, to classify the ASD population, enhancing knowledge on the development of relevant specific treatments.

Keywords: autism spectrum disorder, sensory dysfunction, virtual reality, electrodermal activity, assessment

## INTRODUCTION

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by a wide range of impairments, ranging from social to physical and cognitive functions (Baron-Cohen, 1990), affecting one in 160 children (World Health Organization [WHO], 2019). ASD symptoms arise as early as 2 to 4 years of age, and in some cases, the signs of ASD might start as early as 6 months of age (Lord et al., 2006; Anagnostou et al., 2014). Specifically, ASD is associated with social and interaction symptoms as well as stereotyped and repetitive behavior patterns (American Psychiatric Association, 2013) that have a significant impact on educational (Levy and Perry, 2011) and social life (Schmidt et al., 2015). Furthermore, sensory processing dysfunctions have been observed as a relevant aspect of ASD symptomatology; indeed, they are experienced by over 90% of ASD children (Leekam et al., 2007; Tomchek and Dunn, 2007; Baron-Cohen et al., 2009). Sensory processing is the ability to capture, elaborate, and integrate information through the senses (touch, movement, smell, taste, vision, and hearing), allowing behavioral responses to be adapted to the environment (Miller et al., 2007). In the ASD population, such sensory processing and integration of stimuli are experienced differently from the typical development (TD) population, affecting the response to stimuli. In more detail, individuals with ASD show hyper-sensitivities (over-responsiveness) and hypo-sensitivities (under-responsiveness) to a wide range of sensory stimuli. Previous studies on sensory dysfunctions showed a hypersensitivity to visual and auditive stimuli, such as bright lights or noisy sounds (Tomchek and Dunn, 2007; Baron-Cohen et al., 2009; Tomchek et al., 2014); conversely, with olfactive stimuli, they present hypo-sensitivity in detecting odor thresholds (Dudova et al., 2011; Ashwin et al., 2014). Sensory dysfunction consequently affects the information processing in ASD, and it has been suggested that it may be the cause of impairments in several psychological domains, such as in cognitive and social responses (Tomchek and Dunn, 2007; Baron-Cohen et al., 2009).

## Current Issues in ASD Diagnosis and the Need for Biomarkers in ASD

Traditionally, ASD diagnosis and assessment include a series of explicit qualitative and quantitative measures characterized by observations of semi-structured behavioral tasks in which the examiner rates and scores an individual's responses to prompted situations (e.g., the Autism Diagnostic Observation Schedule, ADOS; Lord et al., 1999) and structured family interviews (e.g., the Autism Diagnostic Interview-Revised, ADI-R; Lord et al., 1994). For example, the ADOS measure consists of various standardized activities introduced by the examiner, such as a simulation of having a snack together, that make it possible to observe the occurrence or non-occurrence of behaviors related to ASD. ADOS principally focuses on social behavior and communication analysis, and it is characterized by five different modules that allow tailoring of the assessment to the age and communication development of the participants. Regarding sensory processing, the main test for its evaluation is Sensory Profile-2 (Dunn, 2014), a qualitative questionnaire in which family caregivers answer several questions about activities at home, in school, and in the community (see the section "Materials and Methods" for test description).

Despite these instruments having been widely adopted in ASD research and clinical practice, several limitations remain (Volkmar et al., 2009), mainly regarding the absence of explicit sensory functioning assessment, the subjective evaluation and the examiner's expertise, and the ecological validity of the assessment setting.

Concerning the first limitation, traditional assessments have been designed following both ASD ICD-10 and DSM IV guidelines that do not consider sensory dysfunction as a necessary and distinct diagnostic criterion. Thus, ADI-R and ADOS do not tap sensory processing and responsiveness (Leekam et al., 2007). Second, training in administration and scoring is crucial and highly recommended (Lord et al., 2001) since test results and diagnosis rely on the examiner's subjective ability to detect ASD-related features. Examiners who do not have a high level of ASD-specific training and expertise might present and administer tasks inappropriately. This could influence the rating and the scoring, contributing to over- or under-interpretation of the outcomes and prompting a misleading assessment (Reaven et al., 2008). Another limitation that can make ADOS outcomes unreliable and affect the truthfulness of responses is the social desirability bias (Paulhus, 1991). Social desirability is a response bias in which individuals attempt to answer tasks or questions in a manner that will be viewed as favorable by others (Edwards, 1957). First, part of the ASD assessment consists of family caregivers reporting information about the child, and they can interpret specific behaviors differently according to their personal perspective and experience (Möricke et al., 2016). Second, in ASD assessment, children may have been taught to act according to specific settings (e.g., laboratory settings) (Francis, 2005), and it might be that, if the same situation happened in the real world, examiners would obtain different responses (Gillberg and Rasmussen, 1994). Finally, although diagnostic structured interviews are considered the gold standard in ASD assessment (Goldstein et al., 2009), they usually take place in the laboratory rather than in ecologically valid settings. Ecologically valid settings are environments and situations similar to real ones, able to elicit everyday experiences and behaviors related to daily functioning (Franzen and Wilhelm, 1996; Chaytor et al., 2006). The more ecologically valid the assessment measure is, the more its results can be generalized to the real world (Brunswik, 1955; Chaytor et al., 2006). Indeed, recent studies showed that traditional assessment results did not reflect performance in real-life situations and vice versa (Parsons S., 2016).

Given these limitations, the existing ASD diagnosis criteria (DSM, ICD, ADOS, and ADI-R) do not consider quantitative variations in symptom severity in each person's measurements and do not take into account the biological bases of the disorder. Recently, there has been a growing need for more reliable and valid diagnostic measures, such as biomarkers, to distinguish typical and atypical functioning and to reliably track the progression of the illness, thus helping to diagnose ASD (**Figure 1**). In order to generate valid quantitative models between explicit symptoms and implicit biomarkers, the emerging field of Computational Psychiatry (CP) is seeking, first, to mathematically model brain responses to the problems it faces and, second, to study how the "abnormal" experiences, emotions, and behaviors that are commonly used to describe disorders contribute to normal function and neural processes (Montague et al., 2012; Friston et al., 2014; Wang and Krystal, 2014; Redish and Gordon, 2016).

## Implicit Processes as Pillars for ASD Biomarkers

Currently, the EU AIMS Longitudinal European Autism Project is one of the largest multicenter, multidisciplinary studies to identify the stratification biomarkers for ASD and the biomarkers that may serve as surrogate endpoints (Murphy and Spooren, 2012). In that project, all participants are comprehensively characterized in terms of their brain structure and function [assessed using structural magnetic resonance imaging (sMRI), functional MRI (fMRI), and electroencephalogram (EEG)], biochemical biomarkers, prenatal environmental risk factors, and genomics. Nonetheless, when experiencing social situations, it is equally important to study the related behavioral outputs. Up to now, most of the information contained in the behavioral inputs does not seem to have been noticed. Studying how people process, store, and apply information about other people and social circumstances can provide objective information for ASD evaluation.

Recent progress in social cognitive neuroscience (SCN), a field of study including biological processes and cognition-based aspects (Lieberman, 2010), is challenging the majority of social cognition models, which suggest that humans can analyze and correctly verbalize their beliefs, feelings, and behaviors (Nosek et al., 2011), by showing that our social interactions are mostly governed by unconscious processes that happen without conscious awareness or control (Forscher et al., 2019). To study these unconscious processes, several implicit measures, including brain imaging, behavioral, and psychophysiological tracking, have been developed as alternative research methods to explicit measures since they are able to capture implicit brain processes (Ledoux et al., 2016).

In the ASD population, implicit measures can contribute, along with traditional techniques, to obtaining a more objective assessment from a quantitative point of view (Alcañiz et al., 2019). The various techniques used are based on measurements linked to an underlying implicit system. The adoption of implicit SCN metrics as biomarker input variables for ASD evaluation suggests a move toward a quantitative ASD diagnosis. Some previous studies proposed the use of brain activity (fMRI and EEG), physiological measures (heart variability, HR), and behavioral responses (eye tracking measures, ET; body movement recognition), with the goal of capturing the ASD patient's behavioral structure while being subjected to a stimulus (Di Martino et al., 2014; Van Hecke et al., 2015; Chita-Tegmark, 2016; Wang et al., 2016; Großekathöfer et al., 2017). For example, fMRI studies showed that ASD patients present general brain hyperactivity and alterations in the middle and the posterior insula and in the posterior cingulate cortex (Di Martino et al., 2014). EEG studies in ASD showed greater activity in the left hemisphere in social situations (Van Hecke et al., 2015). Gaze activity measured by eye tracking tools has been analyzed as a behavioral test in ASD, linking gaze patterns to the existence of core deficits. Many studies have succeeded in linking this implicit measure with the core deficits and with the degree of social, emotional, and cognitive skill development. Even in circumstances of social participation, predictors of ASD were found based on ocular actions and facial processing (Chita-Tegmark, 2016).

## Electrodermal Activity in ASD

To date, electrodermal activity (EDA, Nikula, 1991), a marker of sympathetic nervous system arousal, is one of the main implicit measures examined in ASD (White et al., 2014; Fenning et al., 2017). Specifically, it is an implicit neurophysiological process related to the electrical properties of the skin, based on variations in sweating, skin conductance, heart rate, and blood flow to muscles when individuals face either internal or external stimuli (Fagius and Wallin, 1980; Benedek and Kaernbach, 2010; Boucsein, 2012). Its analysis makes it possible to discern, among others, the phasic component of the signal, with rapidly changing activity, which reflects the subject's responses to discrete stimuli (being an indicator of sympathetic activity), and the tonic component, with slowly changing activity, which reflects the subject's basal conductance level (Dawson et al., 2007).

Regarding sensory dysfunction in ASD, multiple studies have investigated its relationship with EDA, comparing baseline arousal and EDA reactions to sensory stimuli among ASD individuals, the neurotypical development population, and other diagnostic groups (for reviews, see Rogers and Ozonoff, 2005; White et al., 2014; Lydon et al., 2016). The evidence from these studies is mixed: some research found no differences in EDA levels in response to sensory stimuli (e.g., Zahn et al., 1987; Rogers and Ozonoff, 2005; McCormick et al., 2014), whereas other studies did find differences (van Engeland et al., 1991; Miller et al., 2001; Rogers and Ozonoff, 2005; Schoen et al., 2009).

Overall, regarding auditive stimulation, enhanced EDA levels in ASD individuals have been associated with both baseline arousal and reaction to stimulus presentation (Palkovitz and Wiesenfeld, 1980; Barry and James, 1988; Chang et al., 2012); nevertheless, there are also reports of no differences between ASD and typical populations (e.g., Stevens and Gruzelier, 1984; Allen et al., 2013). Moreover, the same pattern of mixed results has been found for the immediate EDA of ASD individuals in visual stimulations: regarding reactions to facial expressions, autistic people exhibited weakened EDA responses compared to typical adults and children (Hirstein et al., 2001; Hubert et al., 2009; Riby et al., 2012), whereas Ben Shalom et al. (2006) found no differences; likewise, several studies reported increased EDA reactivity to direct eye gaze in children with ASD (Kylliainen and Hietanen, 2006; Joseph et al., 2008; Kylliainen et al., 2012), but, conversely, other investigations did not (Louwerse et al., 2013). Furthermore, regarding smell processing, ASD children seemed to be more sensitive than TD children (Schoen et al., 2009); thus, they can detect odors at shorter distances (Ashwin et al., 2014); on the other hand, they have difficulties in detecting odor threshold (Dudova et al., 2011).

Finally, the correlation between ASD traditional assessments and EDA measures has been studied, observing that higher levels of ASD symptoms, measured by ADOS, are related to greater variability in EDA (Fenning et al., 2017).

## Use of Virtual Reality in ASD

To date, the above-described implicit measuring methodologies can be divided into two groups: studying the actions of the subject in a real scenario or conducting experiments in laboratory settings. The main problem with actual real-life scenarios is that it is not easy to study human responses in real situations because the experimenter struggles to fully monitor the stimuli involved in the encounter. Conversely, participants face controlled conditions in laboratory settings that do not include certain variables present in real-life situations, resulting in the experiment's low ecological validity.

Virtual reality (VR) emerges as a promising technology capable of overcoming the problems mentioned above. VR offers the opportunity to create different real situations, including social situations that produce body interactions in which the body, environment, and brain are closely related. VR can be described as a virtual 3D environment that can replicate real experiences where participants can interact as if they were in the real world. Different technical tools can create a sense of presence, enabling the subjects to view their behaviors as real (Slater, 2009). Experiencing a high sense of presence enables the participants in the virtual environment (VE) to communicate and behave as if they were thinking, acting, and communicating in their real life (Alcañiz et al., 2019). Therefore, actions, attitudes, and beliefs can be transferred from reality to virtuality and vice versa and can occur spontaneously and unconsciously, generating circumstances of high ecological validity while maintaining high experimental control in stimuli presentation and in gathering behavioral performance. Neuroscientists are increasingly using VR to replicate natural phenomena and social interactions, developing immersive and multimodal sensory stimuli that provide advantages over real-life and traditional testing methodologies in stimulus control and accuracy of data gathering (Bohil et al., 2011), while also allowing the integration of behavioral measures.

The use of VR in ASD research has been postulated as one of the methods with great potential in the treatment of the main symptomatological nucleus (Wing et al., 2011; Parsons T. D., 2016; Golestan et al., 2018). Such advantages have the theoretical basis established by Blascovich et al. (2002), who argued that interactive VEs would be able to change interaction and evaluation by offering the opportunity to study human behavior in normal, controlled, and replicable environments to produce an individual response close to that obtained in a real context.

One of the recurring aims of VR research with ASD users is to improve their ability to function in everyday life. Studies that have built VEs to teach different skills to children with ASD are not difficult to find: cognitive learning (Kandalaft et al., 2013), interaction (Bernardini et al., 2014), and emotional training (Bekele et al., 2014).

Nevertheless, there is a lack of research applied to the diagnosis in the field of VR in which an objective assessment of ASD is conducted through individualized clinical tests (behavioral biomarkers), customizing the treatment to each patient's profile.

To our knowledge, no one has investigated whether multimodal VR settings and EDA reactions might contribute to predicting ASD population versus TD children. Starting from these premises, we performed two studies (the first exploratory and the second confirmatory) to discriminate and predict sensory processing in the ASD population versus in a TD population through the combined use of implicit measure (EDA) and different sensory stimuli, involving two different VE and tasks.

To this end, the first experiment aimed to analyze the influence of three factors in predicting ASD: (1) the VE contents, one VE including a relaxing environment and another one including an arousal environment; (2) the task, one related to the subject's greeting responses in the relaxing environment and another related to the subject's imitation in the arousal environment; and (3) the stimuli conditions (SC), including visual (V), visual and auditive (VA), and visual, auditive, and olfactive stimuli (VAO). Specifically, in the first environment, the participants were placed in a forest wherein the visual stimulus was a girl avatar appearing, the auditive stimulus was the sound of the rain, and the olfactive stimulus was the odor of fresh-cut grass. In this relaxing environment, the subjects were asked to complete tasks related to responding to the greetings of the avatars. In the second environment, the participants were placed at a city street intersection in which the visual stimulus was the presence of two avatars (a girl and a boy), the auditive stimulus was a song that the avatars danced to, and the olfactive stimulus was the smell of butter related to avatars that bit a muffin. In this arousal environment, the subjects were asked to complete a task related to the imitation of the actions of the avatars. In both environments and experiments, the EDA responses were recorded and fed into a supervised machine learning classifier in order to recognize ASD.

Starting from these premises and aims, the first hypothesis in experiment 1 was that the ASD recognition is higher in the forest since the response to a greeting is one of the confirmatory symptoms in the ASD. The second hypothesis was that, by including more sensory modalities, the ASD recognition using EDA would present a better performance. After that, we performed a second experiment in order to develop a supervised learning model using the outputs of the first experiment. We increased the number of subjects used to calibrate the model and we tested it in a set of subjects not used before, simulating a real-world application.

## MATERIALS AND METHODS

## Experiment 1

### Participants

The study included 52 children between the ages of 4 and 7 years. In detail, 23 TD children (age = 4.87 ± 0.92; male = 13, female = 10) and 29 children with a previous diagnosis of ASD (age = 5.20 ± 1.34; male = 26, female = 3) participated in experiment 1. The ASD group sample was recruited from the Development Neurocognitive Centre, Red Cenit, Valencia, Spain. The ASD and the TD participants presented individual assessment reports that included the results of their ADOS-2 test. A sample management company recruited the TD group through targeted mailings to families. Before participating in the study, the family caregivers received written information about the study and they were required to give written consent for inclusion in the investigation. The study obtained ethical approval from the Ethical Committee of the Polytechnic University of Valencia. Furthermore, all procedures performed in the study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

### Psychological Assessment

The following scales and tasks have been administered to the participants and their family caregivers:

• Autism Diagnostic Interview-Revised (ADI-R): The ADI-R (Lord et al., 1994) is a clinical semi-structured interview used to detect ASD and answered by family caregivers. The questions are linked to ICD-10 and DSM-IV criteria for autism and yield separate scores in three domains: communication, social interaction, and restricted, repetitive, and stereotyped behaviors. The answers are scored on a 0–3-point scale, in which 0 indicates the absence of the behavior and 3 indicates the clear manifestation of the determined behavior. The ADI-R presents strong psychometric properties, with test–retest reliability ranging from 0.93 to 0.97.

• Autism Diagnostic Observation Schedule (ADOS-2): The ADOS-2 (Lord et al., 1999) includes structured and semi-structured tasks to assess children's development in several areas, such as communication, use of imagination, social interaction and play, and restrictive and repetitive behaviors. The measure uses five modules, tailored to the age and communication development of the participants. Concretely, module T is for young children who are between 12 and 30 months old and do not use phrase language consistently, module 1, for children who are 31 months or older and who do not use phrase language consistently, module 2 for children of any age who use phrase language but who do not have verbal fluency, module 3 for children with fluent language and young adolescents (under 16), and finally module 4 for adults and adolescents (16 years and older) with fluent language. From the observation of these behaviors, the items are scored between 0 (no evidence of abnormality related to autism) and 3 (definitive evidence), and from the sum of scores, two specific indices (social affectation and restricted and repetitive behavior) and the ASD global total index are obtained. The ADOS-2 presents excellent psychometric properties: the test–retest reliability is 0.87 for the social affectation index, 0.64 for the repetitive behavior index, and 0.88 for the total global index. In the study, the assessment was performed using module 1, corresponding to children from 31 months of age who do not use phrase language consistently.

### The Virtual Environments

The 3D models were developed in the Institute for Research and Innovation in Bioengineering (i3B) at the Polytechnic University of Valencia. The environment was developed and projected inside a three-surface Cave Assisted Virtual Environment (CAVE™) with dimensions of 4 m × 4 m × 3 m. It was equipped with three ceiling ultra-short lens projectors, which can project a 100° image from just 55 cm. The sound system used was the Logitech Speaker System Z906 500W 5.1 THX Digital (**Figure 2**).

Two VEs were developed:

1. A virtual forest, including three controlled stimuli conditions: visual, visual–auditive, and visual–auditive–olfactive (**Figure 3**). The visual stimuli consisted of a girl's avatar appearing from the left side of the forest and walking to the center of the virtual scene, where she stopped and waved her hand three times to say hello to the child, and then left the virtual scene, walking to the right side of the forest (**Figure 4**). The auditive stimuli consisted of adding a storm and rain sound to the virtual forest. Finally, the olfactive stimuli consisted of an odor of fresh-cut grass.

2. The other VE involved a simulated city street intersection (**Figure 5**) and was divided into three experimental stimuli conditions: visual, visual–auditive, and visual–auditive–olfactive. First, in the V stimuli condition, a boy's avatar appeared from the left side of the CAVE™ surface, walking to the center of the virtual scene, where he stopped and waved his hand three times to say hello to the child, and then left the virtual scene, walking out of the street intersection (**Figure 6**). Next, a girl's avatar appeared in the center of the CAVE™ surface, walking to the right of the virtual scene, where she stopped and repeated the three waves with her hand to say hello to the child, and then left the virtual scene, walking to the right side of the street intersection. This sequence was repeated three times. In the second (VA) stimuli condition, the same avatars appeared in the same order from the same directions, but instead of waving the hand to say hello, they danced to a piece of music for 10 s, three times. In the VAO stimuli condition, the same avatars appeared in the same order and from the same directions, but they bit a buttered muffin, accompanied by the same song as in the previous condition and an artificial butter smell that was released during the VR experience.

To avoid transfer effects over VR experiences, the VE presentation (forest and city street intersection) was counterbalanced across participants and a 1-week rest was left between the two VR experiences. Although counterbalancing is also recommended for the stimuli conditions of the VEs, the same stimuli presentation order (V, VA, and VAO) was maintained for the entire sample in both VR experiences to reduce the possibility of provoking sensory overload in ASD children. Indeed, sensory sensitiveness in ASD can suddenly emerge in different situations that require the processing capacity of sensory integration from several channels (Bogdashina, 2016); such concurrent sensory decoding of stimuli might cause ASD children distress and discomfort that could affect the quality of performance and assessment in VEs.
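A minimal sketch of this design: the order of the two VEs varies across participants while the stimuli order stays fixed at V, VA, VAO. The alternating assignment rule and the participant identifiers are assumptions made for illustration; the text does not state how the counterbalancing was implemented.

```python
from itertools import cycle

VE_ORDERS = [("forest", "city"), ("city", "forest")]  # two possible session orders
STIMULI_ORDER = ("V", "VA", "VAO")  # fixed for every participant, as in the text


def assign_sessions(participant_ids):
    """Alternate the VE order across participants (assumed rule)."""
    schedule = {}
    for pid, ve_order in zip(participant_ids, cycle(VE_ORDERS)):
        schedule[pid] = [
            {"session": i + 1, "ve": ve, "stimuli": STIMULI_ORDER}
            for i, ve in enumerate(ve_order)
        ]
    return schedule


schedule = assign_sessions(["P01", "P02", "P03", "P04"])  # hypothetical IDs
for pid, sessions in schedule.items():
    print(pid, [s["ve"] for s in sessions])
```

With an even number of participants, each VE is seen first by half of the sample, and the two sessions of a participant would be scheduled one week apart.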

### Physiological Assessment and Data Processing

Electrodermal activity signal was recorded using an Empatica E4 wristband.<sup>1</sup> Its reliability has been found to be comparable to clinical devices in appropriate circumstances (McCarthy et al., 2016). Raw signal (recorded at 4 Hz and 0.001–100 µS) was pre-processed and analyzed using Ledalab<sup>2</sup> (v.3.4.8) via Matlab<sup>3</sup> (v.2016a). Pre-processing consisted of two successive phases: (1) Butterworth low-pass signal filtering at 2.5 Hz (Valenza and Scilingo, 2013) and (2) visual diagnosis of artifacts and their correction. Due to the characteristics of the records and the analysis chosen, it was not considered necessary to apply signal-smoothing techniques. The analysis was tackled through the continuous decomposition analysis (CDA) method. It is based on the deconvolution of the skin conductance signal by the general response shape, prior to the data decomposition into the tonic and phasic components. As mentioned above, the tonic component generates slow changes in the conductance signal (on the order of minutes), being considered the basal activity, and the phasic component generates rapid changes in the conductance signal (on the order of seconds), being considered the response of the subjects to discrete stimuli. CDA has been proven to be an appropriate method for the analysis of short intervals between stimuli, especially in situations that can generate a high phasic activity (Benedek and Kaernbach, 2010). In order to reduce inter-subject differences, all values were standardized according to Venables and Christie (1980). This process was applied to the subject's whole experience record. Finally, the set of metrics extracted to characterize each stimuli condition includes the mean of the tonic (BL tonic) and phasic (BL phasic) components of the baseline recorded before the stimuli condition, the mean of the tonic and phasic components of the responses to the stimuli condition, and the ratio between the tonic and the phasic components of the responses to the stimuli condition.

<sup>1</sup>www.empatica.com

<sup>2</sup>www.ledalab.de

<sup>3</sup>www.mathworks.com
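The five metrics extracted for each stimuli condition can be sketched as follows, assuming the tonic and phasic components have already been obtained (for example from a CDA decomposition) and standardized. The function name, the dictionary keys, the direction of the ratio (tonic divided by phasic), and the toy values are illustrative assumptions, not the authors' code.

```python
from statistics import mean

FS_HZ = 4  # EDA sampling rate reported in the text


def eda_features(tonic, phasic, baseline_s, stimulus_s, fs=FS_HZ):
    """Summarize one stimuli condition from pre-decomposed EDA.

    tonic, phasic: equal-length sequences covering the baseline
    followed immediately by the stimuli condition.
    """
    n_base = int(baseline_s * fs)
    n_stim = int(stimulus_s * fs)
    if len(tonic) != len(phasic) or len(tonic) < n_base + n_stim:
        raise ValueError("signal shorter than baseline + stimulus window")

    stim = slice(n_base, n_base + n_stim)
    stim_tonic = mean(tonic[stim])
    stim_phasic = mean(phasic[stim])
    return {
        "bl_tonic": mean(tonic[:n_base]),
        "bl_phasic": mean(phasic[:n_base]),
        "stim_tonic": stim_tonic,
        "stim_phasic": stim_phasic,
        # Ratio direction is an assumption; undefined if mean phasic is 0.
        "stim_ratio": stim_tonic / stim_phasic if stim_phasic else float("nan"),
    }


# Toy signal: 2 s baseline + 1 s stimulus at 4 Hz (real windows are 120 s and 45 s).
tonic = [0.2] * 8 + [0.4] * 4
phasic = [0.0] * 8 + [0.1] * 4
features = eda_features(tonic, phasic, baseline_s=2, stimulus_s=1)
print(features)
```

One such feature vector per stimuli condition (V, VA, VAO) would then serve as classifier input.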

### The Olfactive System

For the olfactive stimulus, we used the Olorama<sup>4</sup> Technology™ wireless freshener. It features 12 scents arranged in 12 pre-charged channels, which can be selected and triggered by means of a UDP packet. The device includes a programmable fan time system that dissipates the scent. Both the intensity of the chosen scent (amount of time the scent valve was open) and the amount of fan time were programmed. The scent valve was kept open throughout the last stimuli condition (VAO).

<sup>4</sup>www.olorama.com

### Experimental Procedure

FIGURE 4 | Girl's avatar saying hello.

FIGURE 5 | Virtual city street intersection.

First, the family caregivers of the participants were informed about the general objectives of the research, the physiological measure and the placement of its device, and the VR system. Second, the Empatica E4 device was shown and placed on the participant's non-dominant hand before the virtual session. Subsequently, the child was accompanied in the CAVE by the researcher and by his or her family caregiver, according to the child's needs, and was placed in the middle of the virtual room, standing in front of the central surface at a distance of 1.5 m. Before each stimuli condition, 2 min of EDA baseline was recorded in a resting and relaxed state, and then the VE experiences started (**Figure 7**). The total duration of the forest VE experience was 8 min and 15 s, and each stimuli condition lasted 45 s. The total duration of the city VE was 14 min, and each stimuli condition lasted for 2 min and 40 s. The participants were balanced between the two VEs, leaving a 1-week rest between the two experimental sessions.
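The reported session lengths are consistent with three stimuli conditions, each preceded by a 2-min baseline and with no additional transition time. A quick check of that arithmetic (the variable and function names are ours):

```python
BASELINE_S = 2 * 60          # 2-min EDA baseline before each stimuli condition
N_CONDITIONS = 3             # V, VA, VAO


def session_seconds(condition_s):
    """Total duration if every condition is preceded by one baseline."""
    return N_CONDITIONS * (BASELINE_S + condition_s)


forest_total = session_seconds(45)            # each forest condition lasts 45 s
city_total = session_seconds(2 * 60 + 40)     # each city condition lasts 2 min 40 s

print(divmod(forest_total, 60))  # (8, 15) -> 8 min 15 s
print(divmod(city_total, 60))    # (14, 0) -> 14 min
```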

During the three VR stimuli conditions in both virtual experiences, the EDA signals were recorded. The researcher monitored the child state during the entire experiment, and care was taken to address any indisposition derived from the use of the devices.

## Experiment 2

### Participants

The study added 40 children, between the ages of 4 and 7 years, to experiment 1. In detail, 23 TD children (age 4.86 ± 0.91; male = 13, female = 10) and 17 ASD children (age 5.13 ± 1.35; male = 14; female = 3) participated in experiment 2. The ASD group sample was recruited from the Development Neurocognitive Centre, Red Cenit, Valencia, Spain. The ASD and the TD participants presented an individual assessment report that included the results of their ADOS-2 test. A sample management company recruited the TD group through targeted mailings to families. Before participating in the study, the family caregivers received written information about the study, and they were required to give written consent for the inclusion in the investigation.

### Psychological Assessment

Experiment 2 used the same scales and tests as experiment 1.

### Physiological Assessment and Data Processing

The EDA signals were recorded using the Empatica E4 wristband, as in experiment 1,<sup>5</sup> and the physiological data processing and analyses were performed using the method described in the "Physiological Assessment and Data Processing" section of experiment 1.

#### The Olfactive System

The system used was the same as that implemented in experiment 1: the Olorama Technology™<sup>6</sup> wireless freshener.

<sup>5</sup>www.empatica.com <sup>6</sup>www.olorama.com

## Experimental Procedure

In experiment 2, the participants only experienced the forest VE, as follows. First, as in experiment 1, the family caregivers of the participants were informed about the general objectives of the research, the physiological measure and its device localization, and the VR system. Second, the Empatica E4 device was shown and placed on the participant's non-dominant hand before the virtual session. Subsequently, the child was accompanied in the CAVE by the researcher and by his or her family caregiver, according to the child's needs. The participant was placed in the middle of the virtual room, standing in front of the central surface at a distance of 1.5 m. First, 2 min of EDA baseline was recorded in a resting, relaxed state. Next, the three VR stimuli conditions were presented, with a 2-min EDA baseline recorded before each one.

## Statistical Analysis

In experiment 1 (n = 54), three participants were excluded from the analysis for lack of EDA data due to bad recording: two from the forest VE and one from the city VE. Consequently, the sample included 52 children for the forest VE analysis and 51 for the city VE. In this preliminary stage, we developed four models for each environment (forest VE and city VE) in order to explore the importance of the scenario and of each stimuli condition (SC). The first model included all the SC, the second only the visual stimuli, the third only the VA stimuli, and the fourth only the visual, auditive, and olfactive SC. Moreover, we developed two extra models to analyze whether the models can achieve a performance better than chance. To this end, we computed a permutation-based test, i.e., we developed two models with city/forest VE input data (all stimuli) and a random output class assignment. The development of the models (parameter tuning and feature selection) used cross-validation with all the samples of the experiment. To compare model performance, we used the output of the classification algorithm without bipolarization into a class label, i.e., the probability, between 0 and 1, with which the model classified a subject as their true class. Given the Gaussianity of the data (p > 0.05 in the Shapiro–Wilk test, whose null hypothesis is that the sample is Gaussian), we performed a statistical model comparison on the probabilities of the models by applying a one-way ANOVA with Tukey–Kramer correction.
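As a minimal sketch of the per-subject score used for model comparison, the function below converts a classifier's predicted probability for the ASD class into the probability assigned to the subject's true class. The function name and the example values are hypothetical and are not study data.

```python
def true_class_probability(p_asd, is_asd):
    """Probability the classifier assigned to the subject's actual class.

    p_asd  -- predicted probability of the ASD class, in [0, 1]
    is_asd -- True if the subject actually belongs to the ASD group
    """
    if not 0.0 <= p_asd <= 1.0:
        raise ValueError("p_asd must be a probability between 0 and 1")
    # For a TD subject the true-class probability is the complement.
    return p_asd if is_asd else 1.0 - p_asd


# Hypothetical (probability, true label) pairs, for illustration only.
predictions = [(0.9, True), (0.25, False), (0.4, True)]
scores = [true_class_probability(p, y) for p, y in predictions]
```

One list of such scores per model would then be the input to the one-way ANOVA described above; the third subject is misclassified at a 0.5 threshold, yet still contributes a graded score of 0.4 rather than a bare error.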

In experiment 2 (n = 92), in order to calibrate and test the final model, we used the forest VE with all the SC, increasing the number of participants to strengthen the final model and test it. We split the dataset into a training set (n = 72) and a test set (n = 20). The test set was drawn randomly from the new subjects, keeping a balanced 50% of each class. The development of the model (parameter tuning and feature selection) used cross-validation with the training set, and the model was afterward applied to the test set, which had not been previously used.
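The class-balanced test split can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the subject IDs are invented, the seed is arbitrary, and only the group sizes of the new subjects (17 ASD, 23 TD) and the test-set size (20, half per class) come from the text.

```python
import random


def balanced_test_split(new_subjects, n_per_class, seed=0):
    """Draw a class-balanced test set from newly recruited subjects.

    new_subjects -- list of (subject_id, label) pairs, label in {"ASD", "TD"}
    n_per_class  -- number of test subjects to draw from each class
    Returns (test_ids, remaining_ids).
    """
    rng = random.Random(seed)
    test_ids = []
    for label in ("ASD", "TD"):
        pool = [sid for sid, lab in new_subjects if lab == label]
        if len(pool) < n_per_class:
            raise ValueError(f"not enough {label} subjects")
        test_ids.extend(rng.sample(pool, n_per_class))
    chosen = set(test_ids)
    remaining = [sid for sid, _ in new_subjects if sid not in chosen]
    return test_ids, remaining


# Experiment 2 added 17 ASD and 23 TD children; the IDs are made up.
new_subjects = ([(f"asd{i}", "ASD") for i in range(17)]
                + [(f"td{i}", "TD") for i in range(23)])
test_ids, remaining = balanced_test_split(new_subjects, n_per_class=10)
print(len(test_ids), len(remaining))  # 20 20
```

The 20 new subjects left over, added to the 52 forest VE subjects of experiment 1, are consistent with the reported training set of 72.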

To develop the models, we used support vector machine (SVM)-based pattern recognition (Schölkopf et al., 2000) with a leave-one-subject-out (LOSO) cross-validation procedure. For the LOSO scheme, the training set was normalized by subtracting the median value and dividing by the median absolute deviation over each dimension. In each iteration, the validation set consisted of one specific subject, whose data were normalized using the median and deviation of the training set. In particular, we used an optimized C-SVM with a sigmoid kernel function, varying the cost and gamma parameters over a vector of 15 values logarithmically spaced between 0.1 and 1,000. Moreover, we performed a feature selection strategy to explore the relative importance of each feature. A support vector machine recursive feature elimination (SVM-RFE) procedure, in a wrapper approach, was included (RFE was performed on the training set of each fold, and we computed the median rank of each feature over all folds). We specifically chose a recently developed, non-linear SVM-RFE, which includes a correlation bias reduction strategy in the feature elimination procedure (Yan and Zhang, 2015). The model was optimized to achieve the best Cohen's kappa. The algorithms were implemented using Matlab© R2016a and the LIBSVM toolbox (Chang and Lin, 2011).
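The normalization and parameter grid described above can be illustrated with a short Python sketch (the study itself used Matlab and LIBSVM). It covers only the LOSO loop with median/MAD scaling fitted on the training subjects and the 15-value logarithmic grid; the SVM training and SVM-RFE steps are omitted. All function names, the toy data, and the fallback for a zero MAD are assumptions made for illustration.

```python
import statistics


def log_grid(low=0.1, high=1000.0, n=15):
    """Return n values logarithmically spaced from low to high (inclusive)."""
    step = (high / low) ** (1.0 / (n - 1))
    return [low * step ** i for i in range(n)]


def robust_scale_params(rows):
    """Per-feature (median, median absolute deviation) computed over rows."""
    params = []
    for column in zip(*rows):
        med = statistics.median(column)
        mad = statistics.median([abs(v - med) for v in column])
        params.append((med, mad))
    return params


def apply_scale(row, params):
    """Normalize one row with previously fitted (median, MAD) pairs."""
    # A zero MAD would divide by zero; falling back to 1.0 is an assumption.
    return [(v - med) / (mad or 1.0) for v, (med, mad) in zip(row, params)]


def loso_folds(features):
    """Yield (held_out_index, scaled_training_rows, scaled_held_out_row)."""
    for i, held_out in enumerate(features):
        train = features[:i] + features[i + 1:]
        params = robust_scale_params(train)  # fitted on training subjects only
        scaled_train = [apply_scale(r, params) for r in train]
        yield i, scaled_train, apply_scale(held_out, params)


# Toy feature matrix, one row per subject; values are made up.
features = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [10.0, 40.0]]
folds = list(loso_folds(features))
grid = log_grid()  # candidate values for both cost and gamma
```

Fitting the scaling parameters inside each fold, rather than once on all subjects, keeps information from the held-out subject out of the training data.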

## RESULTS

## Experiment 1: Model Comparisons

**Table 1** shows the performance of the eight models developed, considering both VEs and the SC. It includes the accuracy of each model, the confusion matrix, and the features selected by the automatic feature selection procedure. In addition, **Figure 8** shows a comparison of the performance of each model in terms of the probability with which the model classified a subject as their true class; the significant differences between models were derived from a one-way ANOVA with Tukey–Kramer correction. We included in the ANOVA the two permutated models, whose accuracy was 67.30% for the forest VE and 68.62% for the city VE, to test whether the accuracy is significantly better than chance. The result of the one-way ANOVA shows that there are differences between models (p < 0.0001).

The highest accuracy (90.3%, kappa = 0.80) was achieved by the forest VE model of experiment 1 including all SC, and it presented higher performance than the rest of the models (forest-VA p = 0.000, forest-VAO p = 0.000, city-all p = 0.000, city-V p = 0.000, city-VA p = 0.000, city-VAO p = 0.002), except for the forest-V model, for which no statistically significant difference was found. The model included four features of the V SC (baseline tonic, baseline phasic, phasic, and ratio), one feature of the VA SC (ratio), and two features of the VAO SC (baseline tonic and tonic). The second highest accuracy (84.6%, kappa = 0.69) was achieved by the forest model including only the V SC, and it presented a higher performance than the rest of the models with lower accuracy (forest-VA p = 0.000, forest-VAO p = 0.008, city-all p = 0.001, city-V p = 0.000, city-VA p = 0.000, city-VAO p = 0.029). This model included only two features (baseline tonic and baseline phasic). Both models have a well-balanced confusion matrix, and both performed statistically differently from chance (forest-all p = 0.000 and forest-V p < 0.0001).

The rest of the models presented accuracies between 68 and 76% and did not present statistically significant differences in performance among themselves or with respect to the permutated models. The model including the VA stimuli condition of the forest showed an accuracy of 71.15% (kappa = 0.41) and included three features (phasic baseline, phasic, and ratio). The model including the VAO SC of the forest achieved a balanced accuracy of 75.00% (kappa = 0.49), including only the tonic responses in the baseline.

FIGURE 8 | [...] true class; vertical lines represent the standard deviation of the means; asterisks indicate significant differences with p < 0.05.

Regarding the city VE, the model including all stimuli conditions achieved 70.5% (kappa = 0.39) using one feature of the VA SC (phasic) and four features of the VAO SC (baseline tonic, tonic, phasic, and ratio). The models including the V and the VA SC achieved accuracies of 68.63% (kappa = 0.32) and 72.55% (kappa = 0.41), respectively, but with a very poor balance in terms of false positives. The model including the VAO SC achieved a balanced accuracy of 76.47% (kappa = 0.52) with four features (baseline phasic, tonic, phasic, and ratio).

## Experiment 2: Development of the Final Model

**Table 2** shows the performance of the final model derived from the forest VE after the increase in the number of subjects. The validation set (n = 72) shows a balanced accuracy of 83.33% (kappa = 0.668). The test set (n = 20) achieved an accuracy of 85% (kappa = 0.700), recognizing 80% of subjects with sensory dysfunction. The model included one feature of the V SC (phasic), three features of the VA SC (baseline tonic, baseline phasic, and phasic), and one feature of the VAO SC (baseline tonic). In addition, **Figure 9** shows the ROC curve of the final model, which achieved an area under the curve (AUC) of 0.897 in the validation and 0.870 in the test.
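The relation between accuracy and Cohen's kappa on the test set can be reproduced with a small worked example. The confusion matrix below is a reconstruction, not a copy of Table 2: it assumes 10 ASD and 10 TD test subjects, that "subjects with sensory dysfunction" refers to the ASD class (8 of 10 recognized), and that 85% accuracy means 17 of 20 correct, which leaves 9 of 10 TD subjects correctly classified.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement: product of actual and predicted marginals per class.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


# Reconstructed test-set counts (see the assumptions stated above).
acc, kappa = accuracy_and_kappa(tp=8, fn=2, fp=1, tn=9)
print(round(acc, 2), round(kappa, 2))  # 0.85 0.7
```

With balanced classes the chance agreement is 0.5, so kappa is (0.85 - 0.5) / 0.5 = 0.70, which agrees with the reported value.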

## DISCUSSION

The main aim of this study was to discriminate and predict sensory processing, recognizing the ASD population versus the TD population through the combined use of an implicit measure (EDA) and different sensory stimuli in VR. Specifically, two experiments were run, testing two different VEs, each presenting three sensory stimuli conditions (visual; visual and auditive; and visual, auditive, and olfactive stimuli), and examining EDA changes before and during the presentation of the virtual and the sensory stimuli. The focus was on sensory processing because there is evidence that it is relatively impaired in the ASD population (Leekam et al., 2007; Tomchek and Dunn, 2007; Baron-Cohen et al., 2009).

The results can be discussed on four levels: (1) the influence of scenarios and stimuli conditions, (2) the role of EDA and the features used, (3) the performance of ASD recognition, and (4) the limitations and further studies.

## The Influence of Scenarios and Stimuli Conditions

Regarding scenarios, the model developed using the forest VE presented a higher accuracy (forest VE-all, 90.3%) than the model developed using the city VE (city VE-all, 70.59%). Since we used the same set of subjects, the results are not influenced by individual bias. Therefore, the model comparison validated the hypothesis that ASD recognition was higher in the forest VE (forest VE-all vs. city VE-all, p = 0.000). Moreover, the permutation test shows that forest-all and forest-V are the models that offer a performance statistically better than chance. This outcome could be due to task characteristics, since the response to a greeting is one of the confirmatory symptoms in ASD. In addition, several previous studies showed the influence of nature scenes in reducing arousal (Liszio et al., 2018; White et al., 2018). Therefore, the forest VE can be assumed to be a more relaxing environment than the city VE. Since EDA is highly affected by arousal (Picard et al., 2016), the results suggest that a natural and relaxing environment such as the forest VE could be a better scenario to detect changes in the ASD population due to sensory processing dysfunctions. In addition to the increase in arousal derived from the city VE, the avatar imitation task provoked physical activity in the subject that could affect arousal, decreasing the recognition performance of the models through an arousal-saturating effect; hence, the results support the use of a low-arousal natural environment and non-physical activities to increase the recognition performance of models using EDA.

Regarding stimuli conditions, the model developed in the forest VE with all the stimuli conditions achieved an accuracy 5.78% higher than the forest model with only visual stimulation, but the difference was not statistically significant. Both models presented higher accuracy and performance than the rest of the models, including the permutated one. However, the model developed in the city VE used only one feature of V SC, three features of VA SC, and four features of VAO SC. Therefore, even though the exploratory analysis performed in the forest VE suggested that VA and VAO did not play an important role in ASD recognition in comparison with V, the feature selection for the final model showed high reliance on the multimodal sensory conditions, since four out of five of the selected features were from the VA and VAO stimuli. The hypothesis that increasing sensory modalities would contribute to better ASD recognition through EDA is partially confirmed by the final model.

## The Role of EDA and the Features Used

To our knowledge, we proposed the first supervised ML model using EDA for the ASD population [see Hyde et al. (2019) for a review of ASD recognition models]. Our results are in accordance with previous research showing that ASD is associated with the autonomic nervous system and can be measured using EDA (Miller et al., 2001; Rogers and Ozonoff, 2005; Schoen et al., 2009; Bujnakova et al., 2016). However, other studies did not find differences in EDA levels in response to sensory stimuli in the ASD population (e.g., Zahn et al., 1987; Rogers and Ozonoff, 2005; McCormick et al., 2014). The level of recognition of the presented models represents a new step in the use of the autonomic nervous system as a biomarker for ASD recognition. In addition, the CDA analysis proved to be a valid signal processing method to extract valuable features from EDA. The phasic responses of the subjects in two SCs (V and VA) are included in the feature selection of the final model and in many of the exploratory model comparisons. The baseline is also a very important part of the stimuli, since the baseline responses of the VA (tonic and phasic) and VAO (phasic) SCs were included in the final model. Moreover, the baseline responses were also included in the forest VE-all and forest VE-V models. In this regard, it should be noted that variations in the phasic and tonic components, related to changes in emotional arousal, have been reported by studies carried out in different experimental paradigms (Kreibig, 2010). The relevant role of the baseline is in accordance with previous research suggesting that participants are likely to be hyper- or hypo-responders independent of any effects of stimuli (Braithwaite et al., 2013). Moreover, the role of baselines could be especially important in research on sensory processing disorders, as in ASD.

## The Performance of ASD Recognition

Regarding the final model on ASD recognition, the validation set of 72 subjects achieved an accuracy of 83.33% (kappa: 0.668, AUC: 0.897), including 86.11% of true positives. Moreover, we tested the model on a set of 20 subjects (10 ASD and 10 controls) recruited in a second phase, and the model achieved 85% accuracy (kappa: 0.700, AUC: 0.870). The results represent a new step in ASD recognition since, to our knowledge, we presented the first supervised ML model for ASD recognition using EDA and multimodal VR. Moreover, the methodology presented some advantages over previous research. Li et al. (2017) presented an analysis using kinematics to recognize ASD in adults, achieving 86.7% accuracy (n = 30). Liu et al. (2016) developed a model using eye tracking to recognize ASD children based on face processing, achieving 88.51% accuracy (n = 58). Nakai et al. (2017) presented an ASD recognition model in children using voice analysis, achieving 76% accuracy (n = 30 ASD, n = 51 TD). All of them validated their models using cross-validation procedures and used ecological biomarkers to recognize ASD. In contrast, the presented model achieved the same (or a higher) level of accuracy with a larger sample and, moreover, was applied to a new test set that had not been used before, simulating a real application. This represents a step forward toward developing scalable clinical applications of ASD recognition models. On the other hand, previous research by Chen et al. (2015) reported a very large study (n = 252) with a very high accuracy (91%) using fMRI. In contrast to this approach, we proposed an ecological environment and instrumentation using VR and an EDA wristband sensor. This ecological approach is particularly important in the field of ASD and can offer cheaper and quicker clinical diagnostic models in the future.

## Limitations and Future Studies

Although this study took a step forward in the field of ASD sensory processing assessment, it presented some limitations regarding sample characteristics, specific ASD symptoms and their related measures, and VEs.

First, the participants were from 4 to 7 years old, and the selected ASD children had received, according to their symptomatology and age, a previous ASD diagnosis through module 1 of the ADOS-2 questionnaire, which is addressed to children older than 31 months of age who do not use phrase language consistently. It was decided to test only participants with these characteristics in order to control and ensure results, but these narrow criteria limit the generalization of the findings.

Second, the present study mostly focused on ASD sensory processing, although it is not a core ASD symptom in diagnostic manuals such as DSM-5 and ICD-10. Furthermore, regarding VEs, children might at first experience them as astonishing and impressive (novelty effect; Clark, 1983; Gravetter and Forzano, 2018); for this reason, in the first part of each study, there might be a common effect on EDA metrics, especially in the forest-V and city-V conditions. However, this artificial arousal, which is due to the sense of being physically present in a VE despite the certainty of not being physically there, decreases as familiarity with the virtual world and device increases. Third, the sample size was restricted and not matched on sociodemographics, limiting the generalization of the model outcomes. In accordance with these limitations, future work is needed in order to develop an objective method for the assessment of sensory processing in the ASD population. Future studies must, first, include a broader sample with further control and matching on sociodemographics. Controlling for socioeconomic status is recommended in order to avoid misleading model outcomes based on metrics unrelated to ASD presence or absence (Delobel-Ayoub et al., 2015). Second, ASD individuals should be diagnosed using the five modules of the ADOS-2 questionnaire to test whether the results presented here can be replicated across all age ranges and linguistic ability clusters. Moreover, in conjunction with sensory measures, the inclusion of core symptom analyses in VR is suggested, for example, repetitive and stereotypical behaviors, and communication and social abilities. Biomarkers that could be relevant for this purpose are eye tracking, body movement analysis, and EEG (Loth et al., 2016); indeed, eye tracking glasses and RGBD cameras for body movement analysis might be included in future studies on current VR experiences in order to enhance model strength and accuracy.
Furthermore, to discern impaired sensory processing, the present study involved three VR conditions (visual, auditive, and olfactive) and it could be interesting to add a fourth condition about haptic processing since it enhances immersion in the VE, providing a more ecological and realistic experience (Slater and Wilbur, 1997). Finally, some adjustments of the virtual content might bring more sense of presence to the participants, such as the introduction, in both VA and VAO stimuli conditions, of auditive stimulation consistent with the avatar that is waving the hand to say hello.

## CONCLUSION

Sensory processing is a relevant ability in information processing, allowing behavioral responses to be adapted to the environment (Miller et al., 2007). Individuals with ASD show hyper-sensitiveness (overresponsiveness) to VA stimuli and hypo-sensitiveness (underresponsiveness) to olfactive stimuli (Tomchek and Dunn, 2007; Baron-Cohen et al., 2009; Dudova et al., 2011; Ashwin et al., 2014; Tomchek et al., 2014). The hyper–hypo sensitiveness to sensory stimuli can generate an alteration in information processing, affecting cognitive and social responses in daily life situations. Traditional ASD assessment, based on semistructured behavioral task observations in laboratory settings and structured interviews, does not take into account the dysfunctional sensory processing in real life. According to the results, the current studies have shown that it is possible to obtain biomarkers for ASD classification using a CP paradigm based on implicit brain processes, measured through psychophysiological signals and the subjects' behavior, while exposed to complex social conditions using VR interfaces. The ASD classification using biomarkers, along with traditional assessment, could enhance knowledge on the development of relevant specific treatments.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by Polytechnic University of Valencia. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.


## AUTHOR CONTRIBUTIONS

All authors have contributed to the manuscript as follows: MA and LA designed the study and supervised the whole study. EO and MS recruited and assessed the eligible subjects. JM-M conducted the odor and statistical analyses. IC, GT, MM, MS, JM-M, and JH-T wrote the original manuscript and MA and LA revised the manuscript. All authors assisted in the revision process, read, and approved the final manuscript.

## FUNDING

This work was supported by the Spanish Ministry of Economy, Industry, and Competitiveness-funded project "Immersive Virtual Environment for the Evaluation and Training of Children with Autism Spectrum Disorder: T Room" (IDI-20170912) and by the Generalitat Valenciana-funded project REBRAND (PROMETEU/2019/105).

## ACKNOWLEDGMENTS

We thank Zayda Ferrer Lluch for the development of virtual reality environments.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Alcañiz Raya, Chicchi Giglioli, Marín-Morales, Higuera-Trujillo, Olmos, Minissi, Teruel Garcia, Sirera and Abad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Usability Issues of Clinical and Research Applications of Virtual Reality in Older People: A Systematic Review

Cosimo Tuena<sup>1,2</sup>\*, Elisa Pedroli<sup>1,3</sup>, Pietro Davide Trimarchi<sup>4</sup>, Alessia Gallucci<sup>4</sup>, Mattia Chiappini<sup>1</sup>, Karine Goulene<sup>5</sup>, Andrea Gaggioli<sup>1,2</sup>, Giuseppe Riva<sup>1,2</sup>, Fabrizia Lattanzio<sup>6</sup>, Fabrizio Giunco<sup>4</sup> and Marco Stramba-Badiale<sup>5</sup>

*<sup>1</sup> Applied Technology for Neuro-Psychology, IRCCS Istituto Auxologico Italiano, Milan, Italy, <sup>2</sup> Department of Psychology, Catholic University of the Sacred Heart, Milan, Italy, <sup>3</sup> Faculty of Psychology, University of eCampus, Novedrate, Italy, <sup>4</sup> IRCCS Fondazione Don Carlo Gnocchi, Milan, Italy, <sup>5</sup> Department of Geriatrics and Cardiovascular Medicine, IRCCS Istituto Auxologico Italiano, Milan, Italy, <sup>6</sup> Scientific Direction, IRCCS INRCA, Ancona, Italy*

Edited by:

*Valerio Rizzo, University of Palermo, Italy*

#### Reviewed by:

*Hasan Ayaz, Drexel University, United States*

*Diogo Morais, Lusophone University of Humanities and Technologies, Portugal*

> \*Correspondence: *Cosimo Tuena cosimo.tuena@unicatt.it*

#### Specialty section:

*This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience*

> Received: *28 October 2019* Accepted: *02 March 2020* Published: *08 April 2020*

#### Citation:

*Tuena C, Pedroli E, Trimarchi PD, Gallucci A, Chiappini M, Goulene K, Gaggioli A, Riva G, Lattanzio F, Giunco F and Stramba-Badiale M (2020) Usability Issues of Clinical and Research Applications of Virtual Reality in Older People: A Systematic Review. Front. Hum. Neurosci. 14:93. doi: 10.3389/fnhum.2020.00093*

Aging is a condition that may be characterized by a decline in physical, sensory, and mental capacities, while increased morbidity and multimorbidity may be associated with disability. A wide range of clinical conditions (e.g., frailty, mild cognitive impairment, metabolic syndrome) and age-related diseases (e.g., Alzheimer's and Parkinson's disease, cancer, sarcopenia, cardiovascular and respiratory diseases) affect older people. Virtual reality (VR) is a novel and promising tool for assessment and rehabilitation in older people. Usability is a crucial factor that must be considered when designing virtual systems for medicine. We conducted a systematic review with Preferred Reporting Items for Systematic reviews and Meta-analysis (PRISMA) guidelines concerning the usability of VR clinical systems in aging and provided suggestions to structure usability piloting. Findings show that different populations of older people have been recruited to mainly assess usability of non-immersive VR, with particular attention paid to motor/physical rehabilitation. Mixed approach (qualitative and quantitative tools together) is the preferred methodology; technology acceptance models are the most applied theoretical frameworks, however senior adapted models are the best within this context. Despite minor interaction issues and bugs, virtual systems are rated as usable and feasible. We encourage usability and user experience pilot studies to ameliorate interaction and improve acceptance and use of VR clinical applications in older people with the aid of suggestions (VR-USOP) provided by our analysis.

Keywords: aging, assessment, rehabilitation, usability, user-experience, virtual reality

## INTRODUCTION

Life expectancy is rapidly increasing and is expected to rise in the years to come, thereby creating an aging population. However, a significant proportion of older people may develop frailty, multi-morbidity, and disability, causing a significant impact both on their quality of life and on health care and social costs (Lutz et al., 2008; World Health Organization, 2015). Aging is associated with physiological changes (e.g., apoptosis, senescence, inflammation) that may lead to systemic alterations (Flatt, 2012). This potential decline may involve sensory, mental, and physical functioning, thus leading to increased morbidity, multi-morbidity, disability, and mortality (World Health Organization, 2015). On the other hand, motor skills, vision, hearing, proprioception, and cognitive abilities (e.g., memory) may be reduced even in healthy older people (Kuehn et al., 2017). In addition, aging hampers psychosocial well-being by adding new developmental tasks or situations (e.g., isolation; Steptoe et al., 2015). In particular, the prevalence of Alzheimer's disease, cancer, chronic obstructive pulmonary disease, maculopathy, osteoarthritis, osteopenia, Parkinson's disease, periodontitis, rheumatoid arthritis, sarcopenia, cardiovascular diseases, and type 2 diabetes increases with age (Tolosa et al., 2006; Dubois et al., 2010; Marengoni et al., 2011; Edwards et al., 2015; Steenman and Lande, 2017; Yakaryilmaz and Öztürk, 2017; Franceschi et al., 2018). Additionally, several clinical conditions may jeopardize the well-being of older people, such as mild cognitive impairment, frailty, or metabolic syndrome (Fried et al., 2001; Petersen, 2004; Portet et al., 2006; Huang, 2009; Xue, 2011; Fedarko, 2012). The main priority of successful management of aging is enabling older people to be healthy, active, and autonomous for as long as possible (World Health Organization, 2002).
Accordingly, functional decline is one of the key issues to be managed (World Health Organization, 2015). Among other practices, the use of assistive health technology (AHT; i.e., technologies devoted to maintain or improve functionality, autonomy and well-being) or medical devices (MD; i.e., technologies used for prevention, diagnosis and treatment) may also produce a beneficial effect in older people (Garçon et al., 2016); however, a critical aspect is to ensure accessibility and use of these technologies in the older population (World Health Organization, 2015; Beard et al., 2016).

Virtual reality (VR) is one of the emerging AHT and MD in the field of aging, frailty, and disability (Lange et al., 2010; Bohil et al., 2011). VR is defined as a system based on an interactive computer-simulated 3D environment (Gorini and Riva, 2008), which incorporates mainly auditory and visual feedback, and sometimes also haptic feedback. VR can be divided into non-immersive, semi-immersive, and fully immersive systems (Mujber et al., 2004). The non-immersive system is a desktop-based VR with low interaction (e.g., keyboard, joypad) and immersion (e.g., PC, tablet). The semi-immersive system consists of a large monitor/projector with moderate immersion and interaction (e.g., Kinect, data gloves). The immersive system is characterized by the use of tools such as a head-mounted display (HMD) or the cave automatic virtual environment (CAVE) that enable a high degree of interaction (e.g., trackers) and immersion in the virtual environment (VE). Additionally, VR can be conceptualized as a continuum between reality and virtuality, where some aspects of the VE are mixed with the real environment (augmented reality) or vice-versa (augmented virtuality) (Milgram et al., 1995). The sensorimotor channels connected to the VR define the degree of immersion; the psychological consequence of immersion on perception is the sense of presence, that is, the feeling of being in the VE or, alternatively, the "perceptual illusion of non-mediation" with the VE (Riva, 2008; Bohil et al., 2011). Moreover, mobile applications (e.g., tablet) with tracking systems of the user and/or visors (e.g., Google Cardboard) can be considered mobile VR that allows for different degrees of immersion and interaction with the VE (Pallavicini et al., 2015; Fang et al., 2017).

VR meets several requirements of motor and cognitive neurorehabilitation interventions: repetitive practice, feedback about performance, multimodal stimulation, and controlled, secure, and ecologically valid environments (Bohil et al., 2011). It is possible to control and manipulate tailored exercises within meaningful and motivating environments using virtual environments, i.e., transformation of flow (Riva et al., 2006). For these reasons, VR has been utilized for rehabilitation in different fields and, particularly, after stroke. Accordingly, guidelines have recently included the use of VR for both motor and cognitive rehabilitation in patients who suffered a stroke (ISO, 2016b; Winstein et al., 2016). However, access to this kind of technology may be limited by the lack of accessibility in the older population, as compared to other AHT and MD (World Health Organization, 2015). For instance, VR in the context of stroke rehabilitation is facing challenges concerning end-users' interaction, such as feasibility of VR training, lack of functional relevance, patient frustration with feedback, and lack of integration of environmental factors that link to motor performance (Teo et al., 2016).

On the macroscopic level, access to AHT and MD is limited by socio-demographic and economic factors, while on the microscopic level, access is the use itself of a device. Indeed, according to the MOLD-US framework (Wildenbos et al., 2018), the use of technology among older people is hampered by different barriers: (1) cognitive (e.g., reduced working memory, spatial cognition, attention, language, and reasoning) and motivational (e.g., self-efficacy, self-confidence, benefits identification, computer literacy, integration in daily life) barriers that affect errors, efficiency, learnability, memorability, and satisfaction; (2) physical (e.g., motor speed, flexibility, hand-eye coordination, strength) and perceptual (e.g., vision, auditory, haptic) barriers that influence errors and efficiency. According to Nielsen (2012), usability is defined by learnability (is it easy to accomplish the task?), efficiency (once learned, is the user fast in performing the task?), memorability (is the user able to reestablish proficiency with the design after a period of non-use?), errors (how many errors does the user make?) and satisfaction (how pleasant is the design?). Along with usability (i.e., easiness and pleasure), the technology should provide the attributes needed by the user (i.e., utility). Usability can be assessed by means of a wide range of methods, such as the system usability scale (SUS), heuristic evaluation, cognitive and pluralistic walkthrough, formal usability, pluralistic, consistency, and standard inspections (Brooke, 1986; Nielsen, 1994).

Nevertheless, usability tends to focus more on the task than on the experience (Vermeeren et al., 2010). Indeed, researchers investigating user experience (UX) point out the role of factors that go beyond the technology and its usability/usefulness. UX facets embrace emotion and affective reactions toward the technology and experiential, hedonic, holistic, and aesthetic factors. The interaction with a technology is "a subjective, situated, complex, and dynamic encounter" (Hassenzahl and Tractinsky, 2006). While satisfaction plays a critical role in usability, UX also takes into account emotions, motivation, and expectations in human-computer interaction (Vermeeren et al., 2010). For instance, the user experience questionnaire (UEQ) aims at evaluating six factors: attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty (Laugwitz et al., 2008), whereas the usability metric for user experience (UMUX) taps UX facets of usability (Finstad, 2010). Additionally, 96 UX methods (http://www.allaboutux.org/all-methods) have been identified in the UX research field (Vermeeren et al., 2010). UX methods vary in technique (qualitative or quantitative), target technology, period of assessment (e.g., developmental, conceptual), time, information source (e.g., experts, specific users, individual, group), and location (e.g., lab, online, field). Methods include semantic differential, checklists, heuristics, think-aloud, psychophysiological measures, self-report, questionnaires, in situ observation, and video analysis (Vermeeren et al., 2010). A critical aspect of UX is the prototype development (Novak, 2008), which follows the concept (idea) and pre-production (demo) phases and precedes production & localization (development), Alpha, Beta, and Gold/post-production phases.

A wide range of theories has been proposed to understand and explain user acceptance and use of technology (for a literature review see Taherdoost, 2018). The most inclusive model is the unified theory of acceptance and use of technology model (UTAUTM) (Venkatesh et al., 2003), which includes the technology acceptance model (TAM), theory of reasoned action, theory of planned behavior (TPB), combined TAM and TPB, model of PC utilization, the diffusion of innovation model, motivational model, and social cognitive theory. In this model, the significant factors are effort expectancy, performance expectancy, social influence, and facilitating conditions. Interestingly, starting from the TPB (Fishbein and Ajzen, 1975), TAM (Davis et al., 1989), and UTAUTM, Chen and Shou (2014) developed the senior technology acceptance model (STAM). Controlling for age, gender, educational level, and economic status, their model included gerontotechnology self-efficacy and anxiety, facilitating health conditions, cognitive abilities, social relationships, attitude to life and satisfaction, and physical functioning as factors that influenced perceived usefulness, usage behavior, and perceived ease of use, which in turn affect general attitude toward use. A similar model (senior citizens' acceptance of information systems; SCAIS) was developed by Phang et al. (2006). This model takes into account preference for human contact, self-actualization, resource saving, anxiety, computer support, and physiological decline, which influence perceived usefulness, ease of use, internet safety perception and, in turn, intention. Another theoretical framework used to approach technology use and acceptance is the user-centered design (UCD). UCD enables technology systems to be made more usable and interactive to end-users, but it can also be applied to assess needs, wants, and limitations of general products (Sebe, 2010; ISO, 2016a; Brox et al., 2017). 
UCD can be investigated using a variety of qualitative and quantitative methods such as field studies, user requirements analyses, iterative design, usability evaluation, task analyses, focus groups, user interviews, participatory design, and prototypes (Vredenburg et al., 2002). UX can be explored with the playability model (i.e., immersion, socialization, emotion, satisfaction, effectiveness) that is crucial when building games for clinical purposes (Sánchez et al., 2012; Valladares-Rodriguez et al., 2019); emotive design for VR should be followed for designing human-computer interaction systems (see Vredenburg et al., 2002).

Lastly, human-computer interfaces are also conceptualized in terms of architecture and layers needed to provide a service (Tsai et al., 2012; Nikitina et al., 2018). For instance, the user remote console (URC) is a framework used for telemedicine systems to define abstract user interface layers, hubs, and devices. If a researcher wishes to consider a VR AHT or MD for healthcare purposes, in addition to the usability and UX aspects, they may want to assess the sense of presence in the VE. According to the Inner Presence theory (Lee, 2004; Riva and Waterworth, 2014), presence is not necessarily related to media characteristics (e.g., graphic realism) but rather to an everyday life flow that controls actions through a constant intentions-perceptions comparison. In this sense, a VR user may experience the system as usable, as they are able to enact actions thanks to an easy-to-learn interface that tracks user's movements, an understandable game/training structure, and engaging storytelling (Triberti and Riva, 2016). These elements are particularly relevant for videogames and serious games used also for therapeutic purposes (Sáenz-de-Urturi et al., 2015). This conceptualization of presence has relevant consequences when taking clinical practice and change into consideration. VR clinical applications should exploit the transformation of flow (transformative and optimal experience allowed by the sense of presence) to discover and use new and unexpected resources to foster clinical change (Riva et al., 2006, 2016) and consider sensorimotor and cognitive impairments in the old population to customize VR for cognitive (Tuena et al., 2019) or physical (Pedroli et al., 2018) rehabilitation.

This paper aims at systematically reviewing the studies that evaluated feasibility, usability, and UX of assessment and treatment VR systems in healthy aging and age-related clinical conditions. In order to provide an overview of the current research status we analyzed characteristics of participants involved, technological apparatus and use, usability/UX assessments, theoretical framework, and primary outcomes. VR use is classified as the task being accomplished and the training sessions and the aims, which include assessment and rehabilitation. Additionally, we outlined suggestions to assess usability of VR applications for older people in clinical and research contexts.

## METHODS

Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) guidelines were followed (Moher et al., 2009).

## Search Strategy

Three high-profile databases (PubMed, PsycINFO, and Web of Science) were used to perform the computer-based search on 3 September 2019. The string used to carry out the search (Title/Abstract for PubMed, Topic for Web of Science, Abstract for PsycINFO) was as follows: ("aging" OR "frailty" OR "elder*" OR "multimorbidity") AND ("usability" OR "user experience" OR "UX" OR "user centered design" OR "human centered design" OR "human computer interaction") AND ("virtual"). The search resulted in 507 articles for Web of Science, 22 for PubMed, and 20 for PsycINFO (total of 529). We made a first selection by reading titles and abstracts after removing duplicates. A total of 66 manuscripts were chosen for full-text screening. This procedure resulted in 25 experimental studies. See the flow diagram (**Figure 1**) for the paper selection procedure.
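The Boolean structure of the search string can be sketched in a few lines of Python. This is only an illustration of how the three concept blocks combine: the variable and function names are hypothetical, and the database-specific field restrictions (Title/Abstract, Topic, Abstract) are deliberately left out because their syntax differs between databases.

```python
def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

# Concept blocks copied from the search string reported above;
# the block names are illustrative labels, not taken from the paper.
population = ["aging", "frailty", "elder*", "multimorbidity"]
construct = [
    "usability",
    "user experience",
    "UX",
    "user centered design",
    "human centered design",
    "human computer interaction",
]
technology = ["virtual"]

# Blocks are combined with AND, terms within a block with OR.
query = " AND ".join(or_group(block) for block in (population, construct, technology))
print(query)
```

Keeping each concept block as a separate list makes it straightforward to add a synonym to one block without touching the others.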

## Selection Criteria

Studies concerning the usability, UX, and feasibility of VR (see introduction for definition) systems for assessment/monitoring and rehabilitation/empowerment in healthy and pathological aging were included. In particular, we focused on age-related clinical conditions in older people. We excluded articles that did not address the usability of clinical VR systems, that concerned non-age-related conditions falling outside the context of frailty, multimorbidity, or chronicity in aging, or that used technologies not meeting the definition of VR. Additionally, studies for which the full text was not available or for which the abstract lacked basic information for review were removed. Non-English papers, reviews, meeting abstracts, conference proceedings, notes, case reports, letters to the editor, research protocols, patents, editorials, and other editorial materials were also excluded.

## Quality Assessment and Data Abstraction

PRISMA guidelines were strictly followed; search results found by the first author (CT) were shared with the review author (MC) for individual selection of papers in order to reduce the risk of bias, and disagreements were resolved through consensus. The risk of bias for each single study was assessed following the Cochrane guidelines (Higgins et al., 2011) by CT and MC. The research question was formulated according to the suggested PICO guidelines (Abigail et al., 2014): Population, older people aged ≥ 65; Intervention, VR for assessment or rehabilitation in age-related conditions and diseases; Comparison, N/A, as usability studies at this time adopt quasi-experimental or pilot study designs (see also risk of bias in **Supplementary Figure 1**); Outcome, measures of usability and acceptance. The Comparison element is mainly applied to randomized clinical trials, and within our search only one study (Schwenk et al., 2014) satisfied this criterion. The data extracted from each included study were as follows: reference, year, sample(s), aims, technology, VR training, technology design framework, usability/UX/feasibility assessment tools, primary outcomes, and type (assessment/rehabilitative) of VR system.

## RESULTS

Our search identified several usability, user experience (UX), and feasibility studies in healthy aging and age-related clinical conditions. A critical aspect of virtual reality (VR) and new technologies is their interaction with humans and in particular, those whose physical, psychological, or social barriers hamper the use of technological devices. The aim of this systematic review was to analyze the current research in the field of usability of clinical VR systems in older people and to provide an overview on this topic. Findings are shown in **Table 1** according to reference, year, sample(s), aims of the study, VR technology, VR training, theoretical framework, usability assessment, primary outcomes, and clinical aims. **Figures 2**–**8** summarize the results as well.

#### TABLE 1 | Summary of the studies included.

## Which Are the Samples Involved in VR Usability Studies?

The majority of the studies (Kizony et al., 2006; Tsai et al., 2012; Castilla et al., 2013; Corno et al., 2014; Wüest et al., 2014; Im et al., 2015; Morán et al., 2015; Cook and Winkler, 2016; Trombetta et al., 2017; Vanbellingen et al., 2017; Plechatá et al., 2019; Rebsamen et al., 2019) recruited healthy older adults (OA) to assess the usability of clinical VR systems. Two studies collected data from participants ranging from their fifth decade to old age (Kiselev et al., 2015; Money et al., 2019). Nevertheless, in these studies, systems were created for clinical conditions such as stroke (Wüest et al., 2014; Morán et al., 2015; Trombetta et al., 2017; Vanbellingen et al., 2017) or movement disorders (e.g., balance, physical frailty; Pedroli et al., 2018; Money et al., 2019). Indeed, only two studies recruited stroke patients for stroke VR systems (Kizony et al., 2006; Fordell et al., 2011). Sáenz-de-Urturi et al. (2015) and Pedroli et al. (2018) recruited OA and, among these individuals, some had mild or moderate cognitive impairment (O'Bryant et al., 2017). Patients with mixed age-related conditions (e.g., Parkinson's disease, macular degeneration, muscular dystrophy, arthritis, diabetes, hypertension) were recruited in Sáenz-de-Urturi et al. (2015) and Shubert et al. (2015). Frail patients were recruited in Nikitina et al. (2018), mixed frail and physical-motor patients in Brox et al. (2017), participants at risk of falling in Schwenk et al. (2014) and Kiselev et al. (2015), individuals with mild cognitive impairment (MCI) and Alzheimer's disease (AD) in Valladares-Rodriguez et al. (2019), individuals with Parkinson's disease (PD) in van Beek et al. (2019), OA with atrial fibrillation (AF) in Desteghe et al. (2017), and OA with orthopedic impairments in Epelde et al. (2014). 
Experts and medical professionals were included in some pilot studies for their opinion on the design or on the VR system (Castilla et al., 2013; Epelde et al., 2014; Morán et al., 2015; Sáenz-de-Urturi et al., 2015; Brox et al., 2017; Desteghe et al., 2017; Valladares-Rodriguez et al., 2019).

## Which Are the Aims and the Clinical Fields of the Studies?

All of the studies, except one that was principally devoted to the clinical efficacy of the training (Schwenk et al., 2014), were mainly designed to assess the usability, UX, and feasibility of VR systems in aging (Tsai et al., 2012; Castilla et al., 2013; Corno et al., 2014; Epelde et al., 2014; Wüest et al., 2014; Im et al., 2015; Kiselev et al., 2015; Morán et al., 2015; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Cook and Winkler, 2016; Brox et al., 2017; Desteghe et al., 2017; Trombetta et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Pedroli et al., 2018; Money et al., 2019; Plechatá et al., 2019; Rebsamen et al., 2019; Valladares-Rodriguez et al., 2019; van Beek et al., 2019). Most of the studies concerned the assessment of therapeutic (i.e., rehabilitative or psychological empowerment) VR systems (Tsai et al., 2012; Castilla et al., 2013; Corno et al., 2014; Epelde et al., 2014; Wüest et al., 2014; Im et al., 2015; Kiselev et al., 2015; Morán et al., 2015; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Cook and Winkler, 2016; Brox et al., 2017; Desteghe et al., 2017; Trombetta et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Pedroli et al., 2018; Money et al., 2019; Plechatá et al., 2019; Rebsamen et al., 2019; Valladares-Rodriguez et al., 2019; van Beek et al., 2019), whereas only a few were on assessment or monitoring tools (Fordell et al., 2011; Corno et al., 2014; Desteghe et al., 2017; Plechatá et al., 2019; Valladares-Rodriguez et al., 2019). The interventions/assessments in the included studies were physical-motor (e.g., limb physiotherapy, physical activity, hand motricity; Kizony et al., 2006; Epelde et al., 2014; Schwenk et al., 2014; Wüest et al., 2014; Im et al., 2015; Kiselev et al., 2015; Morán et al., 2015; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Brox et al., 2017; Trombetta et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Pedroli et al., 2018; Money et al., 2019; van Beek et al., 2019), neuro/psychological (Fordell et al., 2011; Tsai et al., 2012; Castilla et al., 2013; Corno et al., 2014; Desteghe et al., 2017; Plechatá et al., 2019; Valladares-Rodriguez et al., 2019), cardiovascular fitness (Rebsamen et al., 2019), or non-specific healthcare applications (Cook and Winkler, 2016).

*TABLE 1 (continued). Abbreviations: … and use of technology model; UCD, user-centered design; UEQ, user experience questionnaire; URC, user remote console; UX, user experience; VBT, videogame-based training; VE, virtual environment; V-MT, virtual multitasking test; VR, virtual reality; vSST, virtual supermarket shopping task; YA, young adults.*

## Which Are the VR Technologies Used and the Training?

Non-immersive VR systems (i.e., desktop-based VR, tablet, and mobile app) were used in most of the studies (Kizony et al., 2006; Castilla et al., 2013; Wüest et al., 2014; Morán et al., 2015; Shubert et al., 2015; Cook and Winkler, 2016; Desteghe et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Money et al., 2019; Valladares-Rodriguez et al., 2019; van Beek et al., 2019). An application exclusively for tablets was used in Valladares-Rodriguez et al. (2019) and multi-device (i.e., PC, tablet or mobile) apps in Castilla et al. (2013), Desteghe et al. (2017), and Nikitina et al. (2018). Semi-immersive VR (i.e., large TV or projector screens with sensors for interaction) systems were tested in some studies (Tsai et al., 2012; Epelde et al., 2014; Schwenk et al., 2014; Im et al., 2015; Kiselev et al., 2015; Sáenz-de-Urturi et al., 2015; Brox et al., 2017; Rebsamen et al., 2019), whereas fully immersive VR (i.e., visors or CAVE with interaction devices) was tested only in Fordell et al. (2011), Corno et al. (2014), and Pedroli et al. (2018). Interestingly, Trombetta et al. (2017) compared a semi-immersive vs. an immersive version of the training to evaluate their usability, whereas Plechatá et al. (2019) tested a VR memory test with non-immersive vs. immersive VR.

Sessions lasted from 9 to 90 min (mean of approximately 30 min), ranging from one to 36 sessions spread over the course of a single day to 4 years; indeed, usability along with effectiveness of training was tested for three (Vanbellingen et al., 2017), four (Im et al., 2015; Rebsamen et al., 2019; van Beek et al., 2019), six (Kiselev et al., 2015), eight (Nikitina et al., 2018), and 12 weeks (Wüest et al., 2014; Desteghe et al., 2017), and for three (Brox et al., 2017) and four years (Schwenk et al., 2014). Exergames (i.e., serious games used for balance and fall risk training) were used in most of the motor training (Kizony et al., 2006; Schwenk et al., 2014; Wüest et al., 2014; Kiselev et al., 2015; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Brox et al., 2017; Trombetta et al., 2017; Money et al., 2019; Rebsamen et al., 2019; van Beek et al., 2019), physiotherapy exercise in others (Epelde et al., 2014; Im et al., 2015), cognitive-physical dual-task in Pedroli et al. (2018), neuropsychological testing in four studies (Fordell et al., 2011; Corno et al., 2014; Plechatá et al., 2019; Valladares-Rodriguez et al., 2019), gesture therapy in Morán et al. (2015) and Vanbellingen et al. (2017), and psychosocial support/educational in four studies (Tsai et al., 2012; Castilla et al., 2013; Cook and Winkler, 2016; Desteghe et al., 2017).

## Which Are the Theories and Tools Used to Assess VR?

A key component regarding the use and acceptance of technology is understanding elements that facilitate or reduce its use in terms of human factors, not only in terms of technical ones (Wildenbos et al., 2018). An architecture structure model, the user remote console (URC), was used as a theoretical background to design the physical training in Epelde et al. (2014), whereas the majority of the studies used psychological models to develop the VR systems. The technology acceptance model (TAM) and modified versions were used in most of the studies (Tsai et al., 2012; Wüest et al., 2014; Morán et al., 2015; Cook and Winkler, 2016; Money et al., 2019; Rebsamen et al., 2019; Valladares-Rodriguez et al., 2019), transformation of flow (ToF) in Pedroli et al. (2018), UX playability in two studies (Sáenz-de-Urturi et al., 2015; Valladares-Rodriguez et al., 2019), and the user-centered design model (UCD) in Kiselev et al. (2015) and Brox et al. (2017). Importantly, some studies adopted technological theoretical frameworks adapted for older people, such as the senior UCD or the senior citizens' acceptance of information systems (SCAIS) (Brox et al., 2017; Nikitina et al., 2018). However, several studies did not report a theoretical model to design their systems (Kizony et al., 2006; Castilla et al., 2013; Corno et al., 2014; Schwenk et al., 2014; Im et al., 2015; Shubert et al., 2015; Desteghe et al., 2017; Trombetta et al., 2017; Vanbellingen et al., 2017; Plechatá et al., 2019; van Beek et al., 2019). Concerning the assessment of usability of VR systems, a wide range of quantitative and qualitative methods have been used (see **Table 1** for specific information and **Figures 5**–**7** for models and methods overviews). Concerning quantitative data, the following were used: the system usability scale (SUS) (Corno et al., 2014; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Vanbellingen et al., 2017; Nikitina et al., 2018; Pedroli et al., 2018; Money et al., 2019; Rebsamen et al., 2019; van Beek et al., 2019), TAM-based questionnaires (Tsai et al., 2012; Wüest et al., 2014; Morán et al., 2015; Cook and Winkler, 2016; Rebsamen et al., 2019; Valladares-Rodriguez et al., 2019), UX questionnaires (Schwenk et al., 2014; Sáenz-de-Urturi et al., 2015; Brox et al., 2017; Desteghe et al., 2017; Rebsamen et al., 2019), a UCD-based questionnaire (Brox et al., 2017), a flow of experience scale (Pedroli et al., 2018), other usability questionnaires (Fordell et al., 2011; Epelde et al., 2014; Trombetta et al., 2017; Plechatá et al., 2019; Valladares-Rodriguez et al., 2019), adherence or motivation to training assessed with questionnaires (Im et al., 2015; Desteghe et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Rebsamen et al., 2019; van Beek et al., 2019) or with VR data (Wüest et al., 2014; Kiselev et al., 2015; Desteghe et al., 2017; Vanbellingen et al., 2017; Nikitina et al., 2018; Rebsamen et al., 2019; van Beek et al., 2019), cybersickness assessment (Corno et al., 2014; Im et al., 2015; Plechatá et al., 2019), technology expertise (Corno et al., 2014; Rebsamen et al., 2019; Valladares-Rodriguez et al., 2019), and video analysis (Morán et al., 2015). Regarding qualitative data, the think-aloud technique (Corno et al., 2014; Wüest et al., 2014; Shubert et al., 2015; Money et al., 2019; Rebsamen et al., 2019), heuristic evaluation or cognitive walkthrough (Castilla et al., 2013; Sáenz-de-Urturi et al., 2015), focus groups (Castilla et al., 2013; Epelde et al., 2014; Kiselev et al., 2015; Brox et al., 2017; Desteghe et al., 2017), and semi-structured or structured usability post-experience interviews (Corno et al., 2014; Kiselev et al., 2015; Shubert et al., 2015; Brox et al., 2017; Vanbellingen et al., 2017; Pedroli et al., 2018; Money et al., 2019; van Beek et al., 2019) were used. The sense of presence was assessed only in three studies (Kizony et al., 2006; Nikitina et al., 2018; Pedroli et al., 2018).

Concerning the tools used, a variety of quantitative and qualitative methods are reported. However, it is important to remember that each of these instruments assesses different aspects of usability and acceptance; some are more concerned with the task to perform (e.g., SUS), while others tap the emotional/motivational elements of the interaction (e.g., UX questionnaires) or the factors that hamper/facilitate the use of a technology (e.g., TAM-based tools). Qualitative tools are able to grasp different perspectives (individual or group) on the experience or the design by asking experts in the sector or the end-users themselves. A multidimensional approach emerged in our search and should be preferred when selecting assessment tools.

## Are VR Clinical Systems for the Older People Usable?

In this section we outlined the findings of the included studies, reporting their strengths and weaknesses. **Figure 8** shows mean and standard deviation for the available SUS scores, which display moderate to acceptable usability despite some cases of wide variation.
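For readers less familiar with how SUS values such as those reported below are obtained, the following minimal sketch applies the standard SUS scoring rule: ten items rated from 1 to 5, odd-numbered items contributing the rating minus 1, even-numbered items contributing 5 minus the rating, and the sum multiplied by 2.5 to give a 0 to 100 score. The three response sets are hypothetical and serve only to illustrate the computation of a group mean and standard deviation; they are not data from any included study.

```python
from statistics import mean, stdev

def sus_score(responses):
    """Return the 0-100 SUS score for one respondent's ten 1-5 ratings."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten ratings, each between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5

# Hypothetical ratings from three participants (illustrative only).
ratings = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
scores = [sus_score(r) for r in ratings]
group_mean = mean(scores)
group_sd = stdev(scores)  # sample standard deviation
print(scores, group_mean, group_sd)
```

A group mean computed this way can then be compared with a cut-off such as the SUS > 70 threshold for good usability cited in this section (Bangor et al., 2009).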

Cook and Winkler (2016) showed that OA find virtual environments (VE) from Second Life (SL) feasible and applicable for healthcare purposes, especially for improving social interactions. Despite a high number of drop-outs, participants liked the realism and virtual experience (e.g., sports, changing avatar, teleporting, shopping), but bugs frustrated them and they found it hard to control the avatar and to learn SL. According to users, SL might be improved by clear training (i.e., individualized, small group), step-by-step teaching, enlarging the screen, and facilitating the interaction. The exergame Falls Sensei was rated as engaging and usable for educating OA about fall risk (Money et al., 2019). Falls Sensei was rated as having good usability (SUS score > 70, Bangor et al., 2009), especially by older users. Unified theory of acceptance and use of technology (UTAUT) thematic analysis of interviews (i.e., performance expectancy, effort, social influence) showed that users rated the training as a useful, positive experience, relevant for specific populations. Similarly, the Positive Bike (Pedroli et al., 2018) was rated as having good usability (mean SUS = 76.88, SD = 17). Problems were found concerning the size of items on the screen and the low realism or interaction users felt in the VE, but users still had a positive experience and found the system useful. Stand Tall (ST) (Shubert et al., 2015) was rated by participants as having nearly good usability (mean SUS = 65.5, SD = 21.2); participants agreed to use ST to improve balance autonomously and accepted the Kinect sensor and the avatar. The Senso system (Rebsamen et al., 2019) had high adherence, usability (mean SUS = 93.5, SD = 5.52), enjoyment, usefulness, and acceptability, also confirmed by the think-aloud technique. Similarly, van Beek et al. (2019) found optimal adherence and motivation toward their VR training. 
Despite some interaction issues with LMC and difficulty of the exercises, the system had marginal usability (mean SUS = 58.25, SD = 17.9) and was also rated positively in the interviews. Lineage was evaluated with high satisfaction by its users (Sáenz-de-Urturi et al., 2015). Gaming experience was positive, exercise adequate, and participants stated that they would use the game again. SUS improved across the three sessions (first mean SUS = 73.84, SD = 4.72; third mean SUS = 86.25, SD = 3.06). OA and stroke patients reported acceptable usability for TheraGame (mean SUS = 73.8, SD = 14.5) and also found the VR training adequate and enjoyable (Kizony et al., 2006). Good usability (first session mean SUS = 75.4, SD = 13.8) was found by Vanbellingen et al. (2017) in their upper limb video game with a leap motion controller; however, usability did not change across the nine sessions. The training had a compliance of 87.4% and the adherence was rated as very good and remained stable across time. Users stated that a 30 min session is the best duration to avoid arm fatigue. Optimal (100%) adherence and good acceptance (e.g., ease, usefulness, intention to use) were found by Wüest et al. (2014). Nikitina et al. (2018) found that usability of the virtual gym app did not differ between groups with social interaction (mean SUS = 63, SD = 9) or interaction with coach only (mean SUS = 66, SD = 14). Moreover, the participants positively accepted the app, with high co-presence for the interaction group (interactions occurred especially with private messages), but adherence was similar for individual vs. group exercises, with social support predicting adherence when social connections are low. Although Corno et al. (2014) found that the virtual multitasking test (V-MT) induces cybersickness symptoms, it was rated as usable (mean SUS = 69.17, SD = 8.2); the head-mounted display (HMD) was comfortable and realism sufficient, but interaction with the wand was difficult and instructions were hard to remember. Similar results on HMD were found by Plechatá et al. (2019). The HMD led to worse memory performance than non-immersive VR in OA, who showed no preference between desktop-based and immersive VR, whereas young users liked the immersive version of the virtual supermarket shopping task (vSST). The authors therefore suggest non-immersive scenarios for OA. Fordell et al. (2011) assessed VR-DiSTRO, an immersive VR version of a "paper and pencil" neglect neuropsychological battery, and showed that stroke patients tolerated the assessment and were engaged during it, and that it was much faster than the classic evaluation.

In order to design Game Up exergames and a senior-UCD model (Brox et al., 2017), it is crucial to involve older people and experts to create safe, fun, and usable games. Three-point Likert scale short questionnaires are suggested for end evaluations, whereas in the requirement, design, and implementation phases, interviews, observations, and group discussions are preferred for senior UCD. Similarly, in order to develop the Butler app (Castilla et al., 2013), it is important to gather information from end-users and experts from the first stage of the development and to create prototypes of the app. Graphics and navigation systems must be adequate and understandable for older people in order to reduce mental load. In the same way, the Health Buddies app (Desteghe et al., 2017) was initially designed with the end-users (AF patients and grandchildren). Participants, especially patients, were motivated to use the app, but its usage decreased across 90 days. Although adherence improved in only one patient, users found the app easy to use and educational, and 60% of patients would use the app again. Experts and end-users of a joint rehabilitation virtual therapy were also involved in the evaluation phase in Epelde et al. (2014). Medical professionals and patients positively accepted the virtual therapist and training, but patients stated that the avatar was too serious and lacked empathy. A team of experts developed an augmented reality exergame (Im et al., 2015), which did not have any side effects (e.g., cybersickness) and led to high adherence to the training.

The Interactive Trainer (Kiselev et al., 2015), despite some reported technical problems, was evaluated in interviews as easy to use, challenging, and motivating. Schwenk et al. (2014) assessed the gaming UX of an exergame with sensors, which was found to be effective, fun, easy to learn thanks to feedback, adequate, and well-designed. Interestingly, Valladares-Rodriguez et al. (2019) aimed at assessing UX and player eXperience (PX) of the Panoramix neuropsychological touchscreen battery in OA and in individuals with mild cognitive impairment (MCI) and Alzheimer's disease (AD). They found that perception and acceptance of Panoramix were positive in all groups after the pilot study, although the battery was judged as most playable by OA, followed by MCI and then AD individuals; nevertheless, PX improved after the second interaction in all groups. Additionally, administrators also evaluated the battery as playable, usable, useful, and with a good interface. Morán et al. (2015) used a TAM-based questionnaire and video analysis to assess usability. Users rated the VR gesture therapy (GT) as useful, easy, and with high UX, and technological expertise did not affect task performance. By analyzing verbal and non-verbal reactions, raters judged the system as more usable and fun for non-expert participants. Conversely, anxiety was low for expert users. The authors defined two approach strategies according to expertise, explore-and-learn for inexperienced participants and score-and-complete for experienced participants, which guided behaviors (e.g., anxiety, interaction strategies with the games) and reactions throughout the experience.

Trombetta et al. (2017) compared semi-immersive and fully immersive versions of Motion Rehab AVE 3D. Training was feasible for users, and participants rated feedback, third-person perspective, comfort (semi-immersive version), and immersion (fully immersive version) as important for usability. The authors suggested that, for post-stroke rehabilitation, semi-immersive systems are more comfortable than fully immersive VR. Tsai et al. (2012) showed that Sharetouch is a well-designed, easy, and usable system, independent of gender or age, and facilitates social interactions in OA. Importantly, significant effects of the rehabilitative training on different motor/physical measures were found in all the studies that tested efficacy and usability (Schwenk et al., 2014; Wüest et al., 2014; Im et al., 2015; Vanbellingen et al., 2017; Rebsamen et al., 2019; van Beek et al., 2019). However, risk of bias (see **Supplementary Figure 1**) is high for most of the categories (randomization, allocation, blinding, missing data, and reporting bias), as the majority of the research is quasi- or non-experimental. Of note, the risk of incomplete outcome data was low.

In general, despite some technical weaknesses (e.g., realism, bugs), interaction constraints and physical/psychological barriers to technology use, the included VR studies showed that with adequate usability design methods, it is possible to develop effective and usable systems for clinical purposes in aging.

## DISCUSSION

In the present paper we reviewed the current research on usability, user experience (UX), and feasibility of virtual reality (VR) clinical systems in older people.

Our work can be summarized in the following points: (1) most of the usability pilots involved healthy or heterogeneous diseased older people; (2) usability mainly concerned VR physiotherapy training; (3) most of the studies involved nonimmersive scenarios; (4) quantitative (e.g., SUS) and qualitative (e.g., interviews) methods are the most used and suggested approach in usability piloting and technology acceptance model (TAM) is the main theoretical framework; (5) despite some interaction issues, VR systems are rated as having good usability by end-users.

Usability is a critical and complex issue when specific end-users with particular needs are involved. Conditions that hamper the interaction with the device (Wildenbos et al., 2018), and also cultural and technology background, should be taken into account (Corno et al., 2014; Nikitina et al., 2018). For instance, Tuena et al. (2019) found that executive functions are overloaded by input device use in older people and that this leads to worse memory performance. Design guidelines should be used to avoid basic sensorimotor and interaction issues (see Phiriyapokanon, 2011; Loureiro and Rodrigues, 2014).

Although some of the included studies collected data from the target population (e.g., Parkinson's disease patients tested usability for Parkinson's disease rehabilitation), several others assessed usability with healthy older people or mixed-pathology patients (e.g., Wüest et al., 2014; Sáenz-de-Urturi et al., 2015; Shubert et al., 2015; Trombetta et al., 2017; Vanbellingen et al., 2017); in these studies, diagnostic criteria were not clear or the characteristics of the tested users did not match the potential technology barriers of the intended end-users. Future research should use strict inclusion/exclusion criteria according to diagnostic criteria of the diseases or syndromes. Moreover, in the context of healthcare, the end-users are also the medical professionals that use the technology with the patients. Usability should be assessed via questionnaires or interviews in the design and test phases (e.g., Castilla et al., 2013; Valladares-Rodriguez et al., 2019). Finally, despite some studies reporting the number of participants as a limitation (Corno et al., 2014; Desteghe et al., 2017; Vanbellingen et al., 2017; van Beek et al., 2019), a sample of 5–10 individuals is sufficient to identify at least about 80% of usability issues (Wüest et al., 2014; Brox et al., 2017).
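The 5–10 participant figure can be illustrated with the problem-discovery model usually attributed to Nielsen and Landauer (1993). This model is not taken from the reviewed studies and is used here only as an assumption: if each participant independently reveals a given usability problem with probability p, the expected share of problems found by n participants is 1 − (1 − p)^n. The default p = 0.31 is the average often quoted for that model; actual detection rates vary with the system and the user group, and the function names are illustrative.

```python
def proportion_found(n_users: int, p_detect: float = 0.31) -> float:
    """Expected share of usability problems revealed by n_users participants.

    Assumes every participant detects any given problem independently
    with the same probability p_detect (a simplifying assumption).
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    return 1.0 - (1.0 - p_detect) ** n_users


def users_needed(target: float, p_detect: float = 0.31) -> int:
    """Smallest number of participants whose expected coverage reaches target."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while proportion_found(n, p_detect) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (5, 10):
        print(n, round(proportion_found(n), 3))
    print("participants for 80% coverage:", users_needed(0.80))
```

Under these assumptions, five participants already exceed the 80% threshold, and lowering `p_detect` shows how quickly the required sample grows when problems are harder to detect.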

The uses of VR systems in our review were mainly focused on motor rehabilitation. In healthcare, VR is mainly applied for the assessment and rehabilitation of sensorimotor, physical, and psychological deficits via non-immersive to immersive technologies (Lange et al., 2010; Bohil et al., 2011; García-Betances et al., 2015; Muratore et al., 2019; Tuena et al., 2019). We also encourage the use of pilot studies in other domains where VR is used for clinical purposes. For instance, it is important to evaluate usability of assessment tools (e.g., Pedroli et al., 2015; Desteghe et al., 2017). Mean usability session testing lasted 30 min; nevertheless, depending on the aims of the studies (e.g., memorability), longitudinal usability studies can be done as usability might improve after some sessions (Valladares-Rodriguez et al., 2019). Lastly, future research should focus more on immersive technology as technical development will lead to new forms of immersive VR and costs will be reduced. It is important to also assess these systems because they might lead to reduced cybersickness compared to desktop-based VR (Lange et al., 2010; Bohil et al., 2011; Plechatá et al., 2019).

Several studies (see **Table 1**) did not report a model on which usability and acceptance of a technology can be assumed. TAM-based and UX-based models are useful for investigating and understanding psychological factors, whereas architecture design and user remote control (URC) are more useful for technical development. Indeed, usability, and in particular UX, concern not only ease of use and technical bugs but also psychological domains (e.g., emotions, motivations; Vermeeren et al., 2010). However, as researchers in the context of aging face specific needs and barriers, adapted models with relevant variables should be used, such as the senior user-centered design (UCD) by Brox et al. (2017) or the senior citizens' acceptance of information systems (SCAIS) by Phang et al. (2006). Surprisingly, none of the authors used the senior technology acceptance model (STAM) by Chen and Shou (2014), which could be more suitable than TAM models not adapted to older people. Interestingly, clinical researchers interested in technology usability, sense of presence, and clinical change may want to use the transformation of flow (ToF) theory, as presence and flow experiences might facilitate clinical change by means of VR (Riva et al., 2006).

Usability assessment tools (see **Table 1** and **Figure 7**) should include a mix of quantitative methods (e.g., SUS, TAM-based questionnaires, UX-based questionnaires) and qualitative techniques (e.g., experience interviews, think aloud, heuristic evaluation). The systematic review on telemedicine systems by Klaassen et al. (2016) recommends SUS, TAM2, and PSSUQ and states that questionnaires along with interviews, which are both low-cost and flexible methods, can be used from early to final phases of usability testing. Indeed, questionnaires give useful quantitative data that, however, still need qualitative information to tap individual sources of variation. Therefore, a mixed approach composed of quantitative and qualitative tools is the preferred way to carry out complete, interpretable, and useful usability studies in older people. Additionally, we encourage a critical adoption of assessment tools according to the aims of the study, thus considering the aspects (e.g., individual, group, task, emotions/motivation, acceptance, adherence) to be engaged during the VR interaction.
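As one concrete instance of the quantitative side, the standard scoring rule for the ten-item SUS can be sketched as follows. The rule is stated here from general knowledge of the scale rather than from the reviewed studies: each item is answered on a 1 to 5 scale, odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score between 0 and 100. The function name is illustrative.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses.

    responses[0] is item 1, responses[1] is item 2, and so on.
    Odd-numbered items are positively worded, even-numbered items negatively.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, r in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += r - 1      # positively worded item
        else:
            total += 5 - r      # negatively worded item
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical participant answering neutrally on every item.
    print(sus_score([3] * 10))
```

A single number of this kind summarizes perceived usability but says nothing about why a participant struggled, which is where the interviews and think-aloud data come in.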

Additionally, innovative quantitative techniques could be useful to track unexpected information about psychophysiological responses of the users (e.g., eye-tracking, heart-rate, galvanic skin response, non-verbal communication) to assess their affective and cognitive reactions to the VR system (Morán et al., 2015; Sáenz-de-Urturi et al., 2015). VR can also be used for evaluating usability and adherence (good >80%) by using time spent, number of log-ins, or interaction modality, giving additional quantitative data (Cipresso, 2015; Rebsamen et al., 2019; van Beek et al., 2019). Importantly, when testing immersive VR, cybersickness should always be assessed because it may negatively influence clinical practice and its reduction is a key objective of pilot studies (Kober et al., 2013; Corno et al., 2014; Tuena et al., 2017; Plechatá et al., 2019); virtual embodiment should also be assessed with questionnaires if avatars are used (Kilteni et al., 2012; Gonzalez-Franco and Tabitha, 2018). Finally, in the early design phases, information from end-users (e.g., patients, medical professionals) could be gathered from group interviews or focus groups, where ideas from experts' opinions and needs can be used to guide VR development (Castilla et al., 2013; Brox et al., 2017). For instance, Brox et al. (2017) developed a senior-UCD with a mixed use of quantitative and qualitative methods to design a semi-immersive exergame for older people, through iteration from the early phases to the prototype. Researchers should be aware that step-by-step UCD (e.g., prototype development) and pretesting are critical for clinical VR settings (Novak, 2008; Im et al., 2015). However, we know that time is a limitation to some research projects and, on some occasions, there is no time for longitudinal and proper VR design. When this is not possible, we strongly encourage the use of qualitative and quantitative evaluation of the VR experience.
In the same manner, it would be better to assess usability and acceptance separately from efficacy of a VR system, as quality of patients' healthcare services is intertwined with usability, acceptance, and adherence (Middleton et al., 2013).

Despite some technical and interaction issues (e.g., bugs, interaction difficulties, realism, sensor application), the included studies showed that usability of a wide range of VR clinical systems is good, well-accepted, adequate, effective, and useful. Skepticism of older people and the digital divide are barriers that could be overcome after the use of VR devices (Desteghe et al., 2017), and comfort of immersive VR can be improved by replacing visors with CAVE, although non-HMD systems are considered better for older people (Corno et al., 2014; Pedroli et al., 2018; Plechatá et al., 2019). Nevertheless, a recent study shows that OA positively accept and tolerate HMD VR (Huygelier et al., 2019). Indeed, Fordell et al. (2011) showed that stroke patients enjoyed the immersive VR assessment. However, a future objective in the field is to make sensor application and use easier for this population, as home-based training, where no professional is present to provide assistance, is rising in popularity in VR clinical practice (Schwenk et al., 2014). Moreover, online assistance could be useful to help patients with set-up and exercises (Im et al., 2015; Nikitina et al., 2018). Morán et al. (2015) provided some guidelines concerning the feedback the VR training should give to older users:


Additionally, Teo et al. (2016) provide specific suggestions in their review for VR training in individuals with stroke-related impairments, such as flexible activity according to patients' objectives, the possibility for the therapist to adapt the task online according to the patient's needs, multiplayer services, and automated recording of patient tracking. Moreover, Teo et al. (2016) show that VR can be enriched with neurophysiological tools (e.g., EEG, fNIRS) that the researcher or the clinician can use to adapt the task according to individual effort or needs.

Finally, it is worth mentioning some solutions provided by the Cochrane guidelines (Higgins et al., 2011) that reduce risk of bias in usability experiments. Although blinding is hard to achieve in cognitive/motor rehabilitation trials (VR vs. treatment as usual), randomization, attrition bias, and reporting bias can still be improved: randomization with random number generators, shuffled cards, or dice throws; attrition bias with adequate handling of missing data (e.g., balanced observation, imputation); and reporting bias by specifying adequate hypotheses and primary/secondary outcomes in the introduction, returning to them in the discussion, and reporting adequate analyses in the results section.
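The random-number-generator option can be sketched as a permuted-block allocation, one common way to keep group sizes balanced in small pilot samples. This is a minimal illustration under stated assumptions, not a procedure prescribed by the Cochrane guidelines: the arm labels (`"VR"`, `"TAU"` for treatment as usual), the block size, and the seed are hypothetical, and allocation concealment would still have to be handled separately.

```python
import random


def block_randomize(n_participants, arms=("VR", "TAU"), block_size=4, seed=None):
    """Return an allocation list built from randomly permuted blocks.

    Each block contains every arm equally often, so group sizes never
    differ by more than block_size / len(arms) at any point in recruitment.
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if block_size <= 0 or block_size % len(arms) != 0:
        raise ValueError("block_size must be a positive multiple of the number of arms")

    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible for audit
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]


if __name__ == "__main__":
    print(block_randomize(12, seed=2020))
```

Generating the sequence in advance and storing the seed lets an independent person verify the allocation, which also helps when reporting how randomization was done.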

The present review outlined current VR usability piloting issues and strengths in healthy aging and age-related clinical conditions. In the following section, we provide suggestions for researchers who wish to run usability testing in the context of clinical application of VR systems for older patients.

*PSSUQ, post-study system usability questionnaire; PX, player eXperience; SCAIS, senior citizens' acceptance of information systems; STAM, senior technology acceptance model; SUS, system usability scale; UCD, user-centered design; UTAUTM, unified theory of acceptance and use of technology model; UX, user eXperience; VR, virtual reality.*

## VR-USABILITY SUGGESTIONS FOR THE OLDER PEOPLE (VR-USOP)

In this section, we present some suggestions derived from the findings of the systematic review. VR-USOP is mainly focused on human-interaction factors rather than on technical aspects of developing VR clinical systems. **Table 2** summarizes some suggestions in four steps to follow if researchers and clinicians wish to design and test their VR clinical apparatus with older end-users.

The assessment of potential barriers and facilitators of the end-users, which can also include the medical professionals and technology acceptance models, is the first step. In our opinion, this is crucial as it allows the identification and the development of adequate characteristics of VR interaction and task (step 2). The latter aspects will also be supported by adopting architecture design, senior-UCD, guidelines, and prototyping, thus allowing the definition of usability assessment. In addition, we encourage improving the methodology (risk of bias, see **Supplementary Figure 1**; i.e., randomization, allocation, blinding, handling of missing data, and reporting bias) to overcome the limitations of the available studies analyzed in the present review. VR usability and acceptance assessment should be defined and developed in accordance with the aims of the study (step 3). We suggest a mixed approach with quantitative and qualitative methods (mainly focused on psychological experience of usability) and additional aspects to consider (see **Table 2**). Lastly, we suggest ensuring usability before clinical testing (step 4).

## CONCLUSIONS

This systematic review aimed to provide an overview of state-of-the-art VR clinical systems for older people in relation to usability and to provide researchers with suggestions based on the results of the review. Despite some limitations concerning the criteria used to recruit the samples, the low number of immersive technologies so far tested, and the high risk of bias of the studies, VR systems show good usability and acceptance among older people. A wide variety of quantitative and qualitative methods can be used to evaluate usability. We suggest adopting a mixed methodology with appropriate tools in order to grasp different aspects of usability, acceptability, and user experience and to plan sessions according to the objectives of usability testing. Piloting is a critical aspect of clinical studies with VR technology and we encourage future research to test usability of their applications following VR-USOP.

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## AUTHOR CONTRIBUTIONS

CT wrote the first draft of the manuscript. MS-B supervised and wrote the following drafts of the manuscript. MS-B, FG, CT, EP, AGal, and PT defined the methodology and objectives of the manuscript. MC assessed risk of bias and made the second search strategy. AGag gave framework for the VR-USOP. KG provided clinical expertise and support. GR and FL revised the manuscript. All authors contributed to the revision and final approval of the manuscript.

## FUNDING

This work was funded by the Italian Ministry of Health IRCCS Network on Aging Research roadmap on aging and age-related diseases RRC-2018-2365820.

## ACKNOWLEDGMENTS

CT wishes to thank all the authors for providing useful ideas and supervision during the conceptualization, writing, and revision of the manuscript.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2020.00093/full#supplementary-material

Supplementary Figure 1 | Risk of bias assessment.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Tuena, Pedroli, Trimarchi, Gallucci, Chiappini, Goulene, Gaggioli, Riva, Lattanzio, Giunco and Stramba-Badiale. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Agency and Performance of Reach-to-Grasp With Modified Control of a Virtual Hand: Implications for Rehabilitation

#### Raviraj Nataraj<sup>1,2</sup>\*, Sean Sanford<sup>1,2</sup>, Aniket Shah<sup>1,2</sup> and Mingxiao Liu<sup>1,2</sup>

<sup>1</sup> Movement Control Rehabilitation (MOCORE) Laboratory, Stevens Institute of Technology, Hoboken, NJ, United States, <sup>2</sup> Department of Biomedical Engineering, Stevens Institute of Technology, Hoboken, NJ, United States

#### Edited by:

Pietro Cipresso, Italian Auxological Institute (IRCCS), Italy

#### Reviewed by:

Joon-Ho Shin, National Rehabilitation Center, South Korea

Maria Vittoria Bulgheroni, Ab.Acus, Italy

> \*Correspondence: Raviraj Nataraj rnataraj@stevens.edu

#### Specialty section:

This article was submitted to Motor Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 25 November 2019 Accepted: 19 March 2020 Published: 23 April 2020

#### Citation:

Nataraj R, Sanford S, Shah A and Liu M (2020) Agency and Performance of Reach-to-Grasp With Modified Control of a Virtual Hand: Implications for Rehabilitation. Front. Hum. Neurosci. 14:126. doi: 10.3389/fnhum.2020.00126

This study investigated how modified control of a virtual hand executing reach-to-grasp affects functional performance and agency (perception of control). The objective of this work was to demonstrate positive relationships between reaching performance and grasping agency and motivate greater consideration of agency in movement rehabilitation. We hypothesized that agency and performance have positive correlation across varying control modes of the virtual hand. In this study, each participant controlled motion of a virtual hand through motion of his or her own hand. Control of the virtual hand was modified according to a specific control mode. Each mode involved the virtual hand moving at a modified speed, having noise, or including a level of automation. These specific modes represent potential control features to adapt for a rehabilitation device such as a prosthetic arm and hand. In this study, significant changes in agency and performance were observed across the control modes. Overall, a significant positive relationship (p < 0.001) was observed between the primary performance metric of reach (tracking a minimum path length trajectory) and an implicit measurement of agency (intentional binding). Intentional binding was assessed through participant perceptions of time-intervals between grasp contact and a sound event. Other notable findings include improved movement efficiency (increased smoothness, reduced acceleration) during expression of higher agency and shift toward greater implicit versus explicit agency with higher control speed. Positively relating performance and agency incentivizes control adaptation of powered movement devices, such as prostheses or exoskeletons, to maximize both user engagement and functional performance. Agency-based approaches may foster user-device integration at a cognitive level and facilitate greater clinical retention of the device. Future work should identify robust and automated methods to adapt device control for increased agency. Objectives include how virtual reality (VR) may identify optimal control of real-world devices and assessing real-time agency from neurophysiological signals.

Keywords: cognitive agency, reach to grasp, movement rehabilitation, virtual reality, visual feedback

## INTRODUCTION

Sense of agency during movement intuitively leads to better physical function, but it is not a primary rehabilitation target compared to increased strength or practiced skill (Shepherd, 2001; Yang et al., 2006; Timmermans et al., 2009). Powered devices such as exoskeletons (Rosen et al., 2001; Heo et al., 2012) and prosthetics (Childress, 1973; Li et al., 2010) can inject the mechanical energy to physically assist the user. However, functional performance depends on how well the person can control the device toward intended actions. The ability to control these assistive devices primarily depends on a robust command interface from which the user can reliably trigger device actions. The command interface can infer user intention from mechanical triggers such as switches (Bhadra et al., 2002; Peckham and Knutson, 2005). More "natural" interfaces involve command detection from computational processing of recorded physiological signals such as muscle electromyography (EMG) (Boostani and Moradi, 2003) or brain electroencephalography (EEG) (Wolpaw and McFarland, 2004). Regardless of the interface, functional control is generated from the user's ability to cognitively integrate their intention with observed device actions toward desired performance outcomes. This study investigated how modifying control of a virtual hand executing reach-to-grasp contributed to performance of functional reach and sense of grasp agency. It was hypothesized that control modes inducing higher agency would also demonstrate greater performance. To verify this relationship as broadly applicable, we investigated control modes that are diverse (changes in speed, presence of noise, addition of automation). Such positive associations should motivate greater consideration of agency in movement rehabilitation.

Sense of agency is defined as the perception of control over actions and related sensory consequences (Moore and Obhi, 2012). Since sensorimotor control of functional movements involves sequences of motor actions continually modulated by sensory feedback (Todorov, 2004), measuring agency by action-consequence events may be especially pertinent and effective in methods to rehabilitate movement. Significant previous work has demonstrated conditions under which sense of agency is generated and modulated (Moore, 2016; Haggard, 2017, 2019). These conditions include voluntary versus involuntary movements (Haggard et al., 2002), matching actual and expected consequences (Frith et al., 2000; Blakemore et al., 2002), and the effects of external cues (Moore et al., 2009). Thus, experimental conditions may be constructed to provide cues that boost agency, but it is unclear if greater agency is related to better movement performance and which conditions may precipitate both. If clear links between agency and movement performance were established, methods to adapt device control for better cognitive engagement and ability with a device may be better pursued. Greater perception of control would naturally engage the user, and user ability is inherently reflected through greater performance. Engagement and ability are vital factors for clinical retention of device-based rehabilitation. Such approaches are especially beneficial for developing sensorimotor prostheses (Marasco et al., 2018) and powered exoskeletons (Farris et al., 2013) that restore function after neurological trauma. Individuals with brain injury, spinal cord injury (SCI), or amputation may undergo intensive therapy to improve both physical and cognitive skills in re-learning functional movements with devices.

A major advancement in rehabilitation device technology would be the creation of methods that optimize not only user-device mechanics but also cognitive engagement of the user. Systematically identifying user agency and adapting device control accordingly may produce better performing, cognition-driven rehabilitation devices. Ultimately, clinical retention of rehabilitation devices is predicated on user perception of utility (Phillips and Zhao, 1993; Hughes et al., 2014). Methods that leverage perception metrics, such as agency, can also facilitate more usage of rehabilitation devices. Devices for rehabilitation are those that improve movement function for persons with neuromuscular dysfunction. We classify as rehabilitation devices those that either provide powered movement assistance or train independent function through robotic and computer interfaces. In both cases, greater cognitive engagement and involvement due to user agency in controlling the device should facilitate better, and more natural, performance.

Intentional binding is an established implicit measure for agency. It indicates how coupled one perceives an intended action to an expected sensory consequence (Haggard et al., 2002; Moore and Obhi, 2012). Intentional binding refers to the perceived compression in time between a movement and its consequences during voluntary control (Haggard et al., 2002). The classical construct for intentional binding involved action of a key press to trigger the delayed onset of a sound tone. Participants would judge the time duration between key press and tone. A perceptual shift toward compression of time was shown when the key press was voluntary versus an involuntary twitch induced by transcranial magnetic stimulation. This binding effect is considered implicit since it is specific to voluntary action while passively induced actions can produce a reversal of this effect (Moore et al., 2012). Intentional binding has been used to show the influence of sensorimotor processes on agency through internal prediction and external action outcomes (Haggard et al., 2002; Moore and Haggard, 2008; Moore and Obhi, 2012; Frith and Haggard, 2018). Physical rehabilitation methods could be well served to monitor agency during the recovery and reformulation of sensorimotor pathways after neurotrauma. Intentional binding metrics for agency have already been used for human computer interaction to show the sensitivity of implicit agency to particular input modalities (Coyle et al., 2012; Limerick et al., 2014). Furthermore, it has been shown that brain machine interfaces (BMIs) can generate experiences of explicit agency in users similar to bodily movements (Evans et al., 2015). Explicit agency requires subjects to provide higher-order, conscious assessments of perception of control for given conditions (Moore et al., 2012). 
Given the sensitivity of both implicit and explicit agency to external cues, a variety of sensory feedback paradigms may be employed to train user-device integration centered on agency. As such, the effects of varying device control on both implicit and explicit agency should be examined.
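A minimal sketch of how an interval-estimation measure of intentional binding could be computed. It assumes binding is expressed as the actual action-to-tone interval minus the participant's estimate, so that larger positive values indicate stronger perceived compression of time. The sign convention, the function names, and the millisecond values are illustrative assumptions, not the procedure of any study cited above.

```python
from statistics import mean


def binding_per_trial(actual_ms, estimated_ms):
    """Return per-trial underestimation of the action-to-tone interval (ms).

    Positive values mean the interval was perceived as shorter than it was,
    which is the direction associated with voluntary action.
    """
    if len(actual_ms) != len(estimated_ms):
        raise ValueError("actual and estimated intervals must have equal length")
    if not actual_ms:
        raise ValueError("at least one trial is required")
    return [a - e for a, e in zip(actual_ms, estimated_ms)]


def mean_binding(actual_ms, estimated_ms):
    """Average underestimation across trials for one condition."""
    return mean(binding_per_trial(actual_ms, estimated_ms))


if __name__ == "__main__":
    # Hypothetical trials: true delays and one participant's verbal estimates.
    actual = [300, 500, 700]
    voluntary_estimates = [240, 410, 620]
    print(mean_binding(actual, voluntary_estimates))
```

Comparing this average between conditions (for example, voluntary versus passively induced action) is what turns a raw timing judgment into an implicit index of agency.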

Virtual reality (VR) is an attractive platform to develop customized methods for user-device integration and agency-based rehabilitation. For the user, VR is proven to enhance cognitive engagement in performing repetitive physical therapy movements (Sveistrup, 2004; Saleh et al., 2017). VR is readily programmable (Todorov et al., 2012) to customize visual projections of user actions and their consequences in functional task performance. Visual feedback from VR can modulate both sense of agency (Moore and Fletcher, 2012) and control of functional movements like reaching (Desmurget and Grafton, 2000; Saunders and Knill, 2003; Nataraj et al., 2014b) and grasping (Winges et al., 2003; Nataraj et al., 2014a). Reach-to-grasp is a fundamental human action and is commonly targeted for rehabilitation following neuromuscular dysfunction (Lin et al., 2007; Loureiro and Harwin, 2007) and can be assisted with powered devices triggered by user command actions (Popovic, 2003; Kotecha et al., 2014). With neurotrauma such as SCI, visual capabilities are still largely intact and can be leveraged further in VR to partially compensate for loss of other senses (Ghez et al., 1995) such as touch and proprioception. For rehabilitation devices, such as prostheses and exoskeletons, VR platforms can be flexibly constructed to train complex interfaces involving direct physiological access (Kuiken et al., 2009; Marasco et al., 2018) or powered actuation of limbs (Hartigan et al., 2015). VR could be employed to match user intentions to optimal parameters for controlling a device using visual projections of device actions following user commands. Control parameters include feedback gains to maximize performance and minimize effort (Nataraj and van den Bogert, 2017) and to achieve desired movement features such as smoothness (Hogan and Sternad, 2009).
Ultimately, VR platforms may be utilized to efficiently identify control parameters of rehabilitation devices that optimize not only functional mechanics but also user agency prior to eventual translation to real-world systems (Caldwell et al., 1995, 1998; Bar-Cohen, 2003; Perry et al., 2007).

In this study, a VR environment was utilized to couple reach and grasp "actions" to programmed sensory "consequences" (visual and sound events). Participants triggered movement control of the virtual hand through movement of their own hand. The visually observed movement of the virtual hand depended on the specific control mode. The control mode defined at what fixed speed the virtual hand would move proportional to the real hand and if virtual movement included noise or assisted automation. We investigated how changes in user control of a virtual hand prosthesis (Johannes et al., 2011) during reach-to-grasp may generate effects across both sense of agency and functional task performance. Visual cues informed the participant about initiating and pacing the reach, where to grasp, and when grasp action was successfully completed. The primary performance metric was reducing position error of the participant's hand to a minimal path-length trajectory at a fixed velocity. As with previous intentional binding studies (Moore and Obhi, 2012), a sound cue (beep) was used as the consequence to an intended action (grasp). Participants provided verbal estimates of lapsed time intervals between action and consequence to infer agency implicitly via intentional binding across the various control modes. The control modes of the virtual hand were consistent with parameters commonly adapted for a movement rehabilitation device, and included: setpoints for speed (Blaya and Herr, 2004; Wege et al., 2005), noise mitigation (Taylor et al., 2002; Agostini and Knaflitz, 2012), and a level of automated assistance (Ronsse et al., 2010, 2011). Speed, noise, and automation are fundamental control parameters that a device engineer can ad hoc tune based on stated user preferences or anecdotal observation of performance (Terenzi, 1998). 
Alternatively, these parameters can also be determined through optimization of mechanical performance (e.g., effort, tracking) for a model system (Davoodi et al., 2007; Nataraj and van den Bogert, 2017). Neither approach systematically adapts the device according to user agency. The major implication of this study is how a subjective metric of perception in control of a virtual device (hand) can be related to objective performance (reaching) with that device. In this study, the control modes were enacted as deviations from an optimal ("Baseline") mode, at which the virtual hand moved to match the actual hand movements and agency is expected to be highest.
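The three kinds of control modification named above (speed, noise, automation) can be made concrete with a small sketch. Everything here is a hypothetical illustration rather than the study's implementation: it assumes speed is a constant gain on the real-hand displacement, automation is a linear blend toward an ideal displacement, noise is zero-mean Gaussian, and the minimum path-length trajectory is the straight segment from start to target.

```python
import math
import random


def virtual_step(real_delta, gain=1.0, noise_sd=0.0, assist=0.0,
                 ideal_delta=None, rng=None):
    """Map one real-hand displacement (x, y, z) to a virtual-hand displacement.

    gain      scales speed relative to the real hand (1.0 = matched motion)
    assist    blends toward ideal_delta (0.0 = none, 1.0 = fully automated)
    noise_sd  standard deviation of additive Gaussian noise per axis
    """
    if not 0.0 <= assist <= 1.0:
        raise ValueError("assist must be in [0, 1]")
    rng = rng or random
    out = []
    for axis, d in enumerate(real_delta):
        v = gain * d
        if ideal_delta is not None:
            v = (1.0 - assist) * v + assist * ideal_delta[axis]
        if noise_sd > 0:
            v += rng.gauss(0.0, noise_sd)
        out.append(v)
    return tuple(out)


def distance_to_straight_path(point, start, target):
    """Shortest distance from point to the segment start -> target."""
    seg = [t - s for s, t in zip(start, target)]
    rel = [p - s for s, p in zip(start, point)]
    seg_len_sq = sum(c * c for c in seg)
    if seg_len_sq == 0:
        return math.dist(point, start)
    # Projection parameter clamped so the closest point stays on the segment.
    t = max(0.0, min(1.0, sum(r * c for r, c in zip(rel, seg)) / seg_len_sq))
    closest = [s + t * c for s, c in zip(start, seg)]
    return math.dist(point, closest)


if __name__ == "__main__":
    print(virtual_step((1.0, 2.0, 0.0), gain=2.0))
    print(distance_to_straight_path((0.5, 1.0, 0.0), (0, 0, 0), (1, 0, 0)))
```

With `gain=1.0`, `noise_sd=0.0`, and `assist=0.0` the mapping reduces to matched motion, which corresponds to the idea of a baseline mode from which the other modes deviate.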

Unlike previous studies that identified agency for movement initiation (Haggard et al., 2002), this study investigated how agency of grasp execution was modulated by the control mode of the preceding reach. In this way, it was inferred how control during reaching may facilitate or inhibit agency of the terminating action of grasp and performance of the reach itself. Previous studies have shown the direct link of agency between continuous movements and terminal events (Wen et al., 2015; Oishi et al., 2018). In this study, we prioritized and considered implicit agency by time-interval estimation as a less biased (more sub-conscious) perceptive measure. With time-interval estimation, a quantifiable measure was provided each trial that was not readily linked to a conscious preference to a control mode. The main hypotheses of this VR reach-to-grasp study were: (1) implicit grasp agency and reaching performance are positively related across a broad class of control modes typically considered for rehabilitation devices, (2) significant differences in both implicit agency and performance are observable between these control modes. While our primary hypotheses considered implicit agency, we additionally examined explicit perception of each control mode with Likert-scale survey responses. The purpose of the survey responses was to observe how implicit and explicit agency may be related through the presented control modes of this study. Another important implication of translating agency to more effective rehabilitation device control is greater performance efficiency. Thus, the secondary hypotheses of this study were: (1) there are significant shifts between implicit and explicit agency across control modes, (2) agency is positively related to performance efficiency, and (3) significant differences in efficiency exist between control modes.

## MATERIALS AND METHODS

In this experimental protocol, participants controlled a virtual hand to perform reach-to-grasp through movements of their own hand (**Figure 1**). The observed movements of the virtual hand were initially based on those of the real hand ("Baseline" case) but modified depending on the other control modes tested. The modifications from Baseline involved fixed changes in speed, addition of noise, or inclusion of automation. Participants were asked to maximize performance (primarily by moving their own hand to minimize reaching path length at a target velocity) and to provide verbal estimates of perceived time-intervals between grasp action and a sound consequence for implicit assessment of agency.

## Participants

A total of 16 able-bodied volunteers (12 male, 4 female, 20.9 ± 3.2 years) were recruited to participate in this study. A power analysis for one-way ANOVA at 95% suggested that eight-participant samples would show significant differences (α = 0.05) in implicit agency and reaching performance. In this power analysis, performance was for minimizing path length (see "Data and statistical analysis") across the tested control modes during a pilot study (Shah et al., 2018). Only right-handed participants were tested for right-hand reach-to-grasp to avoid considering effects of hand dominance. All participants had normal or corrected-to-normal vision and neither reported nor demonstrated a history of disease, injury or complications involving cognition or upper extremity function. All participants signed an informed consent form approved by the Stevens Institutional Review Board.

## Equipment (Hardware and Software)

A marker-based motion capture system was used to track 3-D hand motions and correspondingly control a virtual model of a prosthetic hand [MPL, Modular Prosthetic Limb (Johannes et al., 2011)]. The hand was viewed in a VR environment with advanced contact mechanics [Multi-Joint Dynamics with Contact, MuJoCo, Roboti LLC, Seattle, Washington, United States (Todorov et al., 2012)]. The motion capture system included nine infra-red cameras (Prime 17W by Optitrack, NaturalPoint Inc., Corvallis, OR, United States) to track 3-D position and orientation of three retroreflective marker clusters. The first cluster included three markers (9 mm diameter) that were Velcro-affixed in a non-collinear arrangement on a worn glove at the dorsal side of the hand (midpoint of third metacarpal). This "hand" cluster served as a reference coordinate system mapping real-time changes in position and orientation to the virtual hand. Similarly, two additional clusters with smaller markers (4 mm diameter) were placed on the nails of the index finger and thumb. These nail clusters were affixed to 3-D printed platforms that attached to the nails using double-sided adhesive tape. Coordinate systems represented by these nail clusters drove position and orientation of the distal segments of the respective digits. Joint angle changes across the digits were based on real-time inverse kinematics solutions sufficiently satisfying the position and orientation constraints of all three clusters. Position constraints for the nail clusters were relative to the hand cluster and scaled for each participant's hand size to match the virtual hand size. Only the thumb and index finger were tracked and animated on the virtual hand as the functional task was reach to precision grasp (Nataraj et al., 2014a), requiring focus on smaller objects.
Real-time streaming of marker data to manipulate the VR environment was done using the motion capture software (Motive by Optitrack) and API code written in MATLAB (Mathworks Inc., Natick, MA, United States) running on a Dell Workstation. All data was processed at 120 Hz.

## Protocol

#### Participant Preparation

Upon arriving at the laboratory, participants were re-informed about the protocol and their right-hand size was measured. Hand size was measured as the maximum spread distance from tip of thumb to tip of index finger. The average hand size was 15.2 ± 0.95 cm. For each participant, hand size was used to spatially calibrate motions of the index finger and thumb clusters relative to the hand cluster of the real hand to those of the virtual hand. Each participant was seated with chair height adjusted so that the reaching arm would be table-supported to initially have: the elbow at a right angle, shoulders comfortably level, and upper arm at the participant's side (**Figure 2A**). Each participant then wore a glove (**Figure 2B**) with the hand marker cluster attached. A marker cluster was then added to each of the index finger and thumb nails. An Oculus® Rift headset (Facebook Technologies, LLC) displaying a custom virtual environment (MuJoCo), as seen in **Figure 2C**, was then placed over the participant's head and eyes. A noise-canceling headset (Bose® QuietComfort 35) was then placed over the participant's ears to minimize audible distractions so that the participant primarily heard only an occasional beep tone (sound consequence) as part of the experimental task.

#### Virtual Reality Calibration Procedures

The Oculus display filled the participant's entire field of view with the virtual environment. Participants were able to find an initial starting position for their real hand based on tactile sensation of a Velcro strip on the support table. The view within the virtual environment was initially calibrated such that the hand marker cluster position of the real hand was coincident with the same landmark position of the virtual hand. In front of the participant's virtual view was a sphere (7 cm diameter) that served as the target the participant reached toward and grasped each trial. The virtual sphere was located 20 cm above and 25 cm anterior to the initial hand cluster position. Two tracks for speed pacers were also within view. One pacer moved forward and the other vertically to inform the participant about the target hand velocity in each dimension. The tracks were semi-transparent to subtly cue the participant about speed without distracting visual focus from the virtual hand. The pacer speeds were set to traverse each dimension in 4 s.

#### Virtual Reality Task

Each trial, the participant was cued by countdown to begin performing reach-to-grasp (**Figure 3**). The countdown for a trial was represented by color transitions of the target sphere as follows: red at trial time (t) = −2 s, to yellow at t = −1 s, and to green at t = 0 s, at which time the speed pacers, moving at constant velocity, began to move and the participant should initiate hand movement. The pacers ceased movement after t = 4 s or earlier when the participant made premature grasp contact. Participants were told to maximize reach-to-grasp performance across three criteria: (1) minimize reaching path length, (2) match hand reaching velocity to speed pacers and complete reach-to-grasp in precisely 4 s, and (3) grasp the target sphere with thumb and index finger at consistent locations. Participants were told that reaching performance was primarily evaluated in this study but to self-consider all three performance criteria to promote task consistency. Each trial lasted up to 10 s as the participant had 7 s to complete reach-to-grasp with the goal to complete in precisely 4 s. Although natural reach-to-grasp is executed nominally at 1 s (van Vliet and Sheridan, 2007), reaching time with a neural controlled robotic device can be notably slower (∼6 s) (Hochberg et al., 2012). In this study, ecological validity for reach performance and grasp agency was intended more for device control.

When the virtual hand grasped the target sphere with both the index and thumb digits, the sphere instantly changed color from green to black and the virtual environment froze in place. This color-change event cued the participant that grasp action was successfully completed. A short-duration (∼100 ms), moderate-pitch beep was sounded to the participant's headset at a variable time-interval following grasp action. The participant was asked to verbally estimate the time-interval to the best of their abilities after each beep. The participant was previously instructed that the interval for each trial was anywhere from 100 to 1000 ms in increments of 100 ms. The actual intervals were always 100, 300, 500, 700, or 900 ms. For each block of trials to test a specific control mode, the number of trials presented at a given time-interval was based on a Gaussian distribution centered at 500 ms. This approach in presenting time-intervals was modified from previous intentional binding experiments that assessed agency with a uniform distribution of intervals at 300, 500, and 700 ms (Caspar et al., 2015). Pilot data revealed that these modifications facilitated a distribution of estimates necessary to infer differences in agency across several control modes (6 in all, see next section). Greater underestimation of time-intervals indicated greater compression (shortening) of the perceived time-interval and implicitly demonstrated greater agency (Haggard et al., 2002). As with other intentional binding experiments, our implicit measure of agency served as a more sub-conscious perception of control.
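The implicit agency measure described above can be illustrated with a minimal sketch. It assumes that agency for a block is summarized as the mean underestimation (actual minus estimated interval) in milliseconds; the function name `implicit_agency_ms` and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def implicit_agency_ms(actual_ms, estimated_ms):
    """Mean underestimation (actual - estimated) of time-intervals, in ms.

    Positive values indicate perceived compression of the interval
    (intentional binding), interpreted here as greater implicit agency.
    """
    if not actual_ms or len(actual_ms) != len(estimated_ms):
        raise ValueError("need two non-empty sequences of equal length")
    return mean(a - e for a, e in zip(actual_ms, estimated_ms))


# Hypothetical trials: actual intervals drawn from {100, 300, 500, 700, 900} ms,
# verbal estimates reported in 100 ms steps.
actual = [100, 300, 500, 500, 700, 900]
estimated = [100, 200, 400, 500, 500, 700]
agency = implicit_agency_ms(actual, estimated)
print(agency)
```

A block in which estimates are systematically shorter than the actual intervals yields a positive value, whereas overestimation yields a negative value.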

#### Varying Control Modes

Each participant performed the reach-to-grasp task under six different control modes. As previously described, the control modes examined in this study considered modifications in speed, addition of mild noise, and automation. The participant was aware of each control mode being tested through visual feedback of the virtual hand in reference to their own moving hand. The test cases of control modes were as follows:


used for comparison to other cases. The second block was done to compare agency and performance to the first block and verify possible changes due to fatigue or learning across the session.


target sphere against transparent speed pacer tracks, Bottom) Sphere color changes with hand transitions across trial time t = –3 to +7 s (10 s total). Countdown occurs from t = –3 to 0 s. After countdown, hand should be in "motion" during time sphere is green.

average of the participant's real hand position (pos<sub>real</sub>) and a pre-defined optimal position (pos<sub>opt</sub>) corresponding to the minimal path trajectory. The virtual hand position was given as:

$$pos_{VR-hand} = \left(1 - \frac{t_{reach}}{4}\right) \times pos_{real} + \frac{t_{reach}}{4} \times pos_{opt}$$

At t<sub>reach</sub> = 4 s, the virtual hand was guaranteed to be very near the sphere, but the participant must still volitionally perform grasp to complete the trial. This automated case was akin to user initiation of movement to trigger device assistance and auto-complete the movement (Lucas et al., 2004).
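The time-weighted blending used for this automated case can be sketched as follows. This is an illustrative sketch only: the function name `blended_position` is hypothetical, positions are assumed to be (x, y, z) tuples in meters, and the clamping of the weight to the range 0 to 1 outside the 4 s reach window is an assumption not stated in the text.

```python
def blended_position(pos_real, pos_opt, t_reach, t_total=4.0):
    """Blend real and optimal hand positions with a weight that grows with time.

    The weight on the optimal position is t_reach / t_total, so the virtual
    hand follows the real hand at t_reach = 0 and the optimal path at
    t_reach = t_total. Clamping to [0, 1] is an assumption of this sketch.
    """
    w = min(max(t_reach / t_total, 0.0), 1.0)
    return tuple((1.0 - w) * r + w * o for r, o in zip(pos_real, pos_opt))


# Hypothetical positions (meters): halfway through the reach the virtual hand
# sits midway between the real and optimal positions.
print(blended_position((0.0, 0.0, 0.0), (0.0, 0.2, 0.4), t_reach=2.0))
```

With this weighting, user control dominates early in the reach and automation dominates late, which matches the description of the virtual hand being very near the sphere at 4 s regardless of the real hand position.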

#### Experimental Testing Blocks

Participants would perform a block of 20 consecutive trials for each of the six control modes. The first three trials of every block were "practice" with the time-interval between grasp contact and the beep fixed at 1 s. The participant was aware these practice trials served to gain mild familiarity with the control mode and to re-calibrate their internal reference of a 1 s time-interval. The remaining 17 test trials were used for agency and performance assessment with time-intervals to be estimated ranging from 100 to 1000 ms as previously described. After each trial, the VR hand was reset to the initial position prior to the 3-s countdown to initiate movement for the second trial. Each participant was given up to 5 min between blocks to rest and complete a survey to rate their experience for that block.

#### Surveys

After each block, the participant was presented with a one-statement survey to express their subjective perception of the control mode presented. Participants were asked to rate, on a 5-point Likert scale (−2 = strongly disagree, +2 = strongly agree), to what extent they agreed that the visualized hand motions reflected their intentions. The specific statement read "the visualized hand motions reflected your intentions." The survey responses served as an explicit, or conscious, measure of agency (Moore et al., 2012; Dewey and Knoblich, 2014) for each control mode. The single survey was presented at the end of each block to ensure subjects accommodated to a control mode prior to making a conscious subjective assessment.

## Data and Statistical Analysis

The primary performance metric evaluated across control modes was the inverse of path length error to a minimal path length trajectory occurring at constant velocity over 4 s. The total 3D minimal path length was 0.32 m, and for completion in 4 s, the target constant velocity is 0.08 m/s. The total error in three dimensions (3D) was computed for the position of the hand cluster from the target position trace over time. In each dimension, the target trajectory was a linear (constant velocity) position trace that directly (straight line) connects the initial hand position to a position near the sphere from which it can immediately be grasped. The time course of each target trajectory was coincident with the 4 s duration of the constant-speed pacers. Additional performance metrics evaluated in this study involved efficiency of movement. These metrics included greater smoothness (Hogan and Sternad, 2009) and lower 3D acceleration given a constant velocity target. These movement performance metrics were explicitly computed for each trial as follows:
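As a quick check of the stated target, 0.32 m traversed in 4 s gives 0.32 / 4 = 0.08 m/s, and a straight line to a point 20 cm above and 25 cm anterior to the start has length √(0.20² + 0.25²) ≈ 0.32 m. The tracking error against such a ramp can be sketched as below. This is a sketch under stated assumptions: the function names are hypothetical, the per-sample error is taken as the Euclidean distance to the target position, samples are assumed to start at t = 0 at 120 Hz, and the end point is a placeholder for the "position near the sphere" described in the text.

```python
import math


def target_position(start, end, t, duration=4.0):
    """Position on a straight-line, constant-velocity ramp from start to end."""
    frac = min(max(t / duration, 0.0), 1.0)
    return tuple(s + frac * (e - s) for s, e in zip(start, end))


def mean_tracking_error(samples, start, end, fs=120.0, duration=4.0):
    """Mean 3D distance between hand samples and the target ramp over 0..duration."""
    errors = []
    for i, p in enumerate(samples):
        t = i / fs
        if t > duration:
            break
        errors.append(math.dist(p, target_position(start, end, t, duration)))
    return sum(errors) / len(errors)


# Hypothetical reach: the hand follows the ramp with a constant 1 cm lateral offset.
start, end = (0.0, 0.0, 0.0), (0.0, 0.25, 0.20)
samples = [(0.01, 0.25 * i / 480, 0.20 * i / 480) for i in range(481)]
error = mean_tracking_error(samples, start, end)
print(error, 1.0 / error)  # mean error (m) and its inverse (1/m)
```

The inverse of the mean error is reported so that larger values correspond to better performance, in line with the metric definition above.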

**Pathlength**(over entire reach) →


$$P = \sum_{i=1}^{N} \sqrt{(px_{i+1} - px_{i})^2 + (py_{i+1} - py_{i})^2 + (pz_{i+1} - pz_{i})^2}$$

where

i = time index

N = total number of time-points until grasp contact at sampling frequency (120 Hz)

px, py, pz = x, y, z position of hand marker-cluster

$$\text{Inverse Pathlength} \to \text{P}^{-1} = \frac{1}{\text{P}}$$
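The path length definition above can be sketched in a few lines. The function names are hypothetical, positions are assumed to be (x, y, z) tuples in meters, and the sum runs over the segments between consecutive samples.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive 3D samples."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))


def inverse_path_length(points):
    """Inverse of path length, so that shorter reaches give larger values."""
    return 1.0 / path_length(points)


# Hypothetical three-sample reach with one 0.5 m segment and one zero-length segment.
reach = [(0.0, 0.0, 0.0), (0.0, 0.3, 0.4), (0.0, 0.3, 0.4)]
print(path_length(reach), inverse_path_length(reach))
```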

**Kinematics** (at each time index) →

$$vx_{i+1} = \frac{px_{i+1} - px_i}{\Delta t}, \quad ax_{i+1} = \frac{vx_{i+1} - vx_i}{\Delta t}, \quad jx_{i+1} = \frac{ax_{i+1} - ax_i}{\Delta t}$$

where

vx, ax, jx = velocity, acceleration, and jerk of hand marker-cluster in x-dimension (repeated for y- and z-dimension) at given time index. Δt = 1/120 s. A moving mean window of 12 time points (0.1 s given sampling frequency of 120 Hz) was employed for smoothing kinematic trajectories.

**Total 3D Acceleration** (at each time index) →

$$Acc_i = \sqrt{ax_i^2 + ay_i^2 + az_i^2}$$

**Total Smoothness** (over entire reach) →

$$S_{tot} = S_x + S_y + S_z$$

where

$$S_x = \sum_{i=1}^{N} jx_{i+1}^2$$

(smoothness in each dimension, e.g., x-dimension)

$$S'_x = S_x \frac{D^3}{\overline{vx}^2}$$

(unitless smoothness in each dimension)

D = total duration of reach

v̄x = mean velocity in x-dimension during reach

**Inverse Smoothness** →

$$S_{tot}^{-1} = \frac{1}{S_{tot}}$$

$$\text{Controller Efficiency} \to \text{CE} = \frac{S_{tot}^{-1}}{Acc}$$

where

Acc = mean total 3D acceleration during reach
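The kinematic and efficiency definitions above can be sketched for a single dimension as follows. This is a sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, the squared-jerk sum follows the formula as written (without multiplying by Δt), the moving-mean smoothing step is omitted, and the example trajectory is synthetic.

```python
import math


def finite_diff(x, dt):
    """First-order finite difference of a sampled signal."""
    return [(x[i + 1] - x[i]) / dt for i in range(len(x) - 1)]


def scaled_jerk(pos, dt):
    """Sum of squared jerk in one dimension, scaled by D^3 / (mean velocity)^2."""
    v = finite_diff(pos, dt)
    a = finite_diff(v, dt)
    j = finite_diff(a, dt)
    duration = (len(pos) - 1) * dt
    v_mean = sum(v) / len(v)
    return sum(jk ** 2 for jk in j) * duration ** 3 / v_mean ** 2


def mean_total_acceleration(ax, ay, az):
    """Mean over time of the 3D acceleration magnitude."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    return sum(mags) / len(mags)


def controller_efficiency(s_tot, acc_mean):
    """Inverse total smoothness term divided by mean total 3D acceleration."""
    return (1.0 / s_tot) / acc_mean


# Synthetic 1D example: pos = t^3 sampled at dt = 1 has constant jerk of 6.
print(scaled_jerk([0.0, 1.0, 8.0, 27.0, 64.0], dt=1.0))
```

In practice the same scaled jerk term would be computed for the y- and z-dimensions and summed before inversion, and a larger controller efficiency then indicates smoother movement per unit of corrective acceleration.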

The following statistical analyses were performed:


A paired t-test (two-tailed) was used to assess difference in agency and performance between the Baseline test block at the start of the session versus the end of the session.

## RESULTS

This study demonstrates the effects of varying control modes of a virtual hand on agency and performance of reach-to-grasp. Results are organized as follows: preliminary considerations of agency and performance of the reach-to-grasp task, agency and performance across control modes, changes in movement efficiency (e.g., smoothness) across control modes, and path length kinematics during high agency versus low agency.

## Preliminary Considerations of Reach-to-Grasp Agency and Performance

The reach phase decreased agency of grasp compared to the grasp-only test case as shown in **Figure 4A**. No significant change in agency was observed between the Baseline test blocks across the session (**Figure 4B**). There was a significant reduction in reaching performance (inverse of mean error to minimal path length trajectory) between the Baseline test block from start (14.7 m<sup>−1</sup>) to end (13.1 m<sup>−1</sup>) of the session (**Figure 4C**). Due to the observed reduction in Baseline performance, performance data across the session were adjusted by a linear correction factor. The correction factor was applied uniformly across sequential test blocks proportional to the reduction in Baseline performance from start to end of the session.

FIGURE 5 | Example tracking of target path length shown for one subject during "Baseline" test case. The target path length changes linearly in time ("ramp") in each of the three dimensions (3D). The primary performance metric in this study was the average total 3D tracking error during the time period of the target ramp (between t = 0 and 4 s).

## Effect of Control Mode on Agency and Reaching Performance

The mean total 3D tracking error of the target minimal path length across time was the primary performance metric in this study. Example performance to track a minimal path length trajectory is shown in **Figure 5**. There was typically a delay in movement initiation despite a preparatory countdown cue. There was also a tendency to move the virtual hand faster than the target constant velocity. This resulted in a quick overshoot of the target and completion of contact prior to completion of the target ramp trajectory.

One-way ANOVA indicated significant differences in both agency (p < 0.001) and performance (p < 0.0001) across the single factor of control modes (**Figure 6A** and **Table 1**). The highest mean value in agency and performance was observed for the Baseline control mode. The lowest mean value in agency and performance was observed for the Slow control mode. The F-statistics for both agency and performance were notably greater than 1, with notable effect size (η<sup>2</sup> > 0.30). A linear regression was applied to subject-averaged sample points for agency versus performance across all control modes tested (**Figure 6B**). The slope parameter was significantly greater than zero (p < 0.01), indicating a positive relationship between agency and performance.
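For readers less familiar with the reported statistics, the F-statistic and the η<sup>2</sup> effect size of a one-way ANOVA can be computed as sketched below. This sketch assumes a standard between-groups one-way ANOVA with η<sup>2</sup> taken as the between-group sum of squares over the total sum of squares; the function name `one_way_anova` and the sample values are hypothetical and are not the study data.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a between-groups one-way ANOVA.

    groups: list of lists, one list of observations per control mode.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Hypothetical agency values (ms of underestimation) for three control modes.
f_stat, eta_sq = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_stat, eta_sq)
```

An F-statistic well above 1 indicates that the variance between control modes exceeds the variance within them, and η<sup>2</sup> gives the fraction of total variance attributable to control mode.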

TABLE 1B | Post hoc comparisons (p-values) between control modes for implicit agency.

TABLE 1C | Post hoc comparisons (p-values) between control modes for performance (minimizing reach path length).

All post hoc comparisons made with Bonferroni correction. Significant post hoc p-values (<0.05) bolded.

Implicit measures of agency using intentional binding are shown against survey-based explicit measures of agency in **Figure 7** and **Table 2**. Significant differences (p < 0.05) in explicit agency were not observed across control modes (**Figure 7B**). Implicit and explicit agency results across subject-mode pairs were self-normalized [mean = 0, range over (−1, 1)] and plotted against each other in **Figure 7C** to suggest an inverse relationship (linear regression slope < 0, p < 0.05) in this study. The average difference in normalized explicit agency from implicit agency for each control mode is shown in **Figure 7D**. Across control modes, the normalized differences between explicit and implicit agency produced a notable F-statistic (9.88) and effect size (η<sup>2</sup> = 0.36). The largest differences were observed for the Slow and Fast modes with a shift toward explicit and implicit agency, respectively.
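One way to place implicit (ms) and explicit (Likert) agency on a common scale is sketched below. The exact normalization used in the study is not specified beyond a zero mean and a range within (−1, 1), so this sketch assumes that each participant's values are mean-centered and then divided by the largest absolute deviation; the function name `self_normalize` and the sample values are hypothetical.

```python
def self_normalize(values):
    """Center values on zero and scale so the largest deviation has magnitude 1."""
    center = sum(values) / len(values)
    deviations = [v - center for v in values]
    scale = max(abs(d) for d in deviations)
    if scale == 0:
        return [0.0 for _ in deviations]  # all values identical
    return [d / scale for d in deviations]


# Hypothetical per-mode values for one participant.
implicit_ms = [120.0, 40.0, 80.0, 60.0, 100.0]
explicit_likert = [1.0, 2.0, 0.0, 1.0, 1.0]
shift = [i - e for i, e in zip(self_normalize(implicit_ms), self_normalize(explicit_likert))]
print(shift)  # positive values indicate a shift toward implicit agency
```

After this normalization the difference between the two measures is unitless, which allows the per-mode shift between implicit and explicit agency to be compared directly.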

## Effect of Control Mode on Movement Efficiency

The mean kinematic trajectory for reach in each direction is shown for Baseline in **Figure 8**. Given the reach-to-grasp task is continuous with clear initiation and termination, the movement smoothness was computed based on minimization of integrated squared-jerk (Flash and Hogan, 1985) for each control mode. To remove dependencies on movement duration or amplitude, the squared-jerk term is made unitless (Hogan and Sternad, 2009) based on movement time and mean velocity in each direction.

Results for select metrics of movement efficiency across control modes are shown in **Figure 9** and **Table 3**. Smoothness (**Figure 9A**) is shown as the inverse of the integrated unitless squared-jerk metric summed in all three directions. The inverse operation presents higher smoothness by higher positive value. Highest smoothness was observed for the Slow control mode. However, the highest total 3-D acceleration (**Figure 9B**) was also observed for the Slow control mode. Higher acceleration indicates greater corrections were made online in tracking a constant-velocity movement target. When smoothness is normalized by total 3-D acceleration (**Figure 9C**), then the highest smoothness per unit acceleration was achieved during the Baseline and Fast control modes. Higher smoothness per unit acceleration suggests greater sensitivity of efficiency to a given correction, i.e., "correction sensitivity." Correction sensitivity is plotted against agency for data points across subjects and control modes in **Figure 9D**. A linear regression on that data indicates a positive relationship (slope > 0, p < 0.05) between correction sensitivity and agency.

## Effect of High Versus Low Agency on Path Length Kinematics

The general effects of high versus low implicit agency on path length position and velocity over the reach cycle are shown in **Figure 10**. The mean path kinematic trajectories are shown across the top (high) 50% of trials in agency versus the bottom (low) 50% of trials across all participants and control modes. High agency trials generally demonstrate shorter path length trajectories and slower path length velocities throughout the reach cycle.
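The split of trials into high and low agency halves can be sketched as follows. The function names and sample values are hypothetical, each trial is assumed to be an (agency, path length) pair, and with an odd number of trials the extra trial falls into the low half in this sketch.

```python
def split_by_agency(trials):
    """Split (agency, value) trials into top and bottom halves ranked by agency."""
    ranked = sorted(trials, key=lambda trial: trial[0], reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


def mean_value(trials):
    """Mean of the second element (e.g., path length) across trials."""
    return sum(value for _, value in trials) / len(trials)


# Hypothetical trials: (interval underestimation in ms, path length in m).
trials = [(150, 0.33), (20, 0.40), (90, 0.35), (-10, 0.42)]
high, low = split_by_agency(trials)
print(mean_value(high), mean_value(low))
```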

**Figure 11** indicates that high agency trials produce significant (p < 0.001) reductions in the following movement features of path length: maximum path length, mean path length velocity, and maximum path length velocity. These high agency effects were desirable given the performance task was to minimize path length, ideally by following a minimum path length trajectory of 0.32 m at a constant velocity of 0.08 m/s. **Figure 11** also indicates a significant increase (p < 0.05) in movement smoothness in path length with high agency.

FIGURE 7 | Comparing mean agency from IMPLICIT time-interval estimates (in ms) versus EXPLICIT survey responses (average Likert score) for each control mode. (A) Positive implicit agency is indicated as underestimation of actual time-intervals. (B) Positive explicit agency is indicated by level of agreement that the displayed control of the virtual hand reflects participant intent. Survey Likert scores given as: −2 = Strongly Disagree, −1 = Disagree, 0 = Neutral, 1 = Agree, 2 = Strongly Agree. (C) Implicit versus explicit agency across subjects and control modes after self-normalizing for mean to equal zero and range over [–1, 1]. F-stat for regression is 4.62 with p = 0.035. (D) Relative shift from explicit to implicit shown for each control mode.

TABLE 2A | Mean value comparisons for implicit and explicit agency across control modes.


TABLE 2B | Post hoc comparisons (p-value) between control modes for difference (shift) in normalized agency, Δagency = implicit – explicit.


All post hoc comparisons made with Bonferroni correction. Significant post hoc p-values (<0.05) bolded.

## DISCUSSION

This study demonstrated a positive relationship between agency of grasp and performance of reach-to-grasp across various control modes of the virtual hand. Implicit agency was measured through intentional binding of grasp action, and performance was primarily assessed as inverse of mean reaching error to a minimized path length trajectory. The results of this study may establish motivation for adapting user-device interfaces to co-maximize agency and performance. Of special interest are devices for movement assistance and rehabilitation, such as prostheses and exoskeletons. Clinical paradigms for motor rehabilitation that offer high value in both user engagement and functional utility have the best chances for retention and success (Wulf et al., 2010). To this end, the flexibility and accessibility of VR environments can be well leveraged to adapt rehabilitation platforms that co-maximize agency and movement performance. While linking greater user agency over a device with higher functional performance is intuitive, the agency-performance link for movement has not been clearly established previously. This study relates agency, the autonomous sense of control, to functional performance across several control modes that can be standardly adapted for rehabilitation devices or rehabilitation training paradigms.

time of 3.49 s for all subjects prior to trajectory averaging.

FIGURE 9 | Various metrics of movement efficiency shown across control modes. (A) Smoothness values made dimensionless and inverted so higher values indicate greater smoothness. (B) Total acceleration in 3-D indicates magnitude of corrections made in tracking constant velocity target trajectory. (C) Smoothness per acceleration (correction sensitivity) computed to indicate smoothness achieved as a function of correction effort. (D) Correction sensitivity positively related to implicit agency across subjects and control modes (F-stat = 4.40, p = 0.039).

TABLE 3A | Mean value comparisons for performance efficiency metrics across control modes.

TABLE 3B | Post hoc comparisons (p-value) between control modes for smoothness.

TABLE 3C | Post hoc comparisons (p-value) between control modes for total acceleration.

TABLE 3D | Post hoc comparisons (p-value) between control modes for correction efficiency (smoothness over acceleration).

All post hoc comparisons made with Bonferroni correction. Significant post hoc p-values (<0.05) bolded.

versus low implicit agency (bottom 50%) trials. Kinematics presented as path length position (A) and velocity (B) across reach cycle (%). Path length position and velocity plotted as mean ± 1 standard deviation varying across reach cycle for all participants tested.

This study demonstrated a significant positive relationship (p < 0.001, **Figure 6**) between grasp agency and reaching performance across five distinct modes of control. This study also indicated how agency of grasp action is reduced in the presence of a preceding reach (**Figure 4**) compared to agency of movement initiation (Haggard et al., 2002). This is an important result since it suggests how agency of complex task action is modulated due to intermediate movement stages, which are further modified in this study with each control mode. The tested control modes were chosen to reflect control features (speed, noise mitigation, automaticity) commonly tuned for a movement device. The overall positive relationship between performance and agency is driven by the relatively high-agency, high-performing "Baseline" case and the relatively low-agency, low-performing "Slow" case. The "Fast" case yielded moderate agency and moderate performance. In total, these observations suggest that control sensitivity of speed may be a key tuning parameter for a device to co-maximize both user agency and functional performance.

Given high agency and performance for "Baseline," it may be especially important to tune motion of a device to best match that of intact or restored proprioception and kinesthesia (Marasco et al., 2018). "Baseline" may also best facilitate the positive contributions of embodiment onto agency (Caspar et al., 2015). The "Slow" condition demonstrated the lowest agency, which indicates that experiencing slower device speeds relative to one's intent significantly reduced the sense of control. For "Slow," participants were required (unintentionally) to move their own hands faster to compensate for the visual lags they observed for the virtual hand. While greater intentional effort can produce greater agency (Minohara et al., 2016), greater unintentional effort may reduce perceived efficacy of user control, especially if it promotes feelings of inability to initiate faster speeds (Kawabe, 2013). In this study, slower and faster speed control of the virtual hand required participants to actually reach longer and shorter, respectively. This limitation was required to employ changes in control speed while ensuring the task path length of the virtual hand was constant.

The remaining control mode cases were "Noisy" and "Auto," which were categorically different from the other three, which can be related by speed. These cases generally produced intermediate agency and performance relative to "Baseline" and "Slow." "Noisy" may have been cognitively distracting in this study, but previous work has suggested that sensory noise (Collins et al., 2003) can improve motor function or indicate natural tremor (Allum, 1984; Riviere and Thakor, 1996) to better reflect human motion. For noise to be a cognitive or performance enhancer, there may be additional considerations beyond the scope of this study such as identifying a custom resonant frequency for each person. "Auto" would be expected to reduce agency given its intended feature to remove control from the user. It has been shown that increased automation can reduce sense of agency during aircraft control (Berberian et al., 2012), and that intentional binding is sensitive to degrees of automaticity. Our study similarly uses intentional binding to indicate a reduction in agency with increased automation of a movement device. Bang-bang (abrupt switch between on-off states) control is an underlying principle in automating natural movement (Ben-Itzhak and Karniel, 2008) and powered movement assistance (Farris et al., 2013). To facilitate greater user agency over a rehabilitation device, the level of proportional control (Lenzi et al., 2012) must be optimized.

While agency is a measure of subjective perception, its implicit quantification through intentional binding and positive relationship to performance suggests its plausible incorporation in engineering better movement systems. For comparison to explicit agency (Moore et al., 2012), participants provided Likert-scale survey responses, but only after each trial block, as in Berberian et al. (2012). Since performance and implicit measures were taken after each trial, no conclusions between explicit agency and performance were made in this study. While implicit and explicit measures of agency may expectedly be related, they indicate agency at different levels. With implicit agency there is low-level and non-conceptual formation of being an agent, while explicit attribution of agency involves higher-order judgment (Moore et al., 2012). There has been a compelling suggestion that there are separable implicit and explicit learning systems in dissociating their effects. Perruchet et al. (2006) demonstrated how, for a probabilistic learning task pairing two events, greater prediction strength was observed with implicit learning, which relied more on recency effect. In our study, there appeared to be an inverse relationship between implicit and explicit measures of agency (**Figure 7C**), indicating separate levels of perceived learning. There also appear to be larger shifts toward implicit agency with the "Fast" case but toward explicit agency with "Slow." This result suggests that perceptions of probabilistic learning and conscious judgment are sensitive to speed in this study and should be considered accordingly for potential device adaptation.

We next investigated metrics for movement control efficiency (increased smoothness, decreased acceleration, change in smoothness per change in acceleration) across control modes and their dependence on agency. Since agency is affected by perception of outcome and effort, the efficiency of a movement device is linked to agency and should be considered in optimizing user-device integration. Contrary to the hypothesis that agency produces better movement characteristics, the "Slow" case, which demonstrated the lowest agency, also exhibited the highest smoothness (**Figure 9A**). Further inspection showed that this smoothness came at the cost of higher corrective accelerations to modulate biomechanical control (Winter, 2009), which would otherwise be minimized for a constant-velocity task. When smoothness was normalized by total accelerations, a metric for correction sensitivity was obtained. For "Slow," this sensitivity is significantly lower than for "Baseline" or "Fast," just as with agency and tracking performance. Across all five control modes, there is an apparent positive relationship (p < 0.05 on linear regression slope) between correction sensitivity and agency (**Figure 9D**). Participants were not aware of considering any efficiency metrics, but control modes producing higher agency may produce performance benefits at multiple levels (execution and efficiency).
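The correction-sensitivity idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the particular smoothness and acceleration summaries, and all numeric values are hypothetical and are not the study's data or analysis code. The sketch normalizes a smoothness score by total acceleration and then estimates the ordinary least-squares slope of sensitivity against agency across control modes.

```python
from statistics import mean


def correction_sensitivity(smoothness, total_acceleration):
    """Smoothness normalized by total acceleration (hypothetical units)."""
    if total_acceleration <= 0:
        raise ValueError("total_acceleration must be positive")
    return smoothness / total_acceleration


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up placeholder values for five control modes (NOT study data).
agency = [0.10, 0.20, 0.30, 0.40, 0.50]
smoothness = [2.0, 3.0, 4.0, 5.0, 6.0]
total_accel = [4.0, 4.0, 4.0, 4.0, 4.0]

sensitivity = [
    correction_sensitivity(s, a) for s, a in zip(smoothness, total_accel)
]
slope = ols_slope(agency, sensitivity)
print(f"slope of sensitivity vs. agency: {slope:.2f}")
```

A positive slope in such a sketch would correspond to the positive relationship reported in the text; testing its significance would require the actual per-participant data and a proper statistical model.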

Finally, the effects of generally high (top 50%) agency were also observed on general path length kinematics (**Figure 10**) and specific path length characteristics (**Figure 11**). High agency generated reduced path length (primary performance task objective), reduced mean and peak path length velocity (closer to the target constant velocity of 0.08 m/s), and smoother (more efficient) movement. While this study primarily aimed to demonstrate performance and agency modulation across control modes as motivation for device adaptation, a general positive relationship between agency and movement performance was also apparent.

## CONCLUSION

In conclusion, this study has demonstrated clear dependence between implicit agency, based on time-interval estimation, and reaching performance across varying control modes. This dependence is apparent across conditions of speed changes, inclusion of noise, and adding a measure of automation. This suggests the potential for adapting control of devices, such as those for movement assistance, to co-maximize cognitive agency and performance. While performance indicates greater functional abilities, higher agency facilitates cognitive integration between user and device for ease-of-use and more natural control. Agency may also be key in accelerating learning and clinical retention of rehabilitation devices. Implicit measures of agency based on intentional binding are potentially reliable foundations for observing positive agency-performance dependencies.

This study was conducted in VR to ensure the pathlength for the reach-to-grasp task was visually similar while systematically varying control modes. It remains unclear how VR is best employed to identify optimal control modes for real-world devices. The objective with this work was to demonstrate the positive relationship between implicit agency and performance and their dependencies across control modes. This finding should then inspire practical methods that robustly and automatically adapt device control for each user toward greater agency and performance. Virtual reality could be a highly efficient medium in which to identify initial user-fitted control parameters. Those parameters may then be further refined based on real-world observations. Similar approaches have been utilized whereby computational models indicate basic operating characteristics of a control system (Nataraj et al., 2010, 2012c) prior to implementation in a clinical setting (Nataraj et al., 2012a,b).

In the future, alternative measures of agency such as neurophysiological recordings may be more robust for control system adaptation. Characterizing agency according to patterns in muscle electromyography (EMG) or brain electroencephalography (EEG) would be practically beneficial. These recordings often serve as command inputs to control systems for movement devices. Furthermore, neurophysiological recordings would not necessitate conscious user responses during adaptation of device control. Reducing such user onus could mitigate cognitive fatigue, although that was not readily apparent in this study. Meanwhile, implicit agency through time-interval estimates could be critical in identifying what neurophysiological patterns best represent cognitive states of high agency. Changes in EEG readiness potential have been shown with greater agency in relation to the intent to initially move (Jo et al., 2014). However, spectral coherence changes in EEG and EMG during high-agency movement remain unclear. Ultimately, clear biomarkers for high agency and performance would be invaluable in optimizing user-device interfaces for movement through better musculoskeletal control systems (Nataraj et al., 2010, 2012b).

## DATA AVAILABILITY STATEMENT

The datasets generated for this study are available on request to the corresponding author.

## ETHICS STATEMENT

The studies involving human participants were reviewed and approved by the Stevens Institute of Technology IRB. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

## AUTHOR CONTRIBUTIONS

RN: designing and developing the experiment, analyzing the data, writing and revising the manuscript, and directing the project. SS: recruiting participants, performing the data collections, and revising the manuscript. AS: recruiting participants and performing the data collections. ML: revising the manuscript.

## FUNDING

This work was made possible by support from the Schaefer School of Engineering and Science, at the Stevens Institute of Technology and research grant (PC 53-19) from the New Jersey Health Foundation.

## ACKNOWLEDGMENTS

The authors would like to acknowledge Felix Chen, an electrical and computer engineering graduate student at Stevens, for initiating real-time marker streaming in the virtual reality protocol.




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Nataraj, Sanford, Shah and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Using More Ecological Paradigms to Investigate Working Memory: Strengths, Limitations and Recommendations

Lison Fanuel<sup>1,2</sup>, Gaën Plancher<sup>1</sup>\*† and Pascale Piolino<sup>3,4,5</sup>†

<sup>1</sup> Cognitive Mechanisms Research Laboratory, Université Lyon 2, Bron, France, <sup>2</sup> Lyon Neuroscience Research Center (CRNL), INSERM U1028, CNRS UMR5292, Université Lyon 1, Université de Lyon, Lyon, France, <sup>3</sup> Laboratoire Mémoire, Cerveau et Cognition, MC2Lab 7536, Université de Paris, Paris, France, <sup>4</sup> Institut de Psychologie, Université de Paris, Boulogne Billancourt, France, <sup>5</sup> Institut Universitaire de France, Paris, France

Keywords: memory, working memory, virtual reality, naturalistic events, ecological environment

#### Edited by:

Valerio Rizzo, University of Palermo, Italy

#### Reviewed by:

José Manuel Reales, National University of Distance Education (UNED), Spain Fabio Solari, University of Genoa, Italy

\*Correspondence: Gaën Plancher gaen.plancher@univ-lyon2.fr

†These authors share last authorship

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

> Received: 15 October 2019 Accepted: 06 April 2020 Published: 04 May 2020

#### Citation:

Fanuel L, Plancher G and Piolino P (2020) Using More Ecological Paradigms to Investigate Working Memory: Strengths, Limitations and Recommendations. Front. Hum. Neurosci. 14:148. doi: 10.3389/fnhum.2020.00148

Working memory (WM) is essential to daily-life activities, as it allows information to be maintained in the short term while concurrent information is processed (Baddeley and Hitch, 1974). For example, one must keep in mind which ingredients are already in the dish while following a recipe. WM is a complex cognitive function involving multiple processes (e.g., encoding, maintenance, retrieval processes). The present paper focuses on the utility of virtual reality (VR) in investigating maintenance in WM, but the relevance of VR studies also applies to other WM-related mechanisms.

Recent models proposed an attention-based mechanism supporting maintenance of domain-general information: attentional refreshing (or refreshing; Camos et al., 2009; Camos and Barrouillet, 2014; Camos, 2017). Refreshing is described as a brief thought directed at information that is no longer perceptually present (Johnson, 1992) and has received growing attention in research on both WM and episodic memory (EM). The WM field provides convincing evidence of an involvement of refreshing in maintenance of visual, spatial, and verbal information, as well as in the binding between these types of information (Hudjetz and Oberauer, 2007; Camos et al., 2009; Vergauwe et al., 2009, 2010, 2012). Studies using delayed recall suggest that memory performance depends on the time available for refreshing (Camos and Portrat, 2015; Souza and Oberauer, 2017; Jarjat et al., 2018) and that refreshing plays a role in construction of episodic traces (Johnson et al., 2002; Loaiza and McCabe, 2013). So far, studies have used very simple to-be-remembered material such as letters or spatial locations (e.g., Camos et al., 2009; Vergauwe et al., 2009, 2010; Camos and Portrat, 2015). As refreshing is involved in maintenance of domain-general information and construction of EM, it should play a significant part in maintenance and long-term retention of rich and complex information. Because WM is central in daily-life activities, future research should design more ecological experiments to better understand the role of WM and refreshing in naturalistic situations.

VR seems to be a useful tool to investigate memory functioning in daily-life-like environments. VR allows naturalistic situations to be created, increasing ecological validity as compared to classical experimental or neuropsychological tests (Plancher and Piolino, 2017). Ecological validity refers to the extent to which experimental conditions are similar to a real-world setting (Bohil et al., 2011). Accordingly, a VR experience can provide complex and rich information involving multiple senses (vision, audition, proprioception, etc.) and spatiotemporal features. VR also enables interaction with the environment, for example by controlling displacements, which increases the feeling of immersion in this environment (Mestre and Fuchs, 2006). Besides improving ecological validity, controlled environments can be created to assess multiple features of memory traces, namely the content of the memory trace (what) and its spatial and temporal location (where and when), as well as the binding between these features (Plancher et al., 2010). VR is thus a good compromise between memory assessment of daily-life-like experience and experimental control.

While VR has been extensively used to better understand EM (Plancher and Piolino, 2017; La Corte et al., 2019) and executive functions (Neguț, 2014; Neguț et al., 2016), only a few studies have investigated WM mechanisms with this method. Meilinger et al. (2008) investigated the involvement of WM in a wayfinding task. In comparison to a control condition without concurrent processing, a concurrent task (e.g., indicating the spatial location of a sound) negatively affected wayfinding of the routes previously seen. Both verbal and spatial concurrent tasks (continuously repeating a syllable sequence or tapping a spatial sequence, respectively) impaired memory performance for the landmark location, and only the spatial concurrent task impaired memory performance for the route (Gras et al., 2013). More recently, Plancher et al. (2018) investigated the role of WM in construction of EM traces using a VR paradigm. While driving in a virtual town, participants had to memorize the encountered scenes in as much detail as possible, including the elements constituting the scene (what), the spatial location (where) and the temporal context of the scene (when). The recall of the spatial or temporal context associated with each element provided a binding score. As compared to a condition without concurrent processing, a verbal concurrent task (memorizing the number of garbage containers) only impaired memory performance for what information, and a visuospatial concurrent task (memorizing the spatial position of containers) impaired memory performance for what, when, and what-where-when binding information. These results suggest that construction of memory traces relies on verbal and visuospatial maintenance mechanisms and were interpreted as reflecting an involvement of both the phonological loop (i.e., verbal-specific WM mechanism, Baddeley and Hitch, 1974) and refreshing in the construction of what and an involvement of refreshing in the construction of when and binding components of EM traces.

Typically, the involvement of refreshing in maintenance is investigated using complex span tasks in which to-be-processed items are interleaved between to-be-memorized items (Barrouillet et al., 2004, 2007). Following the assumption that maintenance and processing compete for one limited resource (i.e., attention), increasing the amount of attentional resources required by the processing task leaves fewer attentional resources available for attentional maintenance (i.e., refreshing, Barrouillet et al., 2007). Attentional sharing between maintenance and processing is proposed to rely on time: when time is occupied by a processing task, attentional maintenance cannot take place, and vice versa (Barrouillet et al., 2007). Varying the amount of time required for processing a concurrent task (i.e., its cognitive load) results in manipulating the amount of time available for refreshing. Poorer WM performance under higher cognitive load is taken as evidence of an involvement of refreshing in WM maintenance (Barrouillet et al., 2004, 2007; Vergauwe et al., 2009, 2010).
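The time-based trade-off described above can be illustrated with a short Python sketch. It assumes, as a simplification that is not spelled out in the text, that cognitive load is expressed as the proportion of an inter-item interval occupied by processing, and that the remaining time is free for refreshing. The function names and the example durations are hypothetical.

```python
def cognitive_load(processing_times_s, interval_s):
    """Proportion of the interval occupied by concurrent processing.

    Assumption for illustration: load = time spent processing / total time.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    busy = sum(processing_times_s)
    if busy > interval_s:
        raise ValueError("processing cannot exceed the interval")
    return busy / interval_s


def free_time_for_refreshing(processing_times_s, interval_s):
    """Time (in seconds) left for attentional refreshing in the interval."""
    return interval_s - sum(processing_times_s)


# Hypothetical example: four processing episodes within a 4-s interval.
low_load = cognitive_load([0.5, 0.5, 0.5, 0.5], 4.0)
high_load = cognitive_load([0.75, 0.75, 0.75, 0.75], 4.0)
print(low_load, high_load)
```

Measured response times to the concurrent task could serve as the processing times in such a computation, which is one reason the concurrent task needs to yield response times.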

To understand the involvement of refreshing in maintenance of rich and complex information, we suggest adapting complex span tasks to VR paradigms. The task concurrent to maintenance should be distinct from the memorization task and allow response times to be measured. It will thereby be possible to manipulate the cognitive load of the concurrent task and investigate whether and how refreshing is involved in maintenance of the different features of a complex memory trace (what, when, where) and the binding of these features. A passive exploration of the environment would allow the time course of the task to be determined and temporal parameters to be controlled, and would require neither control nor planning for traveling. Motor and planning actions required by active traveling can constitute an attentional cost (Plancher et al., 2013) and have an uncontrolled detrimental effect on WM performance. However, a passive exploration results in a simple video experience. Active navigation seems more useful to enrich the EM trace (Plancher et al., 2012, 2013; Sauzéon et al., 2012; Jebara et al., 2014). Immersion and real-time interaction with the environment are necessary for self-experience and bodily representation and modulate the sense of presence in the present that is central in daily-life experience (Nash et al., 2000; Makowski et al., 2017). Self-experience and bodily representation reinforce EM performance (Bergouignan et al., 2014; Repetto et al., 2016; Tuena et al., 2017, 2019; Blanke et al., 2018) and might also influence maintenance in WM. For a more immersive experience and a better understanding of the involvement of refreshing in daily-life situations, future studies should systematically use an active condition. Contrasting different levels of immersion (from computer screens to head-mounted displays or cave automatic virtual environments) and interaction (from joystick to motion capture) with a passive condition would enable determining the minimum conditions for studying WM in an ecological context and exploring how embodiment impacts refreshing.

To study WM in conditions as close as possible to real life using VR, we suggest designing a virtual environment in which the participant explores freely and encounters events of different natures (e.g., visual, auditory, proprioceptive, spatial, or any combination). To enhance ecological validity, events should occur at a non-isochronous pace. Temporal parameters related to to-be-memorized and to-be-processed events (number of events, presentation duration, inter-stimulus intervals) should be fixed and participants' behavior (e.g., response times) should be measured. Studies of interdependence between WM processes and other cognitive functions will also benefit from VR paradigms. Long-term memory and semantic representation seem to contribute to refreshing and WM (Loaiza et al., 2015; Loaiza and Camos, 2018). Refreshing is involved in construction of EM traces (Johnson et al., 2002; Loaiza and McCabe, 2013) and might play a part in prospective memory (Marsh and Hicks, 1998). Carefully timed VR experiments manipulating WM parameters will contribute to a detailed insight into these cognitive processes and their relationship with WM.
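One way to reconcile a non-isochronous pace with fixed temporal parameters is to draw irregular inter-stimulus intervals once, from a seeded random generator, and reuse the resulting schedule for every participant. The Python sketch below illustrates this under stated assumptions: the function name, the uniform jitter distribution, and all durations are hypothetical choices, not a procedure prescribed by the text.

```python
import random


def build_schedule(n_events, duration_s, isi_min_s, isi_max_s, seed=0):
    """Return event onsets/offsets with jittered but reproducible intervals.

    The same seed always yields the same schedule, so the timing is
    irregular within a session yet identical across participants.
    """
    if not 0 <= isi_min_s <= isi_max_s:
        raise ValueError("require 0 <= isi_min_s <= isi_max_s")
    rng = random.Random(seed)  # local generator; global state is untouched
    clock = 0.0
    schedule = []
    for index in range(n_events):
        clock += rng.uniform(isi_min_s, isi_max_s)  # jittered gap
        schedule.append(
            {"index": index, "onset_s": clock, "offset_s": clock + duration_s}
        )
        clock += duration_s  # next gap starts when this event ends
    return schedule


# Hypothetical parameters: 5 events of 2 s, gaps between 1 s and 3 s.
for event in build_schedule(5, 2.0, 1.0, 3.0, seed=42):
    print(event["index"], round(event["onset_s"], 2), round(event["offset_s"], 2))
```

Response times could then be logged relative to each `onset_s`, keeping the behavioral measures aligned with the fixed schedule.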

Developing virtual-reality-based paradigms to investigate WM would be useful to identify the neural basis of WM mechanisms in ecological situations. To date, as for behavioral studies, neurophysiological studies of maintenance in WM (Vogel and Machizawa, 2004; Guimond et al., 2011; Lefebvre et al., 2013; Grimault et al., 2014) and refreshing have used very simple stimuli (Johnson et al., 2005, 2015). To our knowledge, no study has combined daily-life-like WM paradigms and neurophysiological measures. Yet, VR paradigms can be combined easily with neurophysiological recordings such as eye-tracking (e.g., Whitmire et al., 2016) or electrodermal and cardiac responses (e.g., Parsons et al., 2011; Armougum et al., 2019). Electroencephalography (e.g., Jaiswal et al., 2010; Bohil et al., 2011) and fMRI (e.g., Kalpouzos et al., 2010) recordings are possible during exposure in experiments with limited movements. Neurophysiological measures during retrieval depending on WM manipulation during exposure (e.g., the amount of time available for refreshing) could provide further insight into the involvement of WM processes in episodic construction.

VR is a powerful experimental tool that allows the creation of multimodal and naturalistic environments to assess memory for rich and complex information while keeping strong experimental control. VR will allow testing theoretical assumptions with enhanced ecological validity and interactive fidelity and exploring new hypotheses, such as whether and how traveling affects WM. In addition, VR allows us to examine some of the basic properties of the situated and embodied approach through exact control of methodological factors and to go further in understanding the role of presence and consciousness in memory. Yet, using VR paradigms in WM studies requires a compromise between strong control of temporal parameters and immersion through active exploration. We recommend fixing the temporal parameters of the experiment and measuring behavior, especially response times, as precisely as possible. Future studies will need to determine the minimum immersion and interaction conditions for exploring WM with VR, in order to limit the attentional cost of active navigation and facilitate the combined use of VR and neurophysiological measurements. Moreover, some other technical aspects of VR, such as the risk of cybersickness, the photorealistic level of the environment, the type of interaction and way of navigation, and the level and mode of embodiment, need further investigation to test their impact on WM, as suggested by some authors in the domain of memory studies (Smith, 2019).

Future studies should also assess metric properties of VR-based measurements of WM to ensure good construct validity and reliability. Besides the good equivalence between cognitive performance and physiological responses in the virtual and the real world (Sorita et al., 2013; Armougum et al., 2019), the construct validity of VR-based neuropsychological assessments seems suitable (Neguț et al., 2015), supporting the feasibility of developing VR-based assessment of cognitive functions. Given that VR is becoming more accessible at low cost, future studies will be able to test larger numbers of subjects to obtain reliable and reproducible data.

WM is impaired in various populations, such as in healthy aging, age-related dementia (e.g., Baddeley et al., 1986; Huntley and Howard, 2010) or schizophrenia (e.g., Lee and Park, 2005). In some populations, the WM deficit is proposed to be due to an impairment of refreshing (e.g., Hoareau et al., 2016; Fanuel et al., 2018 in healthy aging; Grillon et al., 2013 in schizophrenia). Developing naturalistic tools to investigate WM functioning seems very useful to characterize the WM deficits in these populations. Previous studies suggest that WM training involving multi-modal stimuli, demanding high cognitive engagement and targeting WM domain-general mechanisms is more likely to yield WM and general cognitive enhancement (Morrison and Chein, 2011). VR-based training of WM thus seems a promising approach for enhancing both WM and broader cognitive functions. It is now crucial to examine the acceptability of VR for fragile populations.

## AUTHOR CONTRIBUTIONS

All authors conceptualized the manuscript. LF wrote the original draft. GP and PP reviewed and edited its intellectual content.

## FUNDING

This work was supported by a grant from Région Rhône-Alpes. It was conducted within the framework of the LabEx Cortex (Construction, Function and Cognitive Function and Rehabilitation of the Cortex, ANR-11-LABX-0042) of Université de Lyon, within the program Investissements d'avenir (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Fanuel, Plancher and Piolino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Digital Biomarkers for the Early Detection of Mild Cognitive Impairment: Artificial Intelligence Meets Virtual Reality

Silvia Cavedoni<sup>1</sup>\*, Alice Chirico<sup>2</sup>, Elisa Pedroli<sup>1,3</sup>, Pietro Cipresso<sup>1,2</sup> and Giuseppe Riva<sup>1,2</sup>

<sup>1</sup> Applied Technology for Neuro-Psychology Lab, Istituto Auxologico Italiano, Milan, Italy, <sup>2</sup> Department of Psychology, Catholic University of the Sacred Heart, Milan, Italy, <sup>3</sup> Faculty of Psychology, eCampus University, Novedrate, Italy

#### Edited by:

Soledad Ballesteros, National University of Distance Education (UNED), Spain

#### Reviewed by:

Vitoantonio Bevilacqua, Politecnico di Bari, Italy Junichi Chikazoe, National Institute for Physiological Sciences, Japan

> \*Correspondence: Silvia Cavedoni silvia.cavedoni@outlook.com

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

> Received: 14 April 2020 Accepted: 02 June 2020 Published: 24 July 2020

#### Citation:

Cavedoni S, Chirico A, Pedroli E, Cipresso P and Riva G (2020) Digital Biomarkers for the Early Detection of Mild Cognitive Impairment: Artificial Intelligence Meets Virtual Reality. Front. Hum. Neurosci. 14:245. doi: 10.3389/fnhum.2020.00245

Elderly people affected by Mild Cognitive Impairment (MCI) usually report a perceived decline in cognitive functions that deeply impacts their quality of life. This subtle waning, although it cannot be diagnosed as dementia, is noted by caregivers on the basis of their relative's behaviors. Crucially, if this condition is not detected in time by clinicians, it can easily turn into dementia. Thus, early detection of MCI is strongly needed. Classical neuropsychological measures, which underlie a categorical model of diagnosis, could be integrated with a dimensional assessment approach involving Virtual Reality (VR) and Artificial Intelligence (AI). VR can be used to create highly ecological, controlled simulations resembling the daily-life contexts in which patients' instrumental activities of daily life (IADL) usually take place. Clinicians can record patients' kinematics, particularly gait, while they perform IADL (Digital Biomarkers). Then, Artificial Intelligence employs Machine Learning (ML) to analyze them in combination with clinical and neuropsychological data. This integrated computational approach would enable the creation of a predictive model to identify specific patterns of cognitive and motor impairment in MCI. Therefore, this new dimensional cognitive-behavioral assessment would reveal elderly people's neural alterations and impaired cognitive functions, typical of MCI and dementia, even in early stages for more time-sensitive interventions.

Keywords: gait analysis, kinematic, Mild Cognitive Impairment, Virtual Reality, Machine Learning, elderly, digital biomarkers, Artificial Intelligence

## INTRODUCTION

A categorical approach to diagnosing dementia struggles to capture subclinical conditions, such as Mild Cognitive Impairment (MCI). Crucially, MCI can either revert to normal cognition, stabilize, or slowly evolve toward other forms of dementia (Chiu, 2005; Walters, 2011; Morris, 2012; Díaz-Mardomingo et al., 2017; Vanacore et al., 2017). This construct indicates people affected by an intermediate condition between normal aging and early dementia (Petersen, 2004; Albert et al., 2011; Mckhann et al., 2011; Seo et al., 2017) and is usually segmented into single- or multiple-domain amnestic (aMCI) and non-amnestic (naMCI) subtypes, depending on whether impairments concern only memory or other cognitive functions, e.g., executive and visuo-spatial abilities (Petersen, 2004; Apostolova and Cummings, 2008; Albert et al., 2011; Hughes et al., 2011; Michaud et al., 2017; Facal et al., 2019). Both patients and their caregivers can observe and report clear signals of this subtle waning, which cannot be diagnosed as dementia. Frequently, elderly people express concern over their perceived worsening in one or more cognitive domains, such as memory or language (Petersen et al., 1999, 2018). This waning has a great impact on their quality of life, reducing their ability to autonomously carry out activities. A key aspect concerns the possibility of detecting an initial cognitive decline at the behavioral level with a slowdown in execution of the instrumental activities of daily life (IADL), such as grocery shopping and medication and financial management (Kim and Kim, 2009; Gold and Gold, 2012).

Changes associated with subclinical forms of dementia manifest themselves through behavioral alterations. Usually, caregivers are the first ones to notice these altered behaviors, as shown by Van Vliet et al. (2011). The authors explored the barriers hindering a timely diagnosis of dementia, focusing on interviews conducted with caregivers of relatives that were later diagnosed with early-onset dementia (EOD). Caregivers frequently reported behavioral changes in relatives with EOD, either alone or associated with neuropsychiatric symptoms (NPS), such as apathy or depression, and personality changes. Behavioral impairment then evolved toward a decline in IADL and involved cognitive impairment, particularly memory loss (Van Vliet et al., 2011). The broader detrimental impact of behavioral changes generated familial/marital conflicts and reduced job productivity, leading to a decreased income or even dismissal (Van Vliet et al., 2011). Though valuable, this anecdotal information rarely becomes part of a (categorical) diagnosis based on medical and neuropsychological assessment. Over time, caregivers have been considered a source of information that is not always reliable, given their tendency to over- or underestimate elderly people's deficits, possibly due to knowledge gaps (Akl et al., 2015; Jekel et al., 2015). Caregivers might be absent or suffer from physical or psychological conditions exacerbated by their relative's worsening (Okonkwo et al., 2008; Van Vliet et al., 2011; Pfeifer et al., 2013; Akl et al., 2015; Jekel et al., 2015). They might explain the elderly person's decline and behavioral, cognitive, and personality changes rather as a result of aging. Sometimes, caregivers are not aware of the symptoms because of their relative's ability to cover them up, denying their impairments or developing subsequent compensatory strategies to disguise the difficulties. 
This, in turn, delays the consultation of a practitioner and the diagnostic process as well (Okonkwo et al., 2008; Van Vliet et al., 2011; Roehr et al., 2019). Early detection of MCI, resulting in time-sensitive interventions, is still an open issue.

In this regard, two components appear relevant. Firstly, there is a need to rely more on rigorous and systematic behavioral analysis for early detection of MCI. Secondly, there is a need to integrate this new practice into current ones, i.e., neuropsychological evaluation. Including these data jointly in MCI assessment can allow a more sensitive measurement of the deficit by placing it on a continuum, reflecting a dimensional approach accounting for several other subclinical conditions, including Subjective Cognitive Decline (SCD; Roehr et al., 2019) or Pre-Mild Cognitive Impairment (Pre-MCI; Crocco et al., 2018; Grassi et al., 2018).

This is all the more crucial when considering that MCI can turn into dementia if the elderly person does not receive a timely diagnosis (Chiu, 2005). Such a diagnosis should be built upon finer discrimination among the early stages of MCI and the collection of behavioral data, moving beyond a categorical, dichotomous approach rooted in previous diagnostic models, such as DSM-IV-TR or ICD-10 (American Psychiatric Association, 2000; World Health Organization, 2007; Neguț et al., 2016), and the distinction between aMCI and naMCI.

The exclusive implementation of neuropsychological assessment tools cannot provide information on the finer behavioral aspects of the early stages of MCI. Despite their widespread use and efficacy, these tools fail to predict an individual's behavior in real life, and there is a need to improve their ecological validity, sensitivity, and specificity (Rizzo et al., 2004; Neguț et al., 2016; Plancher and Piolino, 2017; Kim et al., 2019). The available objective methods for assessing MCI are frequently based on informant reports or conducted in isolated and artificial situations, thus opening the possibility for evaluation biases. A resounding change might be fostered by a novel approach that assembles existing technologies and data analysis methodologies in new ways, allowing a refined assessment and the creation of a continuum for MCI following a dimensional approach. These technologies aim to integrate rather than replace existing neuropsychological evaluation or caregiver/informant reports in order to obtain a more complete and dynamic picture of the strengths and critical aspects of the elderly person as they evolve over time.

This perspective aims to propose the development of a new integrated, multimethod, dimensional approach for early detection of MCI on the basis of behavioral data that incorporates existing, consolidated technologies, such as gait kinematic analysis, Virtual Reality (VR), and Machine Learning (ML), in the conventional assessment of MCI. The outcome would be a finer, continuous, time-sensitive assessment of MCI, in line with a dimensional approach compliant with new DSM-5 guidelines (American Psychiatric Association, 2013). Moreover, it would draw on recent empirical evidence and scientific groundwork, helping the clinician to tailor the rehabilitation to the needs of the individual. This positive contribution would improve their quality of life, decreasing both health care assistance costs and hospitalization rates, thus opening up new possibilities for primary and secondary prevention. Moreover, it would facilitate the communication between practitioners and researchers, providing a solid foundation and fostering mutual exchange.

## A NEW INTEGRATED APPROACH TO MCI ASSESSMENT

We suggest that Virtual Reality would provide the most suitable context (i.e., answering the question Where?) for the assessment of key behavioral variables indicating MCI onset (i.e., What?), which can be analyzed in a systematic and accurate way in relation to neuropsychological and clinical data by means of Machine Learning (ML) (i.e., How?). We expand on all of these aspects in the following sections.

## "Where" Does the Assessment Take Place? Virtual Reality

Usually, the assessment of cognitive functions does not take place in daily-life contexts, potentially hindering an ecological evaluation of the individual's impairment (Rizzo et al., 2004; de los Reyes-Guzmán et al., 2014). A promising integration of conventional practices could rely on novel dimensional assessment techniques, based on realistic immersive simulations of daily situations, e.g., Virtual Reality (VR) – a 3D computer-generated environment with some degree of immersion and interactivity, along with a sense of being really present in it (Riva and Mantovani, 2014; Riva et al., 2018, 2019a; Moreno et al., 2019). VR has developed into a key technology that is able to resemble even complex daily situations and interactions in a safe and controlled setting, due to the feeling of immersion (i.e., the number of senses stimulated within the environment, together with the closeness of the stimuli employed in simulations to reality) (Slater, 2009; Plancher and Piolino, 2017; Cipresso, 2018), the sense of presence within the environment (i.e., the feeling of being really "there" in the simulated environment, along with the ability to realize our intentions within it), and the possibility to interact with objects (Biocca, 1997; Heeter, 2000; Bailenson et al., 2006; Sundar et al., 2010; Negu et al., 2016; Plancher and Piolino, 2017; Cipresso, 2018; Kim et al., 2019). Depending on the degree of immersion of the system employed, VR allows a realistic experience through the use of multi-sensorial displays (i.e., visual, auditory) along with tracking devices that detect any movement of the individual and deliver the recorded data to the visualization system for a real-time update of the virtual environment (Chirico et al., 2016; Plancher and Piolino, 2017; Cipresso, 2018).
The most immersive 3D VR environments can provide a high sense of presence also by isolating individuals, facilitating natural interactions and exchanges that resemble equivalent ones in daily life (Gold and Gold, 2012; Allain et al., 2014; Riva and Mantovani, 2014; Chirico et al., 2016; Riva et al., 2018).

The main features of VR allow the creation of ecological, safe, standardized settings and exert a strict experimental control over stimulus delivery and measurement (Rizzo et al., 2004; Gold and Gold, 2012; Allain et al., 2014; Negu et al., 2016; Plancher and Piolino, 2017). This, in turn, has supported its deployment for both clinical and non-clinical samples of elderly people and young adults (García-Betances et al., 2015; De Tommaso et al., 2016; Plancher and Piolino, 2017). Within medical and neuropsychological settings, VR has been extensively applied as an assessment and a rehabilitation tool for elderly people suffering from consequences of a traumatic brain injury (Aida et al., 2018; Alashram et al., 2019; Maggio et al., 2019), for poststroke patients (Henderson et al., 2007; Saposnik and Levin, 2011; Laver et al., 2017), and for spatial memory and balance (Allain et al., 2014; Serino et al., 2017; Gerber et al., 2018; Soares et al., 2018), among other applications (see Plancher and Piolino, 2017; Moreno et al., 2019). Crucially, VR allows the therapy to be tailored in a controlled way, according to each disease starting from a continuous assessment of the individual's behaviors. Only recently, VR has been employed to assess IADL in MCI patients while including kinematic measures that integrate a neuropsychological evaluation (Seo et al., 2017). As previously mentioned, an initial cognitive decline can be behaviorally manifested by a slowdown in the execution of IADL (Kim and Kim, 2009; Millán-Calenti et al., 2010; Gold and Gold, 2012), which implies a neurological and cognitive alteration that is partially reflected in indexes such as bodily movements or gait. Previous studies have examined these behavioral alterations of IADL in order to refine MCI assessment and have already delivered promising results (Schröter et al., 2003; Montero-Odasso et al., 2009; de los Reyes-Guzmán et al., 2014). 
Motion detectors applied to the elderly person's leg joints allow gait kinematics and their impairments to be tracked during the performance of IADL within a VR environment. This could consolidate preliminary findings of specific motor alterations that integrate neuropsychological and cognitive evaluation to identify MCI. In this perspective, the preliminary work of Seo et al. (2017) is the closest application of the technologies proposed to refine MCI assessment, although gait analysis was not included. The recording of kinematic measures from the performance of IADL within an immersive VR environment potentially adds more discriminative value in distinguishing MCI individuals from the healthy control group (Seo et al., 2017). Including an evaluation where the elderly person him/herself performs IADL might be essential for establishing more precise criteria (Díaz-Mardomingo et al., 2017; Seo et al., 2017). Several authors have tried to refine early MCI detection by combining two out of the three variables considered in this paper: either behavioral alteration (IADL, gait) within a VR environment (Lee et al., 2003; Seo et al., 2017; Kim et al., 2019; Eraslan Boz et al., 2019), gait kinematics extracted and analyzed by means of ML, which will be discussed further (Begg and Kamruzzaman, 2005; Pogorelc et al., 2012; Zhang and Wang, 2012; Eskofier et al., 2013; Akl et al., 2015; Costa et al., 2016; Mannini et al., 2016; Caldas et al., 2017; Farah et al., 2017; Ur Rehman et al., 2019), or ML techniques for predicting MCI evolution (Filipovych and Davatzikos, 2011; Williams and Weakley, 2013; Moradi et al., 2014, 2015; Bratić et al., 2018; Grassi et al., 2018, 2019; Graham et al., 2020).
Thus, to the best of our knowledge, this is the first paper proposing an integration of VR, gait kinematics, and ML in order to refine early detection of MCI following a dimensional approach in line with the most recent diagnostic systems and possibly providing information on disease progression. However relevant, traditional neuropsychological assessment does not provide this extent of information and could serve as a starting point that should be integrated with further information in order to detect a subclinical condition otherwise undiagnosable following a categorical approach. Crucially, some anecdotal evidence and more systematic but scattered evidence from kinematic analysis of specific movements (i.e., What?) suggest the feasibility and the relevance of an approach based on assessment of behavioral variables for early detection of MCI. We present preliminary evidence in this regard in the following.

## "What" Variables Are Included in the Assessment? Gait Kinematics

Is it possible to give relevance to behavioral data reported by the caregivers, relying on the anecdotal description of the elderly person's daily functioning and their IADL performance, in a scientific and rigorous manner?

A potential solution is to analyze the elderly individual's movements (kinematics) while performing IADL. Kinematic analysis automatically records movements in a controlled setting and assesses the underlying cognitive impairment. Preliminary studies proved the feasibility of tracking the elderly person's head, dominant hand, or gait during the performance of IADL to refine the assessment of MCI and other cognitive conditions (Schröter et al., 2003; de los Reyes-Guzmán et al., 2014; Akl et al., 2015; Seo et al., 2017). Among these indexes, gait kinematic analysis has progressively received more attention, despite the paucity of MCI-focused studies. The work of Martín-Gonzalo et al. (2019) thoroughly explains the contribution of considering gait alterations, beginning in early cognitive decline, to an improved understanding of neurocognitive disorders. In fact, gait kinematics are strongly related to neurophysiological alterations (Persad et al., 2008; Maquet et al., 2010; Martín-Gonzalo et al., 2019), brain volume changes in specific areas (Tian et al., 2017; Allali et al., 2019; Martín-Gonzalo et al., 2019), and subsequent cognitive decline, predicting future risks of impairment (Martín-Gonzalo et al., 2019). Kinematics assesses the sequential configuration of the leg joints required to maintain the body's center of gravity above the stance base while a person is moving forward. Compared to healthy subjects, the gait of a person suffering from MCI shows decreased velocity, longer stride time, increased stride-to-stride variability (Hausdorff, 2007; Bahureksa et al., 2017; Byun et al., 2018; Martín-Gonzalo et al., 2019), and spatiotemporal complexity (Ihlen et al., 2016; Martín-Gonzalo et al., 2019).
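As a minimal illustration of how the spatiotemporal indexes mentioned above could be derived, the following Python sketch computes mean stride time, stride-to-stride variability (expressed here as a coefficient of variation, which is one common choice among several), and average velocity. The function name, the input format (heel-strike timestamps of one foot and the length of each stride), and any values passed to it are assumptions made for illustration; they are not taken from the cited studies.

```python
from statistics import mean, stdev


def stride_metrics(heel_strikes_s, stride_lengths_m):
    """Summarize gait from successive heel strikes of the same foot.

    heel_strikes_s: heel-strike timestamps in seconds (hypothetical input).
    stride_lengths_m: length of each stride in meters, one per interval.
    """
    # A stride time is the interval between two consecutive heel strikes.
    stride_times = [b - a for a, b in zip(heel_strikes_s, heel_strikes_s[1:])]
    if len(stride_times) < 2 or len(stride_times) != len(stride_lengths_m):
        raise ValueError("need >= 3 heel strikes and one length per stride")

    mean_time = mean(stride_times)
    return {
        "mean_stride_time_s": mean_time,
        # Coefficient of variation (%), used here as the variability index.
        "stride_time_cv_pct": 100.0 * stdev(stride_times) / mean_time,
        # Average velocity = total distance covered / total time elapsed.
        "velocity_m_s": sum(stride_lengths_m) / sum(stride_times),
    }
```

Under these assumptions, a slower and more irregular walk would show up as a longer mean stride time, a higher coefficient of variation, and a lower velocity, which is the pattern described above for MCI.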

A gait cycle is defined by ongoing changes in the sequential configurations of the joints allowed by muscle activation, which is controlled by neural mechanisms depending on the integrity of somatosensory, motor, and cognitive integration cerebral networks (Perry and Burnfield, 2010; Caldas et al., 2017; Costilla-Reyes et al., 2020). Successful locomotion is indeed a dual task requiring the ability to simultaneously perform a cognitive task that could interfere with gait performance, particularly in elderly people (Pedroli et al., 2018; Costilla-Reyes et al., 2020). A decrease in attentional and executive functioning is physiological in aging and could impact this simultaneous execution (Hsu et al., 2012; Montero-Odasso et al., 2012; Wang et al., 2015; Gwak et al., 2018; Pedroli et al., 2018). In order to maintain walking capacity, damage to cerebral networks involved in gait leads to an adaptation of the nervous system, generating new signals reflecting the damage. Brain signals to the muscles controlling joint movement may become discontinuous and uncoordinated: this generates noise that could be consequent to the failure of some neuronal networks and produces configurations that respond to intentional cognitive directives, such as changing gait pace, little or not at all (Martín-Gonzalo et al., 2019). Indeed, kinematic data provide additional, crucial information that increases the sensitivity and specificity of MCI assessment. Paper-and-pencil neuropsychological tests are not suitable for the detection of gait features and its alterations, which appear relevant for more precise identification of MCI individuals.
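To make the notion of a "joint configuration" more concrete, the sketch below computes the angle at a joint from the 3D positions of three markers (for instance hip, knee, and ankle). This is a generic geometric calculation and only one of several ways to describe joint kinematics; the function name and the marker layout are hypothetical and do not describe the processing pipeline of any specific motion-capture system.

```python
import math


def joint_angle_deg(proximal, joint, distal):
    """Angle at `joint` (degrees) between the segments to `proximal` and `distal`.

    Each argument is an (x, y, z) marker position, e.g., hip, knee, ankle.
    """
    # Segment vectors pointing away from the joint.
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("markers must not coincide with the joint")

    cos_theta = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

Applied frame by frame to tracked marker positions, a function of this kind would yield a time series of joint angles, from which the sequential configurations that make up a gait cycle could be examined.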

To date, gait analysis has been studied within a context with little ecological validity: the walking task is generally an end in itself and is not recorded while the subject is completing a complex activity. Even the extraction of gait kinematics from videos or home-based motion sensors could provide only partial, bi-dimensional information or could be less sensitive in detecting real-time movement adjustment (Akl et al., 2015; Prakash et al., 2015; Neverova, 2016). The use of VR enables continuous, tridimensional tracking of the ongoing events within a highly immersive, safe, and standardized environment, enhancing the methodological strength of the procedure as well.

This introduces the need for a highly ecological and immersive context that allows the elderly person's kinematics while performing IADL to be observed and detected. A plausible solution comes from the implementation of Virtual Reality (VR), as shown in **Figure 1** (Pedroli et al., 2018). The technological equipment illustrated in **Figure 1** is a four-walled Cave Automated Virtual Environment (CAVE), available at Istituto Auxologico Italiano, which is routinely used for cognitive and motor rehabilitation of elderly people. This highly immersive technology is equipped with eight (4 × 2) Vicon Bonita 10 cameras (Opti-Tracking system, 1MP) and different Hi-res Hi-FOV head-tracked 3D HMDs and also with a wide range of physiological and motion measures for quantifying embodiment in VR and movements within the environment. A virtual representation of, e.g., a city or a supermarket can be projected on the four walls, and subjects can actively navigate and interact with the environment. This setting was used by Seo et al. (2017), which, as previously mentioned, is the most similar procedure to the one that we propose.

For the aims of this perspective, the most important feature of VR is its ability to detect both real-time behaviors (e.g., specific bodily movements such as those of the head and upper limbs and gait) and physiological indexes (e.g., skin conductance, heart rate). The great amount of data collected with VR and kinematics implies the need for a computational method of analysis that is able to extract meaning from a large amount of data. Thus, Machine Learning appears to be a viable solution, as shown by previous research employing this technique to discriminate between normal and pathological gait alteration and for diagnostic purposes as well (Pogorelc et al., 2012; Zhang and Wang, 2012; Eskofier et al., 2013; Costa et al., 2016; Mannini et al., 2016; Caldas et al., 2017; Farah et al., 2017; Ur Rehman et al., 2019).

## "How" to Analyze Them? Machine Learning

The massive amount of kinematic information extrapolated from motion detectors, complemented by neuropsychological and neuropsychiatric symptoms and signs, needs a similarly powerful technology in order to process it and convert it into an output intelligible for both clinicians and patients. The employment of kinematic measures and VR within a healthcare setting, i.e., a hospital, inevitably involves the use of a large amount of electronic health records (EHR) of patients' evolution over time. Despite the challenges related to the use of EHRs, several prediction algorithms and models have been developed from their use (Häyrinen et al., 2008; Miotto et al., 2016, 2017; Goldstein et al., 2017; Graham et al., 2020). Among other advantages, EHR-based predictors consider various metrics of multiple individuals, observed at different time points: this makes use of a higher frequency of data recording, facilitating the prediction of possible near-term evolution; they also reflect real life more closely than cohort studies (Goldstein et al., 2017).

The most suitable technique capable of administering a volume of complex and extensive information may be Machine Learning (ML). This scientific discipline stems from Artificial Intelligence (AI), i.e., a computer science field performing tasks capable of emulating human performance, generally learning to understand complex data, an endeavor that requires human intelligence (Bawack, 2019; Wang, 2019; Graham et al., 2020). ML algorithms have progressively gained popularity for several reasons, including their ability to automatically learn the inherent structure of a dataset (Kononenko, 2001; Abu-Mostafa et al., 2012; Facal et al., 2019) without requiring a priori hypotheses about relationships between variables (Miotto et al., 2017; Vieira et al., 2017; Graham et al., 2019, 2020). Instead, ML algorithms can discover and predict data trends and patterns by building on existing information and highlight unexpected relationships between variables (Vieira et al., 2017; Graham et al., 2019, 2020). This "learning by processing" approach generates increasingly accurate predictive models and, so far, has demonstrated enormous potential for supporting individual prognosis, risk estimation, and classification learning for diagnosis (Lehmann et al., 2007; Patel et al., 2015; Vieira et al., 2017; Dwyer et al., 2018; Facal et al., 2019). Inevitably, ML techniques work with high-dimensional data, which require a pre-processing step to remove redundant information, reduce data dimensionality, and improve learning accuracy and data comprehensibility (Khalid et al., 2014). This can be achieved by means of (i) feature selection (i.e., the selection, from a larger set, of the features most useful for discriminating between classes, to increase accuracy and generalizability); and (ii) feature extraction or dimensionality reduction (i.e., the transformation of original features to generate other, more significant features and reduce complexity) by means of Principal Component Analysis (PCA) or Independent Component Analysis (ICA), among other approaches (Khalid et al., 2014; Dwyer et al., 2018). The application of ML for healthcare purposes has been further developed into two main sub-classes, supervised (SL) and unsupervised (UL) techniques. SL jointly employs pre-labeled data, e.g., MCI versus healthy subjects, and additional features derived from clinical or neuroimaging sources to determine which feature predicts the pre-labeled data the most (Dwyer et al., 2018; Graham et al., 2020). SL operates with probabilistic and non-probabilistic classifiers (Naïve Bayes and Support Vector Machine, respectively), as well as with decision tree, linear, and logistic regression (Dhall and Kaur, 2020). UL techniques, instead, set unlabeled and unstructured data, e.g., clinical notes, as a starting point to seek relationships or patterns and to learn general representations that enable the automatic extraction of information when building predictors (Miotto et al., 2017; Dwyer et al., 2018; Graham et al., 2020). The algorithms employed by UL include K-means clustering, PCA, and Artificial Neural Networks (ANN) (Dhall and Kaur, 2020).
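The feature selection step described above can be sketched with a simple filter that ranks each feature by how well it separates two pre-labeled groups. The score used here (absolute difference of group means divided by a pooled standard deviation) is only one possible criterion, chosen for brevity; it is not the method of any cited study, and the function name, the label coding (1 = MCI, 0 = healthy), and the feature layout are illustrative assumptions.

```python
import math
from statistics import mean, variance


def rank_features(X, y):
    """Rank feature indices by a two-class separation score (best first).

    X: list of samples, each a list of numeric features.
    y: list of labels, 1 (e.g., MCI) or 0 (e.g., healthy); each class
       needs at least two samples so that a variance can be estimated.
    """
    scores = []
    for j in range(len(X[0])):
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        gap = abs(mean(group1) - mean(group0))
        pooled_sd = math.sqrt((variance(group1) + variance(group0)) / 2)
        if pooled_sd > 0:
            score = gap / pooled_sd
        else:
            # No within-group spread: perfectly separating or uninformative.
            score = math.inf if gap > 0 else 0.0
        scores.append((score, j))
    # Highest score first; keeping the top-k indices gives the selected subset.
    return [j for _, j in sorted(scores, reverse=True)]
```

In this sketch, a supervised classifier would then be trained only on the top-ranked columns; feature extraction methods such as PCA would instead replace the original columns with new, derived ones.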

However, at the time of data collection, it is unclear whether MCI subjects will progress toward other forms of dementia (e.g., AD) or convert back to normal cognition, and this evolution could become more evident over the years. This challenges data labeling: thus, researchers tackling MCI detection have employed semi-supervised learning (SSL) techniques capable of combining labeled and unlabeled data to improve the classification procedure (Zhu, 2008; Filipovych and Davatzikos, 2011; Moradi et al., 2015; Dwyer et al., 2018; Van Engelen and Hoos, 2020). A semi-supervised approach, therefore, allows cases to be managed by providing only partial data labels (Filipovych and Davatzikos, 2011). Several studies have employed MCI data as unlabeled data and have shown an improvement in the predictive performance of the model (Batmanghelich et al., 2011; Filipovych and Davatzikos, 2011; Ye et al., 2011; Moradi et al., 2015): this approach could be particularly feasible for the purpose of the integrated dimensional approach offered in the present perspective.
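One generic way of combining labeled and unlabeled data is self-training, sketched below with a nearest-centroid classifier. This is a deliberately simple stand-in for the semi-supervised methods used in the cited studies, not a reproduction of them: the function names, the label coding, and the `margin` confidence rule are all hypothetical choices made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Self-training with a nearest-centroid classifier.

    labeled: list of (features, label) pairs, label 0 or 1, both classes present.
    unlabeled: list of feature vectors whose label is unknown.
    margin: minimum gap between the two centroid distances required to
        accept a pseudo-label (hypothetical confidence rule).
    Returns (labeled_including_pseudo_labels, still_unlabeled).
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        c0 = centroid([x for x, lab in labeled if lab == 0])
        c1 = centroid([x for x, lab in labeled if lab == 1])
        confident, uncertain = [], []
        for x in remaining:
            d0, d1 = math.dist(x, c0), math.dist(x, c1)
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                uncertain.append(x)
        if not confident:
            break  # nothing new was learned in this round
        # Confident cases join the training set; centroids are re-estimated.
        labeled.extend(confident)
        remaining = uncertain
    return labeled, remaining
```

Cases that never reach the confidence margin stay unlabeled, which loosely mirrors the situation described above, where the eventual course of some MCI subjects is not yet known at the time of data collection.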

A limited ability to process raw data, the need for manual engineering of features, and the extensive expertise needed to perform the analyses represent the main limitations of conventional (shallow) ML techniques (Lecun et al., 2015; Vieira et al., 2017; Zhang et al., 2020). This has led to the dissemination of deep learning (DL) algorithms, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) (Dhall and Kaur, 2020). DL outperforms conventional ML in many ways, showing best-in-class performance and increased complexity in the computed function and addressing problems in multiple domains such as language and speech (Zhang et al., 2020). Moreover, it eliminates the need for manual feature engineering, reducing possible human biases and removing the need for advanced expertise (Zhang et al., 2020). DL is capable of learning data representation in an unprocessed or raw form, and its high performance and expressive power in one specific domain can be transferred to other contexts, providing a flexible adaptation to problems (Bengio, 2009; Lecun et al., 2015; Miotto et al., 2017; Vieira et al., 2017; Chauhan et al., 2019; Esteva et al., 2019; Costilla-Reyes et al., 2020; Zhang et al., 2020). Despite all the advantages, it is crucial to consider that DL techniques require very large datasets to perform well, which may be difficult, expensive, or time-consuming to obtain; thus, conventional ML may be more feasible and efficient (Zhang et al., 2020).

To date, advanced statistical ML and pattern recognition techniques have proved their usefulness in outlining neurodegenerative patterns of mild symptoms manifesting during the early stages of diseases, and MCI is no exception (Davatzikos et al., 2008, 2010; Vemuri et al., 2009; Wee et al., 2014). ML has been repeatedly applied to diagnostic transitions from MCI to other forms of dementia, e.g., AD, employing different types of information: mostly neuroimaging data (e.g., MRI, PET scan, Diffusion Tensor Imaging) (Batmanghelich et al., 2011; Filipovych and Davatzikos, 2011; Ye et al., 2011; Zhang and Shen, 2011, 2012; O'Dwyer et al., 2012; Shaffer et al., 2013; Moradi et al., 2015; Bratić et al., 2018), cerebrospinal fluid biomarkers (Davatzikos et al., 2010; Fjell et al., 2010; Zhang and Shen, 2011; Shaffer et al., 2013; Bratić et al., 2018), demographic and cognitive data (Moradi et al., 2015; Bratić et al., 2018), and gait kinematics (Mannini et al., 2016; Farah et al., 2017; Gwak et al., 2018). Broad variations in studies' results have been reported, as has the lack of a gold-standard ML algorithm to predict disease progression (Grassi et al., 2018, 2019; Chiu et al., 2019; Facal et al., 2019; Mallo et al., 2019). Specifically, Grassi and colleagues (Grassi et al., 2018, 2019) have recently developed clinically translatable ML algorithms to identify which subjects with pre-MCI and MCI will convert to AD. ML likewise appears promising for precision medicine: given the patients' extreme heterogeneity of symptoms, medication response, and prognosis, the implementation of ML to create computational models of disease development tackles patients' diverseness (Fisher et al., 2019).
Over the years, researchers have devised a number of disease progression models for both MCI and AD, relying on clinical and imaging data (Mueller et al., 2005; Ito et al., 2011; Rogers et al., 2012; Moradi et al., 2015; Miotto et al., 2017; Samper-Gonzalez et al., 2017; Fisher et al., 2019). Previous applications of ML to clinical data have proven useful in predicting a single outcome (e.g., the likelihood of conversion from MCI to AD) (Fisher et al., 2019). From a clinical point of view, however, it would be important to predict the disease progression and trajectory for everyone, which is difficult with current data-driven modeling approaches.

In their latest work, Graham et al. (2020) support the employment of AI and ML for ranking those variables crucial for MCI assessment and cognitive impairment. The authors show that clinical and psychometric assessments appear promising for identifying individuals at high risk for cognitive impairment (Lins et al., 2017; Senanayake et al., 2017; Moreira and Namen, 2018), which could be better identified by means of brain imaging and neuropsychological data as well (Fan et al., 2018; Iizuka et al., 2019). Even more importantly, Graham et al. (2020) report on several studies employing novel techniques to detect cognitive impairment in MCI subjects as well, such as home-installed motion sensors (Akl et al., 2015) and multimodal wearable activity devices (Gwak et al., 2018), therefore including behavioral data in ML analysis. However useful for providing real-world behavioral data in an ecological context, the employment of motion sensors alone has shown substantial heterogeneity (Graham et al., 2020); therefore, VR appears a promising integrative solution achieved by simulating a supervised and controlled real-life-like environment.

## DISCUSSION

Although both elderly people and caregivers notice and report their concerns regarding behavioral, personality, and cognitive changes, MCI is a subclinical condition that remains undiagnosed by an official categorical system while progressively compromising the independent functioning of the elderly person. Although a possible regression to normal cognition is desirable, more often, MCI evolves toward other forms of dementia. A delayed diagnosis entails the worsening of the individual's conditions, greatly reducing the extent of possible interventions and making primary and secondary prevention essential (Van Vliet et al., 2011; Jekel et al., 2015; Roehr et al., 2019). However, MCI assessment should necessarily move beyond a stringent categorical approach in favor of a dimensional one able to include finer discrimination among early stages of MCI, thus reflecting the complexity of this construct. So far, its assessment has followed a dichotomous view, relying on neuropsychological instruments to test MCI's presence or absence. Despite their proven efficacy, a dimensional approach would integrate them by implementing existing technologies and data analysis methodologies, placing MCI on a continuum. With this in mind, this perspective aimed primarily to move forward, proposing a novel assessment that could enable a more accurate prediction of the trajectory of MCI decline, employing Virtual Reality (VR) for a continuous dimensional assessment of MCI behaviors in ecological, realistic, tailored, safe, and controlled simulated contexts.

Since the individual's altered behavior reflects impaired cognition (Martín-Gonzalo et al., 2019), this proposal would allow early detection of MCI, enabling timely rehabilitative interventions. Specifically, gait kinematics is a behavioral index whose analysis has proved sensitive to cerebral and cognitive alterations capable of discerning patients with cognitive decline from healthy individuals (Martín-Gonzalo et al., 2019). Nevertheless, few studies have specifically employed gait measurement as a possible marker to refine MCI assessment, and even then mainly in unfamiliar contexts, thus hindering ecological validity (Jekel et al., 2015; Seo et al., 2017). So far, MCI assessment has relied on neuropsychological measures rather than behavioral ones, despite the importance of the latter in revealing initial cognitive decline. When available, these behavioral data are generally based on informant-report questionnaires or reported as anecdotal information lacking scientific rigor (Van Vliet et al., 2011; de los Reyes-Guzmán et al., 2014; Kim et al., 2019). Behavioral data appear to provide a relevant contribution to MCI assessment: further research should deepen and consolidate the preliminary, promising evidence reported (Seo et al., 2017; Martín-Gonzalo et al., 2019).

It appears evident that a mere conventional neuropsychological assessment, however relevant, cannot provide such a high degree of information, giving rise to the necessity of integrating paper-and-pencil instruments and anecdotal evidence with behavioral alterations evaluated within a highly ecological and standardized setting, such as VR. A plausible, practical implementation of the approach could be structured as follows. During a first brief clinical interview, the practitioner could collect anamnestic and quantitative information from (i) the elderly person, relying on neuropsychological/neuropsychiatric and cognitive measures as well; and (ii) the caregiver, who could fill in informant-report IADL measures. A second appointment would be dedicated to VR-based assessment: the elderly person could perform IADL (e.g., money withdrawal, grocery shopping) within the CAVE virtual environment, while kinematic information of their performance would be simultaneously collected. These data could be provided by kinematics motion detectors placed on the individual's joints, as illustrated in **Figure 1**. The entire VR-kinematic assessment would last a maximum of 20 min to possibly avoid cybersickness, i.e., a form of motion sickness that includes nausea, headaches, and disorientation, among other symptoms (Laviola, 2000; Davis et al., 2014). Cybersickness is a common side effect of VR and could interfere with the completion of quantitative measures: thus, whenever it is necessary to complete paper-and-pencil assessment in the second appointment, this should be done before the VR procedure starts. VR would allow the clinician to closely observe the real life-like behavior of the individual and employ motion detectors, which extrapolate a large amount of data computed by means of Artificial Intelligence (AI) and, specifically, Machine Learning (ML). As mentioned in the ML section, recently developed, clinically translatable ML algorithms could help to identify MCI subjects who will convert to AD (Grassi et al., 2018, 2019). Thus, these algorithms could be tested and implemented in the assessment procedure illustrated in the previous section after collecting data from both the quantitative evaluation and the VR procedure within the CAVE. This could generate an accurate, predictive model proposing a gradient of behavioral and cognitive decline: a subclinical condition such as MCI could not be detected promptly by a categorical approach. A schematic illustration of this model is depicted in **Figure 2**.

The first and foremost added value of this approach lies in moving one step forward toward refined MCI early detection by integrating (i) behavioral (gait kinematics, IADL), neuropsychological/neuropsychiatric and cognitive information; (ii) a highly ecological and standardized setting, such as VR; and (iii) a powerful method capable of analyzing an extensive amount of data and predicting MCI progression over time. This is the first dimensional approach jointly considering all of the mentioned sources of information, whereas previous studies considered only two out of three variables at the same time. The main focus is the relevance of building an innovative assessment procedure that is data-fusion-based and capable of identifying a subclinical condition that is otherwise undetected. Many ML and DL algorithms exist to analyze the extensive amount of data collected and, except for some that were recently tested (Grassi et al., 2018, 2019), there is no consensus regarding a gold-standard algorithm for predicting MCI diagnostic transition. Moreover, several open-source libraries for ML can provide information regarding the most feasible programming language (e.g., Python) and algorithms to use (Rathi, 2019).

We are aware that the integration of kinematic analysis, VR, and ML could be very expensive and may not be available in a clinical setting, such as in a hospital. In addition, there may be a risk of initial resistance to acceptance by elderly individuals and healthcare providers due to the novelty of the equipment. However, the implementation of this approach would offer a crucial benefit by enabling the dimensional assessment of a subclinical condition otherwise undiagnosable, and the trained models, enriched by data of numerous patients, could offset the initial expense. Moreover, a hospital would be the only setting where biological and neuroimaging data (e.g., MRI, PET scan, cerebrospinal fluid biomarkers) can be collected. Although the method proposed, so far, does not include them, these types of information could eventually be added to ML analyses, since they have been previously indicated as plausible contributors to MCI assessment (Fjell et al., 2010; Batmanghelich et al., 2011; Filipovych and Davatzikos, 2011; Ye et al., 2011; Zhang and Shen, 2011, 2012; O'Dwyer et al., 2012; Shaffer et al., 2013; Moradi et al., 2015; Bratić et al., 2018; Chiu et al., 2019). The employment of ML and DL methods usually requires a large sample size, which may not always be feasible in the healthcare setting. However, this limitation could be settled by developing multi-centric studies, providing an adequate sample size of patients and sharing data (Vieira et al., 2017). The dimensional approach also needs to be applied carefully in order to avoid hypervigilance for the slightest cognitive and behavioral age-related alteration, which might lead to excessive diagnosis and false-positives (Vanacore et al., 2017). Diagnosis communication must be carefully handled, given the potential harm of anxiety about a condition that may not progress (Chiu, 2005; Díaz-Mardomingo et al., 2017; Vanacore et al., 2017); prognostic possibilities can then be discussed and planned accordingly.
Strengthening or rehabilitative interventions could foster regression to normal cognition or decelerate the progression toward other clinical conditions.

In summary, while the majority of the literature has studied the application of several combinations of VR, gait kinematic analysis, and ML, this is the first paper to integrate all three of these methods and techniques in order to refine early detection of MCI and possibly predict its evolution over time. VR allows the collection of "Digital Biomarkers" (physiological/behavioral data) by means of digital technologies, used as an indicator of biologic processes or responses to therapeutic interventions (Coravos et al., 2019) directly connected to brain functioning. On the other hand, AI, by applying ML techniques to the individual's digital biomarkers, allows the creation of a predictive model, following a dimensional approach to MCI, able to identify specific behavioral cognitive patterns within an ecological and safe environment, for accurate early detection of MCI and its potential evolutionary trajectory (Riva et al., 2019b).

## AUTHOR CONTRIBUTIONS

SC, AC, and PC developed the new model integrating gait kinematics, Virtual Reality, and Machine Learning. PC and GR supervised the sections of Virtual Reality and Machine Learning. EP supervised the section regarding MCI. SC wrote the manuscript under the final supervision of AC, EP, GR, and PC. All authors have approved the final version of the manuscript.

## FUNDING

This research was funded by "Future Home for Future Communities" ("Accordo Quadro di Collaborazione tra Regione Lombardia e Consiglio Nazionale delle Ricerche" – Convenzione Operativa n 19365/RCC) and by the Italian Ministry of Health research project "High-End and Low-End Virtual Reality Systems for the Rehabilitation of Frailty in the Elderly" (PE-2013-0235594).


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Cavedoni, Chirico, Pedroli, Cipresso and Riva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Being in the Past and Perform the Future in a Virtual World: VR Applications to Assess and Enhance Episodic and Prospective Memory in Normal and Pathological Aging

Azzurra Rizzo†, Giuditta Gambino†, Pierangelo Sardo\* and Valerio Rizzo\*

Department of Biomedicine, Neuroscience and Advanced Diagnostic, Università Degli Studi di Palermo, Palermo, Italy

#### Edited by:

Soledad Ballesteros, National University of Distance Education (UNED), Spain

#### Reviewed by:

Nicola Cellini, University of Padua, Italy

José Manuel Reales, National University of Distance Education (UNED), Spain

#### \*Correspondence:

Pierangelo Sardo pierangelo.sardo@unipa.it Valerio Rizzo valeriorizzophd@gmail.com

†These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Cognitive Neuroscience, a section of the journal Frontiers in Human Neuroscience

Received: 14 November 2019 Accepted: 03 July 2020 Published: 04 August 2020

#### Citation:

Rizzo A, Gambino G, Sardo P and Rizzo V (2020) Being in the Past and Perform the Future in a Virtual World: VR Applications to Assess and Enhance Episodic and Prospective Memory in Normal and Pathological Aging. Front. Hum. Neurosci. 14:297. doi: 10.3389/fnhum.2020.00297

The process of aging commonly features a gradual deterioration in cognitive performance and, in particular, the decline of memory. Despite the increased longevity of the world's population, the prevalence of neurodegenerative conditions, such as dementia, continues to be a major burden on public health, and consequently, the latest research has focused on memory and aging. Currently, the failure of episodic and prospective memory (PM) is one of the main complaints in the elderly and is considered among the early symptoms of dementia. It is therefore increasingly important to define more clearly the boundaries between normal and pathological aging. Recently, researchers have begun to build and apply Virtual Environments (VE) for the explicit purpose of better understanding the performance of episodic memory and PM in complex and realistic contexts, with the perspective of further developing effective training procedures that depend on reliable cognitive assessment methods. Virtual technology offers higher levels of realism than "pen and paper" testing and, at the same time, more experimental control than naturalistic settings. In this mini-review article, we examine the outcomes of recently available studies on virtual reality technology applications developed for the assessment and improvement of episodic memory and/or PM. To consider the latest technology, we selected 29 articles published in the last 10 years. These studies show that VR-based technologies can provide a valid basis for screening and treatment and, through increased sensory stimulation and enriched environments reproducing the scenarios of everyday life, could represent effective stimulating experiences even in pathological aging.

Keywords: aging, pathological aging, virtual reality, episodic memory (EM), prospective memory (PM), assessment, cognitive training, cognitive impairment

## INTRODUCTION

Memory has for centuries been an intriguing field of brain research, since it is a biologically essential function for the survival of almost all species (Bisaz et al., 2014). Memory is defined as the ability to acquire, process, store, and retrieve information (Fietta and Fietta, 2011). Remembering is not a monolithic process: memory can be categorized and sub-categorized along many domains (Squire and Zola, 1996; Purves et al., 2001; Plescia et al., 2014). Conceivably due to increasing human life expectancy and the growing incidence of severe diseases that can induce neuronal degeneration and alter neuronal excitability (Carletti et al., 2016, 2017; Jaul and Barron, 2017; Park and Festini, 2017), memory has been considered a core feature to study in normal and pathological aging processes. Most people report some early age-related memory impairments from the age of 60, especially in longitudinal studies (Nilsson, 2003; Rönnlund et al., 2005). Even earlier, by the age of 30, the decline of some cognitive functions has been evidenced in cross-sectional studies (Park et al., 2002). Nevertheless, specific memory abilities that do not entail the conscious recollection of previously experienced material seem not to be altered by normal and pathological aging (Mitchell and Bruss, 2003; Ballesteros and Reales, 2004). Classically, aging has been linked to neuronal loss independently of the brain region involved (Coleman and Flood, 1987). However, research involving healthy adults indicates that normal aging is always associated with morphological alterations in neurons belonging to structures involved in cognition (Tisserand and Jolles, 2003). Not all cognitive abilities are affected by aging, but impaired memory skills are generally reported by the elderly and give rise to bitter complaints (Craik, 2008).

Memory has been classified by time direction (Maylor, 1993): retrospective memory refers to the ability to retrieve past information. Within retrospective memory, episodic memory (EM) refers to long-term memories that include specific information such as time, location, or perceptual details, as well as the connection of this multidimensional information (Tulving, 2002). Currently, the alteration of EM is considered the main early symptom of dementia (Gold and Budson, 2008), though it is also common in healthy aging; its failure is therefore not specific to pathological aging (Nilsson, 2003; Rönnlund et al., 2005; Craik, 2008). Prospective memory (PM), on the other hand, is the ability to remember to execute previously planned actions and can be defined as "remembering to remember," thus referring to the future, for example when you have to remember to take a drug at a certain time. Some studies suggest that the onset of pathological aging brings about greater difficulty in PM (Huppert et al., 2000). Moreover, errors in PM may be associated with considerable risks (Smith et al., 2000; Maylor et al., 2002), for example forgetting to turn off the gas.

Considering this context, it is essential to differentiate normal from pathological aging, that is, aging that brings about complications due to the presence of diseases such as Alzheimer's disease (AD), Mild Cognitive Impairment (MCI), Parkinson's Disease (PD), and other dementias (Hedden and Gabrieli, 2004; Craik, 2008), or type 2 diabetes (Redondo et al., 2016). Indeed, the progressive impairment in executive functions and memory processes in healthy adults could be exacerbated by the concomitant presence of common chronic diseases. Questions arise about whether and how it is possible to inhibit pathological and non-pathological memory loss, especially considering the recent discovery of the ability of the nervous system to reconstruct cellular synapses upon interaction with enriched environments (Barak et al., 2013). A huge number of studies in the field have led to a better molecular understanding of different types of memory. However, the most reliable way to assess memory processes in normal and pathological aging is still intensely debated.

Classic memory tests using paper and pencil or computer systems for the evaluation of EM usually require older adults to remember static stimuli; therefore, they may not provide sufficient detail for predicting patients' daily difficulties in different dynamic environments. Some studies have argued that neuropsychological assessments should provide a good degree of similarity to daily life tasks, since the lack of ecological validity can negatively affect predictions about patients' memory failures (Schultheis et al., 2002; Parsons and Rizzo, 2008). Episodic retrieval, for example, requires information about central and perceptual details, space-time contextual elements, and the binding of this multidimensional information (Abichou et al., 2019). Similarly, in everyday life, people are driven by motivational aspects and adopt strategies for encoding daily intentions that are difficult to probe through classic tests. Moreover, these tests measure memory components in isolation and fail to offer a comprehensive understanding of their operation (Tulving, 2002). Even naturalistic observation is not always an effective solution due to a series of difficulties, including problems of standardization, control of the stimuli and distractors, the economic costs of physically building the observation environments, and safety problems.

In this regard, an outstanding advantage could be offered by the use of virtual reality (VR), which allows memory consolidation to be evaluated while the user interacts with an enriched everyday environment, and which guarantees both analytical laboratory control and precise assessments of how memory and other cognitive processes operate. The main goal of VR is to allow the patient to undertake specific tasks through artificial sensory stimulation and the illusion of being in an interactive environment perceived as a real place (Mantovani and Riva, 1999; Riva et al., 2007; LaValle, 2019). This experimental application can be traced back to the heterogeneous family of Embodied Cognition theoretical approaches, which claim that the physical properties of the human body, especially perceptual and motor systems, must be considered essential factors for the development and functioning of a cognitive system and could modulate learning and memory formation (Madan and Singhal, 2012).

The concepts of immersion and presence, related to VR, can better describe the experience from the user's physical and psychological point of view. Immersion refers to the physical configuration of the interface of a VR application: the number and range of sensory and motor channels connected to the system determine the "immersiveness," ranging from Non-Immersive (NI) systems on desktop computers to fully immersive systems. This distinction is based on how much the user can perceive the outside world during the virtual simulation (LaValle, 2019). The fully immersive types are characterized by the use of a head-mounted display (HMD), in which a high-fidelity graphic screen is mounted in front of the user's eyes with separate lenses for each eye. The interaction in this type of virtual reality is controlled by tracking the movement of the head in combination with a computer system; therefore, when users move their head to look around, they move their visual field within the virtual environment through 360 degrees. Presence, on the other hand, is defined as "being there" and occurs when the subject experiences an illusion of non-mediation in his space of action and acts as he would if the medium were not present (Mantovani and Riva, 1999; Riva et al., 2007). Presence is a subjective "response" to a system that has a certain level of immersion (Sanchez-Vives and Slater, 2005): people react to and act on virtual stimuli as if they were real. This response occurs on many levels, ranging from unconscious physiological processes (cerebral, cardiac, skin, etc.) through to deliberate volitional behavior (Slater et al., 2009). In particular, the ability to induce the sense of presence seems to have positive effects on attention and involvement and is consequently very relevant in the evaluation of memory (Sutcliffe et al., 2005; Makowski et al., 2017).

In this review article, we aimed to explore the most recent evidence about the ability to evaluate EM and PM, as well as to stimulate improvements in both normal and pathological aging, through representative and significant examples of applications in VR.

## METHODOLOGY

Initially, a systematic online bibliography search was carried out in June 2019 through the following databases: Web of Knowledge, ScienceDirect, PubMed, and Google Scholar. We used the following core search terms and their combinations: VR or virtual environment, prospective memory or PM, episodic memory or EM; and the following as additional search terms with "Xor" combinations: assessment, cognitive training, aging, pathological aging, cognitive impairment. Also, to get a broader and more complete view of the topic, we included studies on young adult subjects. Second, the selection of relevant articles was limited to the period 2009–2019 to obtain information mainly about outcomes referring to the latest technology.

The selected references were included in the review if the following criteria were met, as shown in **Figure 1**: research on the impairment of EM or PM in aging; description of VR methodologies for assessment or training; a clear description of the VR tools that determine the related level of immersion. Overall, 29 studies were identified, summarized in this mini-review article, and classified as in **Table 1**.
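The search strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source lists the terms but not the exact query syntax submitted to each database, so the Boolean layout, the reading of "Xor" as "one additional term per query," and all names (`VR_TERMS`, `build_queries`, `in_review_window`) are hypothetical.

```python
from itertools import product

# Core search terms as listed in the Methodology (synonyms are OR-ed).
VR_TERMS = ["VR", "virtual environment"]
MEMORY_GROUPS = [
    ["prospective memory", "PM"],
    ["episodic memory", "EM"],
]
# Additional search terms as listed in the Methodology.
ADDITIONAL_TERMS = [
    "assessment", "cognitive training", "aging",
    "pathological aging", "cognitive impairment",
]
# Publication window stated in the Methodology.
YEAR_RANGE = (2009, 2019)


def or_group(terms):
    """Join synonyms into a parenthesised OR group with quoted phrases."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_queries(vr_terms, memory_groups, additional_terms):
    """Assumption: one query per (memory type, additional term) pair."""
    return [
        f'{or_group(vr_terms)} AND {or_group(mem)} AND "{extra}"'
        for mem, extra in product(memory_groups, additional_terms)
    ]


def in_review_window(year, year_range=YEAR_RANGE):
    """True if a publication year falls inside the review period."""
    first, last = year_range
    return first <= year <= last


queries = build_queries(VR_TERMS, MEMORY_GROUPS, ADDITIONAL_TERMS)
for query in queries:
    print(query)
```

Each printed string could then be adapted to the syntax of a given database; the year filter would be applied to the retrieved records before screening against the inclusion criteria.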

## VR FOR EPISODIC MEMORY IN AGING

Within Virtual Environments (VE), participants can be immersed in scenarios that represent different everyday situations such as virtual apartments (Sauzéon et al., 2012), grocery stores (Parsons and Barnett, 2017; Plechatá et al., 2019; Corriveau-Lecavalier et al., 2020) or cities (Plancher et al., 2012, 2013, 2018; Abichou et al., 2019). This makes it possible to implement simple tasks that assess the versatile nature of EM in ecological situations, within a rich and specific space-time context.

Plancher et al. (2018) analyzed the role of working memory (WM) in building an episodic trace, using an urban virtual environment projected on a screen. They reported that the memory of central information was altered by simultaneous tasks and that the memory of the temporal context and binding was compromised only upon the performance of a competing visuospatial activity. To test WM's key role in consolidating EM, participants were asked to explore the environment using a steering wheel, a gas pedal, and a brake pedal. At the same time, a secondary numerical task interfering with the phonological loop (e.g., storing the number of garbage containers in the path) and a secondary visuospatial task (e.g., memorizing the spatial model of containers) were applied, with the prediction that secondary activities performed during learning would interfere with encoding and result in altered memory performance.

TABLE 1 | Virtual reality (VR) studies in aging classified according to contribution in Episodic Memory (EM) or Prospective Memory (PM); kind of intended purpose divided into Assessment (AS) or Training (TR); level of immersion divided into Full-Immersive (FI), Semi-Immersive (SI), or Non-Immersive (NI); Experimental subjects (ES); the presence of Information about Cybersickness (IC); type of Environment; the number of Training Sessions (TS); navigation Time; and Main results.

One of the main advantages of VEs is, indeed, the ability to precisely model and control the environment itself according to the requirements decided by the experimenter, avoiding possible problems of building real scenarios (Sauzéon et al., 2012). Due to the extreme adaptability of this technique, EM in VE has already been tested in clinical contexts (Plancher et al., 2012; García-Betances et al., 2015; Serino et al., 2015, 2017). To compare VR memory tasks with traditional neuropsychological evaluation tools, Plancher et al. (2012) conducted a study on healthy participants, patients with amnestic MCI (aMCI), and patients with mild AD. The experimental groups were asked to store as much information as possible during active and passive exploration conditions. The virtual task made it possible to characterize the different cognitive profiles of the three populations, and the authors found that spatial allocentric memory assessments discriminated patients with aMCI from controls. Nevertheless, after active exploration of the VE, all participants, including patients with aMCI and AD, showed significantly better retrieval of both central and allocentric spatial information, as well as better binding. As pointed out by the authors, these results about active exploration are particularly promising because they provide support for the feasibility of VR as an effective non-pharmacological tool to promote neuroplasticity and neural reorganization in patients with AD.

Preclinical studies on aging have shown that immersion in enriched environments may drive long-term enhancement of the activity of the hippocampus and changes in memory-associated brain regions inducing structural changes in animal models (Harvey et al., 2009). Clemenson and Stark (2015) discovered that young adults trained with Super Mario 3D showed better spatial and EM performance dependent on hippocampus activity, compared to people trained in a 2D-controlled game. More recently, West et al. (2017) proposed the same 3D platform training applied to an elderly population reporting increases in gray matter thickness in brain regions known to be implicated in cognitive-related decline. Also, it was suggested that a greater feeling of presence improves the effectiveness of VR applications (Optale et al., 2010). It was found that higher levels of presence were associated with better factual memory and the impact of the emotional stimulus was mediated by a sense of presence (Makowski et al., 2017).

As pointed out by Repetto et al. (2016), besides environmental enrichment in VR research, two main aspects may be advantageous in the context of the study of EM. The first is that VR allows exploration from an egocentric point of view (Bergouignan et al., 2014; Serino et al., 2015, 2017); for example, Bergouignan et al. (2014), using an induced out-of-body illusion, reported that accurate EM encoding is favored by the perception of the world from the perspective of one's own body. Second, VR enables active exploration of the environment. However, comparisons of active and passive navigation have shown contradictory results, with both negative (Taillade et al., 2013) and positive effects of active navigation (Sauzéon et al., 2012; Plancher et al., 2013).

Employing a paradigm similar to that of Plancher et al. (2013), Jebara et al. (2014) assessed the performance of a sample of young adults and seniors in a virtual city projected on a screen. Four interaction conditions were included: "passive" (passengers in a virtual car cannot choose directions and route), "itinerary control" (passengers can choose), "low control" (the driver moves the car on rails), and "high control" (the driver also chooses the direction). Better EM scores (what-where-when and binding) in both young and old groups were obtained only in the itinerary control and low control conditions. This suggests that EM performance benefits from multimodal encoding through the enrichment of motor interaction, and that it is improved by active navigation when this is not too demanding in terms of attentional effort. According to some authors (Bakdash et al., 2008; Sauzéon et al., 2012; Jebara et al., 2014), active navigation may require additional cognitive resources that are then not fully available for the encoding process. Consequently, inconsistent results on memory performance related to active vs. passive navigation may be due to differences in the manipulation of sensorimotor stimulation and its confounding effects on cognitive activity, as shown by several studies reporting worse memory performance caused by divided attention (e.g., Craik et al., 1996). Plancher et al. (2013) have shown that driving in VR while encoding information can be considered a dual task in which motor control can affect factual memory.

However, although these results suggest that active navigation VR training may have a beneficial effect on EM, it should be noted that older adults perform worse than young people, particularly in binding scores. This age-related effect noted in the low control condition encourages greater attention from research on the elderly regarding the complexity of the motor task which would risk having diametrically opposite effects on memory.

Lastly, as the use of HMDs spreads, the effects of active and passive navigation must also be investigated in fully immersive environments, especially for older persons who are not familiar with technology and not used to handling it.

## VR FOR PROSPECTIVE MEMORY IN AGING

A growing number of studies on memory have focused on its prospective side, but uncertainties remain regarding the characteristics of PM impairments. Still to be fully unveiled are the influence of the executive functions, the life-span development of prospective remembering and the effects of age, the underlying mechanisms involved in event-based or time-based PM tasks, and the role of motivational aspects (Kliegel and Martin, 2003). Compared to traditional laboratory paradigms, virtual reality creates realistic tasks for the evaluation of PM, increasing the variety of possible actions to perform. This allows the multiple cognitive processes involved to be measured and interactive stimuli to be systematically controlled, with immediate feedback on performance through sensory modalities. Nolin et al. (2013) used an eMagin Z800 HMD with a population of older adults with MCI vs. healthy controls in an urban environment. This VR-based evaluative approach to PM tasks could be more sensitive to the effects of MCI than traditional neuropsychological tests such as the Rivermead Behavioral Memory Test (RBMT; Wilson et al., 1985). Indeed, although the RBMT has been widely used in clinical settings, it does not include enough PM tasks to generate many types of performance and does not assess time-based PM performance (Mioni et al., 2014). VR, reaching a higher level of complexity, requires more cognitive resources to perform tasks and could therefore better represent a person's functioning in real life.

Based on this assumption, VEs are used to explore central theoretical questions about how the cognitive system successfully encodes and retrieves intentions (Gonneaud et al., 2014; Trawley et al., 2014). Gonneaud et al. (2014) assessed the impact of connections between the prospective component (PC; remembering that something needs to be done) and the retrospective component (RC; the content of the intention) of PM, in a Semi-Immersive (SI) urban environment where subjects could navigate using a virtual car. More specifically, the link between PC and RC affects the distinction between PM based on the appearance of an external cue (event-based, EB) and PM based on the self-initiated retrieval of the intention after a time interval (time-based, TB). Nine tasks were presented to the subjects, either with a clear link between PC and RC (Link-EB; e.g., buying a stamp book at a post office) or without (noLink-EB; e.g., buying eyeglasses at the fountain). Link-EB produced better performance than noLink-EB and TB, highlighting the importance of the association processes between PC and RC for effective PM. Similarly, Lecouvey et al. (2019) explored in VR the effects of mild AD on PM, showing that both PM components are significantly compromised in AD patients, but that the RCs of intentions are altered before the PCs. These data support the hypothesis that early impairments of EM have a great impact on the execution of PM tasks in AD.

This VR approach is a more realistic tool that could help to better highlight planning processes, motivational aspects, time estimation, or possible difficulties in dual-task processes in PM impairments in everyday life (Gonneaud et al., 2014; Lecouvey et al., 2019). This approach could also be useful to provide more efficient therapeutic interventions (Meijer et al., 2009) and a better measure of training effectiveness relative to less naturalistic performance measures. Indeed, VR technology has also been applied to cognitive training, such as in the "Virtual Week" paradigm (Yip and Man, 2013; Rose et al., 2015). Mioni et al. (2015) reported, for the first time, that PM performance in PD patients improved in VR when emotionally enriched tasks were used. Participants were asked to remember to carry out actions with a positive value (e.g., "Tell Roberta that Maria had a baby when you talk to Roberta"), with a negative value (e.g., "Pay a fine for speeding when you go shopping"), or with a neutral value (e.g., "Buy your bus ticket after breakfast"). PM tasks with positive emotional value showed better results than tasks with negative or neutral value in both healthy controls and PD patients, although the latter showed worse performance than the control group, independently of the emotional valence of the cue. Experimental data also showed improved outcomes in remembering to perform tasks with pleasant content compared to neutral content, since positive stimuli may attract more attentional resources, hence facilitating the retrieval and execution of PM actions. Furthermore, results seem to indicate that the use of a fully immersive task is feasible in the elderly: it arouses presence, it is engaging, and it causes limited cybersickness symptoms (Ouellet et al., 2018; Corriveau-Lecavalier et al., 2020).

However, it is important to note that, due to technological limitations, state-of-the-art immersive environments do not correspond exactly to the real world. This contributes to the manifestation, in some users, of symptoms similar to those of classic motion sickness, called cybersickness, which result from the conflict between the visual, vestibular, and proprioceptive sensory systems. Factors such as previous familiarity with technology, age, or the presence of diseases can play an important role; among older people in particular, factors such as rotational speed and duration of exposure seem to increase cybersickness (Liu, 2014).

In the literature reviewed here, visual stimuli are presented mainly through SI systems; nevertheless, several studies applied fully immersive HMD systems (Nolin et al., 2013; Parsons and Barnett, 2017; Ouellet et al., 2018). A recent study conducted by Dong et al. (2016) compared a desktop monitor activity and an immersive VR activity with Oculus Rift to investigate whether a traditional lab PM activity produced results similar to those of a PM activity in the VR environment. It was reported that while the performance of standard computer monitor tasks does not significantly correlate with the VR scores, a negative correlation between desktop reaction times and VR scores was observed, as well as a positive correlation between VR and desktop reaction times. VR seems to be more sensitive in accurately assessing PM in everyday life. This is because the slide-based activity requires fewer cognitive resources than VR, where participants' cognitive load is increased and worse scores are more accurately identified when reaction times are slower. Notwithstanding, as pointed out by Plechatá et al. (2019), research should provide further clarification on the comparison between performance on desktop platforms and on HMDs, since different results in older people may arise because participants have previous experience with desktops but not with HMD platforms, which would lead to increased fatigue. Consequently, under certain circumstances, the level of immersion and a more complex context could be a problem for memory ability, and participants may have limited cognitive resources for the memory task. Structural features of the virtual environment, such as movement or sensory feedback, can capture participants' attention and drain cognitive energy: older adults unfamiliar with technology may find managing these features frustrating, which may distract from their virtual reality experience and make the task more demanding than for young adults.
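The kind of relationship reported above, a negative correlation between desktop reaction times and VR scores, can be illustrated with a minimal Pearson correlation sketch. The numbers below are entirely hypothetical and are not data from Dong et al. (2016); the choice of Pearson's coefficient, the function `pearson_r`, and the variable names are assumptions made only for illustration, since the source does not state which statistic was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical illustration only: one value per participant.
desktop_rt_ms = [520, 610, 580, 700, 650, 760]  # desktop reaction times (ms)
vr_pm_score = [9, 8, 8, 6, 7, 5]                # PM score in the VR task

r = pearson_r(desktop_rt_ms, vr_pm_score)
print(f"r = {r:.2f}")  # negative: slower desktop responses, lower VR scores
```

A coefficient below zero here simply reflects how the made-up values were chosen; with real data, a significance test and the sample size would also need to be reported.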

Nonetheless, VR immersiveness is essential for exploiting the procedural involvement that allows us to predict the cognitive behavior of subjects as realistically as possible. In many of the described experimental environments, exploration took place while sitting, whereas users should navigate in the same way as they do in the real world (Tieri et al., 2018). The need arises from the fact that immersive VR navigation offers the user a much wider range of movement to approach or interact physically and realistically with the virtual world, while the sitting position requires a set visual height, longer movements, and controller-based navigation of the environment. This freedom of movement is particularly problematic for VR-based PM studies that combine neuroimaging techniques (Dong et al., 2016), to ascertain changes in brain hemodynamic responses, and neuromodulation, to improve PM performance in senior subjects (Debarnot et al., 2015).

Neurophysiological changes associated with VR neurorehabilitation can be measured using non-invasive and portable neuroimaging techniques, including fNIRS and/or EEG, equipped with a neuroergonomic and wireless approach, to measure cerebral blood flow in real time during VR activity. Recently, a study by Dong et al. (2017) investigated the function of the prefrontal cortex during a PM activity in an immersive VR environment using an Oculus Rift and an OEG-16 multi-channel fNIRS system, which avoided problems often encountered with EEG, such as restricted movement due to the application of electrodes. By using a virtual shopping experience, this study provided early confirmation of Brodmann area activation in a PM activity in VR, but further studies are still needed to evaluate whether, and which, other areas could be involved during memory tasks in VE.

## CONCLUSIONS

The present review article provides a snapshot of virtual reality technology applications developed for the assessment and improvement of episodic memory and/or PM, with the aim of suggesting the integration of the most recent technological advancements into cognitive and aging neuroscience. Our choice to also include studies employing young adults, who are not strictly representative of the aging process, could be considered a flaw. This inclusion, however, is based on a conceptual approach which recognizes that the study of aging must cover both older and younger populations, in an attempt to bridge what has been called a gap in geriatric research (Moffitt et al., 2017). The goal is to gather additional evidence that currently cannot be obtained with adult subjects.

More importantly, it should be noted that the literature reviewed here included studies that used any form of VR technology, whether non-, semi-, or fully immersive. The keywords of the reviewed studies all included the term "virtual reality", although the systems it refers to differ in technical qualities and level of ecological validity. We believe that recognizing the centrality of nomenclature in this field, and the need for greater uniformity of language, will create a more coherent and connected research field.

Also, the implementation of clinical VR research outside the laboratory still presents significant challenges that need to be addressed. The type of VR technology and the experimental designs implemented vary greatly between studies, with many approaches using completely different hardware, software, or paradigms. However, the positive results obtained so far justify more rigorously controlled research, which is necessary to progress from feasibility studies and pilot tests to standardized protocols that can be shared by the research community.

The literature reviewed here suggests that VR protocols offer an additional tool and an excellent opportunity for innovative assessment and training options, which are particularly important for the early identification of the subtle amnestic deficits that usually elude traditional methods. Among the related advantages are easy adaptability and the ability to replicate ecologically valid environments found in everyday life, allowing precise measurement of the cognitive processes involved. Furthermore, the possibility of providing a more stimulating context than traditional laboratories can generate positive motivation in the elderly. However, the possible limitations associated with the perception of VR technology must be taken into account. The results of studies using higher immersion or greater interactivity are inconclusive regarding the benefits for assessment or training in the elderly population, particularly in pathological aging. Also, the introduction of clinical populations into VEs raises particular ethical and safety problems: some users experience health problems associated with the use of immersive HMDs, though these effects are mild and quickly fade. Susceptibility to cybersickness appears to be limited, but this could be related to short exposure times. Consequently, in-depth research is needed to investigate how aging affects the motion sickness caused by immersive environments, to avoid the risk of reducing rather than increasing ecological validity.

Finally, it is interesting to note that VR technology can be easily combined with other technologies, such as neuromodulation (tDCS) and neuroimaging (fNIRS/EEG), which can be considered valuable and indispensable tools for increasing the benefits of VR. This perspective could provide a more targeted approach to neuro-training and will be at the core of future research in the field.

## AUTHOR CONTRIBUTIONS

AR and GG conducted the search and selection of bibliography. VR designed and directed the project. AR, GG, PS, and VR wrote the manuscript. All authors contributed to the article and approved the submitted version.




**Conflict of Interest**: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Rizzo, Gambino, Sardo and Rizzo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.