Usability Issues of Clinical and Research Applications of Virtual Reality in Older People: A Systematic Review.

Aging is a condition that may be characterized by a decline in physical, sensory, and mental capacities, while increased morbidity and multimorbidity may be associated with disability. A wide range of clinical conditions (e.g., frailty, mild cognitive impairment, metabolic syndrome) and age-related diseases (e.g., Alzheimer's and Parkinson's disease, cancer, sarcopenia, cardiovascular and respiratory diseases) affect older people. Virtual reality (VR) is a novel and promising tool for assessment and rehabilitation in older people. Usability is a crucial factor that must be considered when designing virtual systems for medicine. Following the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) guidelines, we conducted a systematic review of the usability of VR clinical systems in aging and provided suggestions to structure usability piloting. Findings show that different populations of older people have been recruited, mainly to assess the usability of non-immersive VR, with particular attention paid to motor/physical rehabilitation. A mixed approach (qualitative and quantitative tools together) is the preferred methodology; technology acceptance models are the most applied theoretical frameworks, although senior-adapted models are the most suitable in this context. Despite minor interaction issues and bugs, virtual systems are rated as usable and feasible. We encourage usability and user experience pilot studies to ameliorate interaction and improve acceptance and use of VR clinical applications in older people with the aid of the suggestions (VR-USOP) provided by our analysis.


INTRODUCTION
Life expectancy is rapidly increasing and is expected to rise in the years to come, thereby creating an aging population. However, a significant proportion of older people may develop frailty, multi-morbidity, and disability, with a significant impact both on their quality of life and on health care and social costs (Lutz et al., 2008;World Health Organization, 2015). Aging is associated with physiological changes (e.g., apoptosis, senescence, inflammation) that may lead to systemic alterations (Flatt, 2012). This potential decline may involve sensory, mental, and physical functioning, thus leading to increased morbidity, multi-morbidity, disability, and mortality (World Health Organization, 2015). On the other hand, motor skills, vision, hearing, proprioception, and cognitive abilities (e.g., memory) may be reduced even in healthy older people (Kuehn et al., 2017). In addition, aging hampers psychosocial well-being by adding new developmental tasks or situations (e.g., isolation; Steptoe et al., 2015). In particular, the prevalence of Alzheimer's disease, cancer, chronic obstructive pulmonary disease, maculopathy, osteoarthritis, osteopenia, Parkinson's disease, periodontitis, rheumatoid arthritis, sarcopenia, cardiovascular diseases, and type 2 diabetes increases with age (Tolosa et al., 2006;Dubois et al., 2010;Marengoni et al., 2011;Edwards et al., 2015;Steenman and Lande, 2017;Yakaryilmaz and Öztürk, 2017;Franceschi et al., 2018). Additionally, several clinical conditions may jeopardize the well-being of older people, such as mild cognitive impairment, frailty, or metabolic syndrome (Fried et al., 2001;Petersen, 2004;Portet et al., 2006;Huang, 2009;Xue, 2011;Fedarko, 2012). The main priority of successful management of aging is enabling older people to be healthy, active, and autonomous for as long as possible (World Health Organization, 2002). Accordingly, functional decline is one of the key issues to be managed (World Health Organization, 2015).
Among other practices, the use of assistive health technology (AHT; i.e., technologies devoted to maintaining or improving functionality, autonomy, and well-being) or medical devices (MD; i.e., technologies used for prevention, diagnosis, and treatment) may also produce a beneficial effect in older people (Garçon et al., 2016); however, a critical aspect is to ensure accessibility and use of these technologies in the older population (World Health Organization, 2015;Beard et al., 2016).
Virtual reality (VR) is one of the emerging AHT and MD in the field of aging, frailty, and disability (Lange et al., 2010;Bohil et al., 2011). VR is defined as a system based on an interactive computer-simulated 3D environment (Gorini and Riva, 2008), which incorporates mainly auditory and visual feedback, and sometimes also haptic feedback. VR can be divided into non-immersive, semi-immersive, and fully immersive systems (Mujber et al., 2004). The non-immersive system is a desktop-based VR with low interaction (e.g., keyboard, joypad) and immersion (e.g., PC, tablet). The semi-immersive system consists of a large monitor/projector with moderate immersion and interaction (e.g., Kinect, data gloves). The immersive system is characterized by the use of tools such as a head-mounted display (HMD) or the cave automatic virtual environment (CAVE) that enable a high degree of interaction (e.g., trackers) and immersion in the virtual environment (VE). Additionally, VR can be conceptualized as a continuum between reality and virtuality, where some aspects of the VE are mixed with the real environment (augmented reality) or vice-versa (augmented virtuality) (Milgram et al., 1995). The sensorimotor channels connected to the VR define the degree of immersion; the psychological consequence of immersion on perception is the sense of presence, that is, the feeling of being in the VE or, alternatively, the "perceptual illusion of non-mediation" with the VE (Riva, 2008;Bohil et al., 2011). Moreover, mobile applications (e.g., tablet) with tracking systems of the user and/or visors (e.g., Google Cardboard) can be considered mobile VR that allows for different degrees of immersion and interaction with the VE (Fang et al., 2017).
VR meets several requirements of motor and cognitive neurorehabilitation interventions: repetitive practice, feedback about performance, multimodal stimulation, and controlled, secure, and ecologically valid environments (Bohil et al., 2011). Using virtual environments, it is possible to control and manipulate tailored exercises within meaningful and motivating settings, i.e., transformation of flow (Riva et al., 2006). For these reasons, VR has been utilized for rehabilitation in different fields and, particularly, after stroke. Accordingly, guidelines have recently included the use of VR for both motor and cognitive rehabilitation in patients who suffered a stroke (ISO, 2016b;Winstein et al., 2016). However, access to this kind of technology may be limited by the lack of accessibility in the older population, as compared to other AHT and MD (World Health Organization, 2015). For instance, VR in the context of stroke rehabilitation is facing challenges concerning end-users' interaction, such as feasibility of VR training, lack of functional relevance, patient frustration with feedback, and lack of integration of environmental factors that link to motor performance (Teo et al., 2016).
On the macroscopic level, access to AHT and MD is limited by socio-demographic and economic factors, while on the microscopic level, access concerns the use of the device itself. Indeed, according to the MOLD-US framework (Wildenbos et al., 2018), the use of technology among older people is hampered by different barriers: (1) cognitive (e.g., reduced working memory, spatial cognition, attention, language, and reasoning) and motivational (e.g., self-efficacy, self-confidence, benefits identification, computer literacy, integration in daily life) barriers that affect errors, efficiency, learnability, memorability, and satisfaction; (2) physical (e.g., motor speed, flexibility, hand-eye coordination, strength) and perceptual (e.g., vision, hearing, haptics) barriers that influence errors and efficiency. According to Nielsen (2012), usability is defined by learnability (is it easy to accomplish the task?), efficiency (once learned, is the user fast in performing the task?), memorability (is the user able to reestablish proficiency with the design after a period of non-use?), errors (how many errors does the user make?), and satisfaction (how pleasant is the design?). Along with usability (i.e., ease and pleasure), the technology should provide the attributes needed by the user (i.e., utility). Usability can be assessed by means of a wide range of methods, such as the system usability scale (SUS), heuristic evaluation, cognitive and pluralistic walkthrough, formal usability, pluralistic, consistency, and standard inspections (Brooke, 1986;Nielsen, 1994).
Nevertheless, usability tends to focus more on the task than on the experience (Vermeeren et al., 2010). Indeed, researchers investigating user experience (UX) point out the role of factors that go beyond the technology and its usability/usefulness. UX facets embrace emotion and affective reactions toward the technology and experiential, hedonic, holistic, and aesthetic factors. The interaction with a technology is "a subjective, situated, complex, and dynamic encounter" (Hassenzahl and Tractinsky, 2006). While satisfaction plays a critical role in usability, UX also takes into account emotions, motivation, and expectation of human-computer interaction (Vermeeren et al., 2010). For instance, the user experience questionnaire (UEQ) aims at evaluating six factors: attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty (Laugwitz et al., 2008), while the usability metric for user experience (UMUX) taps UX facets of usability (Finstad, 2010). Additionally, 96 UX methods (http://www.allaboutux.org/all-methods) have been identified in the UX research field (Vermeeren et al., 2010). UX methods vary in technique (qualitative to quantitative), target technology, period of assessment (e.g., developmental, conceptual), time, information source (e.g., experts, specific users, individual, group), and location (e.g., lab, online, field). Methods include semantic differential, checklists, heuristics, think-aloud, psychophysiological measures, self-report, questionnaires, in situ observation, and video analysis (Vermeeren et al., 2010). A critical aspect of UX is the prototype development (Novak, 2008), which follows the concept (idea) and pre-production (demo) phases and precedes production & localization (development), Alpha, Beta, and Gold/post-production phases.
A wide range of theories have been proposed to understand and explain user acceptance and use of technology (for a literature review see Taherdoost, 2018). The most inclusive model is the unified theory of acceptance and use of technology model (UTAUTM) (Venkatesh et al., 2003), which includes the technology acceptance model (TAM), theory of reasoned action, theory of planned behavior (TPB), combined TAM and TPB, model of PC utilization, the diffusion of innovation model, motivational model, and social cognitive theory. In this model, the significant factors are: effort expectancy, performance expectancy, social influence, and facilitating conditions. Interestingly, starting from the TPB (Fishbein and Ajzen, 1975), TAM (Davis et al., 1989), and UTAUTM, Chen and Shou (2014) developed the senior technology acceptance model (STAM). Controlling for age, gender, educational level, and economic status, their model included gerontotechnology self-efficacy and anxiety, facilitating health conditions, cognitive abilities, social relationships, attitude to life and satisfaction, and physical functioning as factors that influence perceived usefulness, usage behavior, and perceived ease of use, which in turn affect the general attitude toward use. A similar model (senior citizens' acceptance of information systems; SCAIS) was developed by Phang et al. (2006). This model takes into account preference for human contact, self-actualization, resource saving, anxiety, computer support, and physiological decline, which influence perceived usefulness, ease of use, and internet safety perception and, in turn, intention. Another theoretical framework used to approach technology use and acceptance is the user-centered design (UCD). UCD enables technology systems to be made more usable and interactive to end-users, but it can also be applied to assess needs, wants, and limitations of general products (Sebe, 2010;ISO, 2016a;Brox et al., 2017).
UCD can be investigated using a variety of qualitative and quantitative methods such as field studies, user requirements analyses, iterative design, usability evaluation, task analyses, focus groups, user interviews, participatory design, and prototypes (Vredenburg et al., 2002). UX can be explored with the playability model (i.e., immersion, socialization, emotion, satisfaction, effectiveness) that is crucial when building games for clinical purposes (Sánchez et al., 2012;Valladares-Rodriguez et al., 2019); emotive design for VR should be followed for designing human-computer interaction systems (see Vredenburg et al., 2002).
Lastly, human-computer interfaces are also conceptualized in terms of architecture and layers needed to provide a service (Tsai et al., 2012;Nikitina et al., 2018). For instance, the user remote console (URC) is a framework used for telemedicine systems to define abstract user interface layers, hubs, and devices. If a researcher wishes to consider a VR AHT or MD for healthcare purposes, in addition to the usability and UX aspects, they may want to assess the sense of presence in the VE. According to the Inner Presence theory (Lee, 2004;Riva and Waterworth, 2014), presence is not necessarily related to media characteristics (e.g., graphic realism) but rather to an everyday life flow that controls actions through a constant intentions-perceptions comparison. In this sense, a VR user may experience the system as usable, as they are able to enact actions thanks to an easy-to-learn interface that tracks the user's movements, an understandable game/training structure, and engaging storytelling (Triberti and Riva, 2016). These elements are particularly relevant for videogames and serious games that are also used for therapeutic purposes (Sáenz-de-Urturi et al., 2015). This conceptualization of presence has relevant consequences when taking clinical practice and change into consideration. VR clinical applications should exploit the transformation of flow (transformative and optimal experience allowed by the sense of presence) to discover and use new and unexpected resources to foster clinical change (Riva et al., 2006) and consider sensorimotor and cognitive impairments in the older population to customize VR for cognitive or physical (Pedroli et al., 2018) rehabilitation.
This paper aims at systematically reviewing the studies that evaluated feasibility, usability, and UX of assessment and treatment VR systems in healthy aging and age-related clinical conditions. In order to provide an overview of the current research status we analyzed characteristics of participants involved, technological apparatus and use, usability/UX assessments, theoretical framework, and primary outcomes. VR use is classified as the task being accomplished and the training sessions and the aims, which include assessment and rehabilitation. Additionally, we outlined suggestions to assess usability of VR applications for older people in clinical and research contexts.

METHODS
Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) guidelines were followed (Moher et al., 2009).

Search Strategy
Three high-profile databases (PubMed, PsycINFO, and Web of Science) were used to perform the computer-based search on 3 September 2019. The string used to carry out the search (Title/Abstract for PubMed, Topic for Web of Science, Abstract for PsycINFO) was as follows: ("aging" OR "frailty" OR "elder*" OR "multimorbidity") AND ("usability" OR "user experience" OR "UX" OR "user centered design" OR "human centered design" OR "human computer interaction") AND ("virtual"). The search resulted in 507 articles for Web of Science, 22 for PubMed, and 20 for PsycINFO (total of 529). After removing duplicates, we made a first selection by reading titles and abstracts. A total of 66 manuscripts were chosen for full-text screening. This procedure resulted in 25 experimental studies. See the flow diagram (Figure 1) for the paper selection procedure.
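As an illustration of how the search string above is structured (three concept groups joined by AND, with synonyms joined by OR inside each group), the following minimal Python sketch rebuilds it from term lists. The helper names (or_group, build_query) and the list names are hypothetical and are not part of the review protocol; the sketch assumes the truncated term is written as elder* and it does not add the database-specific field restrictions (Title/Abstract, Topic, Abstract).

```python
from typing import List


def or_group(terms: List[str]) -> str:
    """Quote each term and join the synonyms of one concept with OR."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(groups: List[List[str]]) -> str:
    """Join the concept groups with AND, so a record must match every concept."""
    return " AND ".join(or_group(group) for group in groups)


# Concept groups taken from the search string reported in the text.
POPULATION = ["aging", "frailty", "elder*", "multimorbidity"]
USABILITY = [
    "usability",
    "user experience",
    "UX",
    "user centered design",
    "human centered design",
    "human computer interaction",
]
TECHNOLOGY = ["virtual"]

QUERY = build_query([POPULATION, USABILITY, TECHNOLOGY])

if __name__ == "__main__":
    print(QUERY)
```

Keeping each concept in its own list makes it easy to see which synonym set a new term would belong to if the search were to be updated.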

Selection Criteria
Studies concerning the usability, UX, and feasibility of VR (see introduction for definition) systems for assessment/monitoring and rehabilitation/empowerment in healthy and pathological aging were included. In particular, we focused on the age-related clinical conditions in older people. We excluded articles that did not involve the usability of VR clinical systems, that addressed non-age-related conditions falling outside the context of frailty, multimorbidity, or chronicity in aging, or that used technologies that do not meet the VR definition. Additionally, studies for which the full text was not available or for which the abstract lacked basic information for review were removed. Non-English papers, reviews, meeting abstracts, conference proceedings, notes, case reports, letters to the editor, research protocols, patents, editorials, and other editorial materials were also excluded.

Quality Assessment and Data Abstraction
PRISMA guidelines were strictly followed; search results found by the first author (CT) were shared with the review author (MC) for individual selection of papers in order to reduce the risk of bias, and disagreements were resolved through consensus. The risk of bias for each single study was assessed following the Cochrane guidelines (Higgins et al., 2011) by CT and MC. The research question was formulated according to the suggested PICO research question guidelines (Abigail et al., 2014): Population: older people with age ≥ 65; Intervention: VR for assessment or rehabilitation in age-related conditions and diseases; Comparison: N/A, as usability studies at this time adopt quasi-experimental or pilot study designs (see also risk of bias, Supplementary Figure 1); Outcome: measures of usability and acceptance. The Comparison is mainly applied to randomized clinical trials and within our search only one study (Schwenk et al., 2014) satisfied this criterion. Consequently, data extracted from each included study were as follows: reference, year, sample(s), aims, technology, VR training, technology design framework, usability/UX/feasibility assessment tools, primary outcomes, and type (assessment/rehabilitative) of VR system.

RESULTS
Our search identified several usability, user experience (UX), and feasibility studies in healthy aging and age-related clinical conditions. A critical aspect of virtual reality (VR) and new technologies is their interaction with humans and in particular, those whose physical, psychological, or social barriers hamper the use of technological devices. The aim of this systematic review was to analyze the current research in the field of usability of clinical VR systems in older people and to provide an overview on this topic. Findings are shown in Table 1 according to reference, year, sample(s), aims of the study, VR technology, VR training, theoretical framework, usability assessment, primary outcomes, and clinical aims. Figures 2-8 summarize the results as well.
Which Are the Samples Involved in VR Usability Studies?
Concerning the tools used, a variety of quantitative and qualitative methods are reported. However, it is important to remember that each of these instruments assess different aspects of usability and acceptance; some are more concerned about the task to perform (e.g., SUS) while others tap the emotional/motivational elements of the interaction (e.g., UX questionnaires) or the factors that hamper/facilitate the use of a technology (e.g., TAM-based tools). Qualitative tools are able to grasp different perspectives (individual or group) of the experience or the design by asking experts in the sector or the end-user itself. A multidimensional approach emerged in our search and should be preferred when selecting assessment tools.

Are VR Clinical Systems for the Older People Usable?
In this section we outline the findings of the included studies, reporting their strengths and weaknesses. Figure 8 shows the mean and standard deviation for the available SUS scores, which display moderate to acceptable usability despite some cases of wide variation. Cook and Winkler (2016) showed that older adults (OA) find the virtual environments (VE) of Second Life (SL) feasible and applicable for healthcare purposes, especially for improving social interactions. Despite a high number of drop-outs, participants liked the realism and virtual experience (e.g., sports, changing avatar, teleporting, shopping), but bugs frustrated them and they found it hard to control the avatar and to learn SL. According to users, SL might be improved by clear training (i.e., individualized, small group), step-by-step teaching, enlarging the screen, and facilitating the interaction. The exergame Falls Sensei was rated as engaging and usable for educating OA about fall risk (Money et al., 2019). Falls Sensei was rated as having good usability (SUS score > 70, Bangor et al., 2009), especially by older users. Unified theory of acceptance and use of technology (UTAUT) thematic analysis of interviews (i.e., performance expectancy, effort, social influence) showed that users rated the training as a useful, positive experience, relevant for specific populations. Similarly, the Positive Bike (Pedroli et al., 2018) was rated as having good usability (mean SUS = 76.88, SD = 17). Problems were found concerning the size of items on the screen and the low realism or interaction users felt in the VE; users nevertheless had a positive experience and found the system useful. Stand Tall (ST) (Shubert et al., 2015) was rated by participants as having nearly good usability (mean SUS = 65.5, SD = 21.2); participants agreed to use ST to improve balance autonomously and accepted the Kinect sensor and the avatar.
Senso system (Rebsamen et al., 2019) had high adherence, usability (mean SUS = 93.5, SD = 5.52), enjoyment, usefulness, and acceptability, also confirmed by the think-aloud technique. Similarly, van Beek et al. (2019) found optimal adherence and motivation toward their VR training. Despite some interaction issues with the leap motion controller (LMC) and the difficulty of the exercises, the system had marginal usability (mean SUS = 58.25, SD = 17.9) and was also rated positively in the interviews. Lineage was evaluated with high satisfaction by its users (Sáenz-de-Urturi et al., 2015). Gaming experience was positive, exercise adequate, and participants stated that they would use the game again. SUS improved across the three sessions (first mean SUS = 73.84, SD = 4.72; third mean SUS = 86.25, SD = 3.06). Acceptable usability was reported for the TheraGame (mean SUS = 73.8, SD = 14.5) by OA and stroke patients, who also found the VR training adequate and enjoyable (Kizony et al., 2006). Good usability (first session mean SUS = 75.4, SD = 13.8) was found by Vanbellingen et al. (2017) in their upper limb video game with a leap motion controller; however, usability did not change across the nine sessions. The training had a compliance of 87.4% and the adherence was rated as very good and remained stable across time. Users reported that a 30 min session is the best duration to avoid arm fatigue. Optimal (100%) adherence and good acceptance (e.g., ease, usefulness, intention to use) were found by Wüest et al. (2014). Nikitina et al. (2018) found that usability of the virtual gym App did not differ between groups with social interaction (mean SUS = 63, SD = 9) or interaction with coach only (mean SUS = 66, SD = 14). Moreover, the participants positively accepted the app, with high co-presence for the interaction group (interactions occurred especially with private messages), but adherence was similar for individual vs. group exercises, with social support predicting adherence when social connections are low. Although Corno et al. (2014) found that the virtual-multitasking test (V-MT) induces cybersickness symptoms, it was rated as usable (mean SUS = 69.17, SD = 8.2); the head-mounted display (HMD) was comfortable and realism sufficient, but interaction with the wand was difficult and instructions were hard to remember. Similar results on HMD were found by Plechatá et al. (2019). HMD led to worse memory performance than non-immersive VR in OA, who preferred neither desktop-based VR nor immersive VR, whereas young users liked immersive versions of the virtual supermarket shopping task (vSST). However, the authors suggest non-immersive scenarios for OA. Fordell et al. assessed VR-DiSTRO, an immersive VR version of a "paper and pencil" neglect neuropsychological battery, and showed that stroke patients tolerated and were engaged during the assessment, which was much faster than the classic evaluation (Fordell et al., 2011).
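For readers less familiar with how the SUS values reported above are obtained, the following minimal Python sketch applies the standard SUS scoring rule to one respondent's ten ratings (1-5): odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a 0-100 score. The function and constant names are hypothetical, the example ratings are invented for illustration and are not data from any included study, and the cut-off of 70 is simply the "good usability" threshold quoted in the text (Bangor et al., 2009).

```python
from typing import Sequence

# Threshold quoted in the text for "good" usability (Bangor et al., 2009).
GOOD_USABILITY_THRESHOLD = 70.0


def sus_score(responses: Sequence[int]) -> float:
    """Return the SUS score (0-100) for one respondent's ten 1-5 ratings."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(rating < 1 or rating > 5 for rating in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: 5 - rating.
            total += 5 - rating
    return total * 2.5


def is_good_usability(score: float) -> bool:
    """Apply the > 70 cut-off used in the text to a single or mean SUS score."""
    return score > GOOD_USABILITY_THRESHOLD


if __name__ == "__main__":
    # Invented ratings for illustration only.
    example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
    score = sus_score(example)
    print(score, is_good_usability(score))
```

Study-level values such as those listed above are the mean and standard deviation of these per-respondent scores.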
In order to design Game Up exergames and a senior-UCD model (Brox et al., 2017), it is crucial to involve older people and experts to create safe, fun, and usable games. Three-point Likert scale short questionnaires are suggested for end evaluations, whereas in the requirement, design, and implementation phases, interviews, observations, and group discussions are preferred for senior UCD. Similarly, in order to develop the Butler app (Castilla et al., 2013) it is important to gather information from end-users and experts from the first stage of the development and to create prototypes of the app. Graphics and navigation systems must be adequate and understandable for older people in order to reduce mental load. In the same way, the Health Buddies app (Desteghe et al., 2017) was initially designed with the end-users (AF patients and grandchildren). Participants, especially patients, were motivated to use the app but its usage decreased across 90 days. Although adherence improved in only one patient, the app was found easy to use and educational, and 60% of patients would use the app again. Experts and end-users of a joint rehabilitation virtual therapy were also involved in the evaluation phase in Epelde et al. (2014). Medical professionals and patients positively accepted the virtual therapist and training but patients stated that the avatar was too serious and lacked empathy. A team of experts developed an augmented reality exergame (Im et al., 2015), which did not have any side effects (e.g., cybersickness) and led to high adherence to the training.
The Interactive Trainer (Kiselev et al., 2015), despite some reported technical problems, was evaluated in interviews as easy to use, challenging, and motivating. Schwenk et al. (2014) assessed the gaming UX of an exergame with sensors, which was found to be effective, fun, easy to learn thanks to feedback, adequate, and well-designed. Interestingly, Valladares-Rodriguez et al. (2019) aimed at assessing UX and player eXperience (PX) of the Panoramix neuropsychological touchscreen battery in OA, mild cognitive impairment (MCI), and Alzheimer's disease (AD) individuals. They found that Panoramix perception and acceptance were positive after the pilot study in all groups, but the battery was judged as more playable by OA, MCI, and AD participants, in this order; nevertheless, PX improved after the second interaction in all groups. Additionally, administrators also evaluated the battery as playable, usable, useful, and with a good interface. Morán et al. (2015) used a TAM-based questionnaire and video analysis to assess usability. Users rated the VR gesture therapy (GT) as useful, easy, and with high UX, and technological expertise did not affect task performance. By analyzing verbal and non-verbal reactions, raters judged the system as more usable and fun for non-expert participants. Conversely, anxiety was low for expert users. The authors defined two approach strategies according to expertise, explore-and-learn and score-and-complete, respectively, for inexperienced and experienced participants, which guided behaviors (e.g., anxiety, interaction strategies with the games) and reactions through the experience.
A comparison of semi-immersive vs. fully immersive versions of Motion Rehab AVE 3D was done by Trombetta et al. (2017). Training was feasible for users, and participants rated feedback, third-person perspective, comfort (semi-immersive version), and immersion (fully immersive version) as important for usability. The authors suggested that, for post-stroke rehabilitation, semi-immersive systems are more comfortable than fully immersive VR. Tsai et al. (2012) showed that Sharetouch is a well-designed, easy, and usable system, independent of gender or age, and facilitates social interactions in OA. Importantly, significant effects of the rehabilitative training on different motor/physical measures were found in all the studies that tested efficacy and usability (Schwenk et al., 2014;Wüest et al., 2014;Im et al., 2015;Vanbellingen et al., 2017;Rebsamen et al., 2019;van Beek et al., 2019). However, risk of bias (see Supplementary Figure 1) is high for most of the categories (randomization, allocation, blinding, missing data, and reporting bias), as the majority of the research is quasi or non-experimental. Of note, the risk of incomplete data outcome was low.
In general, despite some technical weaknesses (e.g., realism, bugs), interaction constraints and physical/psychological barriers to technology use, the included VR studies showed that with adequate usability design methods, it is possible to develop effective and usable systems for clinical purposes in aging.

DISCUSSION
In the present paper we reviewed the current research on usability, user experience (UX), and feasibility of virtual reality (VR) clinical systems in older people.
Our work can be summarized in the following points: (1) most of the usability pilots involved healthy or heterogeneous diseased older people; (2) usability mainly concerned VR physiotherapy training; (3) most of the studies involved non-immersive scenarios; (4) a combination of quantitative (e.g., SUS) and qualitative (e.g., interviews) methods is the most used and suggested approach in usability piloting, and the technology acceptance model (TAM) is the main theoretical framework; (5) despite some interaction issues, VR systems are rated as having good usability by end-users.
Usability is a critical and complex task when specific end-users with particular needs are involved. Conditions that hamper the interaction with the device (Wildenbos et al., 2018), and also cultural and technology background, should be taken into account (Corno et al., 2014;Nikitina et al., 2018). For instance, Tuena et al. found that executive functions are overloaded by input device use in older people and this leads to worse memory performances. Design guidelines should be used to avoid basic sensorimotor and interaction issues (see Phiriyapokanon, 2011;Loureiro and Rodrigues, 2014).
While some of the included studies collected data from the target population (e.g., Parkinson's disease patients tested usability for Parkinson's disease rehabilitation), several others assessed usability with healthy older people or mixed-pathologies patients (e.g., Wüest et al., 2014;Sáenz-de-Urturi et al., 2015;Shubert et al., 2015;Trombetta et al., 2017;Vanbellingen et al., 2017); in these cases, diagnostic criteria were not clear or the participants' characteristics did not match the potential technology barriers of the intended end-users. Future research should use strict inclusion/exclusion criteria according to diagnostic criteria of the diseases or syndromes. Moreover, in the context of healthcare, the end-users are also the medical professionals that use the technology with the patients. Usability should be assessed via questionnaires or interviews in the design and test phases (e.g., Castilla et al., 2013;Valladares-Rodriguez et al., 2019). Finally, despite some studies reporting the number of participants as a limitation (Corno et al., 2014;Desteghe et al., 2017;Vanbellingen et al., 2017;van Beek et al., 2019), a sample of 5-10 individuals is sufficient to identify roughly 80% of usability issues at minimum (Wüest et al., 2014;Brox et al., 2017).
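The sample-size claim above can be illustrated with the cumulative problem-discovery model commonly used in usability engineering, in which the expected proportion of issues found by n independent users is 1 - (1 - p)^n. This model is not stated in the studies cited above, so the sketch below is an assumption added for illustration; the per-user detection probability p = 0.31 is a commonly quoted average rather than a value estimated from the reviewed studies, and the function names are hypothetical.

```python
import math


def proportion_found(n_users: int, p_detect: float) -> float:
    """Expected share of usability issues found by n independent users.

    Assumes every issue has the same probability p_detect of being
    observed with any single user (a simplifying assumption).
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be strictly between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** n_users


def users_needed(target: float, p_detect: float) -> int:
    """Smallest number of users whose expected discovery reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_detect))


if __name__ == "__main__":
    P_ASSUMED = 0.31  # illustrative assumption, not taken from the review
    for n in (5, 10):
        print(n, round(proportion_found(n, P_ASSUMED), 3))
    print(users_needed(0.80, P_ASSUMED))
```

Under these assumptions, five users already exceed 80% expected discovery, while lower per-user detection probabilities (plausible when end-users have heterogeneous sensory, motor, or cognitive barriers) would require larger samples.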
The VR systems in our review were mainly focused on motor rehabilitation. In healthcare, VR is mainly applied for the assessment and rehabilitation of sensorimotor, physical, and psychological deficits via non-immersive to immersive technologies (Lange et al., 2010;Bohil et al., 2011;García-Betances et al., 2015;Muratore et al., 2019;Tuena et al., 2019). We also encourage the use of pilot studies in other domains where VR is used for clinical purposes. For instance, it is important to evaluate the usability of assessment tools (e.g., Pedroli et al., 2015;Desteghe et al., 2017). Usability testing sessions lasted 30 min on average; nevertheless, depending on the aims of the study (e.g., memorability), longitudinal usability studies can be conducted, as usability might improve after some sessions (Valladares-Rodriguez et al., 2019). Lastly, future research should focus more on immersive technology, as technical development will lead to new forms of immersive VR and costs will be reduced. It is also important to assess these systems because they might lead to reduced cybersickness compared to desktop-based VR (Lange et al., 2010;Bohil et al., 2011;Plechatá et al., 2019).
Several studies (see Table 1) did not report a model on which usability and acceptance of a technology can be based. TAM-based and UX-based models are useful for investigating and understanding psychological factors, whereas architecture design and user remote control (URC) are more useful for technical development. Indeed, usability, and in particular UX, concern not only ease of use and technical bugs but also psychological domains (e.g., emotions, motivations; Vermeeren et al., 2010). However, as researchers in the context of aging face specific needs and barriers, adapted models with relevant variables should be used, such as the senior user-centered design (UCD) by Brox et al. (2017) or the senior citizens' acceptance of information systems (SCAIS) by Phang et al. (2006). Surprisingly, none of the authors used the senior technology acceptance model (STAM) by Chen and Shou (2014), which could be more suitable than TAM models not adapted to older people. Interestingly, clinical researchers interested in technology usability, sense of presence, and clinical change may want to use the transformation of flow (ToF) theory, as presence and flow experiences might facilitate clinical change by means of VR (Riva et al., 2006). Usability assessment tools (see Table 1 and Figure 7) should include a mix of quantitative methods (e.g., SUS, TAM-based questionnaires, UX-based questionnaires) and qualitative techniques (e.g., experience interviews, think aloud, heuristic evaluation). The systematic review on telemedicine systems by Klaassen et al. (2016) recommends SUS, TAM2, and PSSUQ and states that questionnaires along with interviews, which are both low-cost and flexible methods, can be used from the early to the final phases of usability testing. Indeed, questionnaires give useful quantitative data that, however, still need qualitative information to tap individual sources of variation.
Therefore, a mixed approach composed of quantitative and qualitative tools is the preferred way to carry out complete, interpretable, and useful usability studies in older people. Additionally, we encourage a critical adoption of assessment tools according to the aims of the study, thus considering the aspects (e.g., individual, group, task, emotions/motivation, acceptance, adherence) to be engaged during the VR interaction.
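As an illustration of the quantitative side of such a mixed approach, the SUS mentioned above is conventionally scored by rescaling ten 1-5 Likert responses to a 0-100 range: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The sketch below assumes the standard ten-item form administered in its usual order; the function name is hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a System Usability Scale score (0-100) for one respondent.

    responses: ten integers from 1 (strongly disagree) to 5 (strongly
    agree), in the standard SUS item order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded (odd) items
        else:
            total += 5 - response   # negatively worded (even) items
    return total * 2.5


if __name__ == "__main__":
    # A respondent who answers "neutral" to every item scores 50.
    print(sus_score([3] * 10))
```

A single number of this kind is easy to compare across prototypes, but it does not explain why a participant struggled, which is where interviews or think-aloud protocols complement it.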
Additionally, innovative quantitative techniques could be useful to track unexpected information about the psychophysiological responses of the users (e.g., eye-tracking, heart-rate, galvanic skin response, non-verbal communication) and so assess their affective and cognitive reactions to the VR system (Morán et al., 2015;Sáenz-de-Urturi et al., 2015). The VR system itself can also provide additional quantitative data for evaluating usability and adherence (good >80%), such as time spent, number of log-ins, or interaction modality (Cipresso, 2015;Rebsamen et al., 2019;van Beek et al., 2019). Importantly, when testing immersive VR, cybersickness should always be assessed because it may negatively influence clinical practice and its reduction is a key objective of pilot studies (Kober et al., 2013;Corno et al., 2014;Tuena et al., 2017;Plechatá et al., 2019); likewise, virtual embodiment should be assessed with questionnaires if avatars are used (Kilteni et al., 2012;Gonzalez-Franco and Tabitha, 2018). Finally, in the early design phases, information from end-users (e.g., patients, medical professionals) could be gathered from group interviews or focus groups, where ideas drawn from experts' opinions and needs can be used to guide VR development (Castilla et al., 2013;Brox et al., 2017). For instance, Brox et al. (2017) developed a senior-UCD with a mixed use of quantitative and qualitative methods to design a semi-immersive exergame for older people, iterating from the early phases to the prototype. Researchers should be aware that step-by-step UCD (e.g., prototype development) and pretesting are critical for clinical VR settings (Novak, 2008;Im et al., 2015). However, time is a limitation in some research projects and, on some occasions, there is no time for longitudinal and thorough VR design. When this is not possible, we strongly encourage at least a qualitative and quantitative evaluation of the VR experience.
In the same manner, it would be better to assess usability and acceptance separately from efficacy of a VR system, as quality of patients' healthcare services is intertwined with usability, acceptance, and adherence (Middleton et al., 2013).
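The adherence criterion mentioned above (good adherence above 80%) can be derived directly from system logs such as the number of log-ins. The following is a minimal sketch under stated assumptions: adherence is taken to be completed sessions divided by prescribed sessions, the function names are hypothetical, and only the 80% threshold comes from the text.

```python
GOOD_ADHERENCE_THRESHOLD = 0.80  # "good >80%" as reported in the text


def adherence_rate(completed_sessions: int, prescribed_sessions: int) -> float:
    """Share of prescribed sessions that were actually completed (0-1)."""
    if prescribed_sessions <= 0:
        raise ValueError("prescribed_sessions must be positive")
    if not 0 <= completed_sessions <= prescribed_sessions:
        raise ValueError("completed_sessions must be between 0 and prescribed_sessions")
    return completed_sessions / prescribed_sessions


def has_good_adherence(completed_sessions: int, prescribed_sessions: int) -> bool:
    """True when adherence is strictly above the 80% threshold."""
    rate = adherence_rate(completed_sessions, prescribed_sessions)
    return rate > GOOD_ADHERENCE_THRESHOLD


if __name__ == "__main__":
    # Hypothetical participant: 17 of 20 prescribed home sessions logged.
    print(adherence_rate(17, 20), has_good_adherence(17, 20))
```

Other operationalizations, for example minutes of training relative to the prescribed dose, are equally plausible; the definition should be fixed before data collection.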
Despite some technical and interaction issues (e.g., bugs, interaction difficulties, realism, sensors application), the included studies showed that a wide range of VR clinical systems are usable, well-accepted, adequate, effective, and useful. The skepticism of older people and the digital divide are barriers that can be overcome after the use of VR devices (Desteghe et al., 2017), and the comfort of immersive VR can be improved by replacing visors with CAVE systems, although non-HMD systems are considered better for older people (Corno et al., 2014;Pedroli et al., 2018;Plechatá et al., 2019). Nevertheless, a recent study shows that older adults positively accept and tolerate HMD VR (Huygelier et al., 2019). Indeed, Fordell et al. (2011) showed that stroke patients enjoyed the immersive VR assessment. However, a future objective in the field is to make sensor application and use easier for this population, as home-based training, where no professional is present to provide assistance, is rising in popularity in VR clinical practice (Schwenk et al., 2014). Moreover, online assistance could be useful to help patients with set-up and exercises (Im et al., 2015;Nikitina et al., 2018). Morán et al. (2015) provided some guidelines concerning the feedback that VR training should give to older users:
• "Provide timely feedback on successful actions in a simple and salient manner";
• "Provide feedback on erroneous actions in a simple and salient manner";
• "Provide simple and salient instructions on how to recover or solve an error";
• "Provide feedback that fosters or inhibits specific behaviors from the user in a salient and concise manner."
Additionally, Teo et al. (2016) provide specific suggestions in their review for VR training in individuals with stroke-related impairments, such as activities that are flexible according to patients' objectives, the possibility for the therapist to adapt the task online according to the patient's needs, multiplayer services, and automated recording of patient tracking. Moreover, Teo et al. (2016) show that VR can be enriched with neurophysiological tools (e.g., EEG, fNIRS) that the researcher or the clinician can use to adapt the task according to individual effort or needs. Finally, it is worth mentioning some solutions provided by the Cochrane guidelines (Higgins et al., 2011) that reduce the risk of bias in usability experiments. Although blinding procedures in cognitive/motor rehabilitation trials (VR vs. treatment as usual) are hard to implement, randomization, attrition bias, and reporting bias can still be improved, respectively, with random number generators, shuffled cards, or dice throws; with adequate handling of missing data (e.g., balanced observation, imputation); and with adequate specification of hypotheses and primary/secondary outcomes in the introduction and then in the discussion, together with adequate analyses in the results section.
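The random-number-generator option for randomization mentioned above can be sketched as follows. This is only an illustration: permuted-block allocation is our choice for the example and is not prescribed by the cited guidelines, and the function name, the arm labels ("VR" and "TAU" for treatment as usual), and the seed parameter are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Sequence


def block_randomize(
    participant_ids: Iterable[Hashable],
    arms: Sequence[str] = ("VR", "TAU"),
    seed: Optional[int] = None,
) -> Dict[Hashable, str]:
    """Allocate participants to arms using permuted blocks.

    Each block contains every arm exactly once in random order, so group
    sizes never differ by more than one. A fixed seed makes the sequence
    reproducible for auditing.
    """
    if len(arms) < 2:
        raise ValueError("at least two arms are required")
    rng = random.Random(seed)
    allocation: Dict[Hashable, str] = {}
    block: list = []
    for pid in participant_ids:
        if not block:
            # Start a new block: one slot per arm, shuffled.
            block = list(arms)
            rng.shuffle(block)
        allocation[pid] = block.pop()
    return allocation


if __name__ == "__main__":
    schedule = block_randomize(range(1, 11), seed=2024)
    for pid, arm in schedule.items():
        print(pid, arm)
```

In practice the allocation list would be generated and held by someone not involved in recruitment, so that allocation remains concealed from the person enrolling participants.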
The present review outlined current issues and strengths of VR usability piloting in healthy aging and age-related clinical conditions. In the following section, we provide suggestions for researchers who wish to run usability testing in the context of clinical applications of VR systems for older patients.

VR-USABILITY SUGGESTIONS FOR THE OLDER PEOPLE (VR-USOP)
In this section we present some suggestions derived from the findings of the systematic review. VR-USOP focuses mainly on human-interaction factors rather than on technical aspects of developing VR clinical systems. Table 2 summarizes the suggestions in four steps to follow if researchers and clinicians wish to design and test their VR clinical apparatus with older end-users. The first step is the assessment of potential barriers and facilitators of the end-users, which can also cover the medical professionals and draw on technology acceptance models. In our opinion, this is crucial as it allows the identification and development of adequate characteristics of the VR interaction and task (step 2). The latter are also informed by adopting architecture design, senior-UCD, design guidelines, and prototyping, thus allowing the usability assessment to be defined. In addition, we encourage improving the methodology (risk of bias, see Supplementary Figure 1; i.e., randomization, allocation, blinding, handling of missing data, and reporting bias) to overcome the limitations of the available studies analyzed in the present review. VR usability and acceptance assessment should be defined and developed in accordance with the aims of the study (step 3). We suggest a mixed approach with quantitative and qualitative methods (mainly focused on the psychological experience of usability) and additional aspects to consider (see Table 2). Lastly, we suggest ensuring usability before clinical testing (step 4).

CONCLUSIONS
This systematic review aimed to provide an overview of state-of-the-art VR clinical systems for older people in relation to usability and to offer researchers suggestions based on the results of the review. Despite some limitations concerning the criteria used to recruit the samples, the low number of immersive technologies tested so far, and the high risk of bias of the studies, VR systems show good usability and acceptance among older people. A wide variety of quantitative and qualitative methods can be used to evaluate usability. We suggest adopting a mixed methodology with appropriate tools in order to capture different aspects of usability, acceptability, and user experience, and planning sessions according to the objectives of the usability study. Piloting is a critical aspect of clinical studies with VR technology, and we encourage future research to test the usability of VR applications following VR-USOP.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request from the corresponding author.

AUTHOR CONTRIBUTIONS
CT wrote the first draft of the manuscript. MS-B supervised and wrote the following drafts of the manuscript. MS-B, FG, CT, EP, AGal, and PT defined the methodology and objectives of the manuscript. MC assessed risk of bias and devised the second search strategy. AGag provided the framework for the VR-USOP. KG provided clinical expertise and support. GR and FL revised the manuscript. All authors contributed to the revision and final approval of the manuscript.

FUNDING
This work was funded by the Italian Ministry of Health IRCCS Network on Aging Research roadmap on aging and age-related diseases RRC-2018-2365820.

ACKNOWLEDGMENTS
CT wishes to thank all the authors for providing useful ideas and supervision during the conceptualization, writing, and revision of the manuscript.