Factors Associated With Virtual Reality Sickness in Head-Mounted Displays: A Systematic Review and Meta-Analysis

The use of head-mounted displays (HMDs) for virtual reality (VR) applications, including therapy, rehabilitation, and training, is increasing. Despite advancements in VR technologies, many users still experience sickness symptoms. VR sickness may be influenced by technological differences between HMDs, such as resolution and refresh rate; however, VR content also plays a significant role. The primary objective of this systematic review and meta-analysis was to examine the literature on HMDs that reports Simulator Sickness Questionnaire (SSQ) scores to determine the impact of content. User factors associated with VR sickness were also examined. A systematic search was conducted according to PRISMA guidelines. Fifty-five articles met inclusion criteria, representing 3,016 participants (mean age range 19.5–80; 41% female). Findings show that gaming content recorded the highest total SSQ mean, 34.26 (95% CI 29.57–38.95). VR sickness profiles were also influenced by visual stimulation, locomotion, and exposure times. Older samples (mean age ≥35 years) scored significantly lower total SSQ means than younger samples; however, these findings rest on a small evidence base, as a limited number of studies included older users. No sex differences were found. Across all types of content, the pooled total SSQ mean was relatively high, 28.00 (95% CI 24.66–31.35), compared with recommended SSQ cut-off scores. These findings are of relevance for informing future research and the application of VR in different contexts.


INTRODUCTION
Despite advancements in virtual reality (VR) technology, many people still report experiencing simulator sickness symptoms from its use (Rebenitsch and Owen, 2016; Gavgani et al., 2017; Duzmanska et al., 2018; Guna et al., 2019). Characterizing and quantifying these symptoms is challenging, as several factors are at play, including a diverse range of technologies; the use of inconsistent terminology for sickness from using virtual environments; little consensus on the biological mechanisms of symptoms; the diverse range of VR content; and user characteristics such as age and sex (Hale and Stanney, 2014). Identifying factors that increase the occurrence of simulator sickness is becoming necessary as VR is increasingly used for rehabilitation, industry training, and gaming or entertainment (Gallagher and Ferrè, 2018; Powell et al., 2018; Wang et al., 2018).
Side effects from virtual environment usage have been referred to by many terms, including simulator sickness (Kennedy et al., 1993), cybersickness (LaViola, 2000), and VR sickness. The term simulator sickness originated from the early use of flight simulators in the military (Kennedy et al., 1993) and is still used in research with modern HMD technology (Tyrrell et al., 2018; Ziegler et al., 2018). Cybersickness, originally used to describe side effects from use of virtual environments (McCauley and Sharkey, 1992), has often been mentioned in studies using a variety of technologies, including flat screen displays and head-mounted displays (HMDs) (Rebenitsch and Owen, 2016). The term VR sickness has typically been used in studies using HMDs (Cobb et al., 1999; Kim et al., 2018). Thus, diverse terminology is often used interchangeably across the virtual environments literature.
This review focuses on adverse symptoms from HMD use; hence, the term "VR sickness" will refer to the symptoms (and their severity) typically reported in the literature from HMD use. The term "motion sickness" will refer to more general reporting of symptoms from motion environments (e.g., air, land, or sea travel), not specific to HMDs, where symptoms can differ. For example, nausea can be more severe in seasickness than in simulator use (Kennedy et al., 2010). Symptomatology of sickness also differs between technologies. Compared with simulators, HMDs have been reported to produce more severe symptoms related to nausea, dizziness, and blurred vision.
Measures of VR sickness are a fundamental part of establishing prevalence and symptomatology in virtual environments. The Simulator Sickness Questionnaire (SSQ) (Kennedy et al., 1993), originally developed for measuring motion sickness in simulators, is the most commonly used measure of sickness in virtual environments (Rebenitsch and Owen, 2016). Alternative measures, such as the Virtual Reality Symptom Questionnaire, which was specifically developed for HMDs (Ames et al., 2005), or the Virtual Reality Sickness Questionnaire, have yet to be widely adopted. Single-item assessments that are easy to administer and can monitor symptoms during VR exposure (Bos et al., 2005; Keshavarz and Hecht, 2011) are commonly used but do not provide comprehensive measurements of the symptoms of VR sickness. Very few studies report on the use of objective physiological measures (e.g., heart rate, skin conductance, electroencephalograms, eye blink rate, and electrogastrogram) that do not rely on individual self-report data (Kim et al., 2005; Dennison et al., 2016).
Recent advances in HMD technology (field of view, resolution, frame rate, and ergonomic factors) have increased the levels of immersion and realism, which may influence the occurrence of VR sickness (Nichols, 1999; Lee et al., 2017; Kourtesis et al., 2019). For example, if an image is clear and tracking of movement is accurate, there may be fewer sensory conflicts, which could lead to a reduction in VR sickness symptoms (White et al., 2015; Shin et al., 2016; Ray et al., 2018). However, an increase in the field of view may also increase the risk of VR sickness (Fernandes and Feiner, 2016). Despite the improvements in HMD technology, a recent review suggests that the prevalence of VR sickness is still problematic (Rebenitsch and Owen, 2016). In addition, Kourtesis et al. (2019) found in their review that although recent hardware features have been an important factor in reducing VR sickness, software features also need to be taken into consideration.
The VR content delivered to users can induce or even reduce VR sickness. A rollercoaster ride may be more likely to induce VR sickness to the level of severity where users will request to discontinue the experience. For example, almost 67% of participants in a study using a rollercoaster virtual environment were unable to complete an exposure time of 14 min. In contrast, content consisting of low amounts of motion may be less likely to induce VR sickness (Guna et al., 2019), as well as in cases where head movement in a fixed position is concordant with what the user would experience in the real world (Rizzo and Koenig, 2017).
Length of time exposed to a virtual environment may also influence likelihood and severity of VR sickness (Duzmanska et al., 2018). Significant correlations have been found between exposure time and VR sickness, with longer exposure times increasing risk of VR sickness. For example, research measuring VR sickness at multiple time points found symptoms increased at 2-min increments, with the highest VR sickness scores measured in the final trial at 10 min. In contrast, a recent review has found that some people may build up a resistance or adapt over time to VR sickness, particularly over multiple sessions (Duzmanska et al., 2018). Although content and duration are significant contributing factors that may increase the likelihood of sickness symptoms, the user also needs to be taken into consideration.
User characteristics add another layer of complexity in understanding the relationship between hardware, content, and VR sickness. Research on sex and age has generated mixed findings on the likelihood of sickness from VR (Cheung and Hofer, 2002; Benoit et al., 2015; Munafo et al., 2017; Arcioni et al., 2018). With regard to age, physiological differences over the lifespan (i.e., in the visual and vestibular senses) (Bermúdez Rey et al., 2016) may influence the occurrence of VR sickness and symptom profiles. With regard to sex, hormonal differences in females have been reported as a likely factor in increased rates of VR sickness (Clemes and Howarth, 2005). Moreover, females can have a smaller interpupillary distance (Fulvio et al., 2019), and some HMDs may not be adjustable to suit, creating eye strain and general discomfort. Thus, it is important to increase the understanding of the relationship between these user characteristics and VR sickness.
Previous reviews (Rebenitsch and Owen, 2016; Duzmanska et al., 2018; Kourtesis et al., 2019) have focused on temporal or technological aspects of VR sickness. To date, none of the reviews on VR sickness have systematically evaluated VR content and user characteristics in a meta-analysis. The primary aim of this systematic review is to examine if VR sickness symptoms measured with the SSQ using HMDs are influenced by different factors. More specifically, factors that will be examined in this review are content, the amount of visual stimulation (motion of virtual environment), whether a person is stationary or moving in the virtual environment, and time. As the SSQ consists of three grouped factors (nausea, oculomotor, and disorientation), a summary of the most common symptoms using HMDs will be provided. Studies with the intention of inducing or not inducing VR sickness will also be compared. A secondary aim is to examine the influence of user characteristics (i.e., age and sex) on SSQ scores and dropout rates.

Search Strategy
In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Liberati et al., 2009), a systematic literature search was conducted to identify journal and conference papers related to VR sickness from using HMDs. This review included the following search terms: virtual reality OR virtual environment* OR VR OR VR headset OR virtual reality headset OR head-mounted display OR HMD OR helmet mounted display AND cybersickness OR motion sickness OR simulator sickness OR visually induced motion sickness OR virtual reality induced motion sickness OR virtual reality induced symptoms and effects OR virtual reality sickness OR visual-vestibular OR nausea OR aftereffect* OR after effect* OR VIMS. No limiters were inserted in the database searches.
This search was carried out on 10 October 2018 in six databases: Cochrane Library, IEEExplore, Medline, PsycINFO, Scopus, and Web of Science. Terms were mapped to subject headings. Both journal and conference articles were included in this review if: participants used a head-mounted display (HMD); VR sickness was measured using the SSQ; articles were peer-reviewed and complete (i.e., a full paper, not just an abstract or poster presentation); and the text was in English or had been translated for publication. Papers were excluded if they: used augmented reality (AR) or see-through displays; were reviews, dissertations, abstracts, or poster presentations; used prototype HMD devices; or were case studies. Papers that included clinical samples were also excluded; however, if the study included a healthy control group, these data were included. Eligibility of studies was assessed by two independent reviewers (DS and AS).
Papers were included if they supplied mean data for the SSQ (either subscales or total scores). Papers that supplied no mean data were still included in the dropout analysis if they reported dropout rates. If papers supplied mean scores without standard deviations, authors were contacted to supply the standard deviations. Current contact details were searched for online in each case. A follow-up email was sent to authors who did not respond to the initial email. If the authors did not respond to the second email, the paper was excluded. The calculation of subscale and total SSQ scores requires weighting. Subscales are weighted as follows: nausea, 9.54; oculomotor, 7.58; and disorientation, 13.92. Total scores are calculated by multiplying the sum of the unweighted subscale scores by 3.74 (Kennedy et al., 1993). This can create confusion, and there were instances where researchers calculated the scores differently, for example by multiplying the weighted subscale scores by 3.74, thereby producing inflated total scores. There were also instances where the total SSQ scores did not match the subscale scores; the same contact procedure was followed for these papers as for those with missing standard deviations. Figure 1 shows the results of the electronic search and article selection as per PRISMA guidelines (Liberati et al., 2009).
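The SSQ weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw (unweighted) subscale sums have already been computed from the item ratings. The function names and the example values are hypothetical and are not taken from any reviewed study; `inflated_total` is only one plausible reading of the scoring error mentioned in the text.

```python
# Weights from Kennedy et al. (1993), as quoted in the text.
NAUSEA_W, OCULOMOTOR_W, DISORIENTATION_W, TOTAL_W = 9.54, 7.58, 13.92, 3.74


def ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation):
    """Return weighted SSQ subscale and total scores from raw subscale sums."""
    raw_sum = raw_nausea + raw_oculomotor + raw_disorientation
    return {
        "nausea": raw_nausea * NAUSEA_W,
        "oculomotor": raw_oculomotor * OCULOMOTOR_W,
        "disorientation": raw_disorientation * DISORIENTATION_W,
        # The total is built from the UNWEIGHTED subscale sums.
        "total": raw_sum * TOTAL_W,
    }


def inflated_total(raw_nausea, raw_oculomotor, raw_disorientation):
    """Reproduce the error: already-weighted subscales multiplied by 3.74 (assumed form)."""
    s = ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation)
    return (s["nausea"] + s["oculomotor"] + s["disorientation"]) * TOTAL_W


# Hypothetical raw subscale sums, for illustration only.
scores = ssq_scores(2, 3, 1)
wrong = inflated_total(2, 3, 1)
```

Comparing `scores["total"]` with `wrong` for the same raw input shows how far a double-weighted total can drift from the intended value, which is why mismatched totals and subscales were queried with the authors.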

Statistical Approach
Comprehensive Meta-Analysis (CMA) Version 3 (Borenstein et al., 2013) was used to conduct meta-analyses. A random effects model was used to calculate pooled effect estimates with 95% confidence intervals. In studies reporting multiple experiments within groups, these means were merged in CMA to produce one mean per study. In studies reporting multiple experiments between groups, these means were calculated separately for each experiment. Pooled means were calculated for all factors separately on each subscale of the SSQ. Pooled means were also calculated for all factors separately for the total SSQ score. Differences between sub-factors within each factor were assessed using the Q-test based on analysis of variance (Borenstein et al., 2011). The Q-value for the between group analyses corresponded to the weighted sum of squared deviations of the subgroup means about the grand mean. P-values were obtained by comparing the Q-values with a chi-squared distribution with degrees of freedom equal to the number of subgroups minus one (Borenstein et al., 2013). A p-value lower than 0.05 was assumed to indicate a significant statistical difference of SSQ scores between the subfactors. A correlation was performed between the percentage of females in studies and total SSQ scores as breakdowns for sex of means for the SSQ scores were not supplied in most studies.
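The pooling and subgroup comparison described above can be sketched as follows. This is not the CMA implementation: it assumes the DerSimonian-Laird estimator for the between-study variance (the text does not name the estimator), treats each study's variance as SD²/n, and uses a simple series for the chi-squared tail probability that is adequate only for moderate Q-values. All function names and data are hypothetical.

```python
import math


def pool_random_effects(means, sds, ns):
    """Pool study means with DerSimonian-Laird random-effects weights.

    Returns (pooled mean, standard error, (lower, upper) 95% CI).
    """
    variances = [sd ** 2 / n for sd, n in zip(sds, ns)]  # variance of each study mean
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(means) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)


def chi2_sf(q, df):
    """Upper-tail chi-squared probability via the incomplete-gamma series."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def q_between(subgroups):
    """Q-test across subgroups given as (pooled mean, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in subgroups]
    grand = sum(w * m for w, (m, _) in zip(weights, subgroups)) / sum(weights)
    q = sum(w * (m - grand) ** 2 for w, (m, _) in zip(weights, subgroups))
    df = len(subgroups) - 1  # number of subgroups minus one
    return q, df, chi2_sf(q, df)


# Hypothetical study summaries (means, SDs, sample sizes), for illustration only.
group_a = pool_random_effects([30.0, 36.0, 33.0], [20.0, 25.0, 22.0], [20, 30, 25])
group_b = pool_random_effects([18.0, 22.0], [15.0, 18.0], [24, 28])
q, df, p = q_between([group_a[:2], group_b[:2]])
```

Under these assumptions, a `p` below 0.05 would be read as a significant difference between the two subgroups, mirroring the decision rule stated in the text.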

Operationalisation of Factors Being Examined
All factors were operationalised and independently reviewed by DS and AS. Any disagreements were resolved by discussion.

Content
Four types of content were categorized in studies included for analysis: 360 videos, gaming content, minimalist content, and scenic content. User interaction and environmental features differed for each category. The 360 videos included content captured with a 360 camera or video that allowed a 360 view of the virtual environment. Gaming included highly detailed content where the user could actively interact and perform tasks in the virtual environment, including off-the-shelf games and content developed by researchers. Minimalist content consisted of basic shapes or minimal textures, typically with simple interactions. Scenic content included detailed environments, for example, a landscape or cityscape, with no or simple interaction by the user. See Figure 2 for a summary of content characteristics.

Visual Stimulation
All studies were categorized based on the amount of visual movement within the content regardless of user-directed movement, such as locomotion and head movement. Low visual stimulation included content with slow visual changes, while high visual stimulation included content with fast visual changes.

FIGURE 1 | The article selection and screening process using the PRISMA flow diagram (Liberati et al., 2009).

Locomotion
Locomotion refers to how a user navigates in the virtual environment. For the analysis in this review, locomotion was classified as either stationary, controller-based movement, or physically walking. With stationary content, the user does not move in the virtual environment. Two moving categories were included: controller and walking. Controller-based movement covered any navigation by the user through a controller, including flying, controller-based walking, teleporting, and driving. Walking included the following physical movements: walking, walking in place, and walking on a treadmill. The two categories of moving were used because physically walking has been found to reduce the incidence of VR sickness compared with controller-based navigation (Chance et al., 1998).

Time
Sickness in virtual environments has been found to increase after 10 min in HMDs and simulator studies (Min et al., 2004). Thus, time was categorized into three intervals of 10 min: <10 min, ≥10 min, or ≥20 min.
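The three exposure-time categories can be expressed as a small binning function. This sketch assumes the middle category covers 10 min up to, but not including, 20 min, which is how the labels read when the categories are treated as mutually exclusive; the function name is hypothetical.

```python
def exposure_category(minutes):
    """Assign an exposure time in minutes to one of the review's three bins."""
    if minutes < 0:
        raise ValueError("exposure time cannot be negative")
    if minutes < 10:
        return "<10 min"
    if minutes < 20:
        return "≥10 min"  # assumed to mean 10 min to just under 20 min
    return "≥20 min"
```

For example, a 14-min exposure such as the rollercoaster study mentioned in the introduction would fall in the middle bin under this assumption.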

VR Sickness Condition
Studies that explicitly set out to increase or decrease the occurrence of VR sickness, or that measured VR sickness as a secondary aim, were categorized into two conditions: induce and not induce.

User Characteristics
The user characteristic of age was categorized into a mean age of <35 years old and ≥35 years old. This cut-off was chosen to correspond with theories of both sensory conflict and postural instability. For example, vestibular function, which is involved in the sensory conflict theory, starts to decline around the age of 40 (Bermúdez Rey et al., 2016). With relevance to the postural instability theory, changes in postural balance have been reported to commence at the ages of 30–39 (Era et al., 2006).
Mean breakdowns by sex were not supplied in most SSQ studies. Therefore, a correlational analysis was performed between the proportion of females in studies and total SSQ mean scores. This approach aimed to give an approximation given the lack of available data; a positive correlation in this analysis would indicate higher susceptibility to VR sickness in females.
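The study-level correlation described above can be sketched with a plain Pearson coefficient. The data below are hypothetical and only illustrate the shape of the calculation (one pair of values per study); the function name is an assumption, and the sketch does not reproduce the significance test used in the review.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-study values: percentage of females and total SSQ mean.
pct_female = [30.0, 40.0, 50.0, 60.0]
total_ssq = [31.0, 27.5, 29.0, 24.0]
r = pearson_r(pct_female, total_ssq)
```

Because each data point is a whole study, a correlation of this kind can only approximate a sex effect; it cannot separate sex from other study-level differences such as content or exposure time.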

Dropouts
Dropouts in this review refer to participants that exited an experiment due to VR sickness.

RESULTS
A total of 2,654 publications were identified through the search. A snowballing strategy was used to identify an additional 15 articles for inclusion. These publications were imported into EndNote where 1,045 duplicates were removed. The remaining 1,609 articles were sent to Covidence systematic review management software (Covidence, 2019) for title and abstract screening, which identified 292 articles for full-text screening. A further 237 articles were excluded as outlined in Figure 1. Authors were contacted for 15 papers as per the procedure described in the methods section if mean scores were supplied without standard deviations (10), or if scores did not appear to be weighted correctly (5). A total of 54% of authors replied with 20% supplying raw data to enable calculation of SSQ scores. Hence, 55 publications were identified through the systematic review process and listed in Table 1.

Dropouts
The mean dropout rate reported across 46 experiments due to VR sickness was 15.6%. If studies did not report dropouts, they were not included in this analysis as it was unknown whether there were no instances of dropouts or whether they were just not reported.

Description of Studies
Out of the 55 papers included in this review, 20 papers reported both subscale scores and total SSQ scores, 7 papers reported subscale SSQ scores only, and 16 papers reported total SSQ scores only. Twenty papers that reported SSQ scores also reported dropout rates. A further 12 papers that used the SSQ but only reported dropout rates were also included. The total number of experiments from these papers included 54 that reported the total SSQ scores and 38 that reported the subscale SSQ scores (these numbers include between-group studies from the same paper). All experiments together included 3,016 participants. Heterogeneity was consistently high for all analyses (I² > 90%). Studies came from: Australia (n = 3), Canada (n = 1), Colombia (n = 1), Cyprus (n = 1), Finland (n = 1), Germany (n = 11), Greece (n = 1), Japan (n = 1), Korea (n = 4), Netherlands (n = 3), New Zealand (n = 1), Portugal (n = 1), Slovenia (n = 1), Spain (n = 1), United Kingdom (n = 2), United States of America (n = 22).
The pooled mean age of participants was 24 years (of 45 studies that included mean age), with the youngest sample having a mean age of 19.5 years and the oldest having a mean age of 80 years. Fifty-one studies included both female and male participants, 4 studies did not report sex distributions, and 41% of participants were female. Bivariate correlations between the SSQ and percentage of females in studies were not significant (r = −0.172, p = 0.170).
See Table 2 for a summary of results showing factors associated with both total and subscale SSQ scores.

DISCUSSION
The aim of the review was to synthesize the literature on VR sickness symptoms using HMDs measured using the SSQ. The primary aim was to examine if VR sickness symptoms are influenced by content (four categories), the amount of visual stimulation, how a person moves in the virtual environment, and exposure times. A secondary aim was to examine the influence of user characteristics (i.e., age and sex).

SSQ Scores Interpretation
In this review, total SSQ mean scores ranged from 14.30 to 35.27. Pooled total SSQ scores were relatively high across all studies and content types (M = 28.00), with high levels of heterogeneity. Historically, the SSQ was intended for military personnel using simulators; however, the applications and interpretation of the scores have changed with increased use of VR and advancements in technology. According to Kennedy et al. (2003), total SSQ scores between 10 and 15 indicate significant symptoms, scores between 15 and 20 are a concern, and scores over 20 indicate a problem simulator. These cut-off scores were established from military personnel using flight simulators and may differ in the general population. Additionally, SSQ scores do tend to be higher in other virtual environments compared to flight simulators (Stanney and Kennedy, 1997; Kennedy et al., 2003). According to the Kennedy et al. (2003) categories, even the lowest total SSQ mean score of 14.30, found in studies including older adults in this review, would be regarded as significant symptoms. All remaining classifications displayed higher means, with the highest total SSQ score displayed in studies that set out to induce motion sickness.
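The cut-off bands quoted from Kennedy et al. (2003) can be written as a small lookup. This is a sketch under stated assumptions: the text does not say which band a score of exactly 15 or 20 belongs to, so the boundary handling below is an arbitrary choice, and scores under 10 are simply labeled as falling below the bands quoted here. The function name is hypothetical.

```python
def kennedy_category(total_ssq):
    """Map a total SSQ score to the bands quoted from Kennedy et al. (2003)."""
    if total_ssq > 20:
        return "problem simulator"
    if total_ssq >= 15:  # assumption: exactly 15 and exactly 20 fall in this band
        return "symptoms are a concern"
    if total_ssq >= 10:
        return "significant symptoms"
    return "below the bands quoted in the text"


# Pooled means reported in this review.
lowest = kennedy_category(14.30)   # older samples
overall = kennedy_category(28.00)  # all content types
gaming = kennedy_category(34.26)   # gaming content
```

Applied to the pooled means reported in this review, the lookup places the older-sample mean in the "significant symptoms" band and both the overall and gaming means in the "problem simulator" band.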

VR Sickness Symptom Profiles
Across all studies, this review found the highest pooled SSQ subscale scores for disorientation (M = 23.50), followed by oculomotor (M = 17.09) and nausea (M = 16.72). This subscale distribution demonstrates the difference with the symptom profile of motion sickness where nausea typically has the highest rating, followed by oculomotor and disorientation (Rebenitsch and Owen, 2016). These findings increase awareness of symptoms that may be more likely to develop when using HMDs (e.g., dizziness, blurred vision and difficulty focusing).
However, the weighting of these subscales makes it unclear as to what degree these symptoms differ.
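The effect of the weighting can be illustrated by dividing each pooled subscale mean by its weight, which returns the mean on the raw (unweighted) scale. This is illustrative arithmetic only, not an analysis reported in the review; it assumes every contributing study applied the standard Kennedy et al. (1993) weights correctly.

```python
# Subscale weights (Kennedy et al., 1993) and pooled means reported in this review.
WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
POOLED = {"nausea": 16.72, "oculomotor": 17.09, "disorientation": 23.50}

# Mean raw subscale sum implied by each pooled weighted mean.
unweighted = {name: POOLED[name] / WEIGHTS[name] for name in WEIGHTS}

# Subscales ordered from highest to lowest on the raw scale.
ranking = sorted(unweighted, key=unweighted.get, reverse=True)
```

Because the weights differ between subscales, the ordering on the raw scale need not match the ordering of the weighted means, which is one reason the size of the differences between subscales is hard to judge from weighted scores alone.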

VR Content
The content characteristics in Figure 2 highlight the distinguishing features of the four content types that may account for the distribution of SSQ scores in this review. SSQ scores were significantly influenced by content type with gaming content displaying the highest total SSQ mean (M = 34.26). This effect was also seen for subscale SSQ scores with all measured subscale symptoms of nausea, oculomotor and disorientation highest for gaming content compared to other types of content (see Table 4). Consistent with these results, previous studies using gaming content reported the highest dropout rates, ranging from 44 to 100% (Merhi et al., 2007;Dennison et al., 2016;Munafo et al., 2017). The second highest total SSQ means were found in studies using 360 videos. This was followed by minimalist content, with scenic content producing the lowest total SSQ mean. The total SSQ means did not always correspond with dropout rates, for example higher dropout rates were found in scenic content than 360 videos. This discrepancy highlights the variability in how users tolerate HMD use that could be due to other factors. Exposure time, user characteristics or the amount of visual stimulation are all other factors that may have contributed to the high heterogeneity found in this review. Thus, a limitation of this current meta-analysis and meta-analyses in general is that methodological differences between studies are collapsed when pooling results.

Influence of Visual Stimulation on Sickness
Content varies not only by type but also by the amount of visual stimulation offered. For example, all four types of content examined in this review may provide varying degrees of visual movement to the user. Oculomotor subscale SSQ mean scores were significantly higher for high visual stimulation compared with low visual stimulation. Some of the symptoms in the oculomotor subscale relate to eyestrain, difficulty focusing, difficulty concentrating and blurred vision. Despite recent improvements in display technology, stereoscopic HMDs may produce more side effects due to the vergence-accommodation conflict. Vergence refers to the way the eyes move laterally to adjust to items moving toward and away from the eyes combined with the process of focusing (accommodation). These visual processes do not occur naturally in a stereoscopic display as accommodation occurs at a fixed screen depth (Terzić and Hansard, 2016). This conflict may be a reason for the higher SSQ means for high visual stimulation in the oculomotor SSQ subscale. When there is a high level of visual stimulation there are more changes in the stimulus distance compared to content with low visual stimulation. The level of visual stimulation is meaningful, as research examining rapid vs. slow changes in stimulus distance found rapid changes to increase visual discomfort (Kim et al., 2014). In a virtual environment, a conflict may be created due to the differences in what a person sees and what their body experiences. With the emergence of new VR technologies, high-quality stereoscopic HMDs are now capable of simulating the visual and spatial properties of the real-world. Despite improvements, current technology still falls short of replicating how humans see and perceive depth under natural viewing conditions (Howarth and Costello, 1997). 
There are software solutions that can help to reduce discomfort by introducing blurring during motion (Budhiraja et al., 2017); however, this technique may not be effective for everyone. The shortcomings of current HMDs can produce unnatural visual conflicts, which have been shown to play a role in VR sickness (Carnegie and Rhee, 2015), especially when they are combined with visually stimulating VR environments (Kim et al., 2014).

Locomotion Type in Virtual Environment
SSQ scores were significantly influenced by locomotion type, with controller-based movement displaying the highest total SSQ mean (M = 32.55). Both nausea and oculomotor subscale SSQ mean scores were also significantly influenced by locomotion type, with higher scores when stationary than with either controller-based movement or walking (see Table 4); high heterogeneity between studies has contributed to these differences. Several other factors can account for the differences between total and subscale SSQ scores for controller-based and stationary content. These include differences in the number of studies, with seven stationary and five walking studies that reported subscale SSQ data, compared with 12 studies that reported total SSQ data for these locomotion categories. Additionally, relatively high total SSQ scores were reported for controller-based studies (Merhi et al., 2007; Budhiraja et al., 2017; Ragan et al., 2017) that did not report any subscale scores. Finally, the differences between SSQ totals and subscales may arise because certain methods of locomotion have a greater impact on specific symptoms, which would be reflected in the subscale scores but not in the total SSQ scores. For example, being stationary in the real world may induce a greater conflict in a virtual environment where there is movement and hence may increase nausea symptoms. This is consistent with research that indicates a reduction in symptoms when user-initiated movement is matched to the environment (Lee et al., 2017; Misha et al., 2018). These findings also support the sensory conflict theory relating to a visual-vestibular conflict (Reason and Brand, 1975). Thus, the visual-vestibular conflict may be exacerbated by the type of content (moving vs. static) being viewed combined with the locomotion method.
A reduction in visual-vestibular conflict may be the reason that the lowest total and subscale SSQ scores for locomotion were consistently reported in studies that included physically walking content. More research is needed to increase the understanding of how the type of locomotion can influence specific symptoms of VR sickness.

VR Exposure Time on VR Sickness
Both nausea and disorientation SSQ subscale scores in studies with exposure times of <10 min were lower than in those with exposure times of ≥10 min. Interestingly, scores were lower for studies with exposure times of ≥20 min than for those of ≥10 min (see Figure 3). This contradicts a recent summary in a review suggesting that longer exposure times are more likely to increase VR sickness (Duzmanska et al., 2018). Content may have been a factor contributing to this pattern of results within each of the time categories. In examining the distribution of content among the time breakdowns, the ≥10 min studies had the highest percentage of gaming content (62%), compared with studies with the shortest (<10 min) and longest (≥20 min) exposure times. In addition, 50% of studies with the longest exposure times (≥20 min) consisted of minimalist or scenic content. More research is needed to determine the relationship between content and exposure time. Within-subject designs with different exposure times and controlled content may help to answer questions around safe exposure times, as this information is important when planning clinical trials to avoid VR sickness and dropouts and to establish safe use procedures.

Age and VR Sickness
Four studies included older samples (studies with a mean age range ≥35 years; n = 64) that reported total SSQ scores. Not only did these studies report lower total SSQ scores for older samples (M = 14.30) compared to younger samples (M = 28.44), these studies reported the lowest SSQ scores when compared with all other examined factors (see Table 3). Two of the four studies with older samples also included subscale SSQ scores with 37 participants in total. The disorientation subscale recorded significantly lower SSQ scores for the older samples compared with the younger samples. While scores for nausea and oculomotor subscales were higher for the older adult samples compared with younger samples, they were not statistically significant. Previous research has found inconsistent findings when looking at older samples (Kennedy et al., 2010;Benoit et al., 2015). Even though age has been reported as a user characteristic likely to predict motion sickness (Golding, 2006), the results from this review support previous research that there may be a decline in susceptibility to VR sickness as a person ages (Paillard et al., 2013). However, as there are a limited number of studies including older samples, these results should be interpreted with caution. Additionally, three of the studies used scenic content and one study used gaming content. What also needs to be considered is that the VR content for the studies including older adults may be assessing specific symptoms, and the virtual environments may be designed to reduce the likelihood of side effects. For example, two of the studies (Parijat and Lockhart, 2011;Kim et al., 2017) involved walking on a treadmill to assess gait or balance and consisted of content with the lowest total SSQ mean scores in this review (scenic content). 
It is also possible that older adults experience symptoms that differ from those of younger adults, as indicated by the lower disorientation subscale SSQ scores found in the older samples (symptoms related to dizziness, vertigo, blurred vision, nausea and difficulty focusing). With many companies offering VR services to aged care facilities (Aged Care Virtual Reality, 2018; Reminiscience, 2018; Rendever, 2018), use by older adults will continue to increase. Moreover, VR delivered in HMDs is being widely used for rehabilitation, assessment and even prediction of cognitive impairments in older adults (Optale et al., 2010; Corriveau Lecavalier et al., 2018; Howett et al., 2019). Therefore, more research is needed to evaluate the safety of using HMD-delivered VR with older adults who have cognitive decline or other age-related health conditions.

Sex and VR Sickness
An analysis of sex differences was performed by correlating the percentage of females in each study with total SSQ scores. Studies did not supply a sex breakdown when reporting total SSQ scores; this was therefore the only way that sex could be analyzed, and it is a limitation of this analysis. The results indicated no association, and therefore no evidence of a sex difference. This is not consistent with research indicating that females are at higher risk of VR sickness (Lawson et al., 2004). Whether females are found to be more susceptible than males to VR sickness depends on which study is examined, as many confounding variables are often not taken into account (Lawson, 2015). Given the importance of this topic, more research is needed to better understand the incidence of VR sickness in relation to sex. Age and sex have been described as the most common user characteristics likely to predict motion sickness (Golding, 2006), highlighting a need for further research. Other user characteristics, including ethnicity, motion sickness susceptibility, fitness, and prior experience of VR, may provide deeper insight into how symptoms vary between users and assist in developing a more targeted approach to dealing with VR sickness.
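The study-level analysis described above can be illustrated with a minimal sketch. It assumes a Pearson coefficient (the text does not state which coefficient was used), and the data values and the helper name `pearson_r` are hypothetical, not taken from the review.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical study-level data (one entry per study), NOT values from the review.
percent_female = [20.0, 35.0, 50.0, 65.0, 80.0]
total_ssq_mean = [30.1, 22.4, 35.0, 18.9, 27.6]

r = pearson_r(percent_female, total_ssq_mean)
print(f"study-level correlation: r = {r:.2f}")
```

Because each data point is a whole study rather than a participant, a coefficient near zero in such an analysis does not rule out a difference between individual males and females, which is why the lack of sex-specific SSQ reporting is a limitation.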

Strengths and Limitations
This is the first study to pool estimates of VR sickness symptoms measured with the SSQ using HMDs, with a pooled sample size of 3,016; however, the study is not without limitations. Although the most commonly used measure of VR sickness (the SSQ) was used, many studies (112) were excluded because they did not use the SSQ. As the SSQ is a self-report measure, participants may under- or over-report symptoms. Physiological measures can assist with overcoming this limitation; however, a consensus is yet to be reached on the best physiological response for assessing VR sickness (Duzmanska et al., 2018). The scoring system for the SSQ can create some confusion, and this was seen in this review, with some authors incorrectly calculating total scores. Another limitation of the SSQ is the relevance of its symptoms for HMD use. For example, the Virtual Reality Symptom Questionnaire (Ames et al., 2005) increased the focus on oculomotor symptoms, while Kim et al. (2018) removed the symptom of nausea in the Virtual Reality Sickness Questionnaire because it did not contribute to motion sickness compared with other symptoms; both of these studies were HMD specific. For a more detailed discussion of alternative measures, see Hale and Stanney (2014). Additionally, all analyses had high heterogeneity, demonstrating large variation across the included studies. As well as individual differences of age and sex, susceptibility to VR sickness can also vary between individuals and therefore influence results. Gaming or VR experience is another individual difference that can influence the likelihood of side effects and needs to be both reported and taken into account during analysis of results. The small number of studies including older adults and the lack of reporting of sex differences and dropouts are also limitations, and are areas requiring further research or improved reporting in future VR studies including HMDs.
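The SSQ scoring confusion mentioned above can be illustrated with a short sketch. It assumes the standard weights from the original SSQ scoring scheme (9.54, 7.58 and 13.92 for the subscales, 3.74 for the total), which are not restated in this article; the function name and the example participant are hypothetical. Summing the weighted subscale scores is shown only as one possible slip, not necessarily the error made in the reviewed studies.

```python
# Standard SSQ weights (assumed from the original scoring scheme, not stated in this article).
NAUSEA_WEIGHT = 9.54
OCULOMOTOR_WEIGHT = 7.58
DISORIENTATION_WEIGHT = 13.92
TOTAL_WEIGHT = 3.74


def ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation):
    """Convert raw SSQ subscale sums into weighted subscale and total scores.

    Each raw value is the sum of the 0-3 item ratings assigned to that subscale.
    """
    for raw in (raw_nausea, raw_oculomotor, raw_disorientation):
        if raw < 0:
            raise ValueError("raw subscale sums cannot be negative")
    return {
        "nausea": raw_nausea * NAUSEA_WEIGHT,
        "oculomotor": raw_oculomotor * OCULOMOTOR_WEIGHT,
        "disorientation": raw_disorientation * DISORIENTATION_WEIGHT,
        # The total is computed from the raw sums, not from the weighted subscales.
        "total": (raw_nausea + raw_oculomotor + raw_disorientation) * TOTAL_WEIGHT,
    }


# Hypothetical participant, NOT data from the review.
scores = ssq_scores(raw_nausea=2, raw_oculomotor=3, raw_disorientation=1)

# One possible slip: adding up the already weighted subscale scores.
naive_total = scores["nausea"] + scores["oculomotor"] + scores["disorientation"]
print(round(scores["total"], 2), round(naive_total, 2))
```

For this hypothetical participant the naive sum is more than twice the correctly weighted total, which shows how a scoring slip of this kind could distort comparisons between studies.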
As 22 studies did not report dropout rates, the rate of 15.6% may be inflated if many of these studies did not have dropouts; however, we cannot assume there were no dropouts if they were not reported. This highlights the need to make reporting of dropout rates a standard in VR research.
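The possible inflation described above is a matter of simple arithmetic, sketched below. The participant counts are hypothetical and chosen only so that the observed rate matches 15.6%; the review does not give the counts behind that figure, and the function name is an illustrative assumption.

```python
def dropout_rate_bounds(reported_dropouts, reported_enrolled, unreported_enrolled):
    """Return (lower_bound, observed) dropout rates.

    observed    : rate among studies that reported dropouts.
    lower_bound : rate if every non-reporting study had zero dropouts.
    """
    if reported_enrolled <= 0 or unreported_enrolled < 0:
        raise ValueError("enrolment counts must be positive (reported) and non-negative (unreported)")
    observed = reported_dropouts / reported_enrolled
    lower_bound = reported_dropouts / (reported_enrolled + unreported_enrolled)
    return lower_bound, observed


# Hypothetical counts, NOT taken from the review.
lower, observed = dropout_rate_bounds(
    reported_dropouts=156,
    reported_enrolled=1000,
    unreported_enrolled=800,
)
print(f"observed: {observed:.1%}, lower bound: {lower:.1%}")
```

The true rate would lie somewhere between the two values depending on how many of the non-reporting studies actually had dropouts, which cannot be determined without consistent reporting.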
Finally, another limitation involves the varied nature of the HMDs used across these studies. HMDs can differ in terms of field of view, use of stereo, resolution, frame rate, availability of inter-pupillary distance controls/adjustment, and other technical display factors. Modern HMDs from the last 5 years differ fundamentally from the more limited display technology that was available before these recent advances (Kourtesis et al., 2019), and since 35% of papers included in this analysis used these older HMDs, it is difficult to know how well those findings would predict the occurrence of symptoms with currently available HMDs. Moving forward, there is an obvious need for more controlled laboratory research with standard reference VR environments that are adjustable in terms of content, movement, user interaction, etc. With such specifically created environments, one would be able to test the incidence of side effects across different display types with varied hardware capabilities. This will be essential for promoting parametric research that creates a database of known properties for different types of virtual environments delivered across varied hardware types, and it would serve to produce the baseline normative data needed to enable better research into how to mitigate or eliminate these use-limiting side effects.

Conclusion
Previous research has focused on the influence of technological aspects on VR sickness. This review advances this knowledge by examining content as a major contributing factor to VR sickness, which will remain a problem despite future technological advances. Our findings show that content significantly influences VR sickness symptoms. Recent HMD technology can provide a better experience (Kourtesis et al., 2019), and if this is combined with careful selection of content, the risk of VR sickness can be reduced and those symptoms that do occur can be easily managed. In this review, we compared our total SSQ scores with the cut-off scores suggested by Kennedy et al. (2003); however, what these scores mean in relation to HMDs and how they relate to the general population remains unclear. Nevertheless, comparing total scores between studies shows that content is a major contributing factor. This review also highlights the need for a further understanding of the influence of user characteristics such as age and sex, as there is a lack of studies including older samples and sex differences are often not reported. Increasing our understanding of VR sickness could be particularly valuable to researchers and practitioners, as there may be ethical and liability implications in research, training and clinical applications.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

AUTHOR CONTRIBUTIONS
DS, AS, and TL conceived the work and analyzed and interpreted the results. DS and AS performed article selection and screening. DS wrote the manuscript. All authors revised the work critically for important intellectual content and have read and approved the manuscript. AS and DS created Figure 2; certain 3D models for this figure were sourced from cadnav.com and modified.

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Copyright © 2020 Saredakis, Szpak, Birckhead, Keage, Rizzo and Loetscher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.