Ecological validity of virtual reality simulations in workstation health and safety assessment

The last decade saw a rapid rise of interest in Virtual Reality (VR) technologies, driven by more mature hardware and software tools. Within the ongoing digitalization of industry, VR technologies see uses in workstation design, operator training and tele-operation. This article focuses on how VR can contribute to workstation design including health and safety assessment. VR allows the inclusion of the operator in the workstation design process, permitting evaluation of the design in a safe, interactive and immersive virtual environment. This systematic literature review aims to qualify the ecological validity of VR tools and identify the current obstacles to safe and successful workstation design transfer. A standard systematic literature review procedure is used, on a wide selection of experimental research articles studying the validity of VR, within or outside of industrial contexts. We aggregate results from fundamental research on VR ecological validity regarding user perceptions, movement, cognition and stress. These results are discussed with respect to their influence on workstation OSH assessment in VR. Furthermore, we identify current technological factors and upcoming developments that mediate the validity of VR assessments.


Virtual reality
Virtual Reality (VR) is a collection of digital technologies designed to immerse a user in a virtual environment, creating the feeling of presence: the subjective psychological perception of existing inside the virtual environment (Heater (1992)). Immersion in a virtual environment can be facilitated by various stimuli: visual (VR headsets, stereoscopic screens, immersive rooms), auditory, and even tactile or haptic (vibrating controllers, force feedback devices), but also by natural modes of interaction, for instance using the user's hand or body movements.
The last decade saw the rapid emergence of new VR technologies and renewed interest in VR (Muñoz-Saavedra et al. (2020)). The combination of cheap, lightweight and readily available VR displays, with the rise in simpler VR content-authoring tools, and the general increase in available graphical processing power, opened up many opportunities for novel research and applications, in various fields such as education, healthcare and industrial R&D. VR devices also incorporate many sensors, with the ability to collect a wide range of data on the user, with position tracking, accelerometers and even eye-trackers. They can also be used in conjunction with many motion-capture systems, allowing further immersion and full-body posture and motion recording. As such, VR technologies are highly instrumented and allow for customizable virtual environments. This makes VR a useful tool for reproducible and objective research.
VR finds its industrial applications in the Fourth Industrial Revolution or Industry 4.0: a paradigm shift in manufacturing techniques and work organization, made possible by advances in digital technologies such as Artificial Intelligence, Big Data, additive manufacturing, digital simulations, collaborative robotics and Mixed Reality technologies, which comprise Augmented Reality and VR. This industrial revolution would allow for greater autonomy and flexibility of production systems, with fast iteration of product or workstation designs at a reduced cost.
In this regard, VR is used for operator training (Patle et al. (2019); Pérez et al. (2019)), teleoperation (Lipton et al. (2017); Whitney et al. (2020)) or to assist and validate workstation design. Early in the workstation design process, VR allows operators to take part and provide feedback based on their interactions with a digital mock-up of their future work environment. Virtual interactions can be advanced enough to simulate the functions of tools and equipment, allowing the operators to perform every step of a task in the digital mock-up.
These emerging use cases for VR provide opportunities to address Occupational Safety and Health (OSH) concerns early in the workstation design process: indeed, VR can provide a virtual, and thus safe, environment to assess operators' behavior in potentially dangerous environments or situations. This article presents a systematic literature review which aims to evaluate the validity and limitations of using VR technologies in that context.

Ecological validity
When using VR tools for training or validation of a workstation design, it becomes necessary to consider and evaluate the validity of such tools. The literature distinguishes between several types of validity:
• Internal validity (Campbell (1986)) refers to the ability of an experimental setup or simulation to be locally consistent. In the case of workstation design, it means presenting the situations, tools, machines, procedures, etc., of a simulated work environment in a believable manner, both visually and functionally.
• External validity (Campbell (1986)) refers to the ability of the experimental setup or simulation to provide observations that are consistent with external observations. For instance, an ergonomic assessment made on a real operator in a virtual environment could be found to be in accordance with an assessment made using a digital human model in a design tool.
• Ecological validity (Schmuckler (2001)) is a particular case of external validity, in which experimental or simulation-based observations are consistent with those made in an ecological setting, that is, in the previous example, the real operator in their real-world work environment.
In practice, studies seek either to establish a correlation between observations made in VR and real-world observations, or to find a difference between such observations, in order to identify the limits of validity. While even a strong correlation between real-world and virtual observations is insufficient to make exact predictions based solely on VR observations, it lends credence to the idea that the underlying phenomenon responsible for the variance is adequately reproduced in the virtual environment. For instance, the absolute time for an operator to perform a task in VR may differ from the real-world observed time, but the time variability between different designs of the workstation may be the same in VR and the real world. In such a case, evidence of correlational validity could strongly support comparing workstation designs in VR within an iterative design process.
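The distinction between absolute agreement and correlational validity can be illustrated with a short sketch. The task times below are hypothetical values invented for illustration, not data from any reviewed study, and the helper name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical mean task times (seconds) for four workstation designs.
real_times = [42.0, 47.5, 39.0, 51.0]
vr_times = [55.0, 61.0, 52.5, 66.0]  # systematically slower in VR

# Absolute agreement is poor: VR times are offset from real-world times.
mean_offset = sum(v - r for v, r in zip(vr_times, real_times)) / len(real_times)

# Relative agreement is strong: designs vary in the same way in both conditions.
r = pearson_r(real_times, vr_times)

# Designs ordered from fastest to slowest in each condition.
rank_real = sorted(range(len(real_times)), key=real_times.__getitem__)
rank_vr = sorted(range(len(vr_times)), key=vr_times.__getitem__)

print(f"mean offset: {mean_offset:.2f} s, r = {r:.3f}, same ranking: {rank_real == rank_vr}")
```

In this made-up case the VR times could not be used to predict real task durations directly, yet they would rank the candidate designs identically, which is the kind of evidence that supports comparing designs in VR.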

Related works
The roles of VR in industrial contexts have been extensively studied in the scientific literature. Damiani et al. (2018) propose a qualitative review of 39 articles presenting applications of virtual and Augmented Reality (AR) in the context of Industry 4.0. They report on the key technologies behind VR and AR, and their main applications in the industry: operator training, maintenance and decision making. They note that these technologies are currently welcomed and implemented in real industrial settings, with potential to improve operators' safety through virtual training. In a review of 78 studies, Radhakrishnan et al. (2021) show that this VR-based training is used and is effective in many diverse fields, such as healthcare, manufacturing or construction. However, they observe that effectiveness varies with technical aspects of the training tool, such as the visual fidelity of the simulation or the modalities and interfaces of interaction with the environment. Professional training is an effective way to promote safety in the workplace: Grassini and Laumann (2020) identified 16 studies reporting VR OSH applications in industrial contexts. They present successful implementations of VR-based safety training of operators, showing that VR can be used as an alternative to existing training processes. The review however highlighted that few of those applications were evaluated with respect to their ecological validity.
In a wider context, Kinateder et al. (2014) show that VR is a promising tool to examine human behavior in fire evacuation research. It offers high reproducibility and allows dangerous situations to be simulated without putting subjects at risk. They propose that VR experiments generally have ecological validity similar to that of laboratory experiments. This validity however depends on the experimental setup and the technical limitations of the VR apparatus. Furthermore, there are too few validation studies to properly assess current limitations to validity.
These existing reviews indicate a clear adoption of VR technologies in various industries. A widely observed fact in the literature is that the diversity of hardware and software implementations of VR influences its effectiveness in its different applications. Furthermore, while training applications can be evaluated with respect to their effectiveness, we found few evaluations of the ecological validity of VR in the context of workstation design. However, evaluation of the validity of VR for prevention and safety exists outside of industrial contexts, and may provide data relevant to workstation design.

Objectives
This article presents a systematic literature review investigating the ecological validity of current VR technology in the context of safe and successful workstation design. That is, how can an interactive digital mock-up of a workstation be used to assess safety, health risks and ergonomic considerations of the future physical workstation? This review aggregates results from experimental validations of VR simulations, compared to some ground truth or an equivalent real-world experiment. The considered experiments may take place in a real or simulated industrial context. In practice, VR can be implemented through a wide variety of audiovisual and interaction devices, such as head-mounted displays, immersive rooms or force-feedback devices, each with different characteristics and technical limitations. Therefore, this review also presents those devices and considers experimental comparisons of different VR technologies with respect to the influence they may have on simulation validity.

Materials and methods
In this literature review, we follow the engineering-oriented systematic review procedure presented by Kitchenham (2004). The review procedure consists of three main phases:
• In the first phase of the process, the research question is identified and a review protocol is proposed. In particular, the article sources and the inclusion and exclusion criteria of the review are defined.
• The second phase consists of the identification of relevant research articles, leveraging searchable databases to find articles using relevant keywords determined in the first phase. The inclusion and exclusion criteria are applied to filter out articles that do not answer the research question. At this stage, data relevant to the research question are extracted from the articles, allowing for a synthetic quantitative analysis of the literature.
• The final phase consists of reporting the review in a technical report, thesis or scientific article.

Research question
In this literature review, we aim to answer the following research questions:
• VR offers a safe environment to evaluate operator behavior in dangerous environments or situations. This represents an important opportunity for health and safety assessments and training. What is the ecological validity of such simulation, with regard to operators' behavior?
• VR technologies are still in a phase of constant evolution, and various combinations of software, displays and haptic devices can be employed. What are the influences of those technological factors on the validity of the simulation?
In conducting this review, we identified four main themes relevant to occupational safety and health (OSH) considerations through which to analyze operators' behavioral validity in VR: spatial perception, stress or risk perception, cognition and movement. The results of this review will be organized around those four themes. The ecological validity of each of these components of behavior will be discussed, identifying for each of them the current potential and limitations of VR simulation and how they are mediated by technical aspects of the VR apparatus.

Selection criteria
The articles were retrieved from the Web of Science database using a keyword search for articles matching the query:
("virtual reality" OR "virtual environment") AND ("validity" OR ("real" AND "compar*")) AND "experiment*"
Using the Web of Science "topic" field tag, the query can match text in the title, abstract and keywords of articles.
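To make the boolean structure of the query explicit, the following sketch approximates it on a single text record (for example a concatenated title, abstract and keyword list). It is only an illustration: the function name `matches_query` is hypothetical, and the regular expressions do not reproduce how Web of Science actually tokenizes or stems terms.

```python
import re

def matches_query(record_text: str) -> bool:
    """Approximate the review's topic query on one text record."""
    text = record_text.lower()

    def has(pattern: str) -> bool:
        return re.search(pattern, text) is not None

    # ("virtual reality" OR "virtual environment")
    vr_term = has(r"virtual reality") or has(r"virtual environment")
    # ("validity" OR ("real" AND "compar*"))
    validity_term = has(r"\bvalidity\b") or (has(r"\breal\b") and has(r"\bcompar\w*"))
    # "experiment*"
    experiment_term = has(r"\bexperiment\w*")
    return vr_term and validity_term and experiment_term

print(matches_query("Ecological validity of a virtual reality driving experiment"))
print(matches_query("A virtual reality game for realistic training"))
```

The first record satisfies all three clauses; the second contains a VR term only, so it is rejected.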
Compared to previously cited reviews, we chose to focus our initial query on the methodological aspects of the articles to retrieve. We thus restricted our search to articles describing VR-based experimental procedures designed to evaluate the validity of VR, or, in other words, to compare it to a real situation. However, to retrieve a sufficient selection of articles, we relaxed criteria regarding the field of application in this initial query. Relevance to OSH was assessed by later criteria.
Several methodologies or experimental designs appear in the literature to assess the ecological validity of VR. We propose here a classification of three types of methods that produce conclusive evidence relevant to the scope of this literature review:
• Indirect Comparison: The study describes an experiment carried out in VR with no directly comparable real condition.
  - Reproduction of a real and well-known phenomenon in VR. For instance, one may want to assess whether a dangerous situation induces a stress response in VR.
  - Establishment of a correlation between a variable measured in a VR experimental condition and a related psychometric test. For instance, one may assess whether users with good performance on a standard memory test also display good performance on a memory task in a virtual environment.
• Direct Comparison: The study describes an experiment carried out in both real and virtual conditions, in which measured variables can be compared directly.
  - Establishment of a correlation between a variable measured in VR experimental conditions and the same variable measured in a real-world condition. The comparison may be inter-individual (e.g. high-performing subjects on a real-world task tend to perform better on the virtual task) or intra-individual (e.g. individual variance in performance on different real-world tasks is replicated on equivalent virtual tasks).
  - Quantification of a significant effect of VR on a variable measured in both VR and real-world experimental conditions.
  - Conversely, evidence of the significant absence of an effect between the VR and real-world conditions. This would represent the highest level of ecological validity of a simulation. However, as it requires high statistical power (Quertemont (2011)), most experimental studies are unable to present that level of evidence. We must therefore be careful not to prematurely conclude the absence of an effect of VR from the lack of evidence of that effect.
• Technical Comparison: The study describes an experiment carried out using different VR apparatuses, comparing how different hardware or software properties can affect activity in the virtual environment.
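One common way to seek positive evidence for the absence of an effect is an equivalence test based on two one-sided tests (TOST). The sketch below is an illustration under stated assumptions and is not necessarily the procedure recommended by Quertemont (2011): it uses a normal approximation rather than a t-distribution, the equivalence margin is chosen arbitrarily, and the samples are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def tost_equivalence(vr, real, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two independent means.

    Assumption of this sketch: a normal approximation is used, which is only
    reasonable for large samples. Returns (p_value, equivalent), where p_value
    is the larger of the two one-sided p-values.
    """
    diff = mean(vr) - mean(real)
    se = sqrt(variance(vr) / len(vr) + variance(real) / len(real))
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)      # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha

# Hypothetical measurements (arbitrary units) with a small true difference.
vr_small = [10.2, 9.8, 10.5, 9.9, 10.1]
real_small = [10.0, 10.3, 9.7, 10.4, 9.9]

# Same values repeated eight times to mimic a larger sample (n = 40 per group).
vr_large = vr_small * 8
real_large = real_small * 8

p_small, eq_small = tost_equivalence(vr_small, real_small, margin=0.2)
p_large, eq_large = tost_equivalence(vr_large, real_large, margin=0.2)
print(f"n=5:  p = {p_small:.3f}, equivalence shown: {eq_small}")
print(f"n=40: p = {p_large:.3f}, equivalence shown: {eq_large}")
```

With the same observed difference and margin, equivalence is only established with the larger sample, which illustrates why a non-significant difference in a small study should not be read as evidence of equivalence. The n = 5 call stretches the normal approximation and is included only to show the effect of sample size.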
We then define the following inclusion criteria:
1. The article must be published between the years 2000 and 2021, inclusive.
2. The article must be published in English.
3. The article must be published in an academic journal or conference proceedings.
4. The article must present original, primary experimental research. This criterion therefore excludes articles solely reporting on a literature review.
5. The presented experiments must involve one or more audio-visual VR devices, as defined in Section 1.1.1, used by human subjects.
6. The population observed in the study must be healthy adults. That is, we excluded articles that considered only children or subjects presenting a particular pathology.
7. The experimental design of the study must fit in at least one of the three categories described above.
8. The article must report quantitative results and their statistical significance.
9. The presented experiments must have similarities to professional situations or practical applications in OSH.
Each article must comply with each of the inclusion criteria in order to be included in the review.
The extraction query matched 1311 articles in the Web of Science database published between 2000 and 2021. According to our inclusion criteria, 22 articles were excluded from the review as they were not published in English. Four articles were found to be duplicated and the duplicates were excluded. Another document was excluded as it was a poster. Therefore, 1284 articles remained which fit the first three inclusion criteria.
Among these 1284 retained articles, 46 presented literature reviews and were excluded from the present review. Furthermore, 115 articles not presenting a VR-based experiment were excluded, including 16 articles about Augmented Reality. Five articles were excluded from the review as they presented identical results to another included article. Finally, we found 92 articles fitting the inclusion criteria.
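The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are our own, and the text does not itemize how the articles remaining after these steps were excluded by the other criteria.

```python
# Counts reported in the text.
matched = 1311

# First three inclusion criteria: non-English, duplicates, poster.
after_basic_criteria = matched - 22 - 4 - 1

# Literature reviews, no VR-based experiment (including 16 AR articles),
# and articles presenting results identical to another included article.
after_content_screen = after_basic_criteria - 46 - 115 - 5

included = 92

# Articles removed by the remaining criteria (not itemized in the text).
removed_by_remaining_criteria = after_content_screen - included

print(after_basic_criteria, after_content_screen, removed_by_remaining_criteria)
```

The first subtotal reproduces the 1284 articles stated in the text.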

Data extraction
In this literature review, we collect data on:
• The type of VR hardware used in each study, organized into three categories: HMD, immersive room, one or more computer screens. One study may use several types of VR hardware.
• The content authoring tools used to create the virtual environment, when mentioned in the article.
• The experimental design categories that the study belongs to, as defined in Section 2.1.
• The research field of the study, e.g. ergonomics, behavioral sciences, engineering, etc., using Web of Science categories.
• Any standard psychometric tests and questionnaires administered for testing the study hypotheses.
• Any standard VR-oriented questionnaire designed to provide a subjective evaluation of the user's presence in the virtual environment or of their motion sickness or cybersickness.
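The extracted fields listed above could be stored in a simple record structure, sketched below. This is one possible representation and not the tooling used for the review; the class name, field names and example studies are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Set

HARDWARE_TYPES = {"HMD", "immersive room", "screen"}
DESIGN_TYPES = {"indirect", "direct", "technical"}

@dataclass
class StudyRecord:
    """One reviewed study, with the fields listed in the data extraction step."""
    citation: str
    hardware: Set[str]                    # a study may use several hardware types
    designs: Set[str]                     # experimental design categories
    research_field: str                   # Web of Science category
    authoring_tool: Optional[str] = None  # only when mentioned in the article
    psychometric_tests: List[str] = field(default_factory=list)
    vr_questionnaires: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject labels outside the predefined categories.
        unknown = (self.hardware - HARDWARE_TYPES) | (self.designs - DESIGN_TYPES)
        if unknown:
            raise ValueError(f"unknown category: {sorted(unknown)}")

def hardware_usage(records):
    """Count how many studies use each hardware type."""
    counts = Counter()
    for record in records:
        counts.update(record.hardware)
    return counts

# Hypothetical records, for illustration only.
records = [
    StudyRecord("Study A (hypothetical)", {"HMD"}, {"direct"}, "ergonomics", "Unity"),
    StudyRecord("Study B (hypothetical)", {"HMD", "immersive room"},
                {"direct", "technical"}, "engineering", vr_questionnaires=["SSQ"]),
    StudyRecord("Study C (hypothetical)", {"screen"}, {"indirect"}, "behavioral sciences"),
]
usage = hardware_usage(records)
print(dict(usage))
```

Because one study may use several hardware types, the per-type counts can sum to more than the number of studies.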

Devices
In the selected literature, we identified a wide variety of devices used to display the virtual environment and allow interaction within it. In particular, we can distinguish between three families of audio-visual VR devices commonly used in experiments:
• Head-Mounted Displays (HMD), also known as VR headsets, are wearable devices offering a stereoscopic display of a virtual environment. Stereoscopy is achieved through a separate display for each eye's point of view. HMDs are equipped with position and orientation tracking devices, allowing the point of view in the virtual environment to be synchronized with the user's position in the physical space.
• Immersive rooms, such as the Cave Automatic Virtual Environment (CAVE) system (Cruz-Neira et al. (1993)), exist in the form of a cube, with two open faces at the top and the side. Several projectors display a stereoscopic image on each internal face of the cube, resulting in a surrounding view of the virtual environment. A single user wears stereoscopic glasses equipped with motion capture trackers, allowing the rendered point of view to be synchronized with the user's head position.
• Stereoscopic and non-stereoscopic screens are also used in VR research, in single or multiple screen setups. They might also be used in conjunction with head-tracking devices, allowing the rendered point of view to be synchronized with the user's head movement.
A common means of interaction is achieved through tracking the position and motion of the user's hands, allowing them to use natural gestures to "touch" objects in the environment. Fine finger movement is often not tracked, and VR controllers often use simple buttons to allow for grabbing objects or other programmed interactions within the environment.
These interactions can be enhanced with tactile and force feedback, or "haptic" feedback. A simple implementation of haptic feedback is commonly achieved through vibration of a VR controller, for instance to notify the user of a contact with a virtual object. Interaction devices can however take many more forms, with varying degrees of haptic feedback:
• Handheld controllers: handheld tracked devices capable of haptic feedback through vibrations.
• Haptic suits (Konishi et al. (2016)): full or upper-body suits with vibrating motors allowing for localized tactile feedback on the user's body.
• Force-feedback arms: a robotic arm with an end effector designed to be held by the user. The force-feedback system can then apply forces and torques on the end effector, usually in three or six degrees of freedom, to simulate the weight of objects and collision forces within the environment.
Table 1 presents the frequency of usage of different VR systems within the selected studies. We observe that a large majority of the literature concerns HMD-based VR. In particular, recent consumer-oriented solutions such as the Oculus and HTC HMDs are well represented in the selected literature, illustrating a shift towards affordable consumer VR for research: these two solutions, released to the public in 2016, were featured in 29 out of 44 of the selected articles in the 2016-2021 period. Furthermore, in that period, a similar shift is observed in VR content authoring software usage, with the 3D engine Unity being cited in 19 out of 44 articles. This illustrates the recent evolution of VR technology, justifying an interest in assessing ecological validity in the different technological contexts presented by VR.

Topics
This literature review aims to assess the validity of VR for OSH research. The review however sources experimental studies from a wide spectrum of fields to validate different dimensions of the ecological validity of VR. Table 2 presents the Web of Science categories in which the selected studies are classified.
Within this article, we organized selected studies into four main topics that are discussed in the following sections. The first topic concerns spatial perception, that is, how the perception of distances and speed is influenced in a virtual environment. The second topic is the validity of stress and risk perception within a virtual environment, in general and industrial contexts. The third topic concerns how cognition is affected by VR. Finally, the fourth topic relates to biomechanical considerations in the virtual environment, more specifically, how manual tasks are performed in VR and its effects on movement and posture. Table 3 presents the distribution of selected studies across these categories.
Among the selected literature, we found seven studies that used the Simulator Sickness Questionnaire (SSQ, Kennedy et al. (1993)) to assess the incidence of cybersickness in subjects. In these studies, the SSQ was used to compare the incidence of cybersickness across different experimental conditions or VR apparatuses.

Distance perception
Distance perception in VR is affected by the display used, and egocentric distances, i.e. distances from the observer to an object, tend to be underestimated in a virtual environment.
Technical characteristics of the display can mediate this effect. For instance, Naceri et al. (2009) show that distance judgments up to 80 cm are less accurate using HMDs compared to stereoscopic screens. Ghinea et al. (2018) compare HMDs to CAVEs, finding that HMDs give less accurate distance judgments for distances up to 8 m. Bodenheimer et al. (2007) and Peillard et al. (2019) further demonstrate that perceived spatial distortion in HMDs is nonlinear. Hatzipanayioti and Avraamides (2021) show that this distortion can affect angle perception and pointing accuracy.
The effect is however present on non-stereoscopic screens (Makaremi and N'Kaoua (2021); Popp et al. (2004)), stereoscopic screens (Woldegiorgis and Lin (2017)), CAVEs (Piryankova et al. (2013); Hofmann et al. (2001)) and HMDs (Bodenheimer et al. (2007); Gamberini et al. (2008); Geuss et al. (2012); Hiramoto and Hamamoto (2018); Napieralski et al. (2011)). A comparison of older and more recent HMDs indicates a reduction of that effect on newer devices (Kelly et al. (2017)), possibly as a consequence of increased display resolution and visual information (Ryu et al. (2005)). Jones et al. (2008) and Jones et al. (2013) investigate the effect of the field of view of HMDs on distance perception. They show that a narrower field of view biases distance judgments in both real and virtual environments, but that effect is not sufficient to fully explain distance underestimation in HMD-based VR. Furthermore, while depth perception seems to be an important feature of modern VR displays, Paille et al. (2005) show that stereoscopic displays do not increase accuracy of distance perception, contrary to what may be intuitively believed. Stereoscopy is however shown to reduce error in angle perception (Karaman et al. (2010)). Vienne et al. (2020) suggest that the lack of visual cues in a virtual environment may be a cause of distance underestimation. Steinicke et al. (2010) show that having the subject travel to the experimental location within the virtual environment allows for better distance judgments compared to directly immersing them at the experimental location. Similarly, Kelly et al. (2017) show that walking within the virtual environment improves distance judgments. Using virtual environments that are similar to the real environment produces a similar improvement, as demonstrated by Ahmed Wick et al. (2010).

Table 3. Distribution of selected studies across topics.
Topic                          Number of studies
Spatial perception             33
Stress and risk perception     19
Cognition                      14
Biomechanical considerations   27

Frontiers in Virtual Reality frontiersin.org
Accurate distance perception is necessary to assess affordance in the virtual environment: for instance whether the subject can pass through a gap or over an obstacle. Bhargava et al. (2020) note that subjects in HMD-based VR make similar judgments of passability through gaps of different sizes, but require a closer examination of the gap to produce those judgments. Furthermore, Lin et al. (2015) show an effect of using a virtual avatar on affordance judgments when stepping over an obstacle or down a ledge: subjects produce more conservative judgments when the avatar is visible. A similar effect is also observed when subjects are instructed to cross the obstacle after they make their judgement.

Speed perception
VR technologies can also be used to research road safety: they allow drivers or pedestrians to be exposed to each other in simulated dangerous situations, in a safe and controlled environment, allowing for easier and repeatable experimentation. Collisions between vehicles and pedestrians are of particular relevance in industries such as civil engineering and logistics.
Feldstein and Dyszak (2020) observe a difference between the behavior of pedestrians in HMD-based VR and in the real world when faced with the decision whether to cross a road in front of an incoming vehicle at varying speeds. In the real world, pedestrians' decision to cross is influenced by both the distance and speed of the incoming vehicle. However, in the virtual environment, pedestrians make the same crossing decisions irrespective of variation in speed, considering only the distance to the car. Consequently, while in VR, the time separation between the car and pedestrian decreases as the vehicle speed increases. A second experiment described by Feldstein and Peli (2020) corroborates these results, while also demonstrating a similar effect of vehicle color on crossing decisions in both real and virtual environments. This shows that VR may cause pedestrians to underestimate how speed increases the risk of collision. Schneider et al. (2021) compare crossing decisions in CAVE-based VR, HMD-based VR and real-world conditions. Subjects self-report lower risk during crossings in HMD-based VR and are more reluctant to cross in front of incoming vehicles. Conversely, subjects display riskier crossing behavior in CAVE-based VR with no increase in self-reported risk compared to real-world conditions. Branzi et al. (2017) look to validate how speed-related behavior of drivers is replicated between a real road condition and its virtual clone. They observe that drivers have a higher speed while in the simulator; however, drivers adapt their speed in the virtual environment in a similar way to the real road. In other words, speeds in the virtual and real conditions are strongly correlated, despite a general speed difference. This could suggest that the difference in driving behavior could be the result of a difference in distance or speed perception in the simulator. For instance, acceleration forces of the vehicles cannot be felt while in the simulator.
Furthermore, other technical parameters such as the field of view of the VR display are known to influence the perception of ego-speed (Hussain et al. (2020); Lidestam et al. (2019)). Display type also affects motion perception: Riecke et al. (2005) show that angular motion is underestimated when using an HMD compared to a non-stereoscopic screen, even when the field of view is matched. Moreover, cybersickness is a factor that can influence driver behavior in a simulator, as shown by Malone and Brünken (2021): users reduce their speed to avoid or limit effects of motion sickness.
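The consequence of the distance-only crossing criterion reported by Feldstein and Dyszak (2020) can be worked through numerically. In the sketch below, the 40 m decision distance and the vehicle speeds are hypothetical values chosen for illustration; only the relation time = distance / speed is assumed.

```python
def time_gap_s(distance_m: float, speed_kmh: float) -> float:
    """Time (s) before a vehicle at constant speed covers the given distance."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    return distance_m / speed_ms

# Hypothetical pedestrian who crosses whenever the car is at least 40 m away,
# irrespective of its speed (distance-only criterion).
decision_distance_m = 40.0
speeds_kmh = [30.0, 50.0, 60.0]

gaps = [time_gap_s(decision_distance_m, v) for v in speeds_kmh]
for v, gap in zip(speeds_kmh, gaps):
    print(f"{v:.0f} km/h -> {gap:.2f} s available to cross")
```

For a fixed decision distance, the available time shrinks in inverse proportion to vehicle speed, which is why ignoring speed leads to smaller safety margins at higher speeds.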

Stress and risk perception
Stress induction in virtual reality
In order to properly simulate dangerous situations or to assess acceptability of a workstation design, VR needs to be able to induce measurable stress. It has been shown that a physiological stress response can be induced in VR, using artificial stressors such as a time-sensitive dual task (Legkov et al. (2017)). A point of interest of this review is how VR can induce an ecologically valid stress response in simulated dangerous situations, that could be of interest to OSH.
We identified a set of studies investigating how VR can be used to assess evacuation behavior in fire evacuation scenarios. Fire evacuations in VR induce a greater self-reported negative affect compared to a control condition in CAVE-based (Maïano et al. [...]
It is of interest to assess how self-reported and physiological adverse reactions to virtual danger influence behavior and decision-making. Kinateder and Warren (2016) show that the self-reported risk of subjects exposed to a fire alarm is lower when the stimulus is presented in HMD-based VR compared to a real evacuation exercise. Consistent with self-reported risk, they also observe that subjects are less likely to evacuate when they are within the virtual environment. Kobes et al. (2010) and Feng et al. (2021) examine the validity of subjects' behavior and exit choice in an evacuation exercise. Feng et al. (2021) report high validity in exit choice for subjects standing in HMD-based VR. However, Kobes et al. (2010) show that when required to explore their environment looking for an exit, subjects are less likely to correctly choose the nearest one in VR.
[...] confronted with a virtual risk of falling from height, compared to a safer environment. Martens et al. (2019), Meehan et al. (2002) and Phillips et al. (2012) observe increases in physiological markers of stress, such as heart rate and Galvanic Skin Response (GSR). Yuan and Steed (2010) and Zhang and Hommel (2016) present experiments in which the virtual hand controlled by the subject is threatened by a falling object, comparing realistic human hands with non-human or abstract hands. They observe greater anxiety when the realistic human hand is threatened, through, respectively, an increase in GSR and an increase in self-reported anxiety. This indicates that using a realistic avatar increases subjective body ownership and physiological response to threats.
Inoue et al. (2005) present a comparison of operators' subjective anxiety towards a mobile robot in a real and a CAVE virtual environment. They report a high correlation of self-reported anxiety (r = 0.99) of operators toward robot motion between the real and virtual environment. Ng et al. (2009) however report a slight effect of the virtual environment increasing hazard perception towards a mobile robot, appearing as speed and proximity with the operator increase. Furthermore, Kamide et al. (2011) report that operators express less concern about a robot going out of control during its operation. Ng et al. (2011) compare the perception of risk of operators interacting with robots in real and virtual conditions. Subjects are asked to estimate the work envelope of various models of robots, i.e., the volume in which the robot can operate, and where a collision between the robot and the operator is possible. The perceived size of the envelope increases with the size of the robot and its operating speed, both in real-world and virtual conditions. However, in a second experiment, subjects are shown a simulated accident between the robot and an operator. In the simulated-accident real-world condition, subjects perceived a larger work envelope compared to the no-accident condition. However, subjects who witness the accident in virtual reality do not seem to adjust their perception of the work envelope relative to the no-accident condition. Earlier research from Or et al. (2009) shows that operators' self-reported perception of risk increases after witnessing a similar simulated accident in VR. This suggests that such self-reports may not be associated with an effect on decision making while in VR. Furthermore, El-Shawa et al. (2017) report an effect of HMD-based VR on perceived work envelope, as operators tend to stand farther away from operating robots in a virtual environment compared to the real one.

Cognition
Tasks performed at a workstation always rely on the cognitive resources of the operator. To assess mental workload and task performance in VR, the simulation must not add to the mental demands placed on the operator, and it must engage the same cognitive processes as the real situation.
Luong et al. (2019) find no effect of HMD-based VR on performance and mental workload in n-back tasks (Kirchner (1958)) of varying difficulties, assessed using the Rating Scale Mental Effort (RSME, Zijlstra and Van Doorn (1985)). Broek et al. (2008) show no increase of mental workload (RSME) in a memory and manual task in screen-based VR. Furthermore, dual-tasking in VR increases cognitive load as expected, as shown in Han et al. (2021) (measured using NASA-TLX; Hart and Staveland (1988)).
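For readers unfamiliar with the n-back paradigm cited above, the following is a minimal sketch of how responses in such a task could be scored. It is not the procedure of any reviewed study: the function names, the letter stimuli and the accuracy measure are illustrative assumptions.

```python
from typing import List, Sequence


def n_back_targets(stimuli: Sequence[str], n: int) -> List[bool]:
    """Mark each position whose stimulus matches the one shown n steps earlier."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def n_back_accuracy(stimuli: Sequence[str], responses: Sequence[bool], n: int) -> float:
    """Proportion of correct match / no-match responses.

    Only positions from index n onward are scored, since earlier positions
    have no stimulus n steps back to compare against (an assumed convention).
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    scored = range(n, len(stimuli))
    if len(scored) == 0:
        raise ValueError("sequence is too short for this value of n")
    targets = n_back_targets(stimuli, n)
    correct = sum(responses[i] == targets[i] for i in scored)
    return correct / len(scored)


if __name__ == "__main__":
    # Hypothetical 2-back sequence and responses, for illustration only.
    letters = list("ABABCC")
    answers = [False, False, True, False, False, False]
    print(n_back_accuracy(letters, answers, n=2))
```

Raising n increases the number of items to hold in working memory, which is how such tasks vary difficulty; the same scoring applies unchanged to real and virtual presentations of the stimuli.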
However, Shen et al. (2019) observe greater mental fatigue in a virtual office workspace compared to a real one, for an exposure period of 8 h. This shows that an effect of VR on mental workload may exist but may not be observable in short sessions. Time perception is also distorted in virtual reality, as demonstrated by Mullen and Davidenko (2021): subjects tasked with estimating a 5-min interval produce significantly longer estimates while in VR. Kisker et al. (2021a) show that HMD-based VR involves different processes of memorization and retrieval. In an experiment where subjects are tasked with exploring a virtual environment and learning the locations of different objects, they rely on recollection-based memory (remembering a particular event with high accuracy) when exploring the environment using an HMD, while screen-based VR elicits familiarity-based memory. As natural autobiographical memory is recollection-based, this supports the validity of HMD-based VR with respect to the nature of the memory processes involved. Mania et al. (2003) further report no difference in performance between a real task and HMD-based VR. Mellet et al. (2009) also observe a difference in the nature of brain activity between real and screen-based virtual environment learning tasks, despite no observable difference in subjects' performance between the two conditions.
This difference in memory process may result in performance variations depending on the tasks and VR apparatus used. For instance, Sturz et al. (2009) and Tlauka et al. (2008) report no difference in spatial learning and memory between a real environment and screen-based VR, while performance in HMD-based VR compared to screen-based VR may (Figueroa et al. (2017)) or may not (Kim et al. (2018)) be improved depending on the nature of the spatial learning task. Srivastava et al. (2019) investigate how HMD-based VR influences spatial learning and memory when exploring a virtual environment, compared to monitor-based VR. They observe an effect of HMD-based VR, which increases mental workload as measured by NASA-TLX and degrades memory. Subjects were however found to spend less time exploring the environment in HMD-based VR. This could be caused by higher reported motion sickness when using the HMD, and could explain the difference in performance between subjects in the two conditions.

Biomechanical considerations
Locomotion and posture in VR
VR offers a variety of techniques for locomotion in a virtual environment. Arguably, natural locomotion, i.e., walking in the real environment to move within the virtual one, would present the highest validity. However, several studies indicate different walking behavior in real and virtual environments. In particular, users lower their walking speed in HMD-based VR. Janeh et al. (2017) report a reduction of step length and walking speed of 6%, while Agethen et al. (2018) report a speed reduction of 13%. Other gait characteristics are also influenced in VR, with an increase in step count, base of support, and double support time (the amount of time in the walking cycle where both feet are on the ground).
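As an illustration of the gait measures mentioned above, the sketch below computes double support time from per-foot ground-contact intervals, and a relative speed reduction between two conditions. The interval representation, function names and numeric values are hypothetical and are not taken from the cited studies.

```python
from typing import List, Tuple

# One stance phase of a foot: (contact_start_s, contact_end_s). Assumed format.
Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Duration (s) during which two time intervals overlap, 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def double_support_time(left: List[Interval], right: List[Interval]) -> float:
    """Total time (s) during which both feet are on the ground."""
    return sum(overlap(l, r) for l in left for r in right)


def relative_reduction(real_value: float, virtual_value: float) -> float:
    """Fractional decrease of a measure in VR relative to the real condition."""
    return (real_value - virtual_value) / real_value


if __name__ == "__main__":
    # Hypothetical stance phases for a short walking bout.
    left_stance = [(0.0, 0.7), (1.0, 1.7)]
    right_stance = [(0.5, 1.2)]
    print(double_support_time(left_stance, right_stance))
    # Hypothetical walking speeds in m/s (real, then virtual).
    print(relative_reduction(1.0, 0.87))
```

Expressing the VR effect as a fraction of the real-world value is what makes figures such as the 6% and 13% reductions comparable across studies with different baseline walking speeds.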

Frontiers in Virtual Reality frontiersin.org
When colocalized with other users in the virtual environment, Berton et al. (2019) report a larger reduction in walking speed, of 19.5%, with higher clearance when passing another user compared to a real environment. Podkosova and Kaufmann (2018) also show higher clearance and a higher self-reported risk of collision when two users cross paths in VR compared to real conditions. A limitation of real walking locomotion in VR is the need for a real space at least as large as the virtual environment. To address this limitation, other locomotion devices can be used, from simple joysticks to complex equipment simulating semi-natural movement, such as omnidirectional treadmills allowing users to control their walking speed and direction. Souman et al. (2011) propose and seek to validate an omnidirectional treadmill. They show that step length and walking speed are reduced and turn radius is increased when using the treadmill compared to real walking in the virtual environment. However, familiarity with the locomotion interface reduces the effect. Nabiyouni et al. (2015) compare natural walking in a virtual locomotion task with joystick-based motion and a VirtuSphere device, a hollow spherical apparatus offering similar functionality to an omnidirectional treadmill. They observe that subjects adopt a similar trajectory in the natural walking and joystick conditions, but deviate from that trajectory and require more time to complete the locomotion task when using the VirtuSphere.
Immersion in the virtual environment also affects standing balance. Kawamura and Kijima (2016) show that body sway increases in HMD-based VR. This effect further increases with the latency of the display device. An effect of the nature of the virtual environment is also observed: subjects have increased body sway in a realistic virtual environment that is unfamiliar, compared to a replica of the laboratory where the experiment takes place. This environmental effect is also investigated by Assländer and Streuber (2020), comparing a virtual replica of the real experimental environment to an abstract virtual environment. These two studies show an effect of HMD-based VR on standing balance, that can however be moderated by using an environment that subjects are familiar with and ensuring low latency of the VR system.

Performing manual tasks in VR
Studies investigating how operators perform manual tasks in real and virtual environments show that they require more time and make more mistakes in completing the virtual task. This effect holds true in both CAVE-based (Dessing et al. (2004); Sutcliffe et al. (2005)) and HMD-based VR (Arnold et al. (2002); Liu et al. (2009)). Mottelson and Hornbaek (2017) demonstrate an impact of the VR apparatus used, comparing an identical pointing task using a desktop HMD in a laboratory or a smartphone-based HMD: subjects perform the pointing task faster and more accurately with the greater visual information displayed on the desktop HMD. Sutcliffe et al. (2005) show that displaying a virtual hand improves execution time and accuracy in a CAVE-based VR task. Chessa et al. (2019) show the validity of using hand-tracking devices to examine grasping behavior in VR, as grasping behavior is similar with real and virtual objects. Hameed et al. (2021) however report that using hand-tracking as a way to interact with the virtual environment also increases the time required to perform a pick-and-place task, while increasing the mental effort of the operators. Tian and Duffy (2011) and Wu et al. (2012) propose an experiment comparing ergonomic assessments based on a lifting task performed by an operator in a real-world setting and in HMD-based VR, as well as the same task simulated using a digital human model. The ergonomic assessment is performed using the NIOSH Recommended Weight Limit equation (RWL, Waters et al. (1993)). The RWL gives the maximum mass an operator should manipulate depending on postural and task-dependent constraints. A more ergonomic workstation would allow the operator to adopt postures that raise their RWL. Hu et al. (2011) compare a drilling task in a real environment and in HMD-based VR. Operators report higher Rated Perceived Exertion (RPE, Borg (1970)) and higher Body Part Discomfort while working in the virtual environment, and require more time to complete the task.
These results show that, when performing the task wearing a VR HMD, operators tend to adopt less ergonomic postures, leading to greater exhaustion. A proposed explanation for the postural changes is the reduced peripheral field of view when wearing an HMD, forcing the operators to adapt their posture to better view manipulated objects and their environment.
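To make the RWL mentioned above more concrete, the following is a minimal sketch of the revised NIOSH lifting equation in its metric form, as commonly stated: RWL = LC × HM × VM × DM × AM × FM × CM, with a load constant LC of 23 kg. The function and parameter names are assumptions made for this illustration. The frequency (FM) and coupling (CM) multipliers come from published lookup tables and are simply passed in as inputs here. This is not the implementation used in the reviewed studies.

```python
LOAD_CONSTANT_KG = 23.0  # LC in the metric form of the equation


def recommended_weight_limit(
    h_cm: float,      # horizontal distance of the hands from the ankles
    v_cm: float,      # vertical height of the hands at lift origin
    d_cm: float,      # vertical travel distance of the lift
    a_deg: float,     # asymmetry (trunk twist) angle
    fm: float = 1.0,  # frequency multiplier, from lookup table (assumed input)
    cm: float = 1.0,  # coupling multiplier, from lookup table (assumed input)
) -> float:
    """Return the RWL in kg; multipliers fall to 0 outside their valid range."""
    hm = 0.0 if h_cm > 63 else 25.0 / max(h_cm, 25.0)
    vm = 0.0 if not 0 <= v_cm <= 175 else 1.0 - 0.003 * abs(v_cm - 75.0)
    if d_cm > 175:
        dm = 0.0
    elif d_cm <= 25:
        dm = 1.0
    else:
        dm = 0.82 + 4.5 / d_cm
    am = 0.0 if a_deg > 135 else 1.0 - 0.0032 * a_deg
    return LOAD_CONSTANT_KG * hm * vm * dm * am * fm * cm


def lifting_index(load_kg: float, rwl_kg: float) -> float:
    """Ratio of the handled load to the RWL; values above 1 exceed the limit."""
    return float("inf") if rwl_kg == 0 else load_kg / rwl_kg


if __name__ == "__main__":
    # Hypothetical postures: ideal reach, then hands twice as far from the body.
    print(recommended_weight_limit(25, 75, 25, 0))
    print(recommended_weight_limit(50, 75, 25, 0))
```

In this form, a posture change that moves the hands away from the body or twists the trunk lowers the corresponding multiplier and therefore the RWL, which is how posture differences between real and virtual conditions translate into different ergonomic assessments.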

Haptic feedback
Haptic devices allow for interactions with the virtual environment with tactile and force feedback, thus bringing complementary sensory information. Force feedback can also be used to prevent interpenetration of virtual objects, thus preventing operators from performing gestures that would be physically impossible in a real environment. Force feedback makes it possible to simulate the weight and collisions of objects in the virtual environment, promoting realistic muscle activation, gripping behavior (Bergamasco et al. (2006)) and the ability to feel the shape of objects (O'Malley and Goldfarb (2002)), similar to what is observed in real conditions. However, Bell and Cao (2007) show that force feedback can increase applied force and reaction times compared to real conditions. Lassagne et al. (2017) demonstrate the need for haptic feedback in a task where an operator must interact with a virtual touchscreen. Using a simple inert panel or an actual screen colocalized with the virtual one, they provide passive haptic feedback. This passive feedback allows for faster interactions between the operator and the machine while reducing the number of errors made. Pontonnier et al. (2014) report operators' subjective experiences when using an active force-feedback mechanical arm in a virtual workstation. In the described experiment, subjects insert different shapes in a hole-box in real and virtual environments, with or without force feedback. Operators report more difficulty and less realism in completing the task in the virtual setting when using the haptic device. Pontonnier et al. (2014) suggest that this difficulty arises from an absence of colocalization between the haptic device and the visuals of the virtual environment. Louison et al. (2017) propose an upper-body haptic suit designed to give tactile feedback to operators when they touch or collide with objects in the virtual environment. The device aims to reduce situations where the body of the operator would pass through virtual objects, which would make their postures and gestures ecologically invalid. They observe that, despite the suit providing only tactile and no force feedback, operators reduce their number of collisions and their time in interpenetration with the virtual environment, thus reducing impossible and invalid movements.

Discussion
VR technologies still have technical limitations on how the virtual environment can be rendered, displayed and interacted with. We find that these limitations influence visual perception in VR, leading to distortions in how operators may see the virtual space around them. This affects how distances and velocities are perceived in VR: egocentric distances and ego-speeds are generally underestimated and less accurate. This represents a significant bias in how operators perceive, and thus interact with, the virtual workspace. These effects are however mediated by technological factors, such as display resolution and field of view, and they may become less relevant as VR display technologies continue to evolve.
Current technical limitations also strongly influence how operators are allowed to interact with the virtual environment. This results in operators performing manual tasks more slowly in VR, and with less accuracy. In order to make interactions more realistic, haptic feedback can be used, in particular to prevent operators from performing gestures that would be physically impossible in the real workspace. Haptics can also improve performance in tasks which rely on tactile feedback.
A wide range of locomotion modalities exists in VR. Natural walking in real space presents good validity in use cases where it is applicable. VR however has an effect on the gait characteristics of operators, who decrease their walking speed and adopt a generally more prudent gait, especially around virtual obstacles. Other locomotion modalities, such as using a joystick, present good validity with respect to trajectories, but are not suitable in contexts where gait and postures are to be analyzed. Furthermore, when designing a virtual environment that the operator must travel through, one must consider the possible incidence and effects of cybersickness, and the potential adaptive behaviors the subject may develop in response to it.
Just like in a real workstation, performing tasks in VR draws on the cognitive resources of the operator. We find that the largest effect of VR on cognition concerns the nature of the memory processes involved in a cognitive task. In particular, unlike real space and HMD-based VR, which engage recollection-based memory, screen-based VR engages familiarity-based memory, a less accurate and less ecological memory process. However, outside of this effect on memory, we found little evidence that VR affects performance and mental workload in cognitive tasks.
VR offers a safe environment to evaluate behavior in risky environments or situations. While VR can elicit proper physiological stress responses in simulated dangerous situations, the extent to which those reactions are quantitatively valid is unknown. This will remain a limitation of VR validity literature as experimental validation is often not possible for such dangerous scenarios.
We also argue that subjective evaluations of stress or risk in VR need to be validated with respect to measurable operator behavior. This is illustrated by the robotic use cases described previously: while operators may produce ecologically valid risk assessments in the virtual environment, their decision-making and interactions with the virtual robots do not reflect real behavior. This disconnection between risk assessment and behavior in VR may simply be the consequence of the virtual environment not presenting an actual risk, while risk assessments rely on a theoretical understanding of the dangerous situation. Among the factors that can increase the ecological validity of stress and risk perception in VR, having a realistic representation of the operator helps to create the illusion of body ownership and promotes presence in the virtual environment.
Within the selected literature, we found few investigations on the impact of cybersickness on ecological validity. Cybersickness presents itself as a collection of symptoms (Gallagher and Ferrè (2018)), similar to motion sickness (Mazloumi Gavgani et al. (2018)), induced by immersion in a virtual environment. While extensive research on the incidence of cybersickness exists, its relation to ecological validity remains difficult to investigate: experimental studies will often exclude affected subjects. This may introduce a bias in the studied population, as, for instance, frequent users of VR are less likely to suffer from cybersickness (Del Cid et al. (2021)).
Besides, while the more severe cases of cybersickness will force users to exit the virtual environment, milder symptoms remain an obvious obstacle to the feeling of presence (Weech et al. (2019)), potentially leading to poor ecological validity of the simulation (Deniaud et al. (2015)). This can occur as either a direct consequence of symptoms or from users adjusting their behavior. Virtual environment designers can minimize the risks of cybersickness by following the best practice guides proposed by VR device manufacturers. These guidelines are however bound to change as both VR systems and knowledge of cybersickness evolve. The high variability of cybersickness, between individuals (Del Cid et al. (2021)) or simulated settings (Davis et al. (2015)), also requires informing users of the possible symptoms before usage, and frequently checking for symptoms during usage.
Workstation designers must also remain aware of their terminal goal of optimizing ergonomics, safety and productivity within the real work environment. While VR tools allow for quick and cost-effective prototyping and assessment in an iterative workstation design process, one must not overfit the design to the idiosyncrasies of VR.

Conclusion
This systematic literature review investigated the ecological validity of VR, as a tool for safe workstation design. Using a standard literature review methodology, we identified a set of 92 articles presenting experimental assessment of VR with regard to four main topics relevant to workstation design: spatial perception, stress and risk perception, cognition and biomechanical considerations.
Some limitations of VR ecological validity as a tool for safe workstation design were discussed. These limitations are however, in part, mediated by technical factors, such as haptic feedback, display resolution and field of view. As newer VR hardware gets released, the limitations discussed in this review may disappear. However, we found few studies comparing a real environment and a virtual one experienced through different devices, as ecological validity is often assessed using the most advanced hardware available. Further research on the impact of technological factors on validity could prove necessary.
Furthermore, we identify two limitations in the set of selected studies. First, the chosen literature for this review does not consider potential inter-individual differences among subjects. For instance, familiarity with the VR interface or susceptibility to cybersickness could influence the validity of activity observed in a virtual environment. Secondly, validity is often evaluated in short-term studies outside of an industrial context. For the purpose of workstation design, studies of ecological validity on industrial use cases with long-term follow-ups would be recommended. Besides, this review assessed the ecological validity of VR in generic contexts. Different limitations of VR as an ecologically valid tool for workstation design could manifest in particular professional contexts, depending, for instance, on the precision, speed or strength required to accomplish a given task. Further work should consider these current limitations in the literature in order to properly assess ecological validity for safe workstation design.

Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions
GP and AS designed the review procedure (research question, queries, inclusion criteria and data extraction). GP performed retrieval, selection and data extraction of reviewed articles.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.