Behavior and self-efficacy modulate learning in virtual reality simulations for training: a structural equation modeling approach

Mousavi, S. M. Ali; Powell, Wendy; Louwerse, Max M.; Hendrickson, Andrew T.

doi:10.3389/frvir.2023.1250823

ORIGINAL RESEARCH article

Front. Virtual Real., 23 October 2023

Sec. Virtual Reality and Human Behaviour

Volume 4 - 2023 | https://doi.org/10.3389/frvir.2023.1250823

Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, Netherlands

Introduction: There is rising interest in using virtual reality (VR) applications for learning, yet studies have reported mixed findings regarding their impact and effectiveness. The current paper addresses this heterogeneity in the results. Moreover, unlike most studies, we used a VR application that is actually used in industry, thereby strengthening the ecological validity of the findings.

Methods and Results of Study 1: In two studies, we explored the effects of an industrial VR safety training application on learning. In our first study, we examined both interactive VR and passive monitor viewing. Using univariate, comparative, and correlational analyses, the study showed significant increases in self-efficacy and knowledge scores in interactive VR, but no significant differences in these gains compared with passive monitor viewing. Unlike passive monitor viewing, however, the VR condition showed a positive relationship between learning gains and self-efficacy.

Methods and Results of Study 2: In our subsequent study, a structural equation model (SEM) showed that self-efficacy and users’ simulation performance predicted learning gains in VR. We further found that VR hardware experience indirectly predicted learning gains through the self-efficacy and user simulation performance factors.

Conclusion/Discussion of both studies: Taken together, the findings of these studies suggest that the central role of self-efficacy in explaining learning gains generalizes from academic VR tasks to VR tasks used in industry training. In addition, these results point to VR behavioral markers that are indicative of learning.

1 Introduction

Virtual reality (VR) has increasingly been used as a training tool in a variety of domains, including education (De Back et al., 2020; van Limpt-Broers et al., 2020; Schloss et al., 2021), medicine (Yang et al., 2018; Behmadi et al., 2022), and industrial maintenance (Pedram et al., 2020; Makransky and Klingenberg, 2022). Beyond efforts to understand in which contexts, and for which aspects, VR training is more beneficial than other training methods (Buttussi and Chittaro, 2017; Makransky, Borre-Gude, et al., 2019; Radianti et al., 2020), there is an increasing focus on understanding the cognitive and affective factors that explain the variability of learning in VR (Makransky and Petersen, 2019).

Both immersive VR and 2D screen training methods have the potential to leverage multimedia learning principles to make training more effective by optimizing the integration of visual and auditory information (Mayer, 2009; Mayer, 2014). While several studies suggest that immersive VR promotes more learning than 2D screen solutions (e.g., Krokos et al., 2019; Johnson-Glenberg et al., 2021), others report no difference in learning effectiveness between VR and non-VR conditions (Greenwald et al., 2018; Madden et al., 2020; Souchet et al., 2022). Some studies have even reported less learning in VR conditions than with a 2D screen solution (Molina-Carmona et al., 2018; Makransky et al., 2019).

One explanation for these mixed findings might be the complex nature of learning, in which many elements of the learning process need to be considered together rather than in isolation (Salzman et al., 1999). For instance, factors such as self-efficacy or perceived user confidence (Gegenfurtner et al., 2014), the training context (Hamilton et al., 2021), learners’ behavioral traits (Bailenson et al., 2008; Gavish et al., 2015; Pathan et al., 2020), and the quality of the interaction experience with the system (Salzman et al., 1999; Wang et al., 2017; Rupp et al., 2019) all play an important role in both the learning process and its outcome. The current study aims to advance knowledge of both VR learning outcomes and the process of learning in VR.

Given our primary objective of understanding the complexity of the VR learning process and its outcomes, a foundational factor to explore is self-efficacy. Bandura (1997) defined self-efficacy as the perceived confidence in conducting the trained task. According to Bandura’s social cognitive theory (Bandura, 1993), self-efficacy beliefs influence an individual’s level of motivation, their resilience in the face of challenges, and the amount of effort they invest in a task. Consequently, learners with a higher sense of self-efficacy are more likely to persist in difficult situations, resulting in improved learning outcomes (Pajares, 1996; Zimmerman, 2000). Self-efficacy plays a central role in many explanations of learning gains found in VR training tasks (Wang and Wu, 2008; Richardson et al., 2012; Gegenfurtner et al., 2013; Tai et al., 2022). Most notably, Makransky and Lilleholt (2018) and Makransky and Petersen (2019) used a wide array of cognitive and affective measures within structural equation modeling (SEM) frameworks (a statistical technique that combines factor analysis and multiple regression analysis; Kline, 2015) to explain variability in learning gains. Both studies concluded that most measures explained some degree of learning only indirectly, whereas the strongest direct connection to learning came from self-efficacy measures. In their CAMIL model, Makransky and Petersen (2021) also reported a positive relationship between self-efficacy and learning outcomes. Finally, Tai et al. (2022) presented a model in which self-efficacy explained learning through a positive relationship with VR-learning-interest and a negative relationship with VR-using-anxiety. In light of these findings, self-efficacy is also a central factor in our study.

In line with our objective of unravelling the complexities that influence learning in VR, it is essential to address learners’ behavioral traits, which have been found to correlate with learning gains in VR (Cheng et al., 2015). These traits refer to in-game, real-time, objective behavioral measures of the user. Researchers often convert these measures into performance scores and use them as assessment methods embedded in the VR environment, an approach known as stealth assessment (Shute, 2009; Alcañiz et al., 2018). However, it is important to distinguish between behavioral data collected during the training phase and during the testing phase in VR. This is analogous to a student’s classroom behavior during regular lessons versus during an exam. Paying attention to a text for an extended time during a training session reflects the learning process and can reveal personal characteristics, intrinsic motivations, and interests, which may in turn lead to more effective learning. In contrast, extended attention to part of a text during a test is directly related to learning outcomes: it may indicate comprehension difficulties and suggest lower learning outcomes. In our study, we focused on training-phase data to isolate and better understand how inherent behavioral traits influence learning in a VR training procedure, rather than simply measuring the end result of learning.

Various objective measures have been used in different studies to assess performance. For instance, Salzman et al. (1999) used administrator observations, time on task, error types, and error rates as indicators of actual performance to characterize their other variables. Gavish et al. (2015) used task time, the number of picture clues required, and the number of unsolved errors to calculate a combined performance score. Similarly, Shi et al. (2020) used accuracy and operation time as indicators of task performance and applied machine learning methods to predict learning outcomes. Inspired by these studies, we introduced the latent variable “user simulation performance,” inferred from four objective measures we selected: time on task, error count, question count, and fixation on the checklist. Taken together, these studies suggest that user simulation performance may be related to learning gains. A novel aspect of our study is the definition of this variable and the exploration of its relationship with learning gains and self-efficacy.

To further strengthen our understanding of the VR learning process, we turn to another factor known to influence learning: the quality of the interaction experience with the VR training environment (Salzman et al., 1999), which we term “VR hardware experience.” This is a latent variable inferred from usability and simulator sickness. However, the literature is unclear about how direct the relationship between this factor and learning gains is. Jia et al. (2014) reported that usability correlated positively with learning in a memory test, but Makransky and Petersen (2019), in their SEM model, found that usability explained learning indirectly through self-efficacy. The picture is similar for simulator sickness: some studies report direct effects on learning (Rupp et al., 2019), whereas others report no effect (Selzer et al., 2019).

There have been few studies to date that have tried to explain learning in VR simulations by measuring self-efficacy, user simulation performance, and hardware experience. Several studies have identified direct or indirect associations between usability and self-efficacy or perceived learning (Makransky and Petersen, 2019; Pedram et al., 2020; Song et al., 2021). Jia et al. (2014) reported a correlation between usability and task performance, and Johnson (2007) reported a correlation between simulator sickness scores and their participants’ statements that “discomfort hampers training.” This study is the first to evaluate all three as potential factors to explain learning gains in VR training environments.

In conclusion, studies that investigate the role of interactive VR in learning often give insufficient consideration to the variety of factors that contribute to the complexity of the learning process, and to the interactions among these factors. This may explain why the literature has yielded mixed findings on the positive, neutral, and negative effects of VR on learning outcomes. Moreover, the influence of training context and application domain on VR effectiveness has been demonstrated in the literature (Madden et al., 2020; Wu et al., 2020; Johnson-Glenberg et al., 2021), but most studies focus on research-designed simulations rather than industry-designed training that is actually in use, raising concerns about the generalizability and applicability of the findings. The current study addresses these gaps by examining the effectiveness of a pre-existing, real-world industrial application used for maintenance and safety training, and by investigating the interrelation of the different factors that affect learning in this application. The aim of our investigation is twofold: advancing the understanding of VR learning outcomes and exploring the complexities of the process of learning in VR. To meet these objectives, we conducted two studies. The first study centers on learning outcomes and asks whether the current VR training produces learning and self-efficacy gains in interactive VR and in passive monitor viewing. In the second study, we map out the factors that may explain learning gains in interactive VR training, using structural equation modeling to address how the interrelation of factors such as self-efficacy, user simulation performance, and VR hardware experience can explain learning gains in VR. Makransky and Petersen (2019) demonstrated that two sets of measures (self-reported affective and self-reported cognitive measures) affected learning gains through self-efficacy. We added further measures to that model to unravel the complexity of the factors described above that may affect learning in VR simulations.

2 Study 1. Training using monitor versus VR

A two-part study investigated the learning outcomes and self-efficacy of an industrial VR application for training electrical maintenance tasks. The simulation was presented either as an interactive 3D VR simulation or a passive viewing condition on a 2D screen. Learning and self-efficacy gains were evaluated for each condition, then compared across conditions. Finally, the relationship between self-efficacy and learning gains was evaluated in each condition separately.

2.1 Method

2.1.1 Participants

Sixty individuals (39 females, age M = 21.83, SD = 4.20) from the Tilburg University participant pool took part in the study for course credit. The study was approved by the university’s ethics committee (REDC # 20201035). Although we did not assess participants’ specific VR expertise, none of the participants was familiar with the particular industrial VR solution. Inclusion criteria were being 16 years of age or older, having no uncorrected hearing or visual impairments, and being proficient in English, the standard language of communication for Tilburg University students. The exclusion criterion was the inability to complete the VR training task.

2.1.2 Materials

2.1.2.1 Interactive VR simulation

Before starting the VR simulation, participants completed a 5-min VR experience with a simple task to familiarize themselves with the VR controllers and head-mounted display. The interactive VR simulation used in this study represents a real-world industrial scenario in a factory and aims to train participants to conduct electrical maintenance (specifically, disconnecting a main feed pump from the cooling tower in the control room and then performing a megger test on the connections) while adhering to safety protocols. This VR solution, provided by an industry partner, was a pre-existing training tool for the partner’s field service engineers and maintenance staff and is part of their annual training. In the VR simulation, participants must progress through three rooms in a fixed order: the introduction room, the equipment room, and finally the maintenance operation room. Specific actions are required in each room.

In the introduction room, participants listened to a detailed description of the task presented by an embodied agent (Figure 1A). After this description, participants were given a digital “work-permit,” an in-application text panel that they could view in VR as needed. This permit outlined all the tasks to be conducted, along with some of the steps required to complete them. In the equipment room, participants collected the personal protective equipment (PPE) outlined in the work-permit (Figure 1B). Participants who did not collect all the necessary equipment were not permitted to proceed to the maintenance operation room. In the final room, the maintenance operation room, all electrical maintenance steps had to be completed using a set of tools present in the room (Figure 1C).

FIGURE 1. (A) Introduction Room; (B) PPE Room; (C) Operation Room; (D) Serious Error Stressor. Reproduced with permission.

The sequence of steps in the maintenance task allowed for nine possible serious errors, which occurred if the participant did not follow the instructions correctly. After each serious error, participants experienced a stressor, such as an explosion or an evacuation alarm (Figure 1D), and were automatically teleported out of the maintenance operation room back into the equipment room. These stressors were inherent to the original VR solution provided by our industry partner and reflect real-world industrial training scenarios. For each serious error, the experimenter (the first author of the manuscript) verbally gave the participant the corresponding guideline and explained how to avoid the error. To ensure uniform feedback, the same experimenter delivered the same verbal feedback across all sessions.

Following current training protocols, in which participants can ask the experimenter questions, a set of 25 pre-written responses to frequently asked questions was created. These were based on queries from earlier trial-run sessions and were intended to minimize confusion across participants while ensuring standardization. Before starting the experiment, participants were told that whenever they felt stuck, they could ask the experimenter a question. In that case, the question was recorded, and the experimenter verbally provided the most appropriate pre-written response. The experiment continued until either the 30-min time limit was reached or the participant completed the task without a serious error.

2.1.2.2 Hardware and technologies

In the VR condition, we used an HTC Vive Pro 2 head-mounted display (HMD) with a horizontal field of view (FOV) of up to 100° and 6-DOF trackers. We chose this HMD to be consistent with the equipment of our industry partner, who uses the same hardware and software for the annual training of their employees. Moreover, this HMD is currently one of the most prominent VR HMDs on the market and has been used in a variety of other studies (including Dey et al., 2019; van Limpt-Broers et al., 2020). The application was streamed to the HMD through SteamVR on the Windows 10 operating system.

2.1.2.3 Passive 2D monitor viewing

In the monitor condition, participants viewed a 7-min video of a user performing a walkthrough of the interactive VR simulation environment without committing any serious errors. Due to the COVID-19 pandemic, participants in this condition completed the experiment online at home using their personal setups. The video played automatically in full-screen mode, and all controls otherwise available to the participant (e.g., clicking, fast-forwarding, skipping) were deactivated by JavaScript code embedded in the Qualtrics survey.

2.1.2.4 Knowledge questionnaires and learning gains

An interactive design process involving the maintenance training specialists produced 23 intended learning objectives for the VR simulation. Two knowledge questions were created for each learning objective, and the two questions of each pair were randomly assigned to different sets. This produced two sets of 23 questions, each containing one question per learning objective. Both before and after training, participants answered one set of written questions in Qualtrics on a desktop computer, with the order of the question sets counterbalanced across participants.
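The split of question pairs into two parallel sets, plus counterbalancing of set order, can be sketched as follows. This is a minimal illustration under our own assumptions: the question labels are hypothetical placeholders, and alternating set order by participant index is only one simple counterbalancing scheme, since the text does not state which scheme was used.

```python
import random

def build_question_sets(question_pairs, seed=None):
    """Split each pair of questions (one pair per learning objective) across two sets.

    question_pairs: list of (question_a, question_b) tuples, one per objective.
    Returns (set_1, set_2), each holding exactly one question per objective.
    """
    rng = random.Random(seed)
    set_1, set_2 = [], []
    for pair in question_pairs:
        first, second = rng.sample(pair, 2)  # random order within the pair
        set_1.append(first)
        set_2.append(second)
    return set_1, set_2

def set_order(participant_index):
    """Hypothetical counterbalancing: even-indexed participants take set 1 first."""
    return ("set_1", "set_2") if participant_index % 2 == 0 else ("set_2", "set_1")

# Hypothetical placeholder questions for the 23 learning objectives
pairs = [(f"LO{i}-q1", f"LO{i}-q2") for i in range(1, 24)]
set_1, set_2 = build_question_sets(pairs, seed=42)
print(len(set_1), len(set_2))      # 23 23
print(set_order(0), set_order(1))  # ('set_1', 'set_2') ('set_2', 'set_1')
```

Because every objective contributes exactly one question to each set, the pre- and post-test cover the same objectives without repeating any item.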

Learning gains were computed as the average normalized gain between the pre- and post-test knowledge assessments, that is, the ratio of the actual average gain (%post − %pre) to the maximum possible average gain (100 − %pre) (Hake, 1998).
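The normalized gain formula can be written as a small function. This is a minimal sketch that assumes scores are already expressed as percentages for a single participant. The text does not say whether averaging happened before or after normalizing, and the function name and the decision to raise an error at a 100% pre-test score (where the ratio is undefined) are our own.

```python
def normalized_gain(pre_pct, post_pct):
    """Normalized gain (Hake, 1998): actual gain divided by the maximum possible gain.

    pre_pct, post_pct: knowledge scores as percentages (0-100).
    Returns a fraction; multiply by 100 to express it as a percentage.
    Negative values mean the post-test score fell below the pre-test score.
    """
    if pre_pct >= 100:
        raise ValueError("normalized gain is undefined when the pre-test score is 100%")
    return (post_pct - pre_pct) / (100 - pre_pct)

# A participant scoring 40% before and 70% after training closes half of the possible gap
print(normalized_gain(40, 70))  # 0.5
```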

2.1.2.5 Self-report measures

2.1.2.5.1 Self-efficacy

Self-efficacy is a measure of people’s perceived confidence in their ability to perform a specific task (Gegenfurtner et al., 2014). Following Bandura (2006) and Luszczynska et al. (2005), we adapted six questions from the General Self-Efficacy Scale (Schwarzer and Jerusalem, 1995). This scale has demonstrated good internal consistency, with Cronbach’s alpha values ranging from .76 to .90, the majority of which are in the high .80s (Croasmun et al., 2011). Participants rated statements such as “I can do an electrical maintenance operation” and “I feel confident that I can do an electrical maintenance operation in a limited time” on a 7-point Likert scale from 0 (lowest ability) to 6 (highest ability). These questions measured participants’ confidence in their ability to successfully complete the electrical maintenance task. Aggregate unweighted scores were computed and normalized to 0 to 100.
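The text does not give the normalization formula. The sketch below assumes that “normalized to 0 to 100” means linearly rescaling the sum of the six 0–6 ratings (possible range 0 to 36), and that a self-efficacy gain is post-training minus pre-training score. Both are our assumptions, and the function name is hypothetical.

```python
def self_efficacy_score(ratings, n_items=6, max_rating=6):
    """Unweighted sum of Likert ratings, rescaled linearly to 0-100 (assumed scheme)."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    if any(not 0 <= r <= max_rating for r in ratings):
        raise ValueError("ratings must lie between 0 and max_rating")
    return 100 * sum(ratings) / (n_items * max_rating)

pre = self_efficacy_score([3, 3, 3, 3, 3, 3])   # 50.0
post = self_efficacy_score([4, 5, 4, 5, 4, 5])  # 75.0
print(post - pre)  # assumed self-efficacy gain: 25.0
```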

2.1.2.5.2 System usability

System usability refers to how users perceive the usability of the computer system they are using (Brooke, 2013). To measure it, we used the 10-item System Usability Scale (SUS) (Brooke, 1996), which has very good reliability, with a global Cronbach’s alpha of .91 (Peres et al., 2013). Sample items from the SUS include “I thought the training system was easy to use” and “I felt very confident using the training system.” These items measured users’ perceptions of the usability of the VR training system, and participants rated each statement on a 5-point Likert scale from 0 (lowest usability) to 4 (highest usability). Aggregate unweighted SUS scores were computed, with reverse-coding where necessary, yielding a final range of 0 to 100 (Bangor et al., 2008; Bangor et al., 2009; Sauro, 2011).
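The sketch below shows conventional SUS scoring. It assumes the ratings are coded as agreement (0 = strongly disagree, 4 = strongly agree), that the odd-numbered items are positively worded and the even-numbered items negatively worded as in Brooke’s questionnaire, and that the 0–40 sum is multiplied by 2.5. It is an illustration of standard practice, not necessarily the exact script used in the study.

```python
def sus_score(ratings):
    """SUS score (0-100) from ten agreement ratings coded 0-4.

    Odd-numbered items (positively worded) count as rated; even-numbered items
    (negatively worded) are reverse-coded as 4 - rating. The 0-40 sum is scaled by 2.5.
    """
    if len(ratings) != 10:
        raise ValueError("the SUS has exactly 10 items")
    if any(not 0 <= r <= 4 for r in ratings):
        raise ValueError("ratings must lie between 0 and 4")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        total += rating if item_number % 2 == 1 else 4 - rating
    return total * 2.5

print(sus_score([4, 0] * 5))  # 100.0: full agreement with positive items, none with negative
print(sus_score([2] * 10))    # 50.0: neutral on every item
```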

2.1.2.5.3 Simulator sickness questionnaire (SSQ)

To measure participants’ discomfort in VR, we used the original SSQ designed by Kennedy et al. (2003), taking into account the scoring modification suggested by Bimberg et al. (2020). The questionnaire consists of 16 items and has good reliability, with a Cronbach’s alpha of .94 (Sevinc and Berkman, 2020). Participants rated items such as ‘Nausea’, ‘Headache’, and ‘Oculomotor discomfort’ on a 4-point Likert scale from “none” to “severe”; higher scores indicate a higher degree of simulator sickness. We calculated the total SSQ score by summing the three unweighted subscales proposed by Kennedy et al. (2003), taking into account that five items appear in more than one subscale and are therefore counted in each. We multiplied this sum by Kennedy et al.’s recommended scaling factor of 3.74, which yields a possible score range of 0 to 235.62.
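The total SSQ computation described above can be sketched as follows. The item-to-subscale mapping is the standard assignment from Kennedy et al.’s questionnaire, with items rated 0 (none) to 3 (severe). The sketch implements only the calculation stated in the text (unweighted subscale sums multiplied by 3.74) and does not attempt to reproduce Bimberg et al.’s modification. The item keys are our own labels.

```python
# Subscale membership of the 16 SSQ items (N = nausea, O = oculomotor, D = disorientation).
# Five items belong to two subscales and are therefore counted twice (21 item slots in total).
SSQ_SUBSCALES = {
    "general discomfort": ("N", "O"),
    "fatigue": ("O",),
    "headache": ("O",),
    "eyestrain": ("O",),
    "difficulty focusing": ("O", "D"),
    "increased salivation": ("N",),
    "sweating": ("N",),
    "nausea": ("N", "D"),
    "difficulty concentrating": ("N", "O"),
    "fullness of head": ("D",),
    "blurred vision": ("O", "D"),
    "dizziness (eyes open)": ("D",),
    "dizziness (eyes closed)": ("D",),
    "vertigo": ("D",),
    "stomach awareness": ("N",),
    "burping": ("N",),
}

def ssq_total(ratings, scale=3.74):
    """Total SSQ: sum of the three unweighted subscale raw scores, times 3.74.

    ratings maps each item name to 0 (none), 1, 2, or 3 (severe).
    """
    raw = {"N": 0, "O": 0, "D": 0}
    for item, subscales in SSQ_SUBSCALES.items():
        for subscale in subscales:
            raw[subscale] += ratings[item]
    return scale * sum(raw.values())

worst = ssq_total({item: 3 for item in SSQ_SUBSCALES})
print(round(worst, 2))  # 235.62, i.e. 3.74 * (21 slots * 3 points)
```

Counting the shared items in both of their subscales is what makes the maximum 21 × 3 = 63 raw points, which matches the 0 to 235.62 range stated in the text.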

2.1.3 Design and procedure

After signing the informed consent form, participants in both the interactive VR condition and the passive viewing 2D monitor condition followed the same overall procedure. First, they completed the knowledge test and self-efficacy questionnaire. Participants in the interactive VR group received training in the electrical maintenance task VR simulation, while those in the passive viewing group underwent training by watching a gameplay video of the same simulation, but on a 2D monitor. After the training, participants in both groups completed the simulator sickness questionnaire, followed by the post-training knowledge test, self-efficacy questionnaire, and system usability questionnaire.

2.1.4 Statistical analyses

Several statistical methods were used to evaluate the effectiveness of our training conditions and various related measures. All measures recorded and reported here were analyzed in both studies. We used one-sample t-tests to determine whether the changes in knowledge (learning gains) and self-efficacy from pre- to post-test differed significantly from zero. We used independent-samples t-tests to compare the VR and monitor conditions in terms of learning gains, self-efficacy, system usability, and simulator sickness. In addition, Pearson correlation analyses were used to examine the relationships between learning gains and self-efficacy, system usability, and simulator sickness in each condition.
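As a rough check, the one-sample test statistics can be recomputed from summary statistics. The sketch below uses only the standard library and assumes Cohen’s d for a one-sample test is mean / SD. It also converts a Pearson r to its t statistic with n − 2 degrees of freedom. P-values are omitted because they require a t distribution (for example, SciPy’s `scipy.stats.t.sf`). The function names are hypothetical.

```python
import math

def one_sample_t(mean, sd, n):
    """One-sample t statistic against zero, its degrees of freedom, and Cohen's d = mean / sd."""
    t = mean / (sd / math.sqrt(n))
    return t, n - 1, mean / sd

def pearson_t(r, n):
    """t statistic for testing a Pearson correlation, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

# Summary statistics reported for VR learning gains in Study 1 (n = 30)
t, df, d = one_sample_t(42.10, 18.9, 30)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(29) = 12.20, d = 2.23

# Reported VR correlation between learning gains and self-efficacy, r(28) = .47
t_r, df_r = pearson_t(0.47, 30)
print(f"t({df_r}) = {t_r:.2f}")  # t(28) = 2.82
```

The recomputed d (2.23) differs slightly from the reported 2.22, as expected when working from rounded means and SDs rather than raw data.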

2.2 Results

2.2.1 Learning and self-efficacy gains

2.2.1.1 VR condition

Twenty-nine out of 30 participants showed an increase in their knowledge scores from pre- to post-test (M = 42.10, SD = 18.9). A one-sample t-test showed that these learning gains were significantly above zero, t(29) = 12.20, p < .001, d = 2.22. Moreover, the self-efficacy of 22 out of 30 participants increased (M = 12.2, SD = 18.7), and the self-efficacy gains were significantly greater than zero, t(29) = 3.58, p < .01, d = 0.65. Detailed statistics are provided in Table 1.

TABLE 1. One-sample t-tests and descriptive statistics for learning and self-efficacy gains across conditions.

2.2.1.2 Monitor condition

Twenty-five out of 30 participants showed an increase in knowledge (M = 32.62, SD = 29.62), and a one-sample t-test showed that these knowledge gains were significantly above zero, t(29) = 6.03, p < .001, d = 1.1. Self-efficacy also increased in 17 out of 30 participants (M = 9.63, SD = 15.6), and a one-sample t-test showed a significant increase in self-efficacy after monitor training, t(29) = 3.38, p < .01, d = 0.62. For comprehensive statistics, refer to Table 1.

2.2.1.3 Comparison between conditions

Despite larger effect sizes for both learning and self-efficacy gains in the VR condition than in the monitor condition, there was no significant difference between the VR and monitor conditions in learning gains, t(49.19) = 1.46, p = .14, d = .38, or in self-efficacy gains, t(56.22) = 0.58, p = .56, d = .15. A detailed comparison of all measures across the VR and monitor conditions is presented in Table 2.
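The fractional degrees of freedom reported above appear consistent with Welch’s unequal-variance t-test. The sketch below recomputes Welch’s t, the Welch–Satterthwaite degrees of freedom, and a pooled-SD Cohen’s d from the reported means and SDs. It assumes that is how the comparison was run, and the helper name is hypothetical.

```python
import math

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t, Welch-Satterthwaite df, and pooled-SD Cohen's d from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return t, df, (m1 - m2) / pooled_sd

# Learning gains: VR (M = 42.10, SD = 18.9) vs. monitor (M = 32.62, SD = 29.62), n = 30 each
t, df, d = welch_from_summary(42.10, 18.9, 30, 32.62, 29.62, 30)
print(f"t({df:.2f}) = {t:.2f}, d = {d:.2f}")  # t(49.26) = 1.48, d = 0.38
```

Working from rounded summary statistics gives values close to, but not identical with, the reported t(49.19) = 1.46. The effect size matches the reported d = .38.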

TABLE 2. Independent t-tests and descriptive statistics comparing all measures across conditions.

2.2.2 System usability checks

2.2.2.1 VR condition

The mean SUS score in the VR condition (M = 66.42, SD = 16.73) was just below the commonly cited average SUS score of 68 (Brooke, 2013; Joshi et al., 2021). Simulator sickness in VR, measured with the SSQ, averaged M = 29.67 (SD = 26.02).

2.2.2.2 Monitor condition

For the monitor condition, the SUS scores (M = 56.75, SD = 15.45) were well below the acceptable standard (Brooke, 2013; Joshi et al., 2021), suggesting that participants did not experience this system as acceptably usable. The SSQ scores of participants who watched the video on the monitor averaged M = 40.64 (SD = 32.40).

2.2.2.3 Comparison between conditions

A significant difference between the VR and monitor conditions was found for SUS, t(57.63) = −2.32, p = .02, d = .60. However, there was no significant difference between the VR and monitor conditions in SSQ, t(55.41) = −1.45, p = .15, d = −.37, although SSQ scores were numerically higher in the monitor condition than in the VR condition.

2.2.3 Relationship between learning gains and other measures

2.2.3.1 VR condition

A Pearson correlation test indicated a significant positive correlation between learning gains and self-efficacy, r(28) = .47, p < .01. In contrast, the correlations between learning gains and SUS, r(28) = .29, p = .12, and between learning gains and SSQ, r(28) = −.24, p = .2, were not statistically significant.

2.2.3.2 Monitor condition

In contrast, in the monitor condition a correlation test showed no significant relationship between self-efficacy and learning gains, r(28) = .23, p = .22. As in the VR condition, there was no significant correlation between learning gains and SUS, r(28) = .23, p = .21, or between learning gains and SSQ, r(28) = −.05, p = .79.

2.3 Discussion

Whether participants actively explored an immersive VR simulation or passively viewed the simulation on a monitor, the industry-designed VR training resulted in significant improvements in both knowledge and self-efficacy. These results are consistent with findings from research-designed 2D-VR training environments (Smith et al., 2018; Madden et al., 2020) and interactive VR training environments (e.g., Buttussi and Chittaro, 2017; Smith et al., 2018; Rupp et al., 2019). Additionally, this study found no significant difference between the two conditions in SSQ, consistent with previous research by Joshi et al. (2021), but did find a difference between VR and monitor for SUS, as also reported by Simões et al. (2020) and Othman et al. (2022).

Given the higher fidelity of VR compared with the monitor condition, it was rather surprising that our comparisons showed no reliable difference between the two conditions except in system usability. However, this may not be unexpected given the complex nature of learning (Salzman et al., 1999). The literature suggests several factors that could explain this finding, such as differences in participants’ perceptions of their abilities, behavioral differences, or the quality of participants’ experience.

The difference between VR and the monitor condition was much clearer in the relationship between self-efficacy and learning. In the VR condition, learning gains correlated significantly with self-efficacy, whereas in the monitor condition there was no evidence of a significant correlation between learning gains and any other measure. The positive correlation in the VR condition echoes the results of structural equation modeling analyses of learning in other VR tasks (Makransky and Petersen, 2021). Makransky and Petersen (2019) found that many cognitive and affective factors that might be expected to influence learning gains directly are better treated as indirect factors, and that the measure with the most direct impact on learning gains is self-efficacy. Expanding on this framework, in Study 2 we recorded behavioral data during the immersive VR task (Salzman et al., 1999; Gavish et al., 2015; Read and Saleem, 2017; Shi et al., 2020) and evaluated the relationships among learning gains, self-efficacy, system usability, simulator sickness, and behavioral measures.

3 Study 2. Training in VR

We followed Makransky and Petersen (2019) and used a structural equation model (SEM) to map out the factors that may influence learning in VR. The theoretical framework for this study is grounded in the literature presented in the Introduction, which suggests that self-efficacy, user simulation performance, and VR hardware experience can all influence learning outcomes in a VR training environment, either directly or indirectly through self-efficacy. The hypothesized model in Figure 2 encompasses all these factors and the postulated relationships between them, in accordance with this theoretical framework.

FIGURE 2. A priori SEM model of the learning process in VR. This initial model represents the hypothesized relationships based on theory and previous research; each path and node shows a connection expected before data collection. The model serves as the baseline against which the final model (Figure 3) is compared after the iterative fitting procedure.

As in the Makransky and Petersen (2019) analysis, we hypothesized a connection from Self-Efficacy Gain to Learning Gains. This is in line with existing work showing that self-efficacy has a significant influence on learning gains in VR (Wang and Wu, 2008; Richardson et al., 2012; Gegenfurtner et al., 2013; Makransky and Petersen, 2019; Tai et al., 2022). In addition, Figure 2 includes connections from the User Simulation Performance and VR Hardware Experience latent factors to both the Self-Efficacy Gain and Learning Gains factors. Previous studies have shown that VR hardware experience, which encompasses the user’s interaction experience with the VR training environment, can explain learning outcomes (Salzman et al., 1999; Makransky and Petersen, 2019; Rupp et al., 2019; Selzer et al., 2019). However, usability and simulator sickness, the components of VR hardware experience, have been found to have an ambiguous effect on learning (Jia et al., 2014; Makransky and Petersen, 2019). We therefore expected that VR hardware experience may explain learning gains and predict self-efficacy gains. Finally, we included a connection from VR Hardware Experience to User Simulation Performance, as we expected VR hardware experience to explain user simulation performance (Johnson, 2007; Jia et al., 2014).
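To make the hypothesized structure explicit, the sketch below writes the a priori model in lavaan-style syntax, where `=~` defines a latent factor by its indicators and `~` regresses an outcome on predictors. It then enumerates every route from VR hardware experience to learning gains. All variable names are hypothetical, and Self-Efficacy Gain and Learning Gains are treated as observed variables (our assumption). The sketch only parses the model description and does not fit any model; the study itself used IBM SPSS Amos.

```python
# A priori model in lavaan-style syntax (variable names are hypothetical)
MODEL = """
UserSimPerf =~ time_on_task + error_count + question_count + checklist_fixation
VRHardware =~ sus + ssq
UserSimPerf ~ VRHardware
SelfEfficacyGain ~ UserSimPerf + VRHardware
LearningGain ~ SelfEfficacyGain + UserSimPerf + VRHardware
"""

def structural_paths(model):
    """Return {outcome: [predictors]} for the regression ('~') lines only."""
    paths = {}
    for line in model.strip().splitlines():
        if "=~" in line or "~" not in line:
            continue  # skip measurement (latent factor) definitions
        outcome, predictors = line.split("~")
        paths[outcome.strip()] = [p.strip() for p in predictors.split("+")]
    return paths

def routes(paths, source, target, prefix=()):
    """Enumerate every directed route from source to target (depth-first; model is acyclic)."""
    prefix = prefix + (source,)
    if source == target:
        return [prefix]
    found = []
    for outcome, predictors in paths.items():
        if source in predictors:
            found += routes(paths, outcome, target, prefix)
    return found

for route in routes(structural_paths(MODEL), "VRHardware", "LearningGain"):
    print(" -> ".join(route))
```

The enumeration yields one direct route and three indirect routes (through user simulation performance, through self-efficacy gain, or through both). These are the candidate paths that the iterative pruning described in the statistical analyses can retain or remove.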

3.1 Method

3.1.1 Participants

To ensure sufficient statistical power for the SEM analysis, an additional 27 participants were recruited from the same participant pool. The original goal was to double the number of participants in the VR condition of Study 1 by recruiting 30 more; however, three participants had to be excluded because they were unable to complete the task. As a result, a total of 57 participants (30 from Study 1 and 27 new ones) were included in the SEM analysis (29 females, age M = 21.98, SD = 4.20).

As in Study 1, participants had no prior familiarity with our specific VR solution. The inclusion criteria were age 16 or older, no uncorrected hearing or visual impairments, and proficiency in English. The main exclusion criterion was inability to complete the VR training task.

3.1.2 Materials

In Study 2, we focused exclusively on VR, without a monitor or any other comparison condition. This choice was informed by the results of Study 1, where, unlike the monitor condition, the VR condition demonstrated a significant correlation between learning gains and other measures. Study 2 used the same materials and methods as Study 1, with the addition of the following behavioral measures.

3.1.2.1 Behavioral measures

3.1.2.1.1 Time on task

The duration of the task ranged from 0 to 30 min.

3.1.2.1.2 Question count

The number of questions participants asked the experimenter during the training. As before, all questions were answered with one of the 25 pre-determined FAQ responses.

3.1.2.1.3 Error count

The number of serious errors made during the training, which ranged from 0 to 9. Such errors occur when the user does not follow the instructions correctly.

3.1.2.1.4 Fixation on checklist

The percentage of time spent looking at the work-permit relative to the total time the user activated the work-permit (with a button press).
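The fixation measure is a ratio of two durations. A minimal sketch follows, assuming (hypothetically) that the simulation logs per-frame samples containing the frame duration, whether the work-permit is activated, and whether the user's gaze falls on it. The application's actual logging format is not described here, so the record type `FrameSample` and its fields are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameSample:
    """Hypothetical per-frame log record (field names are illustrative)."""
    dt: float             # frame duration in seconds
    permit_active: bool   # work-permit currently activated by the button press
    gaze_on_permit: bool  # user's gaze currently falls on the work-permit


def fixation_percentage(samples: Iterable[FrameSample]) -> float:
    """Percentage of work-permit activation time spent looking at the permit."""
    active_time = 0.0
    looking_time = 0.0
    for s in samples:
        if s.permit_active:
            active_time += s.dt
            if s.gaze_on_permit:
                looking_time += s.dt
    # The measure is undefined when the permit was never activated.
    if active_time == 0.0:
        return math.nan
    return 100.0 * looking_time / active_time
```

Gaze samples recorded while the permit is closed are ignored, so looking around the scene does not count against the measure; only time with the permit open enters the denominator.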

Descriptive statistics for these behavioral measures are provided in Table 3.


TABLE 3. Descriptive statistics of behavioral measures.

3.1.3 SEM statistical analyses

The list of all measures included in the SEM analysis and their correlations with each other are given in Table 4. The items were treated as scalar variables, and the suitability of the proposed models was evaluated using three indicators: the Comparative Fit Index (CFI; Hatcher and O'Rourke, 2013), the minimum discrepancy divided by degrees of freedom (CMIN/DF; Hair et al., 2010), and the Root Mean Square Error of Approximation (RMSEA; Hair et al., 2010). We performed SEM using IBM SPSS Amos version 28.0 and followed the method of Makransky and Petersen (2019) for pruning non-significant paths according to the greatest misfit. After each fit, if any non-significant path was present, the least significant one (the path with the highest p-value) was deleted and the model was re-estimated. This procedure was repeated until all remaining paths were significant.
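The pruning procedure described above is a backward-elimination loop. The Python sketch below shows its control flow under stated assumptions: `fit` is a hypothetical stand-in for one SEM estimation run (in this study, performed in Amos) that returns a p-value for every structural path currently in the model, and path labels such as "HW->LG" are illustrative. It is not an Amos API and does not perform any estimation itself.

```python
from typing import Callable, Dict, FrozenSet

Path = str  # e.g. "HW->LG"; labels are illustrative


def prune_paths(
    paths: FrozenSet[Path],
    fit: Callable[[FrozenSet[Path]], Dict[Path, float]],
    alpha: float = 0.05,
) -> FrozenSet[Path]:
    """Refit, drop the least significant path, and repeat until all are significant.

    `fit` is a hypothetical placeholder for an SEM estimation run that
    returns a p-value for each structural path in the given model.
    """
    current = frozenset(paths)
    while current:
        p_values = fit(current)
        # Least significant path = highest p-value.
        worst = max(current, key=lambda p: p_values[p])
        if p_values[worst] < alpha:
            break  # every remaining path is significant
        current = current - {worst}
    return current
```

Removing only one path per iteration matters: after a path is dropped, the remaining coefficients and their p-values are re-estimated, so a path that looked non-significant in the full model may become significant in the reduced one.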


TABLE 4. Correlations of all items in the hypothesized SEM.

3.2 Results

We conducted a confirmatory factor analysis (CFA) on the defined constructs to test the fit of the hypothesized relationships shown in Figure 2. The initial hypothesized model almost reached an acceptable fit (RMSEA = .079, CFI = .94, CMIN/DF = 1.35), but two of its paths were not significant. After the iterative procedure removed these two paths, a simpler, more parsimonious model with an acceptable fit was obtained (Figure 3; RMSEA = .078, CFI = .93, CMIN/DF = 1.34). All standardized path coefficients shown in Figure 3 are significant at an alpha level of .05. Table 4 reports the descriptive statistics of all factor loadings.
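The reported RMSEA and CMIN/DF values can be cross-checked against each other. Under the usual definition, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))), which can be rewritten in terms of CMIN/DF = χ²/df as sqrt(max(CMIN/DF − 1, 0) / (N − 1)). The sketch below assumes N = 57 (the Study 2 sample) and the N − 1 sample-size convention; the function name is illustrative.

```python
import math


def rmsea_from_cmin_df(cmin_df: float, n: int) -> float:
    """RMSEA expressed through CMIN/DF (chi-square divided by df).

    RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))
          = sqrt(max(cmin_df - 1, 0) / (n - 1))
    Assumes the (n - 1) sample-size convention.
    """
    return math.sqrt(max(cmin_df - 1.0, 0.0) / (n - 1))


if __name__ == "__main__":
    n = 57  # participants included in the SEM analysis
    for label, cmin_df in [("initial", 1.35), ("final", 1.34)]:
        print(label, round(rmsea_from_cmin_df(cmin_df, n), 3))
```

With these inputs the sketch reproduces the reported values (.079 for the initial model and .078 for the final model). Because CMIN/DF is reported to two decimals, the check is approximate. CFI cannot be recovered this way, since it also requires the chi-square of the baseline (independence) model.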


FIGURE 3. Final model. The dashed line represents a connection that was not significant and was pruned in the iterative fitting procedure (Makransky and Petersen, 2019). Solid lines represent the significant paths retained from the initial hypothesized model (Figure 2).

3.3 Discussion

In Study 2, the resulting model exhibited consistent constructs, with “user simulation performance” representing in-game behavioral performance measures and “VR hardware experience” encompassing self-reported usability and sickness. These latent factors had significant loadings on all observed measures hypothesized to be connected to them. A strong, direct connection was found between self-efficacy and learning gains. This is consistent with other studies evaluating VR training in the domains of education, academic assessments, and occupational skill development (Richardson et al., 2012; Makransky and Petersen, 2019; Tai et al., 2022).

The hypothesized direct connection from VR hardware experience to learning gains was not significant and was pruned from the final SEM (beta = .137, p = .33). This finding is somewhat surprising, considering that Jia et al. (2014) and Rupp et al. (2019) reported direct effects of usability or simulator sickness on learning; however, Salzman et al. (1999) noted mixed results in the literature regarding the impact of usability and simulator sickness on learning. Instead of a direct connection to learning gains, our model suggests a positive indirect effect of VR hardware experience on learning through self-efficacy. This is in line with Makransky and Petersen (2019), in whose model usability is connected to learning through cognitive variables and self-efficacy, and with Pedram et al. (2020), who showed that usability and self-efficacy can both explain learning indirectly through different paths. The direct connection from usability to self-efficacy is consistent with Song et al. (2021).
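In a recursive path model, an indirect effect is the product of the standardized coefficients along a route, and the total effect is the direct effect plus the sum of all indirect effects. The Python sketch below applies this path-tracing rule to the structure discussed for the final model: VR Hardware Experience (HW) to Self-Efficacy (SE) and to User Simulation Performance (USP), and both SE and USP to Learning Gains (LG), with no direct HW to LG path. The abbreviations and the function name `effects` are illustrative, and any coefficients passed to it should be the estimates from Figure 3; the sketch itself contains no estimates.

```python
from math import prod
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def effects(coef: Dict[Edge, float], start: str, end: str) -> Tuple[float, float]:
    """Return (direct, indirect) effects of start on end by path tracing.

    Assumes an acyclic (recursive) model with standardized coefficients.
    """
    graph: Dict[str, List[str]] = {}
    for src, dst in coef:
        graph.setdefault(src, []).append(dst)

    def walk(node: str) -> List[List[str]]:
        # All directed routes from node to end.
        if node == end:
            return [[end]]
        return [[node] + rest for nxt in graph.get(node, []) for rest in walk(nxt)]

    direct = coef.get((start, end), 0.0)
    # Routes with at least one intermediate factor contribute the product
    # of their coefficients to the indirect effect.
    indirect = sum(
        (
            prod(coef[(a, b)] for a, b in zip(route, route[1:]))
            for route in walk(start)
            if len(route) > 2
        ),
        0.0,
    )
    return direct, indirect
```

For the final model's structure the direct effect of HW on LG is zero by construction, so the total effect equals the sum of the two products HW→SE × SE→LG and HW→USP × USP→LG.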

Finally, our model sheds light on the effect of user simulation performance on learning gains. Unlike Makransky and Petersen (2019), who did not include behavioral measures, we incorporated them and identified a factor beyond self-efficacy that directly predicts learning gains. Behavioral measures recorded during training (time on task, error count, question count, and fixation on the checklist) constitute a latent variable that directly explains a significant proportion of the variance in learning gains. Part of the variance in user simulation performance appears to be directly explained by VR hardware experience, in line with Jia et al. (2014), who reported a correlation between usability and task performance, and Johnson (2007), who reported a correlation between SSQ score and agreement with the statement that “discomfort hampers training.”

4 General discussion

Our finding in Study 1 regarding the parity of the effect of a 2D screen and interactive VR on learning outcomes is in line with several studies (Buttussi and Chittaro, 2017; Greenwald et al., 2018; Joshi et al., 2021) but in contrast with others (Krokos et al., 2019; Kyrlitsias et al., 2020). Factors such as training context and task-technology fit may help explain this parity and the existing discrepancy in the literature. VR has been shown to perform better in assessments involving spatial memorization (Sowndararajan et al., 2008; Ragan et al., 2010) or spatial ability (Yang et al., 2018), as well as in studies focused on skill-based rather than knowledge-based training (Kozhevnikov et al., 2013). Thus, the immersion and interactivity that make VR effective for spatial and skill-based tasks may make it less efficient for purely knowledge-based learning, where traditional methods can be more direct and less distracting.

The generalization of multimedia learning principles (Mayer, 2009; 2014; Mayer and Fiorella, 2014) to VR may explain cases where extraneous materials and features in VR environments can cause cognitive overload, depleting learners’ limited cognitive capacity (Parong and Mayer, 2018). A goal-oriented design with an appropriate task-technology fit can mitigate these distractions (Zhang et al., 2017). Considering the training context and task-technology fit, in our study we used a safety training application primarily designed for interactive VR. Thus, the observed parity between 2D screens and interactive VR is more likely due to the focus on knowledge-based rather than skill-based training contexts and assessments.

The findings of Study 1 suggest that VR should not be seen as a one-size-fits-all approach. This conclusion is in line with Johnson-Glenberg et al.’s (2021) argument that “platform is not destiny”: simply adopting a new platform such as VR does not guarantee effective training. Instead, as the current study has shown, research into VR for training should consider a variety of factors working together, such as context, self-efficacy, user simulation performance, and VR hardware experience.

The results of our SEM analysis in Study 2 indicate that self-efficacy is a central predictor of learning gains. The resulting beta value of .3 from the SEM in Study 2, along with the correlation of .47 in Study 1, are consistent with previous findings. For instance, Makransky and Petersen (2019) found a beta value of .579 for the same relationship in their SEM analysis. Likewise, Gegenfurtner et al. (2013) in their meta-analysis reported an uncorrected mean correlation of .34 between self-efficacy and transfer of learning. Richardson et al. (2012) also observed a medium correlation of .31 between GPA and academic self-efficacy, with a 95% confidence interval of [0.28, 0.34].

Though previous SEM analyses of learning gains on VR training found self-efficacy to be the only factor directly predicting learning gains (Makransky and Petersen, 2019), our analysis suggests a direct connection from the latent factor User Simulation Performance that captures the four behavioral markers of performance in the simulation. This suggests that behavior provides information about the degree of learning above and beyond what people are aware that they are capable of (self-efficacy). In addition, it demonstrates that people who display “correct” behavior in the VR experience (completing the task faster, with fewer errors, while asking fewer questions, but looking at the information sheet more) improve more based on the training. This result suggests future avenues of adaptive training that focus more on scaffolding the environment to facilitate correct behavior and thus more learning (Vygotsky, 1978) and less on finding the level of desirable difficulty to promote errors (Bjork, 1994).
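The direction of each behavioral marker can be made explicit with a simple unit-weighted composite: standardize each measure, flip the sign of the measures where lower values indicate “correct” behavior (time on task, error count, question count), and average. This is a rough, hypothetical proxy for illustration only. The SEM estimates a latent factor with freely estimated loadings rather than equal weights, and the field names and the function `composite_scores` below are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List

# +1: higher values indicate "correct" behavior; -1: lower values do.
DIRECTION: Dict[str, int] = {
    "time_on_task": -1,
    "error_count": -1,
    "question_count": -1,
    "fixation_pct": +1,
}


def composite_scores(rows: List[Dict[str, float]]) -> List[float]:
    """Unit-weighted z-score composite; higher means more "correct" behavior."""
    stats = {}
    for key in DIRECTION:
        values = [r[key] for r in rows]
        sd = pstdev(values)
        # A constant measure carries no information; its z-scores become 0.
        stats[key] = (mean(values), sd if sd > 0 else 1.0)
    return [
        mean(DIRECTION[k] * (r[k] - stats[k][0]) / stats[k][1] for k in DIRECTION)
        for r in rows
    ]
```

Equal weighting keeps the sign logic transparent, but it ignores the fact that the four measures need not contribute equally to the latent factor; the loadings estimated in the SEM are the appropriate weights for any substantive analysis.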

The quality of the VR hardware experience, as indicated by factors such as simulator sickness and perceived usability, did not show a direct connection with learning gains. Instead, VR hardware experience was directly connected to both self-efficacy and user simulation performance, and thus had an indirect impact on learning. Unsurprisingly, people who had a more negative experience had both lower self-efficacy and worse performance, leading to less learning. This highlights the importance of usability (Pedram et al., 2020) and task-technology fit (Zhang et al., 2017) when designing virtual training environments.

These results concern short-term knowledge gains from a VR simulation designed as part of an annual refresher training program for employees. The duration of knowledge retention, its generalizability and transfer to other situations, and the effect of participant expertise (Chi et al., 2014) require further research.

The current study used an industrial VR application with university students as participants. One may argue that this is a limitation, as the employees who take part in the annual training would ideally serve as participants. There were two reasons for our decision, one theoretical and one practical. The theoretical reason is that, to compare our findings with those of other published studies, it is desirable not to vary both the VR application and the participant group at once. Because most published studies use university participants, we opted to keep this factor constant, so that these findings can pave the way for more in-depth studies. This brings us to the practical reason: it is harder to have employees participate in an experiment, and it is reasonable to ask them to do so only once prior findings have laid a foundation.

Our future research will involve actual employees. We are currently conducting a follow-up study with employees, which will allow us to compare their results with those of university students. Other lines of research will seek more insight into the dependent variables by adding physiological measures such as EEG and eye-tracking to the questionnaires, in order to compare offline measures with online measures.

In conclusion, our study shows that an industrial safety training simulation produces significant gains in knowledge and self-efficacy, in both VR and monitor viewing conditions. In addition, further analysis of the VR data replicated the finding that self-efficacy is the best predictor of learning, something that had not yet been shown in a real-world application designed and used by the industry. Apart from generalizing the importance of self-efficacy for learning to industrial applications, our findings provide evidence that people whose behavior in the simulation is congruent with doing the task well (asking fewer questions, making fewer errors, completing faster) learn more, and that this effect is in addition to what can be explained by self-efficacy. This novel result highlights the potential for behavioral markers to indicate learning in VR settings. Our study adds to the growing body of literature on the use of VR for industrial training and has practical implications for the design of VR training programs.

Data availability statement

The data that support the findings of this study are openly available in DataverseNL at https://doi.org/10.34894/T1VAKP.

Ethics statement

The studies involving humans were approved by the ethics committee of Tilburg University (REDC # 20201035). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

SM conceptualized and designed the experiments, performed the data collection and analysis, and wrote the manuscript. AH and MML contributed to the experimental design, suggested analysis plans, and supervised the analysis and writing process. WP contributed to the experimental design and supervised the writing process. All authors contributed to the article and approved the submitted version.

Funding

This research is part of the MasterMinds project, funded by the RegionDeal Mid- and West-Brabant, and co-funded by the Ministry of Economic Affairs and the Municipality of Tilburg, awarded to MML.

Acknowledgments

We would like to thank Actemium for their help in this research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alcañiz, M., Parra, E., and Chicchi Giglioli, I. A. (2018). Virtual reality as an emerging methodology for leadership assessment and training. Front. Psychol. 9, 1658. doi:10.3389/fpsyg.2018.01658


Bailenson, J., Patel, K., Nielsen, A., Bajcsy, R., Jung, S.-H., and Kurillo, G. (2008). The effect of interactivity on learning physical actions in virtual reality. Media Psychol. 11 (3), 354–376. doi:10.1080/15213260802285214