
SYSTEMATIC REVIEW article

Front. Psychol., 06 January 2026

Sec. Educational Psychology

Volume 16 - 2025 | https://doi.org/10.3389/fpsyg.2025.1714481

The effectiveness of virtual reality for K-12 foreign language learning: a systematic review of recent randomized controlled trials


Lu Sun1,2 and Xiacheng Song3*
  • 1Xi'an Fanyi University, Xi'an, China
  • 2Faculty of Social Sciences and Humanities, Universiti Kebangsaan Malaysia, Selangor, Malaysia
  • 3Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Selangor, Malaysia

Background: Despite the increasing adoption of immersive Virtual Reality (VR) in K-12 educational settings, there is a notable absence of systematic, high-quality experimental research evaluating its efficacy in facilitating foreign language acquisition.

Methods: Following a systematic search of five databases that yielded 1,054 records, six randomized controlled trials (RCTs) met the inclusion criteria. Because of considerable heterogeneity, a narrative synthesis was conducted following the Synthesis Without Meta-analysis (SWiM) guideline, with findings structured into a primary contrast (VR vs. non-VR) and a secondary analysis (VR vs. VR designs).

Results: The primary contrast analysis indicated that VR interventions generally had a positive effect compared to non-VR controls, particularly for vocabulary and listening. A notable finding was a consistent positive effect for VR in promoting long-term knowledge retention. The risk of bias evaluation indicated that each of the included studies was classified as presenting “some concerns”.

Conclusion: Across a small and heterogeneous set of recent RCTs, immersive VR shows promising effects, especially for long-term retention. However, the evidence for immediate learning gains is inconclusive. A more critical finding is the profound heterogeneity and methodological concerns within the evidence base, which preclude any single, overarching conclusion about VR's effectiveness.

Introduction

The emergence of virtual reality in language education

In recent years, immersive technologies, particularly Virtual Reality (VR), have transitioned from novel concepts to viable pedagogical tools within mainstream education (Lee and Wu, 2023; Tschanz and Baerlocher, 2022). VR affords interactive three-dimensional (3D) environments that foster presence and embodied interaction (Figueroa et al., 2024). These affordances are particularly salient for second/foreign-language (L2) learning, where aligning classroom practice with authentic use remains a central challenge (Jauregi-Ondarra et al., 2021; Kolesnichenko, 2023). Traditional approaches often provide limited opportunities for contextualized, interactive use of the target language (Kolesnichenko, 2023). VR addresses this constraint by situating learners in realistic, task-relevant contexts that demand meaningful communication, consistent with theories of situated and experiential learning (Lee and Wu, 2023). By providing culturally authentic settings and interactive manipulation of virtual entities or objects, VR can also strengthen learner motivation, reduce communication anxiety, and promote greater cognitive elaboration of linguistic input (Baidya et al., n.d.; Jauregi-Ondarra et al., 2021).

Why focus on K-12 L2 learning?

K-12 learners differ from adults in cognitive, socio-emotional, and motivational profiles, which may shape how immersive technologies influence attention, memory consolidation, and willingness to communicate (Jauregi-Ondarra et al., 2021). Schools also impose practical constraints (limited session length, classroom management, and teacher mediation) that can modulate VR's effectiveness relative to higher-education or informal learning settings (Cheng, 2012; Voordijk and Vahdatikhaki, 2020). A K-12–specific synthesis is therefore essential to inform curricular design, teacher training, and procurement decisions (Hite et al., 2019; Wilkinson et al., 2021).

Moreover, K-12 represents a critical developmental stage in which learners form foundational linguistic, cognitive, and socio-emotional skills (Arts et al., 2024; Pataquiva and Klimova, 2022; Weng et al., 2024). Early exposure to foreign-language learning has been shown to improve long-term proficiency, motivation, and intercultural competence. Therefore, understanding how immersive technologies support L2 learning at this stage is essential for educators and policymakers (Liu et al., 2023). This also underscores the timeliness and significance of conducting a systematic review specifically focused on K-12 L2 learning with immersive VR (Lee et al., 2023).

What counts as “immersive VR” in this review

To avoid conflating distinct media, we conceptualize immersive VR primarily as head-mounted display (HMD)–based environments that enable first-person perspective, head-tracked presence, and goal-directed interaction with virtual elements (Buetler et al., 2022; Dahl et al., 2021; Tan et al., 2022). While display configurations can vary, studies were considered within scope when learners experienced embodied interaction and spatial presence consistent with this definition. Non-interactive panoramic/360° video without head tracking is not treated as immersive VR in this review (cf. Lee and Wu, 2023). While our definition centers on HMD-based environments, we also included studies using hybrid systems where HMDs were essential for interaction and presence, such as the 360-degree projection system combined with HMDs in the study by Chang et al. (2024). The potential influence of this technological variation is addressed in the limitations section.

State of the evidence and the gap this review addresses

Empirical work on VR for L2 learning has proliferated, but the evidence remains fragmented: populations span grades and contexts; interventions differ in tasks, scaffolding, and exposure; outcomes vary across skills and timing (immediate vs. delayed); and reporting standards are uneven (Tai et al., 2022; Wu et al., 2021). Prior reviews often mixed higher-education with K-12 samples or combined immersive VR with non-immersive 3D/AR/360° media, limiting causal interpretability for school-age learners (Natale et al., 2020; Pellas et al., 2021). Moreover, delayed post-test outcomes critical for retention are inconsistently reported, and many studies have small samples or non-randomized designs that invite bias (Chen et al., 2023; Qiu et al., 2024). To provide policy- and practice-relevant guidance, there is a need for a focused, methodologically rigorous synthesis that (a) concentrates on K-12 learners, (b) uses a precise operationalization of immersive HMD-VR, (c) privileges randomized controlled trials (RCTs) to strengthen causal inference, and (d) distinguishes immediate from delayed outcomes across language domains.

Previous reviews of VR in education and language learning have generally taken a broad scope, often combining school-age learners with university students and mixing immersive HMD-based VR with less immersive 3D environments or 360° video, as well as non-randomized designs (Fransson et al., 2020; Wu et al., 2020). As a result, it is difficult to isolate what the highest-quality evidence suggests specifically for K-12 foreign language learning. By deliberately narrowing the focus to clearly defined immersive HMD-VR, randomized controlled trials, and school-aged learners, this review aims to offer a conservative but decision-relevant picture of what can currently be concluded from this emerging evidence base.

The present study and contributions

This review systematically synthesizes recent studies evaluating immersive HMD-VR for K-12 foreign language learning, with an emphasis on randomized controlled trials (e.g., Wu et al., 2020; see also Shen et al., 2023). It makes three contributions. First, it offers a K-12–specific account that isolates school-age evidence from adult and higher-education studies. Second, it applies a tight operationalization of immersive VR to avoid media conflation and to clarify what educators can expect from HMD-based interventions. Third, it disaggregates language outcomes by timing and domain, highlighting whether VR advantages are concentrated in immediate performance or long-term retention, and whether effects cluster in vocabulary, listening, or writing (Jwai'ed et al., 2024; Lai and Chen, 2023; Sahinler, 2023).

Accordingly, we examine the following research questions (RQs):

RQ1: Among recent RCTs focusing on K-12 learners, what is the effect of immersive VR, relative to non-VR control conditions, on L2 learning outcomes?

RQ2: How does this effect vary across distinct language domains (e.g., vocabulary, listening)?

RQ3: Does immersive VR improve delayed post-test performance, indicating stronger retention?

RQ4: How do different VR design features (e.g., level of immersion, pedagogical approach) compare in their effects on K-12 L2 learning outcomes?

Scope note. To maximize causal interpretability and align with school decision-making needs, this review focuses exclusively on RCTs (Alfadil, 2020; Chen et al., 2023; Qiu et al., 2024).

Methods

This systematic review protocol was preregistered with the Open Science Framework (OSF) and is available at https://osf.io/wdx4f.

A systematic search was conducted across five electronic databases: Web of Science, Scopus, IEEE Xplore, ACM Digital Library, and ERIC. The search was conducted on September 18, 2025, and aimed to identify all relevant studies. In addition, we screened the reference lists of recent systematic reviews on VR and language education and searched for related terms such as “spherical video–based VR” and “social VR” to minimize the risk of missing eligible RCTs that used alternative terminology. No language or publication date restrictions were initially applied. The search query combined keywords related to three core concepts: Virtual Reality (e.g., “virtual reality,” “VR,” “immersive”), the K-12 population (e.g., “K-12,” “elementary,” “high school,” “children,” “adolescent”), and foreign language learning (e.g., “language learning,” “second language,” “foreign language,” “L2”). Search terms within each concept were combined using the “OR” operator, and the three concepts were then combined using the “AND” operator.

To improve transparency and reproducibility, we adapted this generic search template to the syntax of each database. In Web of Science, we searched topic fields (TS) using the three concept blocks combined with Boolean operators, whereas in Scopus, ERIC, IEEE Xplore, and ACM Digital Library we searched titles, abstracts, and keywords using equivalent terms. All search strings were constructed in English, but records with non-English full texts were retained during screening when an English title or abstract was available.
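The Boolean structure described above (terms joined with OR inside each concept, concepts joined with AND) can be sketched as follows. This is only an illustration: the term lists are the examples quoted in the text rather than the complete search strings, and the helper names `or_block` and `build_query` are hypothetical. The `TS=` prefix corresponds to the Web of Science topic field mentioned above; the exact syntax used for the other databases is not reported here and is not reproduced.

```python
# Example terms quoted in the Methods text; the full per-database strings
# are not given in the article, so these lists are illustrative only.
CONCEPTS = {
    "vr": ["virtual reality", "VR", "immersive"],
    "population": ["K-12", "elementary", "high school", "children", "adolescent"],
    "language": ["language learning", "second language", "foreign language", "L2"],
}


def or_block(terms):
    """Join the terms of one concept with OR, quoting each term."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concepts, field=None):
    """AND the per-concept OR-blocks together, optionally inside a field tag."""
    body = " AND ".join(or_block(terms) for terms in concepts.values())
    return f"{field}=({body})" if field else body


generic_query = build_query(CONCEPTS)
wos_query = build_query(CONCEPTS, field="TS")  # Web of Science topic field
print(wos_query)
```

Keeping the concept blocks in one structure and generating each database string from it makes it easier to confirm that every database received equivalent terms.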

Inclusion and exclusion criteria

Studies were included in this review if they met the following criteria (Moher et al., n.d.):

• Population: Focused on K-12 students (approximately 5–18 years old).

• Intervention: Employed immersive Virtual Reality (VR) tools such as head-mounted displays as the central instructional approach for second or foreign language acquisition.

• Outcomes: Reported quantifiable language learning outcomes (e.g., vocabulary acquisition, listening comprehension, writing skills).

• Study design: To maximize causal interpretability and align with the review's objective of evaluating effectiveness, only randomized controlled trials (RCTs) were included. Quasi-experimental, non-randomized, and single-group pre-test/post-test designs were systematically excluded due to their higher inherent risk of bias.

Exclusion criteria included non-empirical studies (e.g., opinion papers, descriptive reports), studies targeting university students or adults, interventions using non-immersive technology (e.g., 360° videos without interactivity, mobile apps), and studies where K-12 student data could not be separated from other populations.

Study selection

The study selection process was conducted by two independent reviewers (Author X and Author Y). Titles and abstracts were screened first, followed by a full-text assessment against the inclusion criteria. Any disagreements were resolved through discussion or, if necessary, consultation with a third reviewer (Author Z). An overview of the screening procedure is presented in the PRISMA flow diagram (Figure 1; Page et al., 2021). The initial database query yielded 1,054 entries. After the exclusion of 930 records (176 duplicates and 754 items removed due to irrelevance or non-academic content), 124 records were screened on the basis of their titles and abstracts. This screening removed 108 articles, leaving 16 reports that were sought for full-text retrieval and assessed for eligibility against the inclusion criteria. Ten reports were excluded for the following reasons: the population was not K-12 (n = 2), the intervention did not meet eligibility criteria (n = 2), or the study design was insufficient (e.g., non-RCT, quasi-experimental, or lacked a control group) (n = 6). In total, six studies fulfilled all inclusion criteria and were incorporated into the final synthesis.
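As a quick consistency check, the counts reported in this paragraph can be chained together stage by stage. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 1054
duplicates = 176
irrelevant_or_non_academic = 754

removed_before_screening = duplicates + irrelevant_or_non_academic
screened = identified - removed_before_screening

excluded_at_screening = 108
full_text_assessed = screened - excluded_at_screening

# Reasons for full-text exclusion, with the reported n for each.
full_text_exclusions = {
    "population not K-12": 2,
    "intervention not eligible": 2,
    "insufficient study design": 6,
}
included = full_text_assessed - sum(full_text_exclusions.values())

print(removed_before_screening, screened, full_text_assessed, included)
```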

Figure 1
Flowchart depicting the identification and screening process of studies. Initial records identified from databases: one thousand fifty-four. After removal of nine hundred thirty records, mainly for unrelated topics, one hundred twenty-four records were screened. Sixteen reports were sought for retrieval, and six studies were included in the review. Exclusions included issues like unsuitable study design and population not K-12.

Figure 1. PRISMA flow diagram.

Data extraction

A structured data extraction form was used to collect relevant information from the 6 included studies (Page et al., 2021). The extracted data included: (1) study details (author, year), (2) study design, (3) participant demographics (sample size, age, grade level), (4) intervention details (VR technology, duration, activities), (5) primary outcome measures, and (6) key findings, including statistical results and effect sizes. One reviewer extracted the data, and a second reviewer cross-checked the extracted information for accuracy and completeness.

Risk of bias assessment

Methodological quality was independently appraised by two reviewers (Author A and Author B) employing validated tools tailored to the study design. The six randomized controlled trials underwent evaluation using the Cochrane Risk of Bias 2 (RoB 2) framework (Sterne et al., 2019). Any discrepancies between reviewers were addressed through deliberation until agreement was achieved. The detailed risk-of-bias outcomes are presented in the Results section.

Data synthesis

Due to substantial heterogeneity in intervention designs, participant characteristics, and outcome measures across the included studies, a meta-analysis was deemed inappropriate. Instead, a narrative synthesis was conducted following the Synthesis Without Meta-analysis (SWiM) guideline (Campbell et al., 2020). We grouped studies according to the type of comparison (primary contrast: VR vs. non-VR; secondary analysis: VR vs. VR) and learning domain. For each study, we narratively summarized the findings by reporting the direction of effect and the reported effect size (e.g., Cohen's d, partial eta squared) to describe the magnitude of the intervention's impact. We did not use vote-counting based on statistical significance to synthesize results.

Although several studies reported effect sizes, the small number of trials per outcome domain, inconsistent reporting of variance estimates, and differences in post-test timing and assessment instruments meant that any pooled quantitative estimate would have been statistically fragile and potentially misleading. Adopting the SWiM framework therefore allowed us to pre-specify grouping rules, summarize the direction and magnitude of effects in a transparent manner, and explicitly acknowledge heterogeneity without overstating the precision of the evidence.
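For readers less familiar with the effect-size metrics named above, the following sketch shows how a standardized mean difference (Cohen's d with a pooled standard deviation) is computed and how d and eta-squared values are commonly labelled. The thresholds are Cohen's conventional benchmarks, which is an assumption here, since the text does not state which benchmarks were applied; the input numbers in the example are hypothetical and do not come from any included trial.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def label_d(d):
    """Label |d| with Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


def label_eta_squared(eta2):
    """Label (partial) eta squared with conventional cut-offs (0.01, 0.06, 0.14)."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


# Hypothetical post-test scores for two groups of 30 learners each.
d_example = cohens_d(80, 10, 30, 75, 10, 30)
print(d_example, label_d(d_example))
```

Such labels are descriptive conventions only; they say nothing about statistical significance or about the precision of the estimate.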

Results

Characteristics of included studies

The key features of the six included studies are outlined in Table 1. All were published within the recent time frame of 2021–2025, reflecting a current emphasis in the field. Geographically, the studies were mainly concentrated in East Asia, with five originating from China (including four conducted in Taiwan region of China), one from South Korea, and another from the United States. Each of the selected studies adopted a rigorous experimental design, specifically a randomized controlled trial (RCT) methodology.

Table 1

Table 1. Characteristics of included studies.

Participants in these studies covered a wide K-12 age range, from lower elementary (Grades 2–3) to junior high (Grade 7) and high school (Grades 9–12). Sample sizes varied considerably, from 30 to 300 students.

The interventions featured a diverse array of VR technologies. Hardware included both mobile VR systems like the Oculus Go and Samsung Gear VR, as well as more powerful standalone or PC-tethered head-mounted displays (HMDs) such as the Meta Quest 2 and HTC Vive Pro 2. The learning content was delivered through various software, including commercial language learning platforms like “Mondly” and “Immerse”, and specific VR games or experiences like “Angels and Demigods”.

Notably, while one included study (Chang et al., 2024) utilized a 360-degree screen projection system, its core mechanism involved participants wearing VIVE Pro HMDs for motion tracking and using handheld controllers for direct interaction, thus aligning with our operational definition of immersive VR.

The primary focus of the interventions was on vocabulary acquisition and retention, which was the main outcome in four of the six studies. Other targeted language skills included listening comprehension and writing performance. One study also uniquely investigated the development of empathy alongside writing skills through a custom-designed empathetic VR approach.

Effects of virtual reality on language learning outcomes

To address our primary research questions regarding the effectiveness of VR against non-VR instructional methods, we first report on the primary contrast, which includes four trials that compared an immersive VR group to a non-VR control group (e.g., video-watching, PC-based games, traditional instruction). Following this, we present a secondary analysis of two trials that compared different types of VR interventions (e.g., high-immersion vs. low-immersion VR; empathetic vs. standard VR) to explore within-modality design effects.
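The grouping used in this synthesis (comparison type first, then language domain) can be sketched as a simple lookup. The study-to-group assignments below are read from the results reported in this section; the tuple layout and the function name `group_studies` are illustrative choices, not part of the review's method.

```python
from collections import defaultdict

# (study, comparison type, language domain) as described in this review.
STUDIES = [
    ("Tai and Chen (2021)", "VR vs. non-VR", "listening"),
    ("Tai et al. (2022)", "VR vs. non-VR", "vocabulary"),
    ("Lai and Chen (2023)", "VR vs. non-VR", "vocabulary"),
    ("Chang et al. (2024)", "VR vs. non-VR", "vocabulary"),
    ("Kaplan-Rakowski and Thrasher (2024)", "VR vs. VR", "vocabulary"),
    ("Guan et al. (2024)", "VR vs. VR", "writing"),
]


def group_studies(studies):
    """Group study names by (comparison type, language domain)."""
    groups = defaultdict(list)
    for name, comparison, domain in studies:
        groups[(comparison, domain)].append(name)
    return dict(groups)


groups = group_studies(STUDIES)
for (comparison, domain), names in sorted(groups.items()):
    print(f"{comparison} | {domain}: {', '.join(names)}")
```

Laid out this way, it is easy to see that most cells contain a single trial, which is one reason a pooled estimate was not attempted.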

Across the included studies, the direction of effect in the comparisons consistently favored the VR interventions over control conditions, although not all findings reached statistical significance. The findings for each targeted skill, summarized in Table 2, are detailed below.

Table 2

Table 2. Summary table: language skill outcomes.

In the domain of listening comprehension, the single primary contrast study (Tai and Chen, 2021) found that the VR group demonstrated a moderate advantage over the control group in immediate comprehension (Cohen's d = 0.50) and a large advantage in the retention of idea units (d = 1.15). These effect sizes suggest a practically meaningful benefit for the immersive VR intervention, particularly for long-term recall.

For writing performance, Guan et al. (2024) found that an empathetic VR approach led to significantly better overall writing scores compared to a standard VR approach (η2 = 0.170, indicating a large effect). The improvements were most pronounced in the qualitative dimensions of writing, such as “ideas and content,” “word choice,” and “voice,” suggesting that VR can enhance deeper aspects of writing proficiency.

Vocabulary acquisition and retention were the most frequently assessed outcomes. Across the four studies that assessed vocabulary, results for immediate learning gains were mixed: two studies reported statistically significant advantages for immersive VR over control conditions, while the other two found no significant difference. For instance, Chang et al. (2024) observed higher vocabulary scores in the VR group than in the traditional-instruction control (p < 0.01). Tai et al. (2022) likewise found that the VR group outperformed a video-viewing control on both immediate (η2 = 0.22, indicating a large effect) and delayed (η2 = 0.17, indicating a large effect) vocabulary tests. Lai and Chen (2023) further underscored VR's benefits for longer-term retention: scores were comparable on the immediate post-test, but the VR group exceeded the non-VR comparison (PC) group on a delayed translation test assessing productive vocabulary knowledge (p = 0.004). The fourth study (Kaplan-Rakowski and Thrasher, 2024) reported a less definitive outcome: no statistically significant difference emerged between high-immersion VR (HiVR) and low-immersion VR (LiVR) on the immediate tests. For long-term retention of receptive vocabulary, the analysis revealed a small-to-medium positive effect in favor of the HiVR group (η2 = 0.08); however, this result was not statistically significant (p = 0.06), so a true difference cannot be concluded with confidence from these data alone.

Across the targeted skills, the included trials consistently demonstrated a positive direction of effect for VR interventions over control conditions.

VR technology features and learning effectiveness

Beyond the generally positive direction of effect for VR, this review identified several technological features and pedagogical approaches that appear to underpin its effectiveness in K-12 language education. These features relate primarily to the level of immersion, interactivity, and the contextual integration of content.

A central feature of the interventions was the use of immersion. Studies employing high-immersion VR technologies consistently reported positive learning outcomes. The sense of presence and embodied interaction within these immersive environments were frequently cited as mechanisms driving these improvements (Ratcliffe and Tokarchuk, 2020). However, higher immersion is not always better. Kaplan-Rakowski and Thrasher (2024) provided a more nuanced perspective, finding no significant difference between high-immersion (HiVR) and low-immersion VR (LiVR) in immediate vocabulary gains, but suggesting a potential advantage for HiVR in long-term retention. This suggests that both levels of immersion can work, but that high immersion may offer particular benefits for memory consolidation and deeper cognitive processing.

Interactivity was a consistent correlate of stronger outcomes. Interventions that incorporated branched dialogue with virtual agents, manipulable objects, and real-time feedback tended to yield higher engagement and better learning performance (Parong and Mayer, 2018). This aligns with constructivist learning theories, which posit that learners build knowledge most effectively through active engagement and discovery, a process that highly interactive VR environments are well suited to facilitate. Several studies also combined gamification and collaborative tasks, further enhancing both engagement and learning outcomes.

Finally, effects were strongest when VR content was authentic and task-relevant. By situating learners in ecologically valid scenarios (e.g., navigating a police station or a shopping mall), VR rendered target-language use purposeful. Such designs typically leveraged multimodal input (visual, auditory, and kinesthetic), supporting embodied processing and deeper comprehension (Makransky and Petersen, 2021). Some interventions went beyond language skills, incorporating cultural content and perspective-taking activities designed to help students understand how other people feel, which broadened the scope of the learning experience.

Risk of bias

The methodological quality of the six included trials was assessed using the Cochrane Risk of Bias 2 (RoB 2) tool (Sterne et al., 2019). Overall, all six trials were rated as having “Some concerns” for bias. A summary of the risk of bias assessments across all studies is presented in Figure 2, with a detailed study-by-study breakdown provided in Figure 3.

Figure 2
Risk of bias table with seven studies and five domains: D1 to D5. Each domain is marked with symbols: yellow circle for “Some concerns” and green plus for “Low” risk. Overall judgment is also provided.

Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3
Bar chart illustrating risk of bias across five categories. “Bias arising from the randomization process” and “Bias in measurement of the outcome” show low risk (green). Other categories, such as “Bias due to deviations from intended interventions” and “Overall risk of bias,” indicate some concerns (yellow). Each bar's color represents the level of risk.

Figure 3. Risk of bias traffic light plot: review authors' judgements about each risk of bias item for each included study.

As shown in Figure 2, consistent concerns were identified across two primary domains: D1 (bias arising from the randomization process) and D5 (bias in selection of the reported result), where 100% of studies were rated as having “some concerns.” In contrast, the risk of bias was generally low for D3 (bias due to missing outcome data), where all studies were rated as low risk.

The “Some concerns” ratings in Domain 1 (D1) were primarily due to insufficient reporting on allocation concealment or the method of random sequence generation. Similarly, concerns in Domain 5 (D5) arose because most studies did not have a prospectively registered protocol, making it impossible to rule out the possibility of selective outcome reporting.
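How domain-level ratings roll up into an overall judgement can be illustrated with a simplified rule in which the overall rating equals the worst domain rating. This is a sketch, not the full RoB 2 procedure: the tool also allows an overall “High” rating when several domains raise some concerns, which requires reviewer judgement and is omitted here. In the example, the D1, D3, and D5 ratings follow the text above, while the D2 and D4 values are placeholders assumed for illustration.

```python
# Ratings ordered from best to worst.
LEVELS = ["Low", "Some concerns", "High"]


def overall_judgement(domains):
    """Return the worst domain-level rating as the overall judgement.

    Simplified rule: the reviewer-judgement escalation to "High" for
    multiple "Some concerns" domains is deliberately not modelled.
    """
    return max(domains.values(), key=LEVELS.index)


# D1, D3, D5 follow the pattern reported above; D2 and D4 are assumed.
example_trial = {
    "D1": "Some concerns",
    "D2": "Low",  # placeholder, not reported in the text
    "D3": "Low",
    "D4": "Low",  # placeholder, not reported in the text
    "D5": "Some concerns",
}
print(overall_judgement(example_trial))
```

Under this rule, a single domain rated “Some concerns” is enough to keep a trial from an overall “Low” rating, which is consistent with all six trials receiving the same overall judgement.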

Secondary analysis: comparing different VR designs

Two studies moved beyond the VR vs. non-VR paradigm to investigate the effects of different VR design features. Kaplan-Rakowski and Thrasher (2024) compared a high-immersion VR (HiVR) condition with a low-immersion VR (LiVR) condition for vocabulary learning. They found no significant difference in immediate tests, but reported a non-significant trend favoring the HiVR group for long-term retention of receptive vocabulary (p = 0.06). Similarly, Guan et al. (2024) compared an empathetic VR approach, in which students were placed in narrative, perspective-taking scenarios designed to evoke emotional engagement, with a standard VR approach that delivered the same writing task in a more neutral, task-focused environment. Their findings indicated that the empathetic VR design led to significantly higher overall writing scores, particularly in qualitative dimensions like “ideas and content” and “word choice”.

Discussion

Principal findings and interpretations

The principal finding from our primary contrast analysis, which synthesized four RCTs comparing immersive VR against non-VR conditions, is that VR demonstrates a promising, albeit inconsistent, advantage. Given that only six RCTs met the inclusion criteria and that they differ markedly in age groups, VR hardware, target skills, and outcome measures, these findings should be regarded as preliminary signals rather than firm, generalizable conclusions. The most consistent benefit appeared in long-term knowledge retention, particularly for vocabulary, where VR groups consistently outperformed controls in delayed post-tests (Lai and Chen, 2023; Tai et al., 2022). In contrast, the evidence for immediate learning gains was inconclusive, with half of the primary contrast studies showing a significant benefit and half showing no difference. For listening comprehension, the single primary contrast study found a moderate-to-large positive effect for VR (Tai and Chen, 2021).

Furthermore, the secondary analysis of two studies comparing different VR designs provides critical insights. The findings suggest that specific design features, such as the level of immersion or the integration of empathetic narratives, are key variables that can significantly influence learning outcomes in domains like vocabulary retention (Kaplan-Rakowski and Thrasher, 2024) and writing performance (Guan et al., 2024). This underscores that “VR” is not a monolithic treatment; its effectiveness is highly dependent on its specific design and pedagogical implementation.

The technological features of the interventions appear to be central to their success. Immersion and interactivity were consistently highlighted as critical components that foster student engagement and positive learning outcomes (Huang et al., 2021; Kaplan-Rakowski and Thrasher, 2025). However, the review also suggests that the relationship between the level of immersion and learning effectiveness is not linear. The finding that high-immersion VR did not significantly outperform low-immersion VR on immediate tests, but showed a potential advantage for long-term retention, is particularly salient (Cadet and Chainay, 2020; Kaplan-Rakowski and Thrasher, 2025). This indicates that while various levels of immersion can be effective, high immersion might offer unique benefits for long-term memory, a crucial area for future investigation (Kaplan-Rakowski and Thrasher, 2025; Xie et al., 2025).

However, it is crucial to interpret these positive trends with significant caution due to the extreme heterogeneity across the included studies. The wide range of participant ages, from lower elementary to high school, and the disparity in VR technology, from mobile VR to high-end PC-tethered systems, prevent a monolithic conclusion about VR's effectiveness. For instance, the significant effects observed in studies using high-fidelity VR with older students (Guan et al., 2024; Kaplan-Rakowski and Thrasher, 2024) may not be generalizable to contexts using simpler technology with younger children (Chang et al., 2024). Therefore, a key finding of this review is not simply that VR has potential, but that the current K-12 RCT evidence base is too fragmented to draw firm conclusions, highlighting the urgent need for future research to investigate these moderating variables.

This heterogeneity in participant age and technology likely acts as a significant moderating variable (Hite et al., 2019). For instance, the pedagogical design for younger elementary students may require more structured guidance and gamified elements to maintain engagement, whereas high school students might benefit more from complex, open-ended exploratory environments. The cognitive load imposed by different VR systems could also explain varied outcomes (Makransky and Petersen, 2021; Parong and Mayer, 2018).

High-end, tethered VR systems offer greater immersion and interactivity, which may enhance learning and retention, but could also overwhelm younger learners (Huang et al., 2021). Conversely, simpler mobile VR systems are more accessible but may not provide the same level of presence needed to foster deep learning (Wilkinson et al., 2021). Future research should therefore not only compare VR to non-VR conditions but also conduct head-to-head comparisons of different VR designs and technologies to isolate these influential factors.

Strengths and limitations of the review

This review possesses several methodological advantages, such as an extensive literature search conducted across five major academic databases, alignment with PRISMA reporting standards, and the application of established tools (RoB 2) for assessing potential bias. However, a number of limitations should also be considered. First, the evidence base consists of only six RCTs that are highly heterogeneous in terms of participant age, intervention content, VR hardware, and outcome measures. This small and varied corpus precluded a meaningful meta-analysis and prevented formal investigation of moderators or publication bias, which further constrains the generalizability of our conclusions. Specifically, the inclusion of one study using a 360-degree projection system alongside HMDs, while justified by its interactive nature, introduces technological heterogeneity that may limit the generalizability of the pooled findings. A sensitivity analysis was considered to assess the stability of our findings. Given the narrative nature of this review, this analysis remained qualitative. If the study by Chang et al. (2024) were to be excluded, the overall conclusion of this review would not substantially change. The evidence for vocabulary and listening would remain mixed, and the most consistent finding would still be the potential benefit of VR for long-term retention, as supported by other included studies. However, the removal would increase the technological homogeneity of the evidence base, thereby strengthening the internal validity of our synthesis regarding HMD-based VR. Second, all six studies were rated as having “Some concerns” with respect to bias, particularly in domains associated with randomization procedures and selective reporting of results. As a result, the findings should be interpreted with appropriate caution. 
Third, the geographic concentration of the studies in East Asia may constrain the extent to which these findings are applicable to other educational and cultural settings. More specifically, the school systems represented in these East Asian contexts often differ from those in other regions in terms of curriculum structure, examination pressure, and access to immersive technologies. As such, the positive effects observed in classrooms in the Chinese mainland, the Taiwan region of China, and South Korea should be interpreted as most securely applicable to East-Asian-style K−12 systems, rather than being generalized to all global contexts. In addition, none of the included RCTs evaluated speaking or oral production as a primary outcome, even though immersive VR is theoretically well suited to support embodied, interactive communication. This omission means that our synthesis can say little about perhaps the most promising skill domain, L2 speaking, and underscores the need for future trials that incorporate rigorous, VR-specific speaking measures. Implications for practice and avenues for future research are discussed accordingly.

The protocol for this review was preregistered with the Open Science Framework. We report one main deviation from the registered protocol: the data synthesis approach was updated from a planned vote-counting method to a narrative synthesis following the SWiM guideline. This change was made post-hoc to adopt a more rigorous and informative synthesis methodology. Consequently, the structuring of the results into a primary and secondary analysis also represents a deviation aimed at improving clarity.

Implications for practice and future research

The results of this review have implications for both practice and research. For practitioners, it provides RCT-based evidence to guide the integration of VR into K-12 language curricula, emphasizing the development of interactive, contextually rich experiences that enhance long-term knowledge retention (Essoe et al., 2022; Ng et al., 2023; Xie et al., 2025). For researchers, it highlights the need for more methodologically rigorous RCTs with low risk of bias, larger samples, and broader geographic coverage. Future work should include head-to-head comparisons of VR design features (e.g., immersion level, interactivity type) and extend evaluation to understudied skills such as speaking and pragmatic competence. Longer-term follow-up is also essential to test the durability of effects and substantiate VR's benefits for retention. Notably, none of the included RCTs measured oral production or speaking skills as a primary outcome; this significant gap in the current high-quality evidence base should be a priority for future research.

Conclusion

Across a small and heterogeneous set of recent RCTs, immersive VR shows promising effects, especially for long-term retention. However, the evidence for immediate learning gains is inconclusive and varies by domain. A more critical finding is the profound heterogeneity and methodological concerns (all included studies rated as having “some concerns” for bias) within the current evidence base, which preclude any single, overarching conclusion about VR's effectiveness. The significant variation in participant age, intervention design, and technology type underscores that the effectiveness of VR is likely context-dependent (Dhimolea et al., 2022; Dooly et al., 2023; Frolli et al., 2024; Li, 2023). Across the included trials, VR interventions tended to yield better outcomes than control conditions for vocabulary acquisition, listening comprehension, and writing proficiency (Frolli et al., 2024; Li et al., 2021). The most consistently reported advantage of VR is improved long-term retention; its immersive and interactive properties likely support deeper encoding and more durable learning (Pellas et al., 2021). Therefore, while VR holds considerable potential to transform language education, the evidence base remains nascent and constrained by methodological limitations, and further high-quality research is necessary to substantiate these findings and develop clear guidelines for its optimal implementation in diverse educational settings.

Crucially, because all six included trials were assessed as having “some concerns” regarding risk of bias, these encouraging findings must be viewed as preliminary and require validation through more methodologically robust research.

Data availability statement

Publicly available datasets were analyzed in this study. The pre-registered protocol for this systematic review is available on the Open Science Framework (OSF) at the following repository: https://osf.io/kp78r/?view_only=7d2c8fe674d9480ca6bc45d6fe751fd5. The detailed data extraction sheets, full Risk of Bias 2 (RoB 2) assessment forms, and the PRISMA checklist are not publicly archived but are available from the corresponding author upon reasonable request.

Author contributions

LS: Project administration, Writing – review & editing, Funding acquisition, Writing – original draft, Software, Methodology, Data curation, Validation, Investigation. XS: Software, Methodology, Writing – original draft, Data curation, Conceptualization, Investigation, Validation, Resources, Formal analysis, Visualization, Project administration, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was used in the creation of this manuscript. During the manuscript preparation process, the author(s) employed the Gemini large language model (Google) to assist with linguistic refinement, including grammar, clarity, and readability.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alfadil, M. (2020). Effectiveness of virtual reality game in foreign language vocabulary acquisition. Comput. Educ. 153:103893. doi: 10.1016/j.compedu.2020.103893

Arts, E., De Castro, B. O., Luteijn, E., Elsendoorn, B., and Vissers, C. T. (2024). Interactive virtual reality training to improve socio-emotional functioning in adolescents with developmental language disorders: a feasibility study. Clin. Child Psychol. Psych. 29, 1100–1120. doi: 10.1177/13591045231220694

Baidya, S., Ghosh, P., Mukherjee, A., Bhattacharjee, K., and Das, A. (n.d.). A comprehensive study to build immersive virtual reality-powered language learning. Available online at: https://consensus.app/papers/a-comprehensive-study-to-build-immersive-virtual-bhattacharjee-ghosh/02653dd1f4ac507c95bd346dbf260782/ (Accessed September 22, 2025).

Buetler, K. A., Penalver-Andres, J., Özen, Ö., Ferriroli, L., Müri, R. M., Cazzoli, D., et al. (2022). “Tricking the brain” using immersive virtual reality: modifying the self-perception over embodied avatar influences motor cortical excitability and action initiation. Front. Hum. Neurosci. 15. doi: 10.3389/fnhum.2021.787487

Cadet, L. B., and Chainay, H. (2020). Memory of virtual experiences: role of immersion, emotion and sense of presence. Int. J. Hum-Comp. Stud. 144:102506. doi: 10.1016/j.ijhcs.2020.102506

Campbell, M., McKenzie, J. E., Sowden, A., Katikireddi, S. V., Brennan, S. E., Ellis, S., et al. (2020). Synthesis without meta-analysis (SWiM) in systematic reviews: Reporting guideline. BMJ. 368:l6890. doi: 10.1136/bmj.l6890

Chang, H., Park, J., and Suh, J. (2024). Virtual reality as a pedagogical tool: An experimental study of English learner in lower elementary grades. Educ. Infor. Technol. 29, 4809–4842. doi: 10.1007/s10639-023-11988-y

Chen, J., Fu, Z., Liu, H., and Wang, J. (2023). Effectiveness of virtual reality on learning engagement: A meta-analysis. Int. J. Web-Based Learn. Teach. Technol. 19, 1–14. doi: 10.4018/IJWLTT.334849

Cheng, X. (2012). A survey of attitudes toward mediation among Chinese high school EFL teachers and their classroom constraints. J. Lang. Teach. Res. 3, 477–483. doi: 10.4304/jltr.3.3.477-483

Dahl, T. L., Storlykken, O., and Røssehaug, B. H. (2021). “Exploring perspective switching in immersive VR for learning first aid in lower secondary education,” in Virtual, Augmented and Mixed Reality, eds. J. Y. C. Chen and G. Fragomeni (Springer International Publishing), 301–316. doi: 10.1007/978-3-030-77599-5_22

Dhimolea, T. K., Kaplan-Rakowski, R., and Lin, L. (2022). A systematic review of research on high-immersion virtual reality for language learning. TechTrends 66, 810–824. doi: 10.1007/s11528-022-00717-w

Dooly, M., Thrasher, T., and Sadler, R. (2023). “Whoa! Incredible!:” language learning experiences in virtual reality. RELC J. 54, 321–339. doi: 10.1177/00336882231167610

Essoe, J. K.-Y., Reggente, N., Ohno, A. A., Baek, Y. H., Dell'Italia, J., and Rissman, J. (2022). Enhancing learning and retention with distinctive virtual reality environments and mental context reinstatement. NPJ Sci. Learn. 7:31. doi: 10.1038/s41539-022-00147-6

Figueroa, R. J., Jung, I., Palma Gil, F., Taniguchi, H., and Perez, J. (2024). Utilizing virtual reality tours in language learning. ASCILITE Publications, 24–25. doi: 10.14742/apubs.2024.1178

Fransson, G., Holmberg, J., and Westelius, C. (2020). The challenges of using head mounted virtual reality in K-12 schools from a teacher perspective. Educ. Infor. Technol. 25, 3383–3404. doi: 10.1007/s10639-020-10119-1

Frolli, A., Esposito, C., Laccone, R. P., and Cerciello, F. (2024). “English language learning in primary school children using immersive virtual reality,” in Learning and Collaboration Technologies, eds. P. Zaphiris and A. Ioannou (Springer Nature Switzerland), 78–88. doi: 10.1007/978-3-031-61691-4_6

Guan, J.-Q., Ying, S.-F., Zhang, M.-L., and Hwang, G.-J. (2024). From experience to empathy: An empathetic VR-based learning approach to improving EFL learners' empathy and writing performance. Comput. Educ. 220:105120. doi: 10.1016/j.compedu.2024.105120

Hite, R. L., Jones, M. G., Childers, G. M., Ennes, M., Chesnutt, K., Pereyra, M., et al. (2019). Investigating potential relationships between adolescents' cognitive development and perceptions of presence in 3-D, haptic-enabled, virtual reality science instruction. J. Sci. Educ. Technol. 28, 265–284. doi: 10.1007/s10956-018-9764-y

Huang, W., Roscoe, R. D., Johnson-Glenberg, M. C., and Craig, S. D. (2021). Motivation, engagement, and performance across multiple virtual reality sessions and levels of immersion. J. Comp. Assist. Learn. 37, 745–758. doi: 10.1111/jcal.12520

Jauregi-Ondarra, K., Gruber, A., and Canto, S. (2021). “Pedagogical experiences in a virtual exchange project using high-immersion virtual reality for intercultural language learning,” in CALL and professionalisation: Short papers from EUROCALL 2021, eds. N. Zoghlami, C. Brudermann, C. Sarré, M. Grosbois, L. Bradley, and S. Thouësny (Research-publishing.net), 155–160. doi: 10.14705/rpnet.2021.54.1325

Jwai'ed, A. M., Masri, A. A., Hijazi, D., and Smadi, M. (2024). Utilizing virtual reality (VR) and augmented reality (AR) technologies in EFL classrooms: A novel approach to improve vocabulary learning and retention. J. Ecohuman. 7, 22–33. doi: 10.62754/joe.v3i7.4171

Kaplan-Rakowski, R., and Thrasher, T. (2024). The impact of high-immersion virtual reality and interactivity on vocabulary learning. SSRN Electron. J. doi: 10.2139/ssrn.4850163

Kaplan-Rakowski, R., and Thrasher, T. (2025). The impact of high-immersion virtual reality and interactivity on vocabulary learning. Br. J. Educ. Technol. n/a. doi: 10.2139/ssrn.4850163

Kolesnichenko, A. N. (2023). The possibilities of using virtual reality technology in teaching a foreign language. Samara J. Sci. 12, 266–270. doi: 10.55355/snv2023122311

Lai, K.-W. K., and Chen, H.-J. H. (2023). A comparative study on the effects of a VR and PC visual novel game on vocabulary learning. CALL. 36, 312–345. doi: 10.1080/09588221.2021.1928226

Lee, S.-M., and Wu, J. (2023). Teaching with immersive virtual reality. Int. J. Comp. Ass. Lang. Learn. Teach. 13.

Lee, T.-Y., Ho, Y.-C., and Chen, C.-H. (2023). Integrating intercultural communicative competence into an online EFL classroom: an empirical study of a secondary school in Thailand. Asian-Pac. J. Sec. Fore. Lang. Educ. 8:4. doi: 10.1186/s40862-022-00174-1

Li, H. (2023). The effect of VR on learners' engagement and motivation in K12 English education. J. Educ. Human. Soc. Sci. 22, 82–89. doi: 10.54097/ehss.v22i.12291

Li, M., Pan, Z., Sun, Y., and Yao, Z. (2021). Virtual reality in foreign language learning: a review of the literature. 2021 IEEE 7th International Conference on Virtual Reality (ICVR), 302–307. doi: 10.1109/ICVR51878.2021.9483842

Liu, S., Gao, S., and Ji, X. (2023). Beyond borders: Exploring the impact of augmented reality on intercultural competence and L2 learning motivation in EFL learners. Front. Psychol. 14. doi: 10.3389/fpsyg.2023.1234905

Makransky, G., and Petersen, G. B. (2021). The cognitive affective model of immersive learning (CAMIL): a theoretical research-based model of learning in immersive virtual reality. Educ. Psychol. Rev. 33, 937–958. doi: 10.1007/s10648-020-09586-2

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., and The PRISMA Group (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6:e1000097. doi: 10.1371/journal.pmed.1000097

Natale, A. F. D., Repetto, C., Riva, G., and Villani, D. (2020). Immersive virtual reality in K-12 and higher education: a 10-year systematic review of empirical research. Br. J. Educ. Technol. 51, 2006–2033. doi: 10.1111/bjet.13030

Ng, D. T. K., Ng, R. C. W., and Chu, S. K. W. (2023). Engaging students in virtual tours to learn language and digital literacy. J. Comp. Educ. 10, 575–602. doi: 10.1007/s40692-023-00262-2

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., et al. (2021). The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 372:n71. doi: 10.1136/bmj.n71

Parong, J., and Mayer, R. E. (2018). Learning science in immersive virtual reality. J. Educ. Psychol. 110, 785–797. doi: 10.1037/edu0000241

Pataquiva, F. de P. F., and Klimova, B. (2022). A systematic review of virtual reality in the acquisition of second language. Int. J. Emerg. Technol. Learn. 17, 43–53. doi: 10.3991/ijet.v17i15.31781

Pellas, N., Mystakidis, S., and Kazanidis, I. (2021). Immersive virtual reality in K-12 and higher education: A systematic review of the last decade scientific literature. Virt. Real. 25, 835–861. doi: 10.1007/s10055-020-00489-9

Qiu, X. B., Shan, C., Yao, J., and Fu, Q. K. (2024). The effects of virtual reality on EFL learning: A meta-analysis. Educ. Infor. Technol. 29, 1379–1405. doi: 10.1007/s10639-023-11738-0

Ratcliffe, J., and Tokarchuk, L. (2020). Presence, embodied interaction and motivation: Distinct learning phenomena in an immersive virtual environment. Proceedings of the 28th ACM International Conference on Multimedia, 3661–3668. doi: 10.1145/3394171.3413520

Sahinler, M. (2023). Gamifying vocabulary acquisition and retention in virtual reality. Teach. Eng. Technol. doi: 10.56297/BKAM1691/DFXC4759

Shen, Y., Zhou, D., and Wang, Y. (2023). Outcomes of VR, AR and MR technologies in K-12 language education: A review. Int. J. Learn. Teach. 9. doi: 10.18178/ijlt.9.3.272-278

Sterne, J. A. C., Savović, J., Page, M. J., Elbers, R. G., Blencowe, N. S., Boutron, I., et al. (2019). RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ. 366:l4898. doi: 10.1136/bmj.l4898

Tai, T.-Y., and Chen, H. H.-J. (2021). The impact of immersive virtual reality on EFL learners' listening comprehension. J. Educ. Comp. Res. 59, 1272–1293. doi: 10.1177/0735633121994291

Tai, T.-Y., Chen, H. H.-J., and Todd, G. (2022). The impact of a virtual reality app on adolescent EFL learners' vocabulary learning. CALL. 35, 892–917. doi: 10.1080/09588221.2020.1752735

Tan, M. C. C., Chye, S. Y. L., and Teng, K. S. M. (2022). “In the shoes of another”: immersive technology for social and emotional learning. Educ. Infor. Technol. 27, 8165–8188. doi: 10.1007/s10639-022-10938-4

Tschanz, N., and Baerlocher, B. (2022). Virtual reality in language teaching. doi: 10.21240/mpaed/47/2022.04.14.X

Voordijk, H., and Vahdatikhaki, F. (2020). Virtual reality learning environments and technological mediation in construction practice. Eur. J. Eng. Educ. 47, 259–273. doi: 10.1080/03043797.2020.1795085

Weng, Y., Schmidt, M., Huang, W., and Hao, Y. (2024). The effectiveness of immersive learning technologies in K−12 English as second language learning: A systematic review. ReCALL. 36, 210–229. doi: 10.1017/S0958344024000041

Wilkinson, M., Brantley, S., and Feng, J. (2021). A mini review of presence and immersion in virtual reality. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 65, 1099–1103. doi: 10.1177/1071181321651148

Wu, B., Yu, X., and Gu, X. (2020). Effectiveness of immersive virtual reality using head-mounted displays on learning performance: A meta-analysis. Br. J. Educ. Technol. 51, 1991–2005. doi: 10.1111/bjet.13023

Wu, J. G., Miller, L., Huang, Q., and Wang, M. (2021). Learning with immersive virtual reality: An exploratory study of Chinese college nursing students. RELC J. 54, 697–713. doi: 10.1177/00336882211044860

Xie, T., Zhang, H., and Yang, Y. (2025). Effect of immersive virtual reality based upon input processing model for second language vocabulary retention. Educ. Infor. Technol. 30, 12365–12385. doi: 10.1007/s10639-025-13333-x

Keywords: educational technology, foreign language learning, K-12 education, systematic review, virtual reality

Citation: Sun L and Song X (2026) The effectiveness of virtual reality for K-12 foreign language learning: a systematic review of recent randomized controlled trials. Front. Psychol. 16:1714481. doi: 10.3389/fpsyg.2025.1714481

Received: 27 September 2025; Revised: 27 November 2025;
Accepted: 30 November 2025; Published: 06 January 2026.

Edited by:

Daniel H. Robinson, The University of Texas at Arlington College of Education, United States

Reviewed by:

Bruno Peixoto, University of Trás-os-Montes and Alto Douro, Portugal
Yilong Pu, Central China Normal University, China

Copyright © 2026 Sun and Song. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xiacheng Song, c2lua3NvbmdsZW9AMTYzLmNvbQ==
