Analyzing and comparing augmented reality and virtual reality assisted vocabulary learning: a systematic review

Zhang, Mike Minwen; Hashim, Harwati; Yunus, Melor Md

doi:10.3389/frvir.2025.1522380

SYSTEMATIC REVIEW article

Front. Virtual Real., 12 May 2025

Sec. Virtual Reality and Human Behaviour

Volume 6 - 2025 | https://doi.org/10.3389/frvir.2025.1522380

This article is part of the Research Topic: Breaking Language Barriers with XR: Enhancing Foreign Language Education

Mike Minwen Zhang

Harwati Hashim*

Melor Md Yunus

Faculty of Education, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia

Introduction: The integration of augmented reality (AR) and virtual reality (VR) into language learning, particularly vocabulary learning, has been spotlit by a growing body of research in recent years. However, there is a notable lack of comprehensive reviews analyzing the latest research on AR-assisted vocabulary learning (ARVL) and VR-assisted vocabulary learning (VRVL), especially the ones systematically comparing the vocabulary learning (VL) processes as well as outcomes, including the effectiveness of vocabulary gain and retention within the AR/VR learning environments.

Methods: To fill this research gap, a total of 37 empirical studies from the last five years (2020-2024) were meticulously selected from the domains of ARVL and VRVL and then analyzed and compared across five dimensions: main characteristics, VL process, VL effectiveness, their main benefits and limitations.

Results: Key findings reveal that while VRVL studies employed head-mounted displays (HMD) more frequently, ARVL studies greatly outnumbered their VRVL counterparts, with a predominant academic interest in non-wearable AR. Higher education was the main focus of the research, with elementary education coming in second. Although most VRVL studies took a more thorough approach, ARVL research mostly concentrated on basic vocabulary knowledge. By incorporating auditory effects, a sizable portion of VRVL studies improved the multimedia VL experience. The intentional learning approach was preferred over the incidental approach in both fields, while incidental learning was slightly more common in VRVL studies. Most studies on ARVL and VRVL indicate that learners in AR/VR-supported environments achieved significantly greater vocabulary gains than those using traditional methods. Additionally, the dual measurements of vocabulary gain and retention in VRVL studies have been examined more rigorously, and the impact of VRVL has been found to be potentially more effective than that of ARVL.

Suggestions: Further investigation is needed into wearable devices in ARVL studies, VRVL in elementary and preschool settings, ARVL for junior high learners, and vocabulary retention in both ARVL and VRVL contexts. It is advisable to conduct comparative empirical studies and meta-analyses regarding the effectiveness of VL in these two learning modes. Future research could benefit from integrating ARVL and VRVL to create a synergistic approach that further supports vocabulary learning.

1 Introduction

Spatial, immersive computing technologies like Virtual Reality (VR) and Augmented Reality (AR) have become the driving force behind the fourth wave of technological revolution in recent decades due to the rapid advancement of information technology (Mystakidis, 2022; Parmaxi, 2020; Dhimolea et al., 2022). Recently, in an effort to advance vocabulary education research, scholars have started to explore the possibilities of integrating cutting-edge technologies into vocabulary learning (Legault et al., 2019; Tsai, 2018). The studies on vocabulary learning have been heavily influenced by emerging technologies, including AR and VR.

AR can be described as a technology that combines the physical world with digital content, providing an immersive experience of a real-world environment. AR integrates digital elements with the user’s environment in real time through various electronic devices, for example, head-mounted displays (HMDs), holographic displays, smart glasses, and handheld devices such as smartphones. By contrast, virtual reality (VR) is defined as the integration of technological elements and software designed to produce a fully immersive experience that simulates physical presence in an alternate virtual environment (Li and Cesar, 2023). VR technology emphasizes the sensation of presence within a computer-generated three-dimensional image or environment (Pinto et al., 2021). Although there are ongoing debates regarding the definition and categorization of VR (Motejlek and Alpay, 2021), VR can generally be divided into two primary categories: low-immersion VR (LiVR) and high-immersion VR (HiVR) (Gruber and Kaplan-Rakowski, 2019). The connection and difference between AR and VR can be illustrated by the Reality-Virtuality Continuum of Skarbez et al. (2021).

Both AR and VR have begun to be widely utilized in language education, and the volume of research on AR-assisted vocabulary learning (ARVL) and VR-assisted vocabulary learning (VRVL) has risen sharply in recent years. ARVL and VRVL foster interactive virtual settings that involve language learners in a genuine, immersive, and engaging vocabulary learning environment (Haoming and Wei, 2024; Schorr et al., 2024). In addition, ARVL and VRVL have the potential to facilitate the lexical development needed by language learners (Tsai, 2018; Legault et al., 2019; Alfadil, 2020).

There is a broad consensus that learning vocabulary plays a crucial role in the process of second language learning (Nation, 1990; Cameron, 2001; Fehr et al., 2012), and the lexicon is one of the most essential language building blocks for EFL learners (Yamamoto, 2013). Vocabulary constitutes a critical element of language acquisition and markedly affects learners’ proficiency across different age demographics (Afzal, 2019; Ng and Rosli, 2023). Proficiency in vocabulary is essential in second language acquisition (SLA) as it significantly contributes to understanding how learners comprehend and produce a second language, particularly through the exploration of cognitive aspects of lexical acquisition (Crossley et al., 2009; Schmitt, 2000; Webb, 2005). Without a diverse range of vocabulary to convey various meanings, effective communication in a foreign or second language remains unattainable, regardless of a language learner’s mastery of grammar and pronunciation (McCarthy, 1990; Folse, 2004). Some evidence suggests that learners who possess an extensive vocabulary exhibit greater proficiency in various language skills than those with a limited vocabulary (Meara, 1996; Richards and Renandya, 2002).

In vocabulary learning, two terms, vocabulary gain and vocabulary retention, are often viewed as the core indicators of VL effectiveness. Vocabulary gain, also known as vocabulary learning (Chen and Yuan, 2023; Yangın Ersanlı, 2023) or vocabulary acquisition (Alfadil, 2020; Sahinler, 2023), can be defined as the short-term memory retrieval of vocabulary knowledge. It is measured by subtracting the pretest score from the immediate vocabulary post-test score (Elekaei et al., 2015; Azari et al., 2012) and refers to the increase in the number of words and expressions that an individual acquires and incorporates into their active vocabulary immediately after learning. It is thus a measure of language development that reflects the short-term growth of a person’s lexical knowledge. However, vocabulary acquisition and vocabulary learning also commonly refer to the overall process of learning vocabulary, which can cause conceptual confusion. Thus, this review uses “vocabulary gain” rather than “vocabulary acquisition” or “vocabulary learning” to refer to immediate VL effectiveness.

Vocabulary retention was defined by Richards and Schmidt (2002) as “the ability to recall or remember things after an interval of time” or long-term memory retrieval of vocabulary knowledge. Mohammed (2009) defines vocabulary retention as “the ability to retain the acquired vocabulary and retrieve it after a period of time following the learning intervention.” Vocabulary retention is measured by subtracting the pretest score from the delayed vocabulary post-test score (Elekaei et al., 2020; Azari et al., 2012). Vocabulary retention is often viewed as an intricate cognitive process of memory encompassing memorization or learning, recall, and recognition (Suleiman, 2009). Suleiman (2009) further explains that certain preparatory processes include the initial encoding and storage of information in short-term memory, followed by its eventual consolidation into long-term memory.
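The two difference scores defined above can be written out as a minimal sketch. The function names and the example scores are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
def vocabulary_gain(pretest: float, immediate_posttest: float) -> float:
    """Short-term effect: immediate post-test score minus pretest score."""
    return immediate_posttest - pretest


def vocabulary_retention(pretest: float, delayed_posttest: float) -> float:
    """Long-term effect: delayed post-test score minus pretest score."""
    return delayed_posttest - pretest


# Hypothetical learner scores on a 30-item test (illustrative values only).
pretest, immediate_posttest, delayed_posttest = 8, 21, 17

gain = vocabulary_gain(pretest, immediate_posttest)
retention = vocabulary_retention(pretest, delayed_posttest)
print(f"gain = {gain}, retention = {retention}")
```

Both indicators share the same pretest baseline, so a study that administers only a pretest and an immediate post-test can report gain but not retention.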

Although the application of AR and VR for vocabulary learning has garnered the interest of researchers in recent years, systematic reviews on ARVL and VRVL remain scarce. Haoming and Wei (2024) systematically reviewed ARVL and VRVL in gamified learning contexts, but this narrow focus limited the generalizability of their findings. Additionally, elements such as VL presentation, vocabulary gain, and vocabulary retention were not examined. More importantly, there is an absence of systematic reviews that combine ARVL and VRVL studies to compare and analyze their similarities and differences across various dimensions, particularly in VL processes and effectiveness, including vocabulary gain and retention.

Thus, in the present systematic review, the following four questions will be answered:

1. What are the main characteristics of the reviewed studies in terms of publication year, target languages, applied technologies, target audience, and sample size?

2. What kinds of vocabulary knowledge learned, VL presentations, and VL approaches are identified?

3. How are two kinds of VL effectiveness (vocabulary gain and retention) measured in different studies, and how effective are VRVL and ARVL in improving vocabulary gain and vocabulary retention among learners?

4. What are the main benefits and limitations in implementing ARVL and VRVL?

2 Methodology

Following the recommendations in the most recent PRISMA framework, a systematic review of ARVL and VRVL was carried out. Numerous research studies have verified this paradigm, which provides an organized and transparent method for conducting systematic reviews and meta-analyses (Page et al., 2021).

2.1 Literature selection

In order to obtain sufficient high-quality literature on ARVL and VRVL, three databases, namely, Scopus, Web of Science Core Collection, and Google Scholar, were utilized in this study. The initial search, conducted on 26 September 2024, targeted the following keywords in study titles: “Augmented Reality,” “AR,” “Virtual Reality,” “VR,” “vocabulary,” “vocabulary learning,” “vocabulary acquisition,” and “vocabulary retention.” In the Scopus search engine, “Article Title” was chosen in the Search within box, and the command “AR” OR “augmented reality” OR “VR” OR “virtual reality” was entered in the Search documents box. Another row was then added, with “AND” as the linking operator and “Article Title” again as the search field, and the command “vocabulary” OR “vocabulary learning” OR “vocabulary acquisition” OR “vocabulary retention” was entered. In Web of Science Core Collection, the “Title” filter was used with the query: (“AR” OR “augmented reality” OR “VR” OR “virtual reality”) AND (“vocabulary” OR “vocabulary learning” OR “vocabulary acquisition” OR “vocabulary retention”). In Google Scholar, the following command was entered: allintitle: (“AR” OR “augmented reality” OR “VR” OR “virtual reality”) + (“vocabulary” OR “vocabulary learning” OR “vocabulary acquisition” OR “vocabulary retention”). The searches yielded a total of 388 results, which were subsequently screened based on the eligibility criteria listed below.
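The title-only search strings described above share one structure: a group of technology terms combined with a group of vocabulary terms. The following sketch shows how such strings could be assembled programmatically. It is an illustration only; the searches in this review were entered manually in each database's web interface, and the helper names `or_group` and `title_query` are hypothetical.

```python
TECH_TERMS = ["AR", "augmented reality", "VR", "virtual reality"]
VOCAB_TERMS = [
    "vocabulary",
    "vocabulary learning",
    "vocabulary acquisition",
    "vocabulary retention",
]


def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def title_query(joiner: str = " AND ") -> str:
    """Combine the technology group and the vocabulary group."""
    return or_group(TECH_TERMS) + joiner + or_group(VOCAB_TERMS)


# Query text as entered with the "Title" filter in Web of Science.
wos_query = title_query()

# Google Scholar variant: allintitle prefix and "+" between the two groups.
scholar_query = "allintitle: " + title_query(joiner=" + ")

print(wos_query)
print(scholar_query)
```

Keeping the two term lists in one place makes it easier to confirm that every database received the same keywords.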

2.2 Inclusion criteria

1. The paper was published between January 2020 and September 2024.

2. The paper is peer-reviewed and indexed in SCI/SSCI/Scopus/WoS.

3. The paper is downloadable.

4. The paper is published in English.

5. The paper is a journal article or conference paper.

6. The paper involves VL effectiveness measurement (i.e., vocabulary gain, vocabulary retention, or a combination of both).

There are two reasons why this review includes only publications from the last 5 years (2020–2024). Firstly, the volume of research in the domain of ARVL and VRVL began to grow significantly only from 2019 onwards, and this growth has accelerated further since the beginning of 2020, mainly due to the outbreak of the COVID-19 pandemic. Secondly, technology-enhanced language learning (TELL), in particular AR-assisted language learning (ARALL) and VR-assisted language learning (VRALL), is a rapidly evolving field; the most recent research should be prioritized because the technologies, techniques, and data in studies before 2020 may already be outdated.

The literature selection process consisted of three main phases (Figure 1). In the first phase, identification, 409 records were obtained from the initial search, and 112 duplicate articles were identified and eliminated, which left 297 articles for further consideration. In the second phase, literature screening, 130 records were identified as ineligible because they were published before 2020 and not indexed in SCI/SSCI/Scopus/Web of Science, and a further 95 studies were excluded under eligibility criteria 2 and 3. At the final step of screening, the remaining 72 records were assessed against the last three eligibility criteria: one article was published in German, and 34 did not measure the effectiveness of vocabulary gain, vocabulary retention, or both. Ultimately, 37 papers (33 journal articles and 4 conference papers) were included in the final pool of SLR data sources. All were published in English and peer-reviewed. Because VL effectiveness needs to be examined and compared, the eligibility criteria of this review require a vocabulary learning effectiveness measurement, which is quantitative in nature; purely qualitative studies were therefore excluded.

Figure 1. Flow diagram of data collection process based on PRISMA.
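The record counts reported for the selection process can be checked with simple arithmetic. The sketch below only re-traces the numbers stated in the text; the variable names are illustrative.

```python
# Identification phase
records_identified = 409
duplicates_removed = 112
after_deduplication = records_identified - duplicates_removed

# Screening phase: two rounds of exclusion
excluded_first_round = 130   # published before 2020 / not indexed
excluded_second_round = 95   # excluded under criteria 2 and 3
assessed_for_final_criteria = (
    after_deduplication - excluded_first_round - excluded_second_round
)

# Final assessment against the last three criteria
excluded_not_english = 1
excluded_no_effectiveness_measure = 34
included = (
    assessed_for_final_criteria
    - excluded_not_english
    - excluded_no_effectiveness_measure
)

print(after_deduplication, assessed_for_final_criteria, included)
```

Each intermediate value matches a figure given in the narrative: 297 records after deduplication, 72 records assessed at the final step, and 37 included studies.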

3 Findings

3.1 RQ 1: What are the main characteristics of the reviewed studies in terms of publication year, target language, device used, target audience, and sample size?

Among the 37 studies, 21 used AR-mediated learning instruments to facilitate the vocabulary learning process, outnumbering the VR-supported vocabulary learning studies (n = 16) in this review. This indicates the popularity of AR applications in the domain of vocabulary learning.

In terms of research methods, 16 of the 37 studies used a mixed-method approach, and the other 21 chose a quantitative approach. Because the eligibility criteria of this review require a vocabulary learning effectiveness measurement, which is quantitative in nature, purely qualitative studies were excluded. The majority of the studies examined in this systematic review used a quasi-experimental design with both experimental and control groups. Four studies nevertheless used a single-group experimental design: two ARVL studies (Uiphanit et al., 2020; Yilmaz et al., 2022) and two VRVL studies (Fuhrman et al., 2021; Sahinler, 2023).

According to Figure 2, which shows the distribution of publications by year, 2023 had the most publications (n = 10) over the examined timeframe, followed by 2024 (n = 8). Seven pertinent papers were published in each of 2020 and 2021. The number reached its lowest point, five publications, in 2022. More specifically, AR-related publications dominated the year 2020 with five studies and then decreased annually to three in the following 2 years. In 2023, both AR and VR saw an increase from the previous year and then a minor decline in 2024. This decline may be attributed to the fact that the search was conducted on 26 September 2024, when 3 months still remained until the end of the year; based on the trend, some new relevant publications are likely to be added.

Figure 2. Number of publications in ARVL and VRVL from 2020 to 2024.

In terms of the target language taught in the reviewed studies (Figure 3), studies focused on English vocabulary learning comprised the majority, totaling 29 articles and representing 78% of the total. Other target languages included Japanese (n = 3), Chinese (n = 2), French (n = 1), Spanish (n = 1), and Finnish (n = 1).

Figure 3. Percentages of target languages.

Concerning the geographical distribution of existing ARVL and VRVL research published between 2020 and 2024, the majority of the studies (n = 29) were conducted in regions and countries in Asia, including Taiwan (n = 7), Mainland China (n = 5), Saudi Arabia (n = 3), Turkey (n = 3), Thailand (n = 2), Malaysia (n = 2), and Indonesia (n = 2). Notably, 32.4% of the examined articles were published by scholars in Greater China, demonstrating their important role in advancing technology-mediated language learning. Additional noteworthy article sources were found in Europe (n = 8), followed by Western Asia (n = 6), Southeast Asia (n = 6), the United States of America (n = 2), and Australia (n = 1).

As illustrated in Figure 4, a wide range of electronic devices, such as head-mounted displays (HMDs), tablets, smartphones, and desktop computers, have been utilized in AR/VR-assisted language learning research. In total, 15 studies used HMDs as their vocabulary learning platforms, making the HMD the most popular type of VL device; five of these studies opted for mobile-rendered HMDs, namely Google Cardboard (n = 2) and Samsung Gear VR (n = 3). However, this popularity was mainly attributable to VRVL studies, 14 of which selected HMDs as their instructional instruments. Of the 14 HMD-based VR studies, Lai and Chen (2023) and Kaplan-Rakowski and Thrasher (2024) employed HMDs for the experimental group and desktops for the control group. By contrast, only one ARVL study, by Weerasinghe et al. (2022), involved an HMD. Among the rest of the ARVL studies, 10 utilized tablets and 11 employed smartphones. These figures include four studies that combined instruments: three in which both tablets and smartphones were used (Larchen Costuchen et al., 2021; Belda-Medina and Marrahi-Gomez, 2023; Ibrahim et al., 2024) and one, by Weerasinghe et al. (2022), in which both HMDs and tablets were applied.

Figure 4. Number of studies by device used in ARVL and VRVL.

Although the HMD was the mainstream device in VRVL studies, one study by Liao (2023) used smartphones, and three studies (Lai and Chen, 2023; Kaplan-Rakowski and Thrasher, 2024; Luan et al., 2024) involved desktop devices, two of which (Lai and Chen, 2023; Kaplan-Rakowski and Thrasher, 2024) used desktops only in their control groups. By delivering omnidirectional visual and aural input, the HMD, the representative of High Immersion VR (HiVR), isolates users from their physical surroundings while offering them a fully immersive 360° experience. On the other hand, two-dimensional (2D) screens on desktops, tablets, and smartphones, known as Low Immersion VR (LiVR), provide some immersion, but the overall feeling of immersion may be lessened by distractions from outside stimuli (Kaplan-Rakowski et al., 2024). It is worth noting that Liao’s (2023) study integrated not only VR technology but also artificial intelligence into English vocabulary test research. Three ARVL studies did not specify what kind of device was implemented.

The dominance of tablets and smartphones in ARVL studies means that HMDs such as AR headsets, and other wearable devices offering a higher sense of immersion, were largely neglected. This implies that most ARVL studies provided a less immersive learning experience than most of their VRVL counterparts, and that ARVL with wearable devices requires more attention in future investigations.

This part of the review on learner characteristics revealed a research trend in which most ARVL and VRVL researchers emphasized higher education (n = 12) and elementary education (n = 11) (Figure 5). More specifically, seven VRVL studies and five ARVL studies focused on higher education, while all 11 studies on elementary education were AR-assisted. More notably, the preschool group (n = 2) was also investigated only in AR studies. This uneven distribution may stem from certain technological drawbacks of VR devices, especially HMDs, which could have negative effects on young learners’ physical health. In addition, the “junior high school” group is exclusively represented by four studies on VR. For “high school,” there are three AR studies and two VR studies, five in total. Three studies did not specify their educational level: Fuhrman et al. (2021), featuring adult language learners aged 19 to 41; Bergsma et al. (2023), encompassing 22 participants aged 21–65 years; and Hartfill et al. (2020), including 29 participants (18 male, 11 female) aged 19 to 56. Overall, both ARVL and VRVL in high school, junior high school, and preschool education have been relatively underexplored, and investigations are particularly needed into VRVL’s impact on elementary and preschool education, as well as ARVL’s influence on junior high school students.

Figure 5. Number of studies by target group educational level in ARVL and VRVL.

Furthermore, out of 37 investigations, five examined single-gender individuals, comprising three studies focused on male language learners (Alfadil, 2020; Alharbi, 2022; Khan et al., 2023) and two studies involving female participants (Binhomran and Altalhab, 2021; Khodabandeh and Mombini, 2024). The rest are all mixed-gender studies.

This review also investigated the sample size in each study. As classified by Burston (2015), sample sizes are categorized as follows: “very small” (n < 15), “small” (n = 15–25), “medium” (n = 25–49), “big” (n = 50–64), and “large” (n > 64). As depicted in Figure 6, the distribution of studies for ARVL and VRVL is similar, with the largest number of studies falling within the “medium” sample size (n = 16), followed by the “big” and “large” groups, each of which contains nine studies. Two studies fell into the “small” category and only one into the “very small” category: the study by Sahinler (2023), which involved merely six participants, all Year 9 learners of English as an additional language in Australia, and had no control group.

Figure 6. Number of studies by sample size in ARVL and VRVL.
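The sample-size bands attributed to Burston (2015) can be expressed as a small classification function. This is a sketch under one stated assumption: the “small” and “medium” bands as quoted both contain n = 25, and the code assigns that boundary value to “medium”. The function name is hypothetical.

```python
def sample_size_category(n: int) -> str:
    """Map a sample size to the bands quoted from Burston (2015)."""
    if n < 15:
        return "very small"
    if n < 25:
        # Assumption: the overlapping boundary n = 25 is counted as "medium".
        return "small"
    if n <= 49:
        return "medium"
    if n <= 64:
        return "big"
    return "large"


# Participant numbers mentioned in this review.
for label, n in [("Sahinler (2023)", 6), ("Bergsma et al. (2023)", 22),
                 ("Hartfill et al. (2020)", 29)]:
    print(label, n, sample_size_category(n))
```

Making the boundary rule explicit matters for studies close to a cut-off, since a sample of 25 participants could otherwise be counted in either band.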

3.2 RQ2: What kinds of vocabulary knowledge learned, VL presentations and VL approaches are identified?

Figure 7 compares two types of vocabulary knowledge learned, (1) word form and meaning and (2) word form, meaning, and use, across ARVL, VRVL, and the aggregated total. It demonstrates that word form and meaning were prioritized in ARVL with 18 studies, while only three studies (Hidayat and Yulianti, 2020; Korosidou and Bratitsis, 2021; Hung and Yeh, 2023) explored word form, meaning, and use. By contrast, VRVL exhibits an equal distribution, with eight studies concentrating on word form and meaning and another eight investigating all three dimensions, indicating that research on VRVL is more balanced in addressing both fundamental vocabulary knowledge and its practical use. The overall data reveal a stronger emphasis on word form and meaning (n = 26) compared to word form, meaning, and use (n = 11), highlighting a research trend where most studies in ARVL and VRVL focus on basic vocabulary recognition rather than contextual vocabulary application.

Figure 7. Number of studies by vocabulary knowledge learned in ARVL and VRVL.

There are three main forms of VL presentation across the 37 studies: 1. Visual, audio, and text; 2. Visual and text; 3. Visual and audio. A total of 29 studies offered a more well-rounded multimedia VL experience with additional auditory effects, accounting for 78.4% of the total number, while seven studies integrated only visual aids and text into their VL practice, representing 18.9% of all studies, and only one study employed visual and auditory features (Figure 8). More specifically, in ARVL studies, 14 studies incorporated visual, audio, and text, while seven studies included only visual and text. By contrast, 15 out of 16 VRVL studies integrated all three presentational forms (visual, audio, and text), and one study (Fuhrman et al., 2021) combined only visual and audio during VRVL. This suggests that, compared to ARVL studies, the overall learning experience in most VRVL studies was more multifaceted.

Figure 8. Number of studies by VL presentation in ARVL and VRVL.
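The shares of each presentation form follow directly from the per-mode counts given in the text. The sketch below tabulates those counts and converts them to percentages rounded to one decimal place; the dictionary layout and function name are illustrative choices.

```python
# Counts of studies by VL presentation form, as reported in the text.
presentation_counts = {
    "visual, audio, and text": {"ARVL": 14, "VRVL": 15},
    "visual and text": {"ARVL": 7, "VRVL": 0},
    "visual and audio": {"ARVL": 0, "VRVL": 1},
}


def shares_by_presentation(counts: dict) -> dict:
    """Return each form's share of all studies, in percent (one decimal)."""
    total = sum(sum(row.values()) for row in counts.values())
    return {
        form: round(100 * sum(row.values()) / total, 1)
        for form, row in counts.items()
    }


shares = shares_by_presentation(presentation_counts)
for form, pct in shares.items():
    print(f"{form}: {pct}%")
```

The column totals (21 ARVL and 16 VRVL studies) add up to the 37 studies included in the review.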

According to Hulstijn (2003), second language vocabulary learning can be approached through either intentional or incidental vocabulary learning. Hence, it is crucial to compare the two main VL approaches in the AR and VR context. Figure 9 shows that intentional learning (n = 22) dominates overall, while incidental learning (n = 15) is less frequent overall but more prominent in VR, highlighting the immersive nature of VR that may better facilitate incidental vocabulary learning. Specifically, a disproportionate number of ARVL studies practiced intentional learning: 17 studies used this approach compared to only four using incidental learning. By contrast, five VRVL studies utilized intentional learning, and the remaining 11 focused on incidental learning, a distribution oriented more toward incidental learning.

Figure 9. Number of studies by VL approach in ARVL and VRVL.

3.3 RQ 3: How are two kinds of VL effectiveness (vocabulary gain and retention) measured in different studies, and how effective are VRVL and ARVL in improving vocabulary gain and vocabulary retention among learners?

Across both ARVL and VRVL, the majority of studies (n = 20) emphasized vocabulary gain only, while 14 studies investigated both vocabulary gain and retention, and only three studies focused exclusively on vocabulary retention. In the ARVL category, the majority of studies (n = 12) focused solely on vocabulary gain, while six studies measured both vocabulary gain and retention, and only three studies examined vocabulary retention alone. This suggests that AR-assisted vocabulary learning research tends to prioritize short-term vocabulary acquisition over long-term retention. For VRVL, an equal number of studies (eight studies each) explored both vocabulary gain and retention and vocabulary gain only, while no studies were found that focused solely on vocabulary retention. This balanced distribution indicates that VR-assisted vocabulary learning research is relatively more comprehensive in considering both short-term learning and retention, but there is still a gap in studies specifically investigating vocabulary retention alone (Figure 10).

Figure 10. Number of studies by VL effectiveness measured in ARVL and VRVL.
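The comparison between the two modes is easier to see as proportions than as raw counts, because the ARVL and VRVL groups differ in size. The following sketch uses the counts reported above for Figure 10; the helper name is hypothetical.

```python
# Counts of studies by type of VL effectiveness measured, per mode.
effectiveness_counts = {
    "ARVL": {"gain only": 12, "gain and retention": 6, "retention only": 3},
    "VRVL": {"gain only": 8, "gain and retention": 8, "retention only": 0},
}


def share_measuring_both(mode: str) -> float:
    """Proportion of a mode's studies that measured both gain and retention."""
    counts = effectiveness_counts[mode]
    return counts["gain and retention"] / sum(counts.values())


for mode in effectiveness_counts:
    print(f"{mode}: {share_measuring_both(mode):.1%} measured both")
```

Expressed this way, half of the VRVL studies measured both indicators, compared with fewer than a third of the ARVL studies.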

VL effectiveness can be quantitatively tested by various vocabulary assessments, including the pretest, the immediate post-test (or simply the post-test), and the delayed post-test. Concerning the rationale for the design of the vocabulary test in each study, the measurements of VL effectiveness can be divided into three categories (see Table 1): 1. Vocabulary test adapted from established test(s); 2. Researcher-designed vocabulary test; 3. Vocabulary test whose source is not specified. In 13 studies, the measurements of VL effectiveness were adapted from established tests: Wesche and Paribakht’s Vocabulary Knowledge Scale (Alfadil, 2020; Li et al., 2022; Hung and Yeh, 2023; Budianto et al., 2023), the British Picture Vocabulary Scales (Jalaluddin et al., 2020), the Nelson-Denny vocabulary subtest (Larchen Costuchen et al., 2021), Cambridge Assessment English (B1) (Belda-Medina and Marrahi-Gomez, 2023), the Peabody Picture Vocabulary Test (Hartfill et al., 2020; Fuhrman et al., 2021; Korosidou and Bratitsis, 2021; Bergsma et al., 2023), Laufer and Nation’s Vocabulary-Size Test of Controlled Productive Ability (Khan et al., 2023), and the Oxford Young Learners Placement Test (Khodabandeh and Mombini, 2024). Additionally, in 14 studies the researchers designed and developed their vocabulary tests themselves, and another ten studies did not specify the sources of their vocabulary tests.

Table 1. Measurements of vocabulary learning effectiveness.

3.3.1 Studies with both vocabulary gain and vocabulary retention measurement

In the 14 studies assessing learners’ vocabulary gain and retention, the time interval between vocabulary gain measurement (mostly referred to as the post-test or the immediate post-test) and vocabulary retention measurement (mostly the delayed post-test) varied across studies. Seven studies chose a time interval of 1 week, three opted for a 2-week interval, another three studies used a 3-week interval, and one study (Feng and Ng, 2024) settled on a delayed post-test after the longest interval of 30 days.

Among the six AR-assisted studies measuring both vocabulary gain and retention, Weerasinghe et al. (2022) and Khan et al. (2023) showed that learners’ vocabulary gain in the AR-assisted setting was significantly higher than in the non-AR counterpart, but they did not find a significant increase in vocabulary retention. Yangın Ersanlı (2023) discovered that the experimental group, which utilized ARVL materials, demonstrated a significant enhancement in vocabulary retention compared to the control group 3 weeks after the post-test. However, both experimental and control groups displayed significant improvements in vocabulary gain, suggesting that there was no significant between-group difference in the immediate post-test results. Korosidou (2024) revealed a positive impact of AR applications on early childhood learners’ alphabet and vocabulary gain and retention, and AR applications were also highly appealing and motivating to the participating students. However, Binhomran and Altalhab (2021) did not find any significant difference in either vocabulary gain or vocabulary retention between the experimental group and the control group, although the mean scores for both favored the AR-assisted group. Nevertheless, AR technology resulted in better understanding and higher levels of motivation among students, and most students were satisfied with the AR learning experience and found it engaging and entertaining. In Yilmaz et al.’s (2022) study, although a 72-point increase in children’s word/concept learning was observed in the immediate post-test, the investigation did not report whether the difference was statistically significant, and its single-group experimental design meant that it could not demonstrate the superiority of AR technology.

Among the eight VR-assisted studies measuring both vocabulary gain and retention, Fuhrman et al. (2021), despite using a single-group design, found that motor interaction with objects in a VR setting promotes better long-term memory of novel vocabulary items compared to learning without movement. In Tai et al.’s (2022) study, the VR players demonstrated significantly higher vocabulary gain and retention than the video watchers because the VR app Mondly facilitated vocabulary learning by contextualizing word usage, providing multimodal support, enhancing learner engagement, and offering real-time interactivity and feedback. Additionally, Chen and Yuan (2023) claimed that the VR approach effectively facilitated both students’ vocabulary gain and retention since VR provided an authentic and immersive learning context that enhanced student engagement and vocabulary retention. Students found the VR learning experience to be novel, attractive, and motivating for their vocabulary learning. Notably, Bergsma et al. (2023), who investigated whether learning Japanese words differs between a new immersive VR (iVR) context and a learned context, reported that learners in the new iVR context exhibited no significant enhancement in either vocabulary gain or retention compared to those in a learned context.

Besides, the results of Lai and Chen’s (2023) study revealed that the VR group achieved a significantly higher mean score than the PC group in the delayed vocabulary translation post-test, implying that the VR group retained significantly more vocabulary than the PC group. By contrast, the vocabulary gain of the VR group in both the translation and recognition tests was not significantly superior to that of the PC group. Likewise, Kaplan-Rakowski and Thrasher (2024) discovered that the scores of the High Immersion VR (HiVR) group on the delayed receptive post-test were marginally significantly higher than those of the Low Immersion VR (LiVR) group, indicating that the HiVR group outperformed the LiVR group in receptive vocabulary retention at a marginal level of significance, whereas the differences between the two groups in the immediate receptive post-test, immediate productive post-test, and delayed productive post-test were statistically insignificant. Also, owing to its single-group design, the study conducted by Sahinler (2023) could not conclude that VRVL was superior to traditional learning methods in terms of VL effectiveness, even though it found significant within-group differences in vocabulary scores among the pre-test, post-test, and delayed post-test. In addition, Feng and Ng (2024) demonstrated that VR technology positively impacts vocabulary gain and memory retention among EFL learners but did not specify whether the differences were statistically significant. They asserted that the spatial design of virtual environments may influence lexical memory performance, with words placed in frequently interacted positions tending to be better memorized.

Considering that the total number of VRVL studies in this review (n = 16) was considerably smaller than that of ARVL studies (n = 21), while more VRVL studies (n = 8) than ARVL studies (n = 6) examined both vocabulary gain and retention, it can be concluded that VRVL studies analyzed vocabulary gain and retention more comprehensively. Furthermore, three studies, including one ARVL study (Korosidou, 2024) and two VRVL studies (Tai et al., 2022; Chen and Yuan, 2023), contended that the differences between the experimental group and the control group in both post-tests and delayed post-tests were statistically significant, indicating that the participants in those studies achieved significant enhancement in both vocabulary gain and retention in the ARVL/VRVL settings.

3.3.2 Studies with only vocabulary gain measurement

Twenty studies measured only language learners’ vocabulary gain performance, including 12 ARVL studies (Tsai, 2020a; Tsai, 2020b; Uiphanit et al., 2020; Jalaluddin et al., 2020; Hidayat and Yulianti, 2020; Lai and Chang, 2021; Korosidou and Bratitsis, 2021; Topu et al., 2023; Hung and Yeh, 2023; Pannim Vipahasna et al., 2024; Ibrahim et al., 2024; Khodabandeh and Mombini, 2024) and eight VRVL studies (Alfadil, 2020; Hartfill et al., 2020; Chen et al., 2021; Li et al., 2022; Budianto et al., 2023; Liao, 2023; Luan et al., 2024; Seefried et al., 2024).

It is worth noting that, among the 12 ARVL studies with vocabulary gain measurement only, just two did not show that the vocabulary gain of the AR-supported learning group was significantly enhanced compared to the traditional learning group. In Uiphanit et al.’s (2020) study, the difference in vocabulary gains between ARVL and the traditional learning approach remains unknown owing to a single-group experimental design. Moreover, Lai and Chang (2021) contended that no statistically significant difference in performance was observed when comparing AR-mediated learning with the conventional learning method. This lack of distinction may be attributed to temporal constraints of the experiment (Lai and Chang, 2021).

More specifically, among the 12 studies utilizing AR-supported tools, Tsai (2020a) argued that, in addition to improved VL performance, students taught using AR demonstrated higher motivation. In another study, Tsai (2020b) explored vocabulary gains among three target groups across various proficiency levels (high, intermediate, and low). Furthermore, Jalaluddin et al. (2020) revealed that there was a statistically significant increase in students’ vocabulary performance when AR was implemented in the learning process. However, students still faced difficulties in writing the words learned via AR, often spelling words based on pronunciation. Also, Hidayat and Yulianti’s (2020) study compared the effectiveness of Flashcard Augmented Reality (FAR) media and Game Chick Learn (GCL) media on the ability of primary school students to memorize English vocabulary. The results indicated that FAR media could bolster students’ English vocabulary memorization more effectively than GCL media. Likewise, Topu et al. (2023) examined the vocabulary gain performance of 35 preschool children aged four to five. In Hung and Yeh’s (2023) study, students in the experimental group, which leveraged AR-enhanced board games, significantly outperformed the control group not only in vocabulary gain but also in creative thinking. In Ibrahim et al.’s (2024) study, the VL effectiveness test showed that the AR mobile application MyBrainy Kelate is an effective tool for improving users’ English vocabulary gain. Besides, Khodabandeh and Mombini (2024) also witnessed significant differences in vocabulary learning between the two experimental groups (flipped group and blended group), who used the Vocabulary Builder AR app, and the control group, who learned in the traditional face-to-face setting.

Among the eight VRVL studies that measured only vocabulary gain, six indicated that the experimental group achieved significantly greater vocabulary gain than the control group. The remaining two did not. Hartfill et al. (2020), using a gamified VR learning environment named Word Saber, revealed that the flashcard-assisted learners outperformed their VR-assisted counterparts in recall and recognition scores. Likewise, Seefried et al. (2024), comparing Spanish vocabulary learning performance in three different contexts (fully immersive VR, traditional non-VR, and mixed modality), found that the inclusion of VR did not affect the overall vocabulary gains.

Furthermore, Alfadil (2020) implemented gamified vocabulary learning by leveraging a virtual reality (VR) game, House of Languages. In Chen et al.’s (2021) study, vocabulary learning was conducted in a VR-assisted problem-based learning (PBL) context. Li et al. (2022) also claimed that the experimental group that used the VR-based approach outperformed the control group (which used a video-based approach) in terms of incidental vocabulary gain and cognitive, behavioral, and social engagement. Furthermore, Liao (2023) contended that the integration of artificial intelligence (AI) and VR into vocabulary learning has been shown to substantially improve vocabulary gain and enhance student engagement. Additionally, Luan et al. (2024) compared VR-assisted vocabulary learning with the video-watching learning approach, and the results revealed that the VR players significantly outperformed the video watchers in terms of vocabulary gain.

3.3.3 Studies with only vocabulary retention measurement

Only three studies in this review focused solely on vocabulary retention measurement, all of which were in the domain of AR-assisted language learning. Larchen Costuchen et al. (2021), using two delayed post-tests (one after 15 min and one after 1 week), found that the experimental AR-based visuospatial bootstrapping method was significantly more efficient for vocabulary retention than digital flashcards supported by image and translation. In Alharbi’s (2022) study, most of the students (124 out of 144) strongly agreed that AR technology positively impacted their learning motivation and vocabulary retention. The study found that AR technology provides an immersive and interactive learning environment that enhances students’ ability to remember content and makes learning more appealing. However, Belda-Medina and Marrahi-Gomez’s (2023) study, with the post-test conducted 1 week after the intervention, showed contrasting results: it did not reveal any significant difference between the two groups, so the AR method did not have a significant impact on vocabulary retention, as both groups improved their knowledge.

3.4 RQ 4: What are the main benefits and limitations in implementing ARVL and VRVL?

3.4.1 Main benefits

As shown in Figure 11, ARVL and VRVL offer several significant advantages, with a strong emphasis on interactivity and engagement. The most frequently cited benefit, mentioned by 27 studies (15 ARVL studies and 12 VRVL studies), is the ability of ARVL/VRVL to “Facilitate interactivity.” This underscores that AR/VR technologies create dynamic, user-centered learning environments where learners can actively engage with the material. Similarly, “Improve learning engagement,” supported by 27 studies (16 ARVL studies and 11 VRVL studies), suggests that the multimodal and interactive nature of AR and VR can maintain learner interest and focus more effectively than traditional methods.

Figure 11

Figure 11. Number of studies by main benefits of ARVL and VRVL.

“Facilitate multimodal learning” (n = 21) highlights how ARVL/VRVL technologies integrate various sensory inputs (visual, auditory, and kinesthetic), enhancing comprehension and retention by offering multiple ways to process information. The “Facilitate immersive learning” benefit (n = 18) was mainly cited in VRVL studies (n = 14), while the benefits of “Boosting learning motivation” and “Presenting authenticity” were each supported by 13 studies. Notably, 11 of the 13 studies emphasizing the benefit of presenting authenticity are VRVL studies. Moreover, “Deepening vocabulary understanding” (n = 13) indicates that AR/VR helps learners gain a richer and more nuanced understanding of vocabulary. This benefit is dominated by ARVL, likely because AR combines reality and virtuality to simulate real-life contexts in which vocabulary is used. In a similar vein, “Enhance enjoyment and learning interest” (n = 13) shows that AR and VR can stimulate students’ academic enthusiasm. Less frequently mentioned benefits, such as “Encourage personalized learning” (n = 9), suggest that AR and VR have the potential to tailor the learning experience to individual needs. At the lower end, “Provide scaffolding” (n = 5), “Facilitate rich-context learning” (n = 5), “Support collaborative learning” (n = 4), and “Offer a sense of presence” (n = 3) received the least research attention.

3.4.2 Limitations

The limitations of ARVL/VRVL depicted in Figure 12 can be divided into four main categories. First, technical and device-related limitations are a major category. Technical glitches emerge as the most frequently reported limitation in both ARVL and VRVL, with five studies identifying this issue in VRVL (Chen et al., 2021; Tai et al., 2022; Feng and Ng, 2024; Chen and Yuan, 2023; Sahinler, 2023) and three studies in ARVL (Tai et al., 2022; Belda-Medina and Marrahi-Gomez, 2023; Yangın Ersanlı, 2023). This suggests that both technologies suffered from software or hardware malfunctions, disrupting the learning experience. Another critical concern is the inaccessibility of devices, reported in two studies on ARVL (Hidayat and Yulianti, 2020; Binhomran and Altalhab, 2021) and one study on VRVL (Liao, 2023).

Figure 12

Figure 12. Number of studies by limitations of ARVL and VRVL.

Second, pedagogical and usability concerns were also frequently noted. Lexical constraints of vocabulary learning interventions are a significant challenge, particularly in ARVL, where four studies reported this issue, compared to three studies in VRVL. The lack of emphasis on vocabulary output and lexical range in the VR game was noted in Hartfill et al.’s (2020) study. Lai and Chen (2023) mentioned the limited inclusion of lexical factors of the words, such as word length, part of speech, and L1 phonotactic constraints, in the learning intervention. Additionally, negative usability (difficulties in interacting with AR/VR systems) was cited in two studies each for ARVL (Yilmaz et al., 2022; Topu et al., 2023) and VRVL (Seefried et al., 2024; Hartfill et al., 2020). Notably, Yilmaz et al. (2022) noted that preschool children had difficulty holding the tablets. Likewise, Hartfill et al. (2020) mentioned the shortcoming of the single-player mode of the VR game.

Third, some noteworthy cognitive and psychological challenges also appeared. VRVL presents unique cognitive and physiological drawbacks, particularly motion sickness and dizziness, which were reported in three VRVL studies (Lai and Chen, 2023; Budianto et al., 2023; Luan et al., 2024) but not in ARVL. Similarly, cognitive overload was noted in one VRVL study (Kaplan-Rakowski and Thrasher, 2024) but not in ARVL, indicating that VR environments might overwhelm learners with excessive information. On the other hand, learning distraction was identified in three ARVL studies (Lai and Chang, 2021; Larchen Costuchen et al., 2021; Alharbi, 2022) compared to one VRVL study (Tai et al., 2022).

Additionally, several instructional and knowledge-related limitations were mentioned in the reviewed studies. A lack of instructor training was highlighted as a challenge in both ARVL (n = 2) and VRVL (n = 1). In the same vein, unfamiliarity with technical knowledge was reported in two ARVL studies (Binhomran and Altalhab, 2021; Belda-Medina and Marrahi-Gomez, 2023) and one VRVL study (Budianto et al., 2023). Addressing this limitation would require comprehensive training programs to support educators and learners in using these tools efficiently. Furthermore, some minor limitations appeared in only a few studies. Specifically, short learning intervention duration was noted in two ARVL (Tsai, 2020b; Hung and Yeh, 2023) and two VRVL studies (Lai and Chen, 2023; Tai et al., 2022), indicating that many studies examined AR and VR vocabulary learning over brief periods, potentially limiting insights into long-term effectiveness. Myopia (nearsightedness) was reported in one VRVL study but not in ARVL, pointing to a minor visual strain issue in VR applications (Alfadil, 2020).

4 Discussion

4.1 Main characteristics of reviewed studies

Regarding the number of studies selected in the current review, ARVL studies outnumbered VRVL studies by a margin of five. A possible explanation is that most ARVL studies opted for non-wearable devices such as smartphones or tablets (or a combination of both) rather than the HMDs commonly utilized in VRVL studies. In comparison to HMDs, non-wearable devices possess two primary advantages. First, while the expense of integrating AR and VR technology in educational environments can be prohibitively high, smartphones or tablets are generally more economical, rendering them more accessible than HMDs for individuals with limited financial resources (Al-Ansi et al., 2023). Second, non-wearables tend to be more portable and lightweight (Oun et al., 2024), facilitating the implementation of large-scale learning activities (Pierdicca et al., 2019).

In terms of the publication years, the decrease in the number of relevant studies in 2022 and a sudden increase in 2023 are consistent with the review of ARVL and VRVL in the gamification context (Haoming and Wei, 2024). English is the most commonly studied target language, consistent with several reviews on AR-assisted language learning, VR-assisted language learning, or both (Parmaxi, 2020; Dhimolea et al., 2022; Md. Ghalib et al., 2024; Haoming and Wei, 2024).

Although the HMD is the most prevalent VL device, its dominance is mostly attributable to its use in VRVL research. The superior immersivity of VR, as opposed to AR, is a significant benefit that can fully engage language learners in educational activities, resulting in enhanced involvement (Al-Ansi et al., 2023; Parmaxi, 2020). Conversely, the majority of ARVL studies utilized non-wearable devices such as smartphones and tablets, aligning with the systematic review by Schorr et al. (2024) on AR-assisted language acquisition. Fan et al. (2020) also claimed that ARVL is more focused on portability, with mobile phones as its major hardware, and that its teaching materials are easier to prepare. However, the dominance of tablets and smartphones in ARVL studies means that the application of HMDs, such as AR headsets and other wearable devices with a higher sense of immersion, was largely neglected. ARVL with wearable tools requires more attention in future investigations.

Concerning educational level, higher education, followed by elementary education, was prioritized by most researchers in the current review, which aligns with Okumuş Dağdeler’s (2023) finding. It is also similar to the previous review by Haoming and Wei (2024), which revealed that both elementary schools and universities were central focal points among educational institutions, although in that review elementary education received the highest priority, with higher education following in importance. Similarly, Luo et al. (2024) asserted that higher education stood foremost among all examined educational tiers. No VRVL study in this review addressed preschool or elementary schooling. The uneven distribution may arise from specific technological limitations of VR devices, particularly HMDs, which could adversely affect the physical health of young learners (Kaimara et al., 2022). Both ARVL and VRVL in high school education, junior high school education, and preschool education were comparatively understudied, and explorations of VRVL in elementary and preschool education and of ARVL with junior high school learners are urgently needed.

The distribution of sample sizes in studies on ARVL is comparable to that of VRVL, with the majority of studies categorized under the “medium” sample size. This differs slightly from the findings of a previous review by Luo et al. (2024), where “medium” was the second most commonly chosen sample size. The difference may be caused by the generalized examination of language learning in an X-reality context in Luo et al.’s (2024) review.

4.2 Analysis of vocabulary learning process

In the second question dedicated to the VL process, the overall data across three dimensions (vocabulary form, meaning, and use) of vocabulary knowledge acquisition indicate a stronger focus on word form and meaning over the combined elements of word form, meaning, and use, which concurs with the findings of Haoming and Wei (2024). There are two possible explanations. Firstly, from the perspectives of researchers, incorporating vocabulary use into the vocabulary learning process requires the researchers to take into account a holistic design of VL materials/apps, learning session(s), and the assessment(s), which can be regarded as a more complicated and challenging task for researchers, especially within the time constraints imposed by most studies. Secondly, for language learners, the use of vocabulary typically demands a higher level of cognitive processing and greater involvement in the learning process (Lei and Reynolds, 2022), often resulting in a prolonged duration to attain substantial advancement compared to vocabulary form and meaning.

The findings on VL presentation showed that an overwhelming majority of VRVL studies offered a more well-rounded multimedia VL experience with additional auditory effects, which echoes Li et al.’s (2022) claim that VR holds significant potential for enhancing language learners’ linguistic knowledge due to its multifaceted features, including multimodal input. A possible reason why many ARVL researchers made limited use of auditory effects may lie in the differing characteristics of the device types used in ARVL studies compared to those in VRVL studies. As mentioned before, the most prevalent devices in current ARVL studies are non-wearables, namely, smartphones and tablets, which normally play audio out loud. Therefore, the sounds emitted by each device can disrupt one another, hindering learners’ ability to concentrate on the auditory output of their own non-wearable gadgets (Schorr et al., 2024).

A comparison of the two main VL approaches, intentional VL and incidental VL, shows that the intentional approach is considerably more favored than the incidental approach, which is consistent with the findings of Haoming and Wei (2024). Incidental learning is less frequent overall but more prominent in VRVL, highlighting the immersive and multimodal nature of VR, which may better facilitate incidental vocabulary learning with the assistance of simulated, multisensory learning scenes (Li et al., 2022).

The most commonly utilized VL effectiveness measurement is a self-designed vocabulary test, which is congruent with the findings in Haoming and Wei (2024). Among vocabulary assessments adapted from established tests, the Vocabulary Knowledge Scale, created by Paribakht and Wesche (1993), is the most widely employed standardized measure for evaluating learners’ VL effectiveness, also aligning with the prior finding of Haoming and Wei (2024).

4.3 AR/VR-assisted vocabulary learning effectiveness

Concerning VL effectiveness, of the 34 studies with vocabulary gain measurement, 21 studies (five with both vocabulary gain and retention measurement and 16 with vocabulary gain measurement only) indicated that learners in the ARVL/VRVL environments exhibited significant improvements in vocabulary gain compared to traditional learning methods, highlighting the superiority of AR/VR technologies in enhancing students’ vocabulary gain, which corroborates multiple previous reviews (Md. Ghalib et al., 2024; Dhimolea et al., 2022; Schorr et al., 2024). Additionally, among the 17 studies with vocabulary retention measurement, seven studies (five with both vocabulary gain and retention measurement and two with vocabulary retention measurement only) asserted that AR/VR technologies are significantly effective in improving vocabulary retention. Therefore, it is concluded that most studies demonstrated the benefits of ARVL/VRVL in improving learners’ vocabulary gain, yet only a limited number of studies have asserted that AR/VR technologies are advantageous in enhancing learners’ vocabulary retention, which leaves vocabulary retention in ARVL/VRVL contexts with great potential for further investigation.
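The proportions behind this comparison can be made explicit with a short tally. The following is a minimal sketch that uses only the study counts reported in Section 3.3 and in the paragraph above; the variable and function names are illustrative assumptions, not part of the review’s method.

```python
# Study counts per measurement type, split by field, as reported in Section 3.3.
counts = {
    "gain_and_retention": {"AR": 6, "VR": 8},
    "gain_only": {"AR": 12, "VR": 8},
    "retention_only": {"AR": 3, "VR": 0},
}


def total(kind: str) -> int:
    """Number of studies (AR + VR) that fall under one measurement type."""
    return sum(counts[kind].values())


# Studies that measured gain, studies that measured retention, and all studies.
gain_measured = total("gain_and_retention") + total("gain_only")
retention_measured = total("gain_and_retention") + total("retention_only")
all_studies = gain_measured + total("retention_only")

# Studies reporting a significant advantage, as stated in Section 4.3.
significant_gain = 5 + 16        # both measures + gain only
significant_retention = 5 + 2    # both measures + retention only


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


gain_share = share(significant_gain, gain_measured)
retention_share = share(significant_retention, retention_measured)

print(f"Gain: {significant_gain}/{gain_measured} = {gain_share}%")
print(f"Retention: {significant_retention}/{retention_measured} = {retention_share}%")
```

Under these counts, roughly three in five gain-measuring studies report a significant advantage, against roughly two in five retention-measuring studies, which is the gap the paragraph above describes.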

Many prior studies have substantiated the role of AR and VR in supporting VL effectiveness. Huang et al.’s (2021) review revealed that AR or VR technologies were effective in enhancing language learners’ vocabulary learning. In the systematic review on high immersion VR for language learning, Dhimolea et al. (2022) discovered that positive learning outcomes were generally assessed against those of the control groups in most studies on vocabulary learning. Despite the lack of dedicated systematic reviews on ARVL effectiveness, prior empirical studies support the aforementioned findings. For instance, Chen et al. (2018) asserted that an AR-based English learning system can significantly enhance students’ VL effectiveness and motivation. Ibrahim et al. (2018) also argued that participants who learned through AR achieved significantly higher scores on both immediate and 4-day delayed productive vocabulary recall tests compared to those who used the flashcard method.

Notwithstanding the overall numerical predominance of ARVL in this review, a greater number of VRVL studies investigate both vocabulary gain and vocabulary retention compared to those focused on ARVL. Three studies argued that participants showed significant enhancements in both vocabulary gain and retention within AR/VR-assisted learning contexts. Notably, two of these three studies involved VR, while only one utilized AR, suggesting a potentially greater effectiveness of VRVL compared to AR-supported methods. This discrepancy may also be attributable to the less immersive, less multisensory learning tools like tablets and smartphones commonly used in most ARVL studies, which can impact both the learning methods and quality of how language learners acquire lexical knowledge (Schorr et al., 2024). Hence, there is a need for further in-depth exploration of vocabulary learning effectiveness within wearable AR-supported learning contexts.

It is also worth noting that, among the 17 studies with vocabulary retention measurement, there are seven in which the time interval between the post-test and the delayed post-test is 2 weeks or longer. According to Yongqi (2003), a 2-week delayed recall under experimental conditions is typically described as “long-term retention.” Learners in experimental groups in three studies achieved significant long-term vocabulary retention, including one ARVL study (Yangın Ersanlı, 2023) and two VRVL studies (Sahinler, 2023; Chen and Yuan, 2023).
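The 2-week criterion can be expressed as a simple threshold rule. The sketch below is illustrative only: the function name and the “not long-term” label are assumptions, and the delays listed are limited to those stated explicitly in Section 3.3 of this review.

```python
# Threshold taken from the 2-week criterion cited above (Yongqi, 2003).
LONG_TERM_THRESHOLD_DAYS = 14


def retention_label(delay_days: float) -> str:
    """Label a delayed post-test by the interval since the immediate post-test.

    The "not long-term" label is a placeholder for this sketch; the review
    only defines the long-term category.
    """
    if delay_days >= LONG_TERM_THRESHOLD_DAYS:
        return "long-term"
    return "not long-term"


# Delays stated in Section 3.3, converted to days.
delays_in_days = {
    "Yangın Ersanlı (2023)": 21,                      # 3 weeks
    "Belda-Medina and Marrahi-Gomez (2023)": 7,       # 1 week
    "Larchen Costuchen et al. (2021), 2nd test": 7,   # 1 week
}

labels = {study: retention_label(days) for study, days in delays_in_days.items()}

for study, label in labels.items():
    print(f"{study}: {label}")
```

Only the 3-week delay meets the criterion here; the two 1-week delays fall below it.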

4.4 Main benefits and limitations

Regarding the main benefits of using AR and VR in vocabulary learning, ARVL and VRVL are esteemed for fostering interactivity, engagement, multimodality, and immersive learning. This aligns with a previous review by Haoming and Wei (2024), which highlighted four main digital affordances of ARVL and VRVL in gamified learning environments, namely, providing interactivity, offering multimodal learning materials, triggering engagement, and fostering immersion. Besides, this review also demonstrated other notable benefits for motivation, authenticity, personalized learning, and vocabulary depth, which are supported by prior studies. For instance, Huang et al. (2021) argued that both AR and VR technologies provide students with interactive learning content, thereby enhancing their motivation to learn.

Also, Chen and Chan (2019) contended that integrating AR with vocabulary training can enhance students’ depth of understanding, advancing vocabulary learning beyond rote memorization. Legault et al.’s (2019) research showed that VR’s interactive and experiential qualities can deepen language acquisition by situating vocabulary in realistic contexts, thus facilitating better memory retention and understanding compared to traditional methods such as flashcards. Moreover, VR environments, especially those leveraging high-immersion technology, are noted to strengthen cognitive engagement, as learners become active participants rather than passive recipients of information. This engagement in turn enhances motivation and memory, thereby facilitating vocabulary learning (Song et al., 2023). However, AR/VR’s potential to enhance collaborative learning, support rich-context learning, and provide a sense of presence and scaffolding warrants further investigation.

Lastly, the limitations identified in this review, namely technical glitches, myopia, learning distraction, cognitive overload, motion sickness and dizziness, unfamiliarity with technical knowledge, and inaccessibility of the devices, were mostly consistent with the systematic review by Cevikbas et al. (2023). Also, pedagogical concerns and VL-related challenges like short learning periods, the lack of affordability and accessibility, attention distraction, and insufficient instructions were mentioned in Haoming and Wei’s (2024) review.

5 Conclusion

Five primary features of recent ARVL and VRVL research were compared and analyzed in this systematic review: main characteristics, VL processes, VL effectiveness, main benefits, and limitations.

First, there is greater scholarly interest in ARVL than in VRVL. Most ARVL studies employed non-wearable tools such as smartphones and tablets (or a combination), whereas most VRVL studies used more immersive HMDs. Future research should consider wearable devices in ARVL studies to expand this dimension. Researchers in this review predominantly focused on higher education, followed by elementary education. By contrast, both ARVL and VRVL remain underexplored in high school, junior high, and preschool education. More specifically, VRVL in elementary and preschool settings and ARVL for junior high learners demand further investigation.

The distribution of sample sizes in ARVL studies is comparable to that in VRVL, with most studies categorized as “medium” in size. Most research in ARVL and VRVL emphasizes fundamental vocabulary recognition rather than the use of vocabulary in context, suggesting the need for future investigations into the form-meaning-use trichotomy of vocabulary knowledge.

A large proportion of VRVL studies offered a more complete multimedia VL experience by incorporating auditory effects, suggesting that VRVL provides a more multifaceted learning experience compared to ARVL. Furthermore, intentional learning methods are favored over incidental approaches in both fields, though incidental learning is somewhat more prevalent in VRVL studies.

Concerning VL effectiveness, a wealth of articles on both ARVL and VRVL indicate that AR/VR-supported learning settings can help learners achieve significantly greater vocabulary gains compared to traditional methods. However, fewer studies report that these technologies improve vocabulary retention, suggesting that vocabulary retention in ARVL and VRVL holds considerable potential for further research. Despite the overall numerical predominance of ARVL studies in this review, a greater number of VRVL studies address both vocabulary gain and retention, implying that VRVL’s impact on vocabulary outcomes may be more rigorously examined and potentially more effective than ARVL. More in-depth investigations into vocabulary retention in both ARVL and VRVL contexts are advised. Future research could enhance vocabulary learning by integrating ARVL and VRVL to develop a synergistic approach. The primary benefits of AR and VR in vocabulary learning include enhanced interactivity, increased engagement, multimodal learning opportunities, and immersive experiences. VRVL excels in creating immersive and authentic learning environments, while ARVL is particularly effective in enhancing learners’ lexical understanding.

This review has two limitations. Firstly, its scope is relatively narrow, covering only the last 5 years. However, as previously mentioned, ARVL and VRVL research has only gained substantial academic attention since 2019. Accordingly, the authors limited the review to this period to capture the most recent insights. Secondly, the comparison of VL effectiveness between ARVL and VRVL in this systematic review still lacks quantitative synthesis and analysis. To increase statistical power and reduce bias, both comparative empirical studies and comparative meta-analyses on VL effectiveness and other factors between ARVL and VRVL are advised in future studies to generate more rigorous and robust conclusions.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

MZ: Data curation, Formal Analysis, Investigation, Visualization, Writing – original draft, Writing – review and editing. HH: Supervision, Writing – review and editing. MY: Supervision, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research was supported by the Faculty of Education, Universiti Kebangsaan Malaysia (Grant Number: GG-2024-012).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. We used ChatGPT to correct grammatical and syntactical errors, ensuring clarity and precision of the language. However, the originality and innovation of the ideas presented in this study are entirely the product of our team’s creativity and expertise. AI was used solely as a supportive tool to enhance the presentation of our concepts, not to generate the ideas themselves.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frvir.2025.1522380/full#supplementary-material

References

Afzal, N. (2019). A study on vocabulary-learning problems encountered by BA English majors at the university level of education. Arab World Engl. J. 10 (3), 81–98. doi:10.24093/awej/vol10no3.6