A multidisciplinary approach to understanding the technology, opportunities, challenges, and implications for society of the metaverse

Slater, Mel; Di Dalmazi, Michele; Friedman, Doron; Galissaire, Jessica; Isaac, Henri; Kobusinska, Anna; Lopez-Tarruella-Martinez, Aurelio; Lamberti, Lucio; Lucas, Jean-François; Niamut, Omar Aziz; Pan, Xueni; Prié, Yannick; Quent, Matthias; Ruiz Rodríguez, Raul; Sanchez-Vives, Maria V.; Skrzypczyński, Piotr; Steed, Anthony; Wannerberg, Petter

doi:10.3389/frvir.2025.1566419

REVIEW article

Front. Virtual Real., 27 October 2025

Sec. Virtual Reality and Human Behaviour

Volume 6 - 2025 | https://doi.org/10.3389/frvir.2025.1566419

This article is part of the Research Topic “A Metaverse for the Good: Design, Application and Understanding”

Mel Slater¹*

Henri Isaac⁵

Aurelio Lopez-Tarruella-Martinez⁷

Lucio Lamberti²

Jean-François Lucas⁴

Omar Aziz Niamut⁸

Xueni Pan⁹

Yannick Prié¹⁰

Matthias Quent¹¹

Raul Ruiz Rodríguez⁷

Maria V. Sanchez-Vives¹²,¹³

Piotr Skrzypczyński¹⁴

Anthony Steed¹⁵

Petter Wannerberg¹⁶†

¹Event Lab, Department of Clinical Psychology and Psychobiology, Institute of Neurosciences, Universitat de Barcelona, Barcelona, Spain
²Politecnico di Milano, Department of Management, Economics, and Industrial Engineering, Milan, Italy
³Advanced Reality Lab, Sammy Ofer School of Communications, Reichman University, Herzliya, Israel
⁴Renaissance Numérique, Paris, France
⁵Laboratoire DRM, UMR CNRS 7088, Université Paris Dauphine-PSL, Paris, France
⁶Laboratory of Computing Systems, Institute of Computing Science, Poznań University of Technology, Poznań, Poland
⁷Academic Chair for the Responsible Development of the Metaverse, Universidad de Alicante, Alicante, Spain
⁸The Netherlands Organization for Applied Scientific Research TNO, The Hague, Netherlands
⁹See VR Lab, Department of Computing, Goldsmiths, University of London, London, United Kingdom
¹⁰Nantes Université, École Centrale Nantes, Centre National de la Recherche Scientifique (CNRS), Nantes, France
¹¹Institute for Democratic Culture, Magdeburg-Stendal University of Applied Sciences, Magdeburg, Germany
¹²Institut d’Investigacions Biomèdiques August Pi i Sunyer, Barcelona, Spain
¹³Institució Catalana de Recerca i Estudis Avançats, (ICREA), Barcelona, Spain
¹⁴Institute of Robotics and Machine Intelligence, Poznań University of Technology, Poznań, Poland
¹⁵Virtual Environments and Computer Graphics Group, Department of Computer Science, University College London, London, United Kingdom
¹⁶RISE, Research Institutes of Sweden, Gothenburg, Sweden

Since late 2021, there has been increasing interest in the idea of the “metaverse.” This is a virtual, augmented or mixed reality system that potentially millions of people can simultaneously inhabit and where people can interact with one another in real-time, irrespective of the physical distance between them. It is heralded by some as the next-level world wide web that will use a 3D immersive interface, with the potential of mixing virtual and physical reality, with environments persisting over time and all interoperable between different platforms and devices. Nevertheless, there is still no true example of a metaverse, only a number of nascent examples. Here we discuss the technical foundations for such metaverses, the role of generative AI, and its possible impact on society—including education, healthcare and quality of life, business and consumers, the future of work, and its implications for democracy. We also discuss public perceptions of the metaverse, issues of governance and regulation, and its legal implications. Overall, while the metaverse may radically and positively alter the way that we live, we must take into account ethical and regulatory issues in order to avoid predictions of a dystopian future being realized. This study is, therefore, a multi-disciplinary introduction to the metaverse concept that may be useful to readers with many different interests.

1 Introduction

The evolution of the internet in conjunction with virtual, mixed, and augmented reality and the increasing power of AI have led to the potential emergence of an interconnected space of virtual experiences, referred to as the “metaverse.” Such a metaverse does not exist today, though there are various nascent examples (Supplementary Table S1). However, the concept of the metaverse is clear: a real-time, shared 3D space that represents the convergence of virtual and virtually enhanced physical reality. The metaverse is persistent through time so that the effects of actions taken by participants are maintained as they would be in physical reality. It is interoperable so that it is not limited to one particular set of hardware or software, but applications built on one type of system can interact in another type of system. For example, web pages are “interoperable” in this way—it makes little or no difference which platform is used to access them or how they are created. The same should be true of the metaverse. Recent surveys can be found in Mystakidis (2022), Wang Y. et al. (2022), and Ritterbusch and Teichmann (2023).

A note on terminology. While many writers continue to use the term “users” to refer to people engaging in immersive systems, we believe that “participants” is a more appropriate term most of the time. We do not “use” an immersive environment—we are “in” it, we participate in it. We no more “use” immersive environments than we “use” reality, except when we talk about using, for example, virtual reality for the simulation of, say, chemistry experiments.

Participants in the metaverse are embodied as avatars, which are their virtual representations through which they can act in the world and interact with others (Figure 1). Avatars are typically life-sized virtual bodies that move synchronously with the movements of the participant. Ideally, people will also feel on their own real bodies any touch and force applied to their virtual bodies. Through their avatars, participants therefore have the ability to perceive and interact with the environment and others as if they were physically present, irrespective of their actual geographic location. This leads to the illusion of presence in the virtual space and co-presence with others.

Figure 1

Top left, a virtual reality environment with avatars interacting near trees and a screen. Top right, several people sit in a classroom setting wearing VR headsets. Bottom, two virtual avatars resembling historical figures are seated at a restaurant table, engaging in conversation.

Figure 1. Main components required for a metaverse. Top left: Ubiq system focuses on networking between remote participants (Steed et al., 2022). Top right: VR United system (Oliva et al., 2023) emphasizes avatars that resemble the participants—shown here is a European Project meeting with participants physically distributed across different countries, meeting together in a shared virtual reality. Bottom: Panel discussion distributed across three different countries, including one entirely virtual participant whose appearance and behaviour were controlled by a large language model (LLM) (Shoa et al., 2023).

“Presence” refers to the illusion of being in this environment, and there is a long history of research in this area (Ellis, 1991; Held and Durlach, 1992; Sheridan, 1992; Ellis, 1996; Sheridan, 1996; Sanchez-Vives and Slater, 2005). Originally considered the sense of “being there” in environments depicted by displays, it can also be considered on two axes: “place illusion” (PI), the illusion of “being there”, and also the illusion that what is happening there is really happening—referred to as “plausibility” (Psi) (Slater, 2009; Slater et al., 2022). PI relies on perceiving the world through natural sensorimotor contingencies (O'Regan and Noë, 2001a; b), using our bodies to perceive in much the same way as in physical reality, such as turning our head, bending down, turning around, looking under, reaching out and touching, and moving the head closer to a sound source to hear it better. In virtual reality, this leads to the illusion of being in the environment (Küçüktütüncü et al., 2025), and in augmented or mixed reality it leads to the illusion that virtual objects are present in the physical environment. Psi depends at least (i) on the environment responding to the actions of the participant (e.g., a virtual crowd parting to allow the participant to walk through); (ii) that there are spontaneous events that are not contingent on participant actions that refer personally to the participant (e.g., a member of the crowd smiles at the participant); (iii) on congruence. The latter refers to situations and events in the environment conforming to what would be expected were the events to take place in physical reality and where the participant is an expert. For example, doctors in a virtual surgery consultation would expect to find a functioning virtual computer on their virtual desk through which they can consult patient data, and Psi may be disrupted if this expectation is not met (Pan et al., 2016).

PI and Psi together lead people to act as if the virtual events and situation were real. When PI and Psi are applied to the representations of others (actually present remote participants or entirely virtual characters) then this may lead to the illusion of co-presence. A deeper discussion of the related concept of “coherence” can be found in Skarbez et al. (2017, 2018a, 2018b).

Participants’ avatars are life-sized virtual bodies, typically spatially coincident with their real bodies, and participants’ eyes in particular are coincident with the eyes of their avatars. So when participants look down toward themselves, they see the virtual body substituting their real body, moving synchronously and correspondingly with their real movements. Moreover, if an object collides with their virtual body, they should feel this on their real body if a haptics system is in use. Hence, besides PI and Psi, there is also the illusion of “body ownership”. This is where participants have the sensation that their avatar is their own body. It depends on the multisensory integration between movement, haptics, proprioception, and vision (Ehrsson, 2012; Blanke et al., 2015; Tacikowski et al., 2020). Throughout our lives, whenever we look down we see our own body, and when we move a limb we see our real limb move. The simplest hypothesis for the brain to adopt when the same happens in virtual reality is that this is our body. This goes back to the rubber hand illusion, where it was shown that synchronous tactile stimulation on a rubber hand and the out-of-sight participant’s real hand can lead to a shift of proprioception to the rubber hand, which feels “owned” as if it were their own hand (Botvinick and Cohen, 1998). This has been shown to operate also in VR (Slater et al., 2008) and also for the whole body (Petkova and Ehrsson, 2008; Slater et al., 2010).

There is another class of body ownership where the participant’s virtual body is not coincident with their real body. However, there is still multisensory integration where touch on the virtual body is felt synchronously on the real body or where movements of the real body cause synchronous movements of the virtual body. These give rise to “out-of-body” illusions, introduced in Ehrsson (2007) and Lenggenhager et al. (2007). Both studies used video of a person’s distant body, perceived through a head-mounted display, as the virtual body, together with synchronous visuotactile stimulation, where tactile events on the virtual body are felt synchronously on the real body, inducing body ownership over the distant body. Such body ownership over a distant virtual body was also demonstrated in Slater et al. (2010) and Pomes and Slater (2013), who compared ownership induced by visuotactile and visuomotor stimulation, where there is synchrony between real body movements and the movements of the virtual body. They found that when a participant feels ownership over a distant virtual body, it induces a contradiction that the brain attempts to resolve by inducing a feeling of drift toward that body or the illusion that the body drifts toward their own virtual location.

Petkova et al. (2011) compared first-person (the virtual body is coincident with the real body) and third-person (over a distant body) perspectives and found that first-person perspective was more powerful with respect to body ownership. In the metaverse, the mode used is application-dependent and may even vary during the course of an experience. For example, Gorisse et al. (2021) first embodied participants suffering from mild paranoid ideation in a coincident virtual body that looked like the participant and with visuomotor synchrony. Later, those in an experimental group saw their virtual body independently walk away to interact with groups of virtual people, whereas those in a control group saw their virtual body walk away but randomly wander around without interacting with the groups of people. In measurements taken some days after the experience, those in the experimental group reported lower levels of paranoid ideation than those in the control group. This illustrates that the relationship between the virtual body and the virtual location of the participant, whether ego- or exocentric, depends on the application.

PI, Psi, and body ownership are illusions in the sense that the participant knows for sure that they are not “true” but nevertheless cannot avoid the sensations. People in the metaverse will “be there,” having the feeling that events occurring there are really happening, their virtual bodies are their own, and by inference, assume that the same is true for other actors with whom they share the same virtual space. This is even though they may perceive some others as representations of virtual agents or, indeed, mistakenly perceive embodiments of distant real people as AIs.

The development of the metaverse is expected to significantly influence various aspects of society, from communication and entertainment to education, work, and beyond (Bibri, 2022; Pu and Xiang, 2022). First, the metaverse holds immense potential for fostering inclusivity. By transcending physical location, it provides individuals across the world with equal access to an array of resources, education, jobs, and social opportunities. Second, the metaverse might lead to fundamental changes in education (Lin et al., 2022). People can visit and take part in events in Ancient Rome rather than just read about them and watch movies. Experiments in chemistry and physics can be engaged in interactively with others across the physical world, even where participation might be infeasible or dangerous in reality (Amirbekova et al., 2024). Each student can have their own unique view and interface yet nevertheless have experiences that are consistent with others. Second-language learning can be profoundly improved through real-time social interaction with real or virtual people and in appropriate settings (Lee, 2023). For example, a European learning Mandarin could do so in virtual Chinese settings and even appear to themselves to be Chinese when looking in a virtual mirror.

In the realm of business and commerce, the metaverse might offer innovative ways of interacting with clients, delivering products and services, and conducting business operations. It could lead to further growth of the digital economy, thus leading to emerging methods of business that are not conceivable today (Firmansyah et al., 2023).

The metaverse also offers the potential for improved quality of life and wellbeing. For example, it can facilitate effective medical rehabilitation by allowing stroke survivors to engage in motivating, personalized therapy sessions in virtual environments designed to adapt to their progress (Usmani et al., 2022). Furthermore, the metaverse can support mental health therapies by offering immersive cognitive–behavioral interventions, exposure therapy for anxiety disorders, and virtual social support networks that reduce feelings of loneliness (Massetti and Chiariello, 2023). Immersive fitness applications that incorporate gamified physical activities can encourage healthy lifestyles by making exercise more engaging and accessible for diverse populations. Additionally, the metaverse can promote mindfulness and stress reduction through virtual environments designed for relaxation, nature immersion, and guided meditation sessions, potentially contributing to better mental resilience and emotional wellbeing.

The metaverse can also enhance social connections, enabling people to interact with and form relationships in immersive spaces (Figure 2). This can enhance a sense of community, diminish feelings of isolation, and enrich people’s social lives, particularly as remote interactions have become increasingly common.

Figure 2

Two-panel image of a woman sitting in a modern living room. In the left panel, she sits with her arms down. In the right panel, she raises her right hand. Both panels feature wooden floors, a large window with blinds, a plant, a standing lamp, and a brown chair.

Figure 2. In mixed reality, a remotely located person can appear as an avatar and interact with you in your living room.

Moreover, the metaverse has the potential to reshape governance and civic participation. With the potential for more transparent, real-time, and inclusive interactions, the metaverse could serve as a platform for enhanced democratic engagement and policy-making, fostering a more participatory and responsive democracy (Hasan et al., 2023).

This paper explores the implications of the metaverse from various perspectives, including its ethical and legal aspects, its role in democracy, its applications in medicine, education, and business, its potential to improve quality of life and wellbeing and to change the nature of work, and issues of governance and regulation. However, for every possible benefit, there will also be the possibility of substantial drawbacks. In the following sections, we consider these areas in turn, covering both the positive possibilities and the potential downsides, and their implications for a regulatory framework.

2 Foundations

2.1 Technical precedents and foundations

While the metaverse is today a set of somewhat different visions of how people will engage with each other in future through telepresence technologies, there are some clear technical precedents. The key ideas of interacting with multiple other participants in real time in digitally synthesized environments can be traced back to work in military simulators. Just as flight simulators (first analog and then digital) are often signposted in histories of XR technologies, a key reference point for metaverses is the linking of multiple simulators in systems such as SIMNET (Miller and Thorpe, 1995) and successor systems (Capps et al., 2000). Multiple participants engage in simulations designed to reproduce real-world scenarios. Some participants engage as individuals or teams in vehicles, while others control or observe the situation.

The subsequent development of early distributed simulations into systems that are described variously as multi-participant systems, multiplayer games, collaborative virtual environments, or social virtual reality has been moderated by two main factors: the connectivity and available bandwidth of internetworking technologies, and the availability of real-time 3D computer graphics hardware. The early distributed simulations used bespoke user interfaces (including simulators) and dedicated networks. As the internet has become more widely available and bandwidth higher and as the majority of home computers, then mobile devices, gained the ability to perform 2D and then 3D graphics, a very broad array of prototypical metaverse systems have been developed (Singhal and Zyda, 1999; Steed and Oliveira, 2009).

Today, we tend to focus on the visual and audio experiences of the various prototypical metaverse systems. However, most platforms must also synchronize real-time messages between clients. The origins of these environments can be traced to early text-based communication systems such as MUDs (multi-user dungeons) and IRC (internet relay chat). Simple messaging mechanisms remain foundational for online presence and activity awareness today, with modern examples like WeChat and WhatsApp supporting scoped real-time messaging (e.g., group chats) and various degrees of activity notification (e.g., friend status updates).

Any metaverse system will inherently incorporate text communication features. This integration is necessary not only for interoperability with existing communication platforms but also to manage participants’ awareness and presence across diverse systems.

As noted previously (Steed, 2024; Steed and Oliveira, 2009, Chapter 12), participant awareness can be conceptualized into three broad levels.

• Primary awareness, where participants are co-located and are proximate to each other. We would expect detailed animation and voice and the ability to jointly interact in a common environment.

• Secondary awareness that a participant is somehow in a reachable space, such as within the scope of a server or server network. This now means that rendezvous is possible and can be achieved easily, for example, by virtual travel.

• Tertiary awareness of the activity anywhere in the population of participants with which one is connected (e.g., friend networks), scoped by privacy policies. This facilitates the coordination and rendezvous of participants.

Thus, a tertiary awareness network would at least support the ability to send text and other status messages to and from other systems. We might expect, for example, a person present in a specific environment in a metaverse system to be able to share awareness and messages with others with whom they do not share immediate (primary) or reachable (secondary) awareness at that moment. Thus, a key part of an ambient (tertiary) awareness system is a mechanism to locate and then join other participants, so moving to a higher level of awareness.
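The three levels can be made concrete with a small classification routine. The following is only an illustrative sketch under simplifying assumptions: the `Participant` fields, the proximity `radius`, the `shares_status` privacy flag, and the rule that sharing a server implies reachability are all hypothetical and are not taken from any existing system.

```python
from dataclasses import dataclass
from enum import IntEnum


class Awareness(IntEnum):
    """Awareness levels from the text; a lower value means closer contact."""
    PRIMARY = 1    # co-located and proximate: detailed animation and voice
    SECONDARY = 2  # reachable: within the scope of the same server
    TERTIARY = 3   # ambient: connected (e.g., friends), scoped by privacy
    NONE = 4       # no awareness at all


@dataclass(frozen=True)
class Participant:
    name: str
    server: str                  # hypothetical server identifier
    position: tuple              # (x, y) position in that server's world
    friends: frozenset = frozenset()
    shares_status: bool = True   # hypothetical privacy setting


def awareness(a: Participant, b: Participant, radius: float = 10.0) -> Awareness:
    """Classify the level of awareness that participant a can have of b."""
    if a.server == b.server:
        dx = a.position[0] - b.position[0]
        dy = a.position[1] - b.position[1]
        if dx * dx + dy * dy <= radius * radius:
            return Awareness.PRIMARY   # close enough to interact directly
        return Awareness.SECONDARY     # rendezvous possible by virtual travel
    if b.name in a.friends and b.shares_status:
        return Awareness.TERTIARY      # status and text messages only
    return Awareness.NONE
```

In this sketch, moving to a higher level of awareness corresponds to changing `server` (joining the other participant) and then `position` (traveling to them).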

At the highest level of awareness, we rely on voice, possibly video, of the participants and on other shared media such as streamed avatar representations in virtual environments, which we will discuss shortly. Voice and video systems started to be more common on the internet in the early 2000s, with tools such as Skype and a plethora of similar tools (Singh et al., 2014). Today, these have evolved into tools such as Microsoft Teams and Zoom that we came to rely upon during the COVID-19 pandemic. The key enabler for both was increasing bandwidth available on networks. While the 2010s saw the rise of streamed media, a key point of modern audio–video systems is that there is support for many clients to engage in both sending and receiving audio and video, requiring significant infrastructure support.

Alongside audio and video, we can chart the rise and increasing capabilities of collaborative virtual environment (CVE) systems (Benford et al., 1995; Churchill and Snowdon, 1998). We use this early, and now uncommon, term because it often encompassed broader experiences than just immersive systems. Thus, a common thread of history would note early social games such as Habitat (Morningstar and Farmer, 2008), which provided a flat 2D experience but with multiple rooms, and early massively multiplayer online role-playing games (MMORPGs) such as Meridian59¹ and Ultima Online². Such games, with back-stories, persistent player characters, persistent worlds, and participants with different roles, hint at some of the distinguishing features that we will find in metaverse systems compared to other real-time communication systems. These games also highlight the difference between primary and secondary awareness: you might be in the same environment or on the same screen as other participants, but you can switch your group by traveling in the virtual environment.

This then leads to questions of scalability: given that you can only have useful primary awareness with a small group of participants, what is the pool of people with whom you can have secondary awareness? Some systems jump straight from effectively tertiary to secondary. That is, there is a fixed set of people on a system or in a virtual environment, but using other tertiary awareness systems means you can rendezvous with friends or join a lobby to then launch a primary awareness experience. Games such as Quake were hosted by a server. Gamers would know the addresses of servers or use a separate process such as GameSpy³ to find servers. Once connected to the server, all players could interact, although players would usually be spread out over a map and interacting with only a rapidly changing subset of players. The virtual environments in MMORPGs are much larger, and thus interaction with players was largely scoped to those who were proximate. Thus, they might not have been aware of players in other parts of the world. Moving from such reachable awareness to immediate awareness would involve travel through the world.

Finally, we note that collaboration has been a key feature of many early virtual reality systems. The NASA Ames VIEW Workstation is one of the seminal early immersive systems, supporting multiple participants (Fisher et al., 1988). In the early 1990s, Virtuality’s VR arcade games supported multiple participants, although over custom local networking⁴. At a similar time, early research platforms such as DIVE supported spatialized audio, video streaming, and immersive interfaces, but this was only possible because they were exploiting bandwidth and networking features that were only available on academic backbone networks of the time (Carlsson and Hagsand, 1993). The key networking features that distinguish VR systems from games of a similar period were their requirements for relatively low latency and faster update rates.

2.2 Features

Prototypical metaverse systems have developed alongside media such as multi-player games, video conferencing, and simulation systems, drawing on a very broad range of technologies and systems. In this section, we detail some of the distinctive features that metaverse systems need to consider.

The most obvious feature is that they must maintain a consistent shared state across distributed participants who are interacting in a shared 3D space or virtual environment. Maintaining state involves keeping all participants updated with changes to the environment, such as object positions, actions, and events. One can consider sharing full environment descriptions, such as their full scene-graphs (Zeleznik et al., 2000). However, there is in most existing systems a layer of abstraction between high-level events (e.g., an explosion) and the low-level visual representation which might vary between different platforms and rendering systems. Thus, consistency needs only to be maintained with sufficient detail that participants can collaborate successfully (Steed and Oliveira, 2009, Chapter 10). This is similar to games, especially since some events are high-priority (e.g., objects that the player is carrying) and others might be low-priority or only synchronized sporadically (e.g., background weather simulation).
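One simple way to picture priority-dependent synchronization is to give each piece of shared state its own send interval. The sketch below is a hypothetical illustration, not a description of any particular platform: the `SharedItem` class, the tick-based scheduling, and the example intervals are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SharedItem:
    """One piece of shared state with its own synchronization rate."""
    name: str
    interval: int         # send at most every `interval` ticks (1 = every tick)
    value: object = None
    dirty: bool = False   # True when changed since the last send

    def set(self, value):
        self.value = value
        self.dirty = True


def collect_updates(items, tick):
    """Return the (name, value) updates that are due to be sent on this tick."""
    updates = []
    for item in items:
        # Only changed items are sent, and only on ticks that match their rate.
        if item.dirty and tick % item.interval == 0:
            updates.append((item.name, item.value))
            item.dirty = False
    return updates
```

For example, an object that the participant is carrying might use `interval=1`, while a background weather simulation might use `interval=30`, so that changes to it are batched and sent only sporadically.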

To expand on this point, metaverse environments have quite complex actions that not only need to be synchronized but must be handled with proper authority. That is, events might be complex (e.g., with physics effects), but any concurrency must also be handled properly (e.g., two participants trying to control the same objects). This is typically handled in games by having an authoritative server, but this means that the server is a point of weakness. With metaverse systems, some data sources can usually just be streamed (e.g., tracking one’s own avatar is usually considered to be authoritative and thus can just be streamed to others); however, with complex side effects of actions (e.g., bowling a ball at some pins), the system must decide who or what is authoritative and over what time period.
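The question of who is authoritative, and for how long, can be sketched as a time-limited lease granted by an arbiter. This is a minimal illustration under stated assumptions: the `OwnershipArbiter` class, the lease duration, and the first-request-wins rule are hypothetical choices, and a real system would also need to handle clock differences and network failure.

```python
class OwnershipArbiter:
    """Arbiter for shared objects: the first request wins until its lease expires."""

    def __init__(self, lease_seconds: float = 2.0):
        self.lease_seconds = lease_seconds
        self._owners = {}  # object_id -> (participant_id, lease expiry time)

    def request(self, object_id: str, participant_id: str, now: float) -> bool:
        """Grant or renew control of object_id; return True if granted."""
        entry = self._owners.get(object_id)
        if entry is not None:
            holder, expires = entry
            if holder != participant_id and now < expires:
                return False  # another participant holds an unexpired lease
        self._owners[object_id] = (participant_id, now + self.lease_seconds)
        return True

    def owner(self, object_id: str, now: float):
        """Return the participant currently authoritative for object_id, or None."""
        entry = self._owners.get(object_id)
        if entry is None or now >= entry[1]:
            return None
        return entry[0]
```

In the bowling example, the participant who releases the ball might hold the lease for the ball and pins until they come to rest, after which authority lapses and another participant can request it.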

Related to these features is that metaverse systems need to be built around considerations of latency tolerance and compensation. Latency means that different clients might constantly see inconsistent situations. Streamed media will arrive with some delay and potentially from different sources. The system must decide, as best it can, how to extrapolate or interpolate data to achieve the most consistency. Sometimes, local extrapolation can fail, and thus consistency needs to be restored. For example, the underappreciated part of the dead-reckoning algorithm is the restoration of consistency when the extrapolation of physics bodies is wrong. While audio–video systems do degrade under latency, they have looser requirements over the timing of delivery.
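The receiving side of dead reckoning, including the restoration step, can be sketched as follows. This is a simplified one-dimensional illustration: the `RemoteEntity` class, the `blend` factor, and the `snap_threshold` are assumptions chosen for clarity and do not correspond to any specific standard or engine.

```python
from dataclasses import dataclass


@dataclass
class RemoteEntity:
    """Local replica of a remote object, in one dimension for brevity."""
    position: float   # position currently shown to the local participant
    velocity: float   # velocity from the most recent update

    def extrapolate(self, dt: float) -> None:
        """Between updates, advance the replica using the last known velocity."""
        self.position += self.velocity * dt

    def on_update(self, reported_position: float, reported_velocity: float,
                  snap_threshold: float = 5.0, blend: float = 0.5) -> float:
        """Restore consistency when an update arrives; return the error found."""
        error = reported_position - self.position
        if abs(error) > snap_threshold:
            # The extrapolation was badly wrong: jump to the reported state.
            self.position = reported_position
        else:
            # Small error: move part of the way to avoid a visible jump.
            self.position += blend * error
        self.velocity = reported_velocity
        return error
```

The trade-off lies in the correction step: snapping restores consistency immediately but is visually jarring, whereas blending looks smoother but leaves the replica inconsistent for longer.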

Perhaps the most distinguishing feature of proposed metaverse systems in comparison to today’s social VR systems is scale. The assumption is that very large numbers (millions) of participants will connect to some tertiary awareness system, allowing them to join spaces with thousands of others (secondary awareness). Given that a participant can only have primary awareness of a small number of others, we can use scalability mechanisms to manage messages and streams. This problem is sometimes known as “interest management” (Morse et al., 2000) and has been tackled in many ways. As discussed in the previous section, it is common in many games for there to be a tertiary awareness mechanism that allows users to join specific servers, each of which can host a maximum number of players. We can see a similar mechanism in Rec Room⁵, where participants can only join a public room if its capacity has not been reached. Alternatively, they can create with their friends a private room with a fixed number of participants. Within a metaverse system, we would expect to find such mechanisms but also larger seamless environments where interest is moderated by moving around the space, with the infrastructure supporting the dynamic formation of groups and thus the underlying message passing and streaming. Early distributed simulations pioneered some of the techniques in this space (Capps et al., 2000). While it remains a topic of commercial and academic interest, there are presently no large-scale open protocols for this.
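One of the simplest forms of interest management in a seamless environment is a spatial grid, in which a participant’s updates are sent only to those in the same or adjacent cells. The sketch below is purely illustrative: the `GridInterestManager` class, the cell size, and the choice of a 3×3 neighborhood are assumptions, and production systems use considerably more elaborate schemes.

```python
from collections import defaultdict


class GridInterestManager:
    """Tracks participants on a 2D grid and finds who should receive updates."""

    def __init__(self, cell_size: float = 10.0):
        self.cell_size = cell_size
        self._cells = defaultdict(set)  # (cx, cy) -> participant ids in that cell
        self._where = {}                # participant id -> (cx, cy)

    def _cell(self, x: float, y: float):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def move(self, pid: str, x: float, y: float) -> None:
        """Record a participant's new position, changing cell if necessary."""
        new_cell = self._cell(x, y)
        old_cell = self._where.get(pid)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self._cells[old_cell].discard(pid)
        self._cells[new_cell].add(pid)
        self._where[pid] = new_cell

    def recipients(self, pid: str) -> set:
        """Return the participants in the same or adjacent cells as pid."""
        cx, cy = self._where[pid]
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self._cells.get((cx + dx, cy + dy), set())
        found.discard(pid)
        return found
```

Here groups form and dissolve dynamically as participants move: the set returned by `recipients` changes whenever anyone crosses a cell boundary, with no explicit room or lobby involved.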

Scale is most often considered a function of the bandwidth available on the network. Routing traffic through a single point will create a bottleneck at that point. For streaming data, the overhead of routing is quite small, but for event and interactive behaviors, servers need to make decisions or arbitrate discrepancies, thus causing further bottlenecks. Friston et al. (2023) found the capacity of a single server to be around 50 participants, consistent with commercial systems. Peer-to-peer connectivity is thus attractive and is broadly supported through technologies such as WebRTC (Introduction to WebRTC Protocols - Web APIs | MDN, 2023). However, this means that each client is a potential bottleneck as each needs to ingest all messages and perform some of the functions of determining the evolution of the interactive simulations. Thus, it is very likely that metaverse systems will use some form of hybrid method. At a very gross level, peer-to-peer is more appropriate for primary awareness because of its lower latency, whereas a reachable awareness fits a distributed server model with dynamic formation of groups (e.g., Duong and Zhou, 2003), and ambient awareness fits the concept of a centralized database.
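A back-of-the-envelope count shows where the bottleneck sits in each topology. The functions below are an idealized sketch: they assume that every participant emits exactly one stream, that every participant receives every other participant’s stream, and that there is no multicast or interest management. The function names are hypothetical.

```python
def relay_load(n: int) -> dict:
    """Streams per endpoint when one server relays every stream to everyone else."""
    return {
        "server_in": n,             # one upstream per participant
        "server_out": n * (n - 1),  # each participant receives n - 1 streams
        "client_out": 1,
        "client_in": n - 1,
    }


def mesh_load(n: int) -> dict:
    """Streams per endpoint in a full peer-to-peer mesh with no server."""
    return {
        "server_in": 0,
        "server_out": 0,
        "client_out": n - 1,        # each client sends directly to every peer
        "client_in": n - 1,
    }
```

Under these assumptions, 50 participants require the relay server to send 50 × 49 = 2,450 outgoing streams, whereas in a mesh each client must itself send 49. The load does not disappear; it moves from the server to every client, which is one reason a hybrid method is attractive.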

Another important distinction of metaverse-like systems is the need to store persistent state. This might be participant avatar choices, the score in games, but also authored content. There has been quite a lot of discussion in the early research on metaverses about the portability of content across different environments (Jerome, 2024). This is addressed in many games through the provision of centralized servers. For example, analysis of the Second Life and Open Sim platforms highlights different needs for persistency and authority for different types of assets (Makineni et al., 2009). This is one area where it is proposed there is a role for fully distributed datastores such as blockchains (e.g., Casale-Brunet et al., 2023), though it remains to be seen how persistency can be scaled to support the likely bandwidth of changes required.

Finally, we note that despite the focus on distributed computation, we do need to bear in mind that clients themselves need to manage complex resources, including streams and increasingly large assets, render to multiple display types, and sense the participant and their environments. They need to do this at high frame rates. Current consumer hardware such as Meta Quest 3 does a remarkable job in managing all these tasks, but it can only run for a couple of hours on its battery. Moving forward, energy-efficient rendering and simulation methods are needed. It may be that server-side or network edge simulation and rendering will have a significant role if latency concerns can be alleviated (Hou et al., 2017). The opportunity is that the battery life of the device can be significantly extended by offloading computation. This applies both to the rendering that needs to be done per device and to simulation that can be performed and shared between multiple devices. In turn, this means that the environmental impact of the device construction can be lowered.

2.3 Metaverse components

Participants may access the metaverse through many different types of immersive technologies. “XR” is an umbrella term that encompasses current immersive technologies, including virtual (VR), augmented (AR), and mixed (MR) reality (Schlichting et al., 2022). XR represents the full spectrum of environments where real and virtual worlds converge. It acts as the foundational framework within the metaverse, facilitating seamless transitions and interactions across various levels of immersion (Guan et al., 2022).

VR immerses participants in fully surrounding virtual environments that isolate them from the physical world. Through VR headsets and motion controllers, participants can explore, interact with, and manipulate the environment, leading to fully immersive experiences. AR, on the other hand, overlays digital information onto the physical world, enhancing real-world environments with virtual elements. Participants typically experience AR through smartphones, tablets, or AR glasses, making it a more accessible form of immersive technology. AR’s ability to blend the physical and digital worlds is particularly useful in navigation, education, and industrial applications. The HoloLens and Magic Leap are prime examples of AR devices that enable people to interact with virtual objects while maintaining awareness of their real surroundings. MR combines elements of both VR and AR, enabling real-time interaction between physical and virtual objects. This technology allows participants to manipulate both types of objects seamlessly within a shared environment, thereby enhancing participant engagement (An, 2023). MR is particularly effective for collaborative tasks, complex simulations, and interactive learning. MR is typically achieved by combining video from cameras mounted on a head-mounted display, which capture the real environment, with overlaid elements that are entirely virtual. For example, in the metaverse you may be in your own living room, seeing it through the head-mounted display video, but also see another remote person displayed in the room (e.g., Figure 2).

A critical component of XR’s immersive experience is the accuracy and responsiveness of head tracking technology. The accurate positioning of head-mounted displays is essential for maintaining immersion and ensuring that the displayed digital objects remain stable and interact correctly with participant movements, especially in precision-demanding applications like engineering (Reimer et al., 2021). Significant advances in VR head tracking have focused on improving accuracy, reducing latency, and enhancing participant experience (LaValle et al., 2014; Gourlay and Held, 2017). These developments include low-cost lighthouse-type systems (Ng et al., 2017), outside-in tracking with RGB-D cameras and facial features (Amamra, 2017), highly accurate visual–inertial tracking (Fang et al., 2017), and large-area headset localization using the Global Navigation Satellite System (GNSS) (Humphreys et al., 2020). Commercial VR headsets, such as those from Meta, often utilize inside-out vision-based positioning or ego-motion tracking, which involves environmental mapping (Gourlay and Held, 2017). While these solutions enable effective tracking, they also require substantial computational power and can struggle in environments with repetitive textures, rapid motion, significant occlusions, or poor lighting (Fu et al., 2020).

The positional accuracy of inside-out head tracking on recent Meta VR headsets was evaluated experimentally against a highly accurate optical motion capture system (Banaszczyk et al., 2024; Figure 3); the devices were found to exhibit high positioning accuracy.

Figure 3

Three side-by-side images show hand tracking in different tasks: soldering, assembling pipes, and device repair. Each activity features hands with colorful markers indicating digit positions.

Figure 3. Measuring the accuracy of inside-out head tracking.
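One common way to summarize positioning accuracy against a reference system is the root-mean-square error (RMSE) between paired position samples. The sketch below is a generic illustration and not the protocol of the cited study. It assumes that the headset and motion capture trajectories have already been time-synchronized and expressed in the same coordinate frame, and the function name `position_rmse` and the sample values are hypothetical.

```python
import math


def position_rmse(estimated, reference):
    """Root-mean-square Euclidean distance between paired 3D positions."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    # Squared distance between each estimated sample and its reference sample.
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Positions in metres: headset estimates versus motion capture reference.
headset = [(0.000, 0.000, 0.000), (1.000, 0.000, 0.003)]
mocap = [(0.000, 0.000, 0.004), (1.000, 0.000, 0.000)]
print(f"RMSE: {position_rmse(headset, mocap) * 1000:.2f} mm")
```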

2.3.1 See-through and passthrough devices

The technologies underlying XR, VR, and AR also include see-through and passthrough devices. See-through devices, such as AR headsets or glasses, allow people to view the physical world directly through transparent lenses while overlaying digital information. Passthrough devices, typically VR headsets with external cameras, capture the physical environment via video and display it within the headset, combined with virtual elements. This technology allows participants to switch between fully immersive virtual experiences and awareness of their physical surroundings, enhancing safety and convenience during prolonged use. Meta Quest 2 and Meta Quest Pro, with their enhanced passthrough mode, and the Varjo XR-3 are examples of how this technology is implemented. The Apple Vision Pro allows users to transition continuously between seeing the real space in which they are located and the virtual environment by turning a dial.

2.3.2 Hand tracking

In the context of the metaverse, hand pose estimation (hand tracking) plays a pivotal role in creating immersive and interactive virtual environments. As the metaverse aims to seamlessly integrate digital and physical worlds, it is essential to allow participants to interact naturally with virtual objects, much as they would with objects in the real world. Hand tracking is crucial for participant interaction in both AR and VR, which can be part of the metaverse. Accurate hand tracking enables the intuitive manipulation of virtual objects, essential for applications in education, training, gaming, and various interactive tasks. Two main approaches are used for hand-based interaction: tracking dedicated VR handheld controllers and direct vision-based hand tracking without controllers.

• Dedicated VR handheld controllers (such as those from Meta Quest or HTC Vive) typically use embedded sensors like accelerometers and gyroscopes, often combined with external cameras or infrared tracking, to determine their position and orientation. These controllers usually incorporate physical buttons, thumbsticks, or haptic actuators, allowing precise input and providing immediate tactile feedback.

• Vision-based hand tracking, on the other hand, relies on cameras (RGB or depth) to visually capture a participant’s hands, estimating finger poses without requiring any physical device. This approach is present in some VR systems, such as Meta Quest 2 and Meta Quest Pro, which offer controller-free hand tracking. Apple Vision Pro dispenses with controllers altogether, and all interactions are via actual hand movements (or the eyes). AR glasses typically implement direct hand tracking with the sensors of the headset.

Dedicated VR handheld controllers offer several advantages. They require less fine finger movement to invoke actions, provide a physical object to hold during virtual interactions (Luong et al., 2023), and yield immediate haptic feedback (e.g., through mechanical buttons, or vibrations), thus enhancing realism and usability compared to bare-hand tracking (Schneider et al., 2017). This immediate feedback allows participants to better perceive the success of actions, such as pressing virtual buttons or grasping objects. Some findings suggest that users of recent VR headsets prefer handheld controllers over bare hand tracking (Hameed et al., 2023; Luong et al., 2023).

In contrast, bare hand tracking provides more naturalistic interaction and may feel more intuitive for tasks involving gesture-based controls. Devices like the Meta Quest 2/3, Meta Quest Pro, and Apple Vision Pro use onboard cameras to track the hands directly; as mentioned, the Apple Vision Pro integrates eye tracking and hand tracking for a controller-free experience (Huang et al., 2024). The goal is to make interactions feel as natural as possible, bridging the gap between the virtual and real worlds and enabling accurate tracking of a participant’s hand and finger motion, which can be essential in applications such as rehabilitation (Juan et al., 2023).

However, bare hand tracking faces challenges. It often struggles during manual tasks that involve the detailed manipulation of virtual or physical objects requiring fine motor control—such as typing on a virtual keyboard or assembling small components. For example, holding and operating a virtual screwdriver using only hand tracking can result in inaccurate or unstable input compared to using a handheld controller (Laukka, 2021).

Occlusions present another major challenge for direct hand tracking. Fingers frequently block each other from the camera’s view (self-occlusion), or one hand may occlude the other, leading to inaccuracies in pose estimation. Occlusions caused by manipulated objects further complicate hand tracking during interactions. For example, hand pose estimation can suffer during complex manual tasks like assembling virtual components (Mueller et al., 2017).

Despite extensive research, direct hand tracking on devices like the Meta Quest 2 (Abdlkarim et al., 2024) or HoloLens 2 still faces challenges, particularly during manual tasks (Tadeja et al., 2023). Current VR and AR devices track bare hands with acceptable accuracy, but robustness can degrade in specific scenarios, such as when the participant wears technical gloves or uses tools. Figure 4 shows an example in which hand tracking on the Microsoft HoloLens 2 is negatively affected while the participant handles plastic pipes whose color and texture appear similar to those of the hands.

Figure 4

A person wearing specialized goggles is seated in a room surrounded by equipment on tripods. The setup appears to involve cameras or sensors, possibly for motion capture or virtual reality. The room contains desks, computers, and other miscellaneous equipment.

Figure 4. Tracking of bare hands can be less effective when the objects being handled have a color and texture similar to those of the hands.

Recent advances in machine learning and computer vision have improved video-based hand tracking. Techniques that leverage deep learning, such as convolutional neural networks (CNNs) and attention mechanisms, have enhanced the precision and efficiency of hand-tracking algorithms. These advances facilitate more natural and responsive interactions in the metaverse, enhancing participant experience and expanding the potential applications of AR and VR technologies (Wang, 2024).

Most existing methods operate on depth images but struggle in practical setups, revealing performance shortcomings. However, the idea of “lixels” has been introduced, which are line segments or rays that carry information such as direction, position, or feature embedding. Instead of sampling points in a pixel grid (2D), lixels are 1D structures such as oriented lines in 2D or 3D space. In the realm of RGB camera-based hand pose estimation, methods such as Image-to-Lixel Mesh Net and Neural Voting Field (NVF) have demonstrated superior accuracy through novel representations like lixel meshes and dense 3D point-wise voting (Moon and Lee, 2020; Huang et al., 2023). Separating joint localization and recognition tasks has achieved state-of-the-art performance with fewer model parameters (Hampali et al., 2022).

Methods like the Image-Point Cloud Network (IPNet) incorporate both 2D image structures and 3D geometry to improve performance on challenging benchmarks (Pengfei et al., 2023). However, benchmark results often do not translate well to real-world AR applications due to sanitized datasets that do not reflect practical scenarios (Wang S. et al., 2022; Woo et al., 2023).

Mid-air gestures, essential for interacting with both physical and digital objects, include moving items, pressing buttons, and using tools. These interactions align with the requirements of applications in educational and training contexts, particularly for AR systems (Lee, 2012). This necessitates highly accurate and responsive hand-tracking technologies. One significant challenge in the metaverse is the diversity of interactions and environments that involve complex tasks requiring fine motor skills, such as virtual sculpting (Pascucci et al., 2024), or broader actions like navigating virtual spaces (Zhang et al., 2017). Consequently, hand-tracking systems must be versatile and robust across various scenarios.

Further research underscores the significance of hand tracking in the metaverse. A gesture interface designed by Cho et al. (2022) links real hand gestures to various virtual actions, enhancing the immersive experience. The integration of visual elements and participant-centric exploration in the metaverse was discussed by Zhao et al. (2022). Advances in hand gesture recognition for metaverse applications have been reviewed by Duan et al. (2023), emphasizing the role of wearable sensors. Challenges and innovations in freehand 3D model editing within virtual environments, considering head movements and optimizing hand gestures, have been highlighted by Lam et al. (2022). The development of full hand tracking for natural interactions in the metaverse has been emphasized by Mystakidis (2022).

Consequently, robust hand tracking in real-world metaverse applications remains an open research challenge.

2.3.3 Haptic feedback

Although the capacity for controller-free interaction is becoming an essential part of most XR devices, the controllers themselves are unlikely to completely disappear. One of the reasons is that controllers provide haptic feedback, which is a crucial part of generating realistic and immersive experiences.

We explore the real world by touching and grasping. Our skin is not only a separator between ourselves and the world, but it also includes many different types of receptors. When we touch an object, the physical properties of the object are translated into signals that are received by our brain and interpreted as different sensations. There are different ways in which we sense objects with our skin: pressure, vibration, texture, and temperature.

The process to simulate the sense of touch with computers is often referred to as “haptic rendering,” and devices that provide this feedback, such as VR controllers, can be called “haptic displays” (Choi and Kuchenbecker, 2013). Those haptic displays can be categorized as grounded, such as PHANTOM (Salisbury and Srinivasan, 1997), mobile/wearable, such as exoskeleton gloves (Blake and Gurocak, 2009), or more commonly, the hand-held controllers that come with most VR HMDs (Meta Quests, Pico, and VIVE). This contrasts with the Apple Vision Pro which, as we have seen, relies completely on actual hand tracking in combination with eye tracking, with no controllers.

The way most VR controllers provide haptic feedback is through vibration, which is enabled by vibrotactile actuators embedded in those controllers. These are very similar to those that come with consumer smartphones. Choi and Kuchenbecker (2013) review different types of actuators. These can be programmed to operate for a certain period and at a specified frequency and amplitude, generating a variety of haptic experiences that represent real-world interactions such as grabbing an object or touching a surface. Those different haptic sensations, although subtle, play an important part in the participant experience in VR, allowing participants to more intuitively navigate the virtual environment.
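To illustrate how duration, frequency, and amplitude together define a vibrotactile effect, the following sketch generates the drive samples for a single sinusoidal pulse. It does not use any real controller SDK: the function name `vibration_pulse`, the 1 kHz sample rate, and the parameter values are assumptions made for illustration, and actual devices expose these parameters through their own interfaces, sometimes only in part.

```python
import math


def vibration_pulse(duration_s, frequency_hz, amplitude, sample_rate_hz=1000):
    """Return drive samples for one sinusoidal vibration pulse.

    `amplitude` is a normalised strength in [0, 1]; the samples lie in
    [-amplitude, amplitude].
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be between 0 and 1")
    n = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2.0 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n)
    ]


# A short, strong "tap" versus a longer, softer "buzz" (illustrative values).
tap = vibration_pulse(duration_s=0.02, frequency_hz=250, amplitude=1.0)
buzz = vibration_pulse(duration_s=0.20, frequency_hz=80, amplitude=0.3)
print(len(tap), len(buzz))  # 20 200
```

Varying only these three parameters already yields distinguishable sensations, for instance a brief, strong pulse for a button press and a longer, weaker one for contact with a surface.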

In the context of the metaverse as a platform where participants socialize with each other, it is important to understand that touch serves a crucial function in our everyday social interactions. Michael Banissy (Banissy, 2023) has demonstrated how “handshakes, hugs, and high-fives” or even just a simple pat on the shoulder could strengthen the connections we have with each other. Although the haptic sensation of a hug can only be rendered via devices such as a VR Haptics Suit, it is relatively easy to simulate the feeling of a virtual high-five by triggering the vibrotactile actuators in consumer VR controllers. For a review on technology for social touch, see Huisman (2017).

2.4 Relation with the metaverse concept

The integration of XR, VR, and AR technologies within the concept of a metaverse profoundly shapes how people experience and interact with digital content. VR offers deep immersion and extensive virtual interaction capabilities, creating rich, digital audio-visual experiences that entirely replace the real world. This immersion is a key feature of human-computer interaction (HCI), which serves as the gateway to the metaverse, offering participants immersive digital environments through VR headsets (Yang et al., 2023). A composite example of several metaverse applications involving face-to-face communication is shown in Figure 5.

Figure 5

A collage of images depicting virtual and augmented reality scenarios in various environments. People are seen wearing VR headsets in diverse settings, such as a restaurant, an office, and a gaming area. Some images show virtual diners at tables, others highlight individuals immersed in virtual experiences. There are also scenes of people operating VR equipment in an event space, interacting with digital interfaces, and a virtual chess game.

Figure 5. Composite image of many metaverse examples; virtual meetings including during the pandemic, demonstrations at a conference where the demonstrator is thousands of km distant, playing chess together, and many other possibilities.

AR and MR, by contrast, enhance real-world environments with interactive digital elements, allowing for a blend of physical and virtual experiences. This blending is particularly relevant in applications where real-time interaction between participants and virtual objects is necessary, such as in video games or collaborative workspaces (Chantziaras et al., 2021). AR’s integration with everyday devices like smartphones makes it more accessible, while VR and MR require specialized equipment but provide more profound immersive experiences.

However, the widespread adoption of these technologies also raises concerns about safety and comfort. See-through devices offer better situational awareness, reducing the risk of accidents during use, while passthrough devices strike a balance between immersion and awareness. Despite these advantages, challenges remain, such as the physical strain associated with prolonged use of VR headsets, the potential for motion sickness (Chang et al., 2020), and psychological effects due to the immersive nature of these technologies (Chan et al., 2023).

3 AI and the metaverse

Artificial intelligence (AI) has recently undergone a profound transformation, propelled by innovations such as deep neural networks (DNNs), large language models (LLMs), and emerging multimodal foundation models. We speculate that these developments will serve as a pivotal catalyst for the widespread adoption and maturation of the metaverse. In this section, we focus on two key aspects of AI in the metaverse: its role in enabling more intuitive content generation and the creation of lifelike virtual humans, both of which are central to shaping immersive, persistent, and socially vibrant virtual environments (Kaplan and Haenlein, 2020; Dwivedi et al., 2021). These focal points, however, represent only a subset of the many intersections between AI and the metaverse. For example, machine vision is key to developing augmented reality. Additional avenues include the dynamic adaptation of extended reality (XR) content in response to individual participant behaviour (e.g., personalized learning tasks and context-aware entertainment experiences), the orchestration of multimodal interactions across virtual and augmented displays, and the development of intelligent frameworks to manage large-scale, distributed XR infrastructures.

Generative models remain computationally intensive, demanding substantial energy and hardware resources—a constraint that affects both operational costs and environmental sustainability (Strubell et al., 2020; Patterson et al., 2021). Addressing this bottleneck is now a priority across the AI community, driving work on model compression, distillation, sparse and low-rank architectures, hardware-aware training, and greener data-center infrastructure (Henderson et al., 2020). Meaningful progress in these areas is likely a prerequisite for scaling generative AI from isolated demonstrations to the persistent, real-time content production that a fully realised metaverse will require.

3.1 Automated content generation

A pivotal factor behind the widespread adoption of early internet ecosystems was user-generated content (UGC). In the Web 2.0 era, UGC initially took the form of user-created websites and text posts, then expanded into images and videos as smartphones and social media platforms became pervasive (O'Reilly, 2007; van Dijck, 2009; Kaplan and Haenlein, 2010). However, crafting three-dimensional (3D) experiences and interactive virtual worlds is orders of magnitude more complex than simply uploading a video or posting text; realizing the metaverse’s potential will therefore require lowering the barriers to 3D content creation. AI-based tools promise to meet this need by automating and streamlining the design pipeline. Currently, several complementary workflows are emerging.

3.1.1 Text-to-XR

Users may soon be able to automatically convert text descriptions (prompts) into complete 3D scenes, both static and animated. The rapid progression of multimodal generative models signals a future where simple textual descriptions can effortlessly produce tailored 3D scenes and interactive scenarios (Ramesh et al., 2022; Tang et al., 2023). Key to these developments is the emergence of various generative frameworks, including generative adversarial networks (GANs) (Goodfellow et al., 2014; Karras et al., 2019) and variational autoencoders (VAEs) (Doersch, 2016); more recently, diffusion models (Ho et al., 2020) have been demonstrated to be superior. These methodologies leverage textual prompts to synthesize complex virtual assets, streamlining production pipelines and reducing the specialized skillsets historically required for XR content development. Although current text-to-3D pipelines remain constrained by limited fidelity and complexity, recent advances—echoing earlier leaps in text-to-image and text-to-music generation—suggest that these systems will soon offer significantly enhanced quality. Consequently, novice users may supply only a short textual prompt to generate an initial immersive environment, while expert creators can employ AI-driven code generation (Section 3.1.3) and sophisticated modelling tools to produce intricate, fully interactive worlds.

Generative models have already proven effective across a range of applications, including the creation of realistic 3D human figures (Kanazawa et al., 2018), dynamic, object-aware landscapes (Nguyen-Phuoc et al., 2020), and various complex structures from textual descriptions (Sanghi et al., 2022). The integration of differentiable rendering algorithms with explicit, implicit, and hybrid scene representations has further enhanced the fidelity, scalability, and adaptability of generated content (Liu et al., 2019; Mildenhall et al., 2021). This approach ensures that the resulting 3D models, facial animations, and gestures closely align with corresponding textual inputs, as demonstrated in the generation of high-fidelity facial animations suitable for multiple avatars (Lombardi et al., 2018), text-to-gesture systems (Ginosar et al., 2019), and infrastructures supporting text-to-3D texture production (Michel et al., 2022). Together, these advances lay a robust foundation for more intuitive, accessible, and expressive content creation workflows essential for realizing the full potential of the metaverse. However, more effort is required to make these into efficient easy-to-use pipelines; Figure 6 illustrates a combination of standard 3D meshes with an AI-generated 2D skybox background.

Figure 6

A game environment depicting a rundown urban area with buildings in disrepair. A small kiosk labeled

Figure 6. Snapshot from the VR artwork “Destiny”⁶,⁷. The figure shows elements of a gas station rendered from 3D objects, whereas the background is a 2D AI-generated skybox (BlockadeLabs).

Several projects propose to aid the development of interactive game narratives, including Expressionist (Ryan et al., 2016) and Comme il Faut (McCoy et al., 2011). Recent work has also begun leveraging LLMs to automatically generate scenes and content. Chen et al. (2024) developed a tool that facilitates VR storytelling by mapping events to visuals and animations, utilizing action-trigger mechanisms. Other platforms also explore related challenges. For instance, Beinema et al. (2021) present Agents United, an open platform for multiagent conversation systems. Collectively, these advances illustrate the growing sophistication and diversity of tools aimed at streamlining and enriching narrative and content creation processes in immersive environments.

3.1.2 Image-to-XR

A second avenue through which AI can facilitate user-generated XR content is the automatic conversion of conventional images and videos into 3D content. Recent years have witnessed considerable advances in methods that take one or multiple 2D inputs—such as photographs or video footage—and reconstruct accurate 3D models of scenes, objects, or human figures (Nguyen-Phuoc et al., 2020; Mildenhall et al., 2021; Sanghi et al., 2022). Techniques like neural radiance fields (NeRFs), multi-view stereo (MVS), monocular depth estimation, and Gaussian splats, have all contributed to the gradual convergence of computer vision and graphics, enabling the generation of intricate and lifelike 3D scenes from relatively sparse inputs (Liu et al., 2019; Mildenhall et al., 2021). As a result, users with no specialized modeling skills can produce workable 3D assets simply by uploading their own image and video media, drastically reducing the barrier to entry for content creation in XR settings.

These underlying technologies are closely related to real-time reconstruction methods used in AR, where the system must analyze and interpret the physical environment on-the-fly to overlay virtual elements accurately. In contrast, generating 3D content from static or offline sources is often more straightforward since it does not require on-the-spot localization or dynamic alignment with a user’s physical surroundings. Essential components like hand and body tracking also play a vital role here. By harnessing pose estimation and skeletal tracking frameworks, AI can seamlessly integrate user movements into the reconstructed environment, adding yet another layer of realism and engagement (Kanazawa et al., 2018). Thus, we anticipate that AI will first be used to facilitate content creation and only later be used for robust real-time tracking and scene understanding for the public.

3.1.3 Text-to-XR-code

Users will soon be able to automatically convert textual descriptions to dynamic and interactive 3D scenes, including interaction code (e.g., in Unity C#). For example, Li et al. (2009) applied text-to-scene creation techniques with integration into digital content creation tools such as Autodesk Maya. Their pipeline parses free-form text into structured data, including skybox configuration, 3D models, scale, position, code, and animation directives. LLMs have made the development of such systems much easier; for example, Pollak et al. (2023) use LLMs to convert natural language scripts into structured JSON code. Such developments are expected both to shorten development times for experts and to lower the barrier to XR content creation.
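A pipeline of this kind has to check the structured output before building a scene from it, because generated JSON may be malformed or incomplete. The sketch below parses and validates a scene description using only the Python standard library. The schema (a `skybox` string and an `objects` list with `model`, `position`, and `scale` fields) is a hypothetical example and is not the format used in the cited work; the JSON string stands in for text that would be returned by a language model.

```python
import json

# Keys every object entry must provide (hypothetical schema).
REQUIRED_OBJECT_KEYS = {"model", "position", "scale"}


def load_scene(text):
    """Parse a JSON scene description and check its basic structure."""
    scene = json.loads(text)
    if not isinstance(scene, dict) or not isinstance(scene.get("objects"), list):
        raise ValueError("scene must be an object with an 'objects' list")
    for obj in scene["objects"]:
        if not isinstance(obj, dict):
            raise ValueError("each entry in 'objects' must be an object")
        missing = REQUIRED_OBJECT_KEYS - obj.keys()
        if missing:
            raise ValueError(f"object is missing keys: {sorted(missing)}")
        if len(obj["position"]) != 3:
            raise ValueError("position must have three coordinates")
    return scene


# Stand-in for the text returned by a language model.
generated = """
{
  "skybox": "sunset",
  "objects": [
    {"model": "oak_tree", "position": [2.0, 0.0, -5.0], "scale": 1.5},
    {"model": "bench", "position": [0.0, 0.0, -3.0], "scale": 1.0}
  ]
}
"""
scene = load_scene(generated)
print([obj["model"] for obj in scene["objects"]])  # ['oak_tree', 'bench']
```

Rejecting invalid output at this stage allows the request to be retried or repaired before any assets are loaded into the engine.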

3.2 AI-based virtual humans

Contemporary AI models can now synthesize believable, context-aware virtual agents that not only appear photorealistic but can also respond adaptively to user inputs—Swartout et al. (2006) provide an early review of the challenges involved. Both verbal and non-verbal communication modalities must be carefully orchestrated for these agents to engage naturally with human users. These agents should be able to engage in the equivalent of face-to-face dialogue with real humans, but they should also be able to take part in group discussions (Figure 7).

Figure 7

A 3D-rendered scene depicts four people seated around a restaurant table with white tablecloths. They appear engaged in conversation while meals are served on red trays. The setting features stone walls, wooden flooring, and other tables with similar setups in the background.

Figure 7. AI-controlled virtual human, represented as Albert Einstein, taking part in a multi-user discussion in a virtual restaurant. The virtual agent listens to the conversation and is expected to respond appropriately.

Considerable research has addressed dialogue systems over the years, ranging from purely text-based to audio-centric interactions and embodied virtual agents. Nevertheless, creating animated virtual agents that can participate in fluid conversation remains a multifaceted challenge. Core technical hurdles include robust speech recognition, effective natural language understanding and management, speech generation, turn-taking, and the alignment of verbal content with synchronized non-verbal cues such as facial expressions, gaze, and gestures. Foundational work by Cassell et al. (2000) laid the groundwork for integrating speech and gesture, illustrating that speech–gesture coordination is critical for the perception of a coherent, believable agent. Advances in deep neural networks (DNNs) have since enabled non-incremental progress, as outlined in the next subsections.

Integrating these advances within immersive XR environments (encompassing VR, AR, and MR) introduces further challenges into agent design, development, and evaluation. The evaluation of non-immersive virtual agents displayed on flat screens frequently focuses on photorealism. Immersive VR scenarios, however, are often evaluated for the sense of presence, or behavioral realism (Slater, 2009). In these contexts, subtleties of proxemics and eye gaze become paramount, influencing user perceptions and social presence (Bailenson et al., 2004). Achieving high levels of behavioral fidelity can be even more challenging when integrating LLM-based virtual agents into live AR settings, as virtual humans must convincingly inhabit the user’s physical environment and dynamically adjust to its evolving context (Figures 7–8).

Figure 8

Three people are standing indoors. The individual on the left wears a light jacket over a black shirt. The middle figure appears to be a robot with a human-like appearance, wearing a white shirt. The person on the right is dressed in a black shirt. There is a computer and a colorful wall display in the background.

Figure 8. Two humans and an augmented virtual human in a physical workspace, as seen through an Apple Vision Pro device.

3.2.1 Dialogue

A range of methodologies have been explored to manage dialogue interactions within VR. One common experimental technique is the Wizard-of-Oz paradigm (Dahlbäck et al., 1993) in which human operators discreetly control a virtual human’s verbal responses. This approach has been applied effectively in numerous studies (e.g., De Rosis et al., 2003; Nakash et al., 2022) to investigate user reactions, conversational flow, and social dynamics, all without the complexity and unpredictability of fully autonomous systems.

Another prominent strategy involves structured dialogue approaches that separate natural language understanding (NLU) from response generation. In such frameworks, advanced machine learning methods parse user utterances into discrete semantic “intents” which are then mapped to scripted responses. This architecture allows for more controlled and predictable interactions. For example, Traum et al. (2015) demonstrated the viability of this approach in their reconstruction of Holocaust survivor testimonies, where the greatest technical challenge was developing an NLU component capable of mapping diverse, open-ended user queries to the most contextually appropriate recorded responses.
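The separation between understanding and response selection described above can be illustrated with a minimal sketch. It is not the system of Traum et al. (2015): the intent names, example phrasings, scripted replies, similarity threshold, and the use of simple bag-of-words cosine similarity in place of a trained NLU model are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical intents, example phrasings, and scripted replies (invented for illustration).
INTENTS = {
    "greeting": {
        "examples": ["hello there", "hi how are you", "good morning"],
        "response": "Hello, it is good to meet you.",
    },
    "origin": {
        "examples": ["where were you born", "where did you grow up", "where are you from"],
        "response": "I was born in a small town.",
    },
    "farewell": {
        "examples": ["goodbye", "see you later", "thank you and bye"],
        "response": "Goodbye, and thank you for listening.",
    },
}
FALLBACK = "I am sorry, could you rephrase that?"


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_intent(utterance, threshold=0.3):
    """NLU step: return the best-matching intent name, or None if nothing is similar enough."""
    query = bag_of_words(utterance)
    best_intent, best_score = None, 0.0
    for name, spec in INTENTS.items():
        for example in spec["examples"]:
            score = cosine(query, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = name, score
    return best_intent if best_score >= threshold else None


def respond(utterance):
    """Response step: map the detected intent to its scripted reply."""
    intent = classify_intent(utterance)
    return INTENTS[intent]["response"] if intent else FALLBACK
```

Because every reply comes from a fixed table, the agent can never say anything that was not authored in advance, which is the predictability this architecture trades against coverage of open-ended queries.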

LLMs open new opportunities for supporting nuanced and context-aware dialogue within XR environments, enabling social interactions that extend beyond simple question-and-answer scenarios. Models such as GPT-3 and its successors have demonstrated remarkable zero-shot performance across a range of tasks, leading to the development of “prompt engineering”—the practice of carefully designing input prompts to guide the model’s output toward a desired format or style. For instance, Reynolds and McDonell (2021) showed how providing strategically constructed input sequences to a machine translation model can yield superior performance compared to straightforward zero-shot or few-shot approaches.

Despite steady progress, LLMs still exhibit several well-documented weaknesses, many of which are likely to impact dialogue. Recent evaluations on multi-hop benchmarks show that even state-of-the-art models succeed on the individual hops yet falter when the intermediate inference must be chained: accuracy on two-step factual queries drops by 25–35 percentage points relative to single-hop baselines (Biran et al., 2024). At the same time, a growing body of work demonstrates that GPT-4 and its peers can reproduce or amplify demographic biases, such as in clinical decision-making (Zack et al., 2024), even when prompts are carefully controlled. Maintaining a stable persona is another open issue; LLM responses drift in sentiment and role consistency over long dialogues, prompting proposals for dedicated “persona memory” modules or steerability objectives (Ji et al., 2025).

Real-time voice interaction introduces an additional constraint—latency. Human speakers expect turn-taking delays below ∼300 ms, yet naively streaming an LLM often incurs multi-second stalls. Recent work explores incremental recognition plus speculative decoding (Jacoby et al., 2024) or full-duplex control-token schemes that let the model decide when to begin speaking before the user has finished (Wang et al., 2024), cutting average lag by 40%–60% while preserving response quality. Response-delay reduction techniques such as adaptive early-exit, quantized inference, and GPU/TPU batch-fusion remain an active research frontier (Kamioka et al., 2024).
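The effect of streaming on response delay can be made concrete with some simple arithmetic. The sketch below is an illustration under stated assumptions: the three-stage pipeline, the `Stage` structure, and all timing values are hypothetical placeholders, not measurements from any of the cited systems. It compares a pipeline in which each stage waits for the previous one to finish with one in which each stage starts on the first partial output of the previous one.

```python
from dataclasses import dataclass

TURN_TAKING_BUDGET_MS = 300  # approximate target mentioned in the text


@dataclass
class Stage:
    name: str
    first_chunk_ms: float  # delay until the stage emits its first partial output
    complete_ms: float     # delay until the stage has emitted all of its output


# Placeholder values for illustration only; they are NOT measurements.
PIPELINE = [
    Stage("speech recognition", first_chunk_ms=80, complete_ms=200),
    Stage("LLM generation", first_chunk_ms=150, complete_ms=1800),
    Stage("speech synthesis", first_chunk_ms=60, complete_ms=900),
]


def sequential_latency_ms(stages):
    """Time to first audio when each stage waits for the previous one to finish."""
    return sum(stage.complete_ms for stage in stages)


def streaming_latency_ms(stages):
    """Time to first audio when each stage starts on the previous stage's first chunk."""
    return sum(stage.first_chunk_ms for stage in stages)


def within_budget(latency_ms, budget_ms=TURN_TAKING_BUDGET_MS):
    """Check a latency against the turn-taking budget."""
    return latency_ms <= budget_ms
```

With these placeholder values the sequential pipeline needs 2,900 ms before the agent starts speaking, whereas the streaming pipeline needs 290 ms. The point is only that the delay until first audio depends on each stage's first output rather than on its total runtime.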

3.2.2 Non-verbal behavior

Looking forward, moving from 2D “deepfake”-style manipulations into fully immersive 3D and VR environments introduces a complex array of new opportunities and challenges. In 2D media, manipulation typically involves altering facial features or lip-syncing to create realistic but flat, screen-based characters. The shift to 3D—and particularly to VR—involves the generation and control of three-dimensional avatars that can occupy shared, persistent virtual spaces. These avatars must seamlessly integrate multiple modalities, including facial expressions, body movements, gaze, and spatial audio. The complexity of achieving believable interactions in three dimensions and within VR goes well beyond simple facial mapping, necessitating advanced models that can adapt to user viewpoints, dynamic lighting, and interactive elements (Lombardi et al., 2018; Mildenhall et al., 2021).

Dyadic interactions can be viewed as a constellation of communicative channels, with each participant possessing multiple input and output streams. In the context of a human interacting with a virtual human, we propose that the primary modalities include text, voice, facial expressions, and full-body skeletal dynamics (i.e., gestures and postural adjustments). Supporting these interactions at a high level of fidelity requires the conditional and synchronized generation of all four channels, influenced not only by the human interlocutor’s current behavior but also by the virtual human’s recent communicative history. While fully integrated systems of this kind remain a target for ongoing research, individual components have already reached impressive quality. For instance, advanced dialogue modeling (Cassell et al., 2000; Traum et al., 2015), high-quality speech synthesis (Zen et al., 2013; Shen et al., 2018), expressive facial animation (Cao et al., 2013), and gesture generation (Pelachaud, 2009) have each shown substantial progress over the years, and contemporary DNN-based systems are achieving very high-quality results. Multimodal frameworks predicting listener backchannels (Morency et al., 2010) and deep learning methods for character motion editing (Holden et al., 2016) further suggest that a comprehensive, real-time solution may be achievable in the near future. Nevertheless, current limitations—particularly those related to achieving low-latency, tightly-coupled responses across multiple modalities—must be addressed before truly seamless dyadic interactions with virtual humans become commonplace.

Recent advances in foundation models suggest a general recipe for building “AI” systems: first, establish model priors through large-scale data-driven self-supervised pretraining, and then refine the model toward specific goals through additional targeted training. This paradigm effectively combines imitation learning—learning from what humans have done—with goal-directed optimization strategies such as reinforcement learning. In the domain of language, the pretraining phase corresponds to predicting what a human might say in a given context, forming the basis of large language models (LLMs) (Brown et al., 2020). Analogously, in the domain of non-verbal behavior, this might involve learning to produce facial expressions or gestures that a typical person would display in the same situation (e.g., Ginosar et al., 2019).

However, imitation alone is rarely sufficient. Much like in language modelling—where post-training techniques such as instruction tuning (Ouyang et al., 2022), reinforcement learning with human feedback (RLHF) (Christiano et al., 2017), and preference learning (Ziegler et al., 2019) have been applied to align models with desired outcomes—non-verbal behaviour generation may also benefit from goal-oriented approaches. To date, most research has focused on the first phase: modelling what a person would do in a given context. However, an important next step lies in exploring how non-verbal signals can be selected or adapted to serve specific communicative goals.

Initial efforts in this direction include research that applies reinforcement learning and similar strategies to gesture and expression generation (Friedman and Gillies, 2005; Kwiatkowski et al., 2022), although such approaches remain underexplored. Looking ahead, incorporating goal-directed behavior into non-verbal communication models could substantially enhance the effectiveness and realism of virtual agents, enabling more purposeful and contextually aware interactions.

3.2.3 Hybrid frameworks

In the metaverse, participants are represented by avatars, effectively separating the controlling entity—the human user—from the virtual body that others perceive. Virtual humans, as discussed previously, offer an alternative scenario in which the virtual body is controlled entirely by software. Between these two extremes lies a spectrum of hybrid configurations wherein certain communicative channels are driven directly by the human participant while others are autonomously generated by AI.

For example, a metaverse attendee might choose to participate in a virtual meeting solely through voice, opting not to control body gestures or facial expressions—perhaps due to their physical location (e.g., commuting), comfort (e.g., lying in bed), or ability constraints. In such a situation, a conditional generation network, as described above, can synthesize the missing channels of non-verbal expression, ensuring that other participants experience a coherent, embodied presence.

Similarly, in XR scenarios, a participant’s real facial expressions might be obscured by a head-mounted display (HMD). While one approach is to track and reconstruct these facial expressions directly from sensor data, another viable option is to employ a generative model to produce plausible expressions that align with their voice and contextual cues. Such methods enable a more seamless, believable presence, even when direct tracking is not feasible. Of course, using AI to interpolate or extrapolate from sensor data carries the risk of misrepresenting the participant. Just as AI automatically deciding to modify the content of your speech may be highly undesirable, the unintended modification of non-verbal behaviour can also be harmful; for example, consider an avatar smiling at a tragic event due to AI misinterpretation of sensor data.
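The hybrid configurations described above amount to a per-channel routing decision: use the human-driven signal where one is available and fall back to a generator otherwise. The following is a minimal sketch under stated assumptions. The channel names, the `ChannelFrame` structure, the string placeholders standing in for real audio or animation data, and the stub generators are all hypothetical. Recording the `source` of each channel is likewise an illustrative design choice, suggested by the misrepresentation risk noted above, not a feature of any cited system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

CHANNELS = ("voice", "face", "body")  # hypothetical channel names


@dataclass
class ChannelFrame:
    data: str    # placeholder for real audio or animation data
    source: str  # "human" if tracked from the user, "generated" if synthesized


def compose_avatar_frame(
    tracked: Dict[str, Optional[str]],
    generators: Dict[str, Callable[[Dict[str, str]], str]],
) -> Dict[str, ChannelFrame]:
    """Fill each channel from tracking if present, otherwise from a generator.

    Generators are conditioned on the channels the human is actually driving.
    """
    context = {name: value for name, value in tracked.items() if value is not None}
    frame: Dict[str, ChannelFrame] = {}
    for channel in CHANNELS:
        if channel in context:
            frame[channel] = ChannelFrame(context[channel], "human")
        elif channel in generators:
            frame[channel] = ChannelFrame(generators[channel](context), "generated")
    return frame


# Stub generators standing in for a conditional generation network.
def stub_face(context: Dict[str, str]) -> str:
    return "face animation conditioned on: " + ", ".join(sorted(context))


def stub_body(context: Dict[str, str]) -> str:
    return "body animation conditioned on: " + ", ".join(sorted(context))


# A voice-only participant: face and body are not tracked.
voice_only = {"voice": "audio-chunk-001", "face": None, "body": None}
frame = compose_avatar_frame(voice_only, {"face": stub_face, "body": stub_body})
```

Keeping the provenance of each channel explicit would let a platform, if it chose to, indicate to other participants which parts of an avatar's behaviour were synthesized.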

These frameworks also extend to more ambitious configurations, such as being “present” in multiple virtual spaces simultaneously. Kishore et al. (2016) propose a hybrid model that leverages this division of channels to facilitate presence in several locations at once. Friedman and Hasler (2016) similarly describe scenarios in which users strategically allocate control over different communicative channels, enabling them to manage multiple engagements concurrently. Together, these examples demonstrate the flexibility and potential of hybrid avatar control, laying the groundwork for rich, dynamic, and context-sensitive presence in the metaverse.

4 Societal impact

4.1 Education

XR technologies hold transformative potential for both professional training and curriculum-based education, offering immersive and interactive experiences tailored to diverse learning needs. Despite challenges such as cost, accessibility, and technological limitations, the evidence underscores their capacity to enhance engagement, foster creativity, and improve learning outcomes across contexts. By leveraging shared principles and addressing existing barriers, XR can create a future where learning is more dynamic, interactive, inclusive, and impactful. However, many considerations and questions are raised throughout this journey of discovering the power of XR learning. It is imperative to conduct research not only into technological solutions but to also explore new design principles, take ethical considerations into account, and discover the didactic methodologies that have true and lasting learning effects.

There are many cases for using the metaverse in various educational settings: primary school students could visit remote places virtually to stimulate their interest in geography, secondary students could travel back in time to experience key historical events, and university students could run simulations where they can visualize molecules and interact with them in a 3D setting.

Key motivations for such applications have been identified as experiential and gamified learning (Marougkas et al., 2023; Lampropoulos and Kinshuk, 2024). From a narrative perspective, simulated learning environments can be categorized as adventure world (narrative-driven quests), simulation world (real-world simulations), creative world (open-ended creative spaces), role-playing world (perspective-taking), and collaborative world (teamworking) (Damaševičius and Sidekerskienė, 2024). In particular, HMDs are recognized as useful for skills acquisition, whether cognitive, psychomotor, or affective (Jensen and Konradsen, 2018).

Beyond the obvious educational organizations, VR has also been widely adopted for professional skills training; industrial training is a major application as it allows participants to learn in a simulated environment that is controlled and safe. Of the 60 studies on professional training reviewed by Renganayagalu et al. (2021), 17 concerned industrial training, covering assembly, maintenance, visual inspection, procedure training, and spatial memory. Other areas of professional training that they discussed included firefighting, safety and emergency preparedness, defence, aerospace and aviation, and healthcare.

Beck et al. (2024) carried out a meta-review of 47 review papers on immersive learning, focusing on the educational practices and strategies of teachers rather than on outcomes. They notably underlined, with regard to the metaverse, that immersive learning encouraged active learning, collaboration, engagement, and scaffolding. Mystakidis and Lympouridis (2024) focused in their literature review on the practices that are enabled by the metaverse and sought to elicit the various instructional design models that could be used. First, replication consists of using the same pedagogical methods (e.g., problem-based) as in classical learning. Second, adaptation focuses on specific cases in which the metaverse, considered as a medium, can be effective (e.g., a situation that is too dangerous, or when social interactions are necessary). Third, reconceptualization calls for new approaches to learning to be designed, for instance by making use of avatar freedom, social connections, discovery, and experiential learning. The authors conclude that “…instructional designers and educators are encouraged to think big; the metaverse can become literally their magic wand to construct rich and emotion-filled, memorable experiences”. The metaverse is recognized as having the potential to disrupt teaching paradigms, but Mystakidis and Lympouridis (2024) underline that this would require both a change in the mindset of teachers and clear institutional visions for those changes.

Uribe et al. (2024) compared four metaverse tools used in class activity. They showed that avatar expressiveness is important, as well as naturalness of movement (rather than liberty of movement), and that the interaction possibilities that are provided to the students should be adapted to the on-going task.

In this section we examine these topics and provide a very brief overview of learning in immersive environments. We highlight the possibilities and the hurdles to overcome as we continue to strive toward a metaverse for learning.

4.1.1 Immersive learning

Virtual environments can bridge the gap between abstract concepts and tangible understanding by providing the learner with an interactive and immersive environment rather than explanations and words or visualizations represented in two dimensions. Static explanations require that students bridge concepts in their minds rather than through tangible examples and experiences, often leading to dissonance in integrating learning and benefitting from its context. In real time, learning can occur faster, with less mental friction and with a higher degree of lasting impact. XR has the potential to not only be interactive but also immerse the learner in the exact environment that is best suited to deliver a specific learning experience. In an industrial AR scenario, a participant can gain superpowers to see through walls and machines, or in a classroom students can see 3D projections on a classroom table that they can gather around. In VR, the whole body takes part in the experience, which has an impact on the spatial parts of the brain. With both the visual and auditory senses engaged (and potentially touch with the use of haptics) the brain starts to believe that the body actually occupies the virtual space (i.e., presence). This fundamentally changes how the body learns from its surroundings. For example, seeing a polar bear on a screen is very different from standing next to one in VR. The spatial relationship between body and environment begins to actively shape the learning experience.

4.1.2 Use cases: school and professional setting

DICE (dangerous, impossible, counterproductive, or expensive and rare) (Bailenson, 2018) is a lens through which XR use cases for both professional education and schools can be viewed. Learning scenarios that are dangerous, impossible, counterproductive, or expensive are likely to be better suited to practice or learning in XR than in real life. In the school system and in professional education, the technology may be viewed differently and used to achieve different and sometimes overlapping goals. Presented below are distinctions and similarities between these two domains.

4.1.2.1 Professional education

In professional settings, XR technologies are often used to standardize training and align workers with specific ways of thinking or performing tasks. These applications emphasize immersive experiences that transfer critical skills or ensure compliance with safety protocols, manufacturing guidelines, and so on. In industry, XR environments are particularly effective for tasks that are hazardous, complex, or costly to replicate in the real world. VR simulations in engineering and manufacturing allow workers to practice operating heavy machinery in a safe, controlled virtual environment (Renganayagalu et al., 2021). XR can also help where time and place are limiting factors. XR may, for example, enable training for production personnel on a digital factory floor that only exists as a 3D drawing and has not yet been built in the real world. When the factory is built, workers can start working straight away instead of training. Furthermore, as new personnel are onboarded, the real factory can remain in production while the XR environment continues to serve as the training platform, increasing efficiency and saving both time and money in the long term. Research also indicates that VR training boosts confidence and skills transfer more than traditional video training, particularly for new employees (Hoang et al., 2022). VR may also significantly reduce additional training-related costs by eliminating the need for physical materials or on-site simulations. For instance, Magar and Suk (2020) highlighted the efficiency of VR in training professionals without the logistical limitations of real-world setups, making it particularly valuable for industries like aviation and medicine. Beyond practical benefits, Lohre et al. (2020) also demonstrated the effectiveness of immersive VR in teaching complex surgical procedures, showing faster skill acquisition and higher performance compared to traditional methods. The effects and uses of immersive training are being investigated in everything from smart factories (Industry 4.0 and beyond) and traditional industries like concrete, plaster, and paper production to car manufacturing and the military (Renganayagalu et al., 2021).

4.1.2.2 School education

In schools, XR offers an unprecedented opportunity to bring abstract and complex concepts to life. Unlike professional settings, where XR is often used to train for specific tasks, curriculum-based education leverages the technology to foster exploration, creativity, and deeper understanding, both for younger students and lifelong learners. For instance, Chao and Chang (2020) demonstrated that VR significantly improved spatial reasoning and mathematical understanding among elementary school students by transforming two-dimensional problems into three-dimensional explorations. In a multiuser virtual geometry lab, students can pick up a 2D shape (such as a triangle) and view it directly from any angle. Holding an object that has no third dimension is impossible in the real world, but it is possible in VR. Such an experience, especially when the student can experience the perceived thinness of the object in their own hand, leaves a lasting impression and may lead to a fundamental understanding of geometry that was not there before. Zhang and Wang (2021) showed how inquiry-based learning with VR enhances motivation and scientific literacy while providing safe and repeatable experiments. In science education, students may virtually shrink to the size of molecules to observe chemical bonds or simulate physics experiments in virtual labs. Gamification (using the tools, visualization, and mechanics of the games industry) and narrative-driven VR experiences have proven effective in increasing student engagement. For example, Merchant et al. (2014) found that VR-based educational games improve learning outcomes, particularly when students play individually rather than in groups. Similarly, VR’s ability to immerse students in historical or cultural settings enhances creativity and broadens perspectives (Tilhou et al., 2020).

Bringing previously abstract learning into an interactive and immersive experience fundamentally shifts how a lesson is processed by the learner, but it also has unexpected payoffs. In interviews with Swedish teachers (conducted by Research Institutes of Sweden in 2024), students as young as 10 years old were reported to improve their communication skills with each other as they guided each other through VR interaction basics. This communication practice has lasting effects on the students and on the dynamic of the class outside of VR. In other examples, students who had previously been reported as so introverted that they were unable to take part in the classroom were suddenly talking, explaining, and collaborating with other students as they entered a virtual environment where the social rules of the classroom can be remade when acting through an avatar.

4.1.3 Hurdles and thresholds

While professional and curriculum-based XR education often differ in focus, they share common principles for learning, which also unite many of their challenges.

There are benefits from experiential and gamified learning that foster engagement and retention when designed effectively. For instance, gamification strategies used in professional training to enhance skill transfer may inspire methods for improving engagement in schools (Kyaw et al., 2019). On the other hand, an over-reliance on surface learning—such as scoring highly in gamified training modules without true comprehension—can undermine long-term skill retention (Grassini et al., 2020). Many VR applications focus on knowledge acquisition without sufficiently fostering higher-order cognitive skills such as critical thinking (Zhang and Wang, 2021), highlighting the importance of careful design decisions for XR and gamification in general.

XR, like gamification, may show a drop in motivational effectiveness over time as its novelty wears off and students become accustomed to the technology (Ratinho and Martins, 2023). The “novelty effect” (the excitement of entering an immersive environment for the first time, for example) has been identified as a reason for poor learning outcomes during first encounters with VR. The initial excitement can interfere with learning focus; however, structured tutorials can help mitigate this and transition users into more productive engagement (Miguel-Alonso et al., 2024). Low-quality implementation or misdirected purpose will fail to achieve meaningful learning. Sustaining engagement that leads to true learning requires thoughtful curriculum design and pedagogical support.

Studies have also shown that while VR significantly enhances the sense of presence compared to traditional desktop simulations, it can also increase cognitive load, potentially reducing learning outcomes. EEG measures have indicated that VR environments could be cognitively overwhelming, which might hinder knowledge construction in some scenarios (Makransky et al., 2019). Other studies contradict the very interactive nature of VR learning environments and indicate that non-interactive VR may yield better learning results than interactive VR, suggesting that certain types of immersion may detract from rather than enhance learning (Loureiro Krassmann et al., 2020).

Teachers also report difficulty implementing and controlling the new technology in the classroom. The introduction of XR is often blocked by technical barriers, inadequate training, and infrastructure limitations. These issues, combined with the demanding upkeep of the equipment, contribute to stress and reluctance among educators to adopt XR widely in K–12 classrooms.

Teachers may also feel anxious and report a lack of control when letting students interact with each other in a potentially endless virtual space, not knowing where students are and what they do at any given time. There is a worry (which has also been observed in studies) that students potentially disturb or harass each other without the teacher knowing that it is happening. This is especially true when teachers are new to the experience whereas the students, in contrast, quickly learn how to use the tools offered by the virtual world. This dynamic contributes to a perceived loss of control in the classroom, and thus targeted professional development is required to equip teachers for the new digital environments (Rutten and Brouwer-Truijen, 2025).

In addition, the headsets need to be updated, charged, and cleaned, which is a challenge for teachers, who are often struggling to keep up with the learning schedule as it is. The lack of simple and affordable MDM (mobile device management) solutions is also a barrier. In professional contexts, but even more prominently in school-based education, the pricing for apps, subscriptions, and MDM systems is often too high or confusing, or requires payment through credit cards that schools seldom have access to. Resorting to cheap or free apps may mean higher availability for a teacher but may also mean the content is too generic to align with a specific curriculum or learning goals, and it is also seldom available in the native language (Meccawy, 2022).

Even before it is implemented in the learning environment, XR content is often difficult to produce because of the highly advanced technology involved in creating the most interactive scenes that reap the full benefits of a fully immersive learning experience. The potential upfront cost, along with cybersickness and technological limitations, poses a significant barrier to adoption (Jensen and Konradsen, 2018), which means many schools lack the resources to implement VR at scale (Maas and Hughes, 2020).

For XR to be effective in education, it is important that it is both well designed (high quality) and well applied (used in learning subjects where the strengths of XR are leveraged). The authenticity of learning experiences is significantly influenced by both the fidelity of the virtual environment and the design of learning tasks within it (Lowell, 2024).

In abstract subjects (such as philosophy or politics, which are not easily translated into 3D), XR may increase engagement, but the cognitive load or mismatch between medium and content can just as well reduce effectiveness if not carefully designed (Crogman et al., 2025).

XR should be applied in subjects where nonlinear storytelling, interaction, experimentation, and immersion help enhance learning in that particular subject. Geometry and spatial sciences, for example, benefit from the immersive, 3D nature of XR as it aligns closely with the spatial understanding required in geometry and similar disciplines. Learners can manipulate shapes and view mathematical objects in space, improving comprehension and retention (Lowell, 2024)—which is where 3D engines excel. Environmental and ecological education can also see strong learning gains because XR helps learners visualize complex, dynamic systems (e.g., marine ecosystems), enhancing understanding through embodied and interactive learning (Aguayo and Eames, 2023).

As mentioned earlier in this section, there can also be unexpected learning gains from implementing XR in schools. In interviews with teachers who introduced simple VR apps for 10-year-old students in the classroom (Swedish VR school, 2025), the teachers observed that students developed language and communication skills at an unprecedented rate. The students were excited to verbally share what they experienced in VR and teach each other how to interact in the new environment. This became an impromptu platform and attraction point for linguistic interaction where there previously had been none. This proved especially effective in fostering language development for newly arrived international students.

Another observation in the VäXR project (an immersive geometry lab developed by RISE)⁸ is that stay-at-home students—who have had little to no interaction with their fellow students for several years—are able to use the virtual space to re-engage with their school and other students. The middle step of entering a virtual learning environment allowed for pent-up nervousness and anxiety to dissolve and for the students to return to the real school environment.

4.1.4 Evolving VR vs. keeping VR simple

As technology keeps evolving and game engines become increasingly easy to use, we will enjoy more and more technologically advanced (and, if we choose, more realistic) virtual learning worlds to explore. Generative AI will impact the creation of XR not only back-stage (the production of the XR content, 3D models, writing, and world concepts) but also on-stage as it inhabits the virtual worlds alongside human learners in the form of AI-driven avatars (also known as immersive agents, or AI-driven NPCs). They will talk, point, guide, teach, and explain that which was previously difficult and time-consuming to understand. These subjects are becoming increasingly important research issues for ensuring the creation of ethical, safe, and trustworthy systems that the world can enjoy in peace, both inside the metaverse and outside it.

As we continue to work on breakthrough technology, we may also benefit from some of the simpler interactions that can convey the most impactful learning in VR. For example, the virtual geometry lab mentioned previously, where a student picks up a 2D shape and views it directly from any side, does not need to be powered by advanced AI systems and is straightforward to code. Picking up, scaling, and placing 3D objects such as spheres is not difficult to code either, but this simple tool can be instrumental for a teacher to convey the relationship between the planets in our solar system at a distance and scale that no 2D screen or poster is able to convey. Another example is to experience the concept of volume by standing next to a cubic centimetre, a cubic metre, and a cubic kilometre and experiencing the difference in scale between them. Again, such a thing is near impossible in the real world but is trivial to implement in VR and can leave a lasting memory, not as a lesson but as an experience where the student’s whole body takes part in absorbing the concept of scale and volume.
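To illustrate how little computation the solar system example requires, the sketch below scales real-world sizes and distances uniformly so that they fit a walkable virtual space. It is an illustration only: the function name, the choice of a 1 m Sun, and the data layout are assumptions, the radii and orbital distances are rounded approximate values, and no particular game engine API is implied.

```python
# Approximate, rounded real-world values in kilometres.
BODIES = {
    "Sun": {"radius_km": 695_700, "orbit_km": 0},
    "Earth": {"radius_km": 6_371, "orbit_km": 149_600_000},
    "Jupiter": {"radius_km": 69_911, "orbit_km": 778_500_000},
}


def scale_to_room(bodies, sun_diameter_m=1.0):
    """Scale all bodies uniformly so that the Sun has the given diameter in metres.

    Returns, for each body, its scaled diameter and its scaled distance
    from the Sun, both in metres.
    """
    km_per_metre = 2 * bodies["Sun"]["radius_km"] / sun_diameter_m
    return {
        name: {
            "diameter_m": 2 * body["radius_km"] / km_per_metre,
            "distance_m": body["orbit_km"] / km_per_metre,
        }
        for name, body in bodies.items()
    }


scaled = scale_to_room(BODIES)
for name, values in scaled.items():
    print(f"{name}: diameter {values['diameter_m']:.4f} m, "
          f"distance {values['distance_m']:.1f} m")
```

With a Sun 1 m across, the Earth shrinks to roughly 9 mm and sits more than 100 m away, while Jupiter is about 10 cm across and more than half a kilometre away. That relationship is hard to show on a poster but is simple to lay out in a virtual space that the learner can walk or teleport through.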

4.1.5 The digital democratization of learning

XR applications can be distributed and updated to supply learning environments for a virtually unlimited number of people at once. This mirrors distribution mechanisms in the gaming industry, enabling immersive learning experiences to scale rapidly and flexibly across geographies. Furthermore, interconnected metaverse platforms could allow learners and teachers to not only access new content seamlessly but also create and share their own learning experiences.

However, it is important to recognize that, as evidenced by the evolution of current massive open online course (MOOC) platforms, widespread access does not automatically imply the equitable availability of high-quality educational resources. Although many MOOCs offer free access to courses, a significant portion of the most comprehensive or specialized content is typically gated behind paywalls, and free offerings are often limited to introductory levels. Given the higher costs associated with developing immersive XR experiences compared to traditional online courses, it is likely that free access to high-quality VR-based educational content may be even more restricted.

Therefore, while the metaverse holds strong potential for democratizing access to innovative learning environments, achieving true educational equity will require conscious efforts, including supportive funding models, open-access initiatives, and collaborative development frameworks that reduce dependency on purely commercial motivations.

4.2 Healthcare and quality of life

4.2.1 Communication in healthcare

An overlapping area between education and professional training is applications in the healthcare domain. A recent review paper identified five key applications in medical training using VR: anatomy, procedural skills, surgical procedures, communication skills, and clinical decision-making (Tursø-Finnich et al., 2023). A total of 40 studies were included in this review, with half of them tackling surgical procedures and another nine addressing procedural skills, sharing similarities with general professional skills training as previously discussed. Six papers investigated anatomy learning in VR, and four clinical decision-making, including scenarios such as acute trauma life support, mass-shooting triage events, and patient respiratory distress. Only two of the 40 studies covered by this review focus on communication skills, with both looking at a scenario where participants (medical doctors) went through a virtual consultation to promote vaccinations (Real et al., 2017). Despite the low number of such papers in that review, communication has been recognized as a key element in medical training. Effective communication is crucial for patient-centred care in medicine, allowing more room for patient satisfaction, improved quality of care, cost-effectiveness, and less space for medical errors (Kwame and Petrucka, 2021). The low number of studies in medical communication training is probably due to the technical challenges in simulating realistic social interactions. However, several studies in this domain have provided evidence that the combination of immersive VR and simulated virtual human characters, although with limited capacity in their responses, could still be effective in triggering realistic responses in medical doctors. In one of the earliest studies using immersive VR in this area, general practitioners (GPs, similar to family doctors in the US) were confronted with virtual patients asking for antibiotics which were not medically indicated.
Of the 21 medical doctors, despite knowing that the patients were simulated and with a clear case that antibiotics should not be prescribed, very few were able to resist the demand toward the end of the VR experience (Pan et al., 2016). In a more recent study, medical doctors and nurses in a pediatric hospital had to inform the parent of a young patient in the waiting room of the cancellation of a routine process due to an emergency. Their performances were recorded and replayed in VR—half of them watched the replay inside the body of the virtual parent while the other half from a third-person perspective (Collingwoode-Williams et al., 2024). In order to maintain a plausible illusion, both studies were conducted using the Wizard-of-Oz method, where an experimenter selected the most appropriate responses in real-time from a collection of pre-recorded utterances (Pan and Hamilton, 2018).

The greatest challenge in communication training is creating animated virtual characters that are expressive, believable, and able to respond to the participants in real-time. A more affordable method was used in a recent 360° video study, where medical doctors were put behind the eyes of their patients and experienced positive and negative communication recordings from the patients’ perspective (Hoek et al., 2023). However, this experience was passive and non-interactive. Recent advances in large language models (LLMs), discussed above, are likely to generate a new body of work, where the conversation between the trainee and the (virtual) conversational partner could go beyond pre-recorded responses. However, the latency of such LLMs and the ability to automatically interpret and generate the non-verbal aspects of the conversation remain challenges to creating believable scenarios in this area.

Medical application domains in the metaverse have been grouped into five topics (Chengoden et al., 2023). Medical education and training deals with primary or vocational education for professionals, who can carry out virtual experiments or procedures, and also with patient education. Medical visualization is related to collaboration between practitioners on the immersive analytics of patient data, models, or digital twins, distant monitoring and alerts, and also AR-assisted surgery. Telerehabilitation is based on the use of immersive technologies for therapy at home, such as for shoulder treatment (Álvarez de la Campa Crespo et al., 2023), with AI-based personalization. Virtual consultation and telemedicine corresponds to meetings between practitioners and patients as avatars in virtual environments, with audio and visual medical examinations being complemented by tactile or physiological data collected by the appropriate sensors (Massetti and Chiariello, 2023). Finally, social support corresponds to the use of the metaverse for group therapy, where patients can meet and share experiences (for instance, using VRChat to meet during COVID-19).

Telemedicine aims to provide distant clinical healthcare services using information technologies. The two main possibilities with regard to the use of the metaverse are telerehabilitation, where patients perform VR activities at home, and virtual teleconsultation, where practitioner and patient meet in VR.

4.2.2 Telerehabilitation and teleconsultation: toward teleclinics in the metaverse

Telerehabilitation allows patients to self-administer VR activities at home for a certain period of time while remaining in contact with a health institution. It has proven its value and feasibility, with medical results corresponding to those obtained in health facilities for post-stroke arm motor-impairment rehabilitation (Piron et al., 2008), VR exposure therapy (Levy et al., 2016), pulmonary rehabilitation (Jung et al., 2020), phantom limb pain reduction (Thøgersen et al., 2020), multiple sclerosis rehabilitation (Pagliari et al., 2021), and lower back and neck pain (Maddox et al., 2023; Orr et al., 2023). VR can then be effectively considered an “immersive therapeutic (ITx) delivery device” (Maddox et al., 2024). Nevertheless, although those exercises were supervised to some extent, be it by video conference or by regular exchanges with practitioners, they did not benefit from the potential of social VR and the metaverse. A first possibility deals with synchronous monitoring of the exercises by a specialist through their avatar, possibly with real-time dashboards constructed from data collected from VR devices or additional sensors (e.g., fine tracking of body positions, or the heart rate) so as to provide real-time feedback (e.g., correct a leg position) and exercise adaptation (e.g., lower the difficulty). A second possibility is related to collective exercise by patients who would meet in an exercise room, for instance, to carry out rehabilitation or tertiary prevention-related exercises while enjoying social relations, thus further enhancing adherence to the treatment.
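
The idea of exercise adaptation driven by sensor data can be sketched as a simple rule. The example below is a minimal illustration and not the method of any study cited above: the `SessionSample` fields, the function `adapt_difficulty`, and all thresholds (heart-rate ceiling, success-rate bounds, difficulty range) are hypothetical placeholders with no clinical validity. In a supervised session the practitioner would remain free to override such a suggestion.

```python
from dataclasses import dataclass


@dataclass
class SessionSample:
    """One monitoring sample from a rehabilitation session (hypothetical fields)."""
    heart_rate_bpm: float  # e.g., from an additional wearable sensor
    success_rate: float    # fraction of repetitions executed correctly, 0.0 to 1.0


def adapt_difficulty(
    level: int,
    sample: SessionSample,
    max_heart_rate_bpm: float = 130.0,  # placeholder threshold, not a clinical value
    low_success: float = 0.5,           # placeholder threshold
    high_success: float = 0.9,          # placeholder threshold
    min_level: int = 1,
    max_level: int = 10,
) -> int:
    """Suggest the next difficulty level from the latest sample."""
    # Lower the difficulty if the patient is over-exerted or struggling.
    if sample.heart_rate_bpm > max_heart_rate_bpm or sample.success_rate < low_success:
        return max(min_level, level - 1)
    # Raise the difficulty if the exercise has become too easy.
    if sample.success_rate > high_success:
        return min(max_level, level + 1)
    # Otherwise keep the current level.
    return level


if __name__ == "__main__":
    print(adapt_difficulty(5, SessionSample(heart_rate_bpm=140.0, success_rate=0.95)))
    print(adapt_difficulty(5, SessionSample(heart_rate_bpm=100.0, success_rate=0.95)))
```

Checking the safety condition before the progression condition means that a high heart rate lowers the difficulty even when the patient is performing well.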

For immersive clinical teleconsultation in the metaverse, in 2012 a pioneering remote neuro-rehabilitation system allowed a patient in VR to see the practitioner’s avatar and physically interact with his hand using an active haptic hand (Perez-Marcos et al., 2012). Another system successfully facilitated a follow-up of a patient presenting with vasospastic angina using social VR with ECG, blood pressure, and oxygen saturation sensors (Skalidis et al., 2022). Regarding mental health consultations and teletherapy, where the use of avatars may combine the benefits of face-to-face communication with anonymity (Baccon et al., 2019), a study comparing the experiences of psychotherapeutic counselling delivered to workers in remote locations showed that VR was easy to use and outperformed video conferencing for presence and the perceived realism of the session (Pedram et al., 2020). At the group level, a mindfulness instructor could lead group sessions of patients in VR (Cikajlo et al., 2017), and a music therapist could lead group singing interventions for people living with spinal cord injury (Tamplin et al., 2020). In conclusion, VR teleconsultation between patients and clinician avatars in the metaverse, possibly complemented by supplementary sensors and devices, has been shown to be feasible, although only in a few cases. It is, however, desirable, at least for the domains of mental health and rehabilitation, because impersonating and interacting with avatars in simulating social activities is “crucial to increasing the effectiveness of treatment” (Cerasa et al., 2022).

All these considerations hint at the possibility and need to design and build immersive teleclinics in the metaverse where patients and practitioners could meet, be it for diagnosis, follow-up meetings, group meetings, or exercise. There are a few principles for such teleclinics, adapted from Prié et al. (2025).

• They should allow immersive remote consultations between medical practitioners and patients as well as interaction between patients.

• They should be run by socio-economic actors, such as public or private hospitals or clinics, or telemedicine operators, who would make social VR consultation rooms available to their staff or lend them to independent practitioners.

• They would be composed of various spaces such as welcoming places, social rooms, waiting rooms, consultation rooms, and collective or individual exercise rooms.

• Patients and practitioners could perform various activities in the clinic, ranging from simple social interaction in consultation via avatars to remote examination with specific sensors, as well as a variety of exercises, either dedicated to assessment or remediation.

• Those activities could be carried out alone (e.g., cognitive training) or collectively (e.g., elbow rehabilitation) and be accompanied or not by practitioners in the activity rooms.

• Activities (e.g., exercises, tests) could be proposed by third-party vendors and bought or rented by the clinic.

• The use of an immersive teleclinic and immersive teleconsultation should be coherently integrated in patients’ healthcare paths. The paths would be composed of real and virtual teleconsultations.

• Patients could be at home or in local centres and be autonomous or require a handover assistant to be present.
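
The principles above describe a structure of operators, rooms, and activities that can be sketched as a small data model. The following is a hypothetical illustration only, not the architecture of the prototype by Prié et al. (2025). All class and field names (`Teleclinic`, `Room`, `Activity`, `RoomKind`) and the example operator and vendor names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class RoomKind(Enum):
    """Kinds of spaces listed in the principles (names are illustrative)."""
    WELCOME = "welcome"
    SOCIAL = "social"
    WAITING = "waiting"
    CONSULTATION = "consultation"
    EXERCISE = "exercise"


@dataclass
class Activity:
    name: str
    vendor: str               # third party that supplies the activity
    collective: bool          # True if carried out by a group of patients
    needs_practitioner: bool  # True if a practitioner must accompany it


@dataclass
class Room:
    kind: RoomKind
    activities: list[Activity] = field(default_factory=list)


@dataclass
class Teleclinic:
    operator: str  # e.g., a hospital, clinic, or telemedicine operator
    rooms: list[Room] = field(default_factory=list)

    def activities_in(self, kind: RoomKind) -> list[str]:
        """Names of all activities offered in rooms of the given kind."""
        return [a.name for r in self.rooms if r.kind is kind for a in r.activities]


if __name__ == "__main__":
    clinic = Teleclinic(
        operator="Example Hospital",  # hypothetical operator
        rooms=[
            Room(RoomKind.WAITING),
            Room(RoomKind.EXERCISE, [
                Activity("cognitive training", "VendorA", False, False),
                Activity("elbow rehabilitation", "VendorB", True, True),
            ]),
        ],
    )
    print(clinic.activities_in(RoomKind.EXERCISE))
```

Keeping activities as separate objects with a vendor field reflects the principle that exercises and tests could be supplied by third parties and bought or rented by the clinic.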

Prié et al. (2025) built a prototype of a teleclinic for neuropsychological assessment as a first step towards a medical metaverse based on social VR (Figure 9A). Numerous challenges remain, be they technological (e.g., which avatar quality is necessary for psychotherapy or for other medical practices? how to integrate an activity from a third party?), clinical (e.g., how to maintain the therapeutic alliance in social VR?), related to data management (e.g., how to securely access patient information? how to protect patient data?), related to the ecosystem (e.g., how to integrate multiple stakeholders and vendors to build actual teleclinics based on technical metaverse frameworks?), or legal and regulatory issues associated with such digital worlds. Indeed, several potential issues arise that may pose ethical or legal challenges. Some are related to the development of the metaverse in general, such as the unequal access to such teleclinics for patients who might suffer cybersickness or be unable to afford the cost of access (digital divide). Cyberbullying or other problems may also appear in such clinics, as in any immersive social environment, but the difference here would be that a clinic should, by definition, be a safe space, where “on-call” professionals ready to intervene are never far away. Another issue is that meeting medical practitioners or exercising online, while enabling access to healthcare, may also reinforce sedentary lifestyles, notably for vulnerable people (elderly, depressed, etc.). This calls for particular attention in designing teleclinic-based care pathways for such at-risk patients.

Figure 9. Examples of virtual medical consultation. (A) A proposal of a teleclinic for neuropsychological testing. Left: Waiting room where a patient can carry out various waiting activities. Middle: Consultation room where desk-based testing activities can take place (here Tower of London). Right: Independent environment for testing executive functions where the patient carries out a standing testing activity (here putting books on the right shelf) under the monitoring of the clinician; with permission from Prié et al. (2025). (B) Patients, clinicians, developers, and researchers in a shared space in immersive VR United during a focus group in the metaverse; with permission from Amestoy Alonso et al. (2024).

4.2.3 Case study of patients discussing their back pain treatment

Co-designing healthcare interventions with patients is crucial for ensuring their usability, acceptability, and effectiveness (Coulentianos et al., 2020). Focus groups thus provide a valuable approach for gathering diverse perspectives and shared experiences. However, traditional in-person focus groups can present logistical challenges for patients, particularly those undergoing rehabilitation with movement limitations or who are managing chronic pain and discomfort.

The metaverse offers a promising alternative by enabling virtual focus group meetings within shared virtual environments. In a pilot study conducted with patients with chronic lower back pain, a focus group was held within VR United, a virtual reality platform that allows multiple users to interact in real time (Oliva et al., 2023) (Figure 9B). Participants, including patients, medical doctors, developers, researchers, and physiotherapists, interacted through personalized (look-alike) avatars with upper body tracking based on Quest head tracking and controllers, with the use of inverse kinematics and audio-generated mouth animations (Amestoy Alonso et al., 2024). All participants had customized avatars generated from a front-face photograph of themselves that they had provided beforehand. This virtual setting allowed patients to participate from the comfort of their homes, eliminating the need for travel and associated logistical problems.

The structured discussion focused on patients’ experiences of independently using a virtual reality system for lower back pain at home for 1 to 3 weeks (Donegan et al., 2025). This system was based on the embodiment of the customized avatar and on the principle that training the virtual body has a positive impact on the physical body (Matamala-Gomez et al., 2022). In the case of chronic pain, virtual reality is effective for pain relief (see review of Matamala-Gomez et al., 2019). The patients provided valuable answers and feedback to the designers on usability and positive or negative experiences, and they also made numerous suggestions regarding exercises, limitations, and so on. Here we review not the feedback on the specific device but the feedback on the experience in the shared virtual space.

Participants reported positive experiences with the virtual focus group format, highlighting its convenience, sense of being in a shared space, and the naturalness of the interaction. Compared to traditional videoconferences, participants found the virtual environment less stressful and more engaging (Amestoy Alonso et al., 2024).

This use of the metaverse presents several potential advantages. First, it improves accessibility by enabling participation from diverse geographic locations and eliminating physical barriers. In the case of focus groups with patients, it facilitates the gathering of feedback from a larger number of patients than in-person meetings do. Second, the immersive nature of the metaverse can enhance patient engagement and motivation in a situation where patients value sharing the daily life problems that their chronic pain condition entails, including social isolation. Third, it can be more cost-effective by reducing the need for travel and associated expenses. Finally, the novelty factor can enhance a participant’s interest. We envision a future where VR systems become as ubiquitous as mobile phones are today, making it possible to scale up this use in focus groups.

However, challenges and limitations also exist. Technical issues such as hardware problems, software bugs, and poor internet connections can disrupt meetings and frustrate participants. Some participants may experience discomfort due to the weight of the headsets or cybersickness when using them for extended periods. Subtle non-verbal cues, such as facial expressions and eye gaze, may be less effectively conveyed in virtual environments, although face and eye tracking are improving. Despite these challenges, ongoing advances in VR/AR technology have the potential to address many of these limitations, improving the quality and effectiveness of virtual focus group interactions.

4.2.4 Quality of life and wellbeing

In the evolving landscape of wellbeing, especially in the wake of the COVID-19 pandemic, innovative solutions in the medical sector have gained prominence, redefining how healthcare and mental health therapies are delivered. The incorporation of technologies such as telemedicine, virtual consultation, robotic cooperation, and medical education employing virtual reality (VR) has become increasingly prevalent. These advances not only address the challenges posed by social isolation and lack of activity but also complement the pursuit of intellectual, social, and emotional wellbeing. Intellectual wellbeing, encompassing activities such as learning, creativity, and critical thinking, finds support in virtual education and intellectually stimulating virtual environments. Social wellbeing, rooted in positive relationships and social stability, benefits from the connectivity facilitated by telemedicine and virtual interactions. Moreover, as we navigate the digital frontier, the concept of the metaverse emerges as a groundbreaking tool with profound implications for medical, rehabilitation, and mental health treatment, presenting new avenues for enhancing overall wellbeing (Shi et al., 2023). As people engage with these technological advances, the blend of traditional wellbeing principles with cutting-edge medical approaches becomes pivotal in fostering a positive quality of life experience. Applying the metaverse environment in these fields can provide a real-time, immersive digital environment where patients, their carers, doctors, physicians, patient family members, pharmacists, and researchers can interact, collaborate, and participate in a variety of healthcare activities (Tang et al., 2022; Yang, 2023). Amongst other possibilities, the metaverse environment may improve the quality of treatment and noticeably reduce treatment costs (Chengoden et al., 2023).
It can also improve access to medical staff (doctors, physiotherapists, psychologists, and care assistants for the elderly) for all patients, regardless of their location (Mariano et al., 2022). As a result, unprecedented opportunities to enhance patient care, physiotherapy, or medical training arise. All these activities aim to increase patients’ quality of life and, in general, improve their wellbeing.

During remote medical rehabilitation and psychological consultations in the metaverse, patients and doctors or therapists interact with each other through their avatars (Chengoden et al., 2023; Šarić et al., 2024), as discussed above. Currently, participants can control their avatars using a head-mounted display and gloves. However, more sophisticated and less conspicuous tools, such as lightweight glasses and lenses, are beginning to appear to enhance the metaverse experience (Łysakowski et al., 2023a; b). Furthermore, continuous improvements in computer vision technology help doctors and therapists capture patients’ facial expressions and body language quickly and determine their conditions, thus increasing the accuracy of examination and of the diagnosis provided in the metaverse environment (Chengoden et al., 2023).

Moreover, the metaverse can serve as an effective tool for meditation, stress relief, mindfulness training, and cognitive therapies. The immersive and engaging nature of the metaverse virtual environment allows patients to be transported to serene and calming virtual environments while undergoing medical or treatment procedures, and thus helps them manage pain and anxiety more effectively (Koohang et al., 2023). However, it is important to recognize that excessive reliance on virtual environments for emotional regulation could risk reinforcing avoidance behaviors or limiting real-world coping skills, particularly in vulnerable populations.

The metaverse also has enormous potential in the treatment of anxiety disorders by helping recreate situations that cause the patient to feel anxious, such as flying in an airplane or public speaking (Din and Almogren, 2023; Meinlschmidt et al., 2023). Consequently, patients’ avatars can interact with stress-inducing situations in safe environments where conditions can be simulated and every aspect of the interaction can be safely monitored and controlled. Patients can thus face their fears in a gradual and controlled way, which helps reduce anxiety and improve their ability to cope with similar real-life situations. In this way, metaverse-based therapy can enhance overall wellbeing and contribute to better patient outcomes. Nevertheless, there are concerns that poorly designed virtual scenarios might inadvertently intensify anxiety or lead to emotional desensitization if not appropriately monitored by therapists.

A growing number of applications are being found for metaverse technology in immersive rehabilitation in neurological therapy, particularly for sensorimotor and cognitive impairments (Ventura et al., 2022). In such cases, patients learn to control their bodies using avatars to perform gestures and movements. Activities performed by avatars in the metaverse environment translate into improvements in patients’ real-life functioning. Particularly important here is that exercises in the metaverse are implemented for optimal movement patterns, helping patients perform real-life activities correctly. Also essential is the repetitive nature of the training provided by the metaverse environment and the possibility of realizing therapeutic scenarios that would be impossible to carry out in the real world, for example for reasons of patient safety.

Whereas rehabilitation, such as neurological post-stroke therapy, has established standards and exercise scenarios, virtual or augmented reality in the metaverse offers important enhancements (Phan et al., 2022; Veras et al., 2023) related to two principal aspects: (i) providing patients with a stronger motivation to execute the exercises on a regular basis and (ii) better control of how the exercises are executed. The first aspect is typically improved through the gamification of the exercises, which introduces an engaging and interactive dimension to the rehabilitation process that significantly benefits patients (Sun et al., 2023). By integrating game elements into therapy routines, such as VR simulations or interactive augmented reality apps, patients are motivated to actively participate in their exercises. This approach not only adds an element of enjoyment to the rehabilitation process but also encourages consistent and prolonged engagement. The competitive and goal-oriented nature of gamification provides a sense of achievement, boosting patients’ morale and promoting a positive mindset (Figure 10). Nevertheless, not all patients may respond equally well to VR-based exercises, and prolonged exposure to VR may cause cybersickness, fatigue, or even frustration due to the technological barriers and learning curves associated with sophisticated systems.

Figure 10. Example scenes from a VR application for post-stroke rehabilitation of upper limbs implemented on Meta Quest 3/Meta Quest Pro.

Additionally, real-time feedback and progress tracking within the gaming framework allow healthcare professionals to tailor rehabilitation programs more effectively, addressing specific neurological challenges and adapting exercises to individual needs. Augmented reality and VR technologies offer superior control in the execution of exercises for neurological rehabilitation, revolutionizing the rehabilitation process. Through augmented reality, patients can receive real-time visual and auditory cues superimposed on their physical surroundings, guiding them through exercises with precision. This immediate feedback enhances accuracy and ensures that patients perform movements correctly, reducing the risk of improper execution that could hinder recovery. On the other hand, VR immerses patients in computer-generated environments, allowing for a controlled and customizable virtual space where exercises can be executed. The immersive nature of VR enables therapists to simulate various scenarios and challenges and to tailor the rehabilitation experience to the specific needs and abilities of each patient. This level of control is particularly beneficial for patients with neurological disorders, as it ensures a safe yet challenging environment for rehabilitation (Cho et al., 2023). Furthermore, metaverse-related technologies facilitate remote monitoring by healthcare professionals, thus enabling real-time assessment of patients’ progress and the ability to adjust exercises accordingly. Tracking of eye motions, available in some headsets, allows determination of a patient’s focus of visual attention during the exercises, which helps relate the physical exercises to the mental activity of the patient (Holmes et al., 2010; Park et al., 2019). Similarly, accurate tracking of a patient’s hands can be beneficial to rehabilitation (Juan et al., 2023).
This dynamic and adaptable approach to neurological rehabilitation contributes to more effective and personalized treatment plans and ultimately leads to improved outcomes for patients on their path to recovery.
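
One simple way to estimate a patient’s focus of visual attention from eye-tracking data is to measure the angle between the gaze direction and the direction from the eye to an object of interest. The sketch below illustrates this geometric idea only. It is not taken from the cited studies or from any headset SDK, and the function names and the 5° threshold are hypothetical choices.

```python
import math


def angle_between_deg(gaze_dir, to_target):
    """Angle in degrees between a gaze direction and the eye-to-target direction."""
    dot = sum(g * t for g, t in zip(gaze_dir, to_target))
    norm = math.sqrt(sum(g * g for g in gaze_dir)) * math.sqrt(sum(t * t for t in to_target))
    if norm == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def attention_fraction(gaze_samples, to_target, threshold_deg=5.0):
    """Fraction of gaze samples falling within threshold_deg of the target direction.

    The threshold is an illustrative assumption, not a validated value.
    """
    if not gaze_samples:
        return 0.0
    on_target = sum(
        1 for gaze in gaze_samples if angle_between_deg(gaze, to_target) <= threshold_deg
    )
    return on_target / len(gaze_samples)


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0)  # target straight ahead of the eye
    samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.01, 1.0)]
    print(attention_fraction(samples, target))
```

A per-exercise attention fraction of this kind could be shown to a therapist alongside movement data. In practice the eye-to-target direction would need to be recomputed for every sample as the head and the object move.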

Whether the treatment is medical, rehabilitative, or psychological, patients often actively engage in their treatment and therapy plans in the metaverse environment. They monitor treatment progress, set goals, and receive ongoing support. Such involvement fosters a sense of responsibility for their health, quality of life, and wellbeing; this leads to better adherence to treatment plans and more favorable treatment outcomes.

It is notable that the metaverse significantly enhances the convenience and quality of life of patients, who can consult doctors and medical professionals from home. Metaverse technology also provides coordinated care between departments and hospitals, allowing world-class specialists from different disciplines to consult on patient cases in the metaverse, analyse patients’ conditions, and agree on a diagnosis based on multi-criteria assessment. This technology benefits people with limited mobility, those living in rural areas with acute shortages of medical specialists, and patients in remote areas who are usually compelled to travel long distances to medical specialists. However, the high costs of developing and maintaining metaverse platforms, combined with the technical requirements (high-speed internet, VR headsets, and continuous system updates), could exacerbate existing digital divides rather than mitigate them.

Although the use of the metaverse allows training, rehabilitation, therapy, and treatment to be intensified and thereby increases the satisfaction, quality of life, and wellbeing of those who use the technology, it can also have side effects.

Not all healthcare institutions are prepared at this point to implement metaverse technology seamlessly, creating infrastructure and technology integration challenges. Limited access most often affects smaller or underfunded healthcare facilities, which may find it challenging to invest in the technology needed to integrate with the metaverse. The cost of the hardware and software needed to equip a metaverse-enabled healthcare facility is a crucial challenge. In addition, most clinicians and patients do not yet have the tools to use the metaverse.

Not all patients have a positive attitude to receiving treatment online or remotely. Older populations may find adapting and embracing modern technologies more challenging, particularly the metaverse.

Moreover, concerns have arisen that, particularly for people struggling with mental health problems, spending hours in a metaverse environment can result in growing social phobia, complexes, or the loss of basic skills to function in the real world. The long-term impact of the metaverse on human functioning and child development is also unknown. This environment reduces the perceived distance between people but does not necessarily reduce loneliness, which may result in depression or other disorders. Also unexplored is the impact of the technology on people genetically predisposed to schizophrenia-type symptoms. Thus, further development of the metaverse should be monitored by specialists in psychology, sociology, and ethics.

Another aspect to consider is the management and storage of patient data. Doctors, physiotherapists, and psychologists have expressed concerns about privacy, ethics, and security as healthcare moves to the metaverse. The metaverse would rely heavily on collecting, storing, and transmitting sensitive health information, which raises significant challenges related to data privacy, cybersecurity, and compliance with healthcare regulations. Any breach or misuse of such data could have serious consequences for patient trust and safety. As such, addressing privacy and security issues and technical, legislative, and regulatory concerns is essential.

4.3 Consumers and companies

Another key aspect of the metaverse is understanding its potential impact on business and the economy. Figure 11 shows some views from Meta’s Horizon Worlds, including shops and entertainment outlets. This is the main way the metaverse may become a pervasive medium, as happened for the internet before. Although defining the metaverse remains a challenge due to its multidisciplinary nature and wide range of applications, the growing belief that it will create new opportunities for brands and companies suggests the importance for marketers of capitalising on these developments and integrating them into firms’ strategic plans (Dwivedi et al., 2022; Di Paolo et al., 2025). For this reason, it has become crucial to understand how and why the metaverse may be adopted by companies and, hence, by individuals (both as employees and as consumers). In a market economy, this may occur only if it can generate value for each party. The metaverse is thus expected to emphasize the significance of the experiential dimension in value exchange processes between brands and consumers (Giang Barrera and Shah, 2023). The profound impact of internet-driven technological advances on customer experience—the “set of interactions between a customer and a product, a company, or part of its organization” (Gentile et al., 2007) —has been readily apparent with the emergence of e-commerce, digital advertising, and social media engagement. Transformative technologies have reshaped traditional notions of customer interaction and become integral to business strategies. The metaverse thus introduces two novel elements that are intricately connected and are set to reshape how individuals engage with brand offerings.

Figure 11. Some snapshots from Horizon Worlds showing shops and entertainment areas.

First, customer experiences are becoming increasingly virtual. Virtual experiences occur in 3D technology-mediated environments, mirroring real-world affordances, interactions, and scenarios (Li et al., 2001). The second novel element is that customer experiences are becoming increasingly immersive. Immersion is the extent to which a medium can deliver an inclusive, extensive, surrounding, and vivid illusion of reality to a participant’s senses (Slater and Wilbur, 1997); this largely depends on the medium employed to experience the virtual content. Modern devices such as VR headsets are portable, wearable, and increasingly integrated into the human body, and are thus capable of extending sensory, cognitive, and motor functions (Flavián et al., 2019). Including three dimensions and immersion as the two core components of the novel consumer experience makes the metaverse distinguishable from previous technological innovations, paving the way for creating new sources of value for both consumers and companies. From a consumer standpoint, interacting in virtual spaces allows the replication of existing settings and activities from the physical world, thereby overcoming spatial and temporal constraints. For instance, virtual experiences have the potential to simulate authentic shopping sessions, enabling individuals to explore virtual retail environments from the convenience of their homes. Interacting in virtual spaces also enables the exploration of new forms of customer experience that were previously impossible, offering innovative and entertaining ways to browse and purchase products or services (Alcañiz et al., 2019). For instance, customers could engage with company offerings in imaginative or unconventional settings, radically transforming traditional ways of interacting with products and services (e.g., Bin Kim and Jung Choo, 2023).
In parallel, the immersive nature of VR technology is expected to create an authentic sense of active participation in virtual experiences, a phenomenon extensively studied in the past and referred to as “spatial presence” (Wirth et al., 2007). In recent years, the investigation of the effects of presence in marketing and consumer behaviour has garnered significant academic attention based on the hypothesis that presence can enhance the cognitive, emotional, and behavioral responses of consumers. Previous research in cyberpsychology shows how presence induced by immersive technologies is strongly linked to intensified emotional reactions (Riva et al., 2007)—a central dimension in consumer experience design. Presence induced by immersive media technology in a pre-purchase context is expected to influence various brand experience dimensions (Wedel et al., 2020). This perspective provides a solid basis for developing highly promising research streams on the impact of immersive and virtual technologies in marketing (Dwivedi et al., 2022).

The business sector can leverage the metaverse by adjusting its offerings and business models to align with this innovative technological shift. Several companies are already investing in marketing initiatives, including product placement, advertising, and branded events on platforms such as Roblox, The Sandbox, and Decentraland. These are virtual platforms where people can play, connect, and transact; they are thus considered embryonic forms of the forthcoming metaverse. The evolution from traditional to digital environments and now to the metaverse has been witnessed in prominent brands such as Gucci, Coca-Cola, and Nike. Other companies are building their distinct virtual presence by establishing proprietary platforms and spot initiatives to enrich brand offerings before, during, and after consumption. Regardless of the nature of the platform involved, these initiatives already highlight the latent potential of the metaverse and immersive technologies to generate value, together with their promise of profitability. However, several frictions and challenges prevent companies from including metaverse-related experiences in their marketing strategy. Limited consumer acceptance of immersive devices is widely recognized as one of those barriers (Herz and Rauschnabel, 2019) but still represents merely one facet of the overarching challenge. Providing consumers with immersive experiences usually requires significant investments from practitioners (Bowman and McMahan, 2007), who often face delayed returns in company revenues (Dincelli and Yayla, 2022). From this perspective, the lack of a clear understanding of consumer demands in such virtual environments and a limited comprehension of brand–consumer dynamics in creating immersive and engaging experiences still constitute a source of friction in allocating resources to immersive customer experiences.
Additionally, the ability to provide a satisfactory participant experience (how individuals interact with virtual content, including products, through a participant interface) is a crucial contributor to customer experience.

To address these limitations, future research on the impact of virtual and immersive technologies must shift from broad, exploratory analysis to a more focused investigation of value creation dynamics within specific, real-world contexts. Part of the existing literature on the potential effects of the metaverse is grounded in hypothetical applications, complicating efforts to understand its practical relevance for businesses. While forward-looking research on the metaverse is undoubtedly crucial for equipping practitioners, academics, and policymakers with insights into the transformative potential of this technology, the absence of concrete, real-world case studies exacerbates the gap between theory and practice in marketing. Therefore, future research should explore how companies engage with the metaverse and should provide focused academic support tailored to these developments. Much like the early days of the Web 1.0 revolution, the metaverse is currently seen more as an entertainment platform than a fully realized sales channel. As a result, most corporate initiatives are concentrated at the top of the marketing funnel, particularly in the pre-purchase phase.

Interestingly, certain areas appear to present more compelling solutions and use cases than others, and they offer promising avenues for further research on this topic. For instance, immersive virtual experiences have already found significant applications within the advertising domain, mainly through product placement, where branded products are seamlessly integrated into entertainment environments. Product placement has become a widely adopted monetization strategy on virtual platforms, and its effectiveness is often attributed to its ability to incorporate advertisements in a non-disruptive manner. Rather than interrupting the experience, product placement contributes to a cohesive and immersive narrative (Balasubramanian et al., 2006). In this context, product placement stands out as one of the most prevalent metaverse-related initiatives, as it successfully maintains the entertainment aspect that characterizes metaverse experiences while simultaneously harnessing the virtual environment as a valuable tool for business growth and value creation. Although this approach is not entirely new, the nature of participant engagement within digital environments can significantly influence how consumers perceive and react to various stimuli (Wu and Lin, 2018). Consequently, further research is needed to investigate the effects of product placement on consumer attentional processes within these immersive environments and identify optimal engagement strategies that enhance the effectiveness of such placements.

Another significant application of metaverse-related technologies lies in contexts where the inherent complexity of products limits the availability of comprehensive and reliable consumer information. In such cases, immersive technologies become indispensable for mitigating uncertainty and enhancing consumer confidence in their decision-making processes. By offering highly detailed, anticipatory experiences, immersive virtual environments prove particularly valuable in industries where the product is largely experiential, such as tourism (Di Dalmazi et al., 2024). In this field, immersive technologies bridge the gap between promotional materials and the intangible nature of services, providing a more concrete representation of what consumers can expect. This makes such technologies an ideal tool for delivering “try-before-you-go” experiences and effectively allowing prospective tourists to preview their experience in the pre-purchase phase. Such technology simplifies the process of information gathering, thus facilitating smoother decision-making and transactions. While the literature on VR marketing in tourism is more developed than in other sectors, significant questions remain regarding the efficacy of immersive technologies in fostering positive and authentic tourist experiences. Future research should, therefore, assess the extent to which virtual technologies can effectively enhance the consumer experience and how this improvement can translate into tangible value for businesses in this sector. It will also be essential to consider the contextual factors that allow companies to maximize returns on their investments, thus ensuring that the deployment of these technologies aligns with strategic goals and optimizes outcomes.

Another highly promising application of metaverse initiatives lies in the strategic management of branding. Effective brand management is essential for maintaining a competitive edge as the brand serves as a distinguishing feature that not only differentiates a company from its competitors but also fosters enduring consumer loyalty (Wood, 2000). Immersive environments possess the unique ability to evoke consumer reactions akin to real-world experiences, making them an optimal medium for delivering impactful and engaging brand communication. Furthermore, the dematerialization of a physical context facilitates the scalability of these initiatives, enabling a broader transformation aimed at enhancing brand equity. Therefore, three-dimensional virtual spaces are emerging as a powerful tool for cultivating deeper connections between consumers and brands. Several companies are capitalizing on this new form of brand–consumer interaction through immersive experiences on third-party or proprietary platforms. These experiences manifest as experiential spaces, virtual headquarters, retail environments, and pop-up stores, offering consumers opportunities to engage with products and services or participate in entertainment activities, performances, and events. The primary objective of these metaverse strategies often extends beyond merely enhancing positive attitudes towards brands and offerings, given that they are often designed to influence brand positioning and generate specific associations with desired values. For instance, many brands are starting to utilize immersive spaces to raise awareness about corporate sustainability efforts, strengthening the brand’s equity and aligning it with socially responsible values. 
In this context, future research should investigate whether these initiatives truly offer superior opportunities compared to traditional communication channels and quantify their effectiveness by adopting a dual perspective that focuses on both consumer engagement and corporate performance. Additionally, research should assist companies in identifying optimal use cases, target participant profiles, and the resources and capabilities required to successfully develop these initiatives.

The integration of virtual and immersive experiences in the consumer marketing process also underscores several barriers and risks for the individual and society. Immersive technologies do hold significant promise in enriching user experiences and broadening access to diverse forms of content. Nevertheless, the push for ever more captivating digital experiences and the rapid development of highly immersive devices may gradually lead people to favor virtual over real-world interactions in the long term. This trend risks weakening social bonds and reshaping the nature of human relationships. For companies, this could have reputational repercussions as some consumers may perceive them as contributors to a system that fosters digital alienation. More urgently, these technologies open avenues for subtle forms of consumer manipulation. Indeed, emotions triggered in these contexts can be as intense as those experienced in actual life (Chirico et al., 2017), potentially making consumers more susceptible to persuasive messages, even when those messages are manipulative in nature.

Finally, understanding the methodologies that support research and the opportunities arising from using new media to access virtual experiences is crucial for studying the value generated by the metaverse. This comprehension will enable a more precise evaluation of how immersive environments contribute to business strategies and consumer engagement. Experiences in the metaverse allow access to a broad spectrum of data about individuals’ behaviour in virtual spaces. The ability to gather more precise information about participants’ preferences and needs also allows for increasingly targeted outreach to the audience. This phenomenon is further scalable thanks to recent advances in generative AI, as discussed above (Dwivedi et al., 2022). With this in mind, businesses are already using the metaverse as an additional touchpoint to boost traffic to their initiatives and attract younger target audiences. The metaverse’s potential for multi-accessibility and the complete elimination of any physical barriers enables the reach of global segments at low costs. This signifies a paradigm shift where the metaverse becomes a strategic arena for data-driven insights, personalized marketing, and enhanced global outreach for forward-thinking businesses. Recent approaches to studying individual experience underscore the significance of employing market research techniques that extend beyond conventional methodologies. This involves incorporating signals and markers capable of implicitly gauging individual responses through novel neuroscientific tools (Hsu, 2017). Therefore, the need for new skills capable of managing emerging types and forms of data will be fundamental in identifying the competencies required to build a strong and valuable presence for companies in the metaverse. 
It is also crucial to emphasize that a marketing strategy that incorporates immersive experiences as an additional touchpoint for consumers inevitably requires business competencies related to generating satisfactory participant experiences. These skills are rarely found within companies that do not operate in the technological sector, which makes companies’ absorptive capacity crucial for the formulation and proliferation of commercial solutions within the metaverse. Moreover, these technologies are not only immersive and virtual but also increasingly integrated into users’ sensory environments. They can deliver content directly through auditory and visual channels, enhancing immersion, but they also allow for the reverse flow of data. Many recent VR systems, for instance, are equipped with eye-tracking capabilities, enabling them to capture fine-grained user data. The increasing ability to collect granular behavioral data through digital platforms, even from subtle non-verbal actions (e.g., Dui et al., 2022), raises potential concerns regarding the sensitivity of such information and how it is being used. The capacity to track users’ attention and emotional states in real time presents the risk of highly intrusive marketing tactics. By targeting individuals during emotionally charged or psychologically vulnerable moments, brands might amplify the persuasive power of their messaging. These practices pose serious ethical issues related to privacy, transparency, and the fairness of consumer engagement.

Another crucial challenge is navigating the integration of virtual initiatives within the broader framework of existing channels. The success of this integration hinges on creating a seamless participant experience that transcends physical, digital, and virtual realms. Brands must strive for consistent messaging and branding, ensuring a coherent participant experience through a unified journey irrespective of the touchpoints involved. Attending to these challenges is paramount to understanding why companies accept the higher development costs involved in providing highly immersive experiences. Identifying measurable benefits derived from immersion and discerning the boundary conditions that optimize economic returns, particularly concerning the brand–individual relationship, individual characteristics, and application contexts, are some of the crucial challenges that will shape future research.

4.4 The future of work

The integration of immersive technologies and extended reality (XR) into the workplace represents a rapidly expanding area of research, characterized by significant advances in virtual collaboration, training, and the deployment of digital colleagues. Virtual collaboration platforms are facilitating more interactive and engaging virtual meetings by leveraging spatial audio, eye and face tracking, and realistic avatars to enhance the sense of presence and improve interaction dynamics. XR technologies are increasingly utilized for training and skills development, creating realistic training simulations that reduce the time and cost associated with traditional training methods. These simulations provide immersive environments where employees can practice and refine their skills in a safe and controlled setting. Additionally, AI-powered virtual assistants and digital twins are beginning to assume routine tasks as digital colleagues, thereby allowing human workers to concentrate on more complex and creative activities. This shift not only enhances productivity but also improves job satisfaction by alleviating the burden of repetitive tasks. Consequently, the metaverse holds the potential to transform workforce dynamics, potentially mitigating some of the disadvantages associated with remote working and two-dimensional online meetings. Although the full implications of the metaverse on labour markets and the nature of work remain largely unknown, recent studies have explored the benefits of the enterprise metaverse, specifically the use of immersive and XR technologies to facilitate remote and hybrid business meetings.

4.4.1 Remote meetings become more common

In recent years, the prevalence of remote meetings in the workplace has significantly increased. This trend was particularly pronounced during the COVID-19 pandemic, which saw a dramatic rise in the number of business meetings conducted via videoconferencing platforms such as Zoom and Microsoft Teams (Agostino et al., 2020; Waizenegger et al., 2020; Bennett et al., 2021). The widespread adoption of these 2D videoconferencing platforms for business meetings offers several notable advantages. First, remote meetings provide greater flexibility in scheduling and accommodating different time zones, as participants can join from any location with internet access (Mohamedbhai et al., 2021; Standaert et al., 2022). Additionally, remote meetings enhance efficiency by reducing time spent on non-essential activities such as commuting, thereby lowering the environmental impact associated with in-person meetings and resulting in travel-related cost savings. Furthermore, remote working can facilitate a better work–life balance by easing the management of tasks related to childcare and household responsibilities (Sullivan, 2012).

However, remote working also presents challenges, primarily due to the necessity of communicating at a distance, which can lead to a lack of social interaction with colleagues. This isolation may contribute to loneliness and related mental health issues (Van Zoonen and Sivunen, 2022). Although online videoconferencing partially alleviates these issues, participants often still experience a lack of social presence, defined as “the sense of being with another person” (Short et al., 1976; Hove and Watson, 2022). Moreover, videoconferencing can introduce additional problems, such as increased fatigue resulting from the physical and mental effort required to engage in these meetings, a phenomenon widely recognized during the pandemic as “Zoom fatigue” (Bailenson, 2021; Montag et al., 2022; Fauville et al., 2023).

4.4.2 The emergence of virtual collaboration and social XR platforms

In scenarios where videoconferencing falls short in terms of engagement and conversational flow during business meetings, embodied virtual reality (VR) and extended reality (XR) technologies present a promising alternative (Campbell et al., 2019; Abdullah et al., 2021; Sadeghi et al., 2021; Döring et al., 2022; Standaert et al., 2022). In recent years, the maturity, technical readiness, and proliferation of VR communication solutions have rapidly increased. Notable platforms are listed in Supplementary Table S1, each offering unique strengths, weaknesses, and levels of interaction. Embodied VR, which tracks participants’ movements and facial expressions, allows them to control their avatars’ nonverbal communication within the virtual environment. This technology has demonstrated efficacy in fostering a sense of social presence (Smith H. J. and Neff M., 2018; Smith T. and Neff D., 2018), making it particularly beneficial for remote meetings. The interpersonal connection facilitated by embodied VR is critical for groups to collectively develop innovative ideas during brainstorming sessions (Paulus and Kenworthy, 2019). Avatars in VR can support individuals who experience discomfort with public speaking by providing a sense of concealment behind their virtual representation (Abramczuk et al., 2023). This capability positively impacts XR meetings by reducing the perception of being observed and lowering self-consciousness compared to videoconferencing (Lennig et al., 2023). By mitigating these negative effects, avatars can decrease the nonverbal load (Bailenson, 2021; Fauville et al., 2023), thereby enhancing overall wellbeing.

Social XR, in particular, represents an emerging paradigm of social interaction mediated by XR technologies, wherein individuals experience both social and spatial presence, engaging in real-time interpersonal conversations and shared activities. This paradigm is crucial for facilitating social interactions within the metaverse, a network of virtual, computer-mediated environments (Hennig-Thurau et al., 2023). Numerous social XR applications have been developed and deployed to enhance remote interaction and collaboration, especially during the COVID-19 pandemic (Osborne et al., 2023). These applications and their immersive environments enable participants to interact in ways that closely mimic face-to-face communication by enhancing social (Campbell et al., 2019; Abramczuk et al., 2023) and spatial (Hartmann et al., 2015) presence. This enhancement fosters a sense of togetherness—the feeling of being together with others in a virtual environment (Durlach and Slater, 2000; Barreda-Ángeles and Hartmann, 2022). Moreover, social XR improves the comprehension of nonverbal cues exhibited by fellow participants, allowing them to discern others’ intentions and gauge their engagement levels during meetings (Abramczuk et al., 2023). This capability addresses a significant challenge in videoconferencing, where detecting nonverbal cues is often difficult (Bailenson, 2021). Enhanced detection of nonverbal cues leads to improved turn-taking (Degutyte and Astell, 2021), which positively impacts conversational flow. Mills and Boschker (2023) suggested that social XR enables conversation partners to flexibly take turns through visual gaze, further demonstrating social XR’s potential to overcome the limitations of videoconferencing for remote business meetings. In summary, social XR not only provides a more natural and immersive experience (Skowronek et al., 2022) but also facilitates the convergence of geographically separated individuals within the same virtual environment. 
This sense of co-location allows participants to communicate as if they were interacting face-to-face (Perry, 2016; Standaert et al., 2022).

4.4.3 Moving forward: areas for future research

It is frequently posited that immersive technologies such as virtual reality (VR) could provide the ultimate solution to the challenges associated with videoconferencing. These technologies aim to mitigate the drawbacks of videoconferencing by offering a more engaging and realistic experience. Future research on the metaverse for work and labour should focus on several key challenges. First, a significant limitation of current VR communication platforms and social XR solutions is the notable discrepancy between the avatars and those they represent. According to the Proteus effect (Yee and Bailenson, 2007), an individual’s behavior in the virtual environment is influenced by the characteristics of their avatars. This divergence can impact the behavior of meeting participants. In social immersive environments, realistic avatars are perceived as being significantly more human-like when used to represent other participants, leading to stronger acceptance in terms of virtual body ownership (Latoschik et al., 2017). The importance of avatar realism suggests the potential viability of photorealistic social XR for remote business meetings, where individuals are scanned by a camera and their representations are displayed in the virtual environment using point cloud technology (Prins et al., 2018; Gunkel et al., 2021). The effectiveness of certain platforms for business meetings could be a promising area for future research. Additionally, the impact of avatars on workplace equality may serve as an intriguing research trajectory, particularly in exploring whether avatars can help prevent workplace discrimination. A second limitation is rooted in the technology itself, specifically in the head-mounted displays (HMDs) that participants wear to enter the virtual world. Participants in immersive and virtual environments often report that wearing the HMD becomes cumbersome over time. 
This discomfort can hinder participants’ ability to engage in longer or multiple consecutive experiences, thereby reducing the efficacy of XR for business meetings. However, as XR technology evolves, newer HMDs are becoming lighter and less stressful to wear. Given that the acceptance of XR is negatively affected by cyber-sickness (Sagnier et al., 2020), it is crucial to continue studying issues related to cyber-sickness. Lastly, the dynamics of taking and yielding conversational turns in multiparty XR meetings remain uncertain. Considering that gaze patterns, which are crucial for interaction and turn-taking, vary with group sizes (Maran et al., 2020), investigating how these dynamics manifest in XR could be a valuable avenue for future research. It is increasingly evident that a refined understanding of the capabilities of XR and their effects on participant experience is needed. Future research should analyse behavioural patterns in greater depth and transition from experimental settings to real-life applications.

4.5 Immersive democracy: many risks and some opportunities

The metaverse could provide many new opportunities for enhancing inclusion and democratic participation. It offers new spaces for movement and action for disadvantaged individuals who can overcome physical, psychological, and social barriers in the real world by assuming different identities as avatars. The metaverse has the potential to create more immersive cross-border connections and opportunities for empathy-building through perspective shifts. By allowing individuals to step into the shoes of marginalized groups as avatars or in specialized apps, empathy and social skills can be strengthened and intolerance can be reduced. It also presents new possibilities for civic education and engaging with target audiences that are traditionally difficult to reach through methods such as edutainment and serious gaming. It has the potential to strengthen public participation processes and individual experiences of self-efficacy, thus reinforcing pro-democratic attitudes. Allowing people to experiment with and shape their own virtual worlds can also enhance their sociological imagination and potentially drive transformations in the real world. The existence of the “Wistaverse,” a metaverse designed for protests and social actions, further demonstrates how the metaverse can be used to unite, protect, and give a voice to people. It remains to be seen whether it will be possible to open up the metaverse as a space for participation and political engagement on a relevant scale. On the other hand, there are already significant risks.

Harnessing the democratic potential of the metaverse is not automatic and depends on conscious pro-democratic design choices that consider inclusive spaces and unintended negative effects. Such design is not only a responsibility of the companies driving the advance of the metaverse; it is also necessary to make the metaverse attractive and positive for participants. Without appropriate regulation and moderation, the metaverse could replicate, or even amplify, the negative phenomena already visible in current online spaces. Risks include harassment, disinformation, extremist radicalization, pseudo-participatory governance models, surveillance capitalism, and social polarization. Incidents such as harassment, disinformation, or extremism can generate negative headlines and deter potential participants from engaging with the metaverse. A survey conducted by KPMG and the Sinus Institute in 2022 among individuals aged 14 to 39 in Germany revealed that 68% of respondents were looking forward to a “new identity in the metaverse.” Additionally, 43% believed that they could make their consumption in the metaverse almost completely sustainable (climate-neutral, species protection, resource conservation), but 69% expressed concerns about becoming victims of discrimination, cyberbullying, or hate speech.

Thus, while the metaverse holds real potential to foster democratic innovation, it simultaneously poses significant risks to democratic values. These risks must be considered in platform design, regulation, and civic participation strategies to ensure that the metaverse evolves as a force for democratic enrichment rather than erosion.

Dwivedi (2023) employs the term “Darkverse” to encompass the darker aspects of the metaverse, including but not limited to privacy and diminished reality, identity theft, invasive advertising, misinformation, propaganda, phishing, financial crimes, terrorist activities, abuse, pornography, social exclusion, mental health, sexual harassment, and unintended negative consequences within the metaverse. In the context of gaming, Kowert (2020) identifies hate speech, (sexual) harassment, trolling, griefing, doxxing, fake news, cheating, trash talking, contrary play, and inappropriate role playing as forms of “dark participation.” These latter aspects pose unique challenges for avatar-based environments, which distinguishes them from traditional social media platforms. Some publications raise concerns about a new dimension of digital surveillance capitalism (Anderson and Rainie, 2022; Bojic, 2022) or warn about scenarios of violent radicalization (Bajwa, 2022). The Anti-Defamation League reports an increase in discrimination and far-right ideologies in online games⁹, including immersive environments such as Fortnite or Roblox. Participant-generated environments on Roblox not only foster creativity, collaboration, and a sense of self-efficacy but in some instances also encourage group-based prejudice and right-wing extremism, such as environments featuring quests that promote social Darwinism by running over homeless people. In Minecraft, participants construct imaginative worlds alongside instances of swastikas and racist insults. Although democratic participation approaches are visible in the metaverse, they sometimes appear precarious or even counterproductive. In addition to the risk of pseudo-participatory processes, the design of participation in immersive digital environments can also contribute to undemocratic and plutocratic developments.
For instance, on the platform Decentraland, participants can vote on the development and governance of the virtual world, but the weight of their votes depends on the amount of MANA tokens (the platform's digital currency) that they hold. This violates the democratic principle of “one person, one vote” (Quent, 2023).
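
The effect of token-weighted voting can be illustrated with a small worked example. The sketch below is not Decentraland's actual governance code or API; the voter names, token balances, and function names are all hypothetical and chosen only to show how the same set of ballots can produce opposite outcomes under token weighting and under "one person, one vote".

```python
from collections import defaultdict


def tally_token_weighted(ballots, holdings):
    """Sum each voter's token balance behind the option they chose."""
    totals = defaultdict(float)
    for voter, choice in ballots.items():
        totals[choice] += holdings.get(voter, 0.0)
    return dict(totals)


def tally_one_person_one_vote(ballots):
    """Count every ballot exactly once, regardless of holdings."""
    totals = defaultdict(int)
    for choice in ballots.values():
        totals[choice] += 1
    return dict(totals)


def winner(totals):
    """Return the option with the highest total."""
    return max(totals, key=totals.get)


# Hypothetical data: one large token holder and four small holders.
holdings = {"whale": 10_000, "ana": 50, "ben": 20, "cem": 80, "dee": 10}
ballots = {"whale": "A", "ana": "B", "ben": "B", "cem": "B", "dee": "B"}

weighted = tally_token_weighted(ballots, holdings)
equal = tally_one_person_one_vote(ballots)

print("token-weighted:", weighted, "->", winner(weighted))
print("one person, one vote:", equal, "->", winner(equal))
```

In this illustrative scenario, four of the five voters choose option B, yet option A wins the token-weighted tally because a single holder controls most of the tokens. This is the plutocratic dynamic described above.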

The game Second Life has already been used for political campaigns, and the metaverse will also become a venue for political campaigns and controversies. In order to protect the integrity of elections and prevent disinformation, manipulation, and polarization, strict rules, transparency standards, and independent oversight bodies need to be established for these activities. This is particularly important for personalized (election) advertising and the use of artificial intelligence.

Potential negative social impacts should be considered in the design of the metaverse to implement preventive measures against dangers that have accelerated and amplified threats to democracy in the Web 2.0 era. It is important to involve civil society actors beyond industry and academia in shaping the metaverse to foster a democratic culture within it and ensure that it serves as a space for positive social interactions and constructive discourse.

5 Perceptions of the metaverse

In 2023, the French think tank Renaissance Numérique conducted a study on French people’s perceptions of the metaverse (Renaissance Numérique, 2023b). In this qualitative study, 24 people were interviewed, including 17 early adopters and seven virtual reality professionals. The decision to interview this specific sample was based on the assumption that their concrete, sensory experiences with digitally simulated immersive worlds might encourage them to be among the first to test the proposed experiences in the metaverse.

Focusing on the perceptions of both “everyday people” and immersive-world experts, the study reveals a complex mix of curiosity, hesitation, and caution about this new digital frontier. It highlights how French people’s understanding of the metaverse remains ambiguous, particularly regarding its utility, accessibility, and the potential changes it may bring to society. The following sections are based on this research.

5.1 General awareness and definitions

Awareness of the metaverse concept has grown significantly, largely spurred by media coverage and corporate rebranding by tech giants like Meta. However, definitions of what the metaverse entails vary widely. For some, it is perceived as a virtual world primarily designed for gaming, while others see it as a comprehensive immersive experience with applications extending beyond entertainment into work, education, and social interaction. Despite this growing familiarity, many French people remain uncertain about its practical applications and the tangible benefits it offers over existing digital platforms.

5.2 Perceived opportunities and benefits

Many interviewees expressed cautious optimism, seeing the metaverse as a potential arena for innovation, creativity, and new forms of social interaction. They envisioned opportunities for enhanced experiences, such as virtual tourism, remote work, and digital art. The potential for real-time interaction with distant family or friends also resonated positively, especially as technology has already begun to reshape communication practices. However, these aspirations were tempered by a need for clarity on how the metaverse would meaningfully distinguish itself from current online platforms and virtual reality applications.

5.3 Concerns and scepticism

Scepticism about the metaverse is significant and multifaceted. Privacy and data security are prominent concerns, as the immersive nature of the metaverse would entail collecting more personal data, including behavioral and biometric information, which could increase exposure to privacy risks. There is also apprehension about potential issues of addiction and escapism, with respondents expressing worry that the metaverse might encourage people to prioritize virtual interactions over real-life experiences. Environmental concerns are also notable as the metaverse’s heavy reliance on advanced technology could exacerbate energy consumption and electronic waste.

Moreover, the influence of large corporations like Meta in shaping the metaverse raises concerns about monopolistic control, surveillance, and commercialization. Respondents felt uneasy about how these corporations might control participant data, interactions, and experiences within these virtual spaces, potentially exploiting participants for profit.

5.4 The dystopian potential

The study also captures a widespread fear of a dystopian future in which the metaverse could intensify existing societal issues. Respondents worry that an overreliance on virtual environments could erode social bonds, weaken communal ties, and further stratify society. The metaverse, some fear, could become an “echo chamber” where individuals are exposed only to content that reinforces their beliefs and preferences, limiting exposure to diverse perspectives. There is also concern that people without access to the metaverse, due to socioeconomic constraints or digital illiteracy, might face further marginalization, leading to a widening digital divide.

5.5 Moving forward: governance and regulation

Finally, the study underlines the importance of developing a regulatory framework to address these concerns. This framework should prioritize participant autonomy, data privacy, and ethical governance to foster trust and protect participants in the metaverse. There is a strong call for multi-stakeholder governance involving policymakers, tech companies, and civil society to ensure that the development of the metaverse is guided by principles that safeguard public interest and societal values.

In conclusion, while there is excitement around the possibilities the metaverse could offer, there is equally strong caution about its potential risks. The French public’s perception highlights a critical need for transparency, regulation, and responsible innovation to ensure that the metaverse evolves in a way that aligns with societal values and safeguards individual rights. As the metaverse continues to develop, addressing these concerns will be pivotal to gaining public acceptance and ensuring that it becomes a tool that enhances, rather than detracts from, real-world social and cultural life.

6 Regulating the metaverse

Metaverses promise to redefine how people interact, work, and socialize online. However, these virtual environments also bring complex governance challenges that stretch the limits of current legal and regulatory systems. Renaissance Numérique (2023a) explored these issues, proposing a multi-faceted approach to governance that emphasizes adaptability, collaboration, and inclusion. This section is mainly based on this study.

6.1 Understanding governance in the age of the metaverse

The concept of governance in the metaverse is inherently complex. Unlike traditional online environments, the metaverse is a blend of virtual and real-world interactions where digital avatars navigate immersive, often three-dimensional spaces. Effective governance in this context goes beyond merely policing online behavior; it requires a cooperative approach among government bodies, private entities, and civil society to establish cohesive norms, regulatory standards, and processes that address the specific challenges posed by immersive online environments.

Central to this approach is the need for clear, high-level regulations that work in tandem with more granular content moderation practices. Governance in the metaverse must balance regulatory oversight with flexibility, allowing for participant-generated content while protecting participants from harmful activities. This layered approach to governance, where policy and practice evolve together, is crucial for managing the unique issues posed by immersive environments.

6.2 The current European legal framework: strengths and shortcomings

Current EU legal texts, such as the General Data Protection Regulation (GDPR), the Digital Services Act (DSA), and the Digital Markets Act (DMA), provide a strong foundation for addressing many of the issues related to data protection, competition, and digital services. Nevertheless, these laws often fall short when applied to the metaverse, a space that demands new levels of interoperability and economic flexibility. Without additional coherence and standardization, the existing framework will be insufficient to effectively regulate the metaverse.

A significant aspect of a metaverse-proof legal framework is trust in virtual transactions, as the metaverse will most probably rely heavily on digital commerce and asset exchange. Interoperability standards that allow participants to navigate between different digital environments with their virtual identities and assets intact are necessary, as are legal standards that ensure the protection of these assets across different platforms. Innovation in business models, particularly those that ensure participant protection within these digital ecosystems, is critical for a functional governance model.

6.3 Standardization and compliance by design

To avoid the regulatory pitfalls of retroactive policymaking, Renaissance Numérique calls for “compliance by design”, where ethical and regulatory standards are integrated directly into the technological and infrastructural architecture of the metaverse. This approach means that from the outset, designers and developers would incorporate security, interoperability, and ethical considerations into their platforms. Such design would not only ease the regulatory burden but would also provide a stable foundation for trust and participant engagement in virtual spaces.

Compliance by design requires coordination among international standard-setting bodies and regulatory authorities as well as commitment from technology companies. While this integration can avoid potential harms and legal challenges, its implementation remains a complex task, necessitating broad cooperation across sectors.

6.4 Adapting governance through experimental approaches

In response to the metaverse’s constantly evolving nature, there is a need for agile, experimental approaches to regulation, such as policy prototyping and regulatory sandboxes. These mechanisms allow for iterative testing and feedback, enabling regulators and developers to adapt governance frameworks as technology and participant behavior evolve. Such an approach helps identify practical challenges in real-world settings, allowing for more responsive and tailored governance strategies.

Despite the advantages of regulatory sandboxes and policy prototyping, financial and resource constraints currently limit their broader application. Increased investment in these experimental methods is essential if regulators hope to stay abreast of technological advances in the metaverse.

6.5 Toward a collective, multi-stakeholder governance

A multi-stakeholder model, where regulators, industry leaders, civil society, and participants collaboratively shape the governance of the metaverse, appears necessary. This inclusive approach reflects the diversity of interests and perspectives within virtual spaces, acknowledging that the metaverse is not simply a digital product but a new societal arena where individuals gather, create, and express themselves. By bringing together diverse stakeholders, a more structured governance framework can be created where responsibilities are clearly delineated and aligned with the goals of an inclusive and sustainable digital society.

6.6 Specific governance challenges in the metaverse

The metaverse presents unique legal and regulatory challenges, especially concerning the protection of intellectual property and safeguarding against criminal behavior. In these immersive spaces, avatars interact in ways that are both complex and difficult to regulate. Ensuring that participants’ intellectual property rights are respected, for instance, requires clearer legal definitions and enforcement mechanisms adapted to the digital assets traded within the metaverse.

Additionally, Renaissance Numérique has identified a gap in how existing laws address behaviours specific to virtual spaces, such as harassment or fraud involving avatars and virtual property. Given that current laws were not designed with these scenarios in mind, the think tank advocates for a regulatory re-examination that includes input from both legal experts and digital rights advocates to ensure that emerging regulations are both practical and enforceable.

6.7 The metaverse as a governance laboratory

In the final analysis, the metaverse should be seen as a kind of “laboratory” for internet governance. By confronting the diverse challenges posed by the metaverse, policymakers, technologists, and participants alike can develop a more nuanced and effective governance model that could benefit the broader internet. The lessons learned from managing these immersive spaces may offer valuable insights for addressing long-standing issues of internet governance, such as privacy, security, and participants’ rights.

Tomorrow’s internet, whether it takes the form of the metaverse or not, demands a holistic and adaptive governance approach. By building on existing regulations and fostering collaboration among stakeholders, society can navigate the transition to immersive digital environments in a way that upholds ethical standards, protects participant rights, and encourages innovation. This vision for governance, grounded in current regulations but open to evolution, is essential for shaping a responsible, inclusive, and future-proof internet.

6.8 Key legal concerns from an EU perspective

A crucial question from an EU regulatory perspective is whether the new European digital acts (e.g., the DSA and the DMA) apply to virtual world platforms and the diverse actors engaged in metaverse activities and experiences (Lopez-Tarruella and Rodríguez de las Heras Ballell, 2024). Given the broad scope of these regulations, an initial review suggests that they likely do. However, specific legal issues will need further attention as they will play a major role in shaping the stability and success of this technology. The following subsections provide a non-exhaustive summary of key issues, along with an introductory outline of the primary challenges that may arise in addressing each one. Many of these areas align with the action lines identified by the Chair for the Responsible Development of the Metaverse at the University of Alicante and have already been explored by its collaborators in a series of working papers¹⁰ published as part of the Chair’s activities, from which inspiration has been drawn for this section.

6.8.1 Content and behavior moderation

To ensure a safe and welcoming environment, the metaverse will need advanced systems for content and behavior moderation¹¹. These systems must tackle not only illicit content but also harmful behaviors such as harassment and hate speech, while carefully balancing fundamental rights like freedom of expression (Bovenzi, 2024). Key moderation tools—such as AI-powered applications, community guidelines, and participant reporting mechanisms—will be essential for guaranteeing a fair and transparent experience. However, these practices must also comply with current EU standards on behavior moderation, including the DSA, as well as directives on issues like terrorist content, sexual abuse and child exploitation, and copyright.

6.8.2 Personal data protection

The immersive and interactive nature of the metaverse is expected to drive an unprecedented level of personal data collection. These data will span a wide array of sources, including XR devices, haptic technologies, and brain–computer interfaces (BCIs), which can reveal participant emotions, responses, and physical behaviors in virtual settings¹². Much of this information (e.g., biometric data like iris scans) could be classified as “sensitive” and therefore demands protection under Art. 9 of the GDPR in a way adapted to these novel digital environments.

Furthermore, participants may face challenges in exercising rights such as data erasure, the right to be forgotten, and data portability due to the persistent, interconnected, and decentralized nature of the metaverse (Kramcsak and Papakonstantinou, 2024; Lopez-Guzmán, 2024; Menéndez et al., 2024; Moerel, 2024).

6.8.3 Avatars and digital identity

Avatars (i.e., the digital representation of participants in virtual environments) raise numerous legal and ethical questions regarding their use, rights, and responsibilities (Coppo et al., 2024; Ebers, 2024; Raposo et al., 2024). These include, for instance, whether avatars must reflect participants’ real-world identities, whether participants can create multiple avatars across virtual worlds, and who bears responsibility for avatars’ actions when the participant is disconnected. There are also questions about whether avatars have a “right of image” or whether participants have ownership rights over them. Platforms that deploy AI-based avatars capable of interacting autonomously raise additional questions, such as whether participants should be informed of an avatar’s synthetic nature and whether there should be limits on what non-human avatars can do.

Digital identity is another complex issue, especially with the introduction of the European Digital Identity Wallet under eIDAS 2.0. This initiative brings the possibility of a unified digital identity across various virtual worlds, but it raises questions about privacy and anonymity (Schwalm and Kudra, 2024; Sorrentino, 2024). The European Parliament supports anonymous use where possible to protect participants’ privacy, but it also acknowledges that identifying participants may be essential for accountability, especially with decentralized systems that complicate law enforcement. Thus, it suggests that the individual behind an avatar should be identifiable when necessary, applying a “know-your-customer” principle to virtual worlds¹³.

6.8.4 Virtual worlds as markets

The economic potential of virtual worlds is immense, with commercial transactions projected to grow substantially in the coming years¹⁴. These transactions will encompass both business-to-business (B2B) and business-to-consumer (B2C) exchanges, as well as a likely increase in peer-to-peer transactions within virtual worlds’ e-marketplaces.

For B2B exchanges, the EU’s Platform-to-Business (P2B) Regulation¹⁵ currently addresses fairness in business relationships in digital markets. However, its relevance for metaverse-related markets needs further evaluation, particularly regarding the classification of content creators as “business users” and its applicability to decentralized platforms governed by decentralized autonomous organizations (DAOs).

In the realm of B2C, consumer protection laws will be essential as marketing and advertising shift into immersive virtual spaces (Alcañiz et al., 2019). New methods, including programmatic and AI-driven advertising, along with XR technologies that merge physical and digital worlds, must adhere to EU regulations on unfair commercial practices. The European Parliament has already raised concerns about practices like selling virtual real estate via NFTs, which may mislead consumers about their rights, as they generally receive a license rather than actual ownership¹⁶.

6.8.5 Intellectual property

Intellectual property rights are pivotal in the metaverse¹⁷ (López-Tarruella Martinez, 2023), as both the content and experiences within virtual worlds and the underlying technologies are often protected by existing IP frameworks, including patents, trade secrets, copyright, and industrial design rights (Ramos Gil, 2023; Mezei and Chawla Arora, 2024; Randrianirina, 2024). However, the application of existing laws within the metaverse is not without uncertainties. For instance, industrial design laws may need to evolve to cover animated 3D designs (Ferrero Guillen and Kyrylenko, 2023; Bonadio and Anjay Mohnot, 2024), and there are ongoing concerns about enforcing copyright and trademark rights, especially when trademarks are embedded in digital assets represented in NFTs (Jimenez Serrania, 2023).

Additionally, collaborative and AI-driven content creation introduces complexities around joint ownership and authorship. Platforms that enable participant-generated content will need to establish suitable frameworks for equitable IP management and ownership to encourage widespread metaverse adoption. Centralized platforms may address these issues through their governance structures and terms of use, but decentralized platforms and Web3 technologies will likely require innovative approaches to manage digital assets and distribute rights, particularly to guarantee that creators can fully benefit from their work through mechanisms such as NFT marketplaces (Girard-Gaymard, 2024).

6.8.6 Dispute resolution and applicable law

Traditional judicial systems are often slow, costly, and ill-suited for addressing disputes within virtual worlds, largely due to their cross-border or even non-territorial nature. EU conflict-of-law rules, which typically rely on territorial connections to allocate jurisdiction and determine the applicable law, encounter challenges when applied to digital environments—especially the metaverse—where such connections can be ambiguous or irrelevant (Lopez-Tarruella Martinez, 2023; Lopez Rodriguez, 2024; Pellegrini, 2024). As a result, the European Parliament has called for a reassessment of existing private international law regulations (e.g., the Brussels I Recast and the Rome I and II Regulations) to better align them with the realities of digital contexts¹⁸.

In place of traditional litigation, virtual worlds may instead adopt platform-based dispute resolution mechanisms, including notice-and-action procedures and internal complaint-handling systems as mandated by the DSA and P2B Regulation. These approaches could streamline dispute resolution while safeguarding participants’ rights to fair treatment. As these methods develop, however, it will be essential to implement protections to uphold each party’s right to a fair trial (Bueno de Mata, 2022; Ortolani, 2024).

7 Metaverse issues

Here we revisit and summarize some of the points made in more detail above. In “Immersive Social Media and the Metaverse” (David et al., 2023), Slater and de Gelder pointed out several major areas of interest in research on the metaverse. The first concerns surveillance, manipulation, and control. People in the metaverse will be continuously tracked—body movements, eye movements, facial expressions, and physiological recordings. This tracking is absolutely necessary for the functioning of the metaverse regarding interaction with the environment (e.g., touching or lifting an object), the real-time postures and movements of the virtual body, eye tracking to determine in detail the point of focus, facial tracking for display to others, and so on. Without this tracking, the metaverse would be impossible. However, if the tracking data are stored in the cloud and available, for example, to companies or governments, then this poses a risk. The data would provide information such as the relationship between what people perceive and their responses. These data would be available on a massive scale, and with modern machine learning techniques could be used to predict behaviour at an individual and mass level—extremely useful for advertising and political manipulation. It would essentially allow the construction of models of people and groups of people, and such modeling can be used for manipulation. What is required is a world-wide regulatory framework that balances technological needs and the absolute prevention of individual and group modelling capabilities.

The second major issue is ensuring personal identity security. For example, you may have a conversation with a close friend or relative and reveal private information. Yet it is possible that you are talking with an imposter, where even the voice has been changed to sound like the real person. Methods for individual personal identity protection are an extremely high-level requirement for the safe functioning of metaverse-like systems.

The third issue that requires attention is the potential breakdown of shared reality. Imagine various people witnessing the same event in mixed reality. It is entirely possible that this event is portrayed quite differently to each person. To one it looks like two people are embracing. To another it may be portrayed as a fight. Moreover, shared reality can break down without anyone actually realizing it. We tend to believe our senses, and if someone else says that something else happened, we are inclined not to believe them but rather rely on our own perception. If the metaverse becomes the norm for conducting mass human interactions, then shared social reality is an absolute requirement in order to avoid societal conflict and breakdown (Slater et al., 2020). Nevertheless, how is it possible to regulate content to the extent that there is some minimal level of invariable truth across participants?

The fourth issue concerns the blurring of the distinction between physical reality and digital reality. For example, suppose that in the metaverse you have a negative encounter with a person representative of a particular identifiable group (e.g., defined by race or social class). The encounter is fake in the sense that the person is wholly virtual and deliberately designed to offend you. It is highly likely that even if you know that the event is fake, it will still influence your attitudes and behavior toward others of that group. Hence the metaverse could be used by bad actors to deliberately foster hatred towards particular identifiable groups in reality. Another failure to distinguish physical reality from the metaverse arises where actions that are safe in virtual reality are physically dangerous. A very simple example is that you see many people walk over to a chair and sit down. You see an empty chair and you do the same. However, the chair does not exist in physical reality, so you fall. In a recent experiment, we found that 20% of people did sit without checking whether there was a real chair there (Wiesing et al., 2025). This crossing of the boundary between virtual and physical is very hard to overcome. For example, it cannot be solved through labelling (e.g., a virtual chair has a label attached, “This is virtual”) since this would undermine how virtual reality works and lead to a loss of presence. This type of problem requires experimentation to understand the extent to which it can happen and how it might be avoided.

The challenges involved in many of these issues are discussed in more depth in David et al. (2023) and the papers referenced therein.

8 Conclusion

As our analysis shows, the intended transformation processes in immersive virtual environments affect nearly all areas of personal and public life. Supplementary Table S2 provides an overall summary. Therefore, transdisciplinary research concepts and networks are necessary to understand the implications of the metaverse and contribute positively based on that understanding. In order to better comprehend social interactions in the metaverse, qualitative and ethnographic explorations of existing usage patterns are as essential as quantitative data on avatar behavior and the consequences for individuals. Long-term accompanying research on the central issues, which could only be touched upon in this paper, is needed.

To harness the positive potential of the metaverse and make it accessible to as many people as possible, collaborative efforts between platforms, science, and civil society actors should be expanded in the coming years. Only by using an evidence-based foundation (Slater, 2021) and with qualified development forecasts that also address social, cultural, and political aspects can informed policy decisions be made. This includes decisions regarding funding programs, regulations, or the utilization of the metaverse for public services and affairs.

Author contributions

MS: Writing – original draft and Writing – review and editing. MD: Writing – review and editing. DF: Writing – review and editing. JG: Writing – review and editing. HI: Writing – review and editing. AK: Writing – review and editing. AL-T-M: Writing – review and editing. LL: Writing – review and editing. JL: Writing – review and editing. ON: Writing – review and editing. XP: Writing – review and editing. YP: Writing – review and editing. MQ: Writing – review and editing. RR: Writing – review and editing. MS-V: Writing – review and editing. PS: Writing – review and editing. AS: Writing – review and editing. PW: Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. XP: partially supported by the Economic and Social Research Council (ES/W003120/1) and the Arts and Humanities Research Council (AH/T011416/1). DF and MS: partially supported by the GuestXR project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101017884. SOCRATES received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 951930. DF: partially supported by Finding the Tools: Fostering Resilience, Union Group, Israel. JL: This work and the metaverse Dialogues were produced with Meta’s financial support, obtained as part of the XR Programs and Research Fund aimed at supporting academic and independent research across Europe into metaverse issues and opportunities. MS-V and MS: co-funded by Departament de Recerca i Universitats de la Generalitat de Catalunya (AGAUR 2021-SGR-01165 - NEUROVIRTUAL), supported by FEDER and by EU HORIZON WiDERA “META-TOO” 101160266. MS: partially supported by the Ministerio de Ciencia e Innovación, España under the project ‘La ética de las experiencias inmersivas digitales’ (The Ethics of Digital Immersive Experiences) (TEDIX) PDI2020-117108RB-100-TEDIX (financed by AEI/10.13039/501100011033).

Acknowledgments

The EMRN is being led independently by the research institutions listed on https://metaverse-research-network.info/about-us/, the Founding Members of which had received funding provided through an unrestricted gift by Meta. We would like to thank Laura Hirvi of Meta for support and encouragement. We thank Anadin Attar for the image displayed in Figure 6. We thank Ronit Elyoseph and Maya Sheke for the Einstein model used in the application displayed in Figure 7.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. Supplementary Table S2 was generated by Gemini and checked. This is declared in the Table footnote.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frvir.2025.1566419/full#supplementary-material

Footnotes

¹https://en.wikipedia.org/w/index.php?title=Meridian_59&oldid=1231737443

²https://uo.com

³https://en.wikipedia.org/wiki/GameSpy

⁴https://en.wikipedia.org/w/index.php?title=Virtuality_(product)&oldid=1240231502

⁵https://recroom.com

⁶https://medium.com/@anadindesign/would-you-trust-your-destiny-to-artificial-intelligence-ebdc3c6ecf7c

⁷https://assetstore.unity.com/packages/3d/environments/countryside-gas-station-132485 for the 3D model of the station.

⁸https://www.ri.se/sv/expertisomraden/projekt/vaxr-matematik-i-praktik-med-hjalp-av-vr

⁹https://www.adl.org/resources/report/hate-no-game-hate-and-harassment-online-games-2022

¹⁰https://catedrametaverso.ua.es/working-papers/

¹¹European Commission, “An EU initiative on Web 4.0 and virtual worlds: a head start in the next technological transition”, Doc. COM(2023) 442 FINAL, pp. 8–9.

¹²European Parliament, “Resolution of 17 January 2024 on policy implications of the development of virtual worlds–civil, company, commercial and intellectual property law issues (2023/2062(INI))”, point 10.

¹³European Parliament, “Resolution of 17 January 2024 on policy implications of the development of virtual worlds–civil, company, commercial and intellectual property law issues (2023/2062(INI))”, points 20–21.

¹⁴Ibid., point 17.

¹⁵Regulation (EU) 2019/1150 of the European Parliament and of the Council of 20 June 2019 on promoting fairness and transparency for business participants of online intermediation services (L 186/57).

¹⁶European Parliament, “Resolution of 17 January 2024 on policy implications of the development of virtual worlds–civil, company, commercial and intellectual property law issues (2023/2062(INI))”, point 17.

¹⁷For a holistic overview, see: EUIPO, “Impact of the Metaverse on Infringement and Enforcement of Intellectual Property” (2024).

¹⁸European Parliament, “Resolution of 17 January 2024 on policy implications of the development of virtual worlds–civil, company, commercial and intellectual property law issues (2023/2062(INI))”, points 13–16.

References

Abdlkarim, D., Di Luca, M., Aves, P., Maaroufi, M., Yeo, S.-H., Miall, R. C., et al. (2024). A methodological framework to assess the accuracy of virtual reality hand-tracking systems: A case study with the Meta Quest 2. Behav. Res. Methods 56, 1052–1063. doi:10.3758/s13428-022-02051-8

Abdullah, A. A., Kolkmeier, J., Lo, V., and Neff, M. (2021). Videoconference and embodied VR: Communication patterns across task and medium. Proc. ACM Human-Computer Interact. 5, 1–29. doi:10.1145/3479597