The Combination of Artificial Intelligence and Extended Reality: A Systematic Review

Reiners, Dirk; Davahli, Mohammad Reza; Karwowski, Waldemar; Cruz-Neira, Carolina

doi:10.3389/frvir.2021.721933

SYSTEMATIC REVIEW article

Front. Virtual Real., 07 September 2021

Sec. Virtual Reality in Industry

Volume 2 - 2021 | https://doi.org/10.3389/frvir.2021.721933

Dirk Reiners¹†

Mohammad Reza Davahli²*†

Waldemar Karwowski²

Carolina Cruz-Neira¹

¹Department of Computer Science, University of Central Florida, Orlando, FL, United States
²Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, FL, United States

Artificial intelligence (AI) and extended reality (XR) differ in their origin and primary objectives. However, their combination is emerging as a powerful tool for addressing prominent challenges in both AI and XR and for creating opportunities for cross-development. To investigate the AI-XR combination, we mapped and analyzed published articles through a multi-stage screening strategy. We identified the main applications of the AI-XR combination, including autonomous cars, robotics, military, medical training, cancer diagnosis, entertainment and gaming applications, advanced visualization methods, smart homes, affective computing, and driver education and training. In addition, we found that the primary motivations for developing AI-XR applications include 1) training AI, 2) conferring intelligence on XR, and 3) interpreting XR-generated data. Finally, our results highlight the advancements and future perspectives of the AI-XR combination.

Introduction

Artificial Intelligence (AI) refers to the science and engineering of producing intelligent machines (Hamet and Tremblay, 2017). AI was developed to automate many human-centric tasks by analyzing a diverse array of data (Wang and Preininger, 2019). AI’s history can be divided into four periods. In the first period (1956–1970), the field of AI began, and terms such as machine learning (ML) and natural language processing (NLP) started to develop (Newell et al., 1958; Samuel, 1959; Warner et al., 1961; Weizenbaum, 1966). In the second period (1970–2012), rule-based approaches received substantial attention, including the development of robust decision rules and the use of expert knowledge (Szolovits, 1988). The third period (2012–2016) began with advances in deep learning (DL), exemplified by its ability to recognize objects such as cats in images (Krizhevsky et al., 2012). DL research has since grown markedly (Krizhevsky et al., 2012; Marcus, 2018). Owing to these DL advancements, in the fourth period (2016 to the present), AI applications have achieved notable success, with AI outperforming humans in various tasks (Gulshan et al., 2016; Ehteshami Bejnordi et al., 2017; Esteva et al., 2017; Wang et al., 2017; Yu et al., 2017; Strodthoff and Strodthoff, 2019).

Currently, we are witnessing the rapid growth of virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies and their applications. VR is a computer-generated simulation of an interactive, immersive, three-dimensional (3D) environment or image that users interact with through specific equipment (Freina and Ott, 2015). AR is a technology that creates a composite view by superimposing digital content (text, images, and sounds) onto a view of the real world (Bower et al., 2014). MR comprises both AR and VR (Kaplan et al., 2020). Extended reality (XR) is an umbrella term covering VR, AR, MR, and virtual interactive environments (Kaplan et al., 2020). In general, XR simulates spatial environments under controlled conditions, enabling interaction with, modification of, and isolation of specific variables, objects, and scenes in a time- and cost-effective manner (Marín-Morales et al., 2018).

XR and AI differ in their focus and applications. XR uses computer-generated virtual environments to enhance and extend human capabilities and experiences, enabling people to apply their capacities for understanding and discovery more effectively. AI, on the other hand, attempts to replicate the way humans understand and process information and combines this with a computer’s capacity to process vast amounts of data reliably. Both have found applications in many areas, and both are critical and highly active research areas. Their combination, however, can offer a new range of opportunities. To better understand the potential of the AI-XR combination, we conducted a systematic literature review of published articles by using a multi-stage screening strategy. Our analysis aimed to clarify the objectives, categorization, applications, limitations, and perspectives of the AI-XR combination.

Methods

We conducted a systematic review of the literature reporting a combination of XR and AI. For this purpose, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was adapted to create a protocol for searching and eligibility assessment (Liberati et al., 2009). The protocol was tested and scaled before data gathering. It had two main features: developing research questions on the basis of the objectives, and determining the search strategy according to the research questions.

The following research question was formulated for this review:

RQ. What are the main objectives and applications of the AI-XR combination?

To discover published and gray literature, we used a multi-stage screening strategy consisting of 1) screening scientific and medical publications via bibliographic databases and 2) identifying relevant entities with Google web searches (Jobin et al., 2019). To achieve the best results, we developed multiple sequential search strategies: an initial keyword-based search of the Google Scholar and ScienceDirect search engines, followed by a keyword-based search with the Google search engine in private browsing mode while logged out of any personal account. Both steps used the keywords in Table 1.

TABLE 1. The set of keywords used in the review.

For each search step and keyword, links or articles were followed and screened up to the 350th record (Jobin et al., 2019). A total of 321 documents were identified in this step: 232 articles through bibliographic database searching and 89 through Google searching. The inclusion criteria were that articles clearly describe the AI-XR combination and be written in English. After the inclusion criteria were applied and duplicates removed, 28 articles remained from bibliographic database searching and four from Google. After these two sets were combined and duplicates removed, 28 articles were included in the subsequent analysis. Citation chaining was then used to screen the references of these records, which identified seven additional papers. To capture newly released eligible documents, we continued to search and screen articles until April 3, 2021, which yielded one additional article (Figure 1). Using this three-step search (web search, bibliographic database search, and citation chaining), we included a total of 36 eligible, non-duplicate articles.
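
The record counts above can be re-derived with simple arithmetic. The Python sketch below does this. The stage labels are our own. The overlap of four papers is inferred from the reported numbers (28 database articles plus four Google papers collapsing to 28 unique articles) and is not stated explicitly in the retrieval process.

```python
def prisma_tally():
    """Re-derive the retrieval counts reported in the Methods section."""
    identified = 232 + 89  # database records + Google records

    # After inclusion criteria and de-duplication within each source.
    eligible_database, eligible_google = 28, 4
    merged_unique = 28  # after merging both sources and removing duplicates
    # Implied overlap: every eligible Google paper was already in the database set.
    overlap = eligible_database + eligible_google - merged_unique

    citation_chaining = 7  # added by screening reference lists
    extended_window = 1    # added by searching until April 3, 2021
    total = merged_unique + citation_chaining + extended_window
    return {"identified": identified, "overlap": overlap, "total": total}

print(prisma_tally())  # {'identified': 321, 'overlap': 4, 'total': 36}
```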

FIGURE 1. PRISMA-based flowchart of the retrieval process.

Bias could have been introduced in this review through 1) application of the inclusion criteria and 2) extraction of the objectives and applications of the AI-XR combination. To mitigate this bias, two authors independently applied the inclusion criteria and then extracted the objectives and applications of the AI-XR combination. The authors recorded the reasons for exclusion and summarized the main aspects of the AI-XR combination in the included papers. They then compared their results and resolved any disagreements through discussion with the third and fourth authors. In addition, for the included observational and cross-sectional articles, we used the National Heart, Lung, and Blood Institute (NHLBI) Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies (National Heart, Lung, and Blood Institute (NHLBI), 2019). We confirmed that the quality assessment scores of the included papers were acceptable.

Results

As shown in Table 2 and Figures 2 and 3, our systematic search identified 36 published records containing AI-XR combinations. A substantial share of these combinations were used for medical and surgical training (n = 13; 36%), followed by autonomous cars and robotics (n = 10; 28%). A review of the authors’ affiliations indicated that most AI-XR applications were developed at high-ranking research institutes, which suggests the importance of the AI-XR research area. The leading research institutes represented among the selected articles include Google Brain, Google Health, Intel Labs, different labs at the Massachusetts Institute of Technology (MIT), Microsoft Research, Disney Research, Stanford Vision and Learning Laboratory, Ohio Supercomputer Center, Toyota Research Institute, and Xerox Research Center, as well as a variety of robotics research laboratories, medical schools, and AI labs.

TABLE 2. Included papers.

FIGURE 2. Included papers per year (publishing trend). There was a marked increase in the number of included papers published after 2017.

FIGURE 3. Applications of the AI-XR combination among the included records (categorization of included studies). The most popular categories include medical training, autonomous cars, and gaming applications.

To explore the included articles, we developed keyword co-occurrence maps of the words and terms in their titles and abstracts. We used VOSviewer software (https://www.vosviewer.com/, accessed July 20, 2021) to map the bibliometric data as a network (Figure 4). In this map, nodes are specific terms, node size indicates term frequency, and links represent the co-occurrence of terms in the titles and abstracts of the included papers. The terms that most frequently co-occur with virtual reality in the selected articles are artificial intelligence, training, virtual patient, immersive, and visualization.
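
VOSviewer performs this analysis internally, including normalization and clustering steps that are not reproduced here. As a rough illustration of the underlying counts only, the Python sketch below tallies two quantities:

- how many documents mention each term (node size);
- how many documents mention each pair of terms (link strength).

The term list, the example titles, and the plain substring matching are assumptions for illustration. They are not the settings used in this review.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents, terms):
    """Return (node_freq, link_weight) counted over whole documents.

    node_freq[term] is the number of documents mentioning the term;
    link_weight[(a, b)] is the number of documents mentioning both a and b,
    with each pair stored in alphabetical order.
    """
    node_freq = Counter()
    link_weight = Counter()
    for text in documents:
        lowered = text.lower()
        # Plain substring matching: no stemming, synonyms, or stop-word rules.
        present = sorted(t for t in terms if t in lowered)
        node_freq.update(present)
        link_weight.update(combinations(present, 2))
    return node_freq, link_weight

# Hypothetical titles; not records from this review.
docs = [
    "Virtual reality and artificial intelligence for surgical training",
    "An immersive virtual patient driven by artificial intelligence",
    "Training autonomous cars in virtual reality",
]
terms = ["virtual reality", "artificial intelligence", "training",
         "virtual patient", "immersive"]
nodes, links = cooccurrence(docs, terms)
print(links[("training", "virtual reality")])  # 2
```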

FIGURE 4. The map of the co-occurrence of the words and terms in the title and abstract of included papers.

All included records could be categorized into three groups: 1) interpretation of XR-generated data (n = 12; 33%), mainly in medical and surgical training applications; 2) conferring intelligence on XR (n = 10; 28%), mostly in gaming and virtual patient (medical training) applications; and 3) training AI (n = 14; 39%), notably for autonomous cars and robots.

The main applications of the AI-XR combination are medical training, autonomous cars and robotics, gaming, armed forces training, and advanced visualization. Below, we discuss each category.

Medical Training

XR has become one of the most popular technologies in training, and many publications have discussed its advantages in this area (Ershad et al., 2018; Ropelato et al., 2018; Bissonnette et al., 2019). XR provides risk-free, immersive, repeatable environments for improving performance in various tasks, such as surgical and medical tasks. Although XR generates different types of data, interpreting XR-generated data and evaluating user skills remain challenging. Combining AI and XR provides an opportunity to better interpret XR dynamics by developing an objective approach for assessing user skill and performance. In this approach, data are generated by either the XR tool or the XR user; features are then extracted and selected from these data and fed into AI to determine the most relevant skill assessment features. Of the 36 included articles, ten (n = 10; 28%) focused on medical training. These articles used various AI methods: support vector machines (two articles), hidden Markov models (two articles), naive Bayes classifiers (two articles), fuzzy sets and fuzzy logic (two articles), and neural networks and decision trees (two articles). The included papers also used different XR platforms: virtual reality surgical simulators (six articles), a virtual reality hemilaminectomy simulator (one article), a virtual reality laparoscopic simulator (one article), a virtual reality mastoidectomy simulator (one article), and EchoComJ (one article). EchoComJ is an AR system that simulates an echocardiographic examination by combining a virtual heart model with real three-dimensional ultrasound data (Weidenbach et al., 2004).
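
The pipeline described above extracts features from XR data and feeds them to a classifier. It can be sketched in a few lines. The Python example below uses a Gaussian naive Bayes classifier, one of the method families listed above, written from scratch so that it has no dependencies. The two features (instrument path length and task time), their values, and the expert/novice labels are hypothetical and do not come from any included study.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(samples):
    """samples maps a skill label to a list of feature vectors."""
    total = sum(len(rows) for rows in samples.values())
    model = {}
    for label, rows in samples.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / total,
            "means": [mean(c) for c in columns],
            # Small floor keeps the variance positive for constant features.
            "vars": [max(pvariance(c), 1e-9) for c in columns],
        }
    return model

def log_posterior(stats, x):
    """Unnormalized log P(label | x) under the feature-independence assumption."""
    lp = math.log(stats["prior"])
    for xi, mu, var in zip(x, stats["means"], stats["vars"]):
        lp += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return lp

def predict(model, x):
    return max(model, key=lambda label: log_posterior(model[label], x))

# Hypothetical trials: (instrument path length in m, task time in s)
training = {
    "expert": [(1.1, 62.0), (0.9, 58.0), (1.0, 65.0)],
    "novice": [(2.4, 140.0), (2.9, 155.0), (2.1, 128.0)],
}
model = fit_gaussian_nb(training)
print(predict(model, (1.2, 70.0)))  # expert
```

In a real study, the features would be selected from many candidate XR metrics. The classifier would also be validated on held-out trainees before its output was trusted as a skill score.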

Virtual Patients

In medical training, XR and AI combinations have been used to develop virtual (simulated) patients, which engage with users in a virtual environment. Virtual patients help medical students and trainees improve specific skills, such as diagnostic interviewing, and practice natural interactions (Gutiérrez-Maldonado et al., 2008). Virtual patients can help trainees generalize their knowledge to real-world situations. However, developing appropriate metrics for evaluating user performance during this human-AI interaction in interactive virtual environments remains challenging. Of the 36 reviewed papers, three focused on virtual patients. These papers combined NLP methods with virtual environments to train diagnostic interviewing and dialogue skills.

Armed Forces Training

AI can be used as an agent in virtual interactive environments to train armed forces. The main objective of the AI agent is to challenge human participants at a level suited to their skills. To serve as a credible adversary, the AI agent must determine the human participant’s skill level and adapt accordingly (Israelsen et al., 2018). For this purpose, the AI agent can be trained in an interactive environment against an intelligent (AI) adversary to learn how to optimize its performance. For this AI-AI interaction in an interactive virtual environment, Gaussian process Bayesian optimization techniques have been used to optimize the AI agent’s performance (Israelsen et al., 2018). Only one article focused on armed forces training. It combined an AI decision-maker and Gaussian process Bayesian optimization with a virtual environment developed with the Orbit Logic simulator.
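
To make the optimization step concrete, the Python sketch below runs Gaussian process Bayesian optimization over a single agent parameter. It is not the setup of Israelsen et al. (2018). The following choices are assumptions made for brevity:

- a one-dimensional parameter;
- a squared-exponential kernel;
- an upper-confidence-bound (UCB) acquisition rule;
- the `mission_score` objective, which stands in for a score that would come from simulated engagements.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-4):
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^-1 y
    return L, alpha

def gp_predict(xs, L, alpha, x):
    """Posterior mean and standard deviation of the GP at x."""
    k_star = [rbf(x, a) for a in xs]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # rbf(x, x) == 1
    return mu, math.sqrt(var)

def bayes_opt(objective, grid, n_iter=15, beta=2.0):
    """Maximize objective over a 1-D grid using GP-UCB."""
    xs = [grid[0], grid[-1]]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        L, alpha = gp_fit(xs, ys)
        candidates = [x for x in grid if x not in xs]
        if not candidates:
            break

        def ucb(x):
            mu, sd = gp_predict(xs, L, alpha, x)
            return mu + beta * sd  # favor high predicted score or high uncertainty

        x_next = max(candidates, key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

# Hypothetical objective: simulated mission score of the AI agent as a
# function of one behaviour parameter in [0, 1].
def mission_score(aggressiveness):
    return -(aggressiveness - 0.63) ** 2

grid = [i / 50 for i in range(51)]
best_x, best_y = bayes_opt(mission_score, grid)
print(best_x, best_y)
```

In a real setting, each call to the objective would be an expensive simulation run. That cost is why a surrogate model that chooses the next parameter to try is attractive. The exhaustive scan over candidates works here only because the parameter is one-dimensional.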

Gaming Applications

Gaming is another application category of the AI-XR combination. Recently, AI agents have been used to play games such as StarCraft II, Dota 2, the ancient game of Go, and the iconic Atari console games (Kurach et al., 2020). The main objective of this application is to provide challenging environments in which newly developed AI algorithms can be quickly trained and tested. Furthermore, AI agents can be included as non-player characters in video games to interact with users. However, the AI adversary agents used in gaming have been reported to be highly non-adaptive and scripted (Israelsen et al., 2018). Of the 36 included papers, five focused on gaming applications and developed AI-based objects in virtual environments. These articles combined AI and XR using platforms such as the Reusable Elements for Animation Using Local Integrated Simulation Models (REALISM) software and Unity.

Robots and Autonomous Cars

Designing a robot that can perform complicated human tasks is very challenging. The main obstacles include extracting features from high-dimensional sensor data, modeling the robot’s interaction with the environment, and enabling the robot to adapt to new situations (Hilleli and El-Yaniv, 2018). In the real world, addressing these obstacles can be very costly. For autonomous cars, training AI requires capturing vast amounts of data from all possible scenarios, such as near-collision situations, off-orientation positions, and uncommon environmental conditions (Shah et al., 2018). Capturing these data in the real world is not only potentially dangerous but also prohibitively expensive. For both robots and autonomous cars, training and testing AI in virtual environments has emerged as a promising solution. Both robots and autonomous cars have reportedly been trained entirely in virtual environments, without any prior knowledge of the task, and then successfully deployed in the real world (Amini et al., 2020). Reinforcement learning (RL) agents for robots and autonomous cars are commonly trained in XR. Of the 36 included articles, ten (28%) focused on robots and autonomous cars. These articles combined DL and RL methods with different virtual environment platforms, including the virtual intelligent vehicle urban simulator, CARLA (an open-source driving simulator with a Python API), AirSim (Microsoft’s open-source simulator for cars and drones, built on Unreal Engine), Gazebo (an open-source 3D robotics simulator), iGibson (a virtual environment providing physics simulation and fast visual rendering based on the Bullet physics engine), Madras (an open-source multi-agent driving simulator), VISTA (a data-driven simulator and training engine), the Bullet physics engine, FlightGoggles (a photorealistic sensor simulator for perception-driven robotics), and Unity.
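
As a minimal illustration of training an RL agent entirely in simulation, the Python sketch below applies tabular Q-learning to a toy corridor environment. The `CorridorSim` class is a hypothetical stand-in for a virtual environment such as those listed above. The hyperparameters (learning rate, discount, and exploration rate) are illustrative choices, not values from the included papers.

```python
import random

class CorridorSim:
    """Hypothetical toy environment: start in cell 0 and reach the last cell.
    Each step costs a small penalty; reaching the goal gives reward 1."""

    def __init__(self, length=6):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.01
        return self.pos, reward, done

def q_learning(env, episodes=300, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # temporal-difference update
            s = s2
    return q

q = q_learning(CorridorSim())
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

After training, the greedy policy should choose "right" in every non-goal cell. The policy is learned only from simulated interaction. Real robotics and driving tasks replace the table with deep networks and the corridor with a rich simulator.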

Advanced Visualization

XR can create novel displays and improve understanding of complex structures, shapes, and systems. Automated AI-based imaging algorithms can further increase the efficiency of XR by automatically visualizing target parts of these structures. For example, in patient anatomy applications, the combination of XR and AI can add a novel tool to thoracic surgeons’ armamentarium by enabling 3D visualization of the complex anatomy of vascular arborization, pulmonary segmental divisions, and bronchial anatomy (Sadeghi et al., 2021). XR can also be used to visualize the structure of deep learning (DL) models (Meissler et al., 2019; VanHorn et al., 2019); for example, an XR-based DL development environment can make DL more intuitive and accessible. Three included articles focused on advanced visualization, combining neural networks with XR systems based on the Unity engine and with immersive 3D VR platforms.

Discussion

In this section, we describe the limitations and perspectives of the AI-XR combination in three categories: interpretation of XR-generated data, conferring intelligence on XR, and training AI (Figure 5).

FIGURE 5. Classification of included papers.

Interpretation of XR Generated Data

Recently, much attention has been paid to the use of XR for medical training, particularly for high-risk tasks such as surgery. Extensive data can be collected on users’ technical performance during simulated tasks (Bissonnette et al., 2019), and these data can be used to extract specific metrics indicating user performance. Because these metrics, in most cases, cannot by themselves efficiently evaluate users’ level of expertise, they must be extensively validated before being applied to real-world evaluation (Loukas and Georgiou, 2011). AI, through ML algorithms, can use XR-generated data to validate skill evaluation metrics (Ershad et al., 2018).

Data used in the selected articles include electroencephalography and electrocardiography recordings (Marín-Morales et al., 2018); trainees’ eye and pupillary movements (Richstone et al., 2010); volumes of removed tissue and the position and angle of the simulated burr and suction instruments (Bissonnette et al., 2019); bone drilling (Kerwin et al., 2012); knot tying and needle driving (Loukas and Georgiou, 2011); and movements of surgical instruments (Megali et al., 2006). Common ML algorithms used to validate skill evaluation metrics include support vector machines (Marín-Morales et al., 2018; Bissonnette et al., 2019), hidden Markov models (Megali et al., 2006; Sewell et al., 2008; Loukas and Georgiou, 2011), nonlinear neural networks (Richstone et al., 2010), decision trees (Kerwin et al., 2012), multivariate autoregressive models (Loukas and Georgiou, 2011), and naive Bayes classifiers (Sewell et al., 2008).
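
One way to use hidden Markov models for skill assessment works in two steps:

1. Fit one model per skill level.
2. Label a new trial with the model that assigns it the higher likelihood.

The Python sketch below shows only the scoring step, using the scaled forward algorithm. The two-state models and their parameters are hypothetical. So is the discretization of instrument motion into "smooth" (0) and "jerky" (1) segments.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | HMM).

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)                  # P(obs[t] | obs[:t])
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_p

# Hypothetical models: hidden states 0 = "steady", 1 = "correcting".
EXPERT = ([0.9, 0.1], [[0.9, 0.1], [0.6, 0.4]], [[0.9, 0.1], [0.3, 0.7]])
NOVICE = ([0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]], [[0.8, 0.2], [0.2, 0.8]])

def classify(obs):
    expert_ll = log_likelihood(obs, *EXPERT)
    novice_ll = log_likelihood(obs, *NOVICE)
    return "expert" if expert_ll > novice_ll else "novice"

print(classify([0, 0, 0, 0, 0, 0]))  # expert
print(classify([1, 1, 0, 1, 1, 1]))  # novice
```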

Conferring Intelligence on XR

Although advances in computer graphics techniques have substantially improved XR tools, adding intelligence through AI could transform XR. Conferring intelligence on XR means adding AI to a specific part of an XR system to improve the XR experience. In this combination, AI can help XR communicate and interact with users effectively. Conferring intelligence on XR can be useful for different applications, such as cancer detection (Chen et al., 2019), gaming (Turan and Çetin, 2019), advanced visualization (Sadeghi et al., 2021), driver training (Ropelato et al., 2018), and virtual patients and medical training (Gutiérrez-Maldonado et al., 2008). For example, in virtual patient applications, AI has been used to create Artificial Intelligence Dialogue Systems (Talbot et al., 2012), allowing virtual patients to engage in natural dialogue with users. In virtual patients, intelligent agents can be rendered at human size, can use facial expressions, gaze, and gestures, and can engage in cooperative tasks using synthetic speech (Kopp et al., 2003). In more advanced virtual patients, intelligent agents can understand human emotional states (Gutiérrez-Maldonado et al., 2008).

The main ML algorithms for this type of combination are neural networks (NNs) and fuzzy logic (FL). Combinations of NNs and XR are commonly used for developing virtual patients and detecting cancer (Chen et al., 2019). For example, one study developed an augmented reality microscope that overlays convolutional neural network-based information onto the real-time view of a sample. It has been used to identify prostate cancer and detect metastatic breast cancer (Chen et al., 2019). NNs have been applied effectively to XR because of their tolerance of noisy inputs, simplicity in encoding complex learning, and robustness in the presence of missing or damaged inputs (Turan and Çetin, 2019). FL can handle imprecise and vague problems, and it has been applied to interactive virtual environments and games to produce more human-like decision-making behavior. Because FL requires only the basics of Boolean logic as a prerequisite, it can be added to XR applications with little effort (Turan and Çetin, 2019).
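
The Python sketch below shows how a small fuzzy rule base could drive a non-player character. The following elements are hypothetical:

- the input variables (distance to the player and the character's health);
- the membership ramps;
- the four rules;
- the output scale.

Inference uses min for AND and a weighted average of rule outputs (zero-order Sugeno style). This scheme was chosen for brevity over other fuzzy inference schemes.

```python
def falling(x, lo, hi):
    """Membership 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def rising(x, lo, hi):
    """Complement of falling: 0 at or below lo, 1 at or above hi."""
    return 1.0 - falling(x, lo, hi)

# Each rule: (distance set, health set, output aggression in [0, 1])
RULES = [
    ("near", "healthy", 1.0),  # attack
    ("near", "hurt", 0.0),     # flee
    ("far", "healthy", 0.6),   # approach
    ("far", "hurt", 0.3),      # hold position
]

def aggression(distance_m, health_pct):
    """Fuzzy inference: AND = min, defuzzify by weighted average."""
    distance = {"near": falling(distance_m, 5.0, 20.0),
                "far": rising(distance_m, 5.0, 20.0)}
    health = {"hurt": falling(health_pct, 20.0, 60.0),
              "healthy": rising(health_pct, 20.0, 60.0)}
    weights = [(min(distance[d], health[h]), out) for d, h, out in RULES]
    total = sum(w for w, _ in weights)
    return sum(w * out for w, out in weights) / total if total > 0 else 0.5

print(aggression(5.0, 100.0))   # close and healthy: attack
print(aggression(12.5, 40.0))   # in between: a blended response
```

Because neighboring rules fire partially, the output changes smoothly as the inputs change. A crisp if-then script would instead jump abruptly between behaviors.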

Training AI

Training AI with real-world data can be very difficult. For example, developing urban self-driving cars in the physical world requires considerable infrastructure, funding, and personnel, as well as overcoming a variety of logistical difficulties (Dosovitskiy et al., 2017). In such situations, XR can serve as a learning environment for AI and substitute for training with real-world experimental data. This technique has received attention because of advantages such as safety, cost efficiency, and repeatability of training (Shah et al., 2018; Guerra et al., 2019). The main elements of XR as a learning environment include a 3D rendering engine, sensor models, a physics engine, control algorithms, robot embodiments, and a public API layer (Lamotte et al., 2010; Shah et al., 2018). Together, these elements produce virtual environments consisting of dynamic, static, intelligent, and non-intelligent objects, such as the ground, buildings, robots, and agents.

In the learning environment, a robot can be exposed to a variety of physical phenomena, such as air density, gravity, magnetic fields, and air pressure (Lamotte et al., 2010), which the physics engine simulates (Shen et al., 2020). The included papers used different physics and graphics engines (simulators) to develop virtual worlds, including PhysX, Bullet, Open Dynamics Engine, Unreal Engine, Unity, CARLA, AirSim, Gazebo, iGibson, Madras, VISTA, and FlightGoggles. Several recent advancements have also enabled more efficient learning environments for training AI. First, the rapid evolution of 3D graphics rendering engines has enabled more sophisticated features, including advanced illumination, volumetric lighting, real-time reflections, and improved material shaders (Guerra et al., 2019). Second, advanced motion capture facilities, such as laser tracking, infrared cameras, and ultra-wideband radio, have enabled precise tracking of humans and robots (Guerra et al., 2019).
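
As a small illustration of what a physics engine computes, the Python sketch below integrates a falling object under gravity and quadratic air drag. It uses a fixed time step with semi-implicit Euler integration. The object parameters are hypothetical, and engines such as PhysX or Bullet use their own, more elaborate solvers. The point is only that a simulated property such as air density changes the motion the agent experiences.

```python
import math

def terminal_velocity(mass, drag_coeff, area, air_density, gravity=9.81):
    """Analytic speed at which quadratic drag balances gravity (m/s)."""
    return math.sqrt(2 * mass * gravity / (air_density * drag_coeff * area))

def simulate_fall(mass, drag_coeff, area, air_density, gravity=9.81,
                  dt=0.001, duration=30.0):
    """Fixed-step semi-implicit Euler integration of a falling body.

    Velocity is updated first from the net force, then position is advanced
    with the new velocity. Returns (final_speed_m_s, distance_fallen_m).
    """
    v, y = 0.0, 0.0  # downward speed and distance fallen
    for _ in range(int(duration / dt)):
        drag = 0.5 * air_density * drag_coeff * area * v * abs(v)
        v += (gravity - drag / mass) * dt
        y += v * dt
    return v, y

# Hypothetical object: 1 kg, sphere-like drag coefficient, 0.01 m^2 cross-section
params = dict(mass=1.0, drag_coeff=0.47, area=0.01)
for rho in (1.225, 0.4):  # sea-level air vs. thin high-altitude air
    v, _ = simulate_fall(air_density=rho, **params)
    print(rho, round(v, 1), round(terminal_velocity(air_density=rho, **params), 1))
```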

In learning environments, agents are exposed to a variety of generated labeled and unlabeled data. From these data, they learn to control physical interactions and motion, navigate according to sensor signals, change the state of the virtual environment toward a desired configuration, or plan complex tasks (Shen et al., 2020). However, to train robust AI, learning environments must address several challenges, including transferring knowledge from XR to the real world; developing complex, stochastic, and realistic scenes; and developing fully interactive, multi-agent, scalable 3D environments.

Transferring Knowledge

One of the main challenges is transferring the results from learning environments to the real world. Complicated aspects of human behavior, physical phenomena, and robot dynamics can be very difficult to capture precisely in simulation, so agents trained there may respond poorly to them in the real world (Guerra et al., 2019; Amini et al., 2020). In addition, it is difficult to determine quantitatively (e.g., by calculating percentages) whether the results obtained from a learning environment are sufficient for real-world application (Gaidon et al., 2016). One method to address this challenge is domain adaptation, a process that enables an AI agent trained on a source domain to generalize to a target domain (Bousmalis et al., 2018). For the AI-XR combination, the source domain is XR, and the target domain is the real world. Two types of domain adaptation methods are pixel-level and feature-level adaptation. Pixel-level methods translate simulated images so that they resemble real images. Feature-level methods instead learn representations that are invariant across the two domains.
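
The Python sketch below illustrates the idea behind feature-level adaptation with the simplest possible alignment. Real-world feature vectors are shifted and rescaled so that each feature matches the mean and standard deviation observed in simulation. A model trained on simulated features then receives inputs with familiar statistics. This per-feature moment matching is a deliberately simplified stand-in: a diagonal version of correlation alignment. It is not the pixel-level or feature-level method of Bousmalis et al. (2018), and the feature values are hypothetical.

```python
from statistics import mean, pstdev

def feature_stats(rows):
    """Per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant features
    return means, stds

def align(rows, from_stats, to_stats):
    """Map rows from one domain's feature statistics onto another's."""
    (mu_f, sd_f), (mu_t, sd_t) = from_stats, to_stats
    return [
        [(x - mf) / sf * st + mt
         for x, mf, sf, mt, st in zip(row, mu_f, sd_f, mu_t, sd_t)]
        for row in rows
    ]

# Hypothetical 2-D feature vectors produced by a perception model.
sim_rows = [[0.2, 10.0], [0.4, 12.0], [0.6, 14.0]]   # source domain (XR)
real_rows = [[1.2, 30.0], [1.6, 36.0], [2.0, 42.0]]  # target domain (real world)

aligned = align(real_rows, feature_stats(real_rows), feature_stats(sim_rows))
print(aligned)
```

Matching only first and second moments cannot remove differences in appearance or correlations between features. This limitation is why the methods cited above learn image translations or domain-invariant representations instead.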

Complexity

Another challenge is the lack of complexity in learning environments. If a learning environment lacks real-world semantic complexity, the AI may not be trained sufficiently to operate in the real world (Kurach et al., 2020). Many learning environments offer simplified modes of interaction, a narrow set of tasks, and small-scale scenes. This lack of complexity, specifically simplified modes of interaction, has been reported to make applying AI in real-world situations difficult (Dosovitskiy et al., 2017; Shen et al., 2020). For example, autonomous cars trained in simple learning environments do not generalize to the full range of driving scenarios, varying weather conditions, and road exploration. However, newly developed learning environments can add complexity by using repositories of sparsely sampled trajectories (Amini et al., 2020). For each trajectory, these environments synthesize many novel views, which allows AI agents to train at an acceptable level of complexity.

One aspect of complexity in learning environments is stochasticity. Many commonly used learning environments are deterministic. Because the real world is stochastic, robots and self-driving cars must be trained under uncertain dynamics. To increase robustness in the real world, newly developed learning environments expose AI (the learning agent) to a variety of random properties of the environment, such as varying weather, sun positions, and different visual appearances and objects (e.g., lanes, roads, or buildings) (Gaidon et al., 2016; Amini et al., 2020; Shen et al., 2020). However, in many learning environments, this artificial randomness has been found to be too structured and consequently insufficient to train robust AI (Kurach et al., 2020).
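
Exposing an agent to random environment properties in this way is commonly called domain randomization. The Python sketch below samples one randomized configuration per training episode. The property names and ranges are hypothetical. Drawing each property independently and uniformly is exactly the kind of simple, structured randomness that may prove insufficient in practice.

```python
import random
from dataclasses import dataclass

WEATHER = ("clear", "overcast", "rain", "fog")
LANE_MARKINGS = ("white_solid", "white_dashed", "yellow_solid", "worn")
FACADES = ("brick", "glass", "concrete", "painted")

@dataclass(frozen=True)
class EpisodeConfig:
    weather: str
    sun_elevation_deg: float  # negative values put the sun below the horizon
    lane_marking: str
    facade: str
    num_parked_cars: int

def sample_episode(rng):
    """Draw one randomized environment configuration."""
    return EpisodeConfig(
        weather=rng.choice(WEATHER),
        sun_elevation_deg=rng.uniform(-10.0, 90.0),
        lane_marking=rng.choice(LANE_MARKINGS),
        facade=rng.choice(FACADES),
        num_parked_cars=rng.randint(0, 20),
    )

def randomized_episodes(n, seed=0):
    rng = random.Random(seed)  # seeded so that a training run can be reproduced
    return [sample_episode(rng) for _ in range(n)]

episodes = randomized_episodes(1000)
print(episodes[0])
```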

Another aspect of complexity in learning environments is the creation of realistic scenes. Several learning environments annotate elements of the environment with photorealistic materials to create scenes close to real-world scenarios (Shen et al., 2020). These learning environments use scene layouts from different repositories and annotate the objects inside each scene. The annotations assign materials (such as marble, wood, or metal) and take mass and density into account (Shen et al., 2020). In contrast, some included articles reported creating fully interactive environments de novo, without using existing repositories.

Interactive or Non-interactive

Some learning environments are created only for navigation and have non-interactive assets. In these environments, the AI agent cannot interact with scenes, because each scene consists of a single fully rigid object. Other learning environments support interactive navigation, in which agents can interact with the scenes. In newly developed learning environments, objects in scenes can be annotated with the actions they can receive.
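
Annotating objects with the actions they can receive can be represented with a simple data structure. In the Python sketch below, a hypothetical `SceneObject` records its material and mass, as in the annotated scenes described earlier, together with a set of affordances. An object with no affordances behaves like the non-interactive, fully rigid assets of navigation-only environments. The action names and their effects are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical effects of each action on an object's state.
EFFECTS = {
    "open": lambda state: state.update(open=True),
    "close": lambda state: state.update(open=False),
    "pick_up": lambda state: state.update(held=True),
}

@dataclass
class SceneObject:
    name: str
    material: str
    mass_kg: float
    affordances: frozenset = frozenset()  # actions this object can receive
    state: dict = field(default_factory=dict)

def apply_action(obj, action):
    """Apply an action if the object affords it; return whether it took effect."""
    if action not in obj.affordances:
        return False  # the scene is left unchanged
    EFFECTS[action](obj.state)
    return True

cabinet = SceneObject("cabinet", "wood", 35.0, frozenset({"open", "close"}))
mug = SceneObject("mug", "ceramic", 0.3, frozenset({"pick_up"}))
wall = SceneObject("wall", "concrete", 5000.0)  # rigid, navigation-only asset

print(apply_action(cabinet, "open"), cabinet.state)  # True {'open': True}
print(apply_action(wall, "open"), wall.state)        # False {}
```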

Single or Multiple Agents

In many learning environments, only one agent can be trained and tested. However, one way to train robust AI involves implementing several collaborating or competing AI agents (Kurach et al., 2020). In multi-agent learning environments, attention must be paid to communication and interactions between agents. In this complex interaction, optimizing the behavior of the learning AI agent is highly challenging. Furthermore, the appearance and behavior of adversary agents in learning environments must be considered. Implementing kinematic parameters and advanced controllers to govern these agents has been reported in several studies (Dosovitskiy et al., 2017).

Sensorimotor

Sensorimotor control in a 3D virtual world is another challenging aspect of learning environments. For example, challenging situations for sensorimotor control in autonomous cars include navigating densely populated virtual environments, tracking multi-agent dynamics at urban intersections, recognizing prescriptive traffic rules, and rapidly reconciling conflicting objectives (Dosovitskiy et al., 2017).

Open-Source Licenses

To test new research ideas, researchers need learning environments released under open-source licenses so that they can modify them. However, many advanced environments and physics simulators are distributed under restrictive licenses (Kurach et al., 2020).

Scalability

Finally, developing scalable learning environments is another concern of researchers. It is important to create learning environments that do not require processing and storing virtual 3D data for entire cities or environments (Amini et al., 2020).

In summary, advanced learning environments offer features such as:

• Fully interactive realistic 3D scenes

• Ability to import additional scenes and objects

• Ability to operate without storing enormous amounts of data

• Flexible specification of sensor suites

• Realistic virtual sensor signals such as virtual LiDAR signals

• High-quality images from a physics-based renderer

• Flexible specification of cameras and their type and position

• Provision of a variety of camera parameters such as 3D location and 3D orientation

• Pseudo-sensors enabling ground-truth depth and semantic segmentation

• Ability to have endless variation in scenes

• Human-environment interfaces (fully physical interactions with the scenes)

• Provision of open digital assets (urban layouts, buildings, or vehicles)

• Provision of a simple API for adding different objects such as agents, actuators, sensors, or arbitrary objects
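
The last feature in the list, a simple API for adding objects, can be sketched as a small registry. The Python example below is hypothetical. It does not mirror the API of CARLA, AirSim, iGibson, or any other simulator named in this review. It only shows how agents, sensors with configurable position and orientation, and arbitrary objects might be registered and queried.

```python
from dataclasses import dataclass, field

@dataclass
class Sensor:
    kind: str           # e.g. "rgb_camera", "lidar", "depth"
    position: tuple     # (x, y, z) in meters, relative to the parent entity
    orientation: tuple  # (roll, pitch, yaw) in degrees

@dataclass
class Entity:
    name: str
    category: str       # "agent", "actuator", "sensor_rig", or "object"
    sensors: list = field(default_factory=list)

class World:
    """Hypothetical minimal scene registry (not any real simulator's API)."""

    def __init__(self):
        self._entities = {}

    def add(self, entity):
        if entity.name in self._entities:
            raise ValueError(f"duplicate entity name: {entity.name}")
        self._entities[entity.name] = entity
        return entity

    def attach_sensor(self, entity_name, sensor):
        self._entities[entity_name].sensors.append(sensor)

    def by_category(self, category):
        return [e for e in self._entities.values() if e.category == category]

world = World()
world.add(Entity("ego_car", "agent"))
world.attach_sensor("ego_car", Sensor("lidar", (0.0, 0.0, 1.8), (0.0, 0.0, 0.0)))
world.attach_sensor("ego_car", Sensor("rgb_camera", (1.5, 0.0, 1.4), (0.0, -5.0, 0.0)))
world.add(Entity("traffic_cone", "object"))
print([s.kind for s in world.by_category("agent")[0].sensors])  # ['lidar', 'rgb_camera']
```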

Guidelines

In general, the AI-XR combination can serve two main objectives: 1) AI serving and assisting XR and 2) XR serving and assisting AI (Figure 6).

FIGURE 6. Proposed guidelines for the AI-XR combination.

AI Serving XR

AI consists of several subdomains, including the following: 1) ML, which teaches a machine to make decisions by identifying patterns in and analyzing past data; 2) DL, which processes input data through multiple layers to identify patterns and predict outcomes; 3) Neural Networks (NNs), which process input data in a manner loosely inspired by the human brain; 4) NLP, which enables a machine to read, understand, and interpret human language; and 5) Computer Vision (CV), which aims to understand, recognize, and classify images and videos (Great Learning, 2020). Among the included papers, AI served XR in two ways: 1) detecting patterns in XR-generated data by using ML methods, and 2) improving the XR experience by using NN and NLP methods.

Using AI to detect patterns in XR-generated data is very common among the included papers. This type of AI-XR combination can be divided into two main categories: 1) non-interactive, in which the results of XR are simply fed into AI to detect patterns; and 2) interactive, in which XR-generated data are fed into AI, patterns are identified, and the results are returned to XR. While the non-interactive category can help extract specific metrics that indicate an XR user's performance, the interactive category can be used to optimize process parameters: AI analyzes different modes of XR and returns feedback to the XR application.
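
The difference between the two categories can be sketched in a few lines of Python. The sketch rests on explicit assumptions: `xr_trial` is a hypothetical stand-in for an XR session whose success probability follows a logistic curve in (difficulty minus user skill), and a simple success-rate estimate and a weighted up/down rule replace the trained ML models used in the reviewed papers. In the non-interactive path, logged results flow one way into an analysis step; in the interactive path, each analysis result is returned to XR as a new difficulty parameter.

```python
import math
import random


def xr_trial(difficulty: float, skill: float, rng: random.Random) -> bool:
    """Hypothetical XR stand-in: success becomes less likely as difficulty exceeds skill."""
    p_success = 1.0 / (1.0 + math.exp(difficulty - skill))
    return rng.random() < p_success


def non_interactive(logs: list) -> dict:
    """One-way analysis: XR results -> metrics. Nothing is sent back to XR."""
    if not logs:
        raise ValueError("no XR results to analyze")
    return {"trials": len(logs), "success_rate": sum(logs) / len(logs)}


def interactive(skill: float, n_trials: int = 200, target: float = 0.7,
                step: float = 0.1, seed: int = 0) -> float:
    """Closed loop: analyze each XR result and return a new difficulty to XR."""
    rng = random.Random(seed)
    difficulty = 0.0
    for _ in range(n_trials):
        success = xr_trial(difficulty, skill, rng)
        # Weighted up/down rule: the expected step is step * (P(success) - target),
        # so difficulty drifts toward the level where P(success) == target.
        difficulty += step * (1.0 - target) if success else -step * target
        # `difficulty` is the feedback the XR session uses for the next trial.
    return difficulty


if __name__ == "__main__":
    rng = random.Random(1)
    logs = [xr_trial(1.0, 1.5, rng) for _ in range(100)]
    print(non_interactive(logs))             # offline performance metric
    print(round(interactive(skill=1.5), 2))  # adapted difficulty
```

Under the assumed logistic model, the interactive loop hovers around the difficulty at which the simulated user succeeds about 70% of the time, that is, skill + ln(0.3/0.7), roughly 0.85 below the user's skill.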

Several articles focused on improving the XR experience by adding NLP and DL. Under this objective, AI is added to a specific component of XR to provide various advantages to the XR experience. For example, adding NLP to virtual patients enables them to understand human speech and hold a dialogue. Furthermore, in gaming applications, implementing AI-based objects can increase the randomness of a game.

XR Serving AI

Data availability is one of the main concerns of AI developers (Davahli et al., 2021). Despite significant efforts in collecting and releasing datasets, most datasets may have deficiencies such as missing data, lack of coverage of rare and novel cases, high dimensionality with small sample sizes, lack of appropriately labeled data, and contamination with artifacts (Davahli et al., 2021). Furthermore, most data are collected for operational purposes rather than specifically for AI research and training. In this situation, XR can serve as an additional source of high-quality, AI-ready data. Such data can be rich, cover rare and novel cases, and extend beyond available datasets.

In general, XR can be used for various applications, such as medical education, improving patient experiences, helping individuals understand their emotions, handling dangerous materials, remote training, visualizing complex products and compounds, building effective collaborations, training workers in safe environments, virtual exhibitions and entertainment, and serving as a computational platform (Forbes Technology Council, 2020). Under the “XR serving AI” objective, however, XR takes on a new function: generating AI-ready data. By using XR, AI systems can learn and become well trained before being deployed in the physical world. The main advantages of using XR for training AI systems include the following: 1) the entire training process can occur in XR without collecting data from the physical world, 2) learning in XR can be fast, and 3) XR allows AI developers to simulate novel cases for training AI systems.
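
The third advantage, simulating novel cases, can be illustrated with a minimal synthetic-data generator in Python. Everything here is an assumption for illustration: the scene parameters, the definition of a “rare” case (a pedestrian crossing at night in fog), and the labeling rule are hypothetical, and in a real XR pipeline each sample would be a rendered scene (for example, camera images and LiDAR returns) rather than a small record. The sketch shows three ideas: scene parameters are randomized, labels come directly from the simulator's ground truth, and rare cases receive a guaranteed quota instead of depending on chance.

```python
import random
from dataclasses import dataclass

WEATHER = ("clear", "rain", "fog")       # hypothetical scene parameters
TIME_OF_DAY = ("day", "dusk", "night")


@dataclass(frozen=True)
class SyntheticSample:
    weather: str
    time_of_day: str
    pedestrian_crossing: bool
    label: str  # taken from the simulator's ground truth, so no manual annotation


def make_sample(weather: str, time_of_day: str, crossing: bool) -> SyntheticSample:
    label = "brake" if crossing else "proceed"  # hypothetical labeling rule
    return SyntheticSample(weather, time_of_day, crossing, label)


def is_rare(s: SyntheticSample) -> bool:
    """Hypothetical rare case: a pedestrian crossing at night in fog."""
    return s.pedestrian_crossing and s.weather == "fog" and s.time_of_day == "night"


def generate_dataset(n: int, rare_fraction: float = 0.2, seed: int = 0) -> list:
    """Randomize scene parameters, but guarantee a fixed quota of rare cases."""
    rng = random.Random(seed)
    n_rare = round(n * rare_fraction)
    samples = [make_sample("fog", "night", True) for _ in range(n_rare)]
    samples += [
        make_sample(rng.choice(WEATHER), rng.choice(TIME_OF_DAY), rng.random() < 0.1)
        for _ in range(n - n_rare)
    ]
    rng.shuffle(samples)  # avoid ordering artifacts during training
    return samples


if __name__ == "__main__":
    data = generate_dataset(1000)
    print(sum(is_rare(s) for s in data), "rare cases out of", len(data))
```

A real-world dataset would contain such cases only as often as they happen to be recorded; in simulation, their share becomes a parameter chosen by the developer.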

Even though the reviewed articles describe applications of XR for training AI systems in different areas, including autonomous cars, robots, gaming applications, and armed forces training, surprisingly, no study focused on healthcare. In healthcare, data privacy plays an important role in developing and implementing AI. Because protecting healthcare data is complex, privacy requirements have substantially limited data availability. In this situation, AI could instead be trained in virtual environments that simulate patient demographics, disease states, and health conditions.

Although the majority of the included papers used XR to train AI, XR can also improve the performance of AI by 1) validating the results and verifying the hypotheses generated by AI systems, and 2) offering advanced visualizations of AI structures. For example, in drug and antibiotic discovery, where AI systems propose new compounds, XR can be used to compute the properties of the proposed compounds.

Limitations

We acknowledge that not all papers potentially relevant to this review may have been included. First, even though we used the most relevant set of keywords, some relevant articles might not have been identified because of the limited number of keywords. Second, for each search keyword, we screened links or articles only up to the 350th record. Third, we used only one round of citation chaining to screen references; additional rounds could have identified more papers relevant to the AI-XR synergy. Finally, we used a limited number of bibliographic databases for record discovery.

Conclusion

This article comprehensively reviewed the published literature on the combination of AI and XR. By following the PRISMA guidelines and using targeted keywords, we identified 36 articles that met the specific inclusion criteria. The examined papers were categorized into three main groups: 1) the interrelations of AI and XR dynamics, 2) the influence of AI on making XR applications more useful and valuable in a variety of domains, and 3) common AI and XR training issues. We also identified the main applications of the AI-XR combination, including advanced visualization methods, autonomous cars, robotics, military and medical training, cancer diagnosis, entertainment and gaming applications, smart homes, affective computing, and driver education and training. The results point to the growing importance of the interrelationships between AI and XR technology developments and to the prospects for their extensive application in business, industry, government, and education.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author Contributions

DR and MD: Methodology and writing—original draft and revisions; WK, CC-N: Conceptualization, writing—review and revisions, editing, and supervision. All authors equally contributed to this research.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Amini, A., Gilitschenski, I., Phillips, J., Moseyko, J., Banerjee, R., Karaman, S., et al. (2020). Learning Robust Control Policies for End-To-End Autonomous Driving from Data-Driven Simulation. IEEE Robot. Autom. Lett. 5, 1143–1150. doi:10.1109/LRA.2020.2966414