Grand Challenges for Augmented Reality

INTRODUCTION
In his 1965 article, The Ultimate Display, Ivan Sutherland imagined a future computer interface that blurred the separation between the digital and physical worlds (Sutherland, 1965). At the time, he was making this vision a reality, creating a see-through head mounted display (HMD) that allowed users to see virtual images superimposed over the real world (Sutherland, 1968). The user's head position was tracked, so the virtual content appeared fixed in space, and a handheld wand could be used to interact with it.
Although the term was not coined until decades later, Sutherland's system was the first working Augmented Reality (AR) interface. AR is technology with three key characteristics (Azuma, 1997): 1) it combines real and virtual images, 2) it is interactive in real time, and 3) it registers virtual imagery in three dimensions. Sutherland's work had these properties, but over 50 years later, his vision of the Ultimate Display still hasn't been achieved and more research is needed.
Azuma's definition of AR provides guidance on the technology required to create an AR experience. To combine real and virtual images, display technology is needed. To support interaction in real time, user interface technologies are required. To register AR content in three dimensions, tracking technology is needed.
These technologies were once available only in research labs, but today they are in people's hands. Current mobile phones with cameras, GPS and inertial sensors, high resolution screens, fast networking, and powerful CPUs and graphics processors are the most common way that people experience AR. Compatible with hundreds of millions of devices, Apple's ARKit (Apple, 2020) and Google's ARCore (Google, 2020a) provide accurate AR tracking for mobiles. A user can look at the camera view on their phone screen and see virtual objects in their real world. Mobile AR applications such as Pokemon Go have been downloaded over a billion times (NintendoSoup, 2019), showing how readily accessible the technology is.
However, the user experience provided by a phone is very different from Sutherland's vision of hands-free interaction, stereo graphics, and virtual imagery always in a person's field of view. Mobile AR provides an easily accessible entry point, but the true potential of AR is achieved through using head mounted displays, with richer interaction and better tracking techniques. In each of these areas there are important Grand Challenges that need research, as discussed below.

Research in Displays
Research is ongoing in many areas of AR display technology. For example, a pinhole screen can be used to create a wide field of view see-through AR display (Maimone et al., 2014) and holographic projection can be used to achieve full color, high contrast AR images in an eye-glass form factor (Maimone et al., 2017).
Other areas are also important, such as the vergence-accommodation problem caused by a display having only a single focal plane, which prevents people from keeping the AR content in focus while also focusing on objects in the real world at a different distance. Variable focal planes can enable users to view virtual content at different focal lengths (Liu et al., 2008). Light Field Displays and light fields provide one way to show photorealistic content to the user and are a prerequisite for creating "True Augmented Reality" (Sandor et al., 2015). There are also interesting innovations happening in the commercial sector, such as at Mojo Vision (Mojo Vision, 2020), which is developing AR-enabled contact lenses, although these are many years away from commercialization.
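To make the focus mismatch concrete, the minimal sketch below uses two standard optics relations: the vergence angle for an object at distance d is 2·atan(IPD / 2d), and the accommodation demand in diopters is 1/d. The function names, the 63 mm interpupillary distance, and the 2 m focal plane are illustrative assumptions and are not taken from the cited work.

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two eyes' lines of sight when fixating a point
    straight ahead at distance_m. The default IPD is an assumed typical value."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def focus_mismatch_diopters(focal_plane_m, object_m):
    """Difference in accommodation demand (1/distance, in diopters) between a
    display's single focal plane and an object at another distance."""
    return abs(1.0 / focal_plane_m - 1.0 / object_m)

if __name__ == "__main__":
    # Assumed scenario: display focal plane fixed at 2 m, real object held at 0.5 m.
    print("vergence at 0.5 m (deg):", vergence_angle_deg(0.5))
    print("vergence at 2.0 m (deg):", vergence_angle_deg(2.0))
    print("focus mismatch (D):", focus_mismatch_diopters(2.0, 0.5))
```

In this assumed scenario the eyes must refocus by 1.5 diopters to switch between the virtual content and the real object, which is why a display with a single focal plane cannot keep both sharp at once.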

Research in Interaction
Sutherland's system supported simple interaction with a handheld wand. Another Grand Challenge is to enable people to interact with AR content as easily as they do with real objects. Many researchers are exploring natural user interfaces, such as using tangible objects to interact with AR content (Tangible AR interfaces (Billinghurst et al., 2008)) or free-hand gesture manipulation (Sharp et al., 2015). Modern AR displays such as the HoloLens 2 (Microsoft, 2020a) support natural two-handed gesture input, allowing people to reach out and grab virtual content. However, it is possible to go beyond this and combine speech and gesture to create multimodal interfaces where the strengths of one modality compensate for the weaknesses of another (Nizam et al., 2018). The addition of eye-tracking, full-body input, and other non-verbal cues can provide even more intuitive multimodal interaction. Research also needs to be conducted into interaction methods that use techniques not possible in the real world. Brain computer interaction methods enable brain activity to select AR content (Si-Mohammad et al., 2018), and other physiological sensors can enable AR to respond to user heart rate or emotional state. There are many opportunities to create even better AR interaction methods.

Research in Tracking
A key feature of AR systems is that the content appears to be fixed in space, which requires the user's viewpoint to be continuously tracked. Sutherland achieved this by using mechanical and ultrasonic trackers to measure where the user's HMD was and render the virtual imagery from that same position. Tracking technology has improved significantly, but another Grand Challenge is to precisely locate a user's position in any location. There has been a significant amount of research on computer vision methods for tracking user viewpoint without prior knowledge of any visual features (Kim et al., 2018). Hybrid approaches that combine vision-based SLAM tracking with GPS and inertial sensors can be used for a more robust result (Liu et al., 2016). However, one area that hasn't been well explored is hybrid approaches for very large-scale tracking. Wide area tracking can be achieved using sensor fusion from a dynamic combination of mobile and stationary tracking (Pustka and Klinker, 2008). Deep Learning could be used to coordinate multiple tracking systems and provide some scene understanding (Garon and Lalonde, 2017). Finally, there is a recent trend toward AR cloud-based tracking, where features captured by a user's device are uploaded to the cloud and fused to provide a ubiquitous tracking service. HoloRoyale is one of the first examples of using city scale AR tracking from an AR cloud service to enable collaborative gaming (Rompapas et al., 2019). Commercial software from companies such as Ubiquity6 (Ubiquity6, 2020) enables large scale AR cloud tracking. However, none of these systems yet provide large-scale precise tracking, so more work is needed.
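The idea of rendering virtual imagery from the tracked head position can be illustrated with a minimal sketch. It assumes a right-handed frame with -z as the forward direction and a head pose given as a position plus a yaw-only rotation. The function names and the coordinate convention are hypothetical choices for illustration and do not describe any particular tracking system.

```python
import math

def yaw_rotation(yaw_rad):
    """3x3 rotation about the vertical (y) axis, mapping head-frame
    vectors to world-frame vectors."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def world_to_view(point_world, head_position, head_rotation):
    """Express a world-fixed point in the tracked head's coordinate frame.

    The inverse of a rotation matrix is its transpose, so the offset from
    the head is multiplied by the transpose of head_rotation.
    """
    offset = [p - h for p, h in zip(point_world, head_position)]
    return [sum(head_rotation[r][c] * offset[r] for r in range(3))
            for c in range(3)]

if __name__ == "__main__":
    anchor = [0.0, 0.0, -2.0]  # virtual object fixed 2 m in front of the start pose
    poses = [
        ([0.0, 0.0, 0.0], 0.0),          # start pose
        ([1.0, 0.0, 0.0], 0.0),          # user steps 1 m to the right
        ([0.0, 0.0, 0.0], math.pi / 2),  # user turns 90 degrees to the left
    ]
    for position, yaw in poses:
        print(world_to_view(anchor, position, yaw_rotation(yaw)))
```

The anchor's world coordinates never change. Only its view-space coordinates are recomputed from each new pose, which is what makes the content appear fixed in space, and why errors in the tracked pose show up directly as virtual content drifting against the real world.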

Research in Perception and Neuroscience
In addition to Grand Challenges in fundamental technology, there are other areas of AR that need to be addressed, such as exploring perceptual and neuroscience issues. AR systems create an illusion to convince the brain that virtual content actually exists in the real world. There are a number of perceptual problems that can occur in AR, classified into environmental, capturing, augmentation, display device, and user issues (Kruijff et al., 2010). Considerable research has been conducted on how to make AR content appear the same as real objects, including the use of virtual lighting (Agusanto et al., 2003), shadows (Sugano et al., 2003), real object occlusion (Breen et al., 1996) and similar methods. The goal is to create digital objects that have strong "Object Presence" and appear to be really there (Stevens and Jerrams-Smith, 2000). However, unlike Presence in Virtual Reality, Object Presence in AR has not been well studied. Most of these systems are evaluated using subjective measures, but EEG can be used as an objective measure to evaluate the quality of experience (Bauman and Seeling, 2018). EEG could also be used to explore the cognitive load of using AR interfaces, measure emotional response to AR stimuli, monitor shared brain activity in collaborative AR experiences, and more. So, there is significant opportunity to use neuroscience to understand the perceptual and psychological basis of AR.

Research in Collaboration
There are also many application areas that could be studied in more detail. One important area is using AR to enable remote people to work together as easily as if they were face to face. Early experiments showed that AR views of video avatars provided a significantly higher degree of Social Presence than traditional video conferencing (Billinghurst and Kato, 2002). More recently, Microsoft's Holoportation captured full 3D models of people in real time and showed them as life-sized AR avatars in a user's real environment, enabling the sharing of rich communication cues (Orts-Escolano et al., 2016). The company Spatial provides a commercial application that can superimpose AR avatars over the real world in a very natural way (Spatial.io, 2020).
There are also many examples of how wearable AR systems can be used to enable a remote expert to see through a local user's eyes and provide AR cues to help them perform real-world tasks (Kim et al., 2019). Microsoft's Remote Assist product (Microsoft, 2020b) and others have made this type of experience commercially available. The emerging field of Empathic Computing (Piumsomboon et al., 2017) goes beyond this to explore how physiological cues can be combined with AR in collaborative interfaces to enable remote people to share what they are seeing, hearing and feeling. There is also an opportunity to study how to support viewing large scale social networks in AR interfaces, including using visual and spatial cues to separate out dozens of social contacts (Nassani et al., 2017). However, there is still very little research conducted on collaborative AR. A survey of 10 years of user studies until 2015 found that only 15 of the 369 AR studies reviewed were collaborative studies, and only seven of these used AR HMDs (Dey et al., 2018).

Research in Social and Ethical Issues
Finally, there are social and ethical issues that need to be addressed. The difficulty that Google Glass (Google, 2020b) and other AR displays have had in gaining consumer acceptance shows that widespread use of HMD-based AR may depend more on social than technical issues. Rauschnabel explored the technology acceptance drivers of AR smart glasses (Rauschnabel and Ro, 2016), while Pascoal studied acceptance in outdoor environments (Pascoal et al., 2018). When AR devices become more widely used, a number of ethical issues may arise. Who should be allowed to place AR content in the view of a person, and what are the ethics around AR advertising? What is the consequence of people having different views of the same real environment? Brinkman discusses the privacy implications of AR as an extension of the home and AR advertising (Brinkman, 2014). Pase lists a number of questionable ethical uses of pervasive AR, such as deception, surveillance, behavior modification, and punishment (Pase et al., 2012). AR technology could be used to create mediated reality experiences, removing from view certain parts of the real world, which could have public safety issues (Mann, 2002). Users capturing and sharing their surroundings for AR cloud tracking or remote collaboration could also raise significant concerns. Wassom has written about the legal, ethical and privacy issues of AR (Wassom, 2014), but there is still much more research needed.

CONCLUSION
Over 50 years ago, Sutherland provided a compelling vision of how the physical and digital worlds could be seamlessly combined. However, there is still significant research that needs to be done to make this vision a reality. Grand Challenges exist in fundamental display, interaction and tracking technologies, and also in the perception and neuroscience of AR, in using AR for collaboration, and in exploring its social and ethical aspects. Addressing these topics will enable Augmented Reality to reach its full potential as a transformative technology.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.