Social Interaction With Agents and Avatars in Immersive Virtual Environments: A Survey

Immersive virtual reality technologies are used in a wide range of fields such as training, education, health, and research. Many of these applications include virtual humans, which are classified into avatars and agents. This survey presents an overview of the applications and advantages of immersive virtual reality and virtual humans, as well as the basic concepts and terminology. To be effective, many virtual reality applications require users to perceive and react socially to virtual humans in a realistic manner. Numerous studies show that people can react socially to virtual humans; however, this is not always the case. This survey provides an overview of the main findings regarding the factors that affect social interaction with virtual humans in immersive virtual environments. Finally, it highlights the need for further research that can lead to a better understanding of human–virtual human interaction.


INTRODUCTION
Apart from the fact that virtual reality (VR) technologies can simulate environments and situations in a realistic and believable manner, they offer several advantages that make them very beneficial in various fields. As a result, over the past decade, VR technologies have been used in a wide range of applications. For example, social VR applications (McVeigh-Schultz et al., 2018) allow people to meet, collaborate, and share remotely (Li et al., 2019). Also, many of the most widely used and promising VR applications are training simulations for pilots and drivers of various vehicles, for workers in dangerous jobs such as mining (Bellanca et al., 2019), and for the military (Koźlak et al., 2013). A key advantage of using VR in these applications is that it provides realistic training conditions in a controlled and, therefore, much safer environment, while significantly reducing the cost and increasing the efficiency of the training. Factors that cannot be controlled in the physical world, such as the time of day, or that are random, such as the weather, are fully controllable in a virtual world. Moreover, VR makes it possible to repeat scenarios and to evaluate the learner's performance more effectively. The introduction of VR in education can enhance learning outcomes (Merchant, Goetz, Cifuentes, Keeney-Kennicutt and Davis, 2014). VR increases learners' motivation and involvement. It allows students to experience, rather than just watch and listen, while promoting complex learning (Villena Taranilla et al., 2019). It also gives students the opportunity to explore objects or events that are otherwise inaccessible, such as the solar system, historical places and events (Villena Taranilla et al., 2019), or the inside of the human body (Parong and Mayer, 2018; Michael-Grigoriou, Yiannakou and Christofi, 2017). VR can also be beneficial for teacher training (Stavroulia et al., 2019).
In health care, immersive virtual reality technologies are used both for education and training and in various kinds of therapy. The use of simulators in medical education protects patients while offering students a way to develop their skills, knowledge, and confidence and to have their performance evaluated (Lateef, 2010; Pottle, 2019). Virtual reality therapies (Wiederhold and Riva, 2019) are used for patients with various phobias such as fear of heights (Rothbaum et al., 1995; Seinfeld et al., 2016) and claustrophobia (Christofi and Michael-Grigoriou, 2016; Rahani et al., 2018), as well as for fear of public speaking (Nazligul et al., 2017; Takac et al., 2019), social anxiety (Chesham et al., 2018), posttraumatic stress disorder (Botella, Serrano, Baños, and Garcia-Palacios, 2015), and depression (Falconer et al., 2016).
The above are just a few examples of VR applications in various fields, which illustrate the advantages of this technology. To summarize, VR technologies can provide affordable, realistic, controlled, safe, interactive, and accessible experiences. Below, the basic concepts related to virtual humans (VHs) and VR are presented along with the relevant references. Then, the theory and the main factors that affect social interactions with VHs are presented. Finally, the authors summarize and discuss the topic and suggest future research directions on social interactions with VHs. The references in this survey were selected by the authors to best illustrate the relevant literature; no systematic review approach was followed.

VIRTUAL HUMANS
Many of the applications described above require virtual representations of humans, which are called VHs. We define a VH as a "perceivable digital representation" of a human (Bailenson and Blascovich, 2004). VHs are classified into avatars and agents (Bailenson and Blascovich, 2004; von der Pütten et al., 2010), depending on who directs their behavior. An avatar is a VH whose behaviors reflect those executed by a specific human being. An agent, on the other hand, is a VH whose behaviors are determined by a computer algorithm. However, since today's technology is unable to reflect all human actions on avatars, the distinction between an agent and an avatar is not always clear (Bailenson and Blascovich, 2004). Communication channels (e.g., facial expressions, gaze behavior, tone of voice, or body language) that the system does not track, and that therefore cannot be mapped onto the avatar, are either omitted or generated by the computer and rendered onto the VH. As a result, a VH is usually a hybrid of an agent and an avatar. However, recent technological advances such as real-time body and facial expression tracking offer affordable ways to make the avatar's behavior closely resemble the user's. In the future, we expect to have photorealistic avatars whose voices, movements, facial expressions, and gaze are determined completely by the user in real time. Even so, hybrid agent-avatars can be used to combine the advantages of both agent and avatar technologies (Roth, Latoschik, Vogeley and Bente, 2015). Additionally, unlike the physical world, where there are clear boundaries between humans and nonhumans, there are not necessarily any visible differences between human-controlled and computer-controlled VHs (Nowak and Fox, 2018). It is up to the developer of a VR application whether to conceal this information from the user, disclose it, or even mislead the user about whether a VH is an avatar or an agent.
Therefore, in a shared virtual environment, the user may not know which of the VHs are agents and which are avatars.
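To make the channel-by-channel nature of the avatar/agent distinction concrete, the following minimal sketch models a VH as a set of communication channels, each driven by tracked user data (avatar behavior), generated by the computer (agent behavior), or omitted. The channel names, the Source enum, and the three-way classification rule are hypothetical illustrations, not an established taxonomy; in practice the boundary is graded rather than categorical.

```python
from enum import Enum


class Source(Enum):
    """Who drives a given communication channel of a virtual human."""
    TRACKED = "tracked"      # mirrors the user's real behavior (avatar-like)
    GENERATED = "generated"  # produced by an algorithm (agent-like)
    OMITTED = "omitted"      # not rendered at all


def assign_sources(channels, tracked, generated):
    """Map each channel to its source: tracked data wins, then generated, else omitted."""
    sources = {}
    for channel in channels:
        if channel in tracked:
            sources[channel] = Source.TRACKED
        elif channel in generated:
            sources[channel] = Source.GENERATED
        else:
            sources[channel] = Source.OMITTED
    return sources


def classify(sources):
    """Label a VH as 'avatar', 'agent', or 'hybrid' from its rendered channels."""
    used = {s for s in sources.values() if s is not Source.OMITTED}
    if used == {Source.TRACKED}:
        return "avatar"
    if used == {Source.GENERATED}:
        return "agent"
    if used:
        return "hybrid"
    return "none"  # nothing is rendered


# Hypothetical HMD setup: body pose and voice are tracked, gaze is synthesized,
# and facial expressions are neither tracked nor generated.
channels = ["body", "voice", "gaze", "face"]
vh = assign_sources(channels, tracked={"body", "voice"}, generated={"gaze"})
print(vh["face"].value, classify(vh))  # prints: omitted hybrid
```

Under this toy rule, a VH whose untracked channels are simply omitted still counts as an avatar; it becomes a hybrid only once the computer starts filling in behavior.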

Avatars
In immersive virtual environments (IVEs), an avatar is the (usually visual) representation of the user in a virtual world. An avatar is perceivable by the user and/or by other users in the case of multiuser virtual environments (Nowak and Fox, 2018) such as social VR applications (Gunkel et al., 2018; McVeigh-Schultz et al., 2018). In the case of self-representation, users can observe their avatar from either a first-person or a third-person perspective (Gorisse, Christmann, Amato and Richir, 2017), whereas in some cases the avatar is implied or omitted. In projection-based VR systems (e.g., Cruz-Neira et al., 1993; Roth, Waldow, Latoschik, Fuhrmann and Bente, 2017), no avatar is required for self-representation since users can see their physical body. In head-mounted display (HMD)-based VR settings, users are unable to see their physical body. In these cases, an avatar can provide users with a virtual body, usually seen from a first-person perspective. The degree to which users can control their avatars varies, depending on the capabilities of the VR system. Under certain conditions (Kilteni et al., 2012), users can develop a sense of ownership over the virtual body, which is referred to as the sense of embodiment. Studies (Slater and Sanchez-Vives, 2014) have shown that people tend to alter their attitudes and behaviors to match the expectations implied by the attributes of their virtual body. This phenomenon is known as the Proteus effect (Yee et al., 2009).

Agents
With the constant advancement of technology in the fields of computer graphics, machine learning, and artificial intelligence (Petrović, 2018), virtual agents are becoming increasingly realistic in both appearance and behavior. At the same time, the opportunities for using them, and the effectiveness of their use, are increasing.
In VR entertainment applications such as videogames, VHs that act as characters in the game environment are referred to as non-player characters (NPCs). They act in the game as hostile, friendly, or neutral characters toward the player. Their behavior is usually scripted and limited to what is needed to support their role in the game. However, there are examples of NPCs that can interact with the player in more complex ways (Takahashi et al., 2018), such as expressing emotions (Li and Campbell, 2010), making decisions autonomously (Xi and Smith, 2016), and acting independently. NPCs are a crucial part of a VR game and can drastically affect the user's gaming experience (Petrović, 2018).
In VR, agents can play the role of the audience in applications for practicing presentation skills and overcoming public-speaking anxiety. Individuals can practice their presentations or speeches in an immersive virtual environment that simulates real-life conditions. Studies (Nazligul et al., 2017; Takac et al., 2019) have shown that these applications are beneficial in treating social anxiety disorders. Also, the size and behavior of an audience of agents are highly flexible and customizable, which allows the challenge level to be graded across scenarios (Botella, Garcia-Palacios, Baños and Quero, 2009). Agents are used in the same way in the VR treatment of various types of phobias. Virtual agents that use artificial intelligence to engage in humanlike conversations are referred to as conversational agents (Yildirim, 2021). In some cases, agents help, guide, encourage, and motivate patients in place of a human therapist (Bălan et al., 2020); in others, they replace patients in training scenarios for doctors and therapists (Lok et al., 2006; Rizzo and Talbot, 2016) or motivate other patients (Najm et al., 2020). Agents are used as healthcare assistants (Kim et al., 2019) to support registered healthcare professionals in conducting clinical tasks and providing care to patients. Also, a study (Lucas, Gratch, King and Morency, 2014) showed that VH interviewers can increase willingness to disclose and elicit more honest responses in a clinical interview context. In educational VR applications, agents play a crucial role, either as teachers or as students. Studies have shown that pedagogical agents (Johnson and Lester, 2018; Makransky et al., 2019) can improve students' learning experience in an educational VR environment, enhance their engagement, and improve their knowledge construction and performance (Grivokostopoulou et al., 2020).
Also, agents can play the role of students in teacher training scenarios (Stavroulia et al., 2019).
These are just a few examples of how virtual agents can be beneficial in a wide range of applications. Combined with other technologies, they can efficiently replace humans in social tasks. To summarize, the advantages of virtual agents include that they are always available, even as multiple simultaneous instances; affordable; fully customizable and flexible in both appearance and behavior; and fully controllable.

Hybrid Agent-Avatars
While computer-controlled behavior is usually added to avatars to compensate for the technology's inability to mirror the user's behavior (Bailenson and Blascovich, 2004), hybrid agent-avatars can also be used to modify or enhance avatar-mediated communication in shared VEs (Roth et al., 2015). For example, a study by Beall, Bailenson, Loomis, Blascovich and Rex (2003) showed that an avatar can be rendered so that it maintains eye contact with more than one interactant at a time. A study by Oh, Bailenson, Krämer, and Li (2016) showed that enhancing the participant's tracked smile led to more positive communication outcomes. In another study (Roth, Mal, Purps, Kullmann and Latoschik, 2018), mimicry behavior was injected into an avatar-mediated interaction to enhance interpersonal understanding and rapport between the interactants. Roth, Kullmann, Bente, Gall, and Latoschik (2018) altered the avatar's tracked gaze direction on selected occasions to convey a listening focus toward the other user.
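Rendering eye contact with several interactants at once, as in what Beall et al. (2003) called non-zero-sum gaze, is possible because each participant's client renders the shared scene separately, so the avatar's head can be oriented differently for each viewer. The following minimal sketch assumes a 2D top-down layout and yaw-only head rotation; the function names and the blending weight between tracked and redirected gaze are hypothetical and are not the cited studies' implementations.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def per_viewer_head_yaw(avatar_pos, tracked_yaw, viewers, weight=1.0):
    """Return, for each viewer, the head yaw that viewer's client should render.

    avatar_pos: (x, y) position of the avatar's head.
    tracked_yaw: yaw (radians) measured from the user's real head.
    viewers: dict mapping viewer id -> (x, y) head position.
    weight: 0.0 keeps the tracked yaw, 1.0 turns fully toward each viewer.
    """
    rendered = {}
    for viewer_id, (vx, vy) in viewers.items():
        target = math.atan2(vy - avatar_pos[1], vx - avatar_pos[0])
        # Rotate along the shorter arc from the tracked yaw toward the viewer.
        delta = wrap_angle(target - tracked_yaw)
        rendered[viewer_id] = wrap_angle(tracked_yaw + weight * delta)
    return rendered


# Two listeners; the speaker's real head points along +x (yaw 0).
yaws = per_viewer_head_yaw((0.0, 0.0), 0.0, {"a": (2.0, 0.0), "b": (0.0, 2.0)})
print(round(yaws["a"], 3), round(yaws["b"], 3))  # prints: 0.0 1.571
```

Because every client receives its own yaw, both listeners see the speaker looking at them simultaneously, which is impossible face to face; intermediate weights illustrate how tracked and computer-generated behavior can be blended in a hybrid agent-avatar.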

THE USE OF IMMERSIVE VIRTUAL REALITY AND VIRTUAL HUMANS FOR RESEARCH
We have described the benefits and possibilities that immersive virtual reality (IVR) technologies offer, as well as the solutions they provide in a wide range of fields. Beyond that, researchers realized early on that IVR can be very useful as a research tool (Tarr and Warren, 2002; Foreman, 2009). Over the last 2 decades, IVR technologies have been used to study human behavior and cognition in psychology (Wilson and Soranzo, 2015; Pan and Hamilton, 2018) and neuroscience (Bohil et al., 2011; Parsons et al., 2017; Bell, Nicholas, Alvarez-Jimenez, Thompson and Valmaggia, 2020).
IVR technologies not only offer researchers solutions to several methodological problems but also open up research possibilities that did not exist before.
With IVR technologies, researchers can create realistic and complex environments that accurately simulate the experimental scenario and therefore achieve high mundane realism (the degree to which the materials and procedures involved in an experiment are similar to events that occur in the real world; Kelly, 2007). At the same time, IVR can induce in participants the illusion of presence and elicit realistic (similar to real-life) reactions (Slater, 2009), achieving high experimental realism (the extent to which situations created in experiments are real and impactful to participants; Kosloff, 2007). Through social presence, this also applies to experiments that include social interactions: subjective feelings, as well as behavioral and physiological reactions, during human-VH interactions can be very similar to those during human-human interactions (Bombari, Schmid Mast, Canadas and Bachmann, 2015).
Consequently, VR makes it possible to conduct experiments with high ecological validity ("the extent to which research findings would generalize to settings typical of everyday life"; Baumeister and Vohs, 2007, p. 276), something that was previously very difficult and resource-intensive to achieve. For example, experiments on social influence used actors as confederates, trained to maintain the same verbal and nonverbal behavior across sessions (Asch, 1956; Milgram, 1963). Such solutions not only make experiments more expensive but are also difficult to implement and can often reduce experimental control. This tradeoff between ecological validity and experimental control is one of the main methodological problems researchers face (Kothgassner and Felnhofer, 2020). VR technologies can provide a high level of ecological validity, as they can generate stimuli that approximate the complexity of a real-life situation while giving the investigator near-perfect experimental control (Bombari et al., 2015; Parsons, 2015). The combination of experimental control and flexibility that VR technologies offer "enables the researcher to selectively manipulate variables that in naturalistic situations cannot be independently investigated" (Parsons, 2015, p. 7).
In addition, using VR makes replication of studies easier. According to Blascovich et al. (2002), in domains such as social neuroscience and psychology, one of the reasons for the lack of replications is the difficulty for a researcher to implement and use the exact methods and procedures of other investigators. VR technologies, however, enable researchers to conduct perfect (or at least near-perfect) replications (Bombari et al., 2015).
Finally, using VR, researchers can conduct experiments with scenarios that would be impossible (e.g., Friedman et al., 2014) or unethical (e.g., Gonzalez-Franco et al., 2018; Neyret et al., 2020) to test in real life. This is possible because participants react to virtual characters and events as if they were real, while remaining aware that there is no real danger and that their actions have no real consequences (Pan and Hamilton, 2018). For example, perception and behavior in dangerous or threatening situations can be studied without exposing participants to real danger (Kinateder et al., 2015; McCall, Hildebrandt, Bornemann and Singer, 2015). Even though most research and development effort focuses on simulating the real world as faithfully as possible, VR can also go beyond the limits of physical reality (Slater and Sanchez-Vives, 2016). Rules that exist in the "real" world do not necessarily exist in a virtual world. Physical laws, temporal continuity (Friedman et al., 2014), and human body characteristics and limits (Slater and Sanchez-Vives, 2014) can be manipulated by the researcher, creating new research opportunities. For example, in one study (Friedman et al., 2014), participants were given the illusion of traveling back in time, with the ability to prevent a tragic event at which they had been present.
Using VR, researchers can dramatically alter participants' self-representation by inducing a sense of embodiment toward a virtual body with different characteristics. This capability has opened up a wide range of opportunities for investigating the impact of self-representation on individuals' attitudes and behaviors (Maister, Slater, Sanchez-Vives and Tsakiris, 2015). Although the ecological validity of experiments with such manipulations is typically low, researchers can investigate their interaction with other variables and expand the theoretical understanding of human cognition and behavior (Bombari et al., 2015). A study by Kilteni et al. (2013) showed that participants embodied in a dark-skinned, casually dressed virtual body expressed significantly more body movement in a drumming task than participants embodied in a light-skinned, formally dressed body. This result was attributed to the stereotype that a dark-skinned, casually dressed person is expected to be more bodily expressive. Other studies (Maister, Sebanz, Knoblich and Tsakiris, 2013; Peck, Seinfeld, Aglioti and Slater, 2013) showed that embodiment in a dark-skinned body reduced implicit racial bias toward dark-skinned people. Moreover, one study found that the reduction in implicit racial bias persisted even a week after the participants' embodiment experience (Banakou et al., 2016).
To summarize, VR technologies have become a powerful tool for researchers studying human behavior. They offer a series of advantages, such as realistic and complex experimental scenarios with almost perfect control of the environment and the VHs, allowing researchers to overcome methodological problems. Additionally, they create new research opportunities for testing scenarios that are difficult or even impossible to conduct in real-life settings.

IMMERSIVE VIRTUAL ENVIRONMENTS AND VIRTUAL HUMAN TECHNOLOGIES
VR refers to the creation of simulated environments (i.e., IVEs) using computer hardware and software. In contrast to traditional interfaces, VR not only displays the created environments to users but also gives them the feeling that they are "inside" the environment. This is achieved by "careful integration of hardware and software systems, including multimedia development software, databases, computers, rendering engines, and user interfaces" (Blascovich et al., 2002, p. 107). Today, typical VR systems provide stereoscopic vision that is updated according to the tracked position and orientation of the user's head, as well as directional audio (Slater and Sanchez-Vives, 2016). It is also common for VR systems to track the user's hands, or even the full body, in addition to the head. An article by Slater and Sanchez-Vives (2016) presents an overview of the basic concepts and technology of VR systems.
The applications described above are feasible because of the major technological advances of the last 2 decades. Nevertheless, current technology is far from unlimited; on the contrary, it has several limitations and disadvantages. The ideal virtual reality, one offering an experience comparable to that of the real world, is therefore still beyond today's capabilities. In addition, the availability, cost, and physical and technical limitations of current technology impose further constraints and tradeoffs on the quality of a virtual reality experience.
For example, the visual fidelity and rendering quality of virtual environments (and VHs) are limited by the computational capability of the computer. Techniques used to optimize performance usually sacrifice realism; an example is precomputed illumination (i.e., lighting, shadows, and reflections) instead of real-time illumination that changes dynamically. Another limitation is that the display quality (resolution and refresh rate) of current VR systems is quite limited even in the most sophisticated VR devices (i.e., HMDs), with individual pixels still visible and distracting. Despite continuous advances in computing power, graphic representation, and display quality, the visual quality of VR is still far from perfect.
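The tradeoff between visual fidelity and performance can be made concrete with frame-time arithmetic: at a refresh rate of f Hz, each frame (for both eyes) must be rendered within 1000/f milliseconds. The following minimal sketch uses refresh rates common in current HMDs and a hypothetical render cost of 10 ms per frame. It assumes a single render-time measurement per frame and ignores the reprojection and other compositor techniques that real VR runtimes apply.

```python
def frame_budget_ms(refresh_hz):
    """Milliseconds available to render one frame at the given display refresh rate."""
    if refresh_hz <= 0:
        raise ValueError("refresh rate must be positive")
    return 1000.0 / refresh_hz


def fits_budget(render_ms, refresh_hz):
    """True if a frame costing render_ms can be delivered on every display refresh."""
    return render_ms <= frame_budget_ms(refresh_hz)


# Hypothetical scene that takes 10 ms to render per frame.
for hz in (72, 90, 120):
    budget = frame_budget_ms(hz)
    print(f"{hz} Hz: {budget:.1f} ms budget, 10 ms frame fits: {fits_budget(10.0, hz)}")
# 72 Hz: 13.9 ms budget, 10 ms frame fits: True
# 90 Hz: 11.1 ms budget, 10 ms frame fits: True
# 120 Hz: 8.3 ms budget, 10 ms frame fits: False
```

Raising the refresh rate therefore shrinks the time available per frame, which is one reason cheaper techniques such as precomputed illumination remain attractive.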
Even more challenging than displaying visually plausible virtual environments and humans is making those environments and humans behave and interact with the user in a realistic way. How users in VR can interact with virtual objects is an ongoing challenge. Designing VHs (i.e., agents) that behave and interact with the user realistically is an even bigger challenge because of the complexity of human behavior. With current technology, as described in previous sections, agents can interact with the user, hold a verbal conversation, or show nonverbal responses such as facial expressions and gestures. However, in these examples, each aspect of the agents' behavior and intelligence is limited to the functions implemented by their creators, usually to support the purpose of the application.
Regarding avatars, accurately reproducing human actions such as body movements, facial expressions, and eye movements on the virtual body is important for inducing the sense of body ownership, as well as for communication with other users in shared immersive environments. Many methods are available for transferring the users' body movements to the avatar. These methods use different technologies and vary in accuracy, cost, and convenience. Advanced motion tracking systems for full-body tracking reproduce the user's body movements on the avatar very accurately and with low latency; however, these systems are very expensive, require users to wear a suit of trackers, and take time to calibrate. Finger tracking requires additional gloves. Head position and orientation are typically tracked by the HMD. Commercial VR systems typically include tracked controllers that, combined with an inverse kinematics technique, can be used to approximate the pose of the arms. Similarly, with 2, 3, or more additional trackers (typically on the feet and the waist), an approximate full-body pose can be obtained. This method is not as accurate as advanced motion tracking systems; however, it is significantly more affordable and easier to set up and use. Alternatively, instead of additional trackers for the legs, prerecorded walking animations can be used for leg movement. Another method of tracking the user's body uses depth cameras. This method has the advantage of not requiring the user to wear or hold any equipment; however, the tracking quality is limited. Additionally, HMDs with built-in eye trackers, as well as facial trackers, are commercially available; they can track the user's eye movements and facial expressions (including lip motion while talking), respectively, and render them on the avatar.
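One common way to approximate the arm pose from a tracked controller is an analytic two-bone inverse kinematics solve based on the law of cosines. The following minimal sketch is a planar simplification and assumes a known (estimated) shoulder position, fixed upper-arm and forearm lengths, and a chosen bend direction; production solvers work in 3D and typically use a hint vector to orient the elbow. The function and parameter names are hypothetical.

```python
import math


def two_bone_ik_2d(shoulder, hand, upper_len, fore_len, bend_sign=1.0):
    """Estimate a planar elbow position from shoulder and hand (controller) positions.

    shoulder, hand: (x, y) positions; upper_len, fore_len: arm segment lengths.
    bend_sign (+1.0 or -1.0) selects on which side of the shoulder-hand line
    the elbow is placed.
    """
    sx, sy = shoulder
    hx, hy = hand
    dx, dy = hx - sx, hy - sy
    dist = math.hypot(dx, dy)
    # Keep the target distance within the arm's reachable range so that a
    # triangle with sides upper_len, fore_len and d exists. If the hand is out
    # of reach, the arm is simply stretched toward it.
    eps = 1e-9
    d = min(max(dist, abs(upper_len - fore_len) + eps), upper_len + fore_len - eps)
    # Law of cosines: angle at the shoulder between the shoulder-hand line
    # and the upper arm.
    cos_a = (upper_len**2 + d**2 - fore_len**2) / (2.0 * upper_len * d)
    a = math.acos(max(-1.0, min(1.0, cos_a)))
    angle = math.atan2(dy, dx) + bend_sign * a
    return (sx + upper_len * math.cos(angle), sy + upper_len * math.sin(angle))


shoulder, hand = (0.0, 0.0), (0.5, 1.0)
elbow = two_bone_ik_2d(shoulder, hand, upper_len=0.8, fore_len=0.7)
# Both segment lengths are preserved: prints 0.8 0.7
print(round(math.dist(shoulder, elbow), 3), round(math.dist(elbow, hand), 3))
```

The solver only guarantees that the segment lengths are respected; the elbow's actual placement is a guess, which is one reason IK-driven arms are less accurate than full motion capture.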
Beyond visual information, other senses such as touch, smell, temperature, and even taste (Rubio-Tamayo, Gertrudix Barrio and García García, 2017) can contribute to realistic VR experiences and can convey meaningful information in face-to-face interactions. Additionally, a crucial aspect of inducing a sense of embodiment over a virtual body (i.e., avatar) is creating the illusion that the virtual body is the source of the experienced sensations (Kilteni et al., 2012), usually achieved through synchronous visuotactile or visuoproprioceptive stimulation. Successful embodiment can have an impact on social interactions with VHs (Ratan, Beyea, Li and Graciano, 2020). Researchers have used several tricks to simulate the sense of touch, such as the experimenter touching the participant with a wand. Today, a wide range of devices, mainly haptic gloves, are commercially available (Perret and Vander Poorten, 2018), with different approaches and functions. However, providing realistic haptic feedback with easy-to-use equipment remains a challenge.
Other long-standing limitations and problems associated with VR technologies remain challenging and need to be addressed in the future. One of them is the physical discomfort, or cybersickness, that may result from the use of HMDs and can have a negative impact on the user's experience in the VE (Weech et al., 2019) and, therefore, on the social interactions taking place in it. One way of dealing with cybersickness is improving the hardware by increasing the refresh rate, improving head-tracking quality, and reducing tracking and display latency (Chang et al., 2020). Cybersickness is also attributed to the content of the VR application. For that reason, it is important to design VR applications that avoid content known to promote cybersickness and that include techniques shown to reduce it.
Another inherent problem of VR is locomotion within the virtual environment (Cherni et al., 2020). Even with current HMDs that include positional tracking and allow the user to walk physically, the walking area is restricted by the physical space. Another method of navigating the virtual environment is using a joystick; however, this method is associated with cybersickness (Saredakis et al., 2020). For that reason, teleporting has become a popular way of navigating in VR. A newer approach to locomotion uses omnidirectional treadmills, which allow the user to navigate with seminatural movements while staying in place. However, these devices are still expensive and not easy to use.

VIRTUAL REALITY CONCEPTS

Immersion
The ability of the system to provide the user with an illusion of reality is called immersion and is defined as "the extent to which the computer displays are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant" (Slater and Wilbur, 1997, p. 3). Consequently, immersion can be objectively assessed, based on technical parameters used to describe a system.
As mentioned above, VR systems are designed not only to display the virtual environment to users but also to induce the feeling that they are "inside" it, and that is what makes VR special. However, the term VR is sometimes also used for systems that lack the technical capability to give the user a sense of being inside the displayed virtual environment. Such systems are also described as non-immersive VR or desktop VR. In this article, the term VR refers to immersive VR systems.

Presence
The previous sections discussed the use of VR technologies in a wide range of fields and the use of VHs in many of these applications. A crucial factor for the effectiveness of many of these applications is that users perceive and respond to the events and situations in the virtual environment as if they were real. Empirical studies have explored factors that contribute to realistic behavior in immersive virtual environments, and various theories have attempted to explain this phenomenon. Most of these theories are based on the concept of presence, the sense of "being" in the virtual environment, also referred to as telepresence or place illusion (Ijsselsteijn and Riva, 2003; Sanchez-Vives and Slater, 2005; Slater, 2009). Slater (2009) defines presence as "the strong illusion of being in a place in spite of the sure knowledge that you are not there" (p. 3551).
Although presence is strongly related to immersion (Slater, 2003), it is a subjective experience that depends on how the person perceives and interprets stimuli, which are shaped by the characteristics of the VR system and its level of immersion (Ijsselsteijn and Riva, 2003).
Presence has been the main focus of both applied and academic work on VR as it is associated with the effectiveness of a VR experience. The greater the sense of the user's presence in the virtual environment, the more realistic (similar to the real world) their reactions and behaviors are and, in turn, the more successful the VR application is (Cummings and Bailenson, 2016).

Social Presence
As described above, VR can induce in users a sense of presence, the feeling of being in the virtual environment, and the stronger this sense, the more realistic (similar to the real world) their reactions and behaviors. However, the sense of "being there" is not enough for realistic perception of and reaction toward VHs (Lee, Jung, Kim and Kim, 2006). In virtual environments where the user coexists with VHs, it is important that the user perceives the presence of a VH not only physically but also socially. Social presence (also referred to as co-presence) refers to the extent to which the user actively perceives a VH in a virtual environment while also sensing that the "other" perceives the user's presence (Biocca, 1997; Oh et al., 2018). While presence describes the illusion of "being" in a virtual space that may include VHs, social presence refers to the experience of "being together" with a sentient social being, either an agent or an avatar.
Social presence is important because of its impact on social influence (Blascovich, 2002), and it is associated with a variety of positive communication outcomes (Oh et al., 2018). For example, a study by Thellman, Silvervarg, Gulz and Ziemke (2016) demonstrated the effect of social presence on social influence by VHs: participants who reported stronger social presence were more inclined to accept the VH's offer in an ultimatum game. Other studies have also demonstrated the impact of social presence on social influence (e.g., Hoyt et al., 2003; Strojny, Dużmańska-Misiarczyk, Lipp and Strojny, 2020). Consequently, the greater the users' sense of social presence with a VH, the more realistic (similar to face-to-face human-human interaction) their social reactions are. This makes social presence a vital component of the realism and effectiveness of social interactions between users and VHs in VR environments. Also, studies (Schroeder et al., 2001; Heldal et al., 2005; Guimarães et al., 2020) showed that participants' sense of social presence with VHs was higher in immersive VR than on non-immersive platforms. This finding indicates an advantage of VR over non-immersive technologies for simulating social interactions with VHs.

SOCIAL INTERACTION WITH VIRTUAL HUMANS
Numerous studies show that people react socially to VHs. When individuals interact with an avatar (or believe that they are doing so), social responses are expected, because such an interaction is perceived as a human-human interaction mediated by technology (Nowak and Fox, 2018). But why do individuals respond socially even when they know (or believe) that they are interacting with an agent directed by a computer? Several theories attempt to explain social effects in interactions with computers. Earlier theories suggested that individuals react socially to computers only temporarily, due to the novelty of the situation (Kiesler and Sproull, 1997), or due to human deficits such as ignorance (Barley, 1988). Another approach suggests that social reactions are directed toward the programmer rather than the computer itself (Dennett, 1987). However, these theories have not been widely adopted and are now considered obsolete. The prevailing theory (Nass et al., 1994; Nass and Moon, 2000), known as the computers are social actors (CASA) paradigm, holds that social responses to computers result neither from the users' belief that they are interacting with the programmer nor from ignorance. Instead, the CASA paradigm argues that people unconsciously react to computers in the same way as they do toward humans. This can be attributed to the fact that the human brain has developed to respond automatically to social cues in order to deal successfully with daily life (Reeves and Nass, 1996, p. 97).

Evaluating Social Interactions With Virtual Humans
Several methods are used in the literature to evaluate the quality of interactions with VHs. A common method is through subjective measures. Specifically, after their exposure to an IVE, participants complete self-report questionnaires in which they evaluate their experience on scales such as social presence (Biocca et al., 2003; Bailenson, Blascovich, Beall and Loomis, 2003), self-reported copresence and perceived other's copresence (Nowak and Biocca, 2003), Quality of Interaction and Social Meaning (Li et al., 2019), and other positive communication outcomes such as likability and credibility (Guadagno, Blascovich, Bailenson and McCall, 2007).
Objective behavioral measures are also used to evaluate social interactions with VHs. VR technologies make it convenient to record several aspects of participants' behavior, for example, through the built-in motion trackers and eye trackers of the HMD and by logging participants' actions and navigation within the virtual environment. Measures such as participants' gaze behavior, interpersonal distance (Bailenson, Blascovich, Beall and Loomis, 2003), verbal behavior (von der Pütten et al., 2010; Oh et al., 2016), social influence (Neyret et al., 2020; Dzardanova, Kasapakis, Gavalas and Sylaiou, 2021), mimicry (Hasler et al., 2017), persuasion (Guadagno et al., 2007), and others are used as indicators of the effectiveness and quality of social interactions.
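As an illustration of how a behavioral measure such as interpersonal distance might be derived from logged tracking data, the following Python sketch computes the minimum and mean distance between a participant and a VH. It is a minimal sketch under stated assumptions: positions are assumed to be time-aligned (x, y, z) samples in meters with y as the vertical axis, and the function names are hypothetical rather than part of any VR toolkit.

```python
import math
from typing import List, Sequence, Tuple

# Assumed log format: (x, y, z) in meters, with y as the vertical axis.
Vec3 = Tuple[float, float, float]


def horizontal_distance(a: Vec3, b: Vec3) -> float:
    """Distance between two tracked positions projected onto the floor (x-z) plane."""
    return math.hypot(a[0] - b[0], a[2] - b[2])


def interpersonal_distance(user: Sequence[Vec3], vh: Sequence[Vec3]) -> Tuple[float, float]:
    """Return (minimum, mean) user-VH distance over time-aligned samples (hypothetical helper)."""
    if len(user) != len(vh) or not user:
        raise ValueError("expected non-empty, time-aligned position logs")
    distances: List[float] = [horizontal_distance(u, v) for u, v in zip(user, vh)]
    return min(distances), sum(distances) / len(distances)


# A participant's head approaching a stationary VH (illustrative values only).
user_log = [(0.0, 1.7, 2.0), (0.0, 1.7, 1.5), (0.0, 1.7, 1.0)]
vh_log = [(0.0, 1.6, 0.0)] * 3
print(interpersonal_distance(user_log, vh_log))  # (1.0, 1.5)
```

Projecting onto the floor plane ignores height differences between the participant and the VH; whether the minimum, the mean, or another summary is the appropriate indicator depends on the study design.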
To avoid possible bias from confounding variables such as personality traits and simulation sickness, these variables are also measured and used as control variables.

FACTORS AFFECTING SOCIAL INTERACTION WITH VIRTUAL HUMANS
The benefits of employing VHs in a wide range of applications were reviewed in a previous section. The effectiveness of these applications usually requires that users perceive and interact with VHs as if they were real humans. For that reason, the investigation of factors that enhance social presence and increase social influence with VHs has attracted great interest from researchers. This section reviews the main findings regarding the factors that affect social interaction with VHs.

Representation of the Virtual Humans
The way VHs look and behave varies between VR applications. These variations stem not only from the differing capabilities of VR systems in terms of graphical quality and interactivity, and from the effort and skill of the creators in providing convincing VHs, but also from the nature and purpose of each application. The result is VHs with different levels of realism. Several studies have investigated the impact of VHs' visual and behavioral realism on social interactions.

Visual Realism
While studies showed that the presence of a VH's visual representation leads to a higher level of social presence than the absence of any visual representation (e.g., voice only), the effect of VHs' visual (photographic and anthropomorphic) realism is not consistent (Oh et al., 2018). For example, a recent study (Zibrek et al., 2019) manipulated a VH's visual realism using three render styles: realistic, simple, and sketch. The results showed that the level of the VH's visual realism did not affect participants' sense of social presence. The effect of visual realism on participants' emotional responses was attributed to the VH's facial expressions being more perceivable in the realistic rendering than in the less realistic ones, rather than to the level of realism itself.

Behavioral Realism
In contrast with visual realism, the VH's behavioral realism is an important factor for social interactions and a powerful predictor of social presence (Oh et al., 2018). Behavioral realism refers to the extent to which a VH behaves in the way an actual person would behave. Several studies showed that increasing the VH's behavioral realism leads to a stronger sense of social presence, especially when the VH's behavior indicates awareness of the user's presence (e.g., mutual gaze) and provides interactivity. The interactivity of a VH's behavior is an important factor for creating social presence (Oh et al., 2018), as it gives the impression that the VH is aware of the user's presence and actions. For example, a study (von der Pütten et al., 2010) showed that participants felt higher levels of social presence and mutual awareness, and talked more, when the VH showed feedback behavior (head nodding) than when it showed no feedback behavior. Another study (Guadagno et al., 2007) showed that VHs with more realistic gaze behavior led to a higher sense of social presence. Additionally, male participants reported greater attitude change after interacting with male VHs with high behavioral realism than with male VHs with low behavioral realism. Another study (Pan, Gillies and Slater, 2008) focused on the effects of a VH's blushing during an embarrassing situation on participants' reactions. Specifically, the effects of no blushing, cheek blushing, and whole-face blushing were compared. The results showed that the VH's whole-face blushing increased participants' sense of social presence, while participants in the cheek-blushing condition tended to withdraw earlier from the VH's presentation. A study by Roth et al. (2016) found no difference in performance on a verbal negotiation task between participants embodied in abstract avatars without gaze behavior or facial expressions in VR and participants in a physical-world setting. This result suggests that the absence of behavioral cues can be partly compensated for.
Enhancing the VH's behavioral realism implies adding social channels (e.g., facial expressions or gaze behavior) that better simulate face-to-face interaction. One study investigated the impact of social cues that do not replicate face-to-face interaction (i.e., social augmentations) by visualizing eye contact with floating bubbles, joint attention with particles, and grouping by matching the color of abstract box-shaped avatars. The results showed that the augmentations had a positive impact on participants' sense of social presence as well as an influence on their behavior. This result suggests that increasing social cues is important for social interactions with VHs, regardless of whether these cues replicate face-to-face interaction. It also reveals the potential of VR to enhance social interactions with additional social channels.

The Uncanny Valley
Additionally, the uncanny valley theory (Mori et al., 2012), which initially referred to humanoid robots but also applies to VHs, suggests that the relation between a VH's realism and the perceiver's affinity for it is not linear. Instead, as VHs appear more human-like, they become more appealing only up to a certain point. When a VH looks and moves in an almost lifelike but not fully human way, it is perceived as creepy and unsettling. Only when the realism of a VH is fully convincing will it elicit positive responses. Consequently, this effect can have a negative impact on social interactions with VHs (Nowak and Fox, 2018). The results of a study (Groom et al., 2009) support the uncanny valley theory, as the VH received lower evaluations from participants when it exhibited more realistic behavior (i.e., lip sync and body movement). However, the VH's persuasiveness was not affected by the level of realism.

Self-Representation
Studies showed that the appearance of the user's avatar (i.e., self-representation in the virtual environment) may have an impact on social interactions with VHs (Ratan, Beyea, Li and Graciano, 2020). This effect is related to the sense of embodiment inside the virtual body (Kilteni et al., 2012) and to users' tendency to alter their attitudes and behaviors to match the expectations implied by the attributes of their virtual body, known as the Proteus effect (Yee et al., 2009; Slater and Sanchez-Vives, 2014). For example, a study by Yee and Bailenson (2007) showed that participants embodied in taller avatars were more confident in a negotiation task (the ultimatum game; Forsythe, Horowitz, Savin and Sefton, 1994) with an agent confederate.

Agency
Agency is the extent to which the user believes that a VH is controlled by another user (avatar) rather than by a computer through an algorithm (agent). Blascovich (2002) defines agency as "the extent to which individuals perceive virtual others as representations of real persons" (p. 130). When the user has the impression that a VH is controlled by another user, the level of agency is high. Conversely, when the user believes that a VH is controlled by the computer, the level of agency is considered low. It is important to state that the level of agency describes the user's perception of the VH as an agent or an avatar, rather than the VH's actual state (Fox et al., 2015). Additionally, agency is a continuum, as individuals may perceive a VH to be partially controlled by a human and partially by the computer (Blascovich, 2002). Note that the term agency is also used to describe the feeling of controlling one's own (virtual) body (Tsakiris et al., 2006); the two definitions should not be confused.
The impact of agency on social interactions with VHs is not clear in the literature. According to the CASA theory, responses to computers that exhibit human characteristics are mindless and automatic (Reeves and Nass, 1996; Nass and Moon, 2000), and therefore people will respond socially to VHs regardless of the level of agency. In contrast, the Threshold Model of Social Influence (Blascovich, 2002; Blascovich et al., 2002) argues that agency, along with behavioral realism, is a major factor affecting social presence.
According to the Threshold Model of Social Influence, an increase in agency and/or behavioral realism leads to an increase in social presence. Once social presence reaches a threshold value, social influence begins to operate. Specifically, when the user believes that the VH is controlled by the computer (low agency), the VH must behave very realistically for the social influence threshold to be met and social influence to occur. If the user believes that the VH represents a real person (high agency), behavioral realism does not need to be high to cause a social reaction. According to the authors, the location of the social influence threshold varies as a function of two moderating factors: interpersonal self-relevance and the response system. Interpersonal self-relevance is the importance of the interaction to the individual's sense of self. In a social interaction that requires a discussion of one's beliefs and attitudes (e.g., a job interview), interpersonal self-relevance is expected to be high. In social interactions that do not involve central or core aspects of the individual (e.g., making a small withdrawal from a bank), it is expected to be low. According to the model, when self-relevance is low, the threshold's slope is shallow, which means that lower behavioral realism is required for social influence to occur. In high self-relevance interactions, by contrast, the slope is steep, and higher behavioral realism is required for the threshold to be crossed and social influence to occur. The second moderating factor is the level of the behavioral response system of interest. For low-level response systems such as unconscious reflexes, the threshold is lower than for high-level response systems such as verbal communication. Therefore, lower levels of agency and behavioral realism are required for low-level, implicit, or automatic social responses than for high-level response systems involving purposeful and conscious actions.
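To make the qualitative relations of the Threshold Model more concrete, the following Python sketch encodes a toy version of it. The linear threshold form, the names `Interaction`, `required_realism`, and `social_influence_expected`, and all numeric values are hypothetical assumptions chosen for illustration; they are not taken from Blascovich (2002). Agency and behavioral realism are scaled to [0, 1], and the two moderators are reduced to binary flags.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    agency: float               # perceived agency: 0 = believed agent, 1 = believed avatar
    behavioral_realism: float   # 0 = no human-like behavior, 1 = fully human-like
    high_self_relevance: bool   # e.g., job interview (True) vs. small bank withdrawal (False)
    high_level_response: bool   # e.g., conscious verbal response (True) vs. automatic reflex (False)


def required_realism(agency: float, high_self_relevance: bool, high_level_response: bool) -> float:
    """Behavioral realism needed to cross a toy social influence threshold.

    Hypothetical linear threshold; the constants are illustrative only.
    """
    # Higher-level, conscious response systems raise the threshold overall.
    offset = 0.3 if high_level_response else 0.1
    # High self-relevance gives a steep slope; low self-relevance a shallow one.
    slope = 0.6 if high_self_relevance else 0.2
    # The less the user believes a human controls the VH, the more realism is required.
    return offset + slope * (1.0 - agency)


def social_influence_expected(x: Interaction) -> bool:
    """True if the interaction lies on or above the toy threshold."""
    return x.behavioral_realism >= required_realism(
        x.agency, x.high_self_relevance, x.high_level_response
    )


# The same moderately realistic VH (realism 0.5) in a high self-relevance, conscious interaction:
as_agent = Interaction(0.0, 0.5, True, True)
as_avatar = Interaction(1.0, 0.5, True, True)
print(social_influence_expected(as_agent))   # False: required realism is about 0.9
print(social_influence_expected(as_avatar))  # True: required realism is 0.3

# The same agent in a low self-relevance interaction probing an automatic response:
low_stakes_agent = Interaction(0.0, 0.5, False, False)
print(social_influence_expected(low_stakes_agent))  # True: required realism is about 0.3
```

In this toy version, a VH believed to be an avatar needs only the baseline realism, whereas a VH believed to be an agent needs the baseline plus the full slope, mirroring the model's claim that agents must behave very realistically, especially in high self-relevance interactions involving conscious responses.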
Several studies explored the impact of agency on social interactions with VHs. Perceived agency was generally manipulated by introducing the VH as an agent or an avatar prior to the interaction. For example, a study by Guadagno, Swinth and Blascovich (2011) examined social evaluations (i.e., empathy and positivity) of a virtual peer counselor who was introduced as either an agent or an avatar. The VH had two levels of behavior (i.e., smiling and not smiling). The results showed that the VH's smile affected the social evaluations; however, the level of agency moderated this effect. Specifically, the social evaluations were enhanced by smiling for participants in the low-agency condition but were degraded in the high-agency condition. In two experiments, de Melo, Gratch and Carnevale (2014) examined the effect of the VH's emotional expressions on participants' behavior. The results of the first experiment showed that participants collaborated more with a VH that exhibited collaborative rather than competitive expressions in a social dilemma, and this effect was stronger in the high-agency condition. In the second experiment, participants who were led to believe that they were interacting with an avatar conceded more in a negotiation task when the VH showed angry expressions, whereas in the low-agency condition participants conceded the same amount regardless of whether the VH showed neutral or angry expressions. The results of a study (Felnhofer et al., 2018) that examined social avoidance tendencies and prosocial behaviors toward VHs were mixed regarding the impact of agency. While presence, social presence, social interaction anxiety, and stress were not affected by agency, participants in the avatar condition showed more social avoidance and more prosocial behavior. The results of a study by von der Pütten et al. (2010) showed no effect of agency on participants' social behavior and evaluations.
As shown above, several studies in the literature compare agents with avatars: many show that avatars affect participants' social behavior to a greater extent than agents, whereas others demonstrate no significant difference between the two. A meta-analysis by Fox et al. (2015) showed that perceived avatars produced stronger responses than perceived agents. A systematic review (Oh et al., 2018) reported that approximately half of the surveyed studies showed an impact of agency on social presence, whereas in the remaining half participants perceived similar levels of social presence regardless of the level of agency.

Level of Immersion
Regarding social presence, the level of immersion does not seem to be as crucial as it is for presence (Oh et al., 2018), although some studies (Schroeder et al., 2001; Heldal et al., 2005) showed that participants reported a stronger sense of social presence when using an immersive rather than a non-immersive platform. Also, a recent study (Bailey et al., 2019) showed that children in an IVR condition demonstrated greater social influence (compliance) from a virtual character than children in a non-immersive condition, suggesting that IVR may elicit different cognitive and social responses than less immersive technologies.

DISCUSSION AND FUTURE DIRECTIONS
In this article, we presented the applications and the potential of IVR and VHs in a wide range of fields such as training, education, and health. Additionally, we presented the benefits of using IVR as a tool for experimental research in fields such as cognitive and social neuroscience and psychology. This potential stems from the many advantages of VR over traditional media. However, to be effective, many of these applications require that users react to the virtual stimuli in a realistic way. The ability of VR technologies to immerse users in a virtual environment, and thereby lead them to react to it realistically (as if they were physically there), is considered straightforward because VR can induce the illusion of "being" inside a virtual environment. This sense of being in the virtual environment is called presence and is associated with realistic reactions to virtual stimuli.
In contrast, eliciting realistic reactions to social stimuli within virtual environments seems to be more complex and requires a deeper understanding of users' cognitive processes. While some studies demonstrated realistic social reactions toward VHs within virtual environments, other studies failed to replicate social effects using VHs. To react realistically to a social situation, the user has to perceive the VH not only as physically present but also as mentally present, as if it were a sentient human being. The extent to which the user actively perceives a VH in a virtual environment and at the same time has the sense that the "other" perceives the presence of the user is called social presence.
Several factors in the design of VR applications and virtual representations seem to affect the effectiveness of human-VH social interactions in terms of eliciting realistic reactions from the user. In this article, we listed several of these factors. Concerning the VH's representation, the literature suggests that visual realism (image fidelity) is less important for creating social presence and eliciting realistic social responses from the user. On the other hand, the literature suggests that the behavioral realism of a VH (the extent to which a VH behaves like a real human) is an important factor for social influence. Behavioral realism comprises many parameters, such as verbal and nonverbal behavior (body movements and gestures, facial expressions, and gaze behavior), responsiveness, and interactivity with the environment and the user. Therefore, more research is needed on designing VHs' behavior to enhance their social potential.
As described in this article, the use of virtual agents offers many advantages over the use of avatars, so creating agents that users perceive and treat in a similar way to avatars is very important. The role of agency, the extent to which the user believes that a VH is controlled by another human rather than by the computer through an algorithm, is not clear in the literature. While some studies supported the theory that users respond socially to a VH only (or to a greater extent) when it is perceived as an avatar (controlled by another user), other studies showed no impact of agency on social presence or social influence. According to Blascovich (2002), the importance of agency depends on the type of interaction. Specifically, unconscious and automatic social reactions seem not to be affected by the level of agency (Nass and Moon, 2000), whereas more conscious social responses are more likely to occur when the VH is perceived to be an avatar controlled by another human, or is an agent that behaves very realistically (Blascovich, 2002). Therefore, more studies are needed to investigate the impact of agency on social interactions with VHs, taking the type of interaction into account. Additionally, according to Blascovich (2002), agents that behave realistically enough to exceed the threshold of social influence may overcome the limitation of low agency and be perceived in the same way as avatars, even though users know that they are interacting with an agent. This demonstrates the need for future research on creating agents with plausible, intelligent, and interactive behavior, which might be "the biggest challenge in social VR research" (Pan and Hamilton, 2018, pp. 410-411).
Another direction for future research is the impact of self-representation on social interactions in VR environments. The sense of embodiment is the participant's perception of the virtual body as their own biological body (Kilteni et al., 2012), which can be achieved by using real-time full-body motion tracking and mapping the participants' movements onto those of their virtual avatars. Studies (Slater and Sanchez-Vives, 2014) showed that people tend to alter their attitudes and behaviors to match the expectations implied by the attributes of their virtual body, including their social behavior (Yee and Bailenson, 2007). We believe there is great scope for further research (Mal, 2020) on the impact of several aspects of self-representation (e.g., visual realism, body characteristics, gender, and age) on many forms of social interaction in VR.
Also, there is evidence that the level of immersion has an impact on social interaction with VHs; however, the literature is very limited. Further investigation is needed on whether more immersive systems can enhance the realism of social interactions with VHs.
Finally, the commercialization of social VR for the general public in the form of entertainment and socialization may involve risks and unpleasant psychological and social consequences. An article by Slater et al. (2020) summarizes the potential negative implications of VR. Studies showed that exposure to VR, and especially virtual embodiment, can lead to beneficial emotional, cognitive, and behavioral changes. However, the same techniques can be used in the opposite direction, leading to negative and undesired changes. Also, exposure to enjoyable environments and interactions, as well as the ability to create a desired self-representation, can lead individuals to prefer, or even prioritize, the virtual world over the real world. Studies also showed that VR and VHs influence the behavior and actions of individuals, with social effects such as persuasion (Guadagno et al., 2007), obedience (Neyret et al., 2020), and conformity. However, in contrast with the real world, a virtual environment and its virtual occupants, agents and even avatars, are highly controllable by the administrator of the VR application. This gives the administrators of such applications great power over users' behavior. These are only some examples of the ethical concerns raised by the introduction of VR as a mass consumer product, and they demonstrate that ethics is a major challenge for VR.
To sum up, realistic social interactions with VHs are crucial for the effectiveness of many VR applications; however, it is not yet clear how to achieve them, and further research is required.

AUTHOR CONTRIBUTIONS
CK and DM-G have made a substantial, direct, and intellectual contribution to this work.

FUNDING
This work has been partially funded by ED-DESPINA MICHAIL-300155-310200-3319 budget of the Cyprus University of Technology.