Social Presence as a Moderator of the Effect of Agent Behavior on Emotional Experience in Social Interactions in Virtual Reality

Background: Exposure therapy involves exposure to feared stimuli and is considered to be the gold-standard treatment for anxiety disorders. While its application in Virtual Reality (VR) has been very successful for phobic disorders, the effects of exposure to virtual social stimuli in Social Anxiety Disorder are heterogeneous. This difference has been linked to demands on realism and presence, particularly social presence, as a prerequisite for evoking emotional experiences in virtual social interactions. So far, however, the influence of social presence on emotional experience in social interactions with virtual agents remains unknown.
Objective: We investigated the relationship between realism and social presence and the moderating effect of social presence on the relationship between agent behavior and experienced emotions in virtual social interaction.
Methods: Healthy participants (N = 51) faced virtual agents showing supportive and dismissive behaviors in two virtual environments (short interactions and oral presentations). First, participants performed five blocks of short one-on-one interactions with virtual agents (two male and two female agents per block). Second, participants gave five presentations in front of an audience of 16 agents. In each scenario, agent behavior was a within-subjects factor, resulting in one block of neutral, two blocks of negative, and two blocks of positive agent behavior. Ratings of agent behavior (valence and realism), experience (valence and arousal), and presence (physical and social) were collected after every block. Moderator effects were investigated using mixed linear models with random intercepts. Correlations were analyzed via repeated measures correlations.
Results: Ratings of valence of agent behaviors showed reliable relationships with experienced valence and less reliable relationships with experienced arousal. These relationships were moderated by social presence in the presentation scenario. Results for the interaction scenario were weaker but potentially promising for experimental studies. Variations in social presence and realism over time were correlated, but social presence proved a more reliable moderator.
Conclusion: Our findings emphasize the role of social presence for emotional experience in response to specific agent behaviors in virtual social interactions. While these findings should be replicated with experimental designs and in clinical samples, variability in social presence might account for heterogeneity in efficacy of virtual exposure to treat social anxiety disorder.


Theoretical Background
Exposure Therapy is considered the gold-standard treatment for anxiety disorders, especially phobic disorders. It involves "the systematic confrontation of feared stimuli" (Steinman et al., 2016). Whereas the confrontation in vivo (in the real world) is highly effective for a variety of disorders, there are a number of problems which hinder its application in routine care, among them the considerable effort involved in preparing and conducting the exercises, often outside the common therapy setting (Neudeck and Einsle, 2012). These problems can be countered by the application of exposure therapy in virtuo, i.e., in Virtual Reality (Virtual Reality Exposure Therapy, VRET; Rothbaum et al., 1997). VRET has been proven to be as effective as in vivo Exposure Therapy for multiple anxiety disorders (Carl et al., 2019). With respect to Social Anxiety Disorder (SAD), however, the results have been heterogeneous. Some studies found an advantage of in vivo exposure (Kampmann et al., 2016), others a comparable effectiveness between in vivo and in virtuo exposure therapy (Chesham et al., 2018), and others still, a superiority of virtual exposure (Bouchard et al., 2017). On a meta-analytic level, Carl et al. (2019) concluded that VRET is as effective as in vivo exposure for SAD, whereas a recent meta-analysis, with more stringent criteria on what constitutes in vivo exposure, concluded that in vivo exposure is more effective compared to VRET for Social Phobia (Wechsler et al., 2019). Given these contrasting results, a better understanding of the underlying working mechanisms of virtual reality exposure in the case of social anxiety is clearly required.
An important mechanism underlying VRET is presence. Presence can be commonly defined along the lines of a "sense of being there in a mediated environment" (Lombard and Jones, 2015). The precise nature of the bi-directional relationship between presence in VR and experienced emotions is not fully understood and the focus of ongoing research (Bouchard et al., 2008; Diemer et al., 2015). It has been suggested that a certain amount of presence is only a precondition to experience emotion but does not influence its intensity (Felnhofer et al., 2014, 2015). Yet a common finding in VRET is a correlation between sense of presence and experienced anxiety, strongest for participants fulfilling criteria for anxiety disorders (Ling et al., 2014). Importantly, this correlation could not be established for Social Phobia (Ling et al., 2014). A specific form of presence, social presence, has commonly been defined in distinction to physical presence, among others as "the feeling of being together (and communicating) with someone" (Ijsselsteijn et al., 2000) or as "a psychological state in which virtual [. . .] social actors are experienced as natural social actors" (Lee, 2004). Social presence has been rarely reported in studies on VRET for Social Phobia until recently. But social presence appears to be more strongly related to the anxiety response in social phobia than physical presence (Felnhofer et al., 2019).
A different conceptualization of presence by Slater (2009) distinguishes between place illusion (PI) and plausibility illusion (Psi). PI, the feeling of being in the virtual environment, is related to most conceptualizations of physical presence and could be sufficient to trigger fear related to places, objects, and animals. But to trigger the core fear of social anxiety (i.e., evaluation by other humans), plausibility of the displayed behaviors may be essential. According to Slater (2009), Psi "includes the notion of the credibility of events in comparison with what would be expected in reality in similar circumstances." In the Multimodal Presence Scale, a questionnaire designed to measure the three dimensions of presence defined by Lee (2004), the subscale social presence contains but is not limited to "human realism," which includes experiencing virtual agents as "credible" (Makransky et al., 2017). Slater (2009) states that "when Psi (plausibility illusion) breaks, it is unlikely to recover." This implies that, once Psi has broken, the effect of evaluative behaviors by these virtual characters could be diminished or cease altogether, i.e., the perceived "reality" of the characters (or the related experiential construct social presence) would moderate the relationship between the evaluative behaviors and the experience in response to them.
The goal of the study is to show in a correlative setting that an effect of the perception of behaviors with evaluative content (by agents in VR) on corresponding experience (in valence and arousal) exists and is moderated by the perceived realism of these behaviors and the experienced social presence in these situations.

Primary Hypotheses
It is hypothesized that the perceived valence of the behavior of agents in virtual social situations evokes an experience congruent to the behavior (i.e., the lower/more negatively valenced the behavior of agents is rated, the lower/more negative are ratings of experienced valence and the higher are ratings of experienced arousal and vice versa; hypothesis 1a valence and 1b arousal). The perception of realism of agent behavior over the different behavior variants correlates with the experience of social presence (hypothesis 2). The effect of valence of agent behavior on emotional experience (valence and arousal) is moderated by perceived realism of agent behavior (hypothesis 3a valence and 3b arousal) and social presence (hypothesis 4a valence and 4b arousal).
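As a sketch of what hypotheses 3 and 4 amount to formally, the moderation can be written as an interaction term in a regression. The symbols below are illustrative assumptions rather than the authors' notation: $E_{ij}$ is the experienced valence or arousal of participant $i$ in block $j$, $B_{ij}$ the rated valence of agent behavior, $M_{ij}$ the candidate moderator (realism of agent behavior or social presence), $u_i$ a participant-specific random intercept (the abstract mentions mixed linear models with random intercepts), and $\varepsilon_{ij}$ the residual.

```latex
E_{ij} = \beta_0 + \beta_1 B_{ij} + \beta_2 M_{ij} + \beta_3 \, B_{ij} M_{ij} + u_i + \varepsilon_{ij}
```

Under this reading, hypotheses 3 and 4 concern whether $\beta_3$ differs from zero, i.e., whether the slope relating agent behavior to experience depends on the level of the moderator.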

Secondary Hypotheses - Evaluation of Scenarios
We assume that the non-neutral behavior variants were successful in depicting behaviors that were distinguishable from neutral in evoking corresponding experiences. Indicators of a successful implementation are ratings for valence of agent behavior and experienced valence and arousal in all four non-neutral conditions that differ from neutral in the expected direction (larger/more positive for supportive and interested, smaller/more negative for uninterested and dismissive). Perceptions of differences of agent behavior have to be considered as the most basic indicator, differences in experienced valence as intermediate and differences in arousal as the strongest indicator of a successful paradigm. Ideally, no differences are apparent between behavior variants in physical presence, realism of agent behavior, and social presence.

Participants
In total, 52 healthy participants of both genders were recruited via bulletin boards and social media. Participants received 10€ per hour or course credit. Exclusion criteria were self-reported neurological or mental illnesses. One person was excluded from analysis post hoc due to self-reported mental disease. The remaining 51 participants were included in the analysis (33 female, aged between 18 and 47, M = 23.16, SD = 4.83). Seven participants had prior experience with giving presentations in VR. Experimental procedures were in line with the Declaration of Helsinki and the study was approved by the Ethics Committee of the University of Regensburg. All participants gave written informed consent.

Design
The study consisted of two experimental scenarios, i.e., an interaction scenario and an oral presentation scenario. Within each scenario, we manipulated the within-subject factor agent behavior by presenting five distinct blocks which differed only in the behavior that the virtual agents showed towards the participant. The implemented agent behavior variants aimed at representing supportive, interested, neutral, uninterested, and dismissive behavior, leading to five factor levels (the operationalization of the behavior variants are described in the procedure section of the respective scenario). All agent behavior variants were presented for every participant. To ensure comparability for evaluation of behavior variants in contrast to neutral behavior, neutral was the first variant for every participant. The order of the remaining variants was counterbalanced across participants.
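The block structure described above (neutral first, the four remaining variants in a counterbalanced order) can be illustrated with a short sketch. The source does not state which counterbalancing scheme was used, so cycling through all 24 permutations by participant index is purely an assumption, and `block_order` is a hypothetical helper name.

```python
from itertools import permutations

# The four non-neutral behavior variants named in the design.
NON_NEUTRAL = ["supportive", "interested", "uninterested", "dismissive"]


def block_order(participant_index):
    """Return the five-block order for one participant.

    Assumption: orders are assigned by cycling through all 24 permutations
    of the non-neutral variants; the actual scheme is not given in the text.
    """
    orders = list(permutations(NON_NEUTRAL))  # 4! = 24 possible orders
    chosen = orders[participant_index % len(orders)]
    # Neutral is always presented first, as described in the design.
    return ["neutral", *chosen]


if __name__ == "__main__":
    for i in range(3):
        print(i, block_order(i))
```

With 51 participants and 24 possible orders, such a scheme would use each order at least twice; a Latin square would be an equally plausible alternative.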

Questionnaires
Physical and social presence were assessed separately for both scenarios using the Multimodal Presence Scale (MPS; Volkmann et al., 2018). Social anxiety was measured using the Social Phobia Inventory (SPIN; Stangier and Steffens, 2002), fear of negative evaluation using the Brief Fear of Negative Evaluation questionnaire (BFNE; Leary, 1983), fear of public speaking using the Personal Report of Confidence as a Speaker questionnaire (PRCS; Paul, 1966) and submissive behavior using the Submissive Behavior Scale (SBS; Allan and Gilbert, 1997). Symptoms of VR sickness were assessed with the Virtual Reality Sickness Questionnaire (VRSQ; Kim et al., 2018) separately for both scenarios. In the sample, SPIN ranged from 5 to 35 (M = 17.12, SD = 7.98), BFNE from 11 to 43 (M = 24.65, SD = 7.62), PRCS from 4 to 23 (M = 13.2, SD = 5.1) and SBS from 11 to 44 (M = 24.4, SD = 6.17). Results of MPS and VRSQ are displayed separately for both scenarios in Table 1. Custom questionnaires were used to assess demographic information (age and gender), previous experience with presentations in VR (yes/no; if answered yes, follow-up questions on number of studies and topics of presentation for these studies) and self-reported presence of mental illness (yes/no; if answered yes, follow-up questions on diagnosis, psychological and medical treatment as yes/no).

Apparatus
The virtual environments were created by VTplus (Würzburg, Germany) using Unreal Engine (Version 4.25, Epic Games, inhouse Interface VrSessionModUDK 1.0.16) and were presented via the HTC Vive Pro Eye head mounted display (HTC Corporation, Taoyuan, Taiwan). Participants wore the inbuilt headphones for presentation of auditory stimuli. The sequence of statements comprising the interactions and control of behavior of virtual agents constituting the different conditions was handled by software and a graphical user interface developed in the context of the project OPTAPEB by VTplus, the Chair of Information science at the University of Regensburg and the Zentrum für Telemedizin (Bad Kissingen, Germany). Agent responses during the interactions were presented automatically using output from speech recognition software (Povey et al., 2011).

Measures
After every block of interactions and every presentation, participants rated their experience on the dimensions of valence and arousal, their sense of physical and social presence, and the realism and valence of agent behavior (see Figure 1). Ratings were given on a scale of 0-100 (0 representing very low or completely absent, 100 representing very high or completely); ratings of valence were coded on a scale of −100 (extremely unpleasant/extremely dismissive) to 100 (extremely pleasant/extremely supportive). The direction of valence (positive or negative) was determined by a prior question of "rather pleasant or unpleasant" or "rather supportive or dismissive" after which the corresponding numeric value was given on a scale of 0-100 and appropriately coded by the experimenter. Physical presence was phrased as "sense of being in VR" (similar to MPS question 4), social presence as "sense of interacting with interaction partners/listeners instead of with a computer simulation" (based on MPS question 10). Realism was phrased as "how realistic did you find the behavior" of agents in VR. Valence of agent behavior was phrased as "how would an objective observer probably rate the behavior" on a scale from dismissive to supportive; experienced valence was phrased as "how did you experience the situation" on a scale from unpleasant to pleasant. After every presentation, additional ratings of difficulty and the emotionality of the presentation topic were given on a scale of 0-100. Measures of physiology (ECG and EDA) and behavior (Gaze, Distances in VR, Voice) were recorded but are not reported here.
Frontiers in Virtual Reality | www.frontiersin.org December 2021 | Volume 2 | Article 741138
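The two-step coding of valence ratings described above (a direction question followed by a 0-100 magnitude) can be sketched as follows. The function name `code_valence` and the exact answer labels are assumptions made for illustration; the source only states that the experimenter coded the values onto a scale from −100 to 100.

```python
def code_valence(direction, magnitude):
    """Map a direction answer plus a 0-100 magnitude onto the -100..100 scale.

    `direction` is the answer to the prior question ("rather pleasant or
    unpleasant" / "rather supportive or dismissive"); the label strings used
    here are illustrative assumptions.
    """
    if not 0 <= magnitude <= 100:
        raise ValueError("magnitude must lie between 0 and 100")
    positive = {"pleasant", "supportive"}
    negative = {"unpleasant", "dismissive"}
    if direction in positive:
        return float(magnitude)
    if direction in negative:
        return -float(magnitude)
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    print(code_valence("supportive", 70))
    print(code_valence("unpleasant", 35))
```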

Procedure
Participants signed informed consent and filled out questionnaires for demographic information and trait measures. Subsequently, electrodes for psychophysiological measurements (ECG and EDA) were attached and participants entered the VR environment. The interaction scenario was then presented (constituting five blocks of four interactions each), followed by the presentation scenario (five presentations of 3 min length plus 1 min preparation).

Interaction Scenario
In VR, participants found themselves in a room with eight tables and eight distinct agents (four male, four female), one agent standing at each table. The agents and their position were constant over the whole scenario. The agents at the table appeared to be preoccupied by looking down at their phones. The experimenter instructed the participant to select two agents of the same sex in succession and approach them, greet them, ask for the time, and to say goodbye. Agents reacted to the approach by looking up at a remaining distance of 1.5 m, a value determined as optimal for such a setting by a previous study (Kroczek et al., 2020). The subsequent behavior of the agent reflected one of the five behavior variants, expressed as facial expression, posture, verbal response and tone of voice. The behavior variants ranged from orienting fully towards the participant after approach, showing lots of eye contact and a clear smile expression, statements and a tone of voice expressing enjoyment of the conversation (supportive) to merely turning the head initially, little eye contact, facial expressions of slight anger and contempt and short statements in a tone of voice expressing annoyance (dismissive). Between these two poles lay the variants interested (less pronounced smile), neutral (an absence of facial expressions or tone of voice of a particular valence, a medium level of orientation towards the participant and eye contact) and uninterested (slightly weaker expressions of the dismissive behaviors). After each approach block (with two men and two women each, order counterbalanced) for a specific behavior variant, the screen went dark and participants were asked to rate their experience, presence, and impressions of agent behavior. Subsequently, participants continued from the previous location in the room. In the first of the five approach blocks, behavior of all agents was neutral. The order of the remaining behavior variants was counterbalanced across all participants. See Figure 1 for an overview of blocks and ratings during the scenario. After the last block and removal of VR equipment, participants rated their presence (MPS) and VR sickness symptoms (VRSQ) during the interaction scenario.
FIGURE 1 | Procedure. In Block 1 of both scenarios, all agents acted neutrally. In blocks 2-5, agents acted according to the behavior variants (supportive, interested, uninterested, dismissive) in a counterbalanced fashion. All ratings were submitted on a scale of 0-100, except valences (−100 to 100). * For presentations, two additional ratings were submitted, difficulty and emotionality of presentation topic.

Presentation Scenario
In VR, participants found themselves in a room in front of a laptop. The topic of the next presentation was displayed on the laptop. The topics were "personal introduction" and opinion on the following topics relating to Germany: "stronger promotion of alternative energy sources," "stronger promotion of local public transport," "prohibition of plastic," and "autobahn speed limit." Introduction was always the first topic, the remaining topics were counterbalanced to be presented in the given order for one half of the participants and in reversed order for the other half. Participants were given 1 minute to prepare a 3 minute presentation on the instructed topic. At the end of the preparation period, they were guided into the presentation room by an agent. In case of technical issues they were teleported into the presentation room for all remaining presentations in order to prevent negative effects on presence.
The presentation room consisted of four rows of tables with four agents each. The audience consisted of 16 distinct agents (nine male, seven female), including the eight agents from the interaction scenario. At this point, and in the first 10 s of the presentation, the audience acted neutrally, i.e., no discernible facial expressions, and some idling behavior, like twitching or hair stroking. The topic was presented on a computer screen on a table positioned slightly to the right of the center position in front of the audience and on the wall behind the position of the presenter (like a video projection). The participant was instructed to begin their 3 min presentation on the given topic. After approximately 10 s, listeners gradually started displaying the facial expressions, postures and behaviors defined by the behavior variant for this presentation. For neutral, agents showed no specific facial expressions or movements indicating interest or lack of interest. For supportive, listeners attended to the participant with a friendly expression and nodded occasionally. The variant interested was similar, but the expressions were weaker and behaviors less frequent. For uninterested, the listeners displayed no specific facial expressions but showed behavioral expressions of boredom such as slouching and looking around the room, at their neighbors or at their phone. For the variant dismissive, listeners showed facial expressions of anger, contempt and disgust while looking at the participant, and behavioral expressions of discontent, such as shaking their head. After the presentation, the screen went dark and participants were asked to rate their experience, presence, impressions of agent behavior and difficulty and emotionality of the presentation topic. Subsequently, they were teleported back into the preparation room in front of the laptop with the topic of the next presentation. The first presentation always constituted neutral behavior; the order of the remaining behavior variants was counterbalanced across participants. See Figure 1 for an overview of presentations and ratings. After the last block and removal of VR equipment, participants rated their presence (MPS) and VR sickness symptoms (VRSQ) during the presentation scenario. After removing electrodes, participants were debriefed. The total duration of the study was approximately 2 hours.

Data Processing
Questionnaire and rating data entry and documentation was performed in SPSS (version 25, IBM Corp, 2017). Statistical analysis and report generation were performed using R (R Core Team, 2020) and knitr (Xie, 2015), graphics were created with ggplot2 (Wickham, 2016). For moderator analyses, the investigated moderators were dichotomized at the median of the respective rating over all conditions (including neutral) for this scenario (medians for interaction scenario are: agent realism 50, social presence 50, physical presence 60; medians for presentation scenario are: agent realism 60, social presence 60, physical presence 66).
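The median dichotomization described above can be sketched in a few lines. This is an illustration only: the analyses were run in R, `median_split` is a hypothetical helper, and the handling of ratings that fall exactly on the median (assigned to "low" here) is an assumption, since the source does not say how ties were treated.

```python
from statistics import median


def median_split(ratings):
    """Dichotomize pooled ratings at their median.

    Returns the cut point and a list of "high"/"low" labels.
    Assumption: ratings equal to the median are labeled "low".
    """
    cut = median(ratings)
    labels = ["high" if r > cut else "low" for r in ratings]
    return cut, labels


if __name__ == "__main__":
    # Made-up social presence ratings on the 0-100 scale.
    cut, labels = median_split([40, 50, 60, 70, 80])
    print(cut, labels)
```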

Statistical Analysis
All described analyses were performed separately for interactions and presentations. All significance tests were conducted with α = 0.05. All ratings are assumed to represent continuous data at an interval scale. Relationships between ratings over the four non-neutral conditions were investigated using repeated measures correlations (Rmcorr; Bakdash and Marusich, 2017). Rmcorr accounts for dependence among observations by using analysis of covariance (ANCOVA) to remove variance between participants with varying intercepts. Moderator effects, i.e., how the relationship between two variables is influenced by a third variable (moderator), were tested by analyzing the interaction of the two variables in a linear regression (Baron and Kenny, 1986). The examined models are outlined in Figure 2. To improve interpretability of the results, the investigated moderators were dichotomized using a median split for all ratings of all participants in the given category across the four non-neutral conditions used for analysis. To account for non-independent measurement due to repeated measurements, the interactions were analyzed in a multilevel model with fixed

Primary Hypotheses
During blocks of interactions with non-neutral behaviors of interaction partners, ratings of valence of agent behavior correlate with ratings of experienced valence, r(149) = 0.82, p < 0.001, and experienced arousal, r(149) = −0.21, p = 0.011. Ratings of realism of agent behavior correlate with social presence, r(149) = 0.26, p = 0.001 (full correlation matrix for interactions is presented in Table 2). Concerning the hypothesized moderators (realism and social presence) between valence of agent behavior and experience (valence and arousal), realism is not a significant moderator of experienced valence, b = 0.05, t(147) = 0.59, p = 0.556, or arousal, b = 0.03, t(147) = 1.17, p = 0.246. Social presence is not a significant moderator for experienced valence, b = 0.05, [...] Figure 3A (see Table 3 for full results of moderator analysis).
During presentations with non-neutral audience, ratings of valence of agent behavior correlate with ratings of experienced valence, r(152) = 0.59, p < 0.001, but not experienced arousal, r(152) = −0.13, p = 0.101. Ratings of realism of agent behavior correlate with social presence, r(152) = 0.38, p < 0.001 (full correlation matrix for presentations is presented in Table 4). Concerning the moderators between valence of agent behavior and experienced valence, realism is a significant moderator for experienced valence, b = 0.21, t(150) = 2.29, p = 0.024, but not for arousal, b = −0.04, t(150) = −1.29, p = 0.200. Social presence is a significant moderator, both for experienced valence, b = 0.27, t(150) = 3.09, p = 0.002, and arousal, b = −0.11, t(150) = −3.92, p < 0.001. The moderator effect of social presence on experienced valence and arousal during presentations, i.e., the different slopes for high and low values of social presence, is illustrated in Figure 3B (see Table 3 for full results of moderator analysis).
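The repeated measures correlations reported above can be illustrated with a minimal sketch. It relies on the fact that the rmcorr coefficient from the ANCOVA formulation equals the Pearson correlation of the two variables after each participant's own mean has been subtracted, with degrees of freedom equal to the number of observations minus the number of participants minus one. The function name and the toy data are assumptions; the authors used R, not this code.

```python
from collections import defaultdict
from math import sqrt


def rmcorr(participants, x, y):
    """Repeated measures correlation via within-participant centering.

    Returns (r, df). Removing each participant's mean from x and y discards
    between-participant variance, mirroring the varying-intercept ANCOVA.
    """
    groups = defaultdict(list)
    for pid, xi, yi in zip(participants, x, y):
        groups[pid].append((xi, yi))

    xc, yc = [], []
    for pairs in groups.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        xc.extend(p[0] - mean_x for p in pairs)
        yc.extend(p[1] - mean_y for p in pairs)

    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    r = sxy / sqrt(sxx * syy)
    df = len(xc) - len(groups) - 1  # observations - participants - 1
    return r, df


if __name__ == "__main__":
    # Toy data: two participants with very different intercepts but the
    # same within-participant slope.
    pid = [1, 1, 1, 2, 2, 2]
    behavior = [1, 2, 3, 1, 2, 3]
    experience = [2, 4, 6, 12, 14, 16]
    print(rmcorr(pid, behavior, experience))
```

For 51 participants with four non-neutral blocks each, this gives 204 − 51 − 1 = 152 degrees of freedom, matching the r(152) reported for presentations. The r(149) reported for interactions would be consistent with three fewer observations, although the source does not state this.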

Secondary Hypotheses - Evaluation of Scenarios
Ratings for valence of all agent behaviors differ significantly from the neutral condition in the expected directions, i.e., larger/more positive valence for supportive and interested and smaller/more negative valence for uninterested and dismissive, for both scenarios (all ps < 0.05). No difference from the neutral condition in physical presence is significant in either scenario. The corresponding plots are displayed in Figure 4 (interactions) and Figure 5 (presentations). Ratings for experienced valence differ from the neutral condition for all conditions in the expected directions (see above) except supportive and interested in the interaction scenario and all except dismissive in the presentation scenario. Ratings of arousal do not differ from the neutral condition in any condition in either scenario. Realism of agent behavior and social presence were rated higher than neutral in both positive behavior conditions in the interaction scenario, whereas in the presentation scenario no differences were found for any condition.

Exploratory Analysis
In some instances, the direction of valence of the agent behavior does not correspond to the direction of valence of the experience (i.e., recognizing a certain valence in the displayed behavior but experiencing the opposite valence; upper left and lower right quadrant in left plots of Figure 3A,B). These data points may represent errors in assessment or measurement or other factors influencing emotional experience, e.g., anxiety in the presentation scenario (despite supportive agent behavior) or amusement over dismissive agent behaviors in the interaction scenario. In the context of the current correlational study, these factors could be considered noise, i.e., qualitatively different influences affecting the quantification of the relationships in question and their hypothesized moderators. Therefore, to ensure that such factors have not biased our results, we removed ratings of opposite polarity for agent behavior valence and experienced valence in a supplemental analysis (see Supplementary Figure S1 for remaining data points). Under these conditions, social presence is still a moderator for experienced valence and arousal in the presentation scenario (b = 0.22, t(93) = 4, p < 0.001 and b = −0.12, t(93) = −4.07, p < 0.001), but also a significant moderator for experienced valence in the interaction scenario, b = 0.1, t(115) = 2.25, p = 0.026 (see Supplementary Table S1 for full results).

The present study investigated the interplay of virtual agent behavior, social presence, and experienced emotions. In two virtual scenarios, an interaction scenario and a presentation scenario, participants were confronted with agents displaying different levels of agent behavior, i.e., negative, neutral, or positive behavior.

Primary Hypotheses
Valence of agent behavior correlated with experienced valence for both virtual scenarios (hypothesis 1a fully confirmed) whereas it only correlated with experienced arousal during interactions, not presentations (hypothesis 1b partially confirmed). Realism of agent behavior correlated with social presence in both scenarios (hypothesis 2 fully confirmed). Realism of agent behavior was a significant moderator of the effect of agent behavior for experienced valence only in the presentation scenario (hypothesis 3a partially confirmed) but not for arousal in any scenario (hypothesis 3b not confirmed). Social presence was a significant moderator of the effect of agent behavior on experienced valence only during presentations, not interactions (hypothesis 4a partially confirmed) and on arousal only during presentations, not interactions (hypothesis 4b partially confirmed). In summary, we were able to demonstrate the feasibility of investigating the moderating role of constructs like realism and social presence between agent behavior and evoked emotional experience but could not demonstrate moderator effects reliably in both scenarios. The analysis of moderating effects on arousal is complicated by the fact that data points show a u-shaped pattern that could potentially be modeled using a quadratic term. But if a difference based on a potential moderator is present mostly for negatively valenced agent behavior (leading to high arousal) whereas no systematic differences are present for positively valenced agent behavior (but a large variation in whether this constitutes low or high arousal is present), then a linear relationship seems more suitable to capture this effect.

Secondary Hypotheses - Evaluation of Scenarios
The representation of different kinds of behaviors was successful for both scenarios. For valence of agent behavior, all conditions for interactions and presentations differed from neutral, and for experienced valence, only the interested condition in the interaction scenario and the dismissive condition in the presentation scenario showed no difference to neutral. For the interaction scenario, the similarity between positive conditions and neutral is congruent with feedback from the speakers who recorded the statements: they reported trouble in creating two different kinds of positive sentiment, and a difference to "neutral", in the short statements comprising the interactions. More importantly, given the context of social anxiety, the negative conditions in the interaction scenario were successful in evoking a negative experience, and one of them in evoking increased arousal. For the presentations, the uninterested condition was successful in evoking the targeted experienced valence. The finding that, for dismissive, valence of agent behavior clearly differed from neutral but experienced valence did not could be an indication that displaying dismissive behaviors in an audience does not necessarily translate to negative experiences.
Unexpectedly, for presentations, an effect of listening audience behaviors on arousal was practically absent in the given sample and overshadowed by the sequence effect (neutral condition first) and effects of the presentation topic (as indicated by the Rmcorrs between arousal and presentation topic in Table 4). It is possible that a sample with more pronounced social anxiety or SAD patients would show an effect on arousal. It is likewise conceivable that arousal depended more on related traits than on situational factors like audience behavior (a position taken by e.g., Ayres, 1990). Contrary to this position, Hsu (2009) showed that, in presentations with real audiences trained to give positive or negative feedback, audience behavior did have a substantial impact on state anxiety. But the levels of realism of agent behavior and/or physical and social presence that are necessary to achieve a similar effect when depicting these behaviors in VR are unknown.
For physical presence, no differences were apparent between any conditions in either scenario. For social presence during interactions, the conditions supportive and interested differed from neutral. This potentially reflects the fact that realism was rated higher for these conditions than for any other. It is unclear whether this reflects differences in the quality of the implementation between conditions or a genuine effect, potentially related to social expectations of polite responses. Such a finding, if replicated, would make it necessary to "package" negative behaviors of virtual agents as more realistic, potentially introducing them gradually in longer interactions.

Interpretation
The current study can be interpreted as preliminary evidence that social presence plays an important role in how evaluative behaviors of agents in VR are experienced. In research on the conditions that enable social stimuli in VR to be processed in a similar fashion to those outside VR, Strojny et al. (2020) found that copresence and realism moderated social facilitation effects, i.e., the effect of virtual agents being present (analogous to humans being present outside VR) depended on the agents' perceived social realism. Despite the differences in context and methodology, the similarity in findings and our finding of a more reliable effect for social presence further emphasize the importance of focusing on social presence when intending to increase effects of social settings in VR.

Limitations and Strengths
A clear limitation of the current study is its correlational nature, without systematic variation of realism of agent behavior or of factors influencing social presence. Therefore, no causal inferences can be made. On the other hand, the study allowed us to investigate the effects of naturally occurring differences in the perception and experience of VR, which has its own merit. The fact that removing data points with opposing valence leaves the moderating effects in the presentation scenario intact and introduces a moderating effect for experienced valence in the interaction scenario suggests that an experimental study, with less noise, could show more reliable moderator effects in multiple settings.

Implications
It is conceivable that social presence is one of the key mechanisms underlying the effectiveness of VRET for social anxiety. Understanding its role in "translating" the social events in VR into an emotional experience (and potential peculiarities of SAD patients in this regard) could enable more precise measurements and early estimates of the probable effectiveness of scenarios when used with patients. Future studies could counterbalance two pairs of a positive and a negative condition (attempting high similarity in effects among the conditions of the same valence) and systematically vary realism (e.g., longer delay until agents utter statements during the interactions) or social presence (e.g., robotic vs. natural sounding voice). A similar analysis strategy could be used, i.e., different slopes between perception of agent behavior and its effect on experience for different degrees of realism, potentially investigating specific characteristics of highly socially anxious participants or SAD patients. An important question is whether the assumption that social presence is non-recoverable (Slater, 2009) is correct, even in SAD patients, and if so, how generalized this phenomenon is, i.e., whether a breakdown of social presence negatively influences the complete VR session, only the current scenario, or only the current agent. Another open question is the nature of the moderation by social presence. It is conceivable that non-linear effects are necessary to model the moderation adequately, perhaps reflecting the notion of presence as a "precondition" for the experience of emotion in the form of a "jump" around a certain level of social presence. From an applied standpoint, finding the threshold of social presence necessary to evoke adequate responses in SAD patients could be a main goal in the development and refinement of social scenarios for VRET.
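As an illustration only, the two forms of moderation discussed above can be written out as follows. This is a sketch under stated assumptions, not the model specification used in the study: the symbols $E_{ij}$ (experienced valence or arousal of participant $i$ in block $j$), $B_{ij}$ (rated valence of agent behavior), $S_{ij}$ (rated social presence), the random intercept $u_i$, and the threshold parameters $\tau$ and $k$ are hypothetical notation introduced here.

```latex
% Linear moderation in a mixed model with random intercepts (assumed notation)
E_{ij} = \beta_0 + \beta_1 B_{ij} + \beta_2 S_{ij} + \beta_3 B_{ij} S_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Slope of experience on agent behavior at a given level of social presence
\frac{\partial E_{ij}}{\partial B_{ij}} = \beta_1 + \beta_3 S_{ij}

% Hypothetical non-linear ("jump") alternative: the slope switches on around a threshold \tau
\frac{\partial E_{ij}}{\partial B_{ij}} = \beta_1 + \frac{\beta_3}{1 + e^{-k (S_{ij} - \tau)}}
```

In the linear form, a non-zero $\beta_3$ means that the slope relating agent behavior to experience changes steadily with social presence. In the hypothetical threshold form, the slope stays near $\beta_1$ below $\tau$ and approaches $\beta_1 + \beta_3$ above it, with $k$ controlling how abrupt the transition is. This corresponds to the notion of presence as a "precondition".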
In conclusion, our findings underscore the importance of the construct of social presence in evoking emotional responses to specific agent behaviors in virtual social interactions. Replication of the results is called for, especially in experimental designs using systematic variations of realism and other variables relevant to social presence and in clinically anxious samples, as are investigations of the nature of the moderating relationship of social presence.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the University of Regensburg. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
MP, LK, and AM designed research. BL and MM created the virtual environment. RF programmed scripts for automatized presentation. MM supervised virtual environment creation and experiment paradigm programming. MP supervised data acquisition and analyzed data. MP, LK, and AM wrote the paper. BL, RF, and MM commented on the paper.

FUNDING
This study was supported by the project "OPTAPEB" (FKZ: 16SV7839K) funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie).