The Influence of Embodiment as a Cartoon Character on Public Speaking Anxiety

Bellido Rivas, Anna I.; Navarro, Xavi; Banakou, Domna; Oliva, Ramon; Orvalho, Veronica; Slater, Mel

doi:10.3389/frvir.2021.695673

ORIGINAL RESEARCH article

Front. Virtual Real., 22 October 2021

Sec. Virtual Reality and Human Behaviour

Volume 2 - 2021 | https://doi.org/10.3389/frvir.2021.695673

Anna I. Bellido Rivas¹†

Xavi Navarro¹‡

Domna Banakou¹,²

Ramon Oliva¹

Veronica Orvalho³

Mel Slater¹,²*

¹Event Lab, Faculty of Psychology, University of Barcelona, Barcelona, Spain
²Institute of Neurosciences of the University of Barcelona, Barcelona, Spain
³Universidade do Porto, Instituto de Telecomunicações, Porto, Portugal

Virtual Reality can be used to embody people in different types of body, so that when they look towards themselves or in a mirror they see a life-sized virtual body instead of their own, one that moves with their own movements. This will typically give rise to the illusion of body ownership over the virtual body. Previous research has focused on embodiment in humanoid bodies, albeit with various distortions such as an extra limb or asymmetry, or with a body of a different race or gender. Here we show that body ownership also occurs over a virtual body that looks like a cartoon rabbit, at the same level as embodiment as a human. Furthermore, we explore the impact of embodiment on performance as a public speaker in front of a small audience. Forty-five participants who had public speaking anxiety were recruited. They were randomly partitioned into three groups of 15, embodied as a Human, as the Cartoon rabbit, or from third person perspective (3PP) with respect to the rabbit. In each condition they gave two talks to a small audience of the same type as their virtual body. Several days later, as a test condition, they returned to give a talk to an audience of human characters while embodied as a human. Overall, anxiety reduced the most in the Human condition, the least in the Cartoon condition, and there was no change in the 3PP condition, taking into account existing levels of trait anxiety. We show that embodiment in a cartoon character, from first person perspective and with synchronous real and virtual body movements, leads to high levels of body ownership. We also show that the embodiment influences outcomes on the public speaking task.

Introduction

When you put on a head-tracked stereo head-mounted display and look down towards yourself, you will see, if the system has been so programmed, a life-sized virtual body substituting your real unseen body. Your body movements can be tracked in real-time and mapped to the movements of the virtual body so that as you move and look down towards yourself you will see the virtual body move correspondingly and in synchrony with your movements. A mirror can be programmed so that looking into it you will see a reflection of your virtual body that moves synchronously and in correspondence with your real body movements. Throughout our lives, whenever we have looked down towards ourselves we have seen our own body; the same holds for mirror reflections, and when we move our limbs it is our own limbs that we see moving correspondingly and synchronously. It is no surprise therefore that in such a setup in virtual reality (VR) people typically have the strong perceptual illusion that the virtual body that they see is their own body, even though they know for sure that this is not the case. This is referred to as a body ownership illusion, a concept inspired originally by the rubber hand illusion (RHI), where participants can feel a rubber hand as their own when it is seen to be touched, with touch that is felt synchronously on the corresponding real out-of-sight hand (Botvinick and Cohen, 1998). It is an example of an illusion resulting from multisensory stimulation (first person perspective over the hand, synchronous vision and touch) that provides evidence to the brain that the rubber hand is part of the body. In our opening example we refer to synchrony between proprioception and vision as well as first person perspective over the body. Full body ownership in VR is discussed extensively in (Kilteni et al., 2012a) and body consciousness more generally in (Ehrsson and Stein, 2012; Blanke et al., 2015).

Certain conditions must be satisfied for the RHI to occur. For example, the rubber hand should be in an anatomically plausible position in relation to the real body (Ehrsson et al., 2004), and using a VR version of the illusion it has been shown that there should be continuity between the virtual hand and the rest of the virtual body (Perez-Marcos et al., 2011; Tieri et al., 2015). However, with respect to the virtual hand illusion there is inconsistent evidence regarding ownership of non-hand objects. For example, (Yuan and Steed, 2010) found stronger ownership over a hand than over an arrow, and six different hand representations were compared in (Lin and Jörg, 2016), with wide variation in ownership though with the strongest overall level corresponding to the most realistic hand. It was shown in (Guterstam et al., 2013) that with appropriate multisensory stimulation there could even be an illusion of ownership over empty space. Moreover, major distortions can occur with ownership preserved: having a third arm (Guterstam et al., 2011), an extra finger (Hoyet et al., 2016), one very long arm (Kilteni et al., 2012b), a body with a tail (Steptoe et al., 2013), and non-human bodies that can be moved by the self in unusual ways, e.g., moving a leg by arm movements (Won et al., 2015a; Won et al., 2015b). With respect to the full body ownership illusion in VR there is again remarkable plasticity: adult men embodied successfully as a young girl (Slater et al., 2010), adults in small or very large bodies (van der Hoort et al., 2011), or as children (Banakou et al., 2013; Tajadura-Jiménez et al., 2017), in bodies of a different race (Peck et al., 2013; Banakou et al., 2016), or age (Banakou et al., 2018; Slater et al., 2019), or alien bodies (Barberia et al., 2018).

The question that we address is whether body ownership is afforded through appropriate multisensory integration providing evidence to the brain that the virtual body is the person’s own body, thus leading to the illusion of body ownership, or whether the appearance of the body has a fundamental role. Our first goal was therefore to test whether embodiment in a virtual body that is deliberately designed to look like a cartoon character can also result in the body ownership illusion. Our second goal was to exploit this representation to examine whether it would have an impact on public speaking anxiety. It is known that embodiment in different types of bodies has an impact on attitudes and behaviour. For example, people in a body taller than their own will be more confident in negotiations (Yee and Bailenson, 2007), being embodied as Einstein leads to better cognitive test performance compared to being embodied in another body (Banakou et al., 2018), and several studies show that embodiment of Caucasian people in a dark skinned virtual body decreases their implicit racial bias, as summarized in (Maister et al., 2015), with a mechanism presented in (Bedder et al., 2019). These are all examples of what Yee and Bailenson (2007) termed the ‘Proteus Effect’.

It has long been known that people with public speaking anxiety also exhibit it when talking to entirely virtual audiences (Pertaub et al., 2002; Aymerich-Franch et al., 2014), and VR has been used for psychological therapy to overcome this aspect of social phobia, for example (Vanni et al., 2013). Our idea here, however, was that if the speaker with public speaking anxiety is embodied as a cartoon character, and the audience itself is a deliberate cartoon audience, then possibly the humour of the situation, or the likelihood that the cartoon audience would not be seen as having expertise in any particular topic, would lead to a reduction of anxiety that would carry over to a later exposure of speaking to a virtual audience representing people rather than cartoons. Factors such as the size of the audience and their expertise level have been shown to influence anxiety in a public speaking task (Jackson and Latané, 1981; Ayres, 1990). The authors of these real-life studies found that the larger the audience and the more expert they were, the higher the anxiety level. Hence, we can infer that a positive audience consisting of a small number of non-experts would be an easier context for people with public speaking anxiety to deliver a speech. Immersion in such an environment may allow them to establish new positive associations with the feared speaking task, which may give rise to a progressive systematic desensitization, session after session.

Methods

Overview

In order to examine these ideas we carried out a between-groups experiment with 45 participants and three conditions. Each participant visited the virtual reality lab on two occasions separated by a mean of 5.3 ± 2.3 (S.D.) days. On the first visit they gave a speech embodied either as a cartoon character from first person perspective (1PP) speaking to a cartoon audience, or as a human from 1PP speaking to a human audience, or as a cartoon character speaking to a cartoon audience from a third person perspective (3PP). Then in the same session they gave another talk under the same condition. On the second visit, some days later, they gave a third speech, but this time embodied from 1PP as a human speaking to a human audience. This last exposure was considered as a test of the outcome of the first exposures. Our two questions were 1) whether the level of body ownership would differ between the three conditions and 2) whether embodiment as the cartoon character would lead to less anxiety about public speaking in front of humans at the second visit.

Ethics

This experiment was approved by the Comisión de Bioética de la Universitat de Barcelona (IRB00003099). Participants gave written and informed consent.

Recruitment

Participants were recruited from the Mundet campus of the University of Barcelona and were independent from our own research group. A previous virtual reality study found a greater level of fear of public speaking for women compared to men (Pertaub et al., 2002) and a large sample study amongst college students found the same (Ferreira Marinho et al., 2017). Since our goal was to recruit participants with relatively high levels of public speaking anxiety, the most convenient approach was to recruit women. The inclusion criterion was a score of at least 18 on the Personal Report of Confidence as a Speaker (PRCS) (Paul, 1966; Gallego et al., 2009). This is a set of 30 questions with yes/no answers and a maximum score of 30 indicating a high degree of anxiety. The mean ± SD score was 22.4 ± 2.86 with scores ranging from 18 to 28. Participants had to be at least 18 years old, and the mean ± SD age of participants was 24.5 ± 9.31 years. A further exclusion criterion was applied using the LSB-50 questionnaire, which screened out participants with potentially serious psychological disorders (Abuín and Rivera, 2014). Further details of the sample are given in Supplementary Table S1.

Experimental Design

This was a between-groups experiment with three groups: Cartoon, Human and 3PP. In the Cartoon condition participants were embodied as a cartoon character and spoke to an audience of cartoon characters. Embodiment was from 1PP with visuomotor synchrony, so that the virtual body moved in synchrony with real body movements. In the Human condition the participant was embodied in a female virtual body with visuomotor synchrony. In the 3PP condition the participant saw the cartoon virtual body from 3PP and it did not move with their body movements. They still had full control of the head, and visual updates to the images in the head-mounted display were based on their own head movements, but the displayed cartoon body did not show the participant’s head movements. The virtual audience in this condition also consisted of cartoon characters (Figure 1). We kept the audience of the same type as the embodied character in order to avoid effects caused solely by a difference between the two. Each condition was assigned 15 participants selected by a pseudo random number generator. The experiment is illustrated in Supplementary Video S1.

FIGURE 1. The scenario (A) The Cartoon condition. (B) The Human condition. The 3PP condition looked the same as the Cartoon except that the participant was not embodied in the bunny rabbit.

Implementation

Participants used a stereo NVIS nVisor SX111 head-mounted display. This has dual SXGA displays with a 76°H × 64°V field of view (FOV) per eye, with a wide field of view of 111° horizontal, with 50° (66%) overlap, and 64° vertical, and a resolution of 1280 × 1024 pixels per eye displayed at 60 Hz. Head tracking was performed by a 6-DOF Intersense IS-900 device. Participants wore an OptiTrack full body motion capture suit that uses 37 markers and the corresponding software (https://optitrack.com/software/motive/) to track their movements. This used a 12-camera truss setup by OptiTrack. Participants were helped to put on and calibrate the head-mounted display (HMD) following the method described in (Jones et al., 2008).

The virtual room in which the speech took place was the same for all conditions. It was designed to be neutral, and it included a wooden platform on which participants virtually stood while delivering a speech. A virtual mirror was located on the left of the participant, which helped her inspect the body assigned. The mirror was carefully located so that it was in full view of the participant throughout the speech while she was looking at the audience. A virtual clock was added to the opposite wall of the room in order to help the participant keep track of the time.

The avatars generated for the Cartoon and 3PP conditions were cartoon-like, not culturally offensive, anthropomorphic figures of animals to make them look friendly and humorous, and were rigged so that they could be animated. The human avatars used in the Human condition were formed of male and female avatars from a RocketBox collection (Gonzalez-Franco et al., 2020). Both human and cartoon audiences were located in the same places in the virtual room. All the animations generated were for one audience and retargeted to the other so that the audience behaviors were identical.

Assessing Anxiety

Public speaking anxiety was measured using the State-Trait Anxiety Inventory (STAI) (Spielberger, 1983; Spielberger, 2010), a commonly used measure to diagnose anxiety and to distinguish it from depressive syndromes. The STAI measures two types of anxiety: state anxiety, or anxiety about an event, and trait anxiety, or anxiety level as a personal characteristic. Form Y is its most popular version and includes 20 items for assessing trait anxiety and 20 items for state anxiety, rated on a 4-point scale from “Almost Never” to “Almost Always”. Scores range from 20 to 80, where 20 indicates absence of anxiety and 80 its maximum value. The STAI has been translated into Spanish and validated (Seisdedos, 1988). It has good reliability (Cronbach's alpha of 0.90 for the state scale and 0.84 for the trait scale). Examples of State questions include: “I am tense; I am worried” and “I feel calm; I feel secure.” Trait questions include: “I worry too much over something that really doesn’t matter” and “I am content; I am a steady person.” The STAI Trait was used as a background variable for the participant’s general self-assessed anxiety. Since how people respond to a particular incident is likely to be influenced by their general predisposition to anxiety, this is a critical covariate. The STAI State was used to assess the participant’s state before and after each talk.

Procedures

The experiment was carried out in three phases: a pre-experimental phase and two experimental sessions. The first phase was used to recruit only those participants who had a sufficient level of fear of speaking in public, using the PRCS as described above. A day was then arranged to hold the first session, and the participants were asked to choose two topics they could talk about for 5 min. In the first session they had two exposures in their assigned condition.

At the first session participants were given an information sheet to read, a consent form to sign, and if they agreed to participate in the study, they were asked to complete the LSB-50 and the STAI-Trait questionnaires. They were then assigned to one of the three conditions (Cartoon, Human or 3PP) following a pseudo-random method that guaranteed the same number of participants per condition. Prior to and after each VR exposure the participant was asked to complete the STAI-State.
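The allocation described above, a pseudo-random assignment that guarantees equal group sizes, can be sketched as follows. This is only a minimal illustration and not the procedure actually used in the study: the function name `balanced_assignment` and the seed value are assumptions introduced here.

```python
import random

CONDITIONS = ["Cartoon", "Human", "3PP"]


def balanced_assignment(n_participants, conditions, seed):
    """Return a shuffled list of condition labels with equal group sizes."""
    if n_participants % len(conditions) != 0:
        raise ValueError("participants must divide evenly among conditions")
    per_group = n_participants // len(conditions)
    # Build the full list of labels first, so group sizes are fixed in advance.
    labels = [c for c in conditions for _ in range(per_group)]
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    rng.shuffle(labels)
    return labels


# Hypothetical usage: 45 participants, 15 per condition; the seed is arbitrary.
assignment = balanced_assignment(45, CONDITIONS, seed=2021)
```

Shuffling a pre-built list of labels, rather than drawing a condition independently for each participant, is what guarantees the same number of participants per condition.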

The sequence of events started with 1 min 40 s of audio instructions that the participants had to follow while looking at a virtual mirror, in order to get them to move their head, arms and legs. This also allowed them to become acquainted with the virtual environment and their virtual body (or their relationship to it in the 3PP condition), providing time for the body ownership illusion to be induced (or not). After the audio instructions, the participant was asked to move freely (although within the tracking area) for 1 min and 20 s and wait for a brief 3 s clap from the audience, which signalled the beginning of the talk. After 5 min, the audience applauded resoundingly, indicating the end of the speech. The virtual environment slowly faded out and the experimenter helped the participant take off the HMD. Finally, the participant filled in the post-experiment questionnaires, and a brief informal interview on her experience followed. Participants went through this virtual reality procedure twice (first and second exposures) with a 15 min break in between. After the end of the second exposure, a day for the third exposure was arranged. It had to be no sooner than 2 days and no later than a week afterwards, and participants were asked to think of another topic to talk about. The participant was paid 5€ and left. At the second visit, participants gave only one talk (the third talk), always in the Human condition, so they went through the experimental procedure only once, which was identical to that of the first two exposures. After completion of the third talk, the participant was paid 15€ and debriefed.

Response Variables

Body Ownership

Body ownership was assessed using the questionnaire shown in Table 1, administered immediately after each VR exposure. The first three questions assess body ownership itself. The twobodies question is a control question, since if there is strong body ownership we would expect participants to report the feeling of having one body (the virtual) rather than two. The last question is a test of the extent to which the tracking system and the mapping of real movements to the movements of the virtual bodies was successful. If the variable $x$ refers to any of these questions then $x1$, $x2$ and $x3$ refer to the responses after exposures 1, 2 and 3 respectively.

TABLE 1. Subjective evaluation of the body ownership illusion. The questionnaire was answered after completing each talk. Answers were rated on a 7-point Likert scale, where 1 was “Not at all” and 7 was “Completely”.

State Anxiety

We refer to the STAI State questionnaire score prior to an exposure as $\mathit{staistatepre}$, and after the exposure as $\mathit{staistatepost}$. Then $\mathit{staistatepre1}$, $\mathit{staistatepre2}$ and $\mathit{staistatepre3}$ refer to the states prior to the first, second and third exposures respectively, and similarly for $\mathit{staistatepost}$. The response variable of interest is:

$$\mathit{dstai} = \mathit{staistatepost3} - \left(\frac{\mathit{staistatepre1} + \mathit{staistatepre2}}{2}\right) \qquad (1)$$

This is the difference between the STAI state after the final exposure to the Human condition and the mean of the STAI states prior to the first two exposures. We consider the mean of the first two exposures since the first alone may induce anxiety simply due to a new and unknown forthcoming event. By the second time participants would know what to expect, and therefore be less anxious. So the first may overestimate anxiety and the second underestimate it, and taking the mean of the two is a balance. However, we have also carried out the analysis using instead $\mathit{dstai1} = \mathit{staistatepost3} - \mathit{staistatepre1}$ and also $\mathit{dstai2} = \mathit{staistatepost3} - \mathit{staistatepre2}$, discussed in Results.
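As a worked illustration of Eq. 1 and its two alternatives, the following sketch computes the three change scores for one participant. The function names mirror the variable names in the text, but the STAI scores used are made up for illustration and are not data from the study.

```python
def dstai(pre1, pre2, post3):
    """Eq. 1: post-exposure-3 state minus the mean of the two pre-exposure states."""
    return post3 - (pre1 + pre2) / 2


def dstai1(pre1, post3):
    """Alternative response: change relative to the state before exposure 1."""
    return post3 - pre1


def dstai2(pre2, post3):
    """Alternative response: change relative to the state before exposure 2."""
    return post3 - pre2


# Made-up STAI State scores for a single hypothetical participant.
pre1, pre2, post3 = 30, 24, 20

change = dstai(pre1, pre2, post3)   # 20 - 27 = -7.0
change1 = dstai1(pre1, post3)       # 20 - 30 = -10
change2 = dstai2(pre2, post3)       # 20 - 24 = -4
```

Negative values indicate a reduction in state anxiety. Note that dstai is always the average of dstai1 and dstai2, which is the sense in which it balances the two.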

The STAI Trait, assessed in the pre-experimental meeting, was used as a covariate since participants may respond differently depending on their underlying normal level of anxiety.

The anxiety variables are summarized in Table 2.

TABLE 2. The anxiety scores. In general “trait anxiety” refers to a stable attribute of personality, whereas “state anxiety” refers to anxiety with respect to a particular situation or event. The STAI refers to the State-Trait Anxiety Inventory questionnaire (Spielberger, 1983).

Results

In this section we will first present descriptive results for body ownership and anxiety, and then a statistical analysis for all the results together.

Body Ownership

Figures 2A–C show the box plots for the scores on the questions of Table 1 for the three exposures. For exposures 1 and 2 (Figures 2A,B), where participants were in the Cartoon, Human or 3PP condition, the scores on the three ownership questions are very high for the Cartoon and Human conditions, and much higher than the scores on the control question twobodies, whereas the ownership scores are always low for the 3PP condition. In the third exposure (Figure 2C) all were embodied as Human (the conditions refer to how they had been embodied in the first two exposures) and all body ownership scores are high, and again much greater than the control question. In all conditions and exposures except for 3PP the mymovements scores are very high, indicating that the tracking system and the mapping of real movements to movements of the virtual body worked well.

FIGURE 2. Scores on the ownership questions from Table 1. (A) Box plot for exposure 1, session 1. (B) Box plot for exposure 2, session 1. (C) Box plot for the exposure embodied as Human for session 2, but where the conditions refer to those of session 1. (D) Bar charts showing means and standard errors of the factor scores from the principal component factor analysis of the ownership scores of the first two exposures only.

The critical embodiments were those of exposures 1 and 2, since the goal was to understand how experiencing public speaking in the Human or Cartoon conditions would influence anxiety in the final test in the Human condition (exposure 3). We carried out a principal components factor analysis with varimax rotation on the scores mybody, medown, memirror and twobodies for exposures 1 and 2 (i.e., eight variables). This was done with Stata 16.1 (https://www.stata.com/) using the “factor” command. Two factors were retained, the first accounting for 68% of the variance and the second for 23%, thus cumulatively 91%. Regression scores were then obtained for each of the two factors, resulting in two uncorrelated variables with the scoring coefficients shown in Table 3. The first factor is proportional to the mean of all the scores apart from twobodies, and the second factor is proportional to the mean of the twobodies scores. Hence the factor structure is consistent with the meaning of the questionnaire. Our interest is only in the first factor, which measures the overall level of ownership in the first two exposures; we refer to this factor as own and use it in subsequent analysis. The means and standard errors are shown in Figure 2D, demonstrating no difference between the Cartoon and Human conditions, which are both much greater than the 3PP condition.

TABLE 3. Scoring coefficients for the principal components factor analysis of the questionnaire scores of exposures 1 and 2. (Method = regression based on varimax rotated factors).

Anxiety

Figures 3A–C show the scatter diagrams of dstai (Eq. 1) against the covariate staitrait, the trait anxiety measured some days prior to the first exposure. The results suggest that dstai is positively associated with staitrait in the Cartoon condition and negatively in the Human condition, and there seems to be no association in the 3PP condition. Figure 3D shows the means and standard errors of dstai by condition without taking into account background anxiety, suggesting that the decrease in anxiety is greater for the Cartoon and 3PP conditions. The means and standard errors are also shown in Supplementary Table S2. However, these do not take into account the predisposition towards anxiety as measured by staitrait.

FIGURE 3. Plots of dstai (Eq. 1)—the difference between the anxiety score after exposure 3 compared to the mean anxiety score prior to exposures 1 and 2 by staitrait. (A–C)—scatter diagrams by condition. (D) Bar chart of dstai showing the means and standard errors by condition.

Statistical Analysis

Bayesian statistical methods have been increasingly employed over recent years, including in psychology (Kruschke, 2011; Van De Schoot et al., 2017). In classical (frequentist) statistics, in order to consider whether a parameter value is in a certain range (for example, the mean of a population being positive compared with being zero) we compute the probability that data at least as extreme as those observed would have been generated on the assumption that the parameter value were 0, referred to as the significance level. If this probability is small (typically <0.05) then we reject the hypothesis that the parameter value is 0. In classical statistics the probability of an event is based exclusively on its long run frequency of occurrence in a large number of independent trials. Hence this method essentially compares the observed data with what might have been observed in a large number of independent repetitions of the experiment. In Bayesian statistics, in contrast, we start with a probability distribution for the parameter based on prior knowledge (or a distribution with large variance in the absence of prior knowledge) and then compute a posterior distribution conditional on the observed data, so that the data update our prior. From this we can compute probabilities of the parameter value being in any range of interest. Moreover, if there are multiple parameters the posterior distribution will be the joint distribution of all the parameters, and we can make as many probability statements as we like over several parameters. In classical statistics, when we carry out more than one significance test the significance levels are no longer valid and we have to resort to ad hoc corrections such as Bonferroni. In classical statistics confidence intervals are mathematically equivalent to significance tests, and a 95% confidence interval cannot be interpreted as a probability of 0.95 of a parameter being between the computed limits. In Bayesian statistics a 95% credible interval is a range of values where the actual probability of a parameter value being within that range is 0.95. What is particularly informative is to compare the credible interval based on the prior distribution of the parameter with the credible interval calculated from the posterior distribution. This is a very useful way to understand how the data have updated the credible interval.
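The comparison of prior and posterior credible intervals can be illustrated with a deliberately simplified case: a single mean with a normal prior and a known data standard deviation, for which the posterior is available in closed form. This is only a sketch under those assumptions and is not the model fitted in this paper; the sample mean, standard deviation and sample size below are invented for illustration.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior of a mean under a normal prior and known data SD (conjugate update)."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * data_mean) / post_precision
    return NormalDist(post_mean, post_precision ** -0.5)


def credible_interval(dist, level=0.95):
    """Central credible interval of a distribution at the given level."""
    tail = (1 - level) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Wide prior, then an update with invented summary data (mean -5, SD 8, n = 15).
prior = NormalDist(0, 20)
posterior = normal_posterior(0, 20, data_mean=-5.0, data_sd=8.0, n=15)

prior_ci = credible_interval(prior)      # roughly -39 to 39
post_ci = credible_interval(posterior)   # much narrower than the prior interval
prob_negative = posterior.cdf(0)         # posterior probability that the mean is < 0
```

The narrowing from `prior_ci` to `post_ci` is the sense in which the data update the prior, and `prob_negative` is an example of a direct probability statement about a parameter that has no counterpart in a classical confidence interval.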

A Bayesian analysis was carried out that includes both response variables (dstai and own) simultaneously. The method is equivalent to an analysis of variance model with a covariate in the case of dstai, and a simpler model without a covariate in the case of own. The mathematical formulation is identical to ANOVA except that the parameters have prior distributions.

Let $\mathit{dstai}_{ij}$, $i = 1, \dots, 15$; $j = 1, 2, 3$, be the dstai value for the ith participant (i = 1,2,…,15) in the jth condition (1 = Cartoon, 2 = Human, 3 = 3PP). Similarly for $\mathit{own}_{ij}$. Let the corresponding means be $\mu_{\mathit{dstai},ij}$ for dstai, and $\mu_{\mathit{own},ij}$ for ownership. Then the model for dstai is as follows:

$$
\begin{array}{l}
\mu_{\mathit{dstai},ij} = \mu_{\mathit{dstai}} + \alpha_{\mathit{dstai},j} + \beta_{\mathit{dstai}} \cdot \mathit{staitrait}_{ij} + \gamma_{\mathit{dstai},j} \cdot \mathit{staitrait}_{ij} \\
\sum_{j=1}^{3} \alpha_{\mathit{dstai},j} = 0, \quad \sum_{j=1}^{3} \gamma_{\mathit{dstai},j} = 0 \\
\mathit{dstai}_{ij} \sim \mathrm{normal}(\mu_{\mathit{dstai},ij}, \sigma_{\mathit{dstai}})
\end{array} \qquad (2)
$$

The parameter $\mu_{\mathit{dstai}}$ is the general mean. $\alpha_{\mathit{dstai},j}$ is the effect of the jth condition (j = Cartoon, Human, 3PP), $\beta_{\mathit{dstai}}$ is the coefficient of the covariate staitrait irrespective of condition, and $\gamma_{\mathit{dstai},j}$ allows the slope of the relationship between dstai and the covariate to be different depending on condition. For ease of comparison between the conditions we adopt a centred parameterisation where the parameter values are constrained to sum to 0. $\sigma_{\mathit{dstai}}$ is the standard deviation.
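The structure of the mean in Eq. 2 and the sum-to-zero constraints can be sketched in a few lines. The helper names `sum_to_zero` and `mean_dstai` are introduced here for illustration. The interaction values 0.74 and −0.49 are the posterior means reported in Results for the Cartoon and Human conditions; the general mean, condition effects and common slope are placeholder values and not estimates from the study.

```python
def sum_to_zero(free_effects):
    """Append the last effect so that all effects sum to zero."""
    return list(free_effects) + [-sum(free_effects)]


def mean_dstai(mu, alpha, beta, gamma, condition, staitrait):
    """Mean of dstai for one condition index (0 = Cartoon, 1 = Human, 2 = 3PP)."""
    # The slope on the covariate is the common slope plus a condition-specific part.
    return mu + alpha[condition] + (beta + gamma[condition]) * staitrait


# Placeholder condition effects: two free values, the third fixed by the constraint.
alpha = sum_to_zero([1.0, -3.0])      # third effect becomes 2.0
# Interaction terms: two reported posterior means, the third implied by the constraint.
gamma = sum_to_zero([0.74, -0.49])    # third term becomes about -0.25

# Placeholder general mean (-5.0) and common slope (0.1), covariate value 20.
cartoon_mean = mean_dstai(-5.0, alpha, 0.1, gamma, condition=0, staitrait=20)
```

Only two of the three effects in each set are free; this is why, as described below, one condition ends up with a wider implied prior than the other two.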

The prior distributions of the parameters are chosen as weakly informative, e.g., (Lemoine, 2019), i.e., assuming very little prior information. Weakly informative priors are proper probability distributions, but with wide variance. Specifically $\sigma_{\mathit{dstai}}$ ∼ Gamma(shape = 2, rate = 0.1). This has a prior 95% credible interval of 2.4 to 55.7. All the other parameters have prior distribution normal(0, 20), where 20 is the standard deviation, which leads to 95% credible intervals of −40 to 40, except that due to the sum-to-zero constraints $\alpha_{\mathit{dstai},3}$ and $\gamma_{\mathit{dstai},3}$ will have normal(0, 28.3) distributions with prior credible intervals of −55 to 55, since each is the negative of the sum of two independent normal(0, 20) variables and $\sqrt{20^2 + 20^2} \approx 28.3$. However, the choice of condition 3 for this is arbitrary, and either of the other two conditions could have been chosen to have this wider prior distribution without affecting the results.
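The prior credible intervals quoted above can be checked numerically. The sketch below uses only the Python standard library: the Gamma(shape = 2, rate) distribution has a closed-form cumulative distribution function, which is inverted here by bisection, and the normal quantiles come from `statistics.NormalDist`. The helper names are introduced for illustration, and normal(0, 20) is taken to mean a standard deviation of 20, as in Stan.

```python
import math
from statistics import NormalDist


def gamma_shape2_cdf(x, rate):
    """CDF of Gamma(shape=2, rate): 1 - exp(-rate*x) * (1 + rate*x)."""
    t = rate * x
    return 1.0 - math.exp(-t) * (1.0 + t)


def gamma_shape2_quantile(p, rate, lo=0.0, hi=1000.0):
    """Invert the CDF by bisection; the CDF is increasing in x."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma_shape2_cdf(mid, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Prior 95% interval for the standard deviation parameter.
sigma_interval = (gamma_shape2_quantile(0.025, 0.1),
                  gamma_shape2_quantile(0.975, 0.1))

# Prior 95% upper limit for a normal(0, 20) parameter (interval is symmetric).
normal_upper = NormalDist(0, 20).inv_cdf(0.975)

# Implied prior SD of the third effect, minus the sum of two independent
# normal(0, 20) variables, and its 95% upper limit.
constrained_sd = math.sqrt(20 ** 2 + 20 ** 2)
constrained_upper = NormalDist(0, constrained_sd).inv_cdf(0.975)
```

These give approximately 2.4 to 55.7 for the standard deviation, ±39.2 (rounded to ±40 in the text) for the normal(0, 20) parameters, and a standard deviation of about 28.3 with limits of about ±55 for the constrained parameters.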

For own the model is similar but simpler since there is no covariate:

$$
\begin{array}{l}
\mu_{\mathit{own},ij} = \mu_{\mathit{own}} + \alpha_{\mathit{own},j} \\
\sum_{j=1}^{3} \alpha_{\mathit{own},j} = 0 \\
\mathit{own}_{ij} \sim \mathrm{normal}(\mu_{\mathit{own},ij}, \sigma_{\mathit{own}})
\end{array} \qquad (3)
$$

with the same prior distributions for the parameters.

The model was implemented using the Stan probabilistic programming language (Stan Development Team, 2011-2019; Carpenter et al., 2017) (https://mc-stan.org/) through the RStudio interface (https://www.rstudio.com/). The execution used 2000 iterations on four chains. All Rhat = 1 indicating that the four chains converged and successfully mixed. Use of the ‘leave-one-out’ method (Vehtari et al., 2017), equivalent to repeated fits to the data with one observation left out each time, similarly indicated no problem with convergence or outliers.

Table 4 shows the summaries of the posterior distributions of the parameters. Notice that the posterior 95% credible intervals are narrow compared to the prior intervals. For example, for $\alpha_{\mathit{own},3}$ the prior 95% credible interval was $\pm 55$ whereas the posterior is −1.38 to −0.85. The means of the distributions can be considered as effect sizes. For example, the mean of the posterior distribution of $\alpha_{\mathit{own},3}$ is −1.11. The interpretation is that the 3PP condition is associated on average with a decrease of 1.11 in the ownership response variable, other things being equal. Notice similarly that the prior 95% credible interval for the standard deviations of the model was 2.4 to 55.7, whereas the posteriors are 0.52 to 0.81 in the case of own, and 6.19 to 9.68 in the case of dstai.

TABLE 4. Summaries of the posterior distributions of the parameters showing the distribution means, standard deviations, 95% credible intervals. Prob >0 is the posterior probability that the parameter is positive.

From the first block of Table 4, the posterior probabilities of the parameters of the Cartoon $(α_{own,1})$ and Human $(α_{own,2})$ conditions being positive, and of the parameter of the 3PP condition $(α_{own,3})$ being negative, are 1. Hence the evidence is overwhelming that the Cartoon and Human conditions had the highest levels of body ownership, and the 3PP condition the lowest.

In the case of dstai the interaction terms are important. The mean for Cartoon × staitrait is 0.74 (CI, the credible interval: 0.20 to 1.27), whereas for Human it is −0.49 (CI: −0.94 to −0.01). Hence, for those in the Cartoon condition, the greater the staitrait the greater the dstai (prob = 0.994), so that the change in state anxiety is proportional to the trait. However, for those in the Human condition the relationship is reversed: the greater the trait, the lower the value of dstai. The probability of this coefficient being positive is 0.024, so it has prob = 1 − 0.024 = 0.976 of being negative. For those in the 3PP condition there is a moderate probability of a small negative association between state and trait (prob = 1 − 0.156 = 0.844). Hence, overall, and with high posterior probability, dstai is positively correlated with trait for those in the Cartoon condition and negatively correlated with trait for those in the Human condition. The correlation between dstai and trait is possibly negative for the 3PP condition. These results are in accord with Figures 3A–C.

The equivalent to Table 4 for the alternative response variables dstai1 and dstai2 where the staitrait in the third (human) exposure is compared to staitrait in the first or second exposure, is given in Supplementary Table S3.

The mean staitrait is 21.4 ± 7.0 (S.D.) and the median is 21. In addition to examining the relationship between the change in STAI state (dstai) and this covariate, we can consider what happens at its mean. Figure 4 shows the posterior distributions for the predicted dstai for each of the Cartoon, Human and 3PP conditions. It can be seen that the distributions reflect Figure 3D. From these distributions we can compute posterior probabilities such as those of dstai < −10 and dstai < −5. The two corresponding vertical lines are shown in Figure 4, and the required probabilities are the areas under the curves to the left of those lines.
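When a posterior distribution is represented by simulated draws, the area to the left of a threshold is estimated by the proportion of draws that fall below it. The sketch below illustrates this with hypothetical draws: the normal distribution, its location and scale, and the number of draws are made-up stand-ins and are not the study's posterior.

```python
import random


def prob_below(draws: list[float], threshold: float) -> float:
    """Monte Carlo estimate of P(value < threshold) from posterior draws."""
    return sum(d < threshold for d in draws) / len(draws)


# Hypothetical draws, NOT the study's posterior: a normal distribution with
# made-up location and scale stands in for draws of the predicted dstai.
rng = random.Random(0)
illustrative_draws = [rng.gauss(-3.0, 4.0) for _ in range(4000)]

p_minus5 = prob_below(illustrative_draws, -5.0)    # area to the left of -5
p_minus10 = prob_below(illustrative_draws, -10.0)  # area to the left of -10
print(p_minus5, p_minus10)
```

Applying the same function to the draws for each condition gives the probabilities that can then be compared across conditions.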

FIGURE 4. Posterior distributions of dstai at the mean level of staitrait for the Cartoon, Human and 3PP conditions.

The probabilities are shown in Table 5. The probability of dstai being below −5 is almost double for the Cartoon condition compared to the Human condition, and more than double for the 3PP condition. For dstai below −10, the probability for the Cartoon condition is 10 times that for the Human condition, and for the 3PP condition it is more than 30 times greater. Hence, although overall dstai in the 3PP condition does not change much with staitrait and in the Cartoon condition it is proportional to staitrait, the model predicts that for a participant with average trait anxiety the 3PP condition appears to be the one that reduces anxiety the most.

TABLE 5. Posterior probabilities of dstai being less than −5 or −10 at the mean level of staitrait.

Goodness of Fit of the Model

Using the Stan program, 4000 pseudo-random observations were generated from the model, leading to posterior predicted distributions of the two response variables for each individual. We take the mean of each of these distributions per individual as a point estimate for the predicted value, so that for each individual we obtain predicted values of the two response variables. The correlation between the observed and predicted values of own is r = 0.79, with 95% confidence interval 0.66 to 0.88. For dstai the correlation is r = 0.50, with 95% confidence interval 0.24 to 0.69. We quote confidence intervals here not for formal significance, but only to show the strength of the relationships. Hence, overall we conclude that the model fit to the data is acceptable.
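The confidence intervals for the correlations can be approximated with the Fisher z-transformation. The text does not state which method was used, so this is only a sketch under the assumption of that method and of n = 45 participants; the function name is illustrative.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# n = 45 is the total sample size; using Fisher z is an assumption of this sketch.
own_ci = fisher_ci(0.79, 45)
dstai_ci = fisher_ci(0.50, 45)
print(own_ci, dstai_ci)
```

With these inputs the intervals come out close to those quoted. Small differences in the second decimal place are to be expected, since the correlations are rounded to two decimals and the original method may differ.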

Discussion

There are two findings of this study. The first is that the level of body ownership did not differ between embodiment as a cartoon character and as a human, and that the level of body ownership was high and comparable with previous results. In contrast, the level of body ownership was lower for the 3PP condition. The second is that, contrary to our original idea, embodiment as the cartoon character, in the more humorous situation, did not result in a reduction of anxiety in relation to the background trait anxiety; instead, the change in state anxiety was proportional to the level of trait anxiety. However, in the case of human embodiment and audience the change in state anxiety was inversely related to trait anxiety. There was little or no effect of the 3PP condition, which means that irrespective of trait the change in state anxiety was essentially constant and small, with some evidence of a small decline. Further, a prediction of the model is that for the average level of trait anxiety the 3PP condition is associated with the greatest reduction in state anxiety.

In the remainder of this section we first discuss the findings in relation to body ownership, paying particular attention to embodiment of non-human characters. We then review studies of public speaking anxiety in VR, and move on to provide a possible explanation of our findings in relation to a well-known theoretical model of social anxiety. We conclude by pointing out some limitations of our study and future work.

Body Ownership

Although, given the state of technology, all studies of embodiment in VR inevitably use characters that are not photorealistic and could be described as ‘cartoony’, our study was different in the sense that the character was deliberately designed as a cartoon character, a bunny rabbit. Our question was whether this deliberately non-human character would lead to the levels of body ownership we have seen in previous studies with embodiment as humans (e.g., Banakou and Slater, 2014). Our expectation was that this would be the case since, as discussed in the introduction, the form of the virtual body does not seem to influence the level of body ownership, which is derived from multisensory integration rather than top-down identification with the appearance of the body. However, all our previous studies have been with human characters, even if distorted by having a long arm or a tail, or being of the colour purple, or being a different age or race.

There have been several studies with non-human characters. In (Ahn et al., 2016) participants were embodied in a virtual cow body using 1PP and visuomotor synchrony (the cow body moved with the movements of the participant on all fours), and there was visuotactile synchrony (the cow body was prodded, which was felt synchronously by the participant). The results showed that the level of body ownership was significantly higher than in a condition where participants watched a video of the same events. However, the mean reported level of ownership was 2.57 on a five-point scale, which is proportionally equivalent to 3.6 on a 7-point scale. In absolute terms this is much lower than the typical values we obtain (median at least 5, with the whole interquartile range above the mid-point of 4), as can be seen in Figures 2A–C, although the questionnaires used in the two cases overlapped but were not the same. In (Krekhov et al., 2019) participants were embodied in several different types of animal body (a bat, a spider and a tiger) as well as a human body. Their equivalent scores for body ownership (“acceptance”) were on a scale from 0 to 6. Embodiment as the human had the lowest mean score (2.79, equivalent to 3.3 on a 7-point scale), the score for the bat was considerably higher (4.33) and for the spider it was 3.63. Again, the questionnaires overlapped with ours but were not the same; nevertheless, the low score for human embodiment is unusual. This may be related to the fact that a measure of the degree of control over the virtual bodies was highest for the bat. This was a within-groups study so that participants were comparing the different experiences, and it is possible that factors such as novelty or excitement played a role in the different evaluations. In (Charbonneau et al., 2017) participants were embodied in a giant Godzilla-like creature. Body ownership was not directly measured, but the point was to use this embodiment to improve gait while using a rehabilitation walking device. Since there was some evidence of gait improvement it is likely that there was an element of body ownership involved. In (Aymerich-Franch et al., 2017; Aymerich-Franch et al., 2019) people were embodied in physical humanoid robots that they saw through a HMD mounted as eyes on the robot, and ownership scores were high and comparable to those typical of VR embodiment studies.
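The scale conversions quoted above are consistent with proportional scaling by the scale maximum. The conversion rule is not stated explicitly, so the sketch below assumes that rule; the function name is illustrative.

```python
def rescale_by_maximum(score: float, source_max: float, target_max: float = 7.0) -> float:
    """Scale a score in proportion to the ratio of the two scale maxima."""
    return score * target_max / source_max


# Assumption: proportional scaling, which reproduces both quoted figures.
cow_on_7 = rescale_by_maximum(2.57, 5.0)    # Ahn et al. (2016), five-point scale
human_on_7 = rescale_by_maximum(2.79, 6.0)  # Krekhov et al. (2019), scale from 0 to 6

print(round(cow_on_7, 1), round(human_on_7, 1))
```

A different rule would give different numbers. For example, shifting the 0 to 6 scale onto 1 to 7 by adding one would give 3.79 rather than 3.3 for the second score, so comparisons across questionnaires should be read as approximate.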

We suggest the following summary. It is possible to obtain some level of body ownership in completely non-human characters, and when multisensory integration provides evidence that the virtual body is the person’s body, scores will be greater in that synchronous condition than in control conditions. However, these findings are based on comparisons. What is equally important is not just that a synchronous multisensory condition results in higher scores than a control condition, but that the absolute scores are themselves greater than would be expected by chance. In other words, if we obtain random results on a questionnaire with a 7-point scale, then the median result will be around 4. A high score in absolute terms should be clearly greater than this, and for completely non-human characters there is little evidence of this at the moment. However, if the virtual body is humanoid, upright, with a face and limbs approximating those of humans, then the absolute body ownership scores will be high in themselves, not just in comparison with a non-synchronous condition. In (Osimo et al., 2015; Slater et al., 2019) participants were able to compare embodiment in a virtual body that closely resembled their own body with embodiment in a much older body. Even though one of the virtual bodies looked like themselves, the body ownership scores did not differ between these two conditions. In the present study we have a direct comparison between embodiment as a bunny rabbit and as a human body, in a between-groups design so that participants did not know of the other conditions. Still, we found that body ownership was high and the same across these conditions, but dropped greatly for the non-synchronous (3PP) condition. This lends weight to the hypothesis suggested above.
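The chance baseline mentioned above can be illustrated with a small simulation. It assumes that "random results" means every point on the 7-point scale is equally likely, which is one possible reading; the sample size and seed are arbitrary.

```python
import random
from statistics import median

# Assumption: chance responding is uniform over the seven scale points.
rng = random.Random(1)
random_scores = [rng.randint(1, 7) for _ in range(10001)]

chance_median = median(random_scores)
print(chance_median)
```

Under this assumption the median sits at the scale mid-point, so only absolute scores clearly above it can be read as evidence of ownership rather than of indifferent responding.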

Public Speaking Anxiety

Although in our experiment we did not find that the humorous situation (embodiment as a bunny rabbit with a cartoon audience) improved outcomes overall, our finding is in accord with a large number of previous studies. In our case, two embodiments as a human with a human audience led to a reduction of state anxiety in comparison with trait anxiety at the third session, supporting previous findings with respect to exposure therapy.

The first study of the efficacy of virtual reality for public speaking anxiety was reported in (North et al., 1998). It exposed participants to an audience of about 100 in a large auditorium, and although the characters forming the audience were static they could be heard to speak and could ask questions. There were five sessions in an exposure therapy, and the control group had equivalent VR exposure, but unrelated to public speaking. It was found that the VR exposure therapy was successful in reducing public speaking anxiety compared to the control group. This approach is standard for the use of VR to help people with anxiety disorders, where the VR is used as a substitute for a real life experience. Logistically it is far easier for the clinician to expose people to the anxiety provoking situation in the office, in real-time with the clinician there, than to arrange real situations such as getting an audience together for multiple sessions, or to give the client “homework” which is carried out in the absence of the clinician.

There has been significant additional research over the past 3 decades. In a meta-analysis of 30 randomised controlled trials that attempted to reduce fear of public speaking using a variety of methods (Ebrahimi et al., 2019), it was found that there were no differences in outcomes between face-to-face counseling and virtual reality. In the general area of social anxiety disorders, a further study found that VR-based therapy was effective in reducing anxiety, and in comparison with in vivo exposure or exposure based on imagination there was again no difference in effect size (Chesham et al., 2018). Overall, a comprehensive meta-analysis of VR-based psychological therapy found that it is effective, although studies are often small in size and not always RCTs (Freeman et al., 2017).

By the time of the third talk, participants in the Human condition would have already given two previous talks, to the same virtual human audience and under the same conditions. Therefore, in accord with exposure therapy, it is not surprising that their level of stress declined relative to their trait level of stress. However, those in the Cartoon condition had previously given two talks to the cartoon audience, so that the third “test” scenario was the first time that they had experienced the human audience. Since the humour idea was ineffective, the simpler explanation for the results is based on the number of exposures.

The Cognitive Model of Social Phobia

Why did the cartoon idea not work, in the sense that the change in state anxiety simply reflected trait anxiety? Our original idea was that the humour of the situation would allow participants to speak without anxiety to an audience, and thereby learn that this is possible, with this learning carrying over to later talks in front of a human audience. In the cognitive model of social phobia by Clark and Wells (Clark et al., 1995), one of the factors is self-focussed attention and the accentuation of negative thoughts about the self, especially with respect to the notion of supposed negative evaluation from others. In that case, if a person with social phobia had to talk in front of an audience but as someone else, we should expect that their anxiety would be reduced, which is what we expected for the Cartoon condition. In the study reported in (Aymerich-Franch et al., 2014) participants gave a speech in front of a human virtual audience, embodied in a human virtual body with a face that was either their own likeness or the face of another. In a pre-exposure test participants indicated a preference for the face that was unlike their own. However, the exposure results showed that there was at best a marginal reduction of anxiety for those with the dissimilar face.

However, we did not take into account the possibility that even in the cartoon situation participants might still interpret the audience as responding negatively. In the Clark and Wells model social phobia sufferers, to the extent that they process external cues rather than being internally focussed, would be likely to interpret such cues as negative: “In particular, they may be more likely to notice and remember responses from others that they interpret as signs of disapproval”, and this would be particularly pointed in public speaking (Clark, 2001). In the Cartoon condition the cartoon audience, since it was so strange, would be particularly salient. However, for people with strong social phobia there would be no reason why they would not interpret the responses of the audience as negative, with even seemingly positive events such as clapping being interpreted as negative (e.g., “They are only clapping because they feel sorry for me”).

Our results suggest that at the average level of trait anxiety the 3PP condition proved to be the one that had the greatest probability of reducing anxiety. This fits the Clark and Wells model since the 3PP condition was the one where they saw themselves from the outside, and thus had the maximum psychological distance from themselves as speaker. This accords well with self-distancing theory (Kross and Ayduk, 2017) where people recall an event that caused anxiety from a third person perspective as a “fly on the wall” rather than from an embodied first person perspective. Participants are instructed when recalling an affectively negative past event: “Now take a few steps back. Move away from the situation to a point where you can now watch the event unfold from a distance and see yourself in the event.” Research on self-distancing theory shows that this leads to a reduction of negative affect. Participants answered the questionnaire after the event itself, so it is possible that their disembodied third-person experience resulted in less stress. However, this finding about the average level of trait anxiety is an inference from the posterior statistical model and would need to be verified with a further experimental study.

Limitations

The first limitation of this study is that the sample consisted only of women, and it remains to be seen if these results would generalise to other genders. Second, the sample sizes were relatively small; however, the posterior distributions were clearly dominated by the data, as evidenced by the narrow and focussed posterior credible intervals compared to the prior intervals. Third, it would be possible to extend the experimental design to two factors: type of embodiment (Cartoon, Human, 3PP) and type of audience (Cartoon, Human). This would be interesting further work, since such a design could separate how much of the results was due to the embodiment and how much to the audience.

Although we did not find any advantage for the Cartoon condition in this application to fear of public speaking, it is possible that it may be beneficial in other psychological conditions. The role of humour in promoting mental and physical health is well known (e.g., Gelkopf and Kreitler, 1996) and has in particular been studied in relation to overcoming depression (Tagalidou et al., 2019). This could be a useful line of further research.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Ethics Statement

The studies involving human participants were reviewed and approved by the Comisión de Bioética de la Universitat de Barcelona (IRB00003099). The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author Contributions

AB designed and implemented the virtual reality scenario, carried out the experiment and compiled the data. XN contributed to the implementation of the virtual reality scenario. DB and RO contributed to the design and implementation of the experiment. VO contributed to the design and implementation of the characters. MS formulated the original concept, designed the experiment, carried out the analysis, wrote the first draft of the paper and obtained the funding. All authors contributed to a review of the draft paper.

Funding

This research was originally funded under the European Seventh Framework Program, Future and Emerging Technologies (FET), Project Virtual Embodiment and Robotic Re-Embodiment (VERE) Grant Agreement Number 257695, and completed under the ERC Advanced Grant MoTIVE 742989.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank Sofia Seinfeld for helping with the experiments, and Xenxo Álvarez for helping with the cartoon avatars.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frvir.2021.695673/full#supplementary-material

References

Abuín, M. R., and Rivera, L. d. (2014). La medición de síntomas psicológicos y psicosomáticos: el Listado de Síntomas Breve (LSB-50). Clínica y Salud 25 (2), 131–141. doi:10.1016/j.clysa.2014.06.001