A Public Database of Immersive VR Videos with Corresponding Ratings of Arousal, Valence, and Correlations between Head Movements and Self Report Measures

Virtual reality (VR) has been proposed as a methodological tool to study the basic science of psychology and other fields. One key advantage of VR is that sharing of virtual content can lead to more robust replication and representative sampling. A database of standardized content will help fulfill this vision. There are two objectives to this study. First, we seek to establish and allow public access to a database of immersive VR video clips that can act as a potential resource for studies on emotion induction using virtual reality. Second, given the large sample size of participants needed to get reliable valence and arousal ratings for our videos, we were able to explore the possible links between the head movements of the observer and the emotions he or she feels while viewing immersive VR. To accomplish our goals, we sourced and tested 73 immersive VR clips which participants rated on valence and arousal dimensions using self-assessment manikins. We also tracked participants' rotational head movements as they watched the clips, allowing us to correlate head movements and affect. Based on past research, we predicted relationships between the standard deviation of head yaw and valence and arousal ratings. Results showed that the stimuli varied reasonably well along the dimensions of valence and arousal, with a slight underrepresentation of clips that are of negative valence and highly arousing. The standard deviation of yaw positively correlated with valence, while a significant positive relationship was found between head pitch and arousal. The immersive VR clips tested are available online as supplemental material.


INTRODUCTION
proposed the use of virtual reality (VR) as a methodological tool to study the basic science of psychology and other fields. Since then, there has been a steady increase in the number of studies that seek to use VR as a tool (Schultheis and Rizzo, 2001; Fox et al., 2009). Some studies use VR to examine how humans respond to virtual social interactions (Dyck et al., 2008; Schroeder, 2012; Qu et al., 2014) or as a tool for exposure therapy (Difede and Hoffman, 2002; Klinger et al., 2005), while others employ VR to study phenomena that might otherwise be impossible to recreate or manipulate in real life (Slater et al., 2006; Peck et al., 2013). In recent years, the cost of a typical hardware setup has decreased dramatically, allowing researchers to spend less than the typical price of a laptop to implement compelling VR. One of the key advantages of VR for the study of social science is that sharing of virtual content will allow "not only for cross-sectional replication but also for more representative sampling" (Blascovich et al., 2002). What is needed to fulfill this vision is a database of standardized content.
The immersive video (or immersive VR clip) is one powerful and realistic aspect of VR. It shows a photorealistic video of a scene that updates based on head-orientation but is not otherwise interactive (Slater and Sanchez-Vives, 2016). When a viewer watches an immersive VR clip, he sees a 360° view from where the video was originally recorded, and while changes in head orientation are rendered accurately, typically these videos do not allow for head translation. A video is recorded using multiple cameras and stitched together through software to form a total surround scene. In this sense, creating content for immersive video is fairly straightforward, and consequently there is a wealth of content publicly available on social media sites (Multisilta, 2014).
To accomplish the goal of a VR content database, we sourced and created a library of immersive VR clips that can act as a resource for scholars, paralleling the design used in prior studies on affective picture viewing (e.g., International Affective Picture System, IAPS; Lang et al., 2008). The IAPS is a large set of photographs developed to provide emotional stimuli for psychological and behavioral studies on emotion and mood induction. Participants are shown photographs and asked to rate each on the dimensions of valence and arousal. While the IAPS and its acoustic counterpart, the International Affective Digital Sounds (IADS; Bradley and Lang, 1999), are well-established and used extensively in emotion research, a database of immersive VR content that can potentially induce emotions does not exist to our knowledge. As such, we sought to explore whether we could establish a database of immersive VR clips for emotion induction based on the affective responses of participants.
Most VR systems allow a user to have a full 360° head rotation view, such that the content updates based on the particular orientation of the head. In this sense, the so-called field of regard is higher in VR than in traditional media such as the television, where the content does not change when a viewer turns her head away from the screen. This often allows VR to trigger strong emotions in individuals (Riva et al., 2007; Parsons and Rizzo, 2008). However, few studies have examined the relationship between head movements in VR and emotions. Darwin (1965) discussed the idea of head postures representing emotional states. When one is happy, he holds his head up high. Conversely, when he is sad, his head tends to hang low. Indeed, more recent research has provided empirical evidence for these relationships (Schouwstra and Hoogstraten, 1995; Wallbott, 1998; Tracy and Matsumoto, 2008).
An early study that investigated the influence of body movements on presence in virtual environments found a significant positive association between head yaw and reported presence (Slater et al., 1998). In a study on head movements in VR, participants saw themselves in a virtual classroom and participated in a learning experience (Won et al., 2016). Results showed a relationship between lateral head rotations and anxiety, where the standard deviation of head yaw significantly correlated with the awareness and concern individuals had regarding other virtual people in the room. Livingstone and Palmer (2016) tasked vocalists to speak and sing passages of varying emotions (e.g., happy, neutral, sad) and tracked their head movements using motion capture technology. Findings revealed a significant relationship between head pitch and emotions. Participants raised their heads when vocalizing passages that conveyed happiness and excitement and lowered their heads for those of a sad nature. Understanding the link between head movements in VR and emotions may be key in the development and implementation of VR in the study and treatment of psychological disorders (Wiederhold and Wiederhold, 2005; Parsons et al., 2007).
There are two objectives of the study. First, we seek to establish and allow public access to a database of immersive VR clips that can act as a potential resource for studies on emotion induction using virtual reality. Second, given that we need a large sample of participants to get reliable valence and arousal ratings for our videos, we are in a unique position to explore the possible links between head movements and the emotions one feels while viewing immersive VR. To accomplish our goals, we sourced and tested 73 immersive VR clips which participants rated on valence and arousal dimensions using self-assessment manikins. These clips are available online as supplemental material. We also tracked participants' rotational head movements as they watched the clips, allowing us to correlate the observers' head movements and affect. Based on past research (Won et al., 2016), we predicted significant relationships between the standard deviation of head yaw and both valence and arousal ratings.

Participants
Participants were undergraduates from a medium-sized West Coast university who received course credit for their participation. In total, 95 participants (56 female) between the ages of 18 and 24 took part in the study.

Stimulus and Measures
The authors spent 6 months searching for immersive VR clips that they thought would effectively induce emotions. Sources included personal contacts and internet searches on websites such as YouTube, Vrideo, and Facebook. In total, more than 200 immersive VR clips were viewed and assessed. From this collection, 113 were shortlisted and subjected to further analysis. The experimenters evaluated the video clips and a subsequent round of selection was conducted based on the criteria employed by Gross and Levenson (1995). First, the clips had to be of relatively short length. This is especially important as longer clips may induce fatigue and nausea among participants. Second, the VR clips had to be understandable on their own without the need for further explanation. As such, clips which were sequels or part of an episodic series were excluded. Third, the VR clips should be likely to induce valence and arousal. The aim was to get a good spread of videos that vary across these dimensions. A final 73 immersive VR clips were selected for the study. They ranged from 29 to 668 s in length with an average of 188 s per clip.
FIGURE 2 | Depiction of the three angles of rotational movement: pitch, yaw, and roll.
Participants viewed the immersive VR clips through an Oculus Rift CV1 (Oculus VR, Menlo Park, CA) head-mounted display (HMD). The Oculus Rift has a resolution of 2,160 × 1,200 pixels, a 110° field of view and a refresh rate of 90 Hz. The low-latency tracking technology determines the relative position of the viewer's head and adjusts his view of the immersive video accordingly. Participants interacted with on-screen prompts and rated the videos using an Oculus Rift remote. Vizard 5 software (Worldviz, San Francisco, CA) was used to program the rating system. The software ran on a 3.6 GHz Intel i7 computer with an Nvidia GTX 1080 graphics card. The experimental setup is shown in Figure 1.
The Oculus Rift HMD features a magnetometer, gyroscope, and accelerometer which combine to allow for tracking of rotational head movement. The data were digitally captured and comprised the pitch, yaw, and roll of the head. These are standard terms for rotations around the respective axes, and are measured in degrees. Pitch refers to the movement of the head around the X-axis, similar to a nodding movement. Yaw represents the movement of the head around the Y-axis, similar to turning the head side-to-side to indicate "no." Roll refers to moving the head around the Z-axis, similar to tilting the head from one shoulder to the other. These movements are presented in Figure 2. As discussed earlier, Won et al. (2016) found a relationship between lateral head rotations and anxiety. They showed that scanning behavior, defined as the standard deviation of head yaw, significantly correlated with the awareness and concern people had of virtual others. In this study, we similarly assessed how much participants moved their heads by calculating the standard deviations of the pitch, yaw, and roll of their head movements while they watched each clip and included them as our variables.
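As an illustration of the head movement variables described above, the following is a minimal sketch of how per-clip standard deviations of pitch, yaw, and roll could be computed from a series of tracked samples. It is not the authors' analysis code. The function names, the sample layout (a list of `(pitch, yaw, roll)` tuples in degrees), the use of the sample rather than population standard deviation, and the unwrapping of yaw across the ±180° boundary are all assumptions made for this example.

```python
from statistics import stdev


def unwrap_degrees(angles):
    """Remove 360-degree jumps so a turn past +/-180 stays continuous.

    Assumption: the tracker may report yaw wrapped to (-180, 180];
    the source does not say whether this was the case.
    """
    if not angles:
        return []
    out = [angles[0]]
    for a in angles[1:]:
        # Smallest signed difference between consecutive samples.
        delta = (a - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + delta)
    return out


def head_movement_sd(samples):
    """Return the standard deviation of pitch, yaw, and roll for one viewing.

    samples: list of (pitch, yaw, roll) tuples in degrees (hypothetical layout).
    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; at least two samples are required.
    """
    pitch, yaw, roll = zip(*samples)
    return {
        "pitch": stdev(pitch),
        "yaw": stdev(unwrap_degrees(list(yaw))),
        "roll": stdev(roll),
    }


if __name__ == "__main__":
    # Three made-up samples from one participant watching one clip.
    demo = [(0.0, 0.0, 0.0), (2.0, 10.0, 0.0), (4.0, 20.0, 0.0)]
    print(head_movement_sd(demo))
```

Under this sketch, a clip-level value would be obtained by averaging the per-participant standard deviations across everyone who viewed that clip.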
Participants made their ratings using the self-assessment manikin (SAM; Lang, 1980). SAM shows a series of graphical figures that range along the dimensions of valence and arousal. The expressions of these figures vary across a continuous scale.
The SAM scale for valence shows a sad and unhappy figure on one end, and a smiling and happy figure at the other. For arousal, the SAM scale depicts a calm and relaxed figure on one end, and an excited and interested figure on the other. A 9-point rating scale is presented at the bottom of each SAM. Participants selected one of the options while wearing the HMD, using the Oculus Rift remote to scroll among the options. Studies have shown that SAM ratings of valence and arousal are similar to those obtained from the verbal semantic differential scale (Lang, 1980; Ito et al., 1998). The SAM figures are presented in Figure 3.

Procedure
Pretests were conducted to determine how long participants could comfortably watch immersive videos before experiencing fatigue or simulator sickness. Results revealed that some participants encountered fatigue and/or nausea if they watched for more than 15 min without a break. Most participants were at ease with a duration of around 12 min. The 73 immersive VR clips were then divided into clusters with an approximate duration of 12 min per cluster. This resulted in a total of 19 groups of videos. Based on the judgment of the experimenters, no more than two clips of a particular valence (negative/positive) or arousal (low/high) were shown consecutively (Gross and Levenson, 1995). This was to discourage participants from becoming too involved in any particular affect, which could influence their judgment in subsequent ratings. Each video clip was viewed by a minimum of 15 participants.
When participants first arrived, they were briefed by the experimenter that the purpose of the study was to examine how people respond to immersive videos. Participants were told that they would be wearing an HMD to view the immersive videos, and that they could request to stop participating at any time if they felt discomfort, nausea, or some form of simulator sickness. Participants were then presented with a printout of the SAM measures for valence and arousal, and told that they would be rating the immersive videos based on these dimensions. Participants were then introduced to the Oculus Rift remote and shown how to operate it in order to rate the immersive VR clips.
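The division of the clips into groups of roughly 12 min can be illustrated with a short sketch. In the study the grouping relied on the experimenters' judgment, including the constraint on consecutive clips of similar valence or arousal, so the code below is not the authors' procedure. It is a hypothetical greedy balancing heuristic (longest clip first, always into the currently shortest group) that only considers clip durations; the function name `plan_groups` and the 720 s default are assumptions made for this example.

```python
def plan_groups(durations_s, target_s=720):
    """Split clips into groups whose total durations are close to target_s.

    durations_s: list of clip lengths in seconds.
    Returns (groups, totals), where groups holds clip indices per group.
    Hypothetical heuristic: it ignores the valence/arousal ordering
    constraint that the experimenters applied by judgment.
    """
    total = sum(durations_s)
    n_groups = max(1, round(total / target_s))
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    # Place the longest clips first, each into the shortest group so far.
    order = sorted(range(len(durations_s)),
                   key=lambda i: durations_s[i], reverse=True)
    for idx in order:
        g = totals.index(min(totals))
        groups[g].append(idx)
        totals[g] += durations_s[idx]
    return groups, totals


if __name__ == "__main__":
    # Made-up durations for six clips; not taken from the database.
    groups, totals = plan_groups([300, 200, 400, 100, 250, 190])
    print(groups, totals)
    # Group count implied by the reported figures: 73 clips averaging 188 s.
    print(round(73 * 188 / 720))
```

With the reported 73 clips averaging 188 s each, the total viewing time is about 13,724 s, which at 720 s per group corresponds to the 19 groups stated above.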
The specific procedure is presented here: Participants sat on a swivel chair which allowed them to turn around 360° if they wished to. They first watched a test immersive VR clip and did a mock rating to get accustomed to the viewing and rating process. They then watched and rated a total of three groups of video clips, with each group comprising between two and four video clips.
A 5 s preparation screen was presented before each clip. After the clip was shown, participants were presented with the SAM scale for valence. After participants selected the corresponding rating using the Oculus Rift remote, the SAM scale for arousal was presented and participants made their ratings. Following this, the aforementioned 5 s preparation screen was presented to get participants ready to view the next clip. After watching one group of immersive VR clips, participants were given a short break of about 5 min before continuing with the next group of clips. This was done to minimize the chances of participants feeling fatigued or nauseous by allowing them to rest between groups of videos. With each group of videos having a duration of about 12 min, the entire rating process lasted around 40 min.
Figure 4 shows the plots of the immersive video clips (labeled by their ID numbers) based on mean ratings of valence and arousal. There is a varied distribution of video clips above the midpoint (5) of valence that vary across arousal ratings. However, despite our efforts to locate and shortlist immersive VR clips for the study, there appears to be an underrepresentation of clips that both induce negative valence and are highly arousing. Table 1 shows a list of all the clips in the database, together with a short description, length and their corresponding valence and arousal ratings.

Affective Ratings
The immersive VR clips varied on arousal ratings (M = 4.20, SD = 1.39), ranging from a low of 1.57 to a high of 7.4. This compares favorably with arousal ratings on the IAPS, which range from 1.72 to 7.35 (Lang et al., 2008). The video clips also varied on valence ratings (M = 5.59, SD = 1.40), with a low of 2.2 and a high of 7.7. This compares reasonably well with valence ratings on the IAPS, which range from 1.31 to 8.34.

Head Movement Data
Pearson's product-moment correlations between observers' head movement data and their affective ratings are presented in Table 2. Most scores appear to be normally distributed as assessed by a visual inspection of Normal Q-Q plots (see Figure 5). Analyses showed that the average standard deviation of head yaw significantly predicted valence [F(1, 71) = 5.06, p = 0.03, r = 0.26, adjusted R² = 0.05], although the direction was in contrast to our hypothesis. There was no significant relationship between the standard deviation of head yaw and arousal [F(1, 71) = 2.02, p = 0.16, r = 0.17, adjusted R² = 0.01]. However, there was a significant relationship between average head pitch movement and arousal [F (1, 71)
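The statistics reported for head yaw can be cross-checked against one another, because in a regression with a single predictor the F statistic, the correlation r, and the adjusted R² are algebraically linked. The sketch below is an illustrative check and not the authors' analysis. It assumes that the residual degrees of freedom of 71 reflect the 73 clips as the unit of analysis (n - 2 = 71), and the function names are hypothetical.

```python
import math


def r_from_f(f_value, df_resid):
    """Magnitude of r implied by F in a one-predictor regression.

    Uses F = r^2 * df_resid / (1 - r^2), rearranged for r.
    """
    return math.sqrt(f_value / (f_value + df_resid))


def adjusted_r2(r, n, k=1):
    """Adjusted R^2 for k predictors and n observations."""
    return 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    n_clips = 73  # assumption: one observation per clip, so df_resid = 71
    for label, f_value in [("yaw vs. valence", 5.06), ("yaw vs. arousal", 2.02)]:
        r = r_from_f(f_value, n_clips - 2)
        print(label, round(r, 2), round(adjusted_r2(r, n_clips), 2))
```

Under these assumptions, F = 5.06 implies r of about 0.26 and an adjusted R² of about 0.05, and F = 2.02 implies r of about 0.17 and an adjusted R² of about 0.01, which agrees with the values reported above.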

DISCUSSION
The first objective of the study was to establish and introduce a database of immersive video clips that can serve as a resource for emotion induction research through VR. We sourced and tested a total of 73 video clips. Results showed that the stimuli varied reasonably well along the dimensions of valence and arousal. However, there appears to be a lack of representation for videos that are of negative valence yet highly arousing. In the IAPS and IADS, stimuli that belong to this quadrant tend to represent themes that are gory or violent, such as a victim of an attack whose face has been mutilated, or a woman being held hostage with a knife to her throat. The majority of our videos are in the public domain and readily viewable on popular websites such as YouTube, which have a strict policy on the types of content that can be uploaded. Hence, it is not surprising that stimuli of negative valence and high arousal were not captured in our selection of immersive videos. Regardless, the collection of video clips (which can be found here) should serve as a good launching pad for researchers interested in examining the links between VR and emotion.
Although not a key factor of interest for this paper, we observed variance in the length of the video clips which was confounded with video content. Long video clips in our database tend to be of serious journalism content (e.g., nuclear fallout, homeless veterans, dictatorship regime) and naturally evoke negative valence. Length is a factor that distinguishes videos from photographs, which are the standard emotional stimuli. Hence, while we experienced difficulty sourcing long video clips that are of positive valence, future studies should examine the influence of video clip length on affective ratings.
The second objective sought to explore the relationship between observers' head movements and their emotions. We demonstrated a significant relationship between the amount of head yaw and valence ratings, which suggests that individuals who displayed more side-to-side head movement gave higher ratings of pleasure. However, the positive relationship shown here is in contrast to that presented by Won et al. (2016), who showed a significant relationship between the amount of head yaw and reported anxiety. It appears that content and context are important differentiating factors when it comes to the effects of head movements. Participants in the former study explored their virtual environment and may have felt anxious in the presence of other virtual people. In our study, participants simply viewed the content presented to them without the need for navigation. Although no significant relationship was present between the standard deviation of head yaw and arousal ratings, we found a correlation between head pitch and arousal, suggesting that people who tend to tilt their head upwards while watching immersive videos reported being more excited. This parallels research conducted by Lhommet and Marsella (2015), who compiled data from various studies on head positions and emotion states and showed that tilting the head up corresponds to feelings of excitement such as surprise and fear. The links between head movement and emotion are important findings and deserve further investigation.
One thing of note is the small effect sizes shown in our study (adjusted R² = 0.05). While we tried our best to balance efficient data collection and managing participant fatigue, some participants may not be used to watching VR clips at length and may have felt uncomfortable or distressed without overtly expressing it. This may have influenced their ratings for VR clips toward the end of their study session, and may explain the small effect size. Future studies can explore when participant fatigue is likely to take place and adjust the viewing duration accordingly to minimize the impact on participant ratings.
Self-perception theory posits that people determine their attitudes based on their behavior (Bem, 1972). Future research can explore whether tasking participants to direct their head in certain directions or movements can lead to changes in their affect or attitudes. For example, imagine placing a participant in a virtual garden filled with colorful flowers and lush greenery. Since our study shows a positive link between amount of head yaw and valence ratings, will participants tasked to keep their gaze on a butterfly fluttering around them (therefore increasing the amount of head movement) report stronger valence than those who see a stationary butterfly resting on a flower? Results from this and similar studies can possibly aid in the development of virtual environments that assist patients undergoing technology-assisted therapy.
Our study examined the rotational head movements enacted by participants as they watched the video clips. Participants in our study sat on a swivel chair, which allowed them to swing around to have a full surround view of the immersive video. Future studies can incorporate translational head movements, which refer to movements that operate horizontally, laterally and vertically (x-, y-, and z-axes). This can be achieved by allowing participants to sit, stand or walk freely, or even by programming depth field elements into the immersive videos, and then examining how participants' rotational and translational head movements correlate with their affect. Exploring the effects of the added degrees of freedom will contribute to a deeper understanding of the connection between head movements and emotions.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the Human Research Protection