How Soundtracks Shape What We See: Analyzing the Influence of Music on Visual Scenes Through Self-Assessment, Eye Tracking, and Pupillometry

Ansani, Alessandro; Marini, Marco; D’Errico, Francesca; Poggi, Isabella

doi:10.3389/fpsyg.2020.02242

ORIGINAL RESEARCH article

Front. Psychol., 07 October 2020

Sec. Auditory Cognitive Neuroscience

Volume 11 - 2020 | https://doi.org/10.3389/fpsyg.2020.02242

This article is part of the Research Topic: The Effects of Music on Cognition and Action

Alessandro Ansani¹,²*

Marco Marini²,³

Francesca D’Errico⁴

Isabella Poggi¹

¹Cosmic Lab, Department of Philosophy, Communication, and Performing Arts, Roma Tre University, Rome, Italy
²Department of Psychology, Sapienza University of Rome, Rome, Italy
³Institute of Cognitive Sciences and Technologies, Rome, Italy
⁴Department of Education Science, Psychology, Communication Science, University of Bari Aldo Moro, Bari, Italy

This article presents two studies that examine how soundtracks shape our interpretation of audiovisuals. Embracing a multivariate perspective, Study 1 (N = 118) demonstrated, through an online between-subjects experiment, that two different music scores (melancholic vs. anxious) deeply affected the interpretations of an unknown movie scene in terms of empathy felt toward the main character, impressions of his personality, plot anticipations, and perception of the environment of the scene. With the melancholic music, participants felt empathy toward the character, viewing him as more agreeable and introverted and as more oriented to memories than to decisions, while perceiving the environment as cozier. An almost opposite pattern emerged with the anxious music. In Study 2 (N = 92), we replicated the experiment in our lab with the addition of eye-tracking and pupillometric measurements. The results of Study 1 were largely replicated; moreover, we found that the anxious score, by increasing the participants’ vigilance and state of alert (wider pupil dilation), favored greater attention to minor details, as in the case of another character who was very hard to notice (more time spent on his figure). These results highlight the pervasive influence of music on the interpretation of visual scenes.

Introduction

The influence of music on human behavior has long been studied. Although a vast number of studies have analyzed its influence on several kinds of performance, including physical tasks (Edworthy and Waring, 2006), work performance (Lesiuk, 2005), text and verbal memory (Taylor and Dewhurst, 2017), and learning (Lehmann and Seufert, 2017), the vast majority of studies from the 1980s onward have focused on marketing, shopping, and advertising (Bruner, 1990). This tradition continues today, although the experimental paradigms have been adapted to contemporary scenarios such as online shopping, website atmospherics, and driving game performance (Brodsky, 2001).

Over the last two decades, another tradition has flourished around the psychology of music in gambling environments: numerous researchers have examined the influence of music on gambling, virtual roulette, the ultimatum game, casino environments, and lotteries (Dixon et al., 2007).

In the domain of affect, much has been written about the processes through which music serves personal enhancement (Brown and Theorell, 2006), being able to express and induce moods and emotions (Västfjäll, 2001). An increasing number of studies measure plausible behavioral changes that depend on listening to music pieces evoking or inducing different emotions. Several social- and moral-related domains have been explored: facial emotion recognition (Woloszyn and Ewert, 2012); awareness, acceptance, and recall of unethical messages (Ziv et al., 2012); moral judgment and prosocial behavioral intentions (Ansani et al., 2019; Steffens, 2020); and compliance with requests to harm a third person (Ziv, 2015).

Induction vs. Background Music

Among studies on the influence of music, we can distinguish two categories, depending on whether the musical stimulus is presented before the task or during it. We call the former method induction and the latter background music. This premise is paramount since the mental processes underlying these two experimental situations might be substantially different: according to a previous study (Pisciottano, 2019), in the case of induction music, participants feel the emotion themselves, while in the background case, they attribute the emotion to the character in the scene.

Our study focuses on soundtracks, that is, musical stimuli administered as background. Background music implies parallel, multimodal processing, which can vary among music/unrelated, music/related, and music/visual tasks. In general, depending on the task, music may have positive, integrative, or detrimental effects.

Detrimental Effects: Music as a Source of Distraction

Hearing music while performing an experimental task may be distracting, being a secondary source of information. Indeed, according to Kahneman’s (1973) Cognitive-Capacity Model, “there is a general limit on man’s capacity to perform mental work. […] this limited capacity can be allocated with considerable freedom among concurrent activities. […] the ability to perform several mental activities concurrently depends, at least in part, on the effort which each of these activities demands when performed in isolation. The driver who interrupts a conversation to make a turn is an example” (Kahneman, 1973, p. 9).

An effortful workload does not necessarily exclude a mood effect: several researchers who use music during the task (e.g., Au et al., 2003) provide accounts in terms of mood, but parallel processing often implies the emergence of other phenomena. In their meta-analysis, Kämpfe et al. (2011) conclude that background music has detrimental effects on several memory-related tasks and produces decreases in reading performance, being a source of distraction during cognitive tasks per se (Furnham and Strbac, 2002; Kallinen, 2002; Salvucci et al., 2007) or depending on its tempo (Mentzoni et al., 2014; Nguyen and Grahn, 2017; Israel et al., 2019) or its volume (Noseworthy and Finlay, 2009).

Integrative Effects

When it comes to music in audiovisuals, things become more complicated. In this case too, music can be a distractor, leading, for instance, to reduced recall of the ads’ messages (Fraser and Bradford, 2013). Nevertheless, the effects of music are overall integrative: on the one hand, the human mind expects music to exhibit some sort of synchrony (Rogers and Gibson, 2012) and, most of all, congruity with what is stated and depicted (i.e., visual information) by the main message, whether in movies or advertising (Bolivar et al., 1994; Boltz, 2004; North, 2004; Oakes, 2007; Herget et al., 2018), as stated by Cohen’s (2013) Congruence-Association Model. On the other hand, through the mood communicated, music can convey semantic and content-related information by activating specific schemas: cognitive structures developed through experience that represent “knowledge about concepts or types of stimuli, their attributes, and the relations among those attributes” (Shevy, 2007, p. 59). Such schemas, in turn, influence the building of a mood-coherent audiovisual narrative.

Soundtrack and Interpretation

As already implied by Hoeckner et al. (2011), background music provides an interpretive framework for audiovisuals (for a more detailed analysis of several cognitive frameworks of soundtracks, see Branigan, 2010); moreover, it can be seen as a second source of emotion besides the film itself (Cohen, 2001): it shapes the audience’s understanding not only of a character’s actions, emotions, and intentions (Marshall and Cohen, 1988; Tan et al., 2007), by framing “visual meanings” (Nelson and Boynton, 1997), but also of moral judgments about characters (Steffens, 2020), general evaluations (Shevy, 2007), and plot anticipations (Bullerjahn and Güldenring, 1994; Vitouch, 2001; Shevy, 2007), and it generates expectations (Killmeier and Christiansen, 2004). This is well known to any director and soundtrack composer: “the music in a film may be original or not, but what matters most, from a textual and communicative point of view, is the relationship established between the music, and the script, and the photography, and how they all add up and combine with each other, so that viewers can interpret them in a certain way” (Zabalbeascoa, 2008, p. 24). Indeed, several scholars claim the existence of a proper semiotics of music for film and TV (Tagg, 2013).

Tagg (2006) had their subjects listen to 10 title themes for film or television, asking them “to write down what they thought might be happening on the screen along with each tune. The results were collected and reduced to single concepts;” the authors called these visual–verbal associations (VVAs). Surprisingly enough, they found some of the themes to be strongly connected with male figures and others with female figures; moreover, masculine themes were associated with concepts like Western, fast, detective, robbery, concrete, business, traffic, shooting, and planning, whereas feminine themes were associated with love, sad, parting, destiny, tragic, death, sentimental, sitting, France, and harmonious. Along the same line, Huron (1989) claims that music is a “very effective non-verbal identifier” and thus useful for targeting certain demographic and social groups as well as determining a character’s ethos. Despite such encouraging preliminary results, only a few studies have experimentally manipulated music to examine the different interpretations of audiovisuals that it may foster.

Iwamiya (1997) studied the effect of listening to different music on the impression obtained from landscapes viewed from a car, showing that they were more pleasant when music was played as opposed to silence, and the ratings of pleasantness were highest when relaxing music was on.

Boltz (2001) analyzed the interpretations of three ambiguous clips in positive music, negative music, and no music conditions. A negative rating was connected with extreme violence and death, while the highest rating was given when the interpretation was about very happy outcomes. Consistent with her hypothesis, compared to the no music condition, the interpretations of all three clips were positive in the presence of positive music and negative in the presence of negative music. Furthermore, assumptions about the main character’s personality were measured: in the positive condition, he was considered caring, loving, and playful, while in the negative condition, the most significant adjectives were deranged, manipulative, and mysterious.

Ziv and Goshen (2006) obtained the same results with 5- to 6-year-old children. Using the first 21 bars of the melody of Chopin’s Mazurka op. 68 n. 2 in A minor as sad music and a modified version of the same piece (transposed in C major and played faster) as happy music, the authors showed that children’s interpretations were significantly affected by the background music: sadder in the first case and happier in the second.

Using a more ecological covert design (i.e., participants were presented with an original vs. fake score of the same film sequence), Vitouch (2001) found that “viewers’ anticipations about the further development of a sequence are systematically influenced by the underlying film music” (p. 70).

In his fascinating work, Bravo (2013) studied the effect of tonal dissonance on interpretations of the emotional content of an animated short film. He hypothesized that in the same film sequence, different levels of tonal dissonance would elicit different interpretations and expectations about the emotional content of the movie scene. The short film he used as a suitable stimulus to be interpreted was very ambiguous since it did not involve clear facts in its scenes. Bravo created two soundtracks only differing as to their degree of dissonance. Comparing the subjects’ interpretations in the consonant vs. dissonant condition, it emerged that in the latter, the main character was judged as more scared, alienated, sadder, less confident, and was thought to be trying to destroy something; the character was also believed to be more sinister than nostalgic and its story more tragic than hopeful.

Finally, Hoeckner et al. (2011) explored how viewers relate to movie characters under different music and how their sense of empathy is shaped by two soundtracks: thriller music and melodrama music. They found that “compared to melodramatic music, thriller music significantly lowered likability and certainty about characters’ thoughts,” while “melodramatic music increased love attributions and lowered fear attributions.” Moreover, for the first time, they introduced the theme of empathy into the debate and, although not directly assessing its level through a specific scale, demonstrated that “musical schemas used in underscoring modulate viewers’ theory of mind and emotional contagion in response to screen characters, thus providing antecedents for empathic accuracy and empathic concern.”

All of these studies are surveyed in Herget’s (2019) comprehensive review on music’s potential to convey meaning in film; she concludes her work by underlining some weak points that should be overcome to improve this domain of research:

(1) Research on this issue is sparse. This results in experiments each analyzing a single psychological construct.

(2) Complex psychological constructs such as sympathy and empathy toward media protagonists could and should be investigated through all the available established measuring instruments (Herget, 2019).

(3) There are hardly any ecologically valid investigations carried out in natural contexts such as during a visit to the cinema, a television evening with the whole family, or alone in a young person’s room (Bullerjahn, 2005).

(4) Most research designs are too complicated and extensive.

(5) Within-subjects designs risk drawing the participants’ attention to the musical manipulation (Tan et al., 2007).

Since we strongly agree with the bulk of Herget’s criticism, our aim in this work is to investigate the effects of music in the interpretation of visual scenes by specifically addressing these demands. In our Study 1 below, we intend to:

(1) provide a global view of the influence of background music on scene interpretation by examining various psychological constructs, such as empathy, affective states, and perceived personality traits;

(2) accurately measure such constructs by relying on available established measures and tools;

(3) improve ecological validity by running an online study to be directly done from the participants’ homes on laptops and other devices;

(4) reduce the number of factors to have better control and lower the number of experimental subjects required;

(5) plan a between-subjects design to prevent the subjects’ awareness of the manipulation.

Moreover, in Study 2, we employ eye tracking to investigate the influence of music on scene interpretation from a physiological perspective as well.

Study 1: Online Survey

As stated above, the literature on the interpretation of audiovisuals has shown the ability of music to convey meanings through associations (Cohen, 2013) and activation of cognitive schemas (Boltz, 2001). In our study, we consider interpretation in a multidimensional fashion as a global process involving several interconnected cognitive operations: attribution of emotions, personality traits, thoughts or behavioral intentions to the characters on the scene, empathy toward them, and perception of the surrounding environment.

Research Questions

Our aim is to investigate how in a visual scene the following dependent variables are affected by background music:

(1) empathy toward the character.

(2) affective states attributed to the character.

(3) impressions of the character’s personality.

(4) plot anticipation.

(5) environment perception.

Method: Rationale and Recruitment

We designed a between-subjects experiment (N = 118; 44 female; age = 37 ± 11; see Table 1 for gender and age distribution) in which participants watched a scene (01′ 55′′) from an almost unknown short movie (Duras, 1981) (Figure 1): an emotionally neutral male character slowly walks toward some large windows in a lonesome building, with the seaside in the distance. He walks, looks outside, stops, and moves out of the frame.

Table 1. Gender and age distribution (mean age ± SD).

Figure 1. Illustration of three representative frames of the scene.

Using Adobe Premiere Pro, we created three versions of the scene, one per experimental condition, with the video accompanied respectively by a dogged and anxious orchestral piece (The Isle of the Dead by S. Rachmaninov), a soft, melancholic jazz solo piano piece (Like Someone in Love by B. Evans), or ambient sound only. We chose these two pieces based on the findings of Juslin and Laukka (2004), and subsequent studies listed by Cespedes-Guevara and Eerola (2018), concerning several psychoacoustic parameters associated with emotional expression in music. The two pieces both evoke negative feelings but differ in the arousal dimension: the mellow tone and soft intensity of the Evans track can be associated with delicacy, gracefulness, relaxation, and quietness (Fabian and Schubert, 2003) or with sadness and tenderness (Quinto et al., 2014). On the contrary, the large sound level variability, rapid changes in sound level, and ascending pitch of the Rachmaninov track could be linked to fear (Juslin and Laukka, 2004), while its increasing loudness could communicate restlessness, agitation, and tension (Fabian and Schubert, 2003) or anger and fear (Scherer et al., 2015), and scariness (Eerola et al., 2013).

After viewing the scene, participants were asked how they felt toward the character and what they thought he was feeling, what kind of personality he could have, whether they thought he was remembering or planning instead, and how they perceived the environment in which the scene was set. To avoid sequence effects, the order of questions was randomized for each participant.
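The per-participant randomization described above can be sketched as follows. This is only an illustration, not the authors' Qualtrics configuration: the block names, the function `question_order`, and the choice to seed the shuffle with a participant ID are all assumptions introduced here, and the text does not say whether randomization applied to whole question blocks or to single items.

```python
import random

# Hypothetical block names, one per construct queried after the clip.
QUESTION_BLOCKS = [
    "empathy",
    "affective_state",
    "personality",
    "plot_anticipation",
    "environment",
]


def question_order(participant_id, blocks=QUESTION_BLOCKS):
    """Return an independently shuffled copy of the blocks for one participant."""
    # Assumption: seeding with the participant ID makes each order reproducible.
    rng = random.Random(participant_id)
    order = list(blocks)  # copy, so the master list is never modified
    rng.shuffle(order)
    return order


for pid in (1, 2, 3):
    print(pid, question_order(pid))
```

Because every participant receives an independent permutation, any systematic effect of an earlier question on a later one is spread across the sample rather than tied to one fixed sequence.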

Aiming at better ecological validity, and to let people participate in a less detached setting than a lab, we built the procedure on Qualtrics.com. By accessing a single non-reusable link,¹ participants could run the experiment directly from home on their laptops, smartphones, or tablets; recruited through Amazon Mechanical Turk, they were paid according to the standard American minimum wage: $1 for a ∼15 min task.

Measures and Hypotheses

We hypothesize that the narratives (hence, the interpretations) that the participants build on the scene, influenced by the soundtrack, will differ markedly across conditions. We plan to shed light on such differences by taking a fine-grained look into some of the psychological constructs involved. Below, we describe each construct and its related measurement separately, stating our hypotheses at the bottom of each subparagraph.

In a nutshell, here is an example of two different plausible narratives:

• Evans: we see a sad man who walks alone in an empty building, he must be an introverted guy, we see that he’s watching outside the window, maybe he’s thinking about the past, maybe a loved one, the scene is sweet and quite gloomy.

• Rachmaninov: we see an ambiguous character walking in an unsettling hall, he shows a solemn gait, something bad is happening; probably he’s planning something harmful. I wouldn’t trust this man.

Empathy Toward the Character

To assess the participants’ empathy toward the main character, after comparing various indexes (Neumann et al., 2015), we opted for a 14-item two-factor scale by Batson et al. (1983). The scale involves 14 adjectives that describe affective states of distress (alarmed, grieved, upset, worried, disturbed, perturbed, distressed, troubled) and empathic interest (sympathetic, moved, compassionate, tender, warm, softhearted). The score obtained from the difference between empathic interest and distress should therefore be the most significant assessment of the empathic response (Batson et al., 1983; Leone et al., 2008): higher ratings correspond to higher empathic interest, while lower ratings stand for enhanced distress-like feelings.
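As a minimal sketch of the difference score described above, the following computes an empathy index from one participant's adjective ratings. The adjective lists come from the text; everything else is an assumption made for illustration: the function name `empathy_index`, the averaging of each subscale before subtraction (the text only says the index is the difference between the two factors), and the example rating values.

```python
from statistics import mean

# Adjective lists as given in the text (Batson et al., 1983).
DISTRESS = [
    "alarmed", "grieved", "upset", "worried",
    "disturbed", "perturbed", "distressed", "troubled",
]
EMPATHIC_INTEREST = [
    "sympathetic", "moved", "compassionate",
    "tender", "warm", "softhearted",
]


def empathy_index(ratings):
    """Return mean empathic-interest rating minus mean distress rating.

    Assumption: each subscale is averaged before subtracting, so the two
    factors contribute equally despite having 6 and 8 items.
    """
    interest = mean(ratings[adj] for adj in EMPATHIC_INTEREST)
    distress = mean(ratings[adj] for adj in DISTRESS)
    return interest - distress


# Hypothetical participant: high empathic interest, low distress.
example = {adj: 2 for adj in DISTRESS}
example.update({adj: 5 for adj in EMPATHIC_INTEREST})
print(empathy_index(example))
```

A positive value indicates that empathic interest outweighs distress, and a negative value indicates the reverse, which matches the reading of higher and lower ratings given in the text.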

H₁: Evoking feelings of delicacy and tenderness, the melancholic track (Evans) will encourage empathy toward the character. On the contrary, the negative feelings evoked by Rachmaninov will dampen empathy.

Affective State Attributed to the Character

We administered a 10-item Positive and Negative Affect Schedule (I-PANAS-SF) to assess the emotions attributed to the character. The version used was previously validated by Karim et al. (2011). Moreover, we added the item wistful, as we expected it to differ significantly among the conditions.

H₂a: The dogged and menacing track (Rachmaninov) will lead participants to attribute a more positive affective state to the character, who will appear adamant; on the contrary, the Evans track will lead participants to attribute more negative affective states to him.

The reason for such a hypothesis is intuitive: in the first case, music can make one imagine an evil character, possibly determined to do something harmful; while in the second case, music mood will let one picture a depressed/nostalgic character, therefore with a more negative affective state.

H₂b: The Evans track will show higher scores in wistfulness than the Rachmaninov track.

Impressions of Personality

To measure the participants’ personality impressions (Asch, 1946) about the character, we employed a 15-item assessment of the Big Five (Lang et al., 2011) previously validated with satisfying results.

H₃: With the melancholic track, the character will be seen as more agreeable and open (i.e., very emotional) and less extroverted; on the contrary, with Rachmaninov’s track, the character will be regarded as more neurotic and conscientious (e.g., a lucid criminal) (Table 2).

Table 2. Impressions of personality (hypotheses).

Plot Anticipation: Past Perspective vs. Future Perspective

As for the plot anticipation, we simply asked the participants whether they thought that the main character was remembering the past (past perspective) or taking a decision (future perspective). It was also possible to choose both options. In the first case, several five-point Likert scales were presented about the emotions that the character could have been feeling in relation to his memory. In the second one, other Likert scales were presented on the nature of such a decision; in particular, we asked whether it could have been a morally good, neutral, or bad action. If both options (i.e., remembering and taking a decision) were chosen, both the questions on the memory and those on the decision appeared.
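The branching just described can be sketched as a small display-logic function. It is an illustration under stated assumptions, not the survey's actual implementation: the option labels, the block names, and the function `follow_up_blocks` are hypothetical, as is the choice to reject an empty answer.

```python
# Hypothetical labels for the two answer options and their follow-up blocks.
FOLLOW_UPS = {
    "remembering": "memory_emotions",   # Likert scales on the memory's emotions
    "deciding": "decision_morality",    # Likert scales on the decision's nature
}


def follow_up_blocks(choices):
    """Return the follow-up blocks to show for the selected option(s)."""
    selected = set(choices)
    if not selected or not selected <= set(FOLLOW_UPS):
        raise ValueError("choose 'remembering', 'deciding', or both")
    # Iterate over FOLLOW_UPS so the display order is fixed: memory first.
    return [block for option, block in FOLLOW_UPS.items() if option in selected]


print(follow_up_blocks(["remembering"]))
print(follow_up_blocks(["deciding"]))
print(follow_up_blocks(["deciding", "remembering"]))
```

Keeping the mapping in one dictionary means a participant who picks both options sees both blocks, which mirrors the "both options" case in the text.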

H₄: When the scene is viewed with the Evans music in the background, participants will think of someone who is remembering something nostalgic; with Rachmaninov, the character will be seen as a planner of possibly evil deeds.

Environment Perception

We were interested in understanding whether a place could be seen as cozier and warmer rather than inhospitable and unpleasant depending on the music; therefore, we took inspiration from a study by Yamasaki et al. (2015), who analyzed the impact of music on the impressions of the environment along the three standard dimensions of emotions: activation, valence, and potency. We decided to administer a short list of five bipolar five-step Likert scales by picking only the pairs of adjectives that were somehow related to the idea of coziness: we chose four out of five from those of the valence dimension (we excluded one for reasons of redundancy) and added a new pair that we considered crucial: dangerous–safe.

H₅: The melancholic track will let the environment be perceived as cozier. On the contrary, Rachmaninov will let our participants perceive an unpleasant environment.

Preliminary Sample Data Analysis

Every online procedure has the merit of guaranteeing a significant number of participants in a few days; nevertheless, because it lacks experimental control, a careful preliminary analysis is necessary. To improve the reliability of our sample, first, we added an attention check question in which a multiple-item Likert scale was presented with an explicit instruction to avoid filling it out; thus, we excluded all those participants who filled in this scale. Second, we added a time count on the screen containing the video so as to exclude all those participants who had not watched the whole scene.

After such exclusions, our sample decreased from 309 to 118 participants. No further outliers were excluded.
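The two exclusion rules can be sketched as a simple filter. The field names, the function `passes_checks`, and the example records are hypothetical, and treating "time on the video screen at least equal to the clip length" as evidence of having watched the whole scene is an assumption; the text does not state the exact threshold used.

```python
CLIP_SECONDS = 1 * 60 + 55  # scene length reported in the text (01'55")


def passes_checks(record):
    """Keep a participant only if both preliminary checks are satisfied."""
    left_scale_blank = not record["attention_scale_filled"]
    # Assumption: staying on the video screen for the full clip length
    # counts as having watched the whole scene.
    watched_whole_clip = record["seconds_on_video"] >= CLIP_SECONDS
    return left_scale_blank and watched_whole_clip


# Hypothetical raw records, not data from the study.
raw = [
    {"id": "p1", "attention_scale_filled": False, "seconds_on_video": 121.0},
    {"id": "p2", "attention_scale_filled": True, "seconds_on_video": 130.0},
    {"id": "p3", "attention_scale_filled": False, "seconds_on_video": 48.5},
]
kept = [r["id"] for r in raw if passes_checks(r)]
print(kept)

# Retention implied by the figures in the text: 118 of 309 participants.
print(f"{118 / 309:.1%}")
```

With the reported figures, roughly 38% of the initial respondents remained after both checks.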