The Influence of Background Music on Learning in the Light of Different Theoretical Perspectives and the Role of Working Memory Capacity

This study investigates how background music influences learning with respect to three different theoretical approaches. Both the Mozart effect and the arousal-mood-hypothesis indicate that background music can potentially benefit learning outcomes. While the Mozart effect assumes a direct influence of background music on cognitive abilities, the arousal-mood-hypothesis assumes that this influence is mediated by arousal and mood. However, the seductive detail effect indicates that seductive details such as background music worsen learning. Moreover, as working memory capacity has a crucial influence on learning with seductive details, we also included the learner’s working memory capacity as a factor in our study. We tested 81 college students using a between-subject design with half of the sample listening to two pop songs while learning a visual text and the other half learning in silence. We included working memory capacity in the design as a continuous organism variable. Arousal and mood scores before and after learning were collected as potential mediating variables. To measure learning outcomes we tested recall and comprehension. We did not find that arousal or mood mediated the influence of background music on learning outcomes. In addition, for recall performance there were no main effects of background music or working memory capacity, nor an interaction effect of these factors. However, when considering comprehension we did find an interaction between background music and working memory capacity: the higher the learners’ working memory capacity, the better they learned with background music. This is in line with the seductive detail assumption.


INTRODUCTION AND THEORETICAL BACKGROUND
Music has become much more readily available to the public in the past decades. One contributing factor is technological: whereas in the past one needed CDs or tapes and a corresponding player, nowadays music can be played digitally on many different devices such as computers, mobile phones or iPods. Furthermore, the choice of available songs is almost endless due to music portals. This makes it possible to select suitable songs for different situations, such as relaxing songs for a cozy evening or activating songs before going out. Due to these advances in music technology, learning with background music has received more and more attention over the last decade (e.g., Schwartz et al., 2017).
For some situations it seems intuitive to think that music would help to enhance our experience, but how do music and learning fit together? At present the effects of background music while learning and the mechanisms behind them are unclear. On the one hand, music seems to have a positive (Mozart effect; Rauscher et al., 1993) and stimulating effect (arousal-mood-hypothesis; Husain et al., 2002), which could improve learning. On the other hand, background music could place an additional burden on working memory (seductive detail effect; e.g., Rey, 2012), thus hindering learning. Because the learner has to deal with the learning material and the background music simultaneously, the learner's working memory capacity is a crucial factor to consider.

Background Music
In this study we define background music as music that plays in the background while studying, i.e., when reading a text. Learners are intended to listen to this music but there is no relation between the music itself and the main task, namely learning the text.
Results of studies investigating the relationship between background music and learning outcomes are varied. While some studies found no effect of background music (e.g., Moreno and Mayer, 2000; Jäncke and Sandmann, 2010), others found that it negatively impacted learning outcomes [e.g., Furnham and Bradley, 1997; Randsell and Gilroy, 2001; Hallam et al., 2002 (study 2)]. Further studies report that it has a positive impact [e.g., Hallam et al., 2002 (study 1); de Groot, 2006], especially on students with learning disabilities (Savan, 1999) or poor spelling skills (Scheree et al., 2000). Thompson et al. (2011) gave a first hint as to why previous results were so mixed. They revealed that music characteristics like tempo and intensity have an influence on learning outcomes: only soft fast music had a positive influence, whilst loud fast as well as soft slow or loud slow music hindered learning. In addition, instrumental music disturbs learners less than music with lyrics (Perham and Currie, 2014). As each study used its own music and did not control for the characteristics of its music choice, this is one possible explanation for the heterogeneous study results mentioned above. Moreover, it seems plausible that learner characteristics such as musical expertise (Wallace, 1994) or familiarity with the presented music could also impact learning.
Importantly, it is not the characteristics of a song per se, but their effects on the learner which influence learning outcomes. These effects on the learner have been explained by different theoretical approaches. These can be grouped into approaches positing either a potentially positive or negative influence on learning outcomes.
The first theoretical perspective explains why background music could positively influence learning and cognitive abilities. Probably the most well-known approach in this field is the so-called Mozart effect (Rauscher et al., 1993). In the original study, before completing a task that measured spatial abilities, some participants listened to a Mozart sonata, while others did not listen to any music. Participants in the Mozart condition outperformed the other group. The authors found a direct, positive influence of listening to Mozart sonatas on spatial abilities. They explain these better test results through priming effects. Even though in the experiment the exposure to music took place in advance of the task, the results are transferable to listening to music while learning. Priming effects should be even stronger during the exposure to the stimulus and decay over time (e.g., Foss, 1982).
This priming explanation, however, was criticized by Husain et al. (2002). They formulated the arousal-mood-hypothesis. It states that listening to background music does not have a direct influence on cognitive abilities, but affects them through the mediators of arousal and mood. The prerequisite for this assumed mediation is that background music has an impact on arousal and mood, which in turn impact learning outcomes. Moreover, the authors go one step further and postulate that this mediation effect should not only influence spatial abilities, but also cognitive performance.
When considering arousal, Husain et al. (2002) follow Sloboda and Juslin's (2001) definition of arousal as physical activation. The influence of listening to background music on arousal (for an overview, see Pelletier, 2004) is well-established: Music can increase or decrease arousal, mostly influenced by the tempo of a song (Husain et al., 2002). In addition, there is broad evidence of the impact of arousal on learning (e.g., Kleinsmith and Kaplan, 1963; Eysenck, 1976; Heuer and Reisberg, 2014). The Yerkes-Dodson law (Yerkes and Dodson, 1908) describes optimal arousal in a learning situation following an inverted U-shaped pattern. While learners with little arousal are not engaged enough to really invest in the learning process, too much arousal can cause distracting feelings like anxiety. Thus, a medium level of arousal is optimal for learning. In conclusion, a mediation effect of background music via arousal on learning seems probable, as there seems to be an influence of background music on arousal as well as an impact of arousal on learning.
When considering mood, the arousal-mood-hypothesis defines mood as referring to emotions (Sloboda and Juslin, 2001). Several studies have found background music to influence mood (e.g., Juslin and O'Neill, 2001; Sloboda and Juslin, 2001; Schmidt and Trainor, 2010). Music evokes different emotions depending on whether it is composed in a major or minor mode (Husain et al., 2002). Moreover, several theoretical approaches and studies state that mood influences learning (Ilsen, 1984; Pekrun, 2006; Goetz and Hall, 2013; Heuer and Reisberg, 2014; Pekrun et al., 2017). In general, positive mood is associated with better learning outcomes (Isen, 2002) while negative mood or boredom hinders learning (O'Hanlon, 1981; Pekrun, 2006). Based on this, a mediation effect of mood also seems plausible.
To conclude, Husain et al. (2002) state that besides these two mediation effects (mood and arousal mediating the influence of background music on learning) and in contrast to the Mozart effect, music does not directly influence learning. The authors underpinned this statement by referring to a study by Nantais and Schellenberg (1999). In this study participants listened to a Mozart sonata and to a short story and completed a spatial task after each. Participants were also asked if they liked the sonata or the story better. In general, participants performed better after listening to the stimulus (sonata or story) they preferred. Thus, Husain et al. (2002) reasoned that better cognitive performance when listening to background music is due to the exposure to a pleasant stimulus.
In sum, both the Mozart effect and the arousal-mood-hypothesis state that listening to background music can foster learning, while the arousal-mood-hypothesis also takes characteristics of the melody into account. A piece of music needs to be in the right tempo and mode to be able to evoke the appropriate arousal and mood in the learner. When investigating arousal and mood evocation, it is not enough to simply measure arousal and mood after learning; measurements need to be taken both before and after learning. Only in this way is it possible to calculate the change in arousal and mood during the learning phase.
A contrasting theoretical perspective describes why background music can also have a negative impact on learning. When learning with background music, the learners have to divide their attention between the learning task and the music. Thus, they have to invest cognitive resources to process the background music in addition to the learning task, as auditory information always gets processed first (Salamé and Baddeley, 1989) and cannot be ignored (Mayer, 2001). Background music is not related to the task, but can attract the learner's attention and therefore can be defined as a seductive detail (Rey, 2012). Such information distracts the learner from the main task, i.e., the learning task, and therefore hinders learning. Hence, it is not surprising that a meta-analysis of the influence of background music that involved many types of music (including different tempi and modes) (Kämpfe et al., 2010) revealed an overall negative impact on learning. Music becomes an unnecessary burden on working memory, which is a crucial point when regarding the limitations of working memory capacity (Miller, 1994; Cowan, 2001).

Working Memory Capacity
The importance of working memory and its capacity in a learning situation is due to the fact that all information within a learning situation (including learning material, learning task, and context factors) needs to be processed within working memory. There is an ongoing debate about the structure of working memory. Baddeley (1986) and Cowan (1999) published probably the two most prominent working memory models. As the experimental group in this study has to deal with visual (reading a text) as well as auditory information (listening to background music), we will especially focus on how this information gets processed according to Baddeley's (1986) and Cowan's (1999) models. Baddeley (1986) assumes working memory to be a system with a hierarchical structure: the central executive controls two subsystems, the phonological loop and the visuospatial sketchpad. He postulates that working memory is separate from long-term memory, even though long-term memory can have an influence on processes within working memory. For example, prior knowledge activated in long-term memory can facilitate the processing and integration of new information in working memory. Because the subsystems are independent, work in parallel, and each have their own capacity, it is easier to process information of different modalities. A visual text is processed with the phonological loop after being recoded through subvocal processes. Background music, being presented auditorily, is phonological information as well, and thus might overload the phonological loop. However, there is evidence that musical information gets processed in a slightly different way from verbal auditory information (Salamé and Baddeley, 1989).
Different authors assume an additional subsystem to be responsible for processing background music, which is partly independent from the phonological loop (Deutsch, 1970; Rowe et al., 1974; Paivio et al., 1975; Rowe, 2013). According to this view, there is more capacity available when music is processed in addition to a visual text, as two different subsystems are utilized, than when an auditory text is processed in addition to a visual text in the same subsystem. As such, background music would still interfere with reading, but not as severely as, for example, when verbal auditory information is processed by the same subsystem.
Another approach to working memory was put forward by Cowan (1999), who proposed the embedded-processes model. Working memory in this model is the activated part of long-term memory, without differentiating between the processing of different modalities. Cowan argues that the similarity of information has an influence on how much information can be processed simultaneously: the less similar the content and modality of the information are, the easier it is to process them simultaneously. Concerning instrumental background music and reading a text at the same time, this would mean that instrumental music would be less disruptive than music with lyrics or a conventional auditory text, because the latter add a verbal component. However, processing background music still relies on the same cognitive capacity, thus hindering learning. Independent of which model describes working memory better, they both assert that listening to background music while learning requires additional cognitive capacity that could otherwise be invested into the learning process. This is especially important, as working memory capacity is limited.
Working memory capacity can be defined as the number of separate concepts that can be dealt with at the same time in working memory (Cowan, 2012). Cowan (2001) states that 3-4 chunks of information can be stored and manipulated at the same time. A wide variety of studies show an advantage in learning situations for learners with a higher working memory capacity [e.g., Daneman and Carpenter, 1983; King and Just, 1991; Whitney et al., 1991; Rosen and Engle, 1998 (Experiment 1); Alloy and Alloy, 2010]: the more information an individual can deal with simultaneously, the more efficient the learning process. However, listening to background music reduces the available memory capacity for the learning process. How then do background music and working memory capacity interact?

Interaction between Background Music and Working Memory Capacity on Learning
Salamé and Baddeley (1989) postulate firstly, that it is impossible not to process auditory information and secondly, that auditory information is always processed first. Thus, only if working memory capacity is high enough do learners have sufficient capacity to invest in the learning task after processing the auditory information. In this case, appropriate background music could be of benefit to learners by influencing their mood and arousal level to an optimal state, thereby fostering the learning process. However, even for those learners, melodies should be chosen that pose only a small burden on working memory. Comparing instrumental music with songs with lyrics, it seems plausible that when lyrics are present they would need to be additionally processed. According to Baddeley's (1986) model, these lyrics are auditory texts that burden the phonological loop, leading to a larger decrease in learning performance compared to an instrumental song. The same is true for Cowan's (1999) model, where the lyrics are too similar to the visual text and therefore lead to interference during learning.
Therefore, when attempting to foster learning for high-capacity learners by improving mood and arousal, one should use music without lyrics. In this case learners may be able to process the learning material as well as the song. Sufficient working memory capacity may thus compensate for the additional cognitive burden, so that the potential positive effect of the music may benefit the learner. This is comparable to the ability-as-compensator effect (Mayer and Sims, 1994), where a learner's ability (in this study: sufficient working memory capacity) is required to deal with a specific element of the instructional design (in this study: background music).
When learners with low working memory capacity have to process background music there is not enough capacity left to invest in the learning task. Even if the learners were in a perfect learning condition concerning arousal and mood, they would not be able to learn as they simply would not be able to process the information in the learning material in addition to the music.
To our knowledge, there is no empirical evidence of the interaction between background music and working memory capacity on learning outcomes which could support these theoretical assumptions. As we defined background music as a seductive detail, we argue that research on other seductive details in interaction with working memory capacity might be transferable. Sanchez and Wiley (2006) found that learners with low working memory capacity were hindered in their learning if learning materials included seductive pictures in addition to the text. Interestingly, learners with higher working memory capacity were not affected by these pictures; however, their performance did not increase either. As the pictures used in Sanchez and Wiley's (2006) experiment were normed not to influence arousal or mood, whereas the music in our experiment is intended to do so, this result is not contradictory to our assumptions. A study by Fenesi et al. (2016) found similar results: Learners with low working memory capacity perform worse when presented with irrelevant pictures in addition to learning material.
The cut-off between a working memory capacity that is "too small" and "high enough" depends on the characteristics of the learning material. Highly complex or poorly designed learning tasks burden working memory capacity more than content which is less complex or better designed (Sweller, 2010; Sweller et al., 2011). This indicates that background music should only be considered when the learning material itself is not too demanding. A similar effect was found in a study by Park et al. (2011) where pictures were used as a seductive detail. The researchers varied the complexity of the main task and found that pictures hindered learning less when the main task was not very demanding, whereas the seductive details effect was revealed with highly demanding tasks.

Learning Outcomes
Besides the complexity of the learning material, the level of learning outcomes could also play an important role. So far, we have discussed learning outcomes in general. However, one can differentiate between different levels of learning outcomes, like recall or comprehension (e.g., Bloom, 1956). For exams it is typically necessary to remember and understand the learning content. Thus, the post-test of this study differentiates between both of these learning outcomes. To our knowledge no studies have yet differentiated between the influence of background music on recall and on comprehension, so we can only establish assumptions on a theoretical basis and turn to results of comparable studies for comparisons. As cited above, in a study by Park et al. (2011) the seductive detail effect depended on task difficulty, with easy tasks not affected by seductive details. Transferring these results to learning with background music and to different levels of learning outcomes, i.e., recall and comprehension, one would expect background music to influence comprehension outcomes but not recall. Easier recall tasks are a smaller burden on working memory, so that a learner may be able to process background music simultaneously. In addition, working memory capacity does not play an important role, as the learner does not need a high capacity. This is also why the interaction between both factors should not influence recall performance.
However, comprehension tasks are more demanding and impose a larger cognitive burden. In this case, background music as well as working memory capacity should affect comprehension outcomes. Moreover, we should witness an interaction between both factors in the way described above.

Research Questions and Hypotheses
To sum up, the influence of background music on learning is not clear: while the Mozart effect (Rauscher et al., 1993) implies a direct, positive effect, the arousal-mood-hypothesis (Husain et al., 2002) postulates a mediation effect via arousal and mood. Furthermore, the seductive detail effect indicates that background music has a direct negative effect on learning. In addition, the level of learning outcomes could also play an important role. On this basis, we pose the following research questions: Does listening to background music influence learning directly or is this association mediated by arousal or mood? And what role does the learner's working memory capacity play, and how does it interact with background music?
All three theoretical assumptions (Mozart effect, arousal-mood-hypothesis and seductive detail effect) have theoretical and empirical justifications. As we are the first to compare all three of these, we formulate the following, partly competing, hypotheses: Background music does not influence recall (H1.1), but does influence comprehension (H1.2): H1.2a: Due to the Mozart effect, comprehension will be influenced positively and directly by background music. H1.2b: Due to the arousal-mood-hypothesis, we hypothesize that arousal and mood will be related to music and learning outcomes. As we chose music that was intended to induce positive mood and learning-enhancing arousal, we expect background music to influence mood positively, thus fostering comprehension. Secondly, we expect background music to have a positive impact on arousal, with arousal improving comprehension. H1.2c: On the basis of the seductive detail effect, we hypothesize that there will be a direct negative influence of background music on comprehension.
Several studies cited above found better learning outcomes for learners with higher working memory capacity. As we think that a higher working memory capacity is only necessary for more demanding tasks, we hypothesize that there will be no main effect of working memory capacity on recall (H2.1), but a main effect on comprehension (H2.2), with better comprehension scores recorded for learners with higher working memory capacity.
There is a lack of research investigating the interaction between listening to background music and working memory capacity. Theoretically, we assume that learners with low working memory capacity will be overburdened by processing both the learning material and the background music. Nevertheless, learners with sufficiently high working memory capacity could benefit from the potential positive effect of background music, which compensates for the additional cognitive burden (see Mayer, 2001). However, this should only be relevant for comprehension tasks, which are highly demanding. Based on these theoretical assumptions and the results of transferable studies, we hypothesize that there will be (H3.1) no interaction effect between background music and working memory capacity on recall. However, we hypothesize that (H3.2) this interaction effect will be present in the case of comprehension. More specifically, we hypothesize that (H3.2a) learners with low working memory capacity will have better comprehension outcomes when not listening to background music, whereas (H3.2b) learners with high working memory capacity will have better comprehension outcomes when listening to background music while learning.
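The crossover pattern expected under H3.2 can be pictured as a linear model with an interaction term. The sketch below is purely illustrative: the function names and all coefficient values are hypothetical, chosen only so that the predicted effect of music changes sign as working memory capacity increases. They are not estimates from this study.

```python
# Illustrative sketch of a crossover interaction (H3.2a / H3.2b).
# All coefficients are hypothetical and were chosen only to produce
# the hypothesized pattern; they are NOT results from the study.

def predicted_comprehension(music, wmc,
                            b0=2.0, b_music=-1.5, b_wmc=0.2, b_inter=0.5):
    """Linear model: b0 + b_music*music + b_wmc*wmc + b_inter*music*wmc.

    music: 0 = silence, 1 = background music
    wmc:   working memory capacity score
    """
    return b0 + b_music * music + b_wmc * wmc + b_inter * music * wmc


def music_effect(wmc):
    """Predicted difference (music minus silence) at a given capacity."""
    return predicted_comprehension(1, wmc) - predicted_comprehension(0, wmc)


for wmc in (2, 3, 4, 5):
    print(f"WMC = {wmc}: music effect = {music_effect(wmc):+.2f}")
```

With these made-up coefficients the music effect is negative at low capacity and positive at high capacity, which is the shape that H3.2a and H3.2b describe.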

Subjects and Design
Data were collected from 86 university students aged between 16 and 50 years (M age = 21.37, SD age = 4.19), including 71 (82.6%) females. Due to their very poor test performance, five participants were defined as outliers (e.g., Barnett and Lewis, 1994). We compared all post-test scores to the predefined criterion of 20% of the possible post-test score. As these five participants reached less than 15% of the post-test score, we assume that they were not engaged enough in the learning process and we excluded their data. Hence, data from 81 participants (M age = 21.46, SD age = 4.30, 81.5% females) were included in further analysis.
Participants were randomly assigned to one of two experimental groups (between-subject factor: background music, present or absent). Working memory capacity was included in the design as an organism variable and was treated as a further independent variable. As dependent variables, we measured recall and comprehension as indicators of learning performance. In addition, we measured mood and arousal as potential mediating variables. Moreover, we considered prior knowledge, musical experience, age and gender as potential covariates.
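The exclusion rule described above amounts to a simple threshold filter. The following is a minimal sketch, assuming a hypothetical maximum post-test score of 20 points and made-up participant IDs and scores; only the 20% criterion is taken from the text.

```python
# Sketch of the exclusion rule: participants scoring below a fixed
# fraction of the maximum post-test score are treated as outliers.
# MAX_SCORE and the example scores are illustrative assumptions.

MAX_SCORE = 20.0   # assumed maximum post-test score (hypothetical)
CRITERION = 0.20   # predefined criterion: 20% of the possible score


def split_by_criterion(scores, max_score=MAX_SCORE, criterion=CRITERION):
    """Return (included, excluded) dicts keyed by participant ID."""
    cutoff = criterion * max_score
    included = {pid: s for pid, s in scores.items() if s >= cutoff}
    excluded = {pid: s for pid, s in scores.items() if s < cutoff}
    return included, excluded


example_scores = {"p01": 12.5, "p02": 2.0, "p03": 8.0, "p04": 2.5}
included, excluded = split_by_criterion(example_scores)
print(sorted(included))  # participants retained for analysis
print(sorted(excluded))  # participants treated as outliers
```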

Materials and Measures
All materials besides the background music and the instruction to learn were in paper-pencil form. Given the nature of our materials, no ethics approval was required for this study.
The learning material consisted of a visual text about time and date differences on earth that was 1070 words long. It was adapted from a study by Schnotz and Bannert (1999). The adapted version of the learning materials has successfully been used in another study by Lehmann et al. (2016). The text includes information about the concept of time and time zones as well as a table that shows exemplary time differences between different cities around the world. Learning time was limited to 7 min and 30 s. To accompany the text, a test to measure prior knowledge was created. It consisted of six open-ended questions (e.g., "What are time zones?"). Answers were compared to predefined solutions. Learning outcomes were measured using five open-ended recall questions (e.g., "According to which principle were the time zones classified?") and five open-ended comprehension questions (e.g., "What time is it in Frankfurt, when it is 2 pm in Mexico City?"). Answers were again compared to a predefined solution.
As background music, we used two different common German songs: "Auf uns" by Andreas Bourani and "Nur ein Wort" by Wir sind Helden, both in the instrumental version. Both songs were chosen to induce positive mood. According to Thompson et al.'s (2011) results, we chose two songs with a fast tempo and presented them at a medium volume (30%) to not disturb the participants too much. The songs were presented through over-ear headphones. The two songs were played between the recorded instructions to start and stop reading. To not induce any motivational effects, participants in the control group also wore headphones but only heard the instructions to start and stop reading.
Working memory capacity was measured with the computer-based Numerical Memory Updating Test (Oberauer et al., 2000). Digits that are shown in a spatial matrix for seconds have to be stored and processed by simple additions and subtractions. The resultant capacity scores indicate how many of the nine matrix fields learners can process simultaneously.
Arousal was measured before and after learning with the subscale of the Self-Assessment Manikin (Bradley and Lang, 1994). This questionnaire measures arousal with a 9-point Likert-Scale ranging from 1 = "highly aroused" to 9 = "not at all aroused," which is illustrated by a pictorial representation of a stick figure with more or less arousal indicated by a bigger or smaller explosion in its belly.
To measure mood before and after learning, we used a short version of the Multidimensional Mood State Questionnaire (Steyer et al., 2004). The questionnaire consisted of 14 emotions grouped into 3 subscales: good-bad-mood (angry, happy, joyful, satisfied, unhappy, and well), awake-tired (awake, lively, rested, and tired), and calm-nervous (balanced, nervous, relaxed, and restless). Participants scored each emotion according to the question "Please score how you feel at the moment." The answer format was a 7-point Likert-Scale ranging from 1 = "completely true" to 7 = "not true." A positive score in a subscale denotes positive emotions (being in a good mood, awake, and calm), a negative score indicates negative emotions (being in a bad mood, tired, and nervous). To calculate the influence of the learning phase on emotions, we subtracted mood values before learning from values after learning. Thus, a positive value in our study symbolizes an increase in positive emotions (good mood, awake, and calm) whilst a negative value indicates an increase in negative emotions (bad mood, tired, and nervous).
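The change-score computation described above (value after learning minus value before learning, per subscale) can be sketched as follows. The subscale labels follow the text; the numeric values are invented for illustration, and how raw item ratings are aggregated into subscale scores is not shown here because the text does not specify it.

```python
# Sketch of the change-score computation: post-learning value minus
# pre-learning value for each mood subscale. All numbers are made up.

SUBSCALES = ("good-bad", "awake-tired", "calm-nervous")


def change_scores(pre, post):
    """Return post - pre for each subscale."""
    return {name: post[name] - pre[name] for name in SUBSCALES}


pre = {"good-bad": 1.0, "awake-tired": -0.5, "calm-nervous": 0.5}
post = {"good-bad": 1.5, "awake-tired": -1.0, "calm-nervous": 0.5}

delta = change_scores(pre, post)
for name, value in delta.items():
    if value > 0:
        direction = "increase in positive emotions"
    elif value < 0:
        direction = "increase in negative emotions"
    else:
        direction = "no change"
    print(f"{name}: {value:+.1f} ({direction})")
```

The same subtraction applies to the arousal ratings collected before and after learning.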
In addition, we used a demographic questionnaire to assess each learner's age, gender and study subject. The questionnaire also included questions concerning the musical expertise of our participants: Did they have experience of singing in a choir and if so, for how many years? Did they have experience playing an instrument and if so, for how many years? Moreover, we asked participants to score how musical they would assess themselves to be on a 7-point Likert-scale. Furthermore, after the learning phase, we asked the participants in the condition with background music if they were familiar with the song they had listened to.

Procedure
Data collection took place in group sessions. First, participants were asked to formally agree to participate in the experiment and the involved data collection by signing the informed consent form. This informed the participants about the duration and tasks involved in the experiment, that data will be used anonymously, the possibility to ask questions during the data collection and to withdraw their participation at any time. All participants who agreed to the data collection then completed the demographic questionnaire, two pre-tests for arousal and mood as well as a test of prior knowledge. Following this, the learning phase took place: Participants were asked to put on the headphones and to start their track, consisting of either the instructions to start and stop learning or the same instructions but with the two songs played in between. After the learning phase, participants completed the arousal and mood questionnaires again. The post-test then took place. The whole data collection took approximately 45 min.

Covariates
To identify potential covariates, we checked whether prior knowledge, age and gender were equally distributed between the conditions. As we did not find any significant differences (all ps > 0.35), we did not include any covariates in further analyses.
Moreover, we analyzed whether musical experience (experience singing or playing an instrument) or familiarity with the songs influenced recall or comprehension. We did not find any significant differences between the groups (all ps > 0.35). Thus, musical experience and familiarity with the songs were not considered further.

Descriptive Data
Descriptive data concerning all dependent variables in all conditions can be found in Table 1.

Potential Mediators
To analyze whether background music influences learning outcomes indirectly mediated through mood or arousal, a first step is to analyze whether background music influences mood or arousal directly. If so, we will then analyze whether these variables influence learning outcomes significantly (for a theoretical approach concerning mediator analyses, see Baron and Kenny, 1986).
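The first step of this logic can be sketched in a few lines of Python. This is a minimal illustration, not the analysis script used in the study: the function name `one_way_f` and the change scores are hypothetical, and the sketch assumes that the mediator is analyzed as a post-minus-pre difference score compared between the two conditions with a one-way ANOVA.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical arousal change scores (post minus pre); NOT the study's data.
music = [1, 0, -1, 2, 0, 1]
silence = [0, 1, -1, 1, 0, 1]

f_value, df1, df2 = one_way_f([music, silence])

# Step 1 of the mediation logic: only if the condition affects the mediator
# is it worthwhile to test the path from the mediator to the learning outcome.
# An F below 1 is conventionally treated as non-significant.
continue_with_step_two = f_value >= 1
print(f"F({df1},{df2}) = {f_value:.2f}, continue: {continue_with_step_two}")
```

In this made-up example the groups barely differ in their change scores, so the analysis would stop after the first step, which mirrors the pattern reported for arousal and mood in the following subsections.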

Arousal
Listening to background music did not influence the difference in arousal before and after learning, F < 1, ns. The prerequisites for a mediation were therefore not met in this case.

Mood
Background music did not influence the differences in mood before and after learning in the good-bad mood subscale or in the awake-tired subscale, Fs < 1, ns, nor in the calm-nervous subscale, F(1,77) = 1.04, ns, η² = 0.01. Again, the prerequisites for a mediation were not met.
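As a plausibility check, the reported effect sizes can be recomputed from the F values and their degrees of freedom. The sketch below assumes that the reported values are partial eta squared, which the text does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Calm-nervous subscale, F(1,77) = 1.04
print(round(partial_eta_squared(1.04, 1, 77), 2))  # 0.01

# Interaction of background music and working memory capacity, F(3,73) = 3.22
print(round(partial_eta_squared(3.22, 3, 73), 2))  # 0.12
```

Under this assumption, both recomputed values agree with the effect sizes given in the text.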
The interaction between background music and working memory capacity was significant, F(3,73) = 3.22, p < 0.028, η² = 0.12 (see Figure 1). Planned post hoc contrasts compared comprehension scores within the same working memory score and between the experimental groups. We found higher comprehension scores for participants with the lowest working memory score of 2 in the group with no music compared to

DISCUSSION
The aim of this study was, firstly, to examine whether background music has a direct effect on learning outcomes or whether this influence is mediated by arousal and mood. Secondly, we wanted to investigate whether the influence background music has on learning outcomes could be positive, for instance when listening to a song with specific facilitative characteristics, or whether, following the seductive detail assumption, a cognitive burden would always be present. Finally, we wanted to examine which role the learner's working memory capacity, or its interaction with background music, plays in learning outcomes. Results will be discussed with reference to these research questions.

Mediation Effect or Direct Influence of Background Music?
To investigate whether there is a mediation effect of background music through arousal and mood on learning, we first calculated differences in arousal and mood before and after learning. As a second step, we tested whether these scores were different between the groups with or without background music during the learning phase. As there were no significant differences between the conditions, we inferred that in this study background music did not affect arousal or mood. This contradicts the results of previous studies (e.g., Nantais and Schellenberg, 1999; Juslin and O'Neill, 2001; Sloboda and Juslin, 2001; Husain et al., 2002; Pelletier, 2004; Schmidt and Trainor, 2010). We provide three possible explanations for these contradictory results: Firstly, the time span during which the participants were exposed to the music might have been too short to have had an impact. Learning phases in everyday life are usually much longer than in our experiment and learners may normally be exposed to music for longer periods. It might be necessary to listen to music for a longer time period for it to affect arousal or mood.
Secondly, the measurement tool might not have been sensitive enough to measure small changes in mood or arousal. The Likert scales used in this experiment consisted of seven and nine gradations of mood and arousal, respectively. Thus, one step between two adjacent scale responses (e.g., between 4 and 5) corresponds to about 14% of the mood scale and 11% of the arousal scale. If the influence of listening to background music was smaller than this, the measurement tool would simply not be able to register the difference. A possible alternative approach would be to use a continuous scale. In addition, arousal could also be measured objectively with physiological data, such as heart rate, blood pressure or skin conductance.
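The resolution argument can be made concrete with a short sketch. It assumes that the step size is taken as one divided by the number of response categories, and it uses a hypothetical rounding model (`to_likert`) in which a participant's latent state is mapped onto the nearest category; the latent values are invented for illustration.

```python
def step_share(n_points):
    """Share of the scale covered by one response step, taken as 1 / n_points."""
    return 1 / n_points


def to_likert(latent, n_points):
    """Map a latent score onto the nearest category in 1..n_points (hypothetical model)."""
    return min(n_points, max(1, int(latent + 0.5)))


mood_step = step_share(7)     # 7-point mood scale
arousal_step = step_share(9)  # 9-point arousal scale
print(f"mood: {mood_step:.0%}, arousal: {arousal_step:.0%}")

# Hypothetical latent mood before and after learning; NOT the study's data.
before = to_likert(4.1, 7)
after = to_likert(4.4, 7)
print(before, after)  # both fall into category 4, so the change goes unnoticed
```

A latent shift of 0.3 scale points leaves the recorded response unchanged in this example, which is the kind of small effect a continuous scale or physiological measure might still pick up.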
Thirdly, in contrast to both preceding explanations, it might be the case that the specific background music we used simply does not influence arousal or mood in a learning scenario such as ours. The two songs were picked based on the results of earlier studies concerning song characteristics. We chose fast-paced songs to induce arousal and played them at a medium volume in line with Thompson et al.'s (2011) findings. Moreover, we used songs with a positive sounding melody which have positive lyrics in their original version. Nevertheless, it could be the case that these characteristics did not fit our sample in terms of music taste. For example, if one section of our sample enjoyed the music genre whilst the others did not, the positive and negative effects may have canceled each other out. This idea is supported by the rather high standard deviations in the scales, as well as by the differing scores across the levels of working memory capacity (see Table 1). Moreover, contrary to Thompson et al.'s (2011) findings, Hallam et al. (2002) found that fast music negatively influenced learning outcomes. This contradiction emphasizes how important it is to control for learners' characteristics in studies and, in addition, to be precise with the description of the musical stimuli, so that "fast music" is understood in replicable terms in all studies.
In summary, we were not able to confirm the arousal-mood-hypothesis, as background music did not affect arousal or mood in our study. However, besides arousal and mood, there are other learner characteristics which could potentially act as mediators and were not tested in this study, such as learner motivation. This leaves the question of whether background music had a direct, positive or negative influence on learning outcomes in this study.
Concerning recall, background music did not influence performance, confirming our hypothesis. Therefore, the potential positive effect on cognitive abilities postulated by Rauscher et al. (1993) and the seductive detail effect (Rey, 2012) either do not benefit the learner or indeed cancel each other out. As recall tasks only place a small burden on working memory, there is still enough capacity left after processing background music. A study by Brünken et al. (2004) supports this idea as they did not find an influence of listening to background music on cognitive load while completing a simple recall task. Thus, background music did not influence recall negatively. We believe that there is neither a positive nor a negative impact on recall and no compensation effect. However, if one would like to affect recall through music, some success has been found by using jingles to improve recall for short verbal sequences (e.g., Yalch, 1991; VanVoorhis, 2002).
When considering comprehension, learners reached higher levels of learning with no background music. This result lends support to our seductive detail hypothesis (1.2c): As background music is always processed first (Salamé and Baddeley, 1989), there is not enough capacity left to work on cognitively demanding comprehension tasks. In conclusion, this was the only association which we found between background music and learning outcomes, direct or indirect. This indicates that besides the arousal-mood-hypothesis, the Mozart effect hypothesis also needs to be rejected. In this study, background music functioned as a seductive detail for more demanding learning processes such as comprehension.
A further point which needs to be considered is that the songs we used were instrumental versions of popular songs with lyrics. Even though we did not present the lyrics, they may have been activated by the melody as an anchor (see for example, Bartlett and Snelus, 1980; Wallace, 1994). On the one hand, the activated lyrics would interfere in working memory with the text the participants have to learn, as participants would have to deal with both simultaneously. On the other hand, participants would need less effort to process the melody, as familiar information is easier to process than unfamiliar information (Hulme et al., 1991). Taken together, the negative and positive effects may cancel each other out and may explain why, in our study, we did not find any influence of learners' familiarity with the songs on learning outcomes.

Working Memory Capacity
Answering our second research question, working memory capacity did not influence recall performance. As in the explanation above, recall tasks do not demand much cognitive capacity and because of this, all learners should be able to process the relevant content, independent of their working memory capacity. However, comprehension tasks require more cognitive capacity. Hence, in support of our hypothesis, learners with higher working memory capacity reached higher comprehension scores, as they are able to process more units of information simultaneously, allowing them to better understand the test.

Interaction between Background Music and Working Memory Capacity
The last research question concerned the interaction between background music and learners' working memory capacities. In the case of the recall tasks, neither background music nor working memory capacity played a crucial role. Even learners with little capacity should be able to process background music in addition. Indeed, we found confirmation of our hypothesis that the interaction between both factors did not influence recall performance.
In the case of comprehension, however, we found a significant interaction between listening to background music and working memory capacity. The only significant and relevant contrast occurred in the learners with the lowest working memory capacity, who reached higher comprehension scores without background music. As their working memory capacity is highly limited, they are simply not able to process a comprehension task and background music simultaneously. For all of the other capacity levels we did not find such a difference or, indeed, any advantages when learning with music. This finding is also in keeping with the seductive detail assumption and comparable to the ability-as-compensator effect (Mayer and Sims, 1994).
In line with this result, we found a linear trend in the group which learned with background music. The higher a learner's working memory capacity, the better they learn with background music. Whilst processing the music, they still have enough capacity left for the main learning task. We found a quadratic trend when analyzing the group without background music. As expected, learners with medium working memory capacity performed worse than those with high working memory capacity scores. Unexpectedly, learners with low working memory capacity scores outperformed the medium capacity groups and their results matched those of the high-capacity group. We had expected better performance with increasing capacity. However, Zander (2010) found that some learners may not constantly invest all of their capacities in the learning process, so that learners with beneficial learning characteristics do not necessarily outperform those learners with poor skills. In this context we also need to point out that our sample for the extreme group analysis was rather small. Therefore, the effects might also be attributable to other variables such as motivation or situational interest, which might be unequally distributed and were not controlled for.
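The difference between a linear and a quadratic trend can be illustrated with orthogonal polynomial contrasts. The sketch below assumes four equally spaced working memory levels (inferred from the three degrees of freedom of the interaction) and equal group sizes; the group means are hypothetical and are not taken from Table 1.

```python
# Standard orthogonal polynomial weights for four equally spaced levels
LINEAR = [-3, -1, 1, 3]
QUADRATIC = [1, -1, -1, 1]


def contrast(means, weights):
    """Weighted sum of group means; a value near zero means the trend is absent."""
    return sum(m * w for m, w in zip(means, weights))


# Hypothetical comprehension means per working memory level; NOT the study's data.
with_music = [4.0, 5.0, 6.0, 7.0]     # steady increase with capacity
without_music = [7.0, 5.0, 5.0, 7.0]  # U-shape: low and high outperform medium

for label, means in [("music", with_music), ("no music", without_music)]:
    print(label, contrast(means, LINEAR), contrast(means, QUADRATIC))
```

With these invented means, the music group shows a purely linear pattern and the no-music group a purely quadratic one, which is the qualitative shape described above. A full trend test would additionally divide each contrast by its standard error.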

Limitations and Further Research
As in all studies involving music, these results are not simply transferable to learning with other songs. If at all, one would expect similar results when using songs with the same characteristics, such as tempo or mode. The background music in this study did not influence arousal or mood as expected. It is therefore important that a learner's attitude concerning the presented music is taken into account. Further research needs to investigate whether one would reach the same results when testing participants with different characteristics. Furthermore, the direct negative influence of background music needs further investigation. Even though we found evidence of a seductive detail effect, this result needs to be validated by measuring cognitive load after learning with and without background music, differentiated for all three types of load while solving recall and comprehension tasks. For this, one could use the cognitive load questionnaire developed by Leppink et al. (2013). Furthermore, it would be interesting to assess how exactly background music impacts learning on a cognitive basis: For example, the question of how exactly background music is processed is still an open one.
Moreover, as mentioned above, we recommend using a more sensitive measuring tool than we did. Our tools were not able to detect small variations in either arousal or mood. We would suggest using continuous instruments to pick up on subtle changes.
In addition, working memory capacity is also discussed as being relevant in the context of creativity (e.g., Jalil, 2007; Vandervert et al., 2007; Sharma and Babu, 2017). Therefore, it might be interesting for further research to consider creativity as another aptitude variable in the context of learning with background music. For example, we could imagine that highly creative learners may especially benefit from listening to background music while learning. Moreover, it could also be relevant to measure the impact of the interaction between background music and working memory capacity on creative learning tasks.

Practical Implications
Based on the results of this study, we cannot recommend learning with background music. Learners with the lowest capacity levels were especially impaired by background music. With increasing working memory capacity, background music neither hindered nor fostered learning. For these learners it is merely a matter of personal preference as to whether they wish to learn with background music or not, for example in an attempt to raise their motivation levels. However, learners should be careful with their decision as to which music they choose to listen to: Songs with lyrics are potentially more distracting than instrumental melodies, and music with other modes or tempos could possibly evoke emotions that obstruct learning. Luckily, there is enough music readily available, so that each of us has the chance to listen to our preferred music, which may even be conducive to learning.

ETHICS STATEMENT
Our study is about learning with or without background music. There was no potential to harm or endanger any participants. Moreover, we did not collect any sensitive data. Hence, no official ethics approval was needed.

AUTHOR CONTRIBUTIONS
JL designed and conducted this study and wrote this manuscript, all under supervision of TS.

ACKNOWLEDGMENT
We would like to thank Wolfgang Schnotz and Maria Bannert for letting us use an adapted version of their learning material.