A Functional MRI Study of Happy and Sad Emotions in Music with and without Lyrics

Musical emotions, such as happiness and sadness, have been investigated using instrumental music devoid of linguistic content. However, pop and rock, the most common musical genres, utilize lyrics for conveying emotions. Using participants’ self-selected musical excerpts, we studied their behavior and brain responses to elucidate how lyrics interact with musical emotion processing, as reflected by emotion recognition and activation of limbic areas involved in affective experience. We extracted samples from subjects’ selections of sad and happy pieces and sorted them according to the presence of lyrics. Acoustic feature analysis showed that music with lyrics differed from music without lyrics in spectral centroid, a feature related to perceptual brightness, whereas sad music with lyrics did not diverge from happy music without lyrics, indicating the role of other factors in emotion classification. Behavioral ratings revealed that happy music without lyrics induced stronger positive emotions than happy music with lyrics. We also acquired functional magnetic resonance imaging data while subjects performed affective tasks regarding the music. First, using ecological and acoustically variable stimuli, we broadened previous findings about the brain processing of musical emotions and of songs versus instrumental music. Additionally, contrasts between sad music with versus without lyrics recruited the parahippocampal gyrus, the amygdala, the claustrum, the putamen, the precentral gyrus, the medial and inferior frontal gyri (including Broca’s area), and the auditory cortex, while the reverse contrast produced no activations. Happy music without lyrics activated structures of the limbic system and the right pars opercularis of the inferior frontal gyrus, whereas auditory regions alone responded to happy music with lyrics. These findings point to the role of acoustic cues for the experience of happiness in music and to the importance of lyrics for sad musical emotions.


INTRODUCTION
Music has sometimes been characterized as a language of emotions (e.g., Åhlberg, 1994). Listeners are able to recognize a few basic emotions expressed by music, particularly happiness and sadness (Krumhansl, 1997; Peretz et al., 1998; Altenmüller et al., 2002; Khalfa et al., 2002; Juslin and Laukka, 2004; for reviews, see Juslin and Västfjäll, 2008 and Nieminen et al., 2011). Some have claimed that emotions in music do not correspond to those induced by life events because basic emotions in music are subtler, do not exactly coincide with those triggered by prototypical life events (loss, threat, etc.; Ekman, 1999), and lack stereotypical action tendencies (like running or freezing for fear; Krumhansl, 1997; Scherer, 2004; Zentner et al., 2008). However, emotion recognition in music is a common and almost automatic process that can occur after the presentation of a musical excerpt lasting only 500 ms (Peretz et al., 1998). It is also observable in children as young as 3 years of age (Dalla Bella et al., 2001; for a review, see Nieminen et al., 2011) as well as in listeners completely unfamiliar with the musical system in which those emotions are expressed (Fritz et al., 2009). The basis for such a powerful universal reaction, especially to sad and happy emotions in music, is likely rooted in the acoustic features of music. It has repeatedly been found that happy music is characterized by fast tempo and major mode, whereas sad music is typically played in slow tempo and minor mode (Peretz et al., 1998; Dalla Bella et al., 2001; Pallesen et al., 2005). In a recent study by Laurier (2009), 116 listeners rated 110 excerpts from film soundtracks, from which 200 audio feature values were extracted. Pieces rated by listeners as happy were characterized by major mode and faster onsets, whereas sad and tender pieces were in minor mode and had longer onsets (for similar results with production methods, see Friberg et al., 2006).
Additionally, performers use specific features to convey emotions while playing: sad emotions are typically expressed by soft dynamics, legato articulation, and slow tempo, whereas happy, positive connotations of music are conveyed by staccato articulation and louder intensities (Juslin, 2000; Patel, 2008).
www.frontiersin.org
Only a few studies have explored the brain correlates of basic emotions in music. The first pioneering fMRI study, by Khalfa et al. (2005), used a controlled manipulation of two musical features (tempo and mode) to vary the happy or sad emotional connotations of 34 instrumental pieces of classical music, lasting 10 s each. Sad pieces in minor mode contrasted with happy pieces in major mode produced activations in the left medial frontal gyrus (BA 10) and the adjacent superior frontal gyrus (BA 9). These regions have been associated with emotional experiences, introspection, and self-referential evaluation (Jacobsen et al., 2006; Kornysheva et al., 2010). Nevertheless, with a conservative statistical threshold, major pieces did not generate any significant brain activity when contrasted with minor pieces (Khalfa et al., 2005). Mitterschiffthaler et al. (2007) acquired fMRI images while subjects listened to a selection of 20 sad, happy, and emotionally neutral pieces lasting 30 s each. Contrary to Khalfa et al. (2005), only the direct contrast of happy minus sad pieces yielded significant brain activation, in the left superior temporal gyrus (BA 22). In addition, Mitterschiffthaler et al. (2007) compared responses to happy music with responses to neutral music and found activation of the parahippocampal gyrus, precuneus, ventral striatum, and caudate nucleus; the latter two structures are associated with the subjective experience of pleasure and reward, physiological arousal, and the drive to move (Blood and Zatorre, 2001; Haber and Brucker, 2009; Haber and Calzavara, 2009; Salimpoor et al., 2011). Conversely, sad music contrasted with neutral music activated the hippocampus and amygdala, consistent with the role of these structures in negative emotion perception (cf. also Gosselin et al., 2007), as well as the cerebellum.
Other structures recruited by both sad and happy music contrasted with neutral music were the posterior cingulate cortex and the medial frontal gyrus (BA 6), related to the introspective experience of emotions, self-control, and attentive behavior (Koelsch, 2010). The bilateral primary auditory cortex was also activated during listening to emotional music (contrasted to neutral music), reflecting the importance of acoustic features for the attribution of affective connotations in music (cf. Patterson et al., 2002;Schneider et al., 2002).
Until now, neuroimaging studies investigating emotional responses to music have focused solely on classical instrumental music (for reviews, see Koelsch, 2010; Brattico and Pearce, forthcoming). The majority of behavioral studies of music-induced emotions also used instrumental music, though drawn from a larger variety of genres (see, however, Lundqvist et al., 2009, where pop music with lyrics was used to evoke emotional responses in listeners, although the presence of lyrics did not constitute an experimental variable). However, people worldwide often listen to pop and rock music, which contains lyrics or vocal parts (Nettle, 1983; Music and Copyright, 2010). The message in songs is carried by both the melodic and the linguistic channels. Only very recently have neuroscientists begun to determine the neural networks underlying song perception and how they differ from the networks processing speech alone or music alone. Convergent findings indicate that song perception does not require a dedicated neural network but rather a blend of brain structures associated with musical sound and phonological processing; these include left temporo-frontal regions, more involved in language processing, and right temporo-frontal regions, more associated with music processing (Sammler et al., 2010; Schön et al., 2010). Nonetheless, it is not known how lyrics affect the brain processing of emotions in music.
A few behavioral and computational studies have shown that basic emotion recognition in music is affected by the presence of lyrics, but their findings have been contradictory. In Laurier et al. (2008) and Cho and Lee (2006), including lyrics information in algorithms for automatic classification of happy and sad musical emotions improved recognition accuracy, whereas for angry or violent emotions, lyrics did not improve classification substantially. A very recent computational study further showed that the emotion itself determines whether or not lyrics play a role in automatic musical mood classification (Hu et al., 2010): compared with the audio feature set alone, audio features combined with lyrics yielded more accurate automatic mood classification for selected negative emotions. The opposite effect was found for positive emotions: the audio set by itself classified them more accurately than the acoustic set combined with the semantic set. In another behavioral study (Ali and Peynircioglu, 2006), unfamiliar classical and jazz instrumental melodies representing four discrete emotions were either paired with lyrics of pop songs adapted to match the melodies or played alone. Listeners rated happy and calm music without lyrics as more intensely representing positive emotions than music containing lyrics with semantic content congruent with the musical emotion (Ali and Peynircioglu, 2006). Conversely, the opposite effect was obtained for sad music: emotionally congruent lyrics contributed to the intensity of negatively perceived emotions compared with instrumental music alone.
With the present study, we wished to contribute to the growing literature investigating the brain structures responsible for processing music with or without linguistic content. First, we hypothesized that songs with lyrics, in contrast to instrumental music, would activate the left fronto-temporal language network, whereas music without lyrics would recruit right-hemispheric brain structures. Second, we wanted to generalize the neural correlates of sadness and happiness identified by Khalfa et al. (2005) and Mitterschiffthaler et al. (2007) in classical instrumental music to a larger, more ecological musical selection, including pieces from a variety of genres and timbres. In line with evidence from neuroimaging studies of hemispheric specialization for spectro-temporal processing (Zatorre et al., 2002), we also expected happy music (richer in fast spectral transitions) to activate left-hemispheric auditory areas and sad music (most likely containing slower attacks and tempos) to activate right-hemispheric areas. Third, and most importantly, we investigated the role of lyrics in modulating the neural processing of basic emotions expressed by music. Our rationale derives from a set of observations: 1. The majority of music listened to in the world consists of pop/rock songs, containing lyrics. 2. One of the basic motivations for listening to (rock/pop) music lies in its power to induce emotions (Laukka, 2007; McDonald and Stewart, 2008). 3. The neural correlates of musical emotions have so far been investigated mainly with instrumental music of the classical genre, completely disregarding the putative importance of lyrics in a musical emotional experience.
Frontiers in Psychology | Auditory Cognitive Neuroscience
To enhance the understanding of the neural mechanisms operating during the processing of basic musical emotions, we wished to determine whether music containing lyrics and instrumental music evoke similar emotions and activate comparable brain structures. Based on previous behavioral literature, we hypothesized that the activation of the limbic system associated with emotion processing in music would be affected by the presence of lyrics, but in a non-linear way dependent on the actual emotional content. More specifically, we predicted that sad music with lyrics would recruit emotion-related brain areas when compared with sad instrumental music; in contrast, happy instrumental music would be more efficient than happy music with lyrics in inducing and expressing emotions, as reflected by the activation of the limbic system.
In order to enhance the subjective experience of musical emotions, subjects were allowed to bring their own music selection to the lab. The stimuli needed for our aims were hence obtained by selecting a subset of the subjects' musical excerpts that did or did not include intelligible lyrics. We chose this approach as a tradeoff between careful control of stimulus manipulation and statistical power on one hand, and ecological validity of the results and optimization of emotional induction on the other hand.

SUBJECTS
Fifteen healthy subjects (with no neurological, hearing, or psychological problems), recruited without regard to musical training, participated in the study (mean age: 23.9 ± 2.9 SD years; six females; seven subjects had played an instrument for, on average, 14 ± 5.1 years).

Prior to the experiment
The present study is part of a large-scale project aimed at revealing the neural structures involved in the processing of musical emotions. The project includes fMRI measurements, behavioral ratings, acoustic analyses, and questionnaire surveys. Because of the amount of data obtained, the findings will be presented in several publications (Saarikallio et al., submitted, for the listening test; Brattico et al., in preparation, for the full set of fMRI data). To comply with the aims of the current study, we included here only those subjects from whom we could obtain an adequate number of lyrical and instrumental musical excerpts (more than eight per stimulus category), and we focused on the basic emotions of sadness and happiness in music, as they are the most studied in the literature. The study procedures were approved by the ethical committee of the Helsinki University Central Hospital and complied with the Declaration of Helsinki.
Prior to the listening test, subjects were asked to provide us with 16 comparably familiar music pieces: four sad and four happy pieces from favorite music, and four sad and four happy pieces from disliked or even hated music. We instructed subjects to bring to the lab pieces from different musical genres, e.g., popular, classical, folk, electronic, and atonal music, with the goal of increasing acoustic variability and avoiding the possible confound of emotional responses tied to specific acoustic features (for instance, we wanted to avoid a subject bringing only piano pieces representing sad emotions and only percussive Latin American music representing happy emotions). Indeed, subjects were able to select pieces from at least four different genres each, although a majority were pop/rock songs (>60%), as expected from the diffusion of this musical genre among young subjects (cf., for instance, Music and Copyright, 2010).
Four excerpts (18 s each) with 500-ms fade-ins and fade-outs were created from each music piece with Adobe Audition. Since over 60% of the pieces brought by subjects were pop/rock songs, excerpts were selected to represent the main themes of the music. For instance, for songs with lyrics, we chose the introductory instrumental part, the refrain, one of the strophes, and, when suitable, the modulated refrain. The aim was to identify the excerpts to which subjects would respond most strongly emotionally and with which they would be most familiar. Thus, altogether 64 excerpts were cut from the music selection of each participant. The excerpts were normalized to a matched loudness level as measured by the root mean square (RMS). The music excerpts were presented binaurally via headphones with the Presentation software (Neurobehavioral Systems, Ltd.).
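As a rough illustration of the excerpt preparation described above, the Python sketch below cuts an 18-s segment, applies 500-ms fades, and rescales it to a common RMS level. It is not the Adobe Audition procedure actually used: the linear fade shape, the function names, and the target RMS value of 0.1 are our assumptions, and samples are assumed to be floats in [-1, 1].

```python
import math

def make_excerpt(signal, sr, start_s, dur_s=18.0, fade_s=0.5):
    """Cut dur_s seconds from signal starting at start_s and apply
    linear fade-in and fade-out ramps of fade_s seconds (assumed shape)."""
    start = int(round(start_s * sr))
    n = int(round(dur_s * sr))
    excerpt = list(signal[start:start + n])
    if len(excerpt) < n:
        raise ValueError("piece is too short for the requested excerpt")
    n_fade = int(round(fade_s * sr))
    for i in range(n_fade):
        gain = i / n_fade              # linear ramp from 0 towards 1
        excerpt[i] *= gain             # fade-in
        excerpt[n - 1 - i] *= gain     # fade-out (mirror image)
    return excerpt

def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def normalize_rms(x, target_rms=0.1):
    """Scale x so that its RMS equals target_rms (hypothetical target level)."""
    current = rms(x)
    if current == 0.0:
        return list(x)                 # silence cannot be rescaled
    gain = target_rms / current
    return [v * gain for v in x]

# Example: a 30-s, 5-Hz test tone at a toy sampling rate of 100 Hz
sr = 100
tone = [math.sin(2 * math.pi * 5 * t / sr) for t in range(30 * sr)]
excerpt = normalize_rms(make_excerpt(tone, sr, start_s=5.0))
```

Matching RMS equalizes signal energy rather than perceived loudness, which is one reason such normalization is only an approximation of a "matched loudness level".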

Listening test
The listening test took place at the University of Helsinki and was approved by the local ethical committee. Each subject performed the test individually. Beforehand, the participants filled in a consent form and a questionnaire concerning their musical background and behavior. Subsequently, the 18-s music excerpts were delivered with the Presentation software (Neurobehavioral Systems, Ltd.) to the subjects binaurally via headphones at 50 dB above their individually determined hearing thresholds. After listening to each musical excerpt, subjects pressed a number from 1 to 5 on a keyboard to rate it according to six 5-step bipolar scales: unfamiliar-familiar, sad-happy, feeling sad-feeling happy, disliked-liked, unpleasant-pleasant, and ugly-beautiful. Thus, behavioral ratings were acquired for each musical excerpt. The listening test lasted around one and a half hours in total.

fMRI measurements
The fMRI measurements were conducted with a 3-T scanner (3.0 T Signa VH/I General Electric) in the Advanced Magnetic Imaging (AMI) Centre at Helsinki University of Technology and were approved by the Coordinating Ethics Committee of the Helsinki University Central Hospital. Participants were placed on the scanner bed in a supine position. To prevent postural adjustments and to attenuate the noise and vibration of the scanner, foam cushions were placed around the arms of the participants. Music was presented through audio headphones with approximately 30 dB of gradient noise attenuation. Thirty-three oblique slices covering the whole brain (field of view 20 cm; 64 × 64 matrix; slice thickness 4 mm; spacing 0 mm) were acquired using an interleaved gradient echo-planar imaging (EPI) sequence (TR 3 s; echo time 32 ms; flip angle 90°) sensitive to blood oxygenation level-dependent (BOLD) contrast. Before the fMRI measurement, volunteers were informed about the study protocol, signed a written consent form, filled in a safety questionnaire, and were required to remove any ferromagnetic material before entering the magnet bore. Participants were encouraged to relax in the magnet bore while concentrating on the musical stimuli. After the experiment, the subjects received two movie theater tickets to compensate for their inconvenience.
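For orientation, the in-plane resolution follows from dividing the field of view by the matrix size, and the slab coverage from the number and thickness of slices. The arithmetic below is a sketch assuming a 20-cm (200-mm) field of view, which, with a 64 × 64 matrix, yields 3.125-mm in-plane voxels; the helper name is hypothetical.

```python
def epi_geometry(fov_mm, matrix, slice_mm, n_slices, gap_mm=0.0):
    """Return (in-plane voxel size, slab coverage), both in millimetres."""
    in_plane = fov_mm / matrix
    coverage = n_slices * slice_mm + (n_slices - 1) * gap_mm
    return in_plane, coverage

# Assumed 200-mm field of view, 64 x 64 matrix, 33 contiguous 4-mm slices
in_plane_mm, coverage_mm = epi_geometry(200.0, 64, 4.0, 33)
print(f"in-plane voxel: {in_plane_mm} mm, coverage: {coverage_mm} mm")
# prints "in-plane voxel: 3.125 mm, coverage: 132.0 mm"
```

A 132-mm slab is consistent with whole-brain coverage, whereas a 20-mm field of view would imply implausibly small (about 0.3-mm) in-plane voxels for whole-brain EPI.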
During the fMRI measurement, participants listened to 18-s music excerpts selected on the basis of the previously conducted listening test. Specifically, of the four excerpts cut from each of the 16 pieces brought to the lab by the subjects, the two obtaining the highest scores in the emotional and familiarity ratings were fed to the stimulation computer and delivered to the subjects by the Presentation software via high-fidelity MR-compatible headphones. In total, each subject was presented with 32 musical excerpts. The sound level was adjusted to be comfortable, at around 80 dB. In the scanner, the subjects performed one of two tasks, preceded by a visual cue ("Like? Dislike?", in Finnish: "Pidän? En pidä?"; or "Sad? Happy?", in Finnish: "Surullinen? Iloinen?"). The purpose of the tasks was to keep subjects' attention on the music and to make them concentrate on the emotional aspects of the stimuli. Three test trials were presented to the subjects prior to the main session. The text of the visual cue was displayed for the duration of the stimulus and served as a fixation point. At the end of the 18-s stimulus, another cue asked the subjects to answer ("Answer now", in Finnish: "Vastaa nyt"). To answer, subjects pressed MR-compatible button pads with the second and third fingers of the left or right hand (counterbalanced between subjects). After a 3-s interval without any stimulus, a sinusoidal tone indicated the start of the next trial. The fMRI session lasted about 23 min. After a short break, anatomical T1-weighted MR images (field of view 26 cm; 256 × 256 matrix; slice thickness 1 mm; spacing 0 mm) were also acquired in about 9 min.
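The selection of two of the four excerpts per piece can be sketched as a simple ranking. The text does not specify how emotion and familiarity ratings were combined, so the scoring rule below (the rating in the direction of the intended emotion on the 1-5 sad-happy scale, plus the familiarity rating) and all names are hypothetical.

```python
def select_top_excerpts(excerpts, intended, k=2):
    """Return the ids of the k best-rated excerpts of one piece.

    excerpts: list of dicts with 'id', 'sad_happy' (1 = sad ... 5 = happy)
    and 'familiarity' (1-5). intended: 'sad' or 'happy'.
    Hypothetical score: rating towards the intended emotion + familiarity.
    """
    def score(e):
        emotion = e["sad_happy"] if intended == "happy" else 6 - e["sad_happy"]
        return emotion + e["familiarity"]
    ranked = sorted(excerpts, key=score, reverse=True)   # stable: ties keep input order
    return [e["id"] for e in ranked[:k]]

sad_piece = [
    {"id": "intro", "sad_happy": 2, "familiarity": 4},
    {"id": "strophe", "sad_happy": 1, "familiarity": 5},
    {"id": "refrain", "sad_happy": 1, "familiarity": 4},
    {"id": "modulated_refrain", "sad_happy": 3, "familiarity": 3},
]
print(select_top_excerpts(sad_piece, "sad"))   # ['strophe', 'refrain']
```

Applied to each of the 16 pieces, a rule of this kind yields the 32 excerpts presented in the scanner.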

Stimulus features
For all the musical stimuli, two low-level acoustic features, attack slope and spectral centroid, were computationally extracted in the MATLAB environment with the MIRToolbox (Lartillot and Toiviainen, 2007). Attack slope indicates the sharpness of the attack phase of musical events; for instance, percussive, struck, and plucked instruments tend to have higher attack slopes. It was calculated over the entire 18-s musical excerpt according to the specifications suggested in Peeters (2004), and the mean value was taken as the representative value for the excerpt. Spectral centroid gives an estimate of perceptual brightness and of the balance between the high- and low-frequency content of the signal (Alluri and Toiviainen, 2010). Each 18-s stimulus was subjected to a frame-by-frame analysis with a frame length of 3 s and a hop factor of 0.1. The feature space consisted of the means of each feature across all frames of the 18-s stimulus.
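The frame-based averaging of spectral centroid can be approximated with NumPy as below. This is a sketch, not the MIRToolbox implementation: we assume a Hamming window, read the hop factor of 0.1 as a hop of one tenth of the frame length (0.3 s), and skip silent frames; attack slope is not reproduced here.

```python
import numpy as np

def mean_spectral_centroid(x, sr, frame_s=3.0, hop_factor=0.1):
    """Mean spectral centroid (Hz) across overlapping frames.

    Assumptions: Hamming window; hop = hop_factor * frame length;
    frames with zero spectral energy are skipped.
    """
    x = np.asarray(x, dtype=float)
    frame = int(round(frame_s * sr))
    hop = max(1, int(round(hop_factor * frame)))
    if len(x) < frame:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hamming(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)      # bin centre frequencies (Hz)
    centroids = []
    for start in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame] * window))
        total = mag.sum()
        if total > 0:
            # magnitude-weighted mean frequency of this frame
            centroids.append((freqs * mag).sum() / total)
    return float(np.mean(centroids)) if centroids else 0.0

sr = 8000
t = np.arange(18 * sr) / sr
dark = mean_spectral_centroid(np.sin(2 * np.pi * 500 * t), sr)
bright = mean_spectral_centroid(np.sin(2 * np.pi * 2000 * t), sr)
```

For a pure tone the centroid sits at the tone's frequency; in real music, more high-frequency energy (e.g., from vocals, cymbals, or distorted guitars) pulls the centroid upward, which is why it serves as a brightness index.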
In addition, for comparison with previous neuroimaging studies on music and emotions, we analyzed the tempo and mode of the stimuli perceptually. The tempo of each stimulus was rated by a volunteer expert judge on a 5-point Likert scale varying from very slow, indicated by 1, to very fast, indicated by 5. Another volunteer music expert judged whether the 18-s music excerpt was mainly in the major or minor mode. The values of the acoustic analysis and the perceptual ratings were subjected to statistical analysis with Friedman's rank test. Pair-wise comparisons were carried out with the Wilcoxon statistics for paired ordinal variables.
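For readers unfamiliar with Friedman's rank test, the sketch below computes its chi-square statistic from within-block ranks, χ² = 12 ΣR_j² / (n k (k+1)) − 3n(k+1), where n is the number of blocks (e.g., subjects), k the number of conditions, and R_j the rank sum of condition j. It is a minimal pure-Python version without the tie correction that statistical packages usually apply; the p-value would come from a chi-square distribution with k − 1 degrees of freedom, and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_chi2(blocks):
    """Uncorrected Friedman statistic; blocks = one row of k condition
    values per subject. Compare with chi-square on k - 1 degrees of freedom."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)

# Five hypothetical subjects who all order the four conditions identically
ratings = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 4, 5], [2, 2.5, 3, 4], [1, 2, 4, 5]]
print(friedman_chi2(ratings))   # 15.0, the maximum n * (k - 1)
```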

Listening ratings
Only musical excerpts that were utilized in fMRI scanning and that matched the criteria for stimulus inclusion (i.e., completely instrumental excerpts or containing intelligible lyrics) were selected for the statistical comparison of behavioral responses to music with and without lyrics. The effects of the type of music on the six different scales of judgments collected in the listening test (familiarity, emotion recognition, emotion induction, preference, valence, and beauty) were investigated using Friedman's rank test. Pairwise comparisons aiming to test differences between ratings to music with and without lyrics were carried out with the Wilcoxon statistics.
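The pairwise Wilcoxon comparisons can likewise be sketched in pure Python. The function below returns the signed-rank sum W+ and its normal-approximation z score; dropping zero differences and omitting continuity and tie corrections are simplifying assumptions, and the ratings in the example are made up.

```python
import math

def _average_ranks(values):
    """1-based ranks; ties receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.
    Returns (W_plus, z); zero differences are discarded and no tie or
    continuity correction is applied."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd

# Hypothetical 1-5 ratings of six subjects for two conditions
instrumental = [5, 4, 5, 4, 5, 3]
with_lyrics = [3, 3, 2, 2, 4, 1]
w, z = wilcoxon_signed_rank(instrumental, with_lyrics)
```

With very small samples, exact null distributions are preferable to the normal approximation used here.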

fMRI data processing and stimulus selection
Whole-brain imaging data were analyzed using SPM5 for preprocessing and SPM8 for the statistical analyses¹. Images for each subject were realigned, spatially normalized onto the Montreal Neurological Institute (MNI) template (12-parameter affine model, gray matter segmentation), and spatially smoothed (Gaussian filter with an FWHM of 6 mm). After realignment, datasets were also screened for scan stability: motion corrections were always smaller than 2 mm in translation and 2° in rotation. fMRI responses were modeled using a canonical hemodynamic response function (HRF) with time dispersion and temporally filtered using a high-pass filter with a 128-s cutoff to minimize scanner drift. The six residual movement parameters were modeled as regressors of no interest.
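SPM implements high-pass filtering by regressing out a discrete cosine transform (DCT) basis set from each voxel time series. The sketch below follows that idea for a 128-s cutoff at TR = 3 s; the rule for the number of basis functions mirrors SPM's convention as we understand it and should be treated as an assumption, and the function name is hypothetical.

```python
import math

def dct_highpass(y, tr_s, cutoff_s=128.0):
    """Remove slow drifts from one voxel time series by projecting out a
    discrete cosine basis with periods longer than about cutoff_s seconds.
    The constant (k = 0) term is kept, so the series mean is preserved."""
    n = len(y)
    n_basis = int(2 * (n * tr_s) / cutoff_s + 1)   # assumed SPM-style count, incl. constant
    filtered = list(y)
    for k in range(1, n_basis):
        # orthonormal DCT-II basis vector of order k
        basis = [math.sqrt(2.0 / n) * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                 for t in range(n)]
        # orthonormality lets each coefficient be computed from the raw series
        beta = sum(b * v for b, v in zip(basis, y))
        filtered = [f - beta * b for f, b in zip(filtered, basis)]
    return filtered

# Hypothetical run: 460 scans at TR = 3 s, a slow drift on top of a constant
n_scans, tr = 460, 3.0
drift = [10.0 + 5.0 * math.cos(math.pi * (2 * t + 1) / (2 * n_scans)) for t in range(n_scans)]
clean = dct_highpass(drift, tr)
```

For a run of 460 scans (roughly the 23-min session at TR = 3 s), this removes 21 cosine components, with periods of about 131 s and longer, while leaving faster fluctuations, including stimulus-locked responses, untouched.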
Following preprocessing, linear contrasts employing canonical HRFs were used to estimate category-specific BOLD activation for each individual and each scan. The stimulus conditions were obtained from the subject's selection of happy and sad music. For the two lyrics stimulus categories (sad music with lyrics, happy music with lyrics), we extracted only those fMRI scans that were associated with musical excerpts containing lyrics (sung by a male or female voice or a chorus). For the two non-lyrics stimulus categories (sad music without lyrics, happy music without lyrics), we selected only those fMRI scans obtained during the presentation of instrumental music (in fewer than 10 scans across subjects, humming was allowed to be present; for a list of all the musical excerpts containing lyrics included in the fMRI analysis, see Table A1 in Appendix). The selection was performed in a MATLAB environment by two expert judges with knowledge of Finnish and English, who listened to each excerpt and assigned a code for the presence or absence of lyrics. The average number of trials per subject included in further processing was as follows: 13.2 ± 4.3 SD for sad music with lyrics (total number of excerpts across the experiment: 104), 11.3 ± 4.8 for sad music without lyrics (total number of excerpts: 88), 13.3 ± 4.6 for happy music with lyrics (total number of excerpts: 109), and 10.3 ± 5.3 for happy music without lyrics (total number of excerpts: 78). A one-way ANOVA revealed no significant difference among the four stimulus categories. Notably, the sad/happy connotations of the music excerpts as given by each subject were overall consistent with the semantic content of the excerpts. This was confirmed by a separate listening test, in which we asked three independent judges with knowledge of English and Finnish to classify the sad or happy emotional connotations of each musical excerpt based on the semantic content of its lyrics (disregarding the music).
We then analyzed the internal consistency of the experts' classifications, obtaining a Cronbach's alpha of 0.7, an acceptable level of consistency that supported the correspondence between subjects' and experts' classifications.
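Cronbach's alpha for k raters (treated as items) is α = k/(k − 1) · (1 − Σ rater variances / variance of the summed scores). A minimal sketch, assuming each judge's sad/happy classification is coded numerically (0 = sad, 1 = happy) for the same excerpts; the coding and the example data are our assumptions.

```python
def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k score lists (one per judge),
    all covering the same n cases in the same order."""
    k, n = len(items), len(items[0])
    if k < 2 or n < 2:
        raise ValueError("need at least two items and two cases")

    def sample_var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(sample_var(item) for item in items) / sample_var(totals))

# Hypothetical codes (0 = sad, 1 = happy) from three judges for five excerpts
judges = [[1, 0, 1, 1, 0],
          [1, 0, 1, 0, 0],
          [1, 1, 1, 1, 0]]
print(round(cronbach_alpha(judges), 3))   # 0.794
```

The same function applies to the three aesthetic ratings (pleasantness, liking, and beauty), with ratings in place of binary codes.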
In a first-level analysis, we used paired-samples t-tests to compare the brain responses to sad music with those to happy music, and the brain responses to music containing lyrics with those to instrumental music. Using paired-samples t-tests, we also contrasted the brain responses across lyrics conditions within each emotional category, e.g., sad music with lyrics versus sad music without lyrics. These individual contrast images (i.e., weighted sums of the beta images) were then used in second-level random-effects models that account for both scan-to-scan and participant-to-participant variability to determine mean, condition-specific regional responses. In the second-level analysis, we included as covariates, for each stimulus condition and each subject, five of the six behavioral ratings (familiarity, emotion induction, and an aesthetics rating consisting of the average of preference, valence, and beauty) and the acoustic parameters of spectral centroid and attack slope.
In this way, we could exclude possible confounding effects of those subjective ratings and acoustic features that differentiated sad and happy musical excerpts with lyrics from those without lyrics, as reported in detail in the following sections. We did not include the perceptual attributes of tempo and mode or the behavioral ratings for emotion recognition as covariates, since they did not differentiate the stimulus conditions of interest, namely sad and happy music with or without lyrics, but rather differentiated sad versus happy music, irrespective of the lyrics variable (see following sections). Furthermore, we used an average of the three aesthetic ratings (pleasantness, liking, and beauty) instead of the individual ratings since they correlated strongly with each other and represented a single conceptual construct, as indicated by the high Cronbach's alpha obtained (0.8 on average). In this way, we limited the number of covariates included in the analysis, thus avoiding an unnecessary loss of degrees of freedom.
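One conventional way to express the comparisons above is as contrast weights over four condition estimates. The sketch below uses a hypothetical condition order (sad with lyrics, sad without lyrics, happy with lyrics, happy without lyrics); it omits the covariates and the paired-test machinery of the actual SPM analysis, and the beta values are made up for illustration.

```python
# Hypothetical condition order for the 2 x 2 design (emotion x lyrics)
CONDITIONS = ("sad_lyrics", "sad_instrumental", "happy_lyrics", "happy_instrumental")

CONTRASTS = {
    "sad > happy":                (1, 1, -1, -1),
    "lyrics > no lyrics":         (1, -1, 1, -1),
    "sad: lyrics > no lyrics":    (1, -1, 0, 0),
    "happy: no lyrics > lyrics":  (0, 0, -1, 1),
    "lyrics effect: sad > happy": (1, -1, -1, 1),   # interaction
}

def contrast_value(weights, betas):
    """Weighted sum of condition estimates; weights must sum to zero."""
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * b for w, b in zip(weights, betas))

# Made-up condition estimates for one subject, in CONDITIONS order
betas = (2.0, 1.0, 1.5, 1.5)
for name, w in CONTRASTS.items():
    print(f"{name}: {contrast_value(w, betas)}")
```

The interaction weights are simply the difference of the two within-emotion lyrics contrasts, (sad with − sad without) − (happy with − happy without), so a positive value means that lyrics modulate responses to sad music more than responses to happy music.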
To protect against false-positive activations, only regions with a Z-score equal to or greater than 3.5 (p < 0.001) and with a minimum cluster size (k) of 10 contiguous voxels were considered (cf. Kotz et al., 2002). For anatomical localization, the MNI coordinates of the local activation maxima were converted to Talairach space (Talairach and Tournoux, 1988). Subsequently, anatomical labels and Brodmann areas were assigned to activated clusters using the Talairach Daemon Client².
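The combined height and extent threshold can be sketched as follows: keep voxels with Z ≥ 3.5, group them into connected clusters, and retain clusters of at least 10 voxels. Face (6-)connectivity and the dictionary-based map are assumptions for brevity; SPM's own clustering may use a different neighbourhood definition.

```python
from collections import deque

def suprathreshold_clusters(zmap, z_thresh=3.5, min_voxels=10):
    """zmap: dict mapping (x, y, z) voxel indices to Z values.
    Returns clusters (sorted voxel lists) of at least min_voxels voxels
    with Z >= z_thresh, using face (6-)connectivity."""
    supra = {v for v, zval in zmap.items() if zval >= z_thresh}
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen, clusters = set(), []
    for seed in supra:
        if seed in seen:
            continue
        seen.add(seed)
        queue, cluster = deque([seed]), []
        while queue:                     # breadth-first flood fill
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_voxels:   # extent threshold
            clusters.append(sorted(cluster))
    return clusters

# Toy map: a 12-voxel run at Z = 4, a 3-voxel run at Z = 5, one voxel at Z = 3
toy = {(i, 0, 0): 4.0 for i in range(12)}
toy.update({(i, 5, 5): 5.0 for i in range(3)})
toy[(20, 20, 20)] = 3.0
kept = suprathreshold_clusters(toy)
```

In the toy map, only the 12-voxel run survives: the 3-voxel run passes the height threshold but fails the extent threshold, and the isolated voxel fails the height threshold.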

STIMULUS FEATURES
The attack slopes differed according to the music categories, as revealed by Friedman's rank test (χ² = 17.2, p < 0.001; see Figure 1). Happy music with lyrics had significantly steeper attack slopes than sad music with and without lyrics (T > 3, p < 0.002). However, happy or sad music with lyrics did not differ in attack slope from happy or sad music without lyrics (p > 0.07), and happy music without lyrics did not differ in attack slope from sad music with or without lyrics (p > 0.1).
As shown in Figure 1, the spectral centroids were also affected by the music categories (χ² = 34.9, p < 0.001), with the brightest timbres for happy music with lyrics compared with all the others (T > 2.2, p < 0.03). Interestingly, happy and sad music with lyrics were associated with higher spectral centroids, i.e., brighter timbres, than happy and sad music without lyrics (T > 2.2, p < 0.03). Moreover, happy music with or without lyrics was characterized by higher spectral centroids than sad music with or without lyrics (T = 3, p < 0.002). Notably, sad music with lyrics had spectral centroid values comparable to those of happy music without lyrics (p = 0.3).

As shown in
For the familiarity scale, direct comparisons between the ratings of music with and without lyrics showed that, while all music samples were highly familiar to the subjects (scoring on average more than 4 on a 1 to 5 scale), sad music with lyrics was slightly less familiar than sad music without lyrics (T = 2.6, p < 0.01). No differences were found in emotion recognition of sad and happy content between lyrics and non-lyrics music samples. In contrast, happy music without lyrics induced stronger happy emotions than happy music with lyrics (T = 2.7, p < 0.01). No difference was observed in the intensity of emotions felt between sad music with and without lyrics.
Finally, for this group of subjects, both sad and happy music without lyrics were rated overall more positively, i.e., more liked, more pleasant, and more beautiful, than sad and happy music with lyrics (liked-disliked: T = 4.1, p < 0.001 for happy music and T = −4, p < 0.001 for sad music; unpleasant-pleasant: T = 4.1, p < 0.001 for happy music and T = 3.6 for sad music; ugly-beautiful: T = −4.1, p < 0.0001 for happy music and T = −3.9 for sad music).

Main effects of basic emotions
As visible in Table 1 and Figure 3, the contrast sad > happy music showed significant activation in the left thalamus and the right caudate. The opposite contrast happy > sad music revealed significant differences only in the left-hemispheric secondary and associative auditory cortices (BA 42 and 22), including the insula (BA 13).

Main effects of lyrics
As evidenced by Table 1 and Figure 3, the contrast music with lyrics > music without lyrics produced brain activity in several bilateral auditory and associative areas, including the left inferior and superior temporal gyri (BA 21 and 22) and the right transverse and superior temporal gyri (BA 22 and 41). In addition, this contrast revealed activity in four left-hemispheric structures, namely the putamen, the cuneus (BA 18), the postcentral gyrus (BA 43), and the declive of the cerebellum. The opposite contrast music without lyrics > music with lyrics resulted in activations in the medial anterior cingulate cortex (BA 24, 32, and 33) and left anterior cingulate cortex (BA 24), as well as in the right dorsolateral prefrontal cortex at the middle frontal gyrus (BA 9), the right pars opercularis of the inferior frontal gyrus (BA 44), and the medial part of the cerebellar tonsil.

Effects of lyrics on sad music
As illustrated in Table 1 and Figure 4, the presence or absence of lyrics had an effect on the brain responses to music but, interestingly, was differentially weighted by the emotional content of the music. In particular, the contrast sad music with lyrics > sad music without lyrics revealed significant differences in the right claustrum, the right parahippocampal gyrus, the bilateral amygdala, the bilateral auditory cortex at the transverse and middle temporal gyri (BA 21, 22, and 41), the right medial frontal gyrus (BA 10), the left putamen, and the bilateral inferior frontal and right precentral gyri (BA 47 and 43, respectively). Conversely, the opposite contrast of sad music without lyrics > sad music with lyrics did not yield any significant difference.

Effects of lyrics on happy music
As shown in Table 1 and Figure 4, the contrast happy music with lyrics > happy music without lyrics elicited significant differences in BOLD responses only in the bilateral auditory cortices, in particular in the right middle and bilateral superior temporal gyri (BA 21 and 22). The opposite contrast, happy music without lyrics > happy music with lyrics, showed significant differences in limbic and emotion-related frontal areas, such as the left anterior cingulate (BA 24), the right insula (BA 13), the left middle frontal gyrus (BA 9), the precentral gyrus (BA 44), and the superior frontal gyrus (BA 6).

Interactions: effects of lyrics on sad versus happy music
As shown in Table 1 and Figure 4, we also tested the interaction between the two main effects. Compared to happy music, the presence of lyrics in sad music produced larger activations in the bilateral inferior frontal gyrus (BA 47), the left transverse, middle, and superior temporal gyri (BA 22 and 42), the right superior temporal gyrus (BA 38), and the right inferior parietal lobule (BA 40).
The insula was activated bilaterally (BA 13). In the left hemisphere, the insular cluster extended to the precentral and inferior frontal gyri, encompassing the pars opercularis (BA 44). Conversely, the contrast presence of lyrics in happy music > absence of lyrics in sad music did not yield any significant difference.
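The interaction can be read as a difference of differences: the effect of lyrics in sad music minus the effect of lyrics in happy music. The following Python sketch shows how contrast weights for such a 2 × 2 design (emotion × lyrics) are commonly formed. The condition order, the names in `CONTRASTS`, and the beta values in the usage example are hypothetical; they are not taken from the study's design matrix or analysis software.

```python
import numpy as np

# Hypothetical condition order for a 2 x 2 design (emotion x lyrics).
CONDITIONS = ["sad_lyrics", "sad_instrumental", "happy_lyrics", "happy_instrumental"]

CONTRASTS = {
    "sad > happy": np.array([1, 1, -1, -1]),
    "lyrics > no lyrics": np.array([1, -1, 1, -1]),
    "sad: lyrics > no lyrics": np.array([1, -1, 0, 0]),
    "happy: no lyrics > lyrics": np.array([0, 0, -1, 1]),
    # Difference of differences: (sad_lyrics - sad_instr) - (happy_lyrics - happy_instr)
    "interaction": np.array([1, -1, -1, 1]),
}


def contrast_effect(betas, weights):
    """Return the contrast value c . beta for one voxel's condition estimates."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 0.0):
        raise ValueError("contrast weights are expected to sum to zero")
    return float(weights @ np.asarray(betas, dtype=float))


if __name__ == "__main__":
    betas = [2.0, 0.5, 1.0, 1.2]  # made-up estimates, same order as CONDITIONS
    for name, weights in CONTRASTS.items():
        print(f"{name}: {contrast_effect(betas, weights):.2f}")
```

Because every weight vector sums to zero, each contrast compares conditions rather than measuring overall signal; `contrast_effect` enforces that assumption. The interaction vector is simply the sum of the two simple-effect vectors listed above it.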

DISCUSSION
The present study investigated brain activations in response to sad and happy music with or without lyrics in order to elucidate the effects of linguistic information on the affective experience of the most common everyday musical genres, such as pop and rock. We have thereby contributed to the knowledge of brain areas involved in processing sadness and happiness in ecological musical stimuli, characterized by a wide variety of timbres and sonorities. First, we demonstrated that the behavioral ratings of sad and happy music, regardless of the presence of lyrics, validate the use of self-selected musical excerpts for inducing an emotional experience during fMRI scanning. Second, we found that a number of brain regions were active in response to sad music with lyrics versus sad music without lyrics, and in the interaction between the effects of lyrics on sad versus happy music, whereas no brain activation was obtained in the opposite contrasts. Moreover, significant brain activity in the limbic system and in the auditory cortex was observed in response to instrumental happy music, whereas only temporal regions were active during happy music with lyrics. These results, together with the parallel findings in the behavioral ratings and in the acoustic and perceptual analyses of the musical excerpts, are elaborated below.
Sad music induced activity within the right caudate head and the left thalamus. Interestingly, the left thalamus is one of the few structures (along with a region in the visual cortex and the left insula) found to be consistently active when processing sad faces in a large meta-analysis of 105 neuroimaging studies of facial expressions (Fusar-Poli et al., 2009). This consistency hints at the cross-modal nature of basic emotions evoked by visual or auditory stimuli (cf. Koelsch, 2010). The activation in the right head of the caudate nucleus is also noteworthy. In the left hemisphere, this subcortical striatal region has been associated with demanding speech-related tasks, such as switching between languages in proficient bilinguals (Garbin et al., 2011), but also with judgments of musical and visual beauty (Ishizu and Zeki, 2011). In the right hemisphere, the caudate is activated during reward-based learning and good decision-making (Haber and Brucker, 2009; Haber and Calzavara, 2009), and during listening to highly pleasurable, chill-inducing musical pieces (Salimpoor et al., 2011). Notably, it has been proposed that sad music is more effective than happy music in inducing chills (Panksepp, 1995), suggesting a link between these two affective experiences and the recruitment of the caudate that deserves further investigation in future experiments. An earlier study by Mitterschiffthaler et al. (2007), which investigated the brain correlates of sadness in music (contrasted with a baseline neutral music condition), obtained discrepant results, including activation of the anterior cingulate cortex and of other limbic structures, such as the amygdala and hippocampus. In that study, activations of the ventral striatum and caudate nucleus, related to reward, the subjective experience of pleasure, and the drive to move, were instead found in response to happy music contrasted with neutral music.
When Mitterschiffthaler et al. (2007) contrasted brain responses to happy music directly with those to sad music, as done in the current study, they obtained findings comparable to ours, with activation in the left superior temporal gyrus (BA 22). In our study, the left-hemispheric cluster activated by happy music was larger than in Mitterschiffthaler et al.'s study, also encompassing BA 42 and extending to the insula (BA 13). The lack of full correspondence between the two studies, particularly in the activations to sad music, can be accounted for by the different paradigms used. Mitterschiffthaler et al. (2007) used only five classical music pieces per category, pre-selected by the investigators, whereas here the music was very diverse: over 300 different musical excerpts, individually selected by the subjects, were used. These excerpts were hence comparable in familiarity and in the subjective pleasure they induced (aesthetic and familiarity ratings were also included as covariates in the fMRI analysis), and they varied widely in acoustic features. The other earlier study on sad and happy emotions in music, by Khalfa et al. (2005), obtained activations of the medial and superior frontal gyri (BA 9 and 10) in response to sad minor-mode music contrasted with happy major-mode music. This finding was not replicated here (although we did obtain a significant cluster in BA 10 in response to sad music with lyrics contrasted with sad music without lyrics). Moreover, their study diverges from ours in the selection of musical stimuli and in the nature of the manipulation, which in Khalfa et al. (2005) was restricted to mode and tempo.
www.frontiersin.org
Table 1 | Anatomical labels based on center of mass, MNI coordinates, and Z-score of global maxima within clusters of significant activations (p < 0.001; Z > 3.5; k > 10, with k standing for the number of voxels; distance between clusters > 8 mm).
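The reporting criteria in the Table 1 caption combine a voxel-level cutoff (Z > 3.5) with a cluster-extent cutoff (k > 10 voxels). The Python sketch below shows one way such a rule can be applied to a Z-map held as a 3-D NumPy array in voxel space. It assumes 26-connectivity for defining clusters and uses a plain breadth-first search (SciPy's `scipy.ndimage.label` is the usual library alternative); it is not the software used in the study, and it does not implement the 8 mm separation rule for reporting maxima.

```python
from collections import deque
import itertools

import numpy as np

# 26-connectivity: voxels sharing a face, an edge, or a corner are neighbours.
NEIGHBOUR_OFFSETS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def find_clusters(mask):
    """Group True voxels of a 3-D boolean mask into connected clusters (BFS)."""
    seen = np.zeros(mask.shape, dtype=bool)
    clusters = []
    for start in zip(*np.nonzero(mask)):
        start = tuple(int(i) for i in start)
        if seen[start]:
            continue
        seen[start] = True
        queue, members = deque([start]), []
        while queue:
            voxel = queue.popleft()
            members.append(voxel)
            for offset in NEIGHBOUR_OFFSETS:
                nb = tuple(v + o for v, o in zip(voxel, offset))
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        clusters.append(members)
    return clusters


def threshold_clusters(z_map, z_thresh=3.5, min_voxels=10):
    """Keep clusters of voxels with Z > z_thresh containing more than min_voxels voxels.

    Returns one dict per surviving cluster: its size (k), the voxel index of its
    global maximum, and the Z value there, sorted by decreasing peak Z.
    """
    z_map = np.asarray(z_map, dtype=float)
    results = []
    for members in find_clusters(z_map > z_thresh):
        if len(members) <= min_voxels:  # the caption's criterion is k > 10
            continue
        peak = max(members, key=lambda v: z_map[v])
        results.append({"size": len(members), "peak": peak, "z": float(z_map[peak])})
    return sorted(results, key=lambda c: -c["z"])


if __name__ == "__main__":
    z = np.zeros((20, 20, 20))
    z[2:5, 2:5, 2:5] = 4.0        # 27 suprathreshold voxels
    z[3, 3, 3] = 6.0              # global maximum of that cluster
    z[12:14, 12:14, 12:14] = 5.0  # only 8 voxels: too small, discarded
    print(threshold_clusters(z))  # [{'size': 27, 'peak': (3, 3, 3), 'z': 6.0}]
```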
As mentioned above, the direct comparisons between happy and sad music, and between sad or happy music with versus without lyrics, produced activations of the auditory cortices. As a general pattern, bilateral auditory cortices were recruited for music containing lyrics, with large clusters of activation in the left hemisphere; this is in line with the extensive literature showing the importance of the left superior temporal lobe for phonetic, syntactic, and semantic processing of language (Hickok and Poeppel, 2000; Tervaniemi et al., 2000; Näätänen, 2001; Wong et al., 2008; Vartiainen et al., 2009; Sammler et al., 2010; Schön et al., 2010). Interestingly, the brain activations in response to happy music were focally restricted to the left-hemispheric superior temporal gyrus (and adjacent insula). In line with our initial hypothesis, this selectivity may be explained by the acoustic features of the happy music selected by subjects for this study. As evidenced by the acoustic and perceptual analyses, happy music (particularly that with lyrics) had sharper attack slopes, faster tempos, and higher spectral centroids, and was predominantly in the major mode. A growing body of data supports the related notion that fast spectro-temporal transitions, such as fast-paced sounds with steep attacks, are preferentially processed in the left Heschl's gyrus and in part of the left superior temporal gyrus, whereas slow spectro-temporal transitions favor their right-hemispheric counterparts (Zatorre et al., 2002; Poeppel, 2003; Tervaniemi and Hugdahl, 2003; Schönwiesner and Zatorre, 2009). Instrumental pieces and songs activated two distinct clusters in the posterior lobe of the cerebellum: the former recruited the declive of the vermis (lobule VI in the nomenclature of Schmahmann et al., 1999), whereas the tonsil (lobule IX in the same nomenclature) was more active during listening to songs.
Traditionally associated with sensory-motor processing, the cerebellum, particularly its posterior lobe, has more recently been implicated in the monitoring of cognitive tasks related to musical sounds and in imagining musical production (Langheim et al., 2002; Salmi et al., 2010), which is consistent with our findings. One could also venture a possible association between the activation of the declive in response to instrumental music and subjective emotional experiences, since this cerebellar structure is believed to be recruited by the processing of emotional faces and by empathy for another's pain (Fusar-Poli et al., 2009; Stoodley and Schmahmann, 2009). In addition, music containing lyrics specifically generated brain activity in a network of left-hemispheric areas previously associated with language processing and with semantic memory for object concepts, such as the superior and inferior temporal gyri, the cuneus (also previously recruited during pitch discrimination tasks; Platel et al., 1997), and the putamen (Martin and Chao, 2001; Kotz et al., 2002; Wheatley et al., 2005; Vartiainen et al., 2009; Burianova et al., 2010; for a review, see Price, 2010). The activity in the left putamen, previously associated with the initiation and execution of speech (Price, 2010), along with the cerebellar activity, suggests that subjects might have been covertly singing their familiar songs while listening to them in the scanner.

FIGURE 4 | Effects of the presence or absence of lyrics on emotions and the interaction between lyrics and emotions.
In contrast, emotion-related areas, such as the cingulate cortex (including the anterior cingulate and the cingulate gyrus) and the middle frontal gyrus (BA 9), were more active during instrumental (especially happy) music. In particular, a meta-analysis of the neuroimaging literature has indicated that the medial prefrontal cortex (BA 9) plays a general role in emotional processing, across emotions and sensory modalities (Phan et al., 2002). In our study, activity in the cingulate cortex is likely linked to emotional, attentive processing of self-selected music, and its involvement in the processing of instrumental music is consistent with the behavioral ratings: our participants consistently preferred sad or happy instrumental music to music containing lyrics, and judged instrumental music as more pleasant and even more beautiful. Although we included these subjective aesthetic ratings as covariates in the fMRI analysis, the generally higher emotional impact of instrumental music compared with sung music might have affected our findings. Notably, however, in one study focusing specifically on the neural processing of songs compared with instrumental music, listening to sung words activated limbic areas, including the insula, the parahippocampal gyrus, and the cingulate cortex, when contrasted with listening to spoken words or to singing without words (Schön et al., 2010). These findings hint at an association between songs and affective responses, which in our study was evident particularly for sad songs. The study by Schön et al. (2010), however, neither varied nor controlled for the emotional content of the musical stimuli.
Frontiers in Psychology | Auditory Cognitive Neuroscience
Instrumental music in general, and happy instrumental music in particular, further activated areas encompassing the right pars opercularis of the inferior frontal gyrus, i.e., the right-hemispheric homolog of Broca's area (BA 44). This region is consistently recruited for the processing of chord successions in Western tonal harmony (Maess et al., 2001; Koelsch et al., 2006; Tillmann et al., 2006). The acoustic and perceptual analyses showed that the majority of the happy excerpts were in the major mode and thus likely contained more clear-cut tonal categories than the minor-mode excerpts. The right homolog of Broca's area was hence likely involved in processing the clear harmonic progressions especially present in happy instrumental music.
Importantly, we examined whether the presence of lyrics modulated the perceptual and neural responses associated with the experience of sadness or happiness in music. The perceptual analyses of mode and tempo revealed no differences between music with and without lyrics, and the acoustic analyses revealed only small differences, in particular higher spectral centroids for music with lyrics than for music without lyrics. These findings are reminiscent of a study by Rentfrow and Gosling (2003), who collected the perceptual attributes of 15 songs from 14 different genres and of their lyrics. Although the authors did not directly compare the attributes of the music and the lyrics, their description of the results indicated an almost full correspondence between lyrics and music attributes across genres. For instance, rock, alternative, and heavy metal music were characterized by moderate complexity, high energy, and negative affect in both lyrics and music, whereas genres such as pop and country were defined by simplicity, high positive affect in both music and lyrics, high energy in the lyrics, and low loudness in the sound (Rentfrow and Gosling, 2003). In our study, the most notable differences in the acoustic and perceptual analyses were obtained between happy and sad music (either with or without lyrics): happy music had steeper attack slopes, higher spectral centroids, faster tempos, and a predominantly major mode, whereas sad music had smoother attack slopes, lower spectral centroids, slower tempos, and a predominantly minor mode (see Figure 1). In line with this, a large review of 104 studies of vocal expression and 41 studies of musical performance demonstrated the use of acoustic cues, such as overall F0/pitch level, rate/tempo, and intensity, to convey basic emotions to listeners via both the instrumental and vocal channels (Juslin and Laukka, 2003).
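Two of the acoustic features discussed here, spectral centroid (related to perceptual brightness) and attack slope, can be illustrated with simplified definitions. The Python sketch below assumes a mono signal stored in a NumPy array. The centroid follows the standard amplitude-weighted mean frequency of the magnitude spectrum; the attack slope uses a simplified 10% to 90% rise of a frame-wise RMS envelope. Neither function reproduces the toolbox or parameters used in the study, which are not described in this section; `frame_len`, `lo`, and `hi` are illustrative choices.

```python
import numpy as np


def spectral_centroid(signal, sr):
    """Amplitude-weighted mean frequency (Hz) of the magnitude spectrum."""
    signal = np.asarray(signal, dtype=float)
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    total = magnitudes.sum()
    return float((freqs * magnitudes).sum() / total) if total > 0 else 0.0


def rms_envelope(signal, frame_len):
    """Root-mean-square amplitude of consecutive, non-overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def attack_slope(signal, sr, frame_len=256, lo=0.1, hi=0.9):
    """Envelope rise per second between the first frames reaching lo * max and
    hi * max of the RMS envelope (a simplified rise-time definition)."""
    env = rms_envelope(signal, frame_len)
    if env.size == 0 or env.max() == 0:
        return 0.0
    start = int(np.argmax(env >= lo * env.max()))
    end = int(np.argmax(env >= hi * env.max()))
    if end <= start:
        return 0.0
    duration = (end - start) * frame_len / sr
    return float((env[end] - env[start]) / duration)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr  # one second of samples
    dull = np.sin(2 * np.pi * 500 * t)
    bright = np.sin(2 * np.pi * 2000 * t)
    print(spectral_centroid(dull, sr), spectral_centroid(bright, sr))
    fast = np.minimum(t / 0.05, 1.0) * np.sin(2 * np.pi * 1000 * t)  # 50 ms ramp
    slow = np.minimum(t / 0.5, 1.0) * np.sin(2 * np.pi * 1000 * t)   # 500 ms ramp
    print(attack_slope(fast, sr), attack_slope(slow, sr))
```

In this toy example the higher-pitched tone has the higher centroid, and the tone with the 50 ms onset ramp has a steeper attack slope than the one with a 500 ms ramp, mirroring the direction of the happy versus sad differences described above.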
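Nevertheless, in our data, sad music with lyrics and happy music without lyrics were characterized by mid-range values of the acoustic features, with no significant differences between these two categories. Since these acoustically similar categories were nonetheless classified as sad and happy, respectively, acoustic cues alone cannot account for their emotional classification. This suggests that the semantic message conveyed by lyrics may play a larger role than the acoustic cues present in the music itself.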
Our fMRI data converge with the behavioral data in suggesting that emotions induced by happy music without lyrics and by sad music with lyrics are experienced more deeply. Here, happy music without lyrics, compared with happy music with lyrics, more strongly activated structures previously associated with the perception and recognition of basic emotions in both language and visual modalities, in particular the left anterior cingulate cortex (BA 24), the right insula (BA 13), and the middle frontal gyrus (BA 9; Phan et al., 2002). Similarly, the behavioral results show that positive emotions are felt more clearly with instrumental happy music. On the other hand, sad music with lyrics (versus without lyrics) elicited wider brain activity, specifically in regions related to language processing, such as the left inferior frontal gyrus (BA 44 and 47; Grodzinsky, 2000) and the left superior temporal gyrus (Zatorre et al., 2002); in addition, this contrast revealed activity in several emotion-related areas, including the right claustrum, the left medial frontal gyrus (BA 10), the bilateral amygdala, and the right parahippocampal gyrus. The latter two emotion-related structures were also recruited by sad instrumental music contrasted with neutral music in the study by Mitterschiffthaler et al. (2007), whereas the medial frontal gyrus (BA 10) has been associated with processing minor (sad) musical excerpts versus major ones (Khalfa et al., 2005) and with judging a rhythm pattern as beautiful (Kornysheva et al., 2010). The interplay between acoustic and semantic information in musical emotions, evident from the direct contrasts, was further supported by the interaction between the presence of lyrics in sad music versus happy music without lyrics.
In detail, lyrics in sad music specifically activated left-hemispheric areas related to syntactic and semantic language processing and to speech execution, including the inferior frontal gyrus (BA 47 and 44), the putamen, the auditory areas (BA 42 and 22), and the inferior parietal lobule (BA 40), along with the emotion-related insula (BA 13). In contrast, happy music with lyrics did not recruit any additional neural activity in comparison to sad music with lyrics. Similarly, behavioral studies have shown the efficacy of instrumental music in conveying positive emotions, whereas sad emotions are reinforced and better represented when lyrics are present (Ali and Peynircioglu, 2006).
The claustrum, found in our study to be active during sad music with lyrics, is a small, lesser-known structure located lateral to the putamen and beneath the insula; it receives afferent projections from almost all other areas of the brain and projects to most of them. Given the spatial resolution of fMRI and the small size of the claustrum (about 5 mm), it is hard to distinguish the activation of this region from that of the nearby insula (which was indeed active in the interaction between sad music with lyrics and happy music without lyrics). The better-known insula has been associated with the "feeling" of emotion, i.e., its subjective experience (Damasio et al., 2000). Insula activity correlated positively with the intensity of chills induced by favorite classical music in musicians (Blood and Zatorre, 2001), and in non-musicians the insula was more active in response to pleasant classical music pieces than to the same pieces distorted to sound dissonant. In their review, Molnar-Szakacs and Overy (2006) proposed that the anterior insula, owing to its anatomical connections with both the limbic system and the motor system, could be a key structure for emotional awareness, linking emotion processing with motor representations. In music, this structure would contribute to producing an integrated musical representation.
Sad music with lyrics was further associated with neural activity in the posterior part (pars triangularis) of the bilateral inferior frontal gyrus (BA 47), the rostral part of the inferior parietal lobule (BA 40), and the lower part of the left precentral gyrus extending to the pars opercularis of the inferior frontal gyrus (BA 44). All these regions are thought to belong to the human "mirror neuron" system, which is activated both by an individual's own motor production and by the perception of motor acts performed by others (Rizzolatti and Craighero, 2004; Morin and Grezes, 2008; Fadiga et al., 2009). Specifically, the inferior parietal lobule has been related to motor representations of the mouth. A neighboring structure, the rolandic operculum, located at the ventral end of the somatosensory homunculus (for a review, see Fadiga et al., 2009), has been suggested to contain the somatosensory representation of the larynx (hence the somatosensory counterpart of BA 40). Furthermore, the left-hemispheric pars triangularis (BA 47) and pars opercularis (BA 44), belonging to Broca's area, have been related to motor language representations, speech articulation, and language syntax (Grodzinsky, 2000). More recent hypotheses about the function of Broca's area point to its role in the cross-modal processing of language, chord sequences, or even movement programs organized according to hierarchies of importance (Koechlin and Jubault, 2006; Koelsch, 2006; Tettamanti and Weniger, 2006). Overall, these clusters of activation in response to sad music with lyrics might be explained by involuntary imagined singing along with familiar, self-selected sad songs. Actual singing during fMRI scanning can be excluded based on the answers to the post-experimental questionnaire; however, involuntary imagined singing cannot.
These findings, together with the behavioral data, support the hypothesis that lyrics are more important for inducing sad emotions through music, whereas instrumental cues are more significant for inducing happy emotions. In particular, the contrast between sad music with lyrics and sad music without lyrics (Table 1E) reveals activity within the limbic system, suggesting greater emotional induction by sad music with lyrics. Conversely, the contrast between happy music without and with lyrics (Table 1H; compare with the opposite contrast in Table 1G) reveals limbic activity only in response to happy instrumental music. Further support for this idea comes from the effects of lyrics on sad versus happy music (Table 1I). It could be ventured that vocal (rather than semantic) information, particularly in sad music, contains subtle cues that activate deep emotion-related structures by association with ancient vocalizations (cf. Juslin and Västfjäll, 2008). In addition, the content of the lyrics might evoke mental associations with negative emotional experiences and hence limbic and paralimbic neural activity. Testing these explanations will require new studies.
In sum, these findings generalize and broaden our understanding of the neural processing of musical emotions, songs, and instrumental music. Unlike previous studies that used experimenter-designed stimulus sequences or musical excerpts from the classical repertoire, we expanded the stimulus set to over 300 self-selected musical excerpts from various genres. Most importantly, we were able to discern the contribution of lyrics to this process. Blood and Zatorre (2001) first introduced a paradigm that used subjects' own music selections to induce a reproducible and intense emotional experience, possibly enhanced by familiarity and personal associations (see also Salimpoor et al., 2011). Such a paradigm, even when combined with a further experimenter-based selection of stimuli according to predetermined criteria, raises concerns about the control of all the factors involved in the affective responses. In this study, we obtained behavioral ratings for each musical excerpt, thus assessing some crucial dimensions of the affective and aesthetic musical experience. We also computationally extracted the acoustic features of attack slope and spectral centroid and tested their influence on the emotional connotations of happiness and sadness in music. These behavioral and acoustic measures were included as covariates in the fMRI analysis in order to minimize possible confounds arising from such an ecological paradigm. We therefore consider the conclusions drawn from our findings to be strengthened by the inclusion of these covariates in our statistical design. Nevertheless, follow-up studies directly comparing ecological stimuli with highly controlled ones are needed to isolate the effects of acoustic variability and monotony on the musical affective experience.
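The logic of entering behavioral and acoustic measures as covariates can be illustrated with a toy general linear model. The Python sketch below assumes trial-wise responses from a single voxel, no hemodynamic convolution, no noise, and hypothetical pleasantness ratings in which instrumental excerpts are rated higher (as our participants tended to do). It only shows how a covariate that differs between conditions can produce a spurious condition difference when omitted; it is not the model specification used in the study, which is not described in this section.

```python
import numpy as np


def fit_glm(y, conditions, covariates=None):
    """Least-squares fit of trial responses y on condition indicators
    (one column per condition) plus optional mean-centred covariates.

    Returns a dict mapping each condition label to its beta estimate.
    """
    cond = np.asarray(conditions).astype(str)
    labels = sorted(set(cond.tolist()))
    X = np.column_stack([(cond == lab).astype(float) for lab in labels])
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(len(cond), -1)
        # Centring keeps condition betas interpretable at the average covariate value.
        X = np.column_stack([X, cov - cov.mean(axis=0)])
    betas, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return {lab: float(b) for lab, b in zip(labels, betas[: len(labels)])}


if __name__ == "__main__":
    # Hypothetical data: the response is driven only by pleasantness,
    # so there is no true effect of lyrics.
    conditions = np.array(["with_lyrics"] * 4 + ["without_lyrics"] * 4)
    pleasantness = np.array([1, 2, 2, 3, 3, 4, 4, 5], dtype=float)
    y = 1.0 + 0.5 * pleasantness
    naive = fit_glm(y, conditions)
    adjusted = fit_glm(y, conditions, covariates=pleasantness)
    print("without covariate:", naive["with_lyrics"] - naive["without_lyrics"])
    print("with covariate:", adjusted["with_lyrics"] - adjusted["without_lyrics"])
```

Without the covariate, the model attributes the pleasantness difference to the lyrics factor; with it, the estimated effect of lyrics shrinks to zero, which is the kind of confound the covariates were intended to minimize.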
Taken together, the current results converge to show that the presence of lyrics has differential effects on happy and sad music. Lyrics appear to be crucial for defining the sadness of a musical piece, as reflected in the activation of limbic system areas of the brain, whereas acoustic cues play a stronger role in determining the experience of happiness in music, as shown by activity in auditory cortical regions.