# THE JANUS-FACE OF LANGUAGE: WHERE ARE THE EMOTIONS IN WORDS AND THE WORDS IN EMOTIONS?

EDITED BY: Cornelia Herbert, Thomas Ethofer, Andreas J. Fallgatter, Peter Walla and Georg Northoff

PUBLISHED IN: Frontiers in Psychology

#### Frontiers Copyright Statement

© Copyright 2007-2018 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

ISSN 1664-8714; ISBN 978-2-88945-550-8; DOI 10.3389/978-2-88945-550-8

# THE JANUS-FACE OF LANGUAGE: WHERE ARE THE EMOTIONS IN WORDS AND THE WORDS IN EMOTIONS?

Topic Editors:

Cornelia Herbert, Ulm University, Germany; Thomas Ethofer, University of Tuebingen, Germany; Andreas J. Fallgatter, University of Tuebingen, Germany, LEAD Graduate School, University of Tuebingen, Germany; Peter Walla, Webster Vienna Private University, Austria, University of Newcastle, Australia and Vienna University, Austria; Georg Northoff, University of Ottawa, Canada

Cover image design: kiz, Ulm University, Germany

Text and content: Cornelia Herbert, Ulm University, Germany

Language has long been considered independent from emotions. In the last few years, however, research has accumulated empirical evidence against this theoretical belief of a purely cognitive-based foundation of language. In particular, research on emotional word processing has shown that processing emotional words activates emotional brain structures, elicits emotional facial expressions and modulates action tendencies of approach and avoidance, probably in a similar manner to the processing of non-verbal emotional stimuli. In addition, it has been shown that emotional content is already processed in the visual cortex in a facilitated manner.

Yet, this is only one side of the coin. Very recent research putting words into context suggests that language may also change and regulate emotions and that by studying word processing one can provide a window to one's own feelings. All in all, the empirical observations support the thesis of a close relationship between language and emotions at the level of word meaning as a specific evolutionary achievement of the human species. But what does this relationship between written words and emotions theoretically imply for the processing of emotional information?

The present Research Topic and its related articles aim to provide answers to this question. This book comprises several experimental studies investigating the brain structures and the time course of emotional word processing. Included are studies that examine the affective core dimensions underlying affective word processing and that show how these basic affective dimensions influence word processing in general and studies that investigate the interaction between words, feelings and (expressive) behavior. In addition, new impetus comes from studies that on the one hand investigate how task-, sublexical and intrapersonal factors influence emotional word processing and on the other hand extend emotional word processing to the domains of social context and self-related processing. Finally, future perspectives are outlined including research on emotion and language acquisition, culture and multilingualism.

In summary, this textbook offers scientists from different disciplines insight into the neurophysiological, behavioral and subjective mechanisms that may underlie emotion and language interactions. It gives new impulses to existing theories on the embodiment of language and emotion and provides new ways of looking at emotion-cognition interactions.

Citation: Herbert, C., Ethofer, T., Fallgatter, A. J., Walla, P., Northoff, G., eds. (2018). The Janus-Face of Language: Where Are the Emotions in Words and the Words in Emotions? Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-550-8

# Table of Contents

*Editorial: The Janus Face of Language: Where are the Emotions in Words and Where are the Words in Emotions?*

Cornelia Herbert, Thomas Ethofer, Andreas J. Fallgatter, Peter Walla and Georg Northoff

# CHAPTER 1

# EMOTIONAL WORD PROCESSING – WHERE ARE THE EMOTIONS IN WORDS AND THE WORDS IN EMOTIONS: CURRENT THEORIES, PERSPECTIVES AND AFFECTIVE CORE DIMENSIONS


Sylvie Moritz-Gasser, Guillaume Herbet and Hugues Duffau

*The Role of Language in Emotion: Predictions From Psychological Constructionism*

Kristen A. Lindquist, Jennifer K. MacCormack and Holly Shablack

*10 years of BAWLing into Affective and Aesthetic Processes in Reading: What are the Echoes?*

Arthur M. Jacobs, Melissa L.-H. Võ, Benny B. Briesemeister, Markus Conrad, Markus J. Hofmann, Lars Kuchinke, Jana Lüdtke and Mario Braun

*Affective Norms for 4900 Polish Words Reload (ANPW\_R): Assessments for Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability and, Age of Acquisition*

Kamil K. Imbir

# CHAPTER 2

# EMOTIONAL WORD PROCESSING – TIME COURSE, BRAIN STRUCTURES, EFFECTS OF TASKS, SUBLEXICAL AND INTRAPERSONAL FACTORS


Marina Palazova

*Neural Correlates of an Early Attentional Capture by Positive Distractor Words*

José A. Hinojosa, Francisco Mercado, Jacobo Albert, Paloma Barjola, Irene Peláez, Cristina Villalba-García and Luis Carretié

*Implicit and Explicit Attention to Pictures and Words: An fMRI-Study of Concurrent Emotional Stimulus Processing*

Tobias Flaisch, Martin Imhof, Ralf Schmälzle, Klaus-Ulrich Wentz, Bernd Ibach and Harald T. Schupp

*The Emotion Potential of Simple Sentences: Additive or Interactive Effects of Nouns and Adjectives?*

Jana Lüdtke and Arthur M. Jacobs


# CHAPTER 3

# GOING BEYOND SINGLE WORDS – THE IMPACT OF SELF-REFERENCE, SOCIAL RELEVANCE AND COMMUNICATIVE CONTEXT ON EMOTIONAL WORD AND SENTENCE PROCESSING


Patrick P. Weis and Cornelia Herbert

*It's all in Your Head – How Anticipating Evaluation Affects the Processing of Emotional Trait Adjectives*

Sebastian Schindler, Martin Wegrzyn, Inga Steppacher and Johanna Kissler


Marc P. Bennett, Ann Meulders, Frank Baeyens and Johan W. S. Vlaeyen

# CHAPTER 4

# WHERE ARE THE WORDS IN EMOTIONS? FUTURE PERSPECTIVES, EMOTIONAL LANGUAGE ACQUISITION, BI- AND MULTILINGUALISM AND POETIC AESTHETICS


# Editorial: The Janus Face of Language: Where Are the Emotions in Words and Where Are the Words in Emotions?

Cornelia Herbert<sup>1</sup>\*, Thomas Ethofer<sup>2,3</sup>, Andreas J. Fallgatter<sup>2,4</sup>, Peter Walla<sup>5,6,7</sup> and Georg Northoff<sup>8</sup>

<sup>1</sup> Department of Applied Emotion and Motivation Research, Institute of Psychology and Education, University of Ulm, Ulm, Germany, <sup>2</sup> Department of Psychiatry, University of Tübingen, Tübingen, Germany, <sup>3</sup> Department for Biomedical Resonance, University of Tübingen, Tübingen, Germany, <sup>4</sup> LEAD Graduate School, University of Tuebingen, Tuebingen, Germany, <sup>5</sup> Cognitive Neuroscience & Behaviour Lab (CanBeLab), Department of Psychology, Webster Vienna Private University, Vienna, Austria, <sup>6</sup> School of Psychology, University of Newcastle, Callaghan, NSW, Australia, <sup>7</sup> Faculty of Psychology, Vienna University, Vienna, Austria, <sup>8</sup> Mind, Brain Imaging and Neuroethics, University of Ottawa Institute of Mental Health Research, University of Ottawa, Ottawa, ON, Canada

Keywords: emotion and language, embodiment, neuronal and behavioral mechanisms of emotional word processing, language as context, mood, multilingualism, poetic aesthetics, self-reference

**Editorial on the Research Topic**

**The Janus Face of Language: Where Are the Emotions in Words and Where Are the Words in Emotions?**

#### Edited and reviewed by:

Manuel Carreiras, Basque Center on Cognition, Brain and Language, Spain

\*Correspondence: Cornelia Herbert, cornelia.herbert@uni-ulm.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 25 March 2018; Accepted: 16 April 2018; Published: 17 May 2018

#### Citation:

Herbert C, Ethofer T, Fallgatter AJ, Walla P and Northoff G (2018) Editorial: The Janus Face of Language: Where Are the Emotions in Words and Where Are the Words in Emotions? Front. Psychol. 9:650. doi: 10.3389/fpsyg.2018.00650

# INTRODUCTION: WHERE ARE THE EMOTIONS IN WORDS?

We text, blog, twitter and tweet; we write each other emails, poems and love letters. Throughout human history, people have used language to communicate emotions and feelings, knowing well that words can hurt or heal. Thus, considering everyday experiences, there is no doubt that written language constitutes a most powerful tool for inducing emotions in self and others and for eliciting emotional responses in the sender and perceiver of a message, even when no direct face-to-face communication is possible.

However, what happens so naturally and effortlessly in everyday life has become a subject of intensive scientific debate. Can language, specifically written language in the form of single words, elicit emotions? And if so, where are the emotions in words and where are the words in emotions?

Theoretically, the answer to these questions is anything but trivial. Traditionally, language has been considered a purely cognitive function of the human mind: a property of the mind that evolved for the purpose of representing individual experiences in an abstract way, independent from sensory and motor experience and independent from bodily sensations, including emotions (for a discussion see Chapter 1 in this book). In this view, reading emotion-related or emotional words such as "snake" or "fear" may activate the semantic meaning of the word, including its emotional meaning; readers may even infer from reading that snakes are harmful and threatening creatures. Nonetheless, this knowledge would be stored in a purely amodal fashion. As a result, readers would be unable to feel bodily and affectively what they are reading, because the crucial link between mental states and the sensory, motor and peripheral (bodily) changes characterizing emotions would be missing. In other words, viewed from a purely cognitive approach to language, emotions and their perceptual, sensory and motor consequences can be expressed linguistically. However, the linguistic description and semantic representation of an emotion will not be accompanied by physiological bodily changes, by affective experiences of arousal, by bodily feelings of pleasure and displeasure, or by changes in motivational behavior of approach or avoidance.

In recent years, the understanding of mind-body interactions and of the role language may play in emotion processing and emotion regulation has changed. In the past 15 years, a number of studies have been conducted at the interface of language and emotion, and most if not all of them have accumulated empirical evidence against the theoretical belief of a purely cognitive-based foundation of language (e.g., see Chapters 1–4 in this book).

# EMOTIONAL WORD PROCESSING—CORE DIMENSIONS, TIME COURSE AND BRAIN STRUCTURES

Several studies investigated the neurophysiological and psychophysiological correlates of emotional word processing to determine whether the processing of emotions from words and the processing of emotions from pictures or faces share the same neurophysiological mechanisms (e.g., Kissler et al., 2006, 2007; Herbert et al., 2008; Citron, 2012; Mavratzakis et al., 2016; see Bayer and Schacht; Palazova in this book for an overview). Methodologically, most studies used high-density electroencephalography (EEG) or functional magnetic resonance imaging (fMRI) techniques, either alone or in combination with behavioral and subjective self-report measures.

Studies of emotional word processing in the brain showed that reading emotional words vs. neutral words increases neural activity in the ventral visual processing stream (involved in object recognition) within the first 200 ms after word presentation, i.e., in the same early time windows reported in EEG studies investigating emotional picture processing (e.g., Junghöfer et al., 2001; Kissler et al., 2007; Herbert et al., 2008; see Bayer and Schacht in this book for an overview). For words, even earlier emotional facilitation effects have occasionally been reported, indicating that emotional content is able to circumvent in-depth semantic analysis (e.g., Kissler and Herbert, 2013; see Palazova in this book for discussion). Interestingly, results from imaging studies suggested that these stimulus-driven neural activity changes are likely to be caused by reentrant processing between the amygdala and the ventral visual processing system (e.g., Herbert et al., 2009; see Flaisch et al.; Eden et al. in this book for a discussion). Furthermore, in some instances, processing of emotional words may also lead to changes in approach and avoidance behavior (e.g., Herbert and Kissler, 2010) and to specific approach- and avoidance-related behavioral response patterns (see Citron et al. for an overview in this book).

Taken together, the results do not support the idea that language representations in the brain are cut off from perception, actions and emotions. Instead, the results argue in favor of a common coding principle of how the brain represents and processes emotional information, be this information abstract or concrete, verbal or non-verbal. Emotional stimuli may therefore, regardless of the stimulus type (pictures, faces or words), elicit changes in central autonomic arousal and in specific appraisals related to pleasure and displeasure. Nevertheless, as suggested by functional imaging and EEG source analysis studies, the activation of brain structures involved in the top-down regulation of visual attention can differ significantly during processing of emotional pictures vs. emotional words, and also when words are used as task-related distracters vs. targets (e.g., see Flaisch et al.; Hinojosa et al. in this book).

This raises questions about whether the emotional content of a word is embodied during reading: i.e., can readers affectively experience and feel what they are reading? And if so, when during reading does this kind of embodied processing occur? Undeniably, many if not all languages are rich in emotional words, suggesting a tight connection between written words and felt emotions. Previous research exploring the structure of affective ratings in large emotional word corpora and different languages suggested two emotional dimensions: valence (positive vs. negative) and arousal (being physiologically calm vs. aroused). These two factors seem to explain most of the variance of the affective ratings of words (e.g., see Jacobs et al. in this book for an overview). More recent studies found that other stimulus appraisal factors related to sensing, acting and feeling may also play a role (e.g., see Imbir et al.; Jacobs et al. in this book for an overview). All in all, this suggests, on the one hand, a fast and non-reflective appraisal of words according to the bodily arousal they evoke and, on the other hand, a temporally slower and reflective evaluation of the word's personal, motivational or emotional relevance.

The distinction between pre-reflective, arousal-driven vs. reflective and valence-driven appraisal checks is well in line with what EEG studies investigating the time course of emotional word processing during passive reading, lexical decision or rapid serial visual presentation suggested (Herbert et al., 2006, 2008; Kissler et al., 2007, 2009; Carretié et al., 2008; Schacht and Sommer, 2009; Hinojosa et al., 2010): a rapid and selective processing of highly arousing emotional words of positive and negative valence in the time window of, for instance, the early posterior negativity (EPN) and a temporally later in-depth semantic processing of emotional words according to their emotional valence (positive vs. negative) in, for instance, the time windows of the N400 and LPP (e.g., Herbert et al., 2006, 2008; Kissler et al., 2009; see Palazova; Bayer and Schacht in this book for a discussion). Therefore, the emotional significance of a word may be quickly appraised according to its physiological arousal and its emotional intensity. However, at these early, bottom-up driven stages of emotional word processing, the subjective feelings that arise from this processing may not yet be consciously, conceptually and semantically available to the reader, although they arise from verbal input (Herbert, 2015; see e.g., Lindquist; Ensie Abbassi et al. in this book for a theoretical discussion). Subjective feelings may become consciously, conceptually and semantically available to the reader only during later stages of word processing.

# EFFECTS OF MOOD, INTRAPERSONAL AND SUBLEXICAL FACTORS INCLUDING COMPARISONS ACROSS STIMULUS TYPES

Moreover, sublexical factors such as phonological iconicity (sound-to-meaning correspondences) and intrapersonal factors (e.g., subjective mood, anxiety) can influence emotional word processing (e.g., Eden et al.; Sereno et al.; Ullrich et al. in this book). Sublexical factors may modulate even the early, stimulus-driven stages of emotional word processing (Ullrich et al. in this book). Furthermore, anxiety may modulate activity in emotion structures such as the amygdala (involved in emotion detection and emotional response selection) in associative word-learning paradigms (Eden et al. in this book), whereas positive mood may change lexical decisions for positive and negative words via a broadening of attention (Sereno et al. in this book). Also, a mood state induced via positive or negative film clips may significantly affect syntactic processing of words. Thus, the interaction between emotion and language can go beyond semantic processing levels (see Verhees et al. in this book).

Nevertheless, an early stimulus tagging stage seems obligatory for all types of emotional stimuli (faces, words, pictures). This is also suggested by studies that compared the time course of emotional picture, emotional face and emotional word processing. These studies suggest that pictures, faces and words do evoke the same electrophysiological signals (e.g., an early posterior negativity component, EPN, as well as a late positive potential, LPP), but the emotion effects elicited at later processing stages may be stimulus-type specific due to a positivity offset elicited by the overall lower arousal levels of words vs. faces and pictures (see Bayer and Schacht; Lüdtke and Jacobs in this book for a discussion of EEG and behavioral results).

# EMOTIONAL WORD PROCESSING—CURRENT THEORIES AND PERSPECTIVES

What many emotional word processing studies still leave open, though, is whether the results summarized above are more compatible with traditional associative network models, interactive dual processing models or an embodied account of word processing. Associative network models of emotions assume that emotional content conveyed by an abstract symbol such as a word or a concrete emotional stimulus such as a picture is rapidly mapped onto conceptual knowledge stored in associative memory networks. The information stored in these networks as nodes includes links to the operations, use, and purpose of the stimulus, as well as its emotional and physiological consequences (e.g., Lang, 1979; Bower, 1981). Importantly, activation of these networks is assumed to partially reactivate the perceptual processing, feeling and action patterns that occur when directly confronted with an emotion-inducing event in real time. This assumption is also shared by theories of embodied cognition, which view knowledge as grounded in perception and action. Dual processing models (e.g., Paivio, 2010) as well as embodied theories of language processing (e.g., Barsalou et al., 2008) distinguish between two processing systems. The two theories disagree on how concrete and abstract stimuli are processed by the two proposed systems (see Vigliocco et al., 2009; Kousta et al., 2011; Paivio, 2013, for a discussion). Embodied theories propose a fast linguistic system and a temporally slower imagery-based simulation system (see Abbassi et al. in this book). Additionally, they assume that experiencing emotions through abstract words is possible only through simulation or reenactment. Theoretically, it has been proposed that on a cortical level, embodied processing of emotional words is lateralized to the right hemisphere, whereas a pure linguistic and probably "cold" appraisal of words is more strongly associated with left-hemisphere activation (see Abbassi et al.; Moritz-Gasser et al. in this book).

# GOING BEYOND SINGLE WORDS—THE IMPACT OF SELF-REFERENCE, SOCIAL RELEVANCE AND COMMUNICATIVE CONTEXT ON EMOTIONAL WORD AND SENTENCE PROCESSING

Compelling evidence that emotional content conveyed by abstract symbols such as words can elicit consciously retrievable affective feeling states comes from recent studies that extended emotion word processing to the domains of social cognition. Going beyond single words, a number of these studies use sentences that differ in self-reference (see Fields and Kuperberg in this book). Other studies use compound stimuli consisting of pronoun- and article-noun pairs that refer to the reader's own emotions (e.g., "my fear," "my pleasure"), refer to the emotion of another person ("his fear," "his pleasure"), or contain no particular personal reference (see Weis and Herbert in this book). Some studies use more complex designs in which participants read emotional trait adjectives in anticipation of an evaluation by a significant communicative sender (see Schindler et al. in this book). Generally speaking, these studies allow a detailed analysis of where and when in the processing stream emotional meaning is discriminated from neutral meaning as a function of the communicative context and the stimuli's personal or social reference (self, other, no reference). Crucially, one particular observation of these studies is that self-reference impacts emotional word processing during later stages of cortical processing, i.e., after an in-depth semantic analysis (N400, LPP) (see Fields and Kuperberg in this book; see also Herbert et al., 2011a,b). Moreover, the self-reference of an emotional word seems to selectively enhance activity in cortical midline structures, possibly generating an awareness, feeling or evaluation that this stimulus and its content refer to one's own emotion (see Herbert et al., 2011c). Nevertheless, the evaluation of self-related emotional words in reference to one's own feelings may not be accompanied by stronger emotional expressive behavior or by stronger physiological changes in heart rate or skin conductance. Instead, it appears that appraising other-related emotional words (e.g., "his happiness") in reference to one's own feelings elicits significant changes in facial muscle activity (see Weis and Herbert in this book).

Taken together, the results of the studies presented in Chapter 3 argue in favor of a differentiated view of embodied emotional word processing. The studies suggest that the social relevance of the emotional words needs to be taken into consideration. Interestingly, anticipating the evaluation by a communicative partner seems to be sufficient to increase the relevance of an emotional word. This seems to facilitate even early cortical processing in the EPN time window (see Schindler et al. in this book). Moreover, recent studies have extended emotional word processing to the domain of verbal fear learning and symbolic generalization (see Bennett et al. in this book), to grammatical aspects in political speech (see Havas and Chapp in this book), and to the general affective meaning of a word in poetic texts (see Ullrich et al. in this book).

# WHERE ARE THE WORDS IN EMOTIONS? AFFECT LABELING, EMOTIONAL LANGUAGE ACQUISITION, MULTILINGUALISM AND POETIC AESTHETICS

Although the results reviewed above clearly support the notion that words can elicit emotions, there is another line of research showing that language processing can also regulate and change the perception of non-verbal emotional signals (e.g., Lieberman et al., 2011; Herbert et al., 2013; see Lindquist et al. in this book for discussion). Viewed from a developmental perspective of the human brain, emotion processing may be significantly influenced by language as soon as children learn to use words and verbal labels for emotion expression and emotion categorization (see Lindquist et al. in this book). This implies that in the adult brain, language and emotions are inextricably intertwined, influencing each other on different levels of cerebral, peripheral, subjective and behavioral responding. Given this bidirectional link between emotion and language, experimental approaches probing the learning of new emotion concepts in adults in different languages, as well as approaches investigating emotion processing in mono- vs. bilingual or multilingual speakers, seem to be especially fruitful for better understanding this interaction (e.g., see Caldwell-Harris; Ferré et al. in this book).

# CONCLUSION

As outlined above, the articles included in this book The Janus Face of Language: Where are the Emotions in Words and Where are the Words in Emotions? can provide a conclusive theoretical

# REFERENCES


and empirical answer to the questions raised by the Topic Editors Herbert, Ethofer, Fallgatter, Walla, and Northoff. The authors of the in total 24 articles theoretically and empirically illuminate the key aspects of the relationship between language and emotion. They provide answers to how information about an emotion is decoded from abstract stimuli such as words, and how the emotional content of a word is processed in the brain. They furthermore highlight the role bodily physiological changes and self- and socially relevant contexts play in the processing and generation of emotional word meaning.

# SUMMARY AND STRUCTURE OF THE CHAPTERS

The articles are grouped into four chapters: **Chapter 1** comprises articles with a strong theoretical focus. These articles discuss recent theoretical views that exist in explaining the emotion-language link with regard to written language. In addition, empirical research focusing on word corpora analyses is included in Chapter 1 investigating the major core affective dimensions underlying the appraisal of emotional words in different languages. **Chapter 2** comprises several experimental studies investigating the brain structures and the time course of emotional word processing. These studies also lay special focus on the effects of task-, sublexical, and intrapersonal factors. Moreover, they shed light on the questions of how affective core dimensions (e.g., emotional valence, emotional arousal or affective origin) influence emotion word processing, the interaction between words and the direction of behavior (approach vs. withdrawal). The studies summarized in **Chapter 3** extend emotional word processing to the domains of social cognition. They provide evidence that the interaction between words and emotions must also be seen in a broader context that takes intrapersonal (self-reference), social factors (senderreceiver characteristics) and the sender's communicative intentions into consideration. Finally, the studies summarized in **Chapter 4** extend the research on emotional word processing to the domains of aesthetics and poetic text, bi- and multilingualism, i.e., areas of psycholinguistic and psychological language research that have developed only recently.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial intellectual contribution to the work, and approved it for publication.

Carretié, L., Hinojosa, J. A., Albert, J., López-Martín, S., de la Gándara, B. S., Igoa, J. M., et al. (2008). Modulation of ongoing cognitive processes by emotionally intense words. Psychophysiology 45, 188–196. doi: 10.1111/j.1469-8986. 2007.00617.x

Citron, F. M. (2012). Neural correlates of written emotion word processing: a review of recent electrophysiological and hemodynamic neuroimaging studies. Brain Lang. 122, 211–226. doi: 10.1016/j.bandl.2011.12.007


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Herbert, Ethofer, Fallgatter, Walla and Northoff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Emotional words can be embodied or disembodied: the role of superficial vs. deep types of processing

*Ensie Abbassi1\*, Isabelle Blanchette2, Ana I. Ansaldo1, Habib Ghassemzadeh3,4 and Yves Joanette1*

*<sup>1</sup> Centre de Recherche, Institut Universitaire de Gériatrie de Montréal and Faculté de Médecine, Université de Montréal, Montréal, QC, Canada, <sup>2</sup> Département de Psychologie, Université du Québec à Trois-Rivières, Trois-Rivières, QC, Canada, <sup>3</sup> Department of Psychiatry, Tehran University of Medical Sciences, Tehran, Iran, <sup>4</sup> Visiting Scholar, University of Oregon, Eugene, OR, USA*

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy-Tübingen, Germany*

#### *Reviewed by:*

*Debbie L. Mills, Bangor University, UK Suzanne Oosterwijk, University of Amsterdam, Netherlands*

#### *\*Correspondence:*

*Ensie Abbassi, Centre de Recherche, Institut Universitaire de Gériatrie de Montréal and Faculté de Médecine, Université de Montréal, 4565 Queen-Mary Road, Montreal, QC H3W 1W5, Canada ensie.abbassi@umontreal.ca*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 31 January 2015 Accepted: 29 June 2015 Published: 09 July 2015*

#### *Citation:*

*Abbassi E, Blanchette I, Ansaldo AI, Ghassemzadeh H and Joanette Y (2015) Emotional words can be embodied or disembodied: the role of superficial vs. deep types of processing. Front. Psychol. 6:975. doi: 10.3389/fpsyg.2015.00975*

Emotional words are processed rapidly and automatically in the left hemisphere (LH) and slowly, with the involvement of attention, in the right hemisphere (RH). This review aims to find the reason for this difference and suggests that emotional words can be processed superficially or deeply due to the involvement of the linguistic and imagery systems, respectively. During superficial processing, emotional words likely make connections only with semantically associated words in the LH. This part of the process is automatic and may be sufficient for the purpose of language processing. Deep processing, in contrast, seems to involve conceptual information and imagery of a word's perceptual and emotional properties using autobiographical memory contents. Imagery and the involvement of autobiographical memory likely differentiate between emotional and neutral word processing and explain the salient role of the RH in emotional word processing. It is concluded that the level of emotional word processing in the RH should be deeper than in the LH and, thus, it is conceivable that the slow mode of processing adds certain qualities to the output.

Keywords: emotional words, meaning access, conceptual processing, disembodied/embodied, superficial/deep, cerebral hemispheres

# Introduction

This paper concerns emotional words and how these words are processed in the cerebral hemispheres. Our previous review (Abbassi et al., 2011), based on behavioral, electrophysiological, and neuroimaging research results, indicates that both hemispheres are involved in the processing of emotional words, albeit in different and probably complementary ways. Emotional words are processed rapidly early in processing, and slowly with the involvement of attention later on; the left hemisphere (LH) and the right hemisphere (RH) are likely responsible for this early vs. later stage of processing, respectively<sup>1</sup>. Automatic processing does not place much demand on processing resources whereas attentional processing is slow, effortful, and under one's active control. This paper aims to pinpoint the nature of emotional word processing and find the reason for the rapid vs. slow modes of processing. It shows that emotional word processing does not necessarily produce a subjective experience of emotions (though it may); thus, a kind of superficial processing is also possible and is probably the reason for the fast and automatic processing of these words in the LH. A deep kind of processing, in contrast, is likely the reason for the slow processing of emotional words in the RH.

<sup>1</sup>This statement about the role of the LH and RH in the early and later stages of emotional word processing, respectively, does not mean that processing occurs only in the LH or the RH. Instead, it implies that at these stages, the center of processing is in the LH or RH: there is relatively stronger activation in these respective hemispheres.

By emotional word, we refer to any word characterized by emotional connotations (e.g., "lonely," "poverty," "neglect," "bless," "reward," "elegant") or denoting a specific emotional reaction (e.g., "anger," "happy," "sadness"). Although emotional words can convey the emotions we feel, they can also be used without the subjective experience of an emotion (see Niedenthal et al., 2003, for a review). We have non-verbal channels – facial expression, prosody, and body language – to communicate emotions. So is there another reason for using emotional words? How do emotional words convey emotional meanings? What underlies the rapid<sup>2</sup> vs. slow modes of processing of emotional words? To answer these questions, it is necessary to first explore the purpose of using the language system and its relationship with semantic memory, where meanings and concepts, or knowledge about the world, are represented in the mind. This understanding is fundamental to an understanding of emotional words and concepts and how these words are processed in the cerebral hemispheres. Then we can describe the role that emotional words play in human communication and the reason for the automatic vs. attentional modes of processing of emotional words in the LH and RH.

Accordingly, in the first section of the paper, we present two approaches – disembodied and embodied – to concept representation and meaning access and then an integrative approach that combines the capabilities of both. Next, the linguistic and simulation (i.e., perceptual- or image-based) systems, which are involved in conceptual processing and meaning access, are presented. By the linguistic system here we refer to the system for which linguistic forms<sup>3</sup> are also important; thus, meaning is mainly represented in the simulation system (Barsalou et al., 2008). After that, we describe research results that show how conceptual processing and meaning access occur using the linguistic and simulation systems. Since emotional words are generally more abstract, we will then discuss the discriminating features of abstract and concrete words. Then, we present evidence demonstrating that meaning access during the early, automatic processing of emotional words is superficial and is accomplished by the linguistic system, whereas meaning access during the later attentional processing of emotional words is deep, and involves imagery and the content of autobiographical memory. The paper will end with a proposed framework that attributes the superficial mode of emotional word processing to the LH and the deep mode to the RH.

# Meaning Access and Conceptual Processing

# Semantic Memory and Approaches to Concept (Meaning) Representation

We store knowledge we have acquired about the world, including concepts, facts, skills, ideas, and beliefs, in a division of long-term memory known as *semantic memory* (Glaser, 1992; Martin, 2001; Thompson-Schill et al., 2006). Because concepts play important roles in different cognitive operations, semantic memory is sometimes known as conceptual memory or the conceptual system (e.g., Barsalou, 2003b). Unlike *episodic memory,* which is a person's unique memory of events and experiences (e.g., times, places), semantic memory consists of memories shared by members of a culture (Tulving, 1972, 1984). For example, whereas remembering the name and breed of our first dog is dependent on episodic memory, knowing the meaning of the word "dog"<sup>4</sup> and what a dog is relies on semantic memory. Thus, studying semantic memory and conceptual processing is a window that guides us toward the way in which word meanings are accessed. When we have a concept for something, it means that we know its meaning. How is a concept stored in semantic memory? What is the nature of the concepts or meanings stored in the mind? Cognitive literature introduces two main approaches in this respect: *disembodied* or *symbolic* (amodal) and *embodied* (modal).

### Disembodied Approach

The symbolic approach, which corresponds to the more traditional view, assumes that there is no similarity between components of experience – objects, settings, people, actions, events, mental states, and relations – and concepts stored in the mind. This approach proposes that perceptual – sensory, motor, introspective (e.g., mental states, affective, emotional<sup>5</sup>) – information about components of experience is transduced (redescribed) into arbitrary (language-like) symbols such that the final concept contains no reference to the actual experience *per se* (e.g., Fodor, 1975; Pylyshyn, 1984; see Barsalou and Hale, 1993, for review). Thus, abstract symbolic codes constitute concepts and perceptual experiences do not play a role in knowledge representation. Semantic networks, which represent semantic relations between concepts, constitute one example of this mode of knowledge representation (Collins and Loftus, 1975; Posner and Snyder, 1975).

<sup>2</sup>In this paper, we use *automatic* and *rapid* and also *attentional* and *slow* interchangeably.

<sup>3</sup>A category of things distinguished by some common characteristic or quality (www.oxforddictionaries.com). Indeed, language is divided into content (meanings), form (i.e., rules, categories, structures), and use (pragmatics or the social use of language).

<sup>4</sup>Throughout this paper, when we talk about words vs. concepts, quotation marks will be used to indicate words (e.g., "dog") and uppercase will represent concepts (e.g., DOG). Italics are used to introduce new key technical terms or labels.

<sup>5</sup>Although the terms *emotion* and *affect* are mostly used interchangeably, it is important not to confuse them. Affects are composed of responses involving the respiratory system, blood flow changes, facial expressions, vocalizations, and viscera over which one has little control (Tomkins, 1995, p. 54). Affects, indeed, represent the way the body prepares itself for action in given circumstances. Feelings and emotions, in contrast, are personal, biographical, and social, in the sense that every individual has his or her own set of sensations which are compared with previous experiences. While affects endow intensity to what we feel, feelings and emotions help us interpret and recognize the quality (i.e., pleasant, unpleasant) of our experiences and make our decision-making activities more rational. Namely, the experience of emotions is highly subjective (Citron, 2012; Citron et al., 2014). For example, while we can speak broadly of certain emotions like anger, our own experience of this emotion is unique and probably multi-dimensional, ranging from mild annoyance to blinding rage. Therefore, the term *emotion* seems to better explain the subjective experiences that we refer to in this paper.

### Embodied Approach

The last two decades have witnessed a surge of interest in alternative models of concept representation clustered under the label *embodied* or *modal* approach (e.g., Barsalou, 1999; Wilson, 2002; Gallese and Lakoff, 2005; Glenberg, 2010). This approach assumes that various sensory (visual, auditory, tactile, etc.), motor, and introspective information about external world experiences is depicted in the brain's modality-specific systems. Indeed, the claim that concepts are embodied means that they are formed as a result of interactions with objects, individuals, and the real world as a whole in modality-specific brain areas that are responsible for processing the corresponding perceptual information (Zwaan, 2004).

Thus, when an entity (e.g., object, event) is experienced, it activates neurons in the sensory, motor, and affective neural systems. For example, when one sees a car, a group of neurons fires for color, others for shape, a third group for size, and so forth, to represent CAR in one's vision. Regarding the auditory and tactile sensory modalities, analogous patterns of activation can occur to represent how a car might sound or feel. Moreover, activation of neurons in the motor system represents actions on the car, and activations that occur in emotion-related areas like the amygdala and orbitofrontal regions represent emotional reactions toward the car (Barsalou, 1999, 2003a, 2008, 2009). So concepts have the same structure as perceptual experiences.

### Integrative Approach

The cognitive literature has recently appeared to provide support for a middle approach to concept representation that combines the two approaches described above (e.g., Vigliocco et al., 2004; Barsalou et al., 2008; Louwerse, 2008, 2011; Simmons et al., 2008; Louwerse and Hutchinson, 2012). Indeed, considering the organization of the nervous system, it is hard to accept either the pure disembodied or the pure embodied approach. The nervous system has not only modality-specific (unimodal) areas but also supramodal areas. This middle position proposes that concept representation involves some form of symbolic information, along with the activation of sensory, motor, and emotional areas (e.g., Machery, 2007; Mahon and Caramazza, 2008; Meteyard et al., 2012). To be specific, meaning is grounded both in relations between words and in perceptual experiences (e.g., Vigliocco et al., 2004, 2009). There might, however, be important differences: when meaning access is the product of relations between words, it could be thought of as superficial, whereas meaning access resulting from activating perceptual experiences may be deeper, an idea we come back to later.

Binder and Desai (2011) suggest the term *embodied abstraction,* which means that conceptual representation consists of several levels of abstraction from sensory, motor, and emotional input. The top level is highly abstract and activation in this level is sufficient for familiar (already categorized) processes such as lexical decision tasks (Glaser, 1992). Consequently, processing does not involve activation of modality-specific areas. In contrast, when a deep type of processing is necessary or possible, such as when the exposure duration of words is long (e.g., Simmons et al., 2008), perceptual areas play a greater role in performing a task.

Based on this integrative approach, we can expect words to first create activation in supramodal areas that are not modality-specific, in areas such as the anterior temporal lobe, which has been described as the neural substrate behind semantic memory (Simmons and Barsalou, 2003; Kiefer et al., 2007a,b; Patterson et al., 2007; Mahon and Caramazza, 2008; Pulvermüller et al., 2010). Activation in these areas is typically left-lateralized, whereas bilateral activation can be expected when perceptual areas come into play (e.g., Van Dam et al., 2010). This approach likely implies two levels of processing: one level that is rather superficial and another level that is deep and during which activation spreads to modality-specific areas (e.g., visual cortex, auditory cortex, motor cortex). The main point here is that word processing relies on both amodal and modality-specific areas.

# Two Systems Involved in Conceptual Processing and Meaning Access

Most researchers working in the fields related to conceptual processing accept that two systems – a language-like system (i.e., linguistic system) and a perceptual- or image-based system (i.e., simulation) – are involved in conceptual processing (e.g., Paivio, 1971, 1986, 1991; Glaser, 1992; Barsalou, 2008). A detailed investigation of the role of these two systems is found in Barsalou's (2008) LASS theory – linguistic and situated simulation. According to this theory, the linguistic system helps us communicate the concepts that we have stored in our mind and create a network containing semantically associated words. This network encompasses categories of words and relations among concepts.

Simulation or reenactment is the process by which concepts re-evoke or produce perceptual states present when perceiving and acting in the real world. In other words, our perceptual system can become active in the absence of external world entities. Researchers consider simulation to be the factor that supports the spectrum of cognitive functions from perception to thought and reasoning (Barsalou et al., 1999, 2003; Martin, 2001; Barsalou, 2003b). For example, being able to name the different colors of an APPLE (red, yellow, green) is possible due to simulation. Simulation is also considered to be *situated* (Barsalou, 2003b; Yeh and Barsalou, 2006). That is, during simulation not only the target object (e.g., APPLE) is simulated, but also settings, actions, and introspections. This triggers an experience of *being there*. Thus, when an APPLE is simulated, it occurs in a setting like a garden, with apples hanging from the branches of a tree, with someone eating it, probably experiencing a pleasant taste.

As **Figure 1** illustrates, the LASS theory holds that when a word is presented (heard or seen), both the linguistic and simulation systems become active immediately to access its meaning; however, the activation of the linguistic system peaks before that of the simulation system. The reason is probably that the linguistic forms of representation are more analogous to the perceived words than the simulation of related experiences. Barsalou et al. (2008) claimed that, although the simulation system existed long before human beings evolved, the use of the linguistic system was what caused humans to enhance their cognitive performance. It is as though the linguistic system appeared later in human development in order to control the simulation system and increase this system's ability to represent non-present situations.

Yet Barsalou et al. (2008) proposed that the activation of the linguistic system is rather superficial because meaning is principally represented in the simulation system. For example, "car" first activates "vehicle" and "automobile" and then these associated linguistic forms act like pointers to related conceptual information. This process causes simulation to occur and processing to become progressively deeper. Because the simulation of related conceptual information proceeds more slowly than the activation of associated words, the linguistic stage peaks earlier than the simulation stage.

Research shows that different combinations of activity in the linguistic and simulation systems underlie a wide variety of tasks. When a superficial mode of processing is sufficient for adequate task performance, processing is supported mostly by the linguistic system and little by the simulation system. In contrast, when the linguistic system cannot complete a task on its own or there is opportunity (more time) for additional processing, attention shifts to the simulation system, which takes extra time.

#### Evidence for Mixtures of Linguistic and Simulation Systems in Conceptual Processing and Meaning Access

The cognitive science literature provides evidence for a superficial mode of processing managed by the early-acting linguistic system and a deep mode of processing managed by the later-acting simulation system. Two tasks have provided evidence for this difference in depth of processing: the property verification task and the property generation task. We next describe the results of research employing these two tasks, including one recent study using event-related potentials (ERPs), to further understand the nature of conceptual processing.

#### *Evidence from the property verification task*

A property verification task (e.g., Solomon and Barsalou, 2001, 2004; Kan et al., 2003; Pecher et al., 2003; Van Dantzig et al., 2008) is a passive, recognition-oriented task, in which the participant reads a concept word (e.g., an object name such as "chair") presented on a computer screen and verifies whether the next presented word is a true or false property of that concept (e.g., "facet" vs. "seat"). Response time and accuracy are measured. Typically, the simulation system is expected to be involved in responding to this task. That is because conceptual information must be retrieved that identifies whether the property is a part of the concept. An interesting finding is that when the property of a target trial (e.g., LEMON-"sour") relates to a different modality than the previous trial (e.g., BLENDER-"loud"), switching occurs between sensory modalities. This incurs a processing cost, namely slower and less accurate responses (Pecher et al., 2003), because attention must switch from one modality to another.
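The definition of a modality switch can be made concrete with a short sketch. It is only an illustration, not the procedure used by Pecher et al. (2003): the function name `label_switches`, the modality labels, and the third trial (CANDY-"sweet") are assumptions introduced here, and no response times are modeled.

```python
# Illustrative sketch: label each property verification trial by whether its
# modality differs from that of the preceding trial. The modality labels and
# the CANDY-"sweet" trial are assumptions added for the example.

def label_switches(trials):
    """Return 'first', 'switch', or 'no-switch' for each (concept, property, modality)."""
    labels = []
    previous_modality = None
    for _concept, _prop, modality in trials:
        if previous_modality is None:
            labels.append("first")       # no preceding trial to compare with
        elif modality == previous_modality:
            labels.append("no-switch")   # same modality as the previous trial
        else:
            labels.append("switch")      # attention must move to a new modality
        previous_modality = modality
    return labels


TRIALS = [
    ("BLENDER", "loud", "auditory"),
    ("LEMON", "sour", "gustatory"),   # differs from the preceding auditory trial
    ("CANDY", "sweet", "gustatory"),  # hypothetical trial in the same modality
]

print(label_switches(TRIALS))
```

In a design of this kind, the switch cost would be the difference in response time and accuracy between trials labeled "switch" and trials labeled "no-switch".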

Nevertheless, task conditions may cause participants to rely mostly on the linguistic system. That is, when information in the linguistic system is sufficient, participants do not utilize the simulation system (Solomon and Barsalou, 2004). On true trials, the given property is always part of the concept (e.g., ELEPHANT-"tusk," SAILBOAT-"mast"). Consequently, the type of false properties presented is the factor that determines whether the processing is superficial or deep. On false trials, if the given property is unrelated to the concept (e.g., AIRPLANE-"cake," BUS-"fruit"), the involvement of the linguistic system is sufficient for adequate performance. That is because correct responses in this condition are highly correlated with linguistic associativeness; i.e., object and property being associated is equal to a *true* response and not being associated is equal to a *false* response. Thus, participants consult only the linguistic system and processing is superficial.

In contrast, when true trials (e.g., ELEPHANT-"tusk," SAILBOAT-"mast") are mixed with false trials in which the property is associated with the concept but is not a part of it (e.g., TABLE-"furniture," BANANA-"monkey"), consulting the linguistic system is not sufficient. Consequently, participants must simulate perceptual information for adequate performance. Accordingly, research shows that participants are considerably faster (by more than 100 ms) to verify the same true trials when the false trials are unrelated than when they are related (Solomon and Barsalou, 2004). This is evidence that the linguistic system can act faster and produce responses earlier than the simulation system.
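The reasoning in the two preceding paragraphs can be sketched as a toy comparison of two response strategies. This is a minimal illustration under stated assumptions, not a model from Solomon and Barsalou (2004): the sets `ASSOCIATED` and `PART_OF` are hand-coded from the examples in the text, and the function names are hypothetical.

```python
# Toy illustration: an association-only (linguistic) strategy versus a
# part-of (simulation) strategy. The relations below are hand-coded from the
# examples in the text and are assumptions, not data from any study.

ASSOCIATED = {
    ("ELEPHANT", "tusk"), ("SAILBOAT", "mast"),
    ("TABLE", "furniture"), ("BANANA", "monkey"),
}
PART_OF = {("ELEPHANT", "tusk"), ("SAILBOAT", "mast")}

TRUE_TRIALS = [("ELEPHANT", "tusk"), ("SAILBOAT", "mast")]
UNRELATED_FALSE = [("AIRPLANE", "cake"), ("BUS", "fruit")]
RELATED_FALSE = [("TABLE", "furniture"), ("BANANA", "monkey")]


def linguistic_strategy(concept, prop):
    """Superficial: respond 'true' whenever the two words are associated."""
    return (concept, prop) in ASSOCIATED


def simulation_strategy(concept, prop):
    """Deep: respond 'true' only if the property is a part of the concept."""
    return (concept, prop) in PART_OF


def accuracy(strategy, false_trials):
    """Proportion correct over the true trials plus the given false trials."""
    trials = [(c, p, True) for c, p in TRUE_TRIALS]
    trials += [(c, p, False) for c, p in false_trials]
    correct = sum(strategy(c, p) == answer for c, p, answer in trials)
    return correct / len(trials)


print(accuracy(linguistic_strategy, UNRELATED_FALSE))  # association suffices
print(accuracy(linguistic_strategy, RELATED_FALSE))    # association misleads
print(accuracy(simulation_strategy, RELATED_FALSE))    # part-of check needed
```

With unrelated false trials, association alone separates true from false items, so the superficial strategy is adequate. With related false trials, the same strategy accepts TABLE-"furniture" and BANANA-"monkey", which is why the deeper check becomes necessary.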

#### *Evidence from the property generation task*

A property generation task (e.g., Wu and Barsalou, 2009; Santos et al., 2011) is an active, production-oriented task, in which a word for a concept (e.g., "table") is presented to the participant who is asked to verbally generate its characteristic properties (e.g., "legs," "surface," "eating on it"). This task is an important tool in the psychology of concepts; it provides a window into the underlying representation of a concept. The properties that participants produce can reveal which system is involved in meaning access. Since deep retrieval of a concept involves simulation, experts in concepts believe that property generation involves perceptual representation.

In a series of experiments, Santos et al. (2011) gave participants words like "car," "bee," "throw," and "good," and asked for the following word, that is, what other words came to mind immediately. The words produced in a 5-s period (usually 1–3 words) were analyzed. Analysis of the responses showed the linguistic origins of the various words produced such as *compounds* (e.g., the response "hive" to "bee" comes from "beehive"), *synonyms* (e.g., "automobile" in response to "car"), *antonyms* (e.g., "bad" in response to "good"), *root similarity* (e.g., "selfish" in response to "self"), and *sound similarity* (e.g., "dumpy" in response to "lumpy").

In contrast, when participants were asked what characteristics are typically true of (for instance) "dogs" and responses given during a 15-s period were analyzed, most of the words produced originated in the simulation system. In fact, the first responses were still linguistic-based, but they were followed by responses originating in the simulation system. Thus, later responses described aspects of situations such as *physical properties* (e.g., "wings" in response to "bee"), *setting information* (e.g., "flowers" in response to "bee"), and *mental states* (e.g., "boring" in response to "golf"). Overall, the results suggest that both a faster-acting linguistic system and a slower-acting simulation system are involved in conceptual processing.

Similar findings were obtained when Simmons et al. (2008) administered a property generation task in an MRI scanner. In the first session, participants generated the typical properties of each concept to themselves for 15 s (property generation task), and in the second session, they were given some other words and generated *word associates* for each one for 5 s (word associate task). In this session, for six presented words, participants were given 15 s to imagine a situation that contained the related concept [situated simulation (SS) task]. For example, for BEE they might imagine a garden with a bee buzzing around flowers and then flying toward a hive and so forth.

To analyze the data, each 15-s period of response time for a single word was divided into two 7.5-s periods, an early one and a later one. The results showed three regions of overlap between the early stage of the property generation task and the word association task: Broca's area (the left inferior frontal gyrus), left inferior temporal gyrus, and right cerebellum. These regions are responsible for linguistic processing and generating two-word associations (right cerebellum). A different set of regions overlapped between the later stage of the property generation task and the SS task: bilateral posterior areas, precuneus, right middle temporal gyrus, and right middle frontal gyrus. These regions are involved in imagery, episodic memory, and situation representation. Thus, fMRI research corroborates findings from behavioral studies: properties bearing a linguistic relation to presented words were produced earlier than properties bearing a simulation relation.
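The early/late split described above amounts to a simple binning step, which the sketch below expresses in general terms. It does not reproduce the fMRI analysis of Simmons et al. (2008): the function name `split_window`, the (onset, item) tuple format, and the placeholder values are assumptions, and assigning the 7.5-s boundary to the later period is an arbitrary choice.

```python
# Minimal sketch: divide timestamped samples from one 15-s period into an
# early half and a later half. All names and values are placeholders.

def split_window(samples, window_s=15.0):
    """Split (onset_seconds, item) pairs into early and late halves of the window."""
    half = window_s / 2.0
    early = [item for onset, item in samples if 0.0 <= onset < half]
    late = [item for onset, item in samples if half <= onset <= window_s]
    return early, late


# Placeholder samples: onsets in seconds paired with arbitrary labels.
SAMPLES = [(1.0, "a"), (7.0, "b"), (7.5, "c"), (14.0, "d")]

early, late = split_window(SAMPLES)
print(early, late)
```

Each half can then be compared separately against a reference condition, as the text describes for the word association task (early half) and the SS task (later half).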

#### *Evidence from time course analysis*

A recent ERP investigation conducted by Louwerse and Hutchinson (2012) also suggests different time courses of activation for linguistic and simulation processing, with evidence that the linguistic system reaches its peak of activation earlier than the simulation system. Half of the participants were assigned to a semantic judgment task and the other half to an iconicity judgment task. Each condition employed the same word pairs, half with an iconic relationship in which the two words were presented vertically in the same order that they appear in the world (e.g., "sky" above "ground") and the other half with a reverse-iconic relationship (e.g., "ground" above "sky"). In the semantic task, participants judged whether the words were related in meaning; in the iconicity task, participants judged whether the words appeared in the same configuration as in the real world (i.e., a yes/no response was required for both tasks).

The results showed the involvement of both linguistic and simulation systems in responding to both tasks, with the linguistic system being more active during the semantic task and the simulation system during the iconicity task. Participants took a mean of 1809 ms to respond to task stimuli. Source analysis using Low Resolution Brain Electromagnetic Tomography (LORETA) showed that, for each trial, activation started (within 300 ms after stimulus onset) around the left inferior frontal gyrus (FC5, F7, T7) and then continued (within 1500–1800 ms after stimulus onset) bilaterally in posterior areas of the brain (O1, O2, P7, P8).

Overall, research findings suggest that, when words are presented, two systems, the linguistic and the simulation system, come into play immediately to access their meanings. However, the activation in the linguistic system peaks early on; this type of processing is rather superficial and involves activation of a word's associates in the LH. In contrast, the activation in the simulation system peaks later; this type of processing is deep and involves SS of the referent of a word, for which bilateral activation may be required.

#### Abstract vs. Concrete Word Processing

A key topic of discussion concerning word and concept processing relates to abstract words. This is important because abstract words, on average, tend to have more emotional properties than concrete words (Kousta et al., 2011; Moseley et al., 2012; Sakreida et al., 2013; see Meteyard et al., 2012, for a review). Hence, we need to discuss the differentiating characteristics of abstract words. Some concept experts (e.g., Paivio, 1971, 1986) claim that abstract concepts are represented only through associations with other words, that is, through the linguistic system. This notion may have been reinforced by the results of neuroimaging research, which demonstrate more activation of the LH during the processing of abstract words compared to concrete words (see Sabsevitz et al., 2005, for a review).

However, Barsalou (1999) casts doubt on this notion and attributes it to the kind of task generally used to study abstract words. He believes that abstract words cannot be learned without the contribution of SSs. In effect, abstract concepts are represented in a wide variety of situations featuring predominantly introspective (emotional) and social information, whereas concrete concepts are represented in a restricted range of situations featuring chiefly sensory and motor information. This difference causes the situation to play a critical role in deep processing of abstract words. That is to say, the linguistic and simulation systems are both involved in the processing of abstract words; nevertheless, the type of task determines whether the linguistic or simulation system is active.

In one study conducted by Barsalou and Wiemer-Hastings (2005), a property generation task was used to compare representations of abstract concepts (e.g., TRUTH, FREEDOM, INVENTION) and concrete concepts (e.g., BIRD, CAR, SOFA). Simulation was also shown to be important in the representation of abstract concepts. When participants were asked to generate properties of concrete and abstract concepts, in both cases, they produced relevant information about agents, objects, settings, events, and mental states. However, the emphasis placed on these different types of information was different. For concrete concepts, the major focus was on information about *objects* and *settings*. In contrast, for abstract concepts, more information about *mental states* and *events* was produced.

In addition, neuroimaging research demonstrates that abstract concepts are represented by more distributed neural patterns than concrete concepts (Barsalou and Wiemer-Hastings, 2005; Wilson-Mendenhall et al., 2011; Moseley et al., 2012). Thus, for an abstract word like "convince," a variety of situations (e.g., a political situation, a sports situation, a school situation, etc.) may come to mind to represent events in which one person (agent) is speaking to another or others in order to change their mind. It seems to be difficult for people to process an abstract word without bringing relevant situations into their mind (Schwanenflugel, 1991). For a concrete word like "rolling," in contrast, the processing is simpler and more focused. Thus, the role that a task plays is critical here: if the researcher uses a lexical decision task, which typically encourages a superficial level of processing, it is more likely that an abstract word will access only the information provided by the linguistic system (Glaser, 1992; Kan et al., 2003; Solomon and Barsalou, 2004). The involvement of the simulation system, on the other hand, requires a task that encourages deep processing (Wilson-Mendenhall et al., 2013).

This paper concerns emotional words, many of which are abstract. So we can predict that the above descriptions of word and concept processing, which show words can be processed superficially or deeply, will apply to emotional words and concepts, as well. A review of the findings related to emotional word processing should help verify this prediction.

# Emotional Word Processing and Meaning Access

In the second part of this paper we focus on emotional word processing. We intend to show that, similarly to what we have established thus far for neutral words, a superficial type of processing also occurs for emotional words. Although emotional words possess an emotional component, their processing does not necessarily result in emotional states (e.g., Innes-Ker and Niedenthal, 2002; Havas et al., 2007). That is, creating emotional states and communicating emotional feelings may not be the only or primary outcome of using emotional words. We have other channels – prosody, facial expression, and body language – to communicate our feelings. That is why some researchers believe that language is a tool by which we can control our emotions (see Niedenthal et al., 2003, for a review). The linguistic system, which underlies superficial word processing, likely conveys information about emotions without necessarily producing emotional states. We suggest that the involvement of an image-based system is necessary to experience emotional states as a result of emotional word processing.

Thus we suggest that, similarly to what occurs for neutral words, the involvement of a perceptual- or image-based system is likely necessary for deep processing of emotional words. There is, however, one important difference between deep processing of neutral and emotional words. In deep processing of emotional words, emotional properties<sup>6</sup> likely have a crucial influence on the outcome of the processing. This notion does not deny the important role that perceptual (i.e., visual, auditory, etc.) properties play in this process. As a result, we can imagine that deep processing of emotional words, which may involve reactivation<sup>7</sup> of emotional properties, is able to create emotional states in the individual (e.g., Havas et al., 2007). This does not mean that deep processing is always accompanied by reactivation of emotional properties; rather, it leaves open different possibilities for the reactivation of emotional and perceptual properties: for example, reactivation of only perceptual properties without emotional properties, reactivation of one perceptual property (for instance, visual) along with emotional properties, etc.

Therefore, the principal question in this part is: what happens when emotional words are presented? In the following sections, after introducing emotional concepts, related approaches, and an overview of the lateralization of emotional word processing, we attempt to answer this question and find the reason for rapid vs. slow modes of processing of emotional words in the cerebral hemispheres.

# Emotional Concepts

Emotional concepts (e.g., FIGHT, SPIDER, JOY, ANGER) represent knowledge about emotions, that is, the meaning of emotional information. They hold information about behaviors associated with emotions (e.g., actions), how emotions are elicited (e.g., situations), and subjective experiences and bodily states that occur when we are in an emotional state (Niedenthal, 2008). Based on the information presented above, there should be a link between emotional words and emotional concepts, i.e., emotional words should serve as a window to access emotional concepts. Reactivating the emotional properties of emotional concepts can be expected, in turn, to lead to subjective emotional states.

<sup>6</sup>Properties or attributes perceived by the emotional system. In general, emotional properties are considered part of perceptual properties (e.g., Barsalou, 1999). Here, because the topic is directly related to a type of stimulus with an emotional component (emotional words), we bring emotional properties to the fore in order to better discuss their outcomes.

<sup>7</sup>Since simulation is mostly considered to be an automatic process and, as we will see later on, deep processing of emotional words probably requires more attentional components due to the role that imagery plays in this processing (Pecher et al., 2009), we use the term *reactivation* as a substitute for *simulation* for deep processing of emotional words. We wish to be sufficiently cautious about the automatic vs. attentional nature of simulation and imagery, respectively.

Regarding the nature of emotional concepts, the literature on emotion, perhaps under the influence of cognitive studies, has adopted two approaches: disembodied or amodal and embodied or modal. In the disembodied approach (Teasdale, 1999; Philippot and Schaefer, 2001), emotions are represented in an amodal fashion, devoid of their perceptual and emotional properties. Namely, emotional information that is initially encoded in different modalities (i.e., visual, auditory, etc.) is represented and stored in the conceptual system separate from its perceptual and emotional properties. Thus, just as people know that CHAIR possesses the properties of seat, back, and legs, they know that ANGER comprises the experience of frustration, a desire to fight, maybe a clenched fist, and a rise in blood pressure.

On the other hand, the embodied or modal approach proposes that sensory, motor, and emotional states triggered during an encounter with an emotion-evoking stimulus (e.g., a SNAKE) are captured and stored in modality-specific brain areas (Damasio, 1989; Damasio and Damasio, 1994; Gallese, 2003; Niedenthal et al., 2005a,b; Barrett, 2006). Later, during reactivation of the experience (e.g., thinking about a snake), the original pattern of sensory, motor, and emotional states can be relived. More specifically, emotional states (e.g., feeling happy, sad, angry) that are experienced during interaction with stimuli having pleasant or unpleasant properties are stored and later reactivated. Thus, like other concepts, processing of emotional concepts is accompanied by reactivation of subjective experiences in modality-specific areas in the brain (Chao and Martin, 2000; Pecher et al., 2003; Vermeulen et al., 2007).

Similar to word and concept processing in general, the literature provides evidence for rapid simultaneous activation of the linguistic and emotional areas in the LH when emotional words are presented (e.g., Herbert et al., 2011; Moseley et al., 2012; Ponz et al., 2013; see Abbassi et al., 2011, for review), in addition to a slower activation that occurs later on in the RH. Our previous review (Abbassi et al., 2011) suggests that during the automatic processing of emotional words, in tasks such as lexical decision where deep processing is not required, early ERP components like the early posterior negativity (EPN)<sup>8</sup>, which occur within 300 ms of stimulus onset, appear. For this type of processing, in addition to language areas including the inferior frontal (Broca's area), inferior parietal, and superior temporal (Wernicke's area) regions, limbic areas including the orbitofrontal, prefrontal, amygdala, posterior cingulate, and insular cortex are also activated.

In contrast, when an explicit task like the emotional Stroop task<sup>9</sup> is used, later ERP components like the late positive component (LPC), which occur more than 300 ms after stimulus onset, appear and processing is stronger in the RH. This type of processing is slow, requires attention, and also involves other areas, including the anterior cingulate and dorsolateral prefrontal cortex (see Abbassi et al., 2011, for review). Accordingly, as with neutral word processing, it appears that emotional word processing requires two systems: the linguistic system, which peaks early on, and a second system, introduced shortly, which (like simulation) involves image reproduction, but with some specificities relative to neutral words. The results of the relevant research introducing the capabilities of these two systems follow.

# Two Levels of Emotional Word Processing

#### Superficial Level

Research suggests that emotional word processing is not always accompanied by feeling an emotion. Indeed, an emotional word can be processed superficially, such that no subjective experience of emotion (emotional feeling) becomes involved. That is to say, although emotional words possess emotional properties, their processing does not necessarily lead to feeling an emotion.

One task that demonstrates this finding is the *sentence unscrambling task* (e.g., Srull and Wyer, 1979; Bargh et al., 1996; Innes-Ker and Niedenthal, 2002; Oosterwijk et al., 2010). In this task, participants are presented with a series of words in random order and asked to construct grammatically correct sentences out of a subset of the words. Critical sentences are intended to prime a specific pleasant or unpleasant concept (e.g., HAPPINESS, SADNESS). For example, in the study conducted by Innes-Ker and Niedenthal (2002), 30 four-word sentences that described behaviors, situations, and reactions associated with happy or sad feelings were used. A fifth word was added to each sentence to create groups of five scrambled words. The connotation of this word was the same as the sentence (e.g., "the guest felt satisfied" as the sentence and "ease" as the filler). Fifteen sentences with neutral content were also added to each list to control for the bias in favor of the intended emotional concept. Participants were asked to construct a four-word sentence out of each subset.

After performing the task, participants completed a self-report measure of emotional state and also a lexical decision experiment. The results indicated that unscrambling emotional sentences did not affect participants' emotional state. Yet, performing the task was effective in priming semantically related words with an emotional component, because participants made faster lexical decisions about words that were congruent with the activated concept than about incongruent words (e.g., "joke" primed "sunbeam," not "speech"; "tears" primed "disease," not "breath"). Thus, participants could encounter emotional words and construct sentences with an emotional meaning, but without reactivating that meaning sufficiently to trigger an emotional subjective experience.

<sup>8</sup>This negative potential occurs in posterior scalp regions, 200–300 ms after word onset, for both negative and positive high-arousal words. It has been attributed to the arousal feature of emotional words and appears predominantly in the left occipitotemporal region (Kissler et al., 2007).

<sup>9</sup>The emotional Stroop task is a version of the standard Stroop task (Stroop, 1935) in which participants are required to respond to the ink color of a color word while ignoring its meaning (e.g., the word *green* written in red ink). Since reading is an automatic process, naming the color in which a word is written requires the allocation of attention and, hence, causes longer naming times (the Stroop effect). Similarly, in the emotional Stroop task, naming the color of an emotional word takes longer than naming the color of a neutral word (the emotional Stroop effect). This effect reflects the fact that attention is captured by the emotional content of words (Williams et al., 1996).

Here, we need to make a distinction between the high-level subjective emotional experience that we refer to in this paper and low-level affective or arousal changes that seem to occur automatically early in processing (e.g., Kissler et al., 2007). Although the sentence unscrambling task does not evoke subjective emotional states, research shows it can evoke low-level affective changes in the body and face (Oosterwijk et al., 2010). These (low-level) changes likely give intensity to our subjective experiences, i.e., what we feel later on<sup>5</sup>. Thus, following Barsalou et al. (2008), we believe two systems (linguistic and image-based) are activated when emotional words are presented. The linguistic system, however, peaks before the image-based system. That is, during superficial processing of emotional words, in addition to word forms (linguistic system), arousal features are also accessed; these features likely potentiate subsequent emotional feelings (Oosterwijk et al., 2010; Citron, 2012; Citron et al., 2014).

So emotions are not necessarily experienced when we encounter emotional words. That is probably because emotional words are processed only superficially at first. In order to feel an emotion, it appears that the brain areas responsible for emotion must reactivate an emotional experience. Accordingly, Niedenthal et al. (1994) suggested that emotional knowledge is represented at three levels. The first is the *emotion lexicon level,* which includes words; this level is necessary for *encoding* perceptual and emotional experiences. The second level is the *conceptual level,* which contains memories of emotional experiences. The third level is the *somatic level*; at this level, feedback from the body is recognized and bodily changes affecting the autonomic nervous, endocrine, and muscular systems are experienced<sup>10</sup>.

Therefore, reactivation of emotional states is likely a prerequisite for subjective experiencing of an emotion; activation of associated words does not lead to emotional feelings. If we know that concepts are grounded not only in sensory and motor experiences but also in emotional experiences, we can expect reactivation of emotional experiences, similar to the reactivation of sensory and motor experiences, to occur (Barsalou, 1999).

#### Deep Level

#### *Evidence from the property verification task*

Researchers working in areas related to emotion have used the property verification task to show that the emotional properties of concepts can be reactivated using the same system that supports emotional responses to an object or event. For example, in Vermeulen et al.'s (2007) study, not only perceptual (e.g., visual, auditory) but also emotional (pleasant and unpleasant) properties of concepts were taken into consideration. Half of the trials consisted of concepts paired with properties from the same modality as in the previous trial (e.g., TRIUMPH-"exhilarating"/COUPLE-"happy"), and the other half consisted of concepts paired with properties from a different modality than in the previous trial (e.g., FRIEND-"tender"/TREASURE-"bright"). In such a task, verifying, for instance, that TRIUMPH can be "exhilarating" or a COUPLE can be "happy" involves reactivating emotional properties in the emotional system, whereas verifying that a FRIEND can be "tender" after verifying that a TREASURE can be "bright" reactivates two different systems (the emotional system for the former and the visual system for the latter).

Thus, similar to the switching cost for neutral pairs that we reviewed earlier (e.g., Pecher et al., 2003), the results showed slower reaction times and higher error rates when judgments required participants to switch modalities, that is, when trials with emotional properties were preceded by trials with perceptual properties. This finding suggests that, in order to verify an emotional property, this property needs to be reactivated by the emotional system.

#### *Evidence from the property generation task*

Property generation tasks have yielded the same conclusion as property verification tasks. Since emotional words are more abstract, we would expect participants to generate and focus on situations, events and introspective (including emotional) properties when a property generation task is used (Martin and Chao, 2001). Accordingly, when Oosterwijk et al. (2009) asked participants to generate properties of the emotional words "pride" and "disappointment," most of the generated words described situations, personal attributions, events, and associated reactions rather than agents and objects.

For "pride," words or phrases like "school," "sport," "good marks," "winning a game," "did well," "applause," "throwing a party," "feeling happy," and "significant others (parents, friends, family)" were produced. Likewise, for "disappointment," words or phrases like "doing badly," "losing," "failing psychophysiology," "getting an F," "exams," "driving test," "shame," "fear," "feeling angry," and "depression" were generated. As indicated, almost all the words generated referred to situations, actions, and introspective states, and many referred directly to emotional states. Producing words or phrases like "feeling happy" even suggests the possibility that the participant may experience an emotion (Barsalou, 1999).

One point that merits special attention and seems to be indicated by the results is the likely activation of autobiographical memory when emotional words are processed. Here, we need to consider the differentiating feature of autobiographical memory and episodic memory. In fact, some researchers treat autobiographical and episodic memory as synonyms, but others believe they should be treated separately (e.g., Tulving, 1983; Conway, 1990; Cabeza et al., 2004; McDermott et al., 2009; see Marsh and Roediger, 2012, for a review). They say that memory for events with specific times and places should be referred to as episodic memory, whereas autobiographical memory is related to our personal history in which priorities are given to emotional properties, not to specific times and places: memories, for instance, of our first-grade experiences, of learning to drive a car, of friends we had in university, or of grandparents. Conway (1990) argues that autobiographical memory plays a pivotal role in the representation of emotional information.

<sup>10</sup>Only the first two levels seem to be relevant to the topic of this paper.

In the next section, we discuss autobiographical memory, which seems to provide the content for the imagery<sup>11</sup> system and whose role in emotional word processing is likely comparable to that of semantic memory, which provides content for the simulation system. While the literature suggests that semantic memory is likely centered in the LH, autobiographical memory seems to be concentrated in the RH (see Cabeza and Nyberg, 2000, for a review). We suggest this system is crucially involved in deeper processing of emotional words.

## Imagery, Autobiographical Memory, and the RH

As mentioned above, reactivation (simulation) of stored information appears to be necessary for deep processing of word stimuli. We also know that a slow type of processing for which attention is necessary and which is concentrated in the RH probably occurs for emotional words (see Abbassi et al., 2011, for review). So, the question here is: what is the factor that is comparable to simulation, i.e., involves image reproduction, and for which the RH plays a critical role? According to the literature, imagery has all these features (Holmes and Mathews, 2005, 2010; Holmes et al., 2008). Imagery is, in fact, a process that creates a mental image for the individual using different senses. Thus, it allows one to see, hear, smell, and feel different components of a situation (e.g., people, settings, actions) (Kosslyn et al., 2006). A distinguishing feature of emotional word processing is that image-based processing, in addition to including perceptual information, likely involves reactivated emotional properties based on the involvement of autobiographical memory (see Cabeza and Nyberg, 2000, for a review). The suggestion is that autobiographical memory plays a pivotal role in the representation of emotional information (Conway, 1990).

In fact, the relationship between imagery and emotion is mediated by autobiographical memory. That is to say, a link between imagery and autobiographical memory is responsible for the emotional outcomes of image use. Holmes et al. (2008) attribute the more powerful impact of imagery on emotion, as compared to words, to three possible reasons: (1) the emotion system existed long before the language system<sup>12</sup>; (2) images share perceptual properties and details with actual experiences (Kosslyn et al., 2001); and (3) autobiographical memories, including emotional states experienced during interactions with the real world, are first stored in the form of images, not language. Images are therefore likely to be effective cues for reactivating emotional experiences. To be precise, imagery has a stronger emotional impact than purely linguistic forms because it has privileged access to the emotional experiences stored in autobiographical memory.

Research even suggests a causal relationship between imagery and emotion (Holmes et al., 2008); this implies that a more direct link exists between imagery and emotions, than between words and emotions. In this research, participants are given a combination of pictures and words conveying an emotional meaning (e.g., a picture of a flight of stairs and the word *fall* written beneath) and asked to rate their contents, without receiving any instructions concerning which modality to use. Results show that participants base their responses primarily on pictures, not words. Moreover, the more participants use images to respond, the more likely they are to report experiencing an emotional state.

Taking lateralization into consideration, the distinguishing feature of emotional word processing, i.e., imagery and the contribution of autobiographical memory, should be responsible for the role that the RH plays in this processing. Since the center of autobiographical memory retrieval is likely located in the RH (Tulving et al., 1994; Nyberg et al., 1996; Perecman, 2012; see Cabeza and Nyberg, 2000, for a review), and based on the aforementioned points, we can propose that the salient role of the RH in emotional word processing may relate to the retrieval of the contents of autobiographical memory surrounding past events and personal histories, which feeds imagery of emotional words. Pinpointing the role of the corpus callosum might also highlight the RH's role in this process. Indeed, research shows that individuals with congenital absence of the corpus callosum produce language that contains almost no words denoting emotions (Turk et al., 2010). This deficit presumably arises because the LH, which is responsible for generating language units such as words, has reduced access to autobiographical memory contents in the RH. The deficit, which is called *alexithymia*, has also been reported in patients with surgical disconnection of the cerebral hemispheres (Tenhouten et al., 1985a,b,c). Thus, there seems to be robust evidence supporting the salient role of the RH in deep processing of emotional words in which perceptual and emotional properties are involved, which coincides with an important role of the RH in autobiographical memory.

# The Suggested Framework: How Does Superficial vs. Deep Processing Occur?

One outcome of the above-mentioned superficial vs. deep processing types is that when we encounter an emotional word, we might access its semantically associated words, but not its perceptual and emotional properties. Following Barsalou et al. (2008), we believe that emotional words first access only semantically associated words, and that this process is focused mainly in the LH. When this process involves mental imagery and access to autobiographical memory contents, deeper processing occurs. This type of processing presumably occurs mainly in the RH and may be followed by experiencing a subjective emotional state.

<sup>11</sup>Concept experts always compare simulation with imagery. While simulation is essentially an automatic process, attention is involved in imagery (Pecher et al., 2009). This conclusion is based on the results of research comparing these two processes (e.g., Wu and Barsalou, 2009). Participants in the imagery group were explicitly asked to form an image of a concept when responding to task items, while participants in the control group were required only to think about that concept and were not instructed to form an image. The claim that simulation occurs automatically arises from the results shown by the control group in this research, which presented similar results to the imagery group, implying that the control group also used imagery to respond.

<sup>12</sup>Recall that Barsalou et al. (2008) also claimed that the simulation system evolved before the language system in human beings.

For example, a word like "flower" might activate words like "rose," "beautiful," "fragrance," and "branch." When processing becomes deep due to, for instance, employing an explicit task or longer exposure duration of stimuli, the activated words lead to the recall of contents in autobiographical memory in which FLOWER can be found: this involves imagery. Therefore, depending on the individual's memory content, a situation containing concepts like GARDEN, SPRING, PARK, WALKING, and NICE WEATHER may become active and the individual may see himself walking in a garden full of flowers in the spring, with nice weather, and finally perhaps feeling a pleasant emotional state.

On the unpleasant side, a word like "cancer" may activate words like "fear," "bad," "pain," "disease," and "death." When processing becomes deeper, depending on the individual's memory contents, a situation containing concepts like CHEMOTHERAPY, HOSPITAL, SURGERY, and FIGHT may be activated and the individual may see himself in a hospital room with a patient battling cancer and be left feeling an unpleasant emotional state. Obviously, such an elaborate scenario is not necessary, nor does it occur every time. What may occur instead is reactivation of only perceptual properties without emotional properties, or reactivation of emotional properties along with some perceptual properties, etc.

Creating an emotional state, then, requires that autobiographical memory contents be activated and, consequently, that the emotional properties of an emotional concept are experienced; this is not possible unless imagery is involved. That is, activating semantically associated words and a superficial mode of processing does not lead to the experience of an emotional state *per se*. As well, activating perceptual properties does not create emotional feelings.

In sum, when emotional words are presented, two systems, the linguistic and the imagery systems, can become active. However, the linguistic system, for which the LH is dominant, seemingly operates automatically and thus peaks before the imagery system. As a result of this process, emotional words likely establish a link with semantically associated words. When this superficial mode of processing is sufficient for adequate task performance, processing is supported mostly by the linguistic system and does not involve emotional feelings.

In contrast, when the linguistic system cannot complete a task on its own, or when there is more time to process emotional words, the imagery system may become involved. In this type of processing, the retrieval of past memories and situations containing these concepts occurs. This processing is deep and may be followed by the subjective experience of an emotional state. We do not claim that processing in the RH always triggers an emotional state, but the level of processing in the RH should be deeper than in the LH and involves imagery. This latter stage, for which the RH is likely more responsible than the LH, is slow because more components (i.e., reactivation of perceptual and emotional properties) are involved.

# Conclusion

This paper investigates emotional words and the reason for fast vs. slow processing of these words, which occurs mainly in the LH and RH, respectively. Although we can use emotional words to convey emotional feelings, experiencing emotions may not be the primary outcome of using emotional words. This review suggests that two systems, the linguistic and the image-based (imagery) systems, are involved in the processing of emotional words. As long as the processing involves mainly the linguistic system, emotional word processing does not necessarily result in emotional states.

Further research should be carried out using emotional words and tasks pinpointing a superficial vs. deep level of processing. Taking into consideration the few studies that have examined these two levels, using tasks such as word verification and word generation tasks should be helpful in revealing further aspects of this process. In addition, by using short vs. long exposure durations or tasks targeting superficial vs. deep processing, emotional word processing can be compared in the two hemispheres.

The level of emotional word processing in the RH should therefore be deeper than in the LH, and it is conceivable that the slow mode of processing in the RH adds certain qualities (reactivating perceptual and emotional properties) to the output.

# Acknowledgment

This review was supported by the Canadian Institutes of Health Research (grants MOP-93542 & IOP-118608 to YJ).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Abbassi, Blanchette, Ansaldo, Ghassemzadeh and Joanette. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Integrating emotional valence and semantics in the human ventral stream: a hodological account

*Sylvie Moritz-Gasser<sup>1,2</sup>, Guillaume Herbet<sup>1,2</sup> and Hugues Duffau<sup>1,2</sup>\**

*<sup>1</sup> Department of Neurosurgery, Gui de Chauliac Hospital, Montpellier University Medical Center, Montpellier, France*

*<sup>2</sup> Team "Plasticity of Central Nervous System, Stem Cells and Glial Tumors," INSERM U1051, Institute for Neuroscience of Montpellier, Saint Eloi Hospital, Montpellier, France*

*\*Correspondence: h-duffau@chu-montpellier.fr*

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Germany*

#### *Reviewed by:*

*José Antonio Hinojosa, Universidad Complutense of Madrid, Spain*

**Keywords: ventral stream, emotion, semantic processing, anatomo-functional connectivity, brain electrostimulation mapping, subcortical pathways**

#### **INTRODUCTION**

Accessing the meaning of words to produce and understand language requires the activation of semantic representations. These are stored in semantic memory, which organizes concepts according to semantic attributes (e.g., *the cat mews*) and semantic categories (e.g., *the cat is an animal*). Concerning semantic attributes, non-living concepts (e.g., *tools*) are processed preferentially according to functional features (e.g., *a saw cuts*) rather than visual features, whereas living concepts (e.g., *animals*) are processed preferentially according to visual features (e.g., *the cat is little, with sharp ears*) rather than functional ones (Warrington and Shallice, 1984). Within these semantic attributes, some are widely shared, independently from our personal history (e.g., *the cat mews*), and some are linked with our autobiographical memory (e.g., *I had a cat during my childhood*). Moreover, most concepts have an emotional connotation which, whether it is widely shared (e.g., *a black cat brings misfortune*) or linked with our personal history (e.g., *I love cats because mine was so soft*) in a subject-centered manner, constitutes a semantic attribute, i.e., a defining characteristic (Cato Jackson and Crosson, 2006).

Therefore, not only semantic category and "cold" widely-shared semantic attributes, but also "warm" emotion-related attributes should be activated to produce or understand a word. Even if the meaning of words may be accessed through "cold" semantic processing alone, words with emotional connotation (widely-shared and/or personal) are more quickly and efficiently processed (Bock and Klinger, 1986), allowing a faster and more accurate lexical access than neutral words (Scott et al., 2009; Méndez-Bértolo et al., 2011; Kissler and Herbert, 2013). It is worth noting that processing the emotional connotation of a word "differs from the actual experience of emotion: emotional connotation refers to knowledge about the emotional property of an object" (Cato Jackson and Crosson, 2006) and that "emotion modulates word production at several processing stages" (Hinojosa et al., 2010).

Semantic representations forming concepts are more than the simple summation of defining features (Lambon-Ralph et al., 2010). However, how these semantic representations are organized at the neural level is still poorly understood. While some models suggest a distributed organization between a number of interacting cortical associative regions (Turken and Dronkers, 2011), an alternative model proposes a unified organization of semantic representations in an amodal form in the anterior temporal lobes (ATLs), which receive integrated information from different modality-specific cortical areas. In this latter framework, the ATLs are named "amodal hubs" (Patterson et al., 2007; Lambon-Ralph et al., 2009).

Here, in the light of our clinical observations during picture naming in glioma patients who underwent awake surgery, we offer new insight into how semantic and personal-emotional information are integrated at the brain systems level, enabling well-rounded and efficient semantic processing and, in turn, a complete noetic experience.

#### **A DIRECT AND AN INDIRECT ROUTE FOR SEMANTIC PROCESSING**

We highlighted previously the crucial role of the inferior fronto-occipital fasciculus (IFOF) in semantic processing (Duffau et al., 2013; Moritz-Gasser et al., 2013; Almairac et al., 2014). We proposed this long-association pathway, which comes from the occipital lobe, posterior-lateral temporal areas and parietal cortex, and runs to the orbitofrontal and dorsolateral prefrontal cortices (Catani et al., 2002; Kier et al., 2004; Wakana et al., 2004), as a ventral plurimodal direct route for semantic processing, parallel to an indirect route subserved by the complex inferior longitudinal/uncinate fasciculi (ILF/UF). Indeed, intraoperative mapping during awake surgery for brain glioma (Duffau et al., 2002, 2005) shows that direct electrostimulation of the left IFOF during a naming task always induces semantic disorders (semantic paraphasias or anomias). This semantic disorganization may be either plurimodal (verbal and nonverbal) when stimulating the deep layer of the IFOF, evidenced by the inability for the patient to perform a non-verbal semantic association task, or "only" verbal, when stimulating the superficial layer of the IFOF. Recent studies, based on the Klingler fiber dissection technique, identified two different components of the IFOF: a superficial and dorsal subcomponent, which connects the dorsolateral prefrontal lobe with the superior parietal lobe and the posterior portion of the superior and middle occipital gyri; and a deep and ventral subcomponent, which connects the orbitofrontal cortex with the posterior portion of the inferior occipital gyrus and the posterior temporal-basal area (Martino et al., 2010; Sarubbo et al., 2013). This multilayer organization of the IFOF has recently been confirmed by q-ball tractography (Caverzasi et al., 2014). Interestingly, these anatomical descriptions correspond with the cortical network involved in semantic control, namely prefrontal, temporal-basal and parietal areas (Whitney et al., 2011).

Thus, we assumed that the IFOF plays a crucial role in the monitoring of multimodal semantic processing, and we proposed a dynamic dual-stream model of the ventral amodal semantic route, including both the deep and the superficial layers of the IFOF and the indirect (ILF/UF) ventral pathway (Duffau et al., 2013). Based on data issued from intraoperative electrostimulation, we suggested that the IFOF might play a crucial role not only in multimodal semantic processing but beyond, in the awareness of conceptual knowledge, namely noetic consciousness (Moritz-Gasser et al., 2013).

Tractographic studies suggested that semantic processing is underlain by the sole complex ILF/UF (Agosta et al., 2010). The ILF has a vertical component in the parietal lobe, and a horizontal component that lies within the white matter of the occipital and inferior temporal regions (Schmahmann et al., 2007). From the dorso-lateral surface of the occipital lobe, the ILF runs ventro-medially from the posterior lingual and fusiform gyri and dorso-medially from the cuneus. Then the branches run forward to the superior, middle and inferior anterior temporal gyri on the lateral surface, and medially to the amygdala and the parahippocampal gyrus (Catani et al., 2003; Martino and de Lucas, 2014). The ILF seems to be implicated in visual perception, face and object recognition (Catani and Mesulam, 2008a; Fox et al., 2008), reading (Epelbaum et al., 2008) and spoken language (Mummery et al., 1999; Catani and Mesulam, 2008b).

Concerning face/object recognition and reading, it seems that only the posterior part of the ILF ("visual part" corresponding to occipito-inferotemporal fibers) is involved, whereas concerning spoken language (naming), both posterior and anterior parts of the ILF are involved: the former in visual processing of the object or picture, and the latter in "linking object representations to their lexical labels" (Catani and Mesulam, 2008b), by "allowing the semantic system access to stored lexical information" (Foundas et al., 1998; Mummery et al., 1999).

The UF is a ventral associative bundle that connects the ATL and amygdala with the orbitofrontal cortex (Catani et al., 2002; Catani and Thiebaut de Schotten, 2008). It runs inferiorly to the IFOF within the temporal stem, then it splits into a large ventro-lateral branch which terminates in the lateral orbitofrontal cortex and a smaller medial branch which terminates in the frontal pole (Catani et al., 2002; Thiebaut de Schotten et al., 2012). The UF is traditionally considered to be part of the limbic system (Catani et al., 2013; Von Der Heide et al., 2013). Given its connections, functions linked to the UF may concern episodic memory (value-based updating of stored representations), language (retrieval of proper names for people, some aspects of semantic memory retrieval), and social-emotional processing (valuation of stimuli, emotional meaning of concepts) (Von Der Heide et al., 2013).

We postulate that this indirect pathway (ILF/UF) is involved but not sufficient to perform an efficient semantic processing. We propose that, given their respective cortical terminations, one of the roles of the complex ILF/UF might be to convey critical emotional and mnemonic information associated with words and needed to generate well-rounded supramodal representations of concepts, under the amodal control of the IFOF.

#### **CORTICAL NETWORK AND SUBCORTICAL CONNECTIVITY OF PERSONAL EMOTIONAL-VALUED SEMANTIC PROCESSING DURING LEXICAL ACCESS: PROPOSAL OF A HODOTOPICAL MODEL**

Picture naming requires an early visual processing and recognition by accessing a stored structural description, and then the selection of the corresponding semantic representation or "concept." In parallel with this preverbal processing, appropriate lexical representations or "words" are activated (Ferrand, 1997; Levelt, 2001), thanks to the selection of the most accurate defining features of the semantic representation (Papagno, 2011). Within these defining features or "semantic attributes," some are "cold," widely-shared, and some are "warm," i.e., with an emotional value, itself widely-shared or personal. As mentioned, words with emotional connotation are processed faster and more efficiently than neutral words.

We hypothesize that, although words can be accessed accurately by processing only "cold" attributes, a well-rounded lexical access is achieved more efficiently through an integrated processing of word-related emotion. We argue that the indirect ventral semantic stream, subserved by the complex ILF/UF, is the anatomical substrate of this high-level processing, while the direct ventral semantic stream, subserved by the IFOF, is crucial in the monitoring of amodal semantic processing. Thus, we propose an original anatomo-functional model of lexical access, in which all processes (except the early visual processing) are performed in parallel and synchronically.

Visual processing in occipital structures leads to visual recognition thanks to the activation of structural descriptions stored in temporo-basal areas, linked with corresponding semantic representations. During this preverbal stage, information is transmitted via the posterior part of the ILF. Then, to select the appropriate word, corresponding lexical representations are activated following "cold" and "warm" defining features of the semantic representation thanks to a synchronous processing involving the middle temporal gyrus, anterior ventral temporal cortex and temporal pole via the anterior part of the ILF interacting with orbitofrontal structures via the UF. These parallel processes are supervised and controlled via the IFOF, in an amodal way (**Figure 1**).

Interestingly, the left ATL seems to be involved in the retrieval of people's proper names (Damasio et al., 1996; Papagno and Capitani, 1998; Grabowski et al., 2001). Our model may explain some clinical presentations to the extent that people's proper names can only be accessed with their emotional connotation.

Furthermore, it is worth noting that some parts of the distributed cortical network our model highlights have previously been proposed as being involved in the processing of word emotional valence in an fMRI study (Kuchinke et al., 2005).

Finally, one of our previous studies based on intraoperative electrostimulation (Mandonnet et al., 2007) suggested that the ILF was not essential in language processing. We proposed that "due to plasticity phenomena induced by slow growing lesion, the function could have been redistributed over the ipsi- or contralateral hemisphere." In other words, the complex ILF/UF is possibly not crucial in semantic processing (because, as mentioned above, an acceptable semantic processing may be performed following only "cold" semantic attributes), and compensable in brain lesions, but this complex is necessary in normal conditions to perform well-rounded, fast and efficient emotional-valued lexicosemantic processing. Nonetheless, the IFOF remains in our model the critical substrate subserving the monitoring and the control of amodal semantic processing. This repeated assumption is in line with the hypothesis of a semantic working memory pathway via the IFOF (Turken and Dronkers, 2011).

In summary, only the integration of synchronous processes from both the indirect and direct ventral streams allows an accurate, efficient and emotionally connoted semantic processing.

#### **CONCLUSION**

We propose an original anatomo-functional model of lexical access, integrating the processing of personal emotional values of words. This model, based on clinical observations of glioma patients undergoing awake surgery and on an extensive review of the literature concerning the anatomo-functional descriptions of white matter associative tracts, puts forward the implication of a large-scale distributed network in this processing. This network might consist of the indirect semantic ventral stream, namely the complex ILF/UF, interconnecting infero-temporo-occipital areas and antero-ventral and medial temporal areas with orbitofrontal structures, which would act synchronically under the amodal monitoring of the direct ventral stream underlain by the IFOF. Integration of processes from both the indirect and direct ventral streams would be required to achieve an emotion-tinged semantic processing, fully and solely human. We may assume that a sole "cold" semantic processing, devoid of any emotional connotation, would entail a disembodied communication that would not allow us to make sense of situations and of the whole world around us. In other words, a sole "cold" semantic processing would not be a human semantic processing, which is rich, complex and linked with personal history. We therefore propose that integration of processes from both the indirect and the direct ventral streams allows a fully achieved, human semantic processing leading to a complete noetic experience.

#### **REFERENCES**


nization of semantic control: TMS evidence for a distributed network in left inferior frontal and posterior middle temporal gyrus. *Cereb. Cortex* 21, 1066–1075. doi: 10.1093/cercor/bhq180

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 October 2014; accepted: 08 January 2015; published online: 28 January 2015.*

*Citation: Moritz-Gasser S, Herbet G and Duffau H (2015) Integrating emotional valence and semantics in the human ventral stream: a hodological account. Front. Psychol. 6:32. doi: 10.3389/fpsyg.2015.00032*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Moritz-Gasser, Herbet and Duffau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The role of language in emotion: predictions from psychological constructionism

#### *Kristen A. Lindquist\*, Jennifer K. MacCormack and Holly Shablack*

*Carolina Affective Science Laboratory, Department of Psychology, University of North Carolina, Chapel Hill, NC, USA*

Common sense suggests that emotions are physical types that have little to do with the words we use to label them. Yet recent psychological constructionist accounts reveal that language is a fundamental element in emotion that is constitutive of both emotion experiences and perceptions. According to the psychological constructionist Conceptual Act Theory (CAT), an instance of emotion occurs when information from one's body or other people's bodies is made meaningful in light of the present situation using concept knowledge about emotion. The CAT suggests that language plays a role in emotion because language supports the conceptual knowledge used to make meaning of sensations from the body and world in a given context. In the present paper, we review evidence from developmental and cognitive science to reveal that language scaffolds concept knowledge in humans, helping humans to acquire abstract concepts such as emotion categories across the lifespan. Critically, language later helps individuals use concepts to make meaning of on-going sensory perceptions. Building on this evidence, we outline predictions from a psychological constructionist model of emotion in which language serves as the "glue" for emotion concept knowledge, binding concepts to embodied experiences and in turn shaping the ongoing processing of sensory information from the body and world to create emotional experiences and perceptions.

Keywords: language, emotion, psychological constructionism, concept acquisition, emotional development, concept knowledge, abstract concepts

# Language and Emotion

Common sense suggests that language has naught to do with emotion. Surely, the things that people say affect our emotions, and we can describe our emotions (or the emotions we see in others) with words after the fact. However, it is typically assumed that this is the extent of the relationship between language and emotion. Many contemporary psychological models of emotion agree with this common sense perspective. In these views, emotions are physical types that are essentially distinct from linguistic or conceptual processing (Ekman and Cordaro, 2011; Panksepp, 2011; Shariff and Tracy, 2011; Fontaine et al., 2013). Yet growing psychological research suggests that the role of language may run deeper in emotions than either laypeople or researchers previously thought.

In this paper, we introduce a psychological constructionist model of emotion that explains the mechanisms by which language plays a fundamental role in emotion. We begin our article by first providing a brief primer on the psychological constructionist approach we take in our own work called the Conceptual Act Theory (CAT; cf., Barrett, 2006b). We outline the CAT's predictions for the role of language in emotion and discuss early evidence that language does indeed play a role in emotion. To understand the ultimate and proximate mechanisms by which language plays a role in emotion, we next explore evidence from developmental and cognitive science, demonstrating that language helps humans acquire and then use concept knowledge to make meaning of their experiences and perceptions. We close by exploring the implications of language's role in emotion concept acquisition and use for emotional experiences and perceptions.

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tuebingen, Germany*

#### *Reviewed by:*

*Frederic Isel, Sorbonne Paris Cité – Paris Descartes University, France; Anna Hatzidaki, University of Athens, Greece; Marta Ghio, Heinrich-Heine-Universität, Düsseldorf, Germany*

#### *\*Correspondence:*

*Kristen A. Lindquist, Carolina Affective Science Laboratory, Department of Psychology, University of North Carolina, Chapel Hill, NC 27599, USA kristen.lindquist@unc.edu*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 01 October 2014; Accepted: 29 March 2015; Published: 14 April 2015*

#### *Citation:*

*Lindquist KA, MacCormack JK and Shablack H (2015) The role of language in emotion: predictions from psychological constructionism. Front. Psychol. 6:444. doi: 10.3389/fpsyg.2015.00444*

# Psychological Construction and the Conceptual Act Theory

The idea that language goes beyond describing emotion after the fact is consistent with psychological constructionist theories of emotion. Psychological construction is a family of theories that conceives of emotions as psychological "compounds" resulting from the combination of more basic psychological "elements" that are not themselves specific to emotions (Russell, 2003; Barrett, 2006b, 2013; Clore and Ortony, 2008, 2013; Cunningham et al., 2013; Lindquist, 2013; see Gendron and Barrett, 2009 for a historical account of psychological constructionist views). All constructionist theories of emotion predict that psychological compounds such as anger, disgust, fear, etc. emerge when more basic psychological elements such as representations of the body, exteroceptive sensations (e.g., visual sensations; auditory sensations) and concept knowledge about emotion categories combine. Just as chemical compounds (e.g., NaCl) emerge from more basic elements and possess attributes that their constitutive elements do not—NaCl (sodium chloride, or commonly, table salt) has properties that are not reducible to either sodium, which is a member of the alkali metal family, or chlorine, which is a type of halogenic gas—psychological compounds such as emotions are more than the sum of representations of the body, exteroceptive sensations, and concept knowledge. Most psychological constructionist views agree that a person experiences an emotion when concept knowledge (e.g., knowledge about "fear") and exteroceptive sensations (e.g., the sights and sounds of being in a dark alley) are used to make meaning of body states (e.g., a beating heart, sweaty palms, and feelings of startle) in a given instance. 
A person sees someone else as emotional when concept knowledge (e.g., knowledge about "fear") and exteroceptive sensations (e.g., the sights and sounds of riding a roller coaster) are used to make meaning of someone else's affective bodily and facial muscle movements (e.g., a person's wide eyes, gaping mouth, and white knuckles). Our own psychological constructionist approach, the CAT (Barrett, 2006a, 2009, 2012; Wilson-Mendenhall et al., 2011; Lindquist and Barrett, 2012; Lindquist, 2013), specifically predicts a role for language in this process, insofar as language supports the acquisition and use of concept knowledge (e.g., the concept of "fear") that is used to make sensations meaningful as emotions.

# Basic Elements of the Mind

According to the CAT (cf., Barrett, 2006a; Lindquist, 2013), the basic elements that contribute to emotions (and other mental states) are representations of sensations from inside the body (known as *affect*), representations of sensations from outside the body (known as *exteroceptive sensations*), and *concept knowledge* used to make those sensations meaningful in context (cf., Barrett, 2009; Lindquist and Barrett, 2012; Lindquist et al., 2012; Lindquist, 2013). Affect is a representation of the body's ever-changing internal state, which can be experienced as having some degree of valence and arousal or "activation" (Cacioppo et al., 2000; Russell, 2003; Barrett, 2006b; Kober et al., 2008; Mauss and Robinson, 2009; Lindquist et al., 2012; Clore and Ortony, 2013; Cunningham et al., 2013). Affect is often described as a homeostatic barometer that allows an organism to understand whether objects in the world are good for it, bad for it, approachable or avoidable (Barrett and Bliss-Moreau, 2009). Affect is a combination of interoceptive information from the internal milieu that represents activity in the smooth muscles, skeletal muscles, peripheral nervous system, and neurochemical/hormonal system (Barrett and Bliss-Moreau, 2009; Lindquist, 2013). Affect thus provides an internal representation of the meaning of objects in the world and can serve as a "common currency" for comparing the meaning of otherwise diverse stimuli and events (Cabanac, 2002).

By contrast, exteroceptive sensations provide an organism with a representation of information from the external world outside of the body (e.g., vision, audition, taste, olfaction, and proprioception; Barrett, 2009; Lindquist and Barrett, 2012; cf., Lindquist et al., 2012). Exteroceptive sensations contribute to perceptions of emotions in other people (via vision, audition, and perhaps even tactile or olfactory sensations) but are also often the sources of shifts in core affect (e.g., visual sensations of a dark, long, squiggly shape in the middle of the path ahead of you in the woods) that contribute to one's experiences of emotions in his or her own body and provide information about the physical context that is used to help disambiguate the meaning of interoceptive sensations.

Importantly, the CAT predicts that both affect and exteroceptive sensations are made meaningful as instances of specific emotional experiences or perceptions using concept knowledge about emotion categories (Barrett, 2006a, 2009, 2014; Lindquist, 2013; also see Russell, 2003; Clore and Ortony, 2013; Cunningham et al., 2013 for other psychological constructionist views). Concept knowledge refers to the rich cache of instances that populate what someone "knows" about different categories. For instance, people may know that the category of *fear* involves a beating heart, sweaty palms, a knot in the stomach, an urge to flee, and threatening contexts related to various objects (e.g., snakes, bears, cliffs, intruders, etc.). Notably, people also know lots of other information about fear, even if it is not stereotypical of fear, and this information may vary idiographically: for instance, one person might know that fear can involve attacking someone else; another person might know that fear can involve smiling. Still other people might know that fear can variably involve clowns, global warming, public humiliation, and existential concerns. Rather than consisting of a number of prototypes for certain emotions, concept knowledge about emotion is thus thought to consist of populations of instances (cf., Barrett, 2012) that have been acquired via a combination of instrumental learning via other individuals (i.e., "semantic knowledge") and personal experience (i.e., "episodic knowledge"; see discussion in Vigliocco et al., 2009).

Once acquired, concept knowledge serves as a form of *a priori* information to shape predictions about new interoceptive and exteroceptive sensations, helping the brain understand the meaning of sensations and act on them (Bar, 2007; Barrett, 2009, 2014; Clark, 2013). In the case of emotion, this means that concept knowledge is used to help make otherwise vague and potentially ambiguous sensations from inside the body (affect) and outside the body (exteroceptive sensations) meaningful as instances of specific emotions (e.g., anger, disgust, fear, pride, joy, schadenfreude, what have you). The resulting emotion is thus an emergent state that is at once affective and conceptual (cf., Lindquist and Barrett, 2008a). We refer to the process of using knowledge to make meaning of sensations as *situated conceptualization,* because the concept knowledge accessed to make meaning of sensations is highly situated and dependent on the present context. Situated conceptualization is a relatively automatic<sup>1</sup> process (Wilson-Mendenhall et al., 2011; Barrett, 2014) and operates in a probabilistic manner (Barrett et al., 2007b; Clark, 2013), making predictions about the meaning of sensations (e.g., a beating heart, sweaty palms) given the features of the present context (e.g., giving a speech), prior experiences of other contexts in which similar sensations have occurred (e.g., past experiences of giving speeches vs. past experiences of watching scary movies vs. experiences of standing atop a tall balcony), and culturally relative knowledge about the types of experiences that involve certain sensations (e.g., knowledge about fear vs. excitement).

Importantly, the CAT predicts that the aforementioned elements are domain-general elements of the mind and are not specific to the category of mental states called "emotions" (Barrett, 2009; Lindquist and Barrett, 2012; Barrett and Satpute, 2013; Lindquist, 2013). In essence, the CAT does not see "emotions" as states that are fundamentally distinct from "cognitions" or "perceptions" (cf., Barrett, 2009; Lindquist, 2013; e.g., Oosterwijk et al., 2012); all are constructed from the same basic elements and are *nominal kind* categories that exist because members of a culture agree that they share certain features (e.g., in English, "emotions" are typically thought to involve relatively greater involvement of the body than "thoughts," even if body states are in fact constitutive of both kinds of mental states; e.g., Oosterwijk et al., 2012). The agreement between members of a culture imbues emotions with social reality: they are real even if the specific categories (e.g., anger, disgust, fear, sadness, schadenfreude, pride, excitement, awe, etc.) are not inborn categories given by the structure of the nervous system (cf., Barrett, 2012). In this sense, the CAT and other constructionist views are quite distinct from other psychological and neuroscience models of emotion, which view emotions as domain-specific, inborn, inherited types that are fundamentally distinct from other types of mental states (e.g., "cognitions," "perceptions"), and are produced by specific anatomically-given neural structures (i.e., emotions are *natural kind* categories; e.g., Cannon, 1921; Allport, 1924; Tomkins, 1962; Izard, 1971; Sprengelmeyer et al., 1996; Ekman and Cordaro, 2011; see Barrett, 2006b for a review)<sup>2</sup>. In such natural kind views, there is no role for language in the constitution of emotion (Ekman and Cordaro, 2011; Panksepp, 2011; Shariff and Tracy, 2011; Fontaine et al., 2013) and the role of language in the acquisition of emotion concepts should have no bearing on the actual experience or perception of emotion. The predictions of the CAT are thus quite novel in regard to emotions, even if they are more broadly consistent with other evidence that language generally supports the construction of "cognitive" mental states (e.g., Boroditsky, 2011; Lupyan, 2012a,b,c).

# Growing Evidence: A Role for Language in Emotion

In contrast to the natural kind view of emotion, there is growing evidence for the CAT's prediction that concept knowledge supported by language plays a constitutive role in emotions. In recent years, we have extensively reviewed the literature on language and emotion (Barrett et al., 2007a; Lindquist and Gendron, 2013; Lindquist et al., in press a,b) documenting the various ways in which language shapes on-going perceptions and experiences of affect into perceptions and experiences of emotion (anger, disgust, fear, sadness, etc.). For instance, we have documented that impairing people's access to the meaning of emotion words impairs their ability to subsequently perceive emotions on faces (Lindquist et al., 2006, 2014; Gendron et al., 2012). Without access to the meaning of emotion words such as "disgust," vs. "anger," vs. "fear," vs. "sadness," individuals perceive posed emotional facial expressions (wrinkled noses, scowls, wide eyes, and frowns) as merely unpleasant (Lindquist et al., 2014). These findings suggest that access to the meaning of emotion words (and the concepts that they represent) is an essential component of understanding the discrete meaning of emotional facial expressions.

Other research demonstrates that labeling one's own unpleasant feelings with emotion words causes an experience of a particular discrete emotion to occur. Individuals who are exposed to labels for the category "fear" prior to listening to unpleasant music are subsequently more likely to engage in behaviors typical of fear (i.e., risk aversion) than individuals who were exposed to labels for the category "anger" or those not exposed to emotion category labels at all prior to listening to unpleasant music (Lindquist and Barrett, 2008a). Labeling one's affective state as an emotion also alters cardiac responses during affective events. Individuals who labeled their emotions while completing a stressful mental arithmetic task showed physiological responses consistent with an experience of threat (i.e., increased total peripheral resistance or TPR; relatively reduced cardiac output), whereas participants who did not label their emotions experienced a physiological profile more consistent with active coping (i.e., decreased TPR, increased cardiac output; Kassam and Mendes, 2013). These findings suggest that labeling an unpleasant state as one type of emotional experience vs. another can shape how it is subsequently experienced.

<sup>1</sup>Importantly, situated conceptualization does not happen because people *consciously* categorize ambiguous feelings or situations. It is an effortless and not necessarily conscious mechanism of how the human brain works, as it masters and makes meaning of the information and statistical regularities of experience. The analogy is that the brain uses knowledge from prior experience to transform wavelengths of visible light into the perception of a specific color (Barrett, 2006b). This process differs based on the lighting present in a room and even the other colors present in the context (see Bruner et al., 1951).

<sup>2</sup>The natural kind view was prevalent for the latter half of the 20th century, but recent evidence from behavior, peripheral physiology, and neuroscience has amassed to suggest that emotions such as anger, disgust, fear, happiness, sadness, etc. are not physical types with consistent and specific behavioral (Barrett, 2006b; Mauss and Robinson, 2009) and physiological outputs (Cacioppo et al., 2000; Barrett, 2006b; Mauss and Robinson, 2009; Quigley and Barrett, 2014) that derive from specific circuits or regions in the brain (Kober et al., 2008; Lindquist et al., in press a; Kassam et al., 2013; Touroutoglou et al., in press).

Neuroscience evidence also documents a critical link between language and emotion. Growing evidence suggests that using emotion words to label posed emotional facial expressions reduces activity in brain regions associated with uncertainty such as the amygdala (Lieberman et al., 2007; see Lindquist et al., in press b for a discussion). These findings are consistent with the idea that emotion words help to make meaning of otherwise ambiguous unpleasant vs. pleasant facial expressions (cf., Lindquist et al., in press b). Consistent with the interpretation that language plays a routine role in creating instances of discrete emotion perceptions and experiences, meta-analytic summaries of the neuroimaging literature on emotion reveal that a subset of the brain regions involved during studies of emotion perceptions and experiences are also involved during studies of semantic judgments (Lindquist et al., in press b). Together, these accumulating sources of evidence suggest that language may not merely impact emotions after the fact. They instead suggest that language plays an integral role in emotion perceptions and experiences, shaping the nature of the emotion that is perceived or felt in the first place.

Finally, evidence from cross-cultural research is consistent with the idea that language plays a constitutive role in emotion. For instance, speakers of Herero, a dialect spoken by the remote Himba tribe in Namibia, Africa, and American English speakers perceive emotions differently on faces. When participants were asked to freely sort images of identities making six facial expressions (anger, disgust, fear, happiness, sadness, and neutral) into piles, English-speakers created relatively distinct piles for anger, disgust, fear, sadness, happiness, and neutral faces, but Herero-speakers did not sort in this pattern. Instead, Herero-speakers produced piles that reflected multiple categories of facial expressions (e.g., smiling, neutral, wrinkled nose, scowling, and frowning faces). Importantly, the Herero-speakers sorted similarly to one another, suggesting that they understood the instructions but were using different perceptual cues (and perhaps different categories) than the English-speakers to guide their sorts (Gendron et al., 2014).

The existing evidence thus suggests that language plays some role in emotion, but what remain in question are the precise mechanisms by which language does so. The CAT hypothesizes that language helps support the acquisition and use of concept knowledge about emotion, but to date very little work has directly addressed this hypothesis in relation to emotion. We thus turn now to evidence from developmental and cognitive science demonstrating that language helps individuals represent and use concept knowledge in general, as well as concept knowledge about emotions in particular. We use this evidence to hypothesize about the mechanisms by which language shapes the acquisition and subsequent use of emotion concept knowledge.

# Language Supports Conceptual Knowledge of Emotion

The CAT makes the unique prediction that language plays a role in emotion because language helps a person to initially acquire and then later support the representations that comprise emotion concept knowledge (Lindquist, 2013; cf., Lindquist et al., in press b). Of course, language likely plays a role in the acquisition and use of all category knowledge (see Lupyan, 2012a,b; Borghi and Binkofski, 2014). However, we hypothesize that language is especially likely to be implicated in emotion because emotion concepts (e.g., *anger, disgust, fear,* etc.) are embodied and abstract representations that form populations of conceptual information rather than concrete concepts grounded by physical types that form prototypes for emotion category knowledge. Words for emotion categories (e.g., "anger," "disgust," "fear") thus serve as the "glue" or "essence place-holder" (cf., Xu, 2002) that helps bind together otherwise disparate instances of a given emotion category<sup>3</sup>.

The idea that emotion concepts are embodied derives from growing evidence in cognitive science that conceptual knowledge is represented via sensorimotor "simulations" of prior sensory experiences and actions (Glenberg and Gallese, 2012; for review, see Kiefer and Barsalou, 2013). Traditionally, researchers assumed that emotion concepts are structured as prototypes (Shaver et al., 1987; Russell, 1991) or as theories about why category members share certain features (Clore and Ortony, 1991; Zinck and Newen, 2007). In these models, category knowledge is represented outside the sensory modalities as amodal, symbolic representations (see Barsalou, 1999). Instead, consistent with recent theories of embodied cognition (e.g., Barsalou, 2009;

<sup>3</sup> Of note, the CAT exclusively makes predictions for words that name specific emotion categories in a given language (e.g., "anger," "disgust," "fear," in English and other culturally relevant terms in other languages). We are not referring to words that name other categories (e.g., "mother," "murder") that might themselves have emotional connotations. These words, if they help construct emotions, likely do so via less proximal mechanisms. For instance, the word "mother" might prime the word "love," which in turn might cause an individual to access the relevant body states, exteroceptive sensations, and conceptual knowledge associated with that emotion concept (for a discussion of the embodiment of emotion words see Barrett and Lindquist, 2008). Accessing or "simulating" the relevant affective and sensorimotor concomitants of the category "love" could in turn cause a person to start to feel an instance of love toward her mother. However, this process is clearly different from the process of emotion construction we are proposing, in which the conceptual knowledge associated with the word "love" is being used in the moment to make a situated conceptualization of the pleasant affect that is experienced when talking to one's mother on the phone, when hugging one's mother, etc. For research on the emotional connotations of words and implications for psycholinguistics, we point interested readers to Altarriba et al. (1999), Kousta et al. (2011) and Borghi and Binkofski (2014). We also point interested readers to fascinating research demonstrating that the emotional connotations of words in second languages are not as intense as the emotional connotations of words in first languages (for review see Harris et al., 2006; Opitz and Degner, 2012).

Vigliocco et al., 2009; Borghi and Binkofski, 2014), the CAT proposes that prior perceptual experiences associated with an emotion category help constitute conceptual knowledge of that emotion. Emotion categories are thus represented as re-enactments of prior interoceptive sensations such as feelings (see Barrett and Lindquist, 2008; Wilson-Mendenhall et al., 2013a), modality-specific exteroceptive perceptions such as visual, auditory, olfactory, and proprioceptive sensations of objects and contexts (e.g., Wilson-Mendenhall et al., 2011, 2013b), and actions (e.g., punching vs. yelling vs. scowling vs. smiling in anger; Oosterwijk et al., 2014; see Barrett and Lindquist, 2008 for a discussion). Emotion concepts also contain more abstract, propositional information that describes a person's relationship to the environment (e.g., sadness is about loss); this information may be acquired from one's culture and augmented by prior experiences (i.e., representations of specific instances of loss). For example, the concept of what it feels like to be "sad" may include previous bodily sensations (e.g., feeling heavy, drained, tired; unpleasant), previous exteroceptive sensations (i.e., sights, smells, tastes, sounds, associated with different physical contexts in which one was sad), and simulations of representative instances in which loss occurred (e.g., simulations of the context in which loss occurred at the death of a loved one, during an insult to one's self-esteem, loss of a job, etc.).

Based on evidence that the bodily and exteroceptive concomitants of instances of a single emotion category are highly variable (Cacioppo et al., 2000; Barrett, 2006b; Mauss and Robinson, 2009; Kreibig, 2010), the CAT also proposes that embodied emotion categories are abstract—without a single category prototype to define them (for a similar view, see Vigliocco et al., 2009; Borghi and Binkofski, 2014). In this view, emotions are not natural kind categories with strong perceptual regularities (Barrett, 2006a), nor are they single prototypes that stand in as typical examples of the rest of the category members (Barrett, 2014). Unlike concrete categories (e.g., "apple") that may have strong perceptual regularities (e.g., apples are round, tart, crisp, red/green/yellow fruits that grow on trees) and clear, best example prototypes (e.g., a Red Delicious), emotion categories are thought to exist as *populations* of conceptual information that might not be covered by a single "best example" category prototype (Barrett, 2014). In this view, there are many sensorimotor representations of "anger" that help form conceptual knowledge about this category and there is little perceptual regularity that makes instances obviously similar to one another (e.g., not all instances of anger involve a scowl, increased heart rate, punching, etc.).

Research demonstrates that situations may be key for an individual to acquire abstract concepts, such as "anger," "love," "fear," or "pride," that do not correspond to strong statistical regularities in exteroceptive or interoceptive sensations. For example, when individuals are asked to think about the abstract concepts "convince" and "arithmetic," brain regions associated with contexts in which those concepts might be involved are activated (Wilson-Mendenhall et al., 2013a). When thinking about the word "convince," brain regions associated with mentalizing and social cognition are activated. By contrast, when thinking about arithmetic, brain regions associated with engaging in numerical cognition are activated. Similarly, representations of emotion concepts draw on situations, with representations of fear involving brain networks underlying different types of contexts (Wilson-Mendenhall et al., 2013b). In some instances, brain regions involved in representing fear include those involved in social inference and mentalizing (i.e., as might occur during social threats). By contrast, other representations of fear involve networks underlying visuospatial attention and action planning (i.e., as might be observed during physical threats; Wilson-Mendenhall et al., 2013b). Abstract concepts can thus be thought of as reconstituted amalgamations of situated experience, and these amalgamations evolve with new experiences and new information from early life across the lifespan (Meteyard et al., 2012). Replaying or "simulating" the situation in which an abstract concept occurs may in part be what enables individuals to use abstract concepts to make future situated conceptualizations. 
Consistent with this hypothesis, research finds that lexical access, word comprehension, and memory are generally faster for concrete concepts than abstract ones; however, when situational cues are provided, abstract concepts become just as quickly available as concrete concepts (Barsalou and Wiemer-Hastings, 2005).

Critically for this paper, another key to acquiring and using abstract concepts such as emotion concepts may be language (cf., Barrett and Lindquist, 2008; Borghi and Binkofski, 2014). An embodied theory of language and semantics assumes that the brain's linguistic system is separate from, but integrally tied to, the modality-based system that represents embodied concepts (Barsalou and Wiemer-Hastings, 2005; Barsalou et al., 2008; Vigliocco et al., 2009). In the absence of strong statistical regularities based on previous perceptions of concrete objects in the environment, abstract concepts may particularly benefit from language, that is, from being associated with the phonological form of a word (Barrett and Lindquist, 2008; Vigliocco et al., 2009; Borghi and Binkofski, 2014). People may integrate in long-term memory two representations from the same emotion category (even if they involve different bodily and exteroceptive sensations, contexts, and actions) because the label for the emotion links them in memory (see Gelman and Markman, 1987; Borghi and Binkofski, 2014). As foreshadowed by early researcher Hunt (1941, p. 266), the CAT predicts that, "the*...*universal element in any emotional situation is the use by all the subjects of a common term of report (i.e., 'fear')."

The CAT of emotion thus predicts that language plays a role in emotion because it helps individuals to initially acquire and then use emotion concept knowledge to form situated conceptualizations of affect (for a similar view of abstract concepts, see Borghi and Binkofski, 2014). To date, research assessing the CAT has focused exclusively on documenting evidence that language plays a role in emotion experiences and perceptions at all. However, very little research to date has addressed whether language specifically helps individuals acquire and use words to make situated conceptualizations of emotion across development, which might form the ultimate mechanisms by which language shapes emotion. Growing evidence from developmental and cognitive science demonstrates that words help infants and adults acquire and then use concepts throughout the lifespan; this evidence suggests that language is key to the acquisition of emotion concepts. We thus turn to this literature to motivate predictions for how words help individuals acquire the emotion concepts that they then use to make situated conceptualizations about emotion.

# Words Help Humans Acquire Emotion Concepts: Lessons from Early Development and Adult Cognition

# Language and the Acquisition of Emotion Concept Knowledge in Infants

Understanding how infants and young children use words to learn novel concepts sheds light on how language more generally contributes to the acquisition of concept knowledge, and by extension, concept knowledge about emotion. Developmental accounts of concept knowledge traditionally assumed that infants are either born with pre-existing knowledge of specific categories (a nativist account) or learn every category *de novo* (an empiricist account; for discussion, see Xu and Griffiths, 2011). Similarly, before the recent emergence of constructionist accounts of emotion, many models of emotion assumed that infants were born with the ability to experience and perceive basic emotions such as fear, sadness, and disgust (Izard, 1978, 2007, 2011; Ekman and Oster, 1979; Barrera and Maurer, 1981; Campos et al., 1992; Lewis, 2000). However, growing research suggests that infants are "rational constructivists," born without pre-existing knowledge of many categories, but possessing an intrinsic sensitivity to statistical regularities and the ability to extrapolate to new category instances on the basis of inductive learning (Sirois et al., 2008; Xu and Kushnir, 2013)<sup>4</sup>. The ability to use statistical regularities extracted from prior experience to categorize phenomena and predict those phenomena's causes, behaviors, and effects is known as probabilistic or statistical learning. Probabilistic learning continues across the lifespan and is a fundamental aspect of human cognition (Oaksford and Chater, 2009; Carey, 2011; Clark, 2013). We propose that this basic ability also undergirds infants' abilities to learn about emotion categories through words.

From the early days of brain development *in utero*, infants' brains are able to observe stimuli from inside and outside their bodies and begin forming probabilistic *a priori* predictions about the meaning of those stimuli (Aslin and Newport, 2012). For instance, probabilistic learning may first exert its influence when infants learn to categorize sounds as linguistic vs. non-linguistic *in utero*. Organization of the auditory cortex in humans is thought to occur by the 27th week of gestation (Hepper and Shahidullah, 1994) and plasticity in the auditory cortex is thought to occur due to sounds penetrating the mother's intrauterine walls (Gerhardt and Abrams, 2000). Fetuses exposed to particular phonemes (linguistic speech sounds) *in utero* show more neural responsiveness to those phonemes after birth than newborns that were not exposed to such phonemes as fetuses (Partanen et al., 2013). This early sensitivity to language suggests that even as neonates, infants bring with them the ability to differentiate and make predictions about different linguistically relevant sounds. Indeed, neonates who are less than a day old already prefer phonemes from their native language to phonemes from a non-native language (Moon et al., 2013). After birth, infants use the statistical properties of language to help them differentiate between phonemes and extrapolate rules of grammar in their native language (Saffran et al., 1996; Aslin et al., 1998; Maye et al., 2002; Thiessen and Saffran, 2003; Kuhl, 2004; Rivera-Gaxiola et al., 2005; Gebhart et al., 2009; Teinonen et al., 2009; Shukla et al., 2011; Krogh et al., 2013).

Just as infants use statistical learning to differentiate words from non-words, they also use probabilistic learning to understand visual sensations in the world around them; these two processes likely co-occur and interact (for review, see Bergelson and Swingley, 2012). By 3–4 months of age, infants begin to form categories for natural kinds (e.g., species categorization for horses, zebras, tigers, cats, etc.; see Eimas and Miller, 1992; Quinn and Eimas, 1996) and artifacts (e.g., different furniture types such as chairs, tables, and beds; Behl-Chadha, 1996). Some concept knowledge for these categories may be developed on the basis of visual statistical regularities alone (e.g., all zebras have stripes, horses do not). Yet not all categories can be learned on the basis of statistical regularities alone, especially abstract categories, and so it is predicted that infants use the phonological sound of a word as a salient cue for differentiating between sensations in the environment. This is particularly relevant for emotion categories, where the word "anger" for example can tie together multiple modalities of sensorimotor experience (such as bodily sensations, situations, or behaviors) and also can serve as "glue" for different instances of "anger" that are not necessarily perceptually regular or consistent with one another (e.g., being angry at one's computer may not look or feel the same as being angry about an insult).

Thus, by 9 months of age, infants regularly use words as cues for understanding which objects in the world are similar vs. distinct. For example, the presence of two distinct labels helps infants establish a representation that two objects are in fact distinct in an object individuation task (Xu, 2002). Words seem to be special in this regard; the presence of two different labels facilitates object individuation, but the presence of two distinct tones, two distinct sounds, or two distinct emotional expressions does not<sup>5</sup>. By contrast, when 9-month-old infants hear one label repeated twice, they expect to see two objects that are perceptually similar (Dewar and Xu, 2009). By around a year of age, infants can use the presence of words to make predictions about the types of stimuli to expect. Twelve-month-old infants will look for two objects when an adult uses two words as opposed to one word to describe objects that are unseen by the infant (Xu et al., 2005). Similarly, emotion labels may be an important cue for helping infants and young children understand emotion categories and apply those categories to their own experiences and observations.

<sup>4</sup>Rational constructivism is part of the broader class of psychological constructionist views of the mind (Barrett, 2009; Lindquist and Barrett, 2012; Barrett and Satpute, 2013).

<sup>5</sup>Although for evidence that 15-month-old infants can use music as a cue for acquiring categories, see Roberts and Jacob (1991).

Importantly, for abstract concepts that do not have strong perceptual similarities, labels also help infants learn that perceptually distinct objects should be treated as members of the same category. For instance, in 10-month-olds, linguistic labels can override the perceptual qualities of objects, directing infants to group together objects that do not possess strong perceptual similarities (Plunkett et al., 2008). When infants are taught to group cartoon creatures possessing various features (e.g., differences in tail size, head size, etc.) into categories, the use of a single label leads infants to sum across perceptual differences within the cartoons and learn a single category that includes all the cartoon creatures. Thus, words not only inform infants about the nature of phenomena they encounter and help them classify what phenomena go together, but words also tell children where to look for boundaries between categories, including categories that might not be perceptually obvious but that are encoded in language (Bowerman, 1988; Roberts and Jacob, 1991). No research to date has directly examined this hypothesis with the acquisition of emotion category knowledge in infancy, but we predict that infants may be using words to help derive emotion categories to describe affective sensations in their own bodies and expressions of affect seen in others' bodies.

# Language and the Acquisition of Emotion Concept Knowledge in Young Children

Once infants become verbal toddlers, their concepts become honed through bi-directional communication with caregivers. As infants begin producing words themselves, they have the opportunity to receive more directed feedback from adults as to whether their word-sensation associations map onto the word-sensation associations of adults in their culture. Research from computer simulations suggests that the communicative function of language may be essential for helping humans to develop concept knowledge that is shared with other societal members. For instance, Steels and Belpaeme (2005) programmed artificial intelligence agents in a simulation to each possess the same capabilities for perception, categorization, and naming of colors in the artificial environment. The color space in this artificial environment was a set of continuous wavelengths of light with no statistical regularities in terms of the contexts in which certain color categories appear. Each agent was furthermore programmed to experientially develop its own unique knowledge of which sensory information corresponded to which categories and words. In one simulation, the agents merely learned to discriminate a given color from the present sensory array (all of color space) and named the color based on their personal set of category representations. Yet in a separate simulation, the agents not only discriminated for themselves but also "communicated" with one another, allowing each agent to learn category knowledge from the present social interaction in which they were involved. In this social interaction, the first agent (the speaker) discriminated a given color from the present sensory array (all of color space) and named the color based on the agent's own set of personal color category knowledge. The second agent (the hearer) then had to guess which color the speaker was referring to. If the hearer was successful, it strengthened the association between the word used and its own personal color category

knowledge. Yet if the hearer was unsuccessful, it lessened the association between the word used and its own color category knowledge and also created a new association between the word the speaker used and color category knowledge. The authors found that although agents in the first scenario learned to discriminate between different colors and each developed their own set of color category knowledge from the environment, each agent possessed completely different color knowledge when the simulation was over. By contrast, in the simulation involving communication, all agents eventually possessed the same color knowledge. Importantly, similar results persisted even when statistical regularities were introduced in terms of which colors occurred in which contexts in the artificial environment (a situation that likely better approximates the real world). By extension, these findings suggest that children might never learn the emotion concepts of their culture without communication with caregivers.
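The contrast between the two simulation conditions described above can be illustrated with a deliberately simplified sketch. It is not a reimplementation of Steels and Belpaeme's (2005) model: it assumes that color space is already carved into a small fixed set of categories shared by all agents (`N_CATEGORIES`), it replaces graded association strengths with a simple set of candidate words per category (a "minimal naming game" style update), and it does not model the weakening of a word's association after a failed guess. The names `Agent`, `run`, and all parameters are hypothetical and chosen only for illustration.

```python
import random

N_CATEGORIES = 4  # assumption: a fixed, shared discretization of color space


class Agent:
    def __init__(self, name):
        self.name = name
        self.invented = 0
        # words[c] holds the candidate words this agent associates with category c
        self.words = [set() for _ in range(N_CATEGORIES)]

    def name_color(self, category, rng):
        """Name a category, inventing a unique private word if none exists yet."""
        if not self.words[category]:
            self.invented += 1
            self.words[category].add(f"{self.name}-w{self.invented}")
        return rng.choice(sorted(self.words[category]))

    def guess(self, word):
        """Return the category this agent associates with the word, or None."""
        for category, candidates in enumerate(self.words):
            if word in candidates:
                return category
        return None

    def lexicon(self):
        return [tuple(sorted(candidates)) for candidates in self.words]


def run(n_agents=5, rounds=3000, communicate=True, seed=0):
    rng = random.Random(seed)
    agents = [Agent(f"a{i}") for i in range(n_agents)]
    for _ in range(rounds):
        topic = rng.randrange(N_CATEGORIES)
        if not communicate:
            # Solo condition: an agent names the color using only private knowledge.
            rng.choice(agents).name_color(topic, rng)
            continue
        speaker, hearer = rng.sample(agents, 2)
        word = speaker.name_color(topic, rng)
        if hearer.guess(word) == topic:
            # Success: both agents keep only the word that worked (strengthening).
            speaker.words[topic] = {word}
            hearer.words[topic] = {word}
        else:
            # Failure: the hearer forms a new association with the speaker's word.
            hearer.words[topic].add(word)
    return [agent.lexicon() for agent in agents]


if __name__ == "__main__":
    for condition in (False, True):
        lexicons = run(communicate=condition)
        distinct = len({tuple(lexicon) for lexicon in lexicons})
        print(f"communicate={condition}: {distinct} distinct lexicon(s)")
```

Under these assumptions, agents in the solo condition each end up with a private vocabulary, whereas agents that communicate should settle on a single shared word for each category, which mirrors the qualitative pattern reported in the text.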

In light of the importance of communication in concept acquisition, it is interesting that children do not learn how to reliably categorize facial expressions of different emotions (e.g., "anger," "disgust," "fear," "sadness") as distinct until they acquire and begin to use words to describe those categories in conversation. Although there is debate on this point, pre-linguistic infants and toddlers younger than 2 years of age seem only able to reliably differentiate facial expressions in terms of valence (i.e., positivity vs. negativity; for reviews see Widen and Russell, 2008; Widen, 2013)<sup>6</sup>. Two-year-olds use the very simple emotion labels "angry" and "happy" in daily discourse and, like infants, can reliably differentiate faces in terms of valence. Yet 2-year-olds cannot differentiate between more specific unpleasant emotion categories until they start reliably using additional negative emotion terms in daily discourse (Widen and Russell, 2008). For example, when 2-year-olds are given a set of pictures depicting five emotion categories and are asked to perceptually match only those faces that match an additional picture (e.g., an angry face) by placing them in a box, they place all unpleasant faces (angry, sad, disgusted, fearful faces) in the box but leave out happy faces. Yet as 3- and 4-year-olds begin to acquire the concepts "sad" and "fear," they begin to leave those faces out of the "angry" box, demonstrating an ability to perceptually categorize unpleasant faces into more specific emotions. By the age of 7, children show adult-like perceptual categorization of most faces save disgust (Widen and Russell, 2008; see Widen, 2013 for a review). These findings suggest that as children acquire emotion words and start using them in daily life with caregivers, they become increasingly competent at perceiving and labeling facial expressions in terms of their culture's emotion categories.
Consistent with the idea that words help infants generalize between otherwise perceptually distinct objects during learning, toddlers appear to show a "language superiority effect" when categorizing facial expressions (Russell and Widen, 2002). Specifically, 2- and 3-year-olds are better able to accurately place pictures of facial

<sup>6</sup>Evidence claiming that infants can reliably differentiate between different facial expressions on the basis of something other than valence may be driven by differences in the perceptual regularities present in stimuli that are not related to the emotion category itself (e.g., the presence of teeth; Caron et al., 1985).

expressions in a box labeled with a word (e.g., "anger") as compared to a box labeled with a face (e.g., an angry face), an effect that increases over early childhood. These findings suggest that newly acquired emotion knowledge associated with a word anchor may help children gloss over perceptual similarities between faces that are not useful for the categorization of facial expressions (e.g., furrowed brows in both anger and disgust) and focus on perceptual differences that are diagnostic (e.g., a scrunched nose in disgust vs. a growl in anger). Such a link between children's emotion understanding and linguistic development is also suggested in correlational studies demonstrating that children's advances in emotion understanding develop in tandem with advances in language comprehension (Harris et al., 2005).

We thus predict that emotion concept knowledge acquisition expands as young children acquire words for specific emotions and receive feedback on their situated conceptualizations of their own and others' affective states through communication with caregivers. Indeed, much evidence is consistent with the idea that communication with parents about emotions during early childhood is essential for children to develop complex knowledge about the emotion categories relevant to their culture (for discussion, see Halberstadt and Lozada, 2011). The implication of these findings is that parents' own abilities at situated conceptualization, concept knowledge about emotions, and communication skills can transfer to their children. For instance, 2- to 4-year-old children's total emotion utterances correlate with the emotion labels that their mothers know and use (Cervantes and Callanan, 1998). Similarly, children whose mothers used more emotion terms when children were 18 months old in turn produced more emotion terms themselves at 24 months (Dunn et al., 1987). Children whose parents discussed emotions more when children were 36 months old also had better emotion understanding at 6 years of age (Dunn et al., 1991). Parents' explanations of internal states and attributes (such as "hungry," "sad," or "nice") are thus thought to scaffold children's own abilities to identify and describe the same experiences in themselves and others (Saarni, 1999; Yehuda, 2005), perhaps because word use is helping children acquire complex embodied information about a given emotion category.

By contrast, parents who possess a paucity of conceptual knowledge about emotion or who struggle to communicate this knowledge likely dampen their children's opportunities to develop conceptual knowledge about emotion. Alexithymia is a non-clinical characteristic commonly defined as "difficulty identifying, understanding, and expressing feelings" (Bagby et al., 1986) and is hypothesized to stem in part from a paucity in conceptual knowledge about emotions (cf., Lindquist and Barrett, 2008b). In this view, adults with alexithymia either possess relatively sparse knowledge about specific emotion concepts (e.g., knowledge about *fear* might consist of a relatively narrow population of instances) or do not have differentiated knowledge about emotion concepts in the first place (e.g., these individuals do not possess differentiated concepts for *anger*, *disgust,* and *fear* and instead just possess a concept for *negativity*). The result is that they themselves have difficulty making situated conceptualizations of affective states in the moment, which in turn limits their ability to translate this knowledge to their children. Consistent with this interpretation, there is some evidence that the tendency for alexithymia is transmitted across generations; caregivers who struggle to communicate and express their feelings create an impoverished environment for children to learn conceptual knowledge about emotions (Berenbaum and James, 1994; Lumley et al., 1996). For instance, college students' level of alexithymia is positively correlated with their mothers' retrospective difficulty expressing feelings when their children were young (Fukunishi and Paris, 2001).

Similarly, evidence suggests that parents' beliefs about emotions, which can be considered a meta-cognitive aspect of emotion concept knowledge, shape children's emotional abilities. Parents' beliefs about the value of emotions guide not only how parents talk about emotions with children but also how parents react to their children's emotions (Dunsmore and Halberstadt, 1997; Hakim-Larson et al., 2006). For example, parents who believe that emotions are valuable are more likely to discuss and teach children about emotions (Gottman et al., 1996); this in turn gives children an opportunity to discuss their growing conceptual knowledge about different emotions and get feedback on the situated conceptualizations they are making about their and others' internal states and behaviors. Such exchanges consequently shape children's socioemotional abilities. For example, one recent study found that parents' beliefs that emotions are valuable as opposed to dangerous predicted children's ability to recognize their parents' emotional facial expressions (Castro et al., 2014). This may be in part because parents who believe that emotions are dangerous are more likely to avoid expressing emotions, creating a more impoverished affective environment for their children to practice their developing emotion-relevant skills (Dunsmore et al., 2009). Another possibility, however, is that parents who avoid talking about emotions to their children due to a belief that emotions are dangerous do not help children acquire the conceptual knowledge necessary for learning how to differentiate between different emotional facial expressions.

Together, the developmental evidence suggests that parents help children acquire emotion concepts, in part through the communicative powers of language, and that parents also may scaffold children through the process of making situated conceptualizations of emotion. Parents constantly infer what they believe their young child may be feeling, based on conceptualizations of their own previous and current interoceptive and exteroceptive experiences, in the context of the current situation, their knowledge of how their child usually acts, and how the child is currently behaving. For instance, a father may categorize his preverbal daughter's internal state as "mad" when he observes her refusing to eat and throwing her food, based on the present context and also on his knowledge of the contexts in which he experiences frustration himself. He may label her inferred state for her, asking why she is "mad"; this parental labeling may in turn help the child associate her current feelings of unpleasantness, her behaviors, and her father's reactions in that moment with the word "mad." Over the course of early childhood, parents (with varying degrees of skill) discuss with their children why the child behaved and felt the way she/he did ("Why were you angry with Grandma?") and why other people behaved and felt the way they did ("Your friend hit you because he was angry you took his toy"). As children acquire emotion concept knowledge and become able to label their own states, they can receive feedback from adults on the "accuracy" of their situated conceptualizations (e.g., the child reports that she is "sad" because her brother took her toy and a parent corrects that she is more likely to be "mad" that the toy was taken). Over time, with the development of conceptual knowledge and the ability to draw complex inferences about their own and others' mental states, children's tendency to make situated conceptualizations of emotion is likely to become more automatic.
Ultimately, children whose emotion knowledge is more defined (due to the content communicated by parents) and more automatically accessible (due to motivation to categorize states as emotional instilled by parents) would be more emotionally aware and able to understand the complexities and nuance of emotions in different situations. The benefit of this ability is clear: children who are more skilled at recognizing and expressing their own emotions exhibit less worry and depression than children who struggle to convey their emotional experiences (Rieffe et al., 2007). Likewise, children's emotion understanding is predictive of their social and emotion regulation skills, as well as their academic outcomes (for reviews, see Halberstadt et al., 2001, 2013).

Thus far, we have discussed concept acquisition as if it halts after early childhood and remains stable thereafter. To the contrary, findings in adults suggest that language may still play a role in adult emotion because language continues to help adults acquire and use concept knowledge to make meaning of core affect and exteroceptive sensations. In fact, a small body of evidence suggests that words help adults learn that novel perceptual instances are either similar or distinct and assists adults in continuing to assimilate new perceptual instances into existing category knowledge. We now turn to this evidence.

# Language and the Acquisition of Concept Knowledge in Adults

In an embodied account of concept knowledge, adults continue to update and refine categories based on on-going experiences of the perceptual world throughout their life (Schyns et al., 1998; Vigliocco et al., 2009; Barsalou, 2012). Growing evidence suggests that words play as much, if not more, of a role in adults' acquisition of novel visual categories, even when words are redundant with other cues for learning. For instance, in one study documenting the role of language in adult category learning (Lupyan et al., 2007), participants learned to categorize novel "alien" stimuli as things to be approached or things to be avoided and received feedback on the accuracy of each response. As participants received feedback about the accuracy of their judgment, participants in the label condition also saw a nonsense word; participants in the control condition received no word. Even though words were not necessary for the task, those participants who saw nonsense words while learning to categorize the stimuli were later better able to differentiate between members of different categories than were individuals who did not. Redundant words facilitated learning regardless of whether they were presented visually or played aurally during learning.

Despite research on the role of words in general adult concept acquisition, very little work has specifically assessed how words help adults learn novel emotion concepts. Indeed, it is hard to conduct this research because most healthy adults (who are not alexithymic) already possess substantial knowledge about the feelings, situations, behaviors, and bodily changes that accompany the emotion categories encoded by their acquired language. However, one study addressed the role of language in the perception of emotion in a category-learning task involving novel chimpanzee affective facial actions that were unfamiliar to most participants (Fugate et al., 2010). In the first phase of the experiment adults simply viewed pictures of unfamiliar chimpanzee facial actions (e.g., a "bared teeth" or "scream" face) or viewed the faces while learning to associate them with nonsense words. Participants were later shown two images taken from a continuous morphed array of two facial expressions (e.g., an image of a face containing a percentage of both the bared teeth expression and scream expression) and were asked to indicate whether two faces from random points throughout the array were similar to one another or different. This was a classic measure of "categorical perception" (Goldstone, 1994), the ability to perceive categories within a continuous dimension of sensory information. On some trials, participants compared faces that did not cross one of the learned category boundaries (e.g., they compared an 86% bared teeth, 14% scream expression with a 71% bared teeth, 29% scream expression), whereas on others, they compared faces that *did* cross a learned category boundary (e.g., compared a 43% bared teeth, 57% scream expression with a 29% bared teeth, 71% scream expression). If participants demonstrated categorical perception, they would see the first set of faces as similar but the second set of faces as different.
Yet only participants who learned to associate the faces with words in the first phase of the experiment demonstrated such categorical perception. Participants who did not learn to associate faces with a label did not perceive a categorical distinction between the faces.

Building on these findings, a recent study from our laboratory suggests that language can even help adults acquire and assimilate new perceptual experiences into existing category knowledge about emotional facial expressions (Doyle and Lindquist, in preparation). During a learning phase, participants saw a series of non-stereotypical posed facial expressions of anger (e.g., a scowl and squinted eyes with raised eyebrows) and fear (e.g., an open mouth and wide eyes with furrowed eyebrows). In one between-subjects condition, participants learned to associate these facial expressions with emotion words ("anger" vs. "fear"). In another, participants studied the faces and performed perceptual judgments (whether the eyes were close together vs. far apart). In a target phase, participants next studied target individuals who were depicting stereotypical facial actions for either anger or fear and were asked to categorize the facial expression as "anger" or "fear." During a final test phase, participants were asked to identify which face the target individual had been making during the target phase (i.e., either the learned face, the target face, or a morphed combination of the two). Consistent with the idea that language helps adults acquire and assimilate new perceptual instances into existing category knowledge, participants who had paired faces with words in the learning phase were more likely to remember seeing a target face that was similar to the learned category information. These findings suggest that language helps adults acquire novel category knowledge that biases memory of later novel faces.

Together, these early findings point to the idea that language continues to help adults acquire novel category knowledge across the lifespan and to update existing category knowledge. This may be how adults continue to augment their existing category knowledge about emotion and suggests that at any point in time, adults' category knowledge about emotion may reflect the regularities present in the local environment (e.g., one's cultural, social, or familial context). For example, if concept knowledge is always being updated and changed, then an adult's knowledge about say, *anger*, may be impacted by the last time the person experienced an instance of *anger* (e.g., at a spouse). This concept knowledge may thus feed-forward to impact situated conceptualizations of future instances of body states when with a spouse, potentiating the situated conceptualization of *anger* over, say, *anxiety* or even other body states such as *hunger* (e.g., a person might conceptualize her unpleasant feelings around dinner time as *anger* toward her spouse as opposed to *hunger* for the impending meal). Thus, the CAT predicts that language does more than just help acquire concept knowledge. It further predicts that language supports the accessibility and use of existing concept knowledge as humans make meaning of sensations in the body or world during the construction of emotions. This prediction is consistent with growing evidence from cognitive science that language, once connected to certain perceptual representations that become stored as conceptual knowledge, alters on-going adult perception by selecting certain sensations for conscious awareness while suppressing other sensations from conscious awareness.

# The "Label-Feedback Hypothesis": Language Supports the Use of Concept Knowledge

As we stated earlier, in the field of emotion, the CAT's predictions about the role of language in emotion are quite novel. However, current evidence in cognitive science converges on the idea that language shapes on-going conceptual processes in adults more generally; these conceptual processes furthermore shape the online processing of external sensations across modalities. According to Lupyan's (2012a) "label-feedback hypothesis," labels connected to concepts shape the conceptual information that is brought to bear when making meaning of sensations in the environment (see Figure 1 in Lupyan, 2012b). The label-feedback hypothesis thus explains why language shapes on-going perception (e.g., visual perception) and cognition (e.g., thought) in adults (see Lupyan, 2012a,b,c) and why language can alter ongoing emotional experiences and perceptions too (for a review of these findings see Lindquist and Gendron, 2013; Lindquist et al., in press b).

The label-feedback hypothesis suggests that the linguistic and conceptual systems become functionally entwined over the course of development such that activation of concepts in adults tends to activate labels and vice versa (see Lupyan, 2012a). For instance, after learning throughout childhood that scowls occurring in specific contexts (e.g., after an insult) are called "anger," this knowledge would be brought online to make meaning of future facial movements in situations involving insults. The activation of the label across new situations might further warp visual sensations in a top–down manner, causing scowls made following insults to appear more similar to memories of other scowls made following insults than scowls made when contemplating a colleague's question. As Lupyan (2012a) points out, modulation of perception by language can be up-regulated when words are explicitly referenced during perception. By contrast, the modulation of perception by language can be down-regulated when the linguistic system is temporarily impaired via verbal interference or other means.

As an example of the up-regulation of language shaping ongoing visual perception, a set of studies (Lupyan, 2008; Lupyan and Spivey, 2010a,b) examined the role of verbal labels on participants' reaction times to identify visual objects. For instance, in one visual identification task, participants were asked to locate a target object (a chair) in an array of non-target objects (tables). At the start of each block, participants were given an example of the stimulus that was the target. On half the trials, before the array of images appeared, participants also received the verbal instructions to "find the category" or "find the chair." Despite the fact that the word "chair" was redundant with existing instructions, participants were quicker to find the target on these trials, suggesting that labels can help direct attention to certain visual sensations in the environment (Lupyan and Spivey, 2010b).

Importantly, labels appear to modulate sensations in a deep manner by altering which sensations are selected for conscious awareness in the first place. For instance, in one study (Lupyan and Spivey, 2010a) participants completed a task in which they made an object presence vs. absence judgment to briefly presented letters. When participants heard the letter name prior to the judgment, they identified the presence of the letter with greater sensitivity (i.e., judged that it was present when it was in fact present and absent when it was in fact absent). By contrast, a visual cue of the to-be-presented letter did not increase participants' sensitivity to judge it was present during the trial. In an extension of these findings, participants in a separate study (Lupyan and Ward, 2013) were asked to indicate whether they saw a stimulus presented to one eye during continuous flash suppression (CFS) or not. CFS takes advantage of the binocular nature of vision by directing flashing visual images to one eye and a still image to the other eye. Participants consciously perceive the flashing stimulus because it is dynamic, but the static image is generally suppressed from conscious experience. In Lupyan and Ward's (2013) study, participants who were presented with valid vs. invalid labels for the object present during CFS actually showed greater sensitivity to detect the presence of a suppressed image.
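The notion of "sensitivity" in presence vs. absence judgments can be made concrete with a short sketch. One common index is the signal detection measure d′, which contrasts the hit rate with the false-alarm rate. The text above does not state which index the cited studies used, so the choice of d′, the log-linear correction, the function name `d_prime`, and the trial counts below are all assumptions made for illustration only, not data from those studies.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal detection sensitivity (d') for a present/absent judgment.

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates away from 0 and 1, where the z-transform is undefined.
    This correction is an illustrative choice, not taken from the source.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts: a cued condition with more hits and fewer
# false alarms than an uncued condition yields a larger d'.
cued = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
uncued = d_prime(hits=30, misses=20, false_alarms=20, correct_rejections=30)
print(f"cued d' = {cued:.2f}, uncued d' = {uncued:.2f}")
```

Because d′ combines correct "present" and correct "absent" responses, it separates a genuine gain in detection from a mere shift toward responding "present" more often, which is the distinction the parenthetical definition in the text is drawing.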

"Label-feedback" thus explains the myriad ways in which language impacts spatial cognition (Boroditsky, 2001), color perception (Winawer et al., 2007), action perception (Stanfield and Zwaan, 2001; Zwaan et al., 2002) and not least, our own language and emotion findings across adult cognition (for reviews, see Lindquist and Gendron, 2013; Lindquist et al., in press a,b). For instance, in several of our studies, we have down-regulated the label-feedback effect on adult emotion perception by temporarily decreasing participants' access to the meaning of emotion words via a process called *semantic satiation* (Lindquist et al., 2006; Gendron et al., 2012). In semantic satiation, participants repeat a word out loud 30 times until the meaning of the word becomes temporarily inaccessible. Semantic satiation operates by temporarily disconnecting the phonological form of the word with its meaning (Tian and Huber, 2010). In one of our most recent studies (Gendron et al., 2012), satiating relevant emotion words prior to participants perceiving a face impaired that face's ability to perceptually prime itself again later in the trial. Perceptual priming is evidenced when seeing a stimulus once causes a person to render faster judgments about the identical stimulus on later presentations and is thought to be mediated by visual processing occurring in the visual cortex of the brain (Grill-Spector, 2008). Specifically, participants repeated a relevant emotion word (e.g., anger) or an irrelevant abstract concept (e.g., idea) out loud 30 times before seeing a facial expression (e.g., Identity 1 depicting a scowl). Later in the trial, participants either saw the same face again (e.g., Identity 1 depicting a scowl) or a face that differed in terms of emotion (e.g., Identity 1 depicting a frown), identity (e.g., Identity 2 depicting a scowl), or both (e.g., Identity 2 depicting a frown). 
We measured perceptual priming as participants' speed to render an arbitrary perceptual judgment (i.e., how close or far apart the eyes of the face were) about the second face presented on critical trials when perceptual priming should occur (e.g., when Identity 1 scowls were followed by Identity 1 scowls). We hypothesized that if emotion concepts are routinely involved in emotion perception, then disrupting access to emotion concepts ought to interfere with how an emotional face is perceived, which would in turn impair its ability to perceptually prime itself later in the trial. Consistent with this hypothesis, semantic satiation interfered with the ability of the first face to facilitate judgments made about the subsequently presented face, even though the task involved making an arbitrary perceptual judgment that did not itself require access to emotion concepts. Importantly, our findings were not due to fatigue because satiating an irrelevant word (e.g., "idea") did not similarly impair a face's ability to perceptually prime itself later in the trial (Gendron et al., 2012).

Together, these findings suggest that language may not only shape emotion by ultimately helping people acquire knowledge about emotion across development, but language might also more proximally shape emotion by contributing to the ability to make situated conceptualizations of emotion in the moment.

# Implications for the Role of Language in the Acquisition and Use of Emotion

The implications of the role of language in emotion concept acquisition and utilization are vast. In applied arenas, investigations of how infants and children develop emotion knowledge via words could inform interventions for individuals with developmental disorders or maladaptive emotions. Recent evidence suggests that the emotion perception deficits observed in adults with autism are mediated by alexithymic traits (Cook et al., 2013), suggesting an important relationship between autism and emotion concept knowledge. Teaching children to pair their bodily sensations or the facial expressions made by others with words early in life might therefore have a protective effect on children at risk of autism. Such interventions could even be used for infants of alexithymic parents, who are at greater risk of becoming alexithymic themselves and experiencing the associated decrements in health and well-being (Berenbaum and James, 1994; Lumley et al., 1996).

Children with language disorder diagnoses also face emotional difficulties, underscoring the important role of language in emotion. In particular, language impairments seem to impact children's ability to differentiate between emotions. One study suggested that children with language impairments lack the full range of differentiated positive and negative emotions, and instead simply differentiate their internal states in global terms such as "good" or "bad" (Fujiki et al., 2002). These same children, lacking the emotion differentiation that language enables, also had trouble identifying internal and external cues that would help them regulate their affective states. These findings are ultimately consistent with evidence that learning specific emotion words (e.g., fear, anger, sadness, disgust) helps normally developing children make more differentiated, nuanced situated conceptualizations of others' affective states. Prior to learning specific emotion words around ages 2–3, normally developing children only seem to understand affective valence (happy vs. sad; Russell and Widen, 2002; Widen and Russell, 2008). Thus, language may help turn a child's feelings of global "badness" into the differentiated negative emotions that his or her culture's linguistic structures represent. For example, alexithymic children may be able to perceive and label core affective states in their bodies such as "bad" and "good," "hurt" and "nice" (which may explain why alexithymic individuals tend to exhibit more somatization disorders: Gulec et al., 2013; Zunhammer et al., 2013; Gulpek et al., 2014; Tominaga et al., 2014), but are unable to differentiate that affect into discrete emotion categories via emotion construction (see Lindquist and Barrett, 2008b for a discussion).

Beyond developing interventions for at-risk children, educational tools encouraging children to label their own and others' emotions might offer individuals skills that contribute to greater social and emotional well-being. Although young children begin to be able to pair words and emotional expressions across the first several years of life, they might benefit from learning to do this earlier on in childhood. As aforementioned, children who know different discrete emotion words (e.g., "anger" vs. "fear" vs. "sadness") can correspondingly differentiate between facial expressions of those emotions, and children with parents who label emotions are better at labeling their own emotions even by 36 months of age. Finding practical ways to increase parents' skill at discussing the bodily, situational, and behavioral aspects of emotions with their children would likely prove beneficial. Classrooms and daycares that routinely ask young children to pair words with facial expressions or to describe the situational and interoceptive features of their feelings may thus produce more emotionally intelligent children who exhibit less worry and depression (Rieffe et al., 2007) and who have superior social and academic outcomes both in the moment and later in life (for reviews see Halberstadt et al., 2001, 2013). Indeed, a recent meta-analysis (Durlak et al., 2011) indicates that children who go through emotion training techniques exhibit an 11-percentile point increase in achievement on grades and standardized test scores and exhibit more prosocial behavior and less emotional distress in daily life.

Language-based emotion interventions might not just apply to children. It is argued that adults with alexithymia have difficulty making situated conceptualizations of their on-going affective states and of others' facial muscle movements (Lindquist and Barrett, 2008a). If language plays a role in emotion, then one means of treating such individuals would be to have them engage in word-emotion matching tasks. Such tasks could be used in therapists' offices, in the workplace, or implemented online. We recently found that alexithymic adults have more difficulty matching an emotional face (e.g., an angry face) with another emotional face (e.g., another angry face) than do non-alexithymic adults. Yet alexithymic adults are as quick and sensitive as non-alexithymic individuals when asked to pair a face with a word (Nook et al., in press). These findings suggest that alexithymic individuals may not automatically access words to help make situated conceptualizations of facial expressions, but that when words are provided for them, they can indeed engage in a situated conceptualization of emotion.

The construct of alexithymia has been correlated with a host of psychopathologies (Bagby et al., 1986), and in particular somatization disorders (see Gucht and Heiser, 2003 for review; for a recent review of alexithymia in depression and anxiety, see De Berardis et al., 2008; for a recent review of alexithymia in eating disorders, see Nowakowski et al., 2013; for a recent review and meta-analyses of alexithymia's connection with schizophrenia, see O'Driscoll et al., 2014; for other recent empirical work on alexithymia's connection to personality disorders, see Nicolo et al., 2011; Loas et al., 2012); thus, language-based emotion interventions in adults may have clinical relevance to multiple forms of psychopathology. For example, depression and anxiety are both associated with difficulty identifying feelings, while anxiety is specifically associated with difficulty describing feelings (Korkoliakou et al., 2014). Similarly, a recent study on borderline personality disorder (BPD) found that individuals with BPD reacted to empathy inductions with greater personal (rather than empathic) distress and greater difficulty labeling their affective reactions (New et al., 2012). Demiralp et al. (2012) found that individuals with Major Depressive Disorder had less differentiated negative emotion experiences compared with healthy individuals. Our model would suggest that the poor negative emotion differentiation observed in individuals with Major Depressive Disorder may be in part driven by a paucity of conceptual knowledge, which would make it more difficult for individuals to understand and ultimately regulate their negative feelings. Indeed, greater differentiation of one's emotional states is associated with better emotion regulation (Barrett et al., 2001). 
Better emotion differentiation may also serve as a protective factor against destructive emotion regulation strategies such as non-suicidal self-injury, as a recent daily diary study of individuals with BPD found (Zaki et al., 2013). Although some research suggests that language (in the form of journaling, for example) can dampen or distance the effects of negative emotions (Pennebaker and Beall, 1986; Wilson and Schooler, 1991; Pennebaker, 1997; Hemenover, 2003), other work suggests that language can also increase the discreteness of an emotion experience, making it easier to regulate (Lieberman et al., 2007; Kassam and Mendes, 2013; Burklund et al., 2014). This literature supports our predictions that when language for emotion concepts is both present and accessible, emotions are constructed as more distinctive and discrete experiences. We suggest that interventions targeted at increasing individuals' conceptual knowledge and tendency to make situated conceptualizations of emotions (for example, developing a more nuanced understanding of the bodily, behavioral, and situational dimensions of emotions and using that information to help identify which emotion one is feeling) may particularly help individuals who are struggling with high emotional lability and mood dysregulation. We suggest that increased emotion differentiation, supported by improved conceptual and linguistic resources regarding emotion, will increase individuals' ability to identify and articulate what they are feeling in a way that promotes effective emotion regulation.

Findings suggest that language may also help bilingual or multilingual individuals implicitly regulate their emotions. For instance, it has been argued that because some languages denote differences between emotion categories that others do not (e.g., Vietnamese speakers conceive of shame vs. anguish as distinct, whereas English speakers do not; Alvarado and Jameson, 2010), this may promote greater emotion differentiation and thus, greater emotion regulation, when speakers are thinking in this language (for a review see Pavlenko, 2014). Bilingualism might also support emotion regulation by implicitly producing emotional distance when an individual is speaking in their non-dominant language. "Distancing" is an emotion regulation strategy that involves deliberately assuming a detached perspective on the emotional situation (Beck, 1970; Kross and Ayduk, 2011; Kross et al., 2014). A number of studies suggest that multilingual speakers experience less emotional reactivity (measured as skin conductance responses, self-ratings) when presented with words or phrases or when asked to recall events in their non-native language (for a review see Pavlenko, 2014). A second language might therefore implicitly "distance" individuals from the affective value of past and/or present events. However, whether a first or second (or third, etc.) language is likely to serve a distancing function depends on whether that language is a person's dominant and most frequently used language. In cases in which individuals report that their second language is their dominant and preferred language, those individuals tend to have greater reactivity toward affective words in their second, as compared to their first language (Degner et al., 2012; Simcox et al., 2012).

In addition to the important applied implications of a language-emotion link, there are vast theoretical implications for the role of language in emotion. Not least of these is the implication that emotions are constructed from more basic elements rather than being physical types that are only named by words (Barrett, 2006b; Lindquist, 2013; Lindquist et al., in press b). More broadly, the role of language in emotion opens avenues for understanding the cultural relativity of emotions. Previous work on the "taxonomy" of emotions relied on English emotion terms to define the emotions that exist (for example, Shaver et al., 1987; Zinck and Newen, 2007). In reality, we believe that these types of measures catalog conceptual knowledge about emotion categories, and in most cases, English emotion conceptual knowledge in particular. However, as Wierzbicka and others have argued, mapping human experience solely based on English language terms fails to capture the highly variable nature of emotion across cultures (Wierzbicka, 2009).

The CAT recognizes the power of language in emotion and instead predicts that the precise emotions experienced by a given person in a given culture will depend on the emotion concepts available to that person. Instead of hypothesizing that basic emotion categories are given by specific structures in the brain, the CAT envisions emotion terms such as "happiness," "sadness," "fear," "disgust," and "anger" as abstract concepts that become "essence placeholders" for distributions of conceptual knowledge about embodied mental states. Such conceptual knowledge feeds forward in a given instance of experience to help predict the meaning of bodily sensations in a given context. The brain infers that the current subjective state most closely matches a certain population of instances characterized as, e.g., "sadness" (in English) or perhaps an entirely different population of instances in another language. Critically, since different languages provide different morphological placeholders for distributions of embodied knowledge, this could cause individuals to segment and experience their momentary bodily states in either subtly different or even quite distinct ways. Recognizing the role of language in emotion may thus help scientists better measure and document the individual differences and cultural relativity underlying emotion categories in a way that can plot new directions for the study of emotion. We look forward to future directions in language-emotion research that will assess the ways in which language acquisition and utilization shape how humans experience the emotional world across the lifespan.

# References



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Lindquist, MacCormack and Shablack. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# 10 years of BAWLing into affective and aesthetic processes in reading: what are the echoes?

Arthur M. Jacobs<sup>1, 2, 3</sup>\*, Melissa L.-H. Võ<sup>4</sup>, Benny B. Briesemeister<sup>1</sup>, Markus Conrad<sup>2, 5</sup>, Markus J. Hofmann<sup>1, 6</sup>, Lars Kuchinke<sup>2, 7</sup>, Jana Lüdtke<sup>1, 2</sup> and Mario Braun<sup>1, 8</sup>

<sup>1</sup> Department of Experimental and Neurocognitive Psychology, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Cluster of Excellence "Languages of Emotion", Freie Universität Berlin, Berlin, Germany, <sup>3</sup> Dahlem Institute for Neuroimaging of Emotion, Berlin, Germany, <sup>4</sup> Scene Grammar Lab, Department of Cognitive Psychology, Goethe University Frankfurt, Frankfurt, Germany, <sup>5</sup> Department of Cognitive, Social and Organizational Psychology, Universidad de La Laguna, San Cristóbal de La Laguna, Spain, <sup>6</sup> Department of Psychology, General and Biological Psychology, University of Wuppertal, Wuppertal, Germany, <sup>7</sup> Experimental Psychology and Methods, Faculty of Psychology, Ruhr Universität Bochum, Bochum, Germany, <sup>8</sup> Centre for Cognitive Neuroscience, Universität Salzburg, Salzburg, Austria

### Edited by:

Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tübingen, Germany

#### Reviewed by:

Pia Knoeferle, Bielefeld University, Germany; Marta Ponari, University of Kent, UK

#### \*Correspondence:

Arthur M. Jacobs, Department of Experimental and Neurocognitive Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, D-14195 Berlin, Germany ajacobs@zedat.fu-berlin.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 20 November 2014; Paper pending published: 08 March 2015; Accepted: 13 May 2015; Published: 03 June 2015

#### Citation:

Jacobs AM, Võ ML-H, Briesemeister BB, Conrad M, Hofmann MJ, Kuchinke L, Lüdtke J and Braun M (2015) 10 years of BAWLing into affective and aesthetic processes in reading: what are the echoes? Front. Psychol. 6:714. doi: 10.3389/fpsyg.2015.00714

Reading is not only "cold" information processing, but involves affective and aesthetic processes that go far beyond what current models of word recognition, sentence processing, or text comprehension can explain. To investigate such "hot" reading processes, standardized instruments that quantify both psycholinguistic and emotional variables at the sublexical, lexical, inter-, and supralexical levels (e.g., phonological iconicity, word valence, arousal-span, or passage suspense) are necessary. One such instrument, the Berlin Affective Word List (BAWL), has been used in over 50 published studies demonstrating effects of lexical emotional variables on all relevant processing levels (experiential, behavioral, neuronal). In this paper, we first present new data from several BAWL studies. Together, these studies examine various views on affective effects in reading arising from dimensional (e.g., valence) and discrete emotion features (e.g., happiness), or embodied cognition features like smelling. Second, we extend our investigation of the complex issue of affective word processing to words characterized by a mixture of affects. These words entail positive and negative valence, and/or features making them beautiful or ugly. Finally, we discuss tentative neurocognitive models of affective word processing in the light of the present results, raising new issues for future studies.

Keywords: Berlin Affective Word List (BAWL), valence decision task, lexical decision task, emotion, word recognition models, neurocognitive poetics, reading, aesthetics

# Introduction

The aim of this paper is to discuss the contribution of a lexical database, the BAWL, to the study of affective and aesthetic processes in reading. We start with a short overview of studies using the BAWL in a variety of experimental settings that investigate a wide range of questions covering perceptual-attentional, memory, affective-aesthetic, or social-emotional issues. We then present a re-analysis of the original BAWL data (Võ et al., 2006) suggesting that both discrete emotion and embodiment or semantic richness variables also affect processing of the BAWL words. Results of three new studies from our lab investigating affective lexical semantics are subsequently discussed: one uses a special version of the BAWL to look at affective lexical semantics in children, one uses a novel class of stimuli that have a clear bivalent affective semantic structure, and the last one looks at what makes words beautiful or ugly. The paper ends with a discussion of tentative neurocognitive models of affective word recognition in the light of results from this and other recent publications addressing the How, Where, and When questions of valence ratings and decisions.

Experimental research on visual word recognition and reading has long neglected the fact that those high-dimensional symbolic stimuli called words have properties relating to our bodily sensations and actions, as well as to our affective system. Thus, popular models of visual word recognition, text processing, or reading remained completely silent with regard to potential affective or aesthetic effects of words (Jacobs, 2011). This might come as a surprise considering that early theoreticians of language, such as Freud (1891) or Bühler (1934), already argued that both spoken and written words are embodied stimuli with the potential to elicit overt and covert sensory-motor and affective responses. For example, Bühler introduced the notion of "Sphärengeruch" (spheric fragrance of words), according to which words have a substance, and the actions they serve (speaking, reading, thinking, feeling) are themselves substance-controlled. He gives the example of the word "Radieschen" (garden radish) that can evoke red or white color impressions, crackling sounds, or earthy smells and spicy tastes in the minds of readers and transport them either into a garden or to a dinner table, which creates an entirely different "sphere" than, say, the word "ocean". The renaissance of Bühler's ideas in recent theories of symbol grounding, embodied cognition, or neural reuse (Niedenthal, 2007; Anderson, 2010; Willems and Casasanto, 2011) can explain why evolutionarily young cultural objects like words can evoke basic and fiction emotions as well as aesthetic feelings at the subjective-experiential level of observation, and also activate affective processing networks at the neuronal level.
As outlined in Schrott and Jacobs (2011) the challenge here is to bridge the gap between neurobiological theories of emotion, as perhaps best represented by Panksepp's (1998) core affect systems theory, and complex (psycho-)linguistic models, as exemplified by Jakobson's (1960) extended version of Bühler's (1934) 'organon model' of language functions. Since evolution had no time to invent a proper affective system for art reception, even less so for reading, the emotional and aesthetic processes we experience when reading must be somehow linked to the ancient neuronal affect circuits we share with all mammals. As a concise name for the latter assumption about the emotion-language link we have coined the term 'Panksepp-Jakobson hypothesis' (Jacobs and Schrott, 2013; Jacobs, 2015b), which finds indirect or direct support in many papers from our lab and others (e.g., Cupchik, 1994; Kneepkens and Zwaan, 1994; Miall and Kuiken, 1994; Oatley, 1994; Kuchinke et al., 2005; Kissler et al., 2007; Hofmann et al., 2009; Schacht and Sommer, 2009; Briesemeister et al., 2011a,b; Altmann et al., 2012, 2014; Bohrn et al., 2012a,b, 2013; Briesemeister et al., 2012, 2014a,b; Ponz et al., 2013; Hofmann and Jacobs, 2014; Hsu et al., 2014; Jacobs, 2014a,b; Hsu et al., 2015a,b,c).

# The "Berlin Affective Word List" (BAWL) As a Basic Tool for Studying Affective and Aesthetic Processes in Reading

Limbach's (2004) wonderful book, which presents the results of the election of the most beautiful German words over many years, offers impressive examples showing that even 9-year-old children can find discrete emotions, such as joy, or feelings of beauty in single words and can also convincingly argue why (Schrott and Jacobs, 2011). These examples leave no doubt that words can be positive or negative, beautiful or ugly, more or less exciting or calming, and can evoke mental images of sensory-motor events or feelings of happiness. They also support the notion of one-word poetry, i.e., that single utterances or words, even outside lyrical contexts, can fulfill what Jakobson called the poetic function and cause aesthetic emotions (Jakobson, 1960; Jacobs and Kinder, 2015).

However, introspections and intuitions about how words can evoke affective and aesthetic processes are one thing; demonstrating this experimentally is quite another. Here we will not delve into discussions on what emotions are (Kagan, 2010). Rather, we focus on the empirical demonstration of different word properties and their influence on recognition processes that can be meaningfully related to theories of emotion covering a wide spectrum from the classical valence/pleasantness and arousal/activation dimensions of words, to discrete emotion and embodied cognition features, as estimated by ratings of joy/happiness, disgust, or smelling.

To provide a basic tool for researchers interested in affective reading processes in the German language, we have over the last 10 years developed the BAWL, which provides valence, arousal, and imageability ratings for approximately 3000 German words. **Table 1** summarizes more than 50 studies (until November 2014) that have used words from the BAWL to study effects of affective word properties (other studies not included here have used the BAWL to control for affective word properties, e.g., Briesemeister et al., 2009; Hofmann et al., 2011; Hsu et al., 2014, 2015a,b,c). The majority of these studies used single words and employed an explicit valence decision task (VDT) or an implicit<sup>1</sup> lexical decision task (LDT). Various memory tasks, mostly with valence as the independent variable (IV) and a variety of dependent variables (DVs), have also been used to explore effects on sublexical, lexical, and supralexical levels.

These studies show that the BAWL is a popular tool for bridging the language-emotion gap in research and that its stimuli are well cross-validated at the three relevant processing levels: experiential (e.g., subjective ratings, self-reports; Võ et al., 2006; Schnitzspahn et al., 2012), behavioral and psychophysiological (e.g., response times, heart rate, startle reflex, oculo- and pupillometric responses; Kuchinke et al., 2007; Võ et al., 2008; Bayer et al., 2011; Briesemeister et al., 2011a,b; Herbert et al., 2013), and neuronal (fMRI, EEG, fNIRS, and TMS or tDCS; Kuchinke et al., 2005, 2006; Hofmann et al., 2009; Conrad et al., 2011; Bayer et al., 2012a,b; Schlochtermeier et al., 2013; Tempel et al., 2013; Weigand et al., 2013a,b; Briesemeister et al., 2014a,b; Gärtner and Bajbouj, 2014; Hsu et al., 2014; Recio et al., 2014), as well as in computational-information technological and cartographic studies (Pak and Paroubek, 2010; Garcia Becerra, 2012; Hauthal and Burghardt, 2014).

<sup>1</sup> Implicit with regard to the affective processing of words.

#### TABLE 1 | Summary of studies using the BAWL for stimulus manipulations.

Abbreviations: Val, valence; Aro, arousal; disc. Emo, discrete emotion.

Complementing the "Affective Norms for English Words" (ANEW; Bradley and Lang, 1999), the Sussex Affective Word List (SAWL; Citron et al., 2012), or the "Affective Norms for German Sentiment Terms" (ANGST; Schmidtke et al., 2014a), which rely on a dimensional theory of emotion à la Wundt, Lang, or Russell, a recent version of the BAWL, the DENN-BAWL, is also compatible with discrete emotion theories, such as Darwin's or Panksepp's (Briesemeister et al., 2011a, 2014a,b). Even more recent extensions include a multilingual version of the BAWL containing more than 6000 words allowing comparisons between German, Spanish, English, and French (Schmidtke et al., 2014a), and preliminary versions for testing children, the kidBAWL, including embodiment ratings (eBAWL), the noun-noun compound/NNC-BAWL, special versions for clinical applications (cBAWL; Gole et al., 2012; Kometer et al., 2012; Herbert et al., 2013; Gärtner and Bajbouj, 2014), and one for experiments in neuroaesthetics (bBAWL). As shown in the following sections, the BAWL can be used to estimate the emotion potential of lexical or supralexical units, and is complemented at the sublexical level by the EMOPHON tool, which allows the affective value of sublexical units to be estimated (Aryani et al., 2013). Together, these tools offer the possibility to obtain estimates of the emotion potential and aesthetic aspects not only for single words but also for supralexical units like text passages, poems, or songs, as evidenced by recent studies from our lab (Jacobs et al., 2013; Hsu et al., 2014, 2015a,b,c; Lüdtke et al., 2014; Jacobs, 2015a,b).

# BAWL06 Reanalysis of Valence Decision Response Times (VDRTs) With a Combination of Exploratory Factor Analysis and Increasingly Complex Linear Mixed Models (LMM)

Researchers interested in affective word properties face the challenge of singling out effects of features like valence from more than 50 quantifiable factors known to affect word recognition performance (Graf et al., 2005). Apart from valence and arousal, the approximately 3000 words validated in our first two BAWL papers (Võ et al., 2006, 2009) are characterized by a dozen relevant psycholinguistic variables, such as word length, neighborhood density, or frequency, which makes it possible to disentangle affective effects from those of factors often confounded with valence or arousal, e.g., imageability (Kousta et al., 2011; Westbury et al., 2013).

In the original paper introducing the BAWL (Võ et al., 2006; henceforth BAWL06), we presented VDRTs as a function of valence ratings for 360 German words and obtained a slightly asymmetric, inverse U-shaped curve, mean RTs being shortest for positive words, followed by negative and neutral ones. The valence ratings accounted for about 50% of mean RT variance (for a subset of 360 words), thus leaving 50% unaccounted for. Since then, the words in the BAWL have been supplemented with a number of additional features. We therefore reanalyzed the original data to see which other variables may account for the remaining 50% of variance. Likely candidates are other affective-semantic variables like arousal and imageability, (sub)lexical variables like frequency, number of syllables, or neighborhood density, and discrete emotion variables (Briesemeister et al., 2011a, 2014a,b). Moreover, some recent work has provided evidence that words also possess the potential to evoke bodily sensations and mental imagery associated with the sensory-motor system, one nontrivial source of semantic information (Bühler, 1934; Andrews et al., 2009). We thus also took into account variables related to sensory experience (Juhasz et al., 2011) and body-object interaction (Siakaluk et al., 2008), which are sometimes considered parts of a metavariable affecting word recognition called semantic richness (Pexman et al., 2008; Yap et al., 2012). In doing so, we followed a mixed approach combining available data for variables such as word frequency or discrete emotion ratings (from the DENN-BAWL) with newly collected ratings of embodiment features. To reduce complexity, the latter were submitted to an exploratory factor analysis, which is useful for finding possible (latent) factor structures underlying a larger number of variables. This resulted in a tentative three-factor solution. A total of 14 variables were then submitted to a stepwise LMM to explore which type (e.g., affective-semantic vs. embodiment) and combination of variables may have played a role in determining the BAWL06 VDRT data (see Appendix in Supplementary Materials for details).

# DENN-BAWL and eBAWL: Discrete Emotion and Embodiment Features Structure of BAWL06 Words

A fine-grained analysis of the 175 words of the BAWL06 for which we had discrete emotion ratings revealed a maximum of 91 (52%) words for which joy/happiness was the "dominant" associated emotion (i.e., maximum rating value of all five discrete emotions), 35 (20%) anger words, 32 (18.5%) fear words, only 10 sad words (6%), and a minimum of six disgust words (3.5%). The "top 3" (i.e., joy rating > 2.5/5) joy words were SONNE (sun), MEER (sea), and SOMMER (summer); the top anger-related words were STAU (traffic jam), ARROGANZ (arrogance), and GEIZ (avarice); the top fear words were GIFT (poison), UNHEIL (calamity), and MORD (murder); the top sad words were LEID (distress), TRENNUNG (separation), and FRIEDHOF (graveyard); and the top disgust words were GESTANK (stink), ÜBEL (evil), and BAKTERIE (bacteria). However, many words do not really have a dominant emotion associated with them, but are clearly ambi- or polyvalent, e.g., the word "rocket" (RAKETE) combines a mean joy rating of 1.95 with a mean anger value of 2 and a valence rating of −0.8. Now, is this word neutral or negative, or does it rather have a mixed affective-semantic structure (Briesemeister et al., 2012)? Other striking examples emphasizing our point are words like SCHLAG ("blow" or "strike"), for which two negative emotions with an opposite approach-avoidance structure compete (average anger value = 2.8, fear value = 2.7), or SCHULD ("guilt"), for which we have a perfectly balanced trivalent structure (anger, sadness, and fear = 2.3).
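The distinction between a single "dominant" emotion and a mixed structure can be illustrated with a minimal sketch. It uses only the ratings reported above (ratings that the text does not report are left out), and the tolerance `margin` within which two ratings count as competing is a hypothetical choice made for illustration, not a criterion used in the study.

```python
def dominant_emotions(ratings, margin=0.25):
    """Return the emotions whose rating lies within `margin` of the maximum.

    `margin` is an illustrative assumption, not a threshold from the paper.
    """
    top = max(ratings.values())
    return sorted(name for name, value in ratings.items() if top - value <= margin)


# Only the mean ratings reported in the text; unreported ratings are omitted.
reported = {
    "RAKETE": {"joy": 1.95, "anger": 2.0},
    "SCHLAG": {"anger": 2.8, "fear": 2.7},
    "SCHULD": {"anger": 2.3, "sadness": 2.3, "fear": 2.3},
}

for word, ratings in reported.items():
    leaders = dominant_emotions(ratings)
    label = "mixed" if len(leaders) > 1 else "dominant"
    print(word, label, leaders)
```

With `margin=0.0` the function reduces to the strict maximum rule described in the text, under which RAKETE would count as an anger word despite its nearly identical joy rating.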

To examine possible effects of embodied cognition, we collected the following embodiment ratings for about 700 German words (see Appendix A1 in Supplementary Materials), asking to what extent subjects associate a word with seeing/SEE, hearing/HEA, smelling/SME, tasting/TAS, touching/TOU, feeling/sensing/FEE, or moving/MOV (eBAWL, cf. McRae et al., 2005). Looking at the dominant embodiment ratings for all BAWL06 words available (N = 193), we found a maximum of 116 (60%) words for which seeing was the dominant sensory-motor association, 53 feeling words (27%), nine hearing words (5%), six tasting words (3%), five touch words (2.5%), three moving words (1.5%), and a minimum of only one smell word (0.5%). As an illustration, the words with the highest embodiment index, or E-index (sum of all seven ratings), were MEER (sea; 31.0), followed by HONIG (honey; 25.9), SCHWESTER (sister; 25.9), and ZIGARRE (cigar; 25.8); the lowest E-index was found for ZWECK (purpose; 9.8), followed by ZUFALL (chance; 10.5), RABATT (discount; 10.7), and SPIONAGE (espionage; 10.9). Similarly to the mixed discrete emotion structure, the BAWL06 words also seem to have a mixed embodied feature structure, as exemplified by words like WAFFE ("weapon") with very similar ratings for touch (4.88), seeing (4.82), and hearing (4.53), or TRENNUNG ("separation") with similar ratings for seeing (4.76) and feeling (4.53).
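As a quick arithmetic check, the shares of dominant sensory-motor associations can be recomputed from the word counts given above. The sketch below uses only those counts; the variable names are illustrative.

```python
# Number of BAWL06 words per dominant embodiment modality, as reported in the text.
dominant_counts = {
    "seeing": 116,
    "feeling": 53,
    "hearing": 9,
    "tasting": 6,
    "touching": 5,
    "moving": 3,
    "smelling": 1,
}

total = sum(dominant_counts.values())

# Percentage of words per modality, rounded to one decimal place.
shares = {
    modality: round(100 * count / total, 1)
    for modality, count in dominant_counts.items()
}

print("N =", total)
for modality, share in shares.items():
    print(f"{modality:9s} {share:5.1f}%")
```

The counts add up to the 193 available words, and a single word out of 193 corresponds to roughly half a percent.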

An exploratory factor analysis (maximum likelihood/varimax) on the seven embodiment variables revealed a significant three-factor structure accounting for about 55% of the variance with acceptable eigenvalues (2.3, 1.4, 1.2). TAS and SME were related to Factor 1 (Taste), TOU and SEE to Factor 2 (Grasp), MOV and HEA to Factor 3 (Move), and FEE only marginally to Factor 3. These three factors were also included in the following analyses.

# Stepwise LMM Approach with Three Affective-Semantic, Three (Sub)lexical, Five Discrete Emotion, and Three Embodiment Variables

The previous analyses indicated that the BAWL06 words have a complex mixed affective-semantic structure that likely contributes to variance in dependent measures such as VDRT or lexical decision RT (LDRT). We tested this assumption using a stepwise LMM approach whose advantages in psycholinguistic research using two random factors (i.e., participants and words) have been discussed elsewhere (e.g., Baayen et al., 2008; Kliegl et al., 2010; Janssen, 2012; Kuchinke and Lux, 2012; Yap et al., 2012; Lüdtke et al., 2014). Following Janssen (2012), a statistical model of the data using log-transformed VDRT<sup>2</sup> as dependent variable was built from a null model (two random effects only: participants and words) by stepwise adding all main fixed effects<sup>3</sup> for three affective-semantic (valence/V, arousal/A, imageability/I), three (sub)lexical (logF, syllables/S, and N), five discrete emotion (joy/happiness/HA, anger/AN, fear/FE, sadness/SA, and disgust/DI), and three embodiment variables (Taste, Grasp, Move). We first fitted four simple unmixed models (Affective-Semantic, Lexical, Discrete, Embodiment), after which we entered the eight variables that yielded significant effects in those four models into a complex mixed (CoMi) model using V, A, Syl, HA, AN, SA, Taste, and Grasp as fixed effects (see **Table 2**). As an additional control model, we tested an LMM combining all 14 variables (14 V model) of the four unmixed models (independently of the significance of their effects). The two best-fitting models were also the most complex ones (CoMi and 14 V), which could not be discriminated on the basis of the AICc values<sup>4</sup>. A chi-square test using the log-likelihood data (i.e., likelihood ratio test) revealed a significant difference [chi-square (df = 6) = 28, p < 0.001] favoring the 14 V model.

#### TABLE 2 | Results of stepwise LMM analysis.

Number and name of factors in brackets (see Appendix A2 in Supplementary Materials for more details).
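The reported likelihood ratio test can be retraced with a short sketch. The 14 V model has six more fixed effects than the CoMi model (14 vs. 8), which matches df = 6, and for an even number of degrees of freedom the chi-square upper-tail probability has a closed form that needs no statistics library. The function names are illustrative, the AICc helper uses the standard small-sample correction, and none of this is the authors' analysis code.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_statistic(loglik_simple, loglik_complex):
    """Test statistic for nested models: twice the log-likelihood difference."""
    return 2.0 * (loglik_complex - loglik_simple)


def aicc(loglik, k, n):
    """AIC with small-sample correction for k parameters and n observations."""
    aic = 2 * k - 2 * loglik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Reported test: chi-square = 28 with df = 14 - 8 = 6.
p_value = chi2_sf_even_df(28.0, 14 - 8)
print(f"p = {p_value:.2e}")
```

The resulting p value lies well below 0.001, in line with the reported result favoring the 14 V model.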

What emerges from these results is a more complex picture than back in 2006: depending on which variables are entered into LMM or standard multiple regression analyses, VDRTs can be affected by all four groups of variables analyzed here: affective-semantic, (sub)lexical, discrete emotion, and embodiment. This fits with results from the above-mentioned studies showing effects of both discrete emotion and embodiment or semantic richness variables. To what extent those variables interact with each other (and variables not considered here) in influencing simple or transformed RTs from valence/lexical decision or other reading tasks is an issue for future studies. A related question is to what extent rating variables like valence and happiness, arousal and fear, or disgust and smell tap into the same underlying mental/neuronal processes (Westbury et al., 2013, 2014). We believe this issue cannot be decided on the basis of more or less exploratory LMM or regression analyses alone, but requires the research strategy of functional overlap modeling with the help of computer models of visual word recognition that have sufficient structure to not only simulate effects of lexical variables, but also of the other three types of factors analyzed here (Jacobs and Grainger, 1994; Grainger and Jacobs, 1996; Hofmann and Jacobs, 2014). We will discuss first steps in this direction at the end of this paper.

<sup>2</sup> Initial analyses showed that using logRT instead of RT improved model fits significantly (cf. Janssen, 2012).

<sup>3</sup> In order to keep the analyses simple given the total number of variables, and since logRT transforms can have nonlinear effects on interactions (Kliegl et al., 2010), we did not introduce any interaction terms into the present LMM analyses. Future studies should look into theoretically and empirically well-founded interactions between the 20 variables examined here (cf. Yap et al., 2012).

<sup>4</sup> The rule of thumb is that a difference of 10 points is usually significant (Janssen, 2012).

# Affective Lexical Semantics in Children: the KidBAWL

Effects of dimensional and discrete affective word features are now well documented for adult subjects. However, we are not aware of similar studies using the ANEW or SAWL, for instance, on children. The already mentioned examples from Limbach's (2004) book and observations from daily life suggest, though, that children are already aware of emotional and even aesthetic properties of single words. We thus ran a first study using an adapted mini-version of the BAWL (N = 90 words compatible with text book vocabulary for age groups 7–12) on a sample of 20 children between age 7 and 12 to see to what extent the results obtained with adults could be replicated or extended (see Appendix in Supplementary Materials for Method details). The children rated these words (normally distributed on the variables valence and arousal, as taken from the BAWL06/09 databases) on valence and arousal, and additionally reported if the word was unknown or hard to imagine (imageability check). The ratings of all 20 children showed both strong valence and arousal effects and an LMM with six relevant fixed effects (valence, arousal, imageability<sup>5</sup> , syllables, frequency, and N) and two random effects (participants, words) showed that the standard (i.e., adult) valence and arousal values from the original BAWL were significant predictors of the children's valence ratings [t ratio (valence) = 15.37; p < 0.0001; t ratio (arousal) = −3.13; p < 0.0001], whereas only BAWL arousal was a significant predictor for the arousal ratings of the children [t ratio (arousal) = 7.36; p < 0.0001].

**Figure 1A** shows how well the adult valence ratings predict those of the children across the entire valence range: the overall correlation is high (r = 0.91; p < 0.0001), suggesting that in general, at the level of categories (negative, neutral, positive), children of that age group have about the same concept of valence and/or the same judgment behavior as adults. If one breaks this down to the three valence categories, the correlations reveal a more differentiated picture: For the 30 negative words, only a quadratic correlation was significant (t ratio = −2.1; p < 0.045), suggesting that children use a wider range of negative ratings including extreme values, e.g., the noun GEWALT (violence) and the verb MORDEN (to kill) had more extreme z-values for children than for adults (−2.2 vs. −1.4 and −2 vs. −1.4, respectively). For the 30 neutral words, the linear correlation was significant (t ratio = 2.1; p < 0.046), whereas for the 30 positive words no significant correlation could be observed in this sample. This is due to extreme discrepancies for words like the verb KÜSSEN (to kiss), which had a much less positive z-value (0.3) for children than for adults (1.4). An even more extreme example is the adverb OPTIMAL (optimal) with a z-value of 0.02 for children compared to 1.3 for adults. In contrast, the nouns MAMA (mama) or NATUR (nature) evoked more positive judgments in children (both 1.5) than in adults (both 1.2).
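The item-level discrepancies mentioned above can be lined up in a small sketch that uses only the z-values reported in the text. The helper names are illustrative, and the six words are the examples singled out here, not the full set of 90 items.

```python
# (children, adults) valence z-values as reported in the text.
z_values = {
    "GEWALT": (-2.2, -1.4),
    "MORDEN": (-2.0, -1.4),
    "KÜSSEN": (0.3, 1.4),
    "OPTIMAL": (0.02, 1.3),
    "MAMA": (1.5, 1.2),
    "NATUR": (1.5, 1.2),
}


def discrepancy(child, adult):
    """Signed difference between children's and adults' z-values."""
    return round(child - adult, 2)


def more_extreme_in_children(child, adult):
    """True if the children's z-value lies further from zero than the adults'."""
    return abs(child) > abs(adult)


# Words ordered by the size of the child-adult discrepancy, largest first.
ranked = sorted(
    z_values,
    key=lambda word: abs(discrepancy(*z_values[word])),
    reverse=True,
)

for word in ranked:
    child, adult = z_values[word]
    print(word, discrepancy(child, adult), more_extreme_in_children(child, adult))
```

Among these examples, the largest discrepancies arise for the positive words OPTIMAL and KÜSSEN, consistent with the missing correlation for positive words in this sample.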

**Figure 1B** shows how well the adult arousal ratings predict those of the children: the correlation is significant but not perfect (r = 0.67; p < 0.0001; Schmidtke et al., 2014a, already document that arousal ratings generally appear to be less reliable than valence ratings). The higher intercept of the children's ratings might suggest that either they felt more aroused by the words or were more biased toward choosing higher arousal values.

Although these results might not be representative due to the small sample of participants and words, they raise interesting questions for future studies in this under-researched field: Is there a general tendency for children to judge words associated with aggression or violence more negatively than adults? How do affective semantic fields develop over the life span, and what role does age of acquisition play in this?

To generate more research questions for future studies on affective lexical semantics, we also looked at individual items and how they differ with regard to the variation in children's valence or arousal responses. The three "least stable" words concerning valence ratings were KILLER (killer), TUMOR (tumor), and TERROR (terror) with standard deviations of ≥1.5. The three most stable were NATUR (nature), TOPFIT (topfit), and MAMA (mama) with standard deviations of ≤0.5. **Figure 2** shows why KILLER is affectively so ambivalent and MAMA so unambiguous: some children find KILLER very negative, but others seem to think the opposite, its mean valence being slightly positive (2.6/5). In contrast, the affective semantics for the word MAMA seem stable, all subjects seeing it on the "good" side of the valence scale. Although all our items were compatible with textbooks for children of that age group and we analyzed only words judged to be familiar, the level of comprehension for items like KILLER or OPTIMAL (see above) might, of course, still differ much more for children than for adults. This is supported by the fact that, in contrast to adults, for neutral and positive words, imageability was a significant predictor of children's valence ratings (r<sup>2</sup> = 0.21, p < 0.012; r<sup>2</sup> = 0.15, p < 0.035, respectively).

<sup>5</sup> Adult ratings taken from the BAWL06/09 studies.
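How a standard deviation separates "least stable" from "most stable" items can be sketched as follows. The individual ratings below are hypothetical toy values on an assumed 1 to 5 scale, not data from the study; only the two cut-offs (≥1.5 and ≤0.5) come from the text, and the use of the sample standard deviation is likewise an assumption.

```python
from statistics import stdev

# HYPOTHETICAL ratings from eight imaginary children (assumed 1-5 scale).
# They only mimic a polarized versus a consistent response pattern.
toy_ratings = {
    "KILLER": [1, 1, 5, 5, 1, 4, 1, 3],
    "MAMA": [5, 5, 4, 5, 5, 4, 5, 5],
}


def stability(ratings, unstable_sd=1.5, stable_sd=0.5):
    """Classify an item by the spread of its ratings (sample SD assumed)."""
    sd = stdev(ratings)
    if sd >= unstable_sd:
        return "least stable"
    if sd <= stable_sd:
        return "most stable"
    return "intermediate"


for word, ratings in toy_ratings.items():
    print(word, round(stdev(ratings), 2), stability(ratings))
```

A polarized item can have an unremarkable mean while its spread is large, which is why inspecting the distribution of responses, as in Figure 2, is informative.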

Our aim here is not to enter into test- or measurement theoretic issues, but to illustrate some of the complexities of trying to determine which subject- and item-related factors influence valence and arousal responses with high-dimensional word stimuli. In standard papers involving the BAWL, ANEW or similar databases, such "qualitative" analyses are not presented, but they are helpful when it comes to developing "hot" process models of reading that include affective aspects, as discussed later.

# Affectively Bivalent Words: the NNC-BAWL

The above KILLER example demonstrates that words can appear to have a mixed or ambivalent affective semantic structure as a result of averaging ratings across different subjects (Briesemeister et al., 2012). But can they also have an intrinsically mixed or polyvalent structure, and, if so, how valid are our valence measures? We examined this question empirically using the novel case of affectively bivalent noun-noun compounds (NNCs). One motivation for this was provided by the results of recent computational studies using co-occurrence analyses of ultra-large databases (>10 billion words; Shaoul and Westbury, 2009; Warriner et al., 2013) to estimate the semantic structure of emotion words (Westbury et al., 2013, 2014). They demonstrated computationally that an important factor contributing to the mixed affective semantic structure of words is the "company they appear in," thus confirming Andrews et al.'s (2009) model of lexical semantics. Using very large sample sizes, such objective co-occurrence analyses are helpful for complementing subjective rating studies, as evidenced by our recent finding that valence, arousal, or imageability judgments can be largely or entirely accounted for by two computational measures: the size and density of a word's context and the multiple emotional associations of the word (Westbury et al., 2013, 2014). Next, we present a study in which we varied the valence of the "company" of a word being part of an NNC to examine how within-word valence (in)congruities affect ratings and VDRTs.

## Uni- And Bivalent NNCs

Take the word SEXBOMB and try to judge its valence and arousal. Given that the word is familiar and its processing therefore largely automatized, the task is perhaps not too difficult, and you will most likely rate it as positive and arousing (as average ratings suggest) despite the fact that its second component (the head) is a negative fear word. Apparently, the first word (the modifier) here is dominant for affective semantics. But what about the neologism BOMBSEX? You are probably reading this word for the very first time, and it will therefore take a bit longer to evaluate its emotion potential, likely due to the interactive and concurrent integration of phonological, morphosyntactic, and semantic features into a complex meaning gestalt which involves the left inferior frontal gyrus (LIFG) as a neuronal key structure (Forgacs et al., 2012). This integration process might be hindered by the feeling that the first word of the NNC is, in principle, negative, whereas the second is positive, thus creating a valence conflict which might interfere with interpretation, fluent word recognition, and the overall valence rating.

In order to examine the effects of such valence conflicts on word processing, in a recent study using the VDT we created 120 novel NNCs (10–16 letters long; see Appendix in Supplementary Materials for Method details). The NNCs were divided into four valence categories based upon the BAWL06/09 ratings for each of the two words constituting an NNC (negative valence from −3 to −1.3; positive valence from 1.3 to 3): positive-positive (PP; e.g., DUSCHVENUS/shower-venus), negative-negative (NN; PICKELHORROR/pimple-horror), positive-negative (PN; JUGENDFREITOD/youth-suicide), and negative-positive (NP; MIGRÄNEHOBBY/migraine-hobby). Participants first carried out a VDT, followed by ratings for each word on the following dimensions: valence (−3 to 3), arousal (1–5), imageability (1–7), and comprehensibility (1–7). The results of a one-way ANOVA showed that compound type had a significant effect on VDRTs (F = 19.16; p < 0.0001), and post-hoc t-tests showed that both incongruous conditions had longer RTs than the congruous ones (PN: 1.95 s ≤ NP: 1.98 s > NN: 1.69 s ≤ PP: 1.74 s; all ps < 0.0001), but did not differ significantly from each other.
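The category assignment and the omnibus test described above can be illustrated with a minimal Python sketch. It is not the analysis code of the study: the function names, the example ratings, and the response times are hypothetical and serve only to make the logic concrete. Only the cut-offs of −1.3 and 1.3 are taken from the text.

```python
from statistics import mean

NEG_MAX = -1.3  # upper bound of the negative range (from the text)
POS_MIN = 1.3   # lower bound of the positive range (from the text)


def valence_class(rating):
    """Return 'P', 'N', or None for a constituent rated on the -3..3 scale."""
    if rating >= POS_MIN:
        return "P"
    if rating <= NEG_MAX:
        return "N"
    return None  # too neutral to serve as a constituent


def nnc_category(modifier_valence, head_valence):
    """Combine modifier and head classes into PP, NN, PN, or NP."""
    first = valence_class(modifier_valence)
    second = valence_class(head_valence)
    if first is None or second is None:
        return None
    return first + second


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical constituent ratings, not actual BAWL values.
print(nnc_category(2.1, -2.4))  # PN

# Made-up response times in seconds, for illustration only.
toy_rts = {
    "PP": [1.70, 1.80, 1.75],
    "NN": [1.65, 1.70, 1.72],
    "PN": [1.90, 2.00, 1.95],
    "NP": [1.95, 2.00, 2.02],
}
print(one_way_f(list(toy_rts.values())))
```

The F statistic is computed by hand here to keep the sketch free of dependencies; a statistics package would additionally supply the p-value and the post-hoc comparisons.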

Another question we asked was to what extent the rated NNC valence was determined by the valence of the two nouns (as rated independently in the BAWL06/09 studies). The box-plots in **Figure 3** show that the clearest results were obtained, as could be expected, for congruous NNCs (PP, NN) with a slight advantage for double negatives, where all 30 compounds were rated as negative, whereas 4/30 PP words were rated as negative although both components were positive. The interesting result is that both incongruous NNCs had almost identical distributions and means, both being rated as negative (NP: −0.85; PN: −0.83), suggesting a negativity bias or negative valence dominance for bivalent NNCs, independent of whether the head or the modifier is negative. Comprehensibility was ranked as follows: PP > NN ≥ PN ≥ NP, the latter three not differing significantly from each other. This finding can be explained by the fact that positive words provide a greater number of semantic associations (Hofmann et al., 2011; Hofmann and Jacobs, 2014). Thus, semantic activation can spread across these associative pathways, and thereby elicit a positivity bias during meaning construction, an interesting hypothesis to be tested in future research (see also Lüdtke and Jacobs, this issue). Arousal was ranked: NN > NP ≥ PP ≥ NP, the only significant difference being between NN and NP.

Overall, the results indicate that valence conflicts in compounds interfere with meaning construction and raise important issues for future studies, e.g., about the time course and neuronal correlates of processing affective and other emotional or non-emotional semantic features of words (Briesemeister et al., 2014a,b). Such uni- or bivalent NNCs can be useful stimuli in studies on combinatorial semantic processing and metaphor comprehension (Forgacs et al., 2012), conflict resolution, affective word processing requiring stronger valence conditions (i.e., double-positive or –negative words), affective priming (Fazio, 2001), or cultural and existential neuroscience (Silveira et al., 2013). For example, Graupmann et al. (2013) used novel NNCs constructed from BAWL09 words (e.g., ENTENBUMERANG/duck-boomerang) as "meaning threat primes" in a recent study on cultural preferences.

# What Makes Words Beautiful or Ugly? The bBAWL

In the above-mentioned book on the most beautiful German words (Limbach, 2004), the 9-year-old Sylwan Wiese explains why the word LIBELLE (dragonfly) is the most beautiful for him: it has three "Ls," his preferred letter. This makes the word glide so well on his tongue (which is not the case for all German words). He also loves seeing dragonflies wobble and finds that the word expresses this feeling, that it ensures that one is not afraid of these insects. A deeper analysis uncovers more cues, like the fact that the first four letters (LIBE-) phonologically form and perhaps unconsciously evoke the German word for "love" (LIEBE), or that the last four (-ELLE) conjure feminine associations. Importantly, the child already mentions three cues for the beauty of words, a phonological one (the Ls), a perceptual one (the wobbling), and an affective-semantic one (no fear), which supports the view that both associations with discrete emotions and embodied cognitions play a role in aesthetic appreciations of words.

The literature on word recognition and reading, however, is astonishingly mute when it comes to the issue of why words can be beautiful or ugly (Schrott and Jacobs, 2011; see Bohrn et al., 2013, for an exception). In a pilot study we therefore collected 450 words from sources such as collections of the most beautiful and ugliest German words, dictionaries of German adolescent language, and the BAWL06/09 (see Appendix A5 in Supplementary Materials for Method details). Twenty subjects rated them on valence, arousal, familiarity, imageability, and beauty. Stepwise regression analyses showed that of all possible models beauty was best predicted by valence and familiarity (r<sup>2</sup><sub>lin</sub> = 0.77; RMSE = 0.47; AICc = 608), while arousal and imageability did not account for a significant part of the variance in our sample. Most interestingly, the most beautiful word in our sample was LIBELLE with a mean rating of 6.1/7, followed by MORGENRÖTE (aurora, 5.9), and MITTSOMMERNACHT (midsummer night, 5.8).
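For readers less familiar with the fit indices reported above, their standard textbook definitions can be written as follows. This is only a sketch under assumptions: the source does not state how the model was specified, so the two-predictor linear form and the symbols are illustrative choices. Here $b_i$, $v_i$, and $f_i$ stand for the mean beauty, valence, and familiarity ratings of word $i$, $n$ for the number of words, $k$ for the number of estimated parameters, and $\hat{L}$ for the maximized likelihood.

```latex
\begin{align}
\hat{b}_i &= \beta_0 + \beta_1 v_i + \beta_2 f_i \\
r^2 &= 1 - \frac{\sum_{i=1}^{n} (b_i - \hat{b}_i)^2}{\sum_{i=1}^{n} (b_i - \bar{b})^2} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (b_i - \hat{b}_i)^2} \\
\mathrm{AIC} &= 2k - 2 \ln \hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
\end{align}
```

Some statistics packages divide the residual sum of squares by the residual degrees of freedom rather than by $n$ when reporting RMSE; which convention underlies the value reported here is not stated in the text.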

That valence predicts beauty ratings fits with the classical notion shared by scholars as different as Kant, Gadamer, or Ramachandran that pleasure is a necessary key component of aesthetic feelings (Jacobs, 2015b). That both pleasure and familiarity contribute to the subjective beauty of verbal material was also shown in a recent fMRI study by Bohrn et al. (2013) on German proverbs, which confirmed a major hypothesis of the neurocognitive poetics model of literary reading (Jacobs, 2011, 2015a,b) claiming that ancient neural systems associated with pleasure or disgust (e.g., ventral striatum; anterior insula/aINS) are involved in aesthetic feelings concerning verbal material, i.e., the Panksepp–Jakobson hypothesis mentioned in the introduction. Concerning the flip side of beauty, i.e., ugliness, another recent study combining intracranial and surface EEG also confirmed this hypothesis by showing that as early as 200 ms post-stimulus the aINS significantly responded to disgusting words (Ponz et al., 2013). Such results challenge standard "cold cognitive" models of word recognition and reading, which so far ignore affective features of words and do not include subcortical or limbic structures in the "reading network" (cf. Hofmann and Jacobs, 2014).

To obtain an idea about which semantic features contribute to the beauty or ugliness of words, we ran a hierarchical cluster analysis over the five rated variables yielding an adequate set of 13 clusters (cubic clustering criterion = 2.5). Table A1 (Appendix in Supplementary Materials) gives 10 example words of the extreme clusters 1 and 12. The most beautiful words of Cluster 1 overall described nine phenomena from nature (animals, flowers, rainbow etc.) and four states/objects of wellness (e.g., coziness), all rated high on beauty, valence, and imageability, and low on arousal. In contrast, the overall 24 "ugliest" words from cluster 12 were almost all swear words associated with genitalia.

Naturally, our pilot study on the beauty of words is only a beginning. The above re-analysis of the BAWL06 data as well as the intuitive evidence by the contributors to the book "The most beautiful German word" suggest that associations with dimensional and discrete emotions, as well as embodied features also contribute to beauty ratings, as probably do sublexical factors such as phoneme valence (Aryani et al., 2013, in press), and phonological iconicity (Schmidtke et al., 2014b), or lexical ones like the sound image of words (Ullrich et al., this issue).

# Toward a Neurocomputational Model of the VDT and Affective Word Recognition

The above results from various studies using the BAWL as a tool for revealing aspects of the processing of affective words have shed light on different factors affecting valence or lexical decisions in adults and children with simple or complex words. They can thus motivate and constrain the development of "hot" process models of word recognition, sentence comprehension, or text processing that would include affective and aesthetic processes (Jacobs, 2015a,b). In the following, we would like to discuss some elementary features of such a model starting with computational aspects and ending with three tentative neurocognitive models. The aim of these models is to help answer the question How exactly subjects go about when judging the valence of high-dimensional "symbolic" stimuli like words, as in the VDT. Related questions to be answered by any neurocomputational process model concern the Where (functional, neuroanatomical) and When of the effects observed with affective words (cf. Kissler et al., 2006; Citron, 2012; Hofmann and Jacobs, 2014). While the behavioral data presented in this paper do not directly speak to the latter two questions, the studies using BAWL stimuli from our lab and others, summarized in **Table 1**, do so and thus—together with other literature—provide a basis for the following theoretical considerations.

Perhaps the most basic information provided by the above studies with regard to such modeling projects is that 7- to 12-year-old children already show a well-developed ability to judge the valence and arousal of words, indicating that they have access to their affective semantic features. Whether these are the result of contextual learning/evaluative conditioning processes (Fritsch and Kuchinke, 2013), or some other unknown mechanism linking emotional and embodied experiences to words is still an open question, though. It seems safe to assume, however, that the processes determining valence and arousal values are triggered by some visual and/or linguistic features of a word which are perceived before a valence decision takes place. In principle, these features could be of sublexical or lexical origin, or both, and, if we assume automatic phonological recoding of written words and multiple embodied associations (Bühler, 1934; Jacobs et al., 1998; Yap et al., 2012), they can be visual, phonological, multi-sensory-motor, or some combination. If, in analogy to the model by Andrews et al. (2009), the affective meaning of words is best understood as the result of learning the statistical structure underlying a single joint distribution of both experiential and distributional data, then valence and arousal could be seen as semantic supra-features that result from (i) neural activation patterns distributed over the sensory-motor representations of their referents (experiential aspect) and (ii) the linguistic company the words keep, i.e., the size and density of their context, as computationally modeled using co-occurrence statistics (Hofmann and Jacobs, 2014).

The second message of the present empirical results and previous literature for model construction is that the experiential aspect would include both associations with discrete emotions and embodied features, whereas the distributional aspect would include partial or full transfer of the valence and arousal features of the context words to the target word via affective spreading activation (Hofmann and Jacobs, 2014). The distributional aspects would hypothetically contribute less strongly to the arousal value than the experiential ones, if arousal is considered the more direct and body-related variable of the two. A recent computational study by Westbury et al. (2014) indeed suggests that arousal ratings are associated more strongly with autonomic reactivity than valence, predicted by co-occurrence similarity to emotion labels naming automatic emotional reactions (e.g., the words HUMILIATION, LUST, and PANIC). In contrast, the best computational model of valence ratings was very different, and had a clear structure suggesting that they are highly associated with four dimensions: potency (strong-weak), happiness, approachability (bad-pleasure), and anger/rage.
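The idea that a word inherits affective features from the company it keeps can be made concrete with a deliberately simple Python sketch. It is a toy illustration under stated assumptions, not the model of Hofmann and Jacobs (2014) or Westbury et al. (2014): the function name, the co-occurrence counts, and the valence norms are all hypothetical, and the transfer is reduced to a count-weighted mean.

```python
def context_valence(cooccurrence_counts, valence_norms):
    """Co-occurrence-weighted mean valence of a target word's context words.

    cooccurrence_counts: dict mapping context word -> co-occurrence count
    valence_norms: dict mapping word -> rated valence (e.g., -3..3)
    Returns None if no rated context word is available.
    """
    weighted_sum = 0.0
    total_count = 0
    for word, count in cooccurrence_counts.items():
        if word in valence_norms:  # skip context words without a rating
            weighted_sum += count * valence_norms[word]
            total_count += count
    return weighted_sum / total_count if total_count else None


# Hypothetical counts and norms, for illustration only.
toy_counts = {"LOVE": 3, "FEAR": 1, "TABLE": 5}
toy_norms = {"LOVE": 2.5, "FEAR": -2.5}
print(context_valence(toy_counts, toy_norms))  # 1.25
```

In this toy case the unrated context word is ignored, and the estimate is pulled toward the more frequent positive neighbour, which is the kind of "full or partial transfer" the distributional aspect refers to.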

A third message of our results is that models of affective word processing should take into account the mixed affective semantic structure (or ambi- and polyvalence) of many words, whether for simple nouns like KILLER or for more complex NNCs like SEXBOMB. This calls for a change in methodology when studying affective word recognition, using both a bipolar and a bivariate approach to see which one provides a better fit to the data (Briesemeister et al., 2012). A final dispatch of the present data for future models of affective and aesthetic word recognition is that valence and familiarity likely play a greater role than arousal and imageability for the judged beauty of words.

# Computational Models of Affective Word Recognition

Computational models of affective word recognition must specify Where and When in the model the factors arousal, valence, or semantic associations exert their influence and How these factors interact in determining a valence or lexical decision. Given the success of interactive activation models (IAMs) in predicting "cold" word recognition performance in the LDT and in making the underlying processes transparent, i.e., algorithmically concrete (e.g., Grainger and Jacobs, 1996; Hofmann and Jacobs, 2014), they are also good candidates for simulating "hot" affective word processing in the VDT or LDT. First steps in this direction were made with the models of Siegle et al. (2002) and Kuchinke (2007), whose MROMe (Multiple Read-Out Model emotional) could account for faster lexical decisions in positive words (Kuchinke et al., 2005, 2007) by an evaluation mechanism added to the original MROM (Grainger and Jacobs, 1996). However, it could not predict RT differences between positive and negative words, such as the RT advantage for positive words found in Võ et al. (2006), or Briesemeister et al. (2011a,b). A further development trying to overcome limitations of previous models is the Associative Read-Out Model (AROM; Hofmann et al., 2011), which extends the scope of IAMs by introducing explicit memory and semantic representations necessary for implementing emotional aspects. Using this model, Hofmann and Jacobs (2014) presented evidence suggesting that positive valence effects can be explained by semantic cohesion (i.e., the higher semantic-associative cohesiveness of affective words compared to neutral ones), as suggested by Phelps et al. (1998). A future neurocomputational model trying to account for the How of affective and aesthetic word recognition should augment the aforementioned models with clear predictions regarding the Where and When, i.e., the neurofunctional/-anatomical locus and time-course of valence (or other affective) effects. 
However, the time for a unified model does not seem to be ripe yet, since currently this could be done from at least three different theoretical perspectives which are sketched in the following section.

# Bipolar Perspective

Regarding the neurofunctional Where question, from a first perspective viewing valence as a bipolar construct, Amy and aIns activations—or an amygdalar-hippocampal network (Kensinger and Corkin, 2004)—are primarily associated with arousal, whereas valence is most often associated with OFC (Lewis et al., 2007), ventral anterior (as well as posterior and subgenual) cingulate cortex (vACC; Maddock et al., 2003), inferior frontal (Briesemeister et al., 2014b) or a prefrontal cortex-amygdalar network (Kensinger and Corkin, 2004; Schlochtermeier et al., 2013). When looking at the When question, i.e., temporal word recognition ERP effects of valence and arousal (see Citron, 2012; or Kissler et al., 2006, for reviews) there is evidence that arousal comes first (N1; Hofmann et al., 2009; Kissler and Herbert, 2012), followed by valence (early posterior negativity/EPN, late positivity complex/LPC; Recio et al., 2014), with reward being in between (P2; Schacht et al., 2012). But there is also data suggesting that valence comes first (P1; e.g., Bayer et al., 2012a), followed by arousal (EPN; Bayer et al., 2012b). All these results were obtained with BAWL or ANEW-type words, but appear somewhat inconsistent, sometimes even from within the same lab (e.g., Bayer et al., 2010, 2011; Palazova et al., 2011; Rellecke et al., 2011): some data suggest very early "pre-lexical" effects of valence (P1), others late, "post-lexical" effects (LPC)<sup>6</sup> . It also remains unclear to what extent valence and arousal effects on ERPs interact (Citron et al., 2014; Recio et al., 2014), or are confounded with effects of discrete emotional information like joy or disgust (N1; Briesemeister et al., 2014a,b; EPN; Ponz et al., 2013). 
At the behavioral and peripheral-physiological levels, word valence influences RTs and ratings, presumably triggered by discrete emotion and/or embodied features, and, albeit to a much lesser and less certain degree, also autonomic nervous system variables such as corrugator, electrodermal, and pupillary activity (Võ et al., 2008; Bayer et al., 2011; but see Kuchinke et al., 2007, for null findings in the LDT). However, much as for neuroimaging data, results are inconsistent, sometimes showing shorter RTs for positive words, sometimes for negative words, and sometimes no advantage compared to neutral words (Hofmann and Jacobs, 2014).

# Bivariate Perspective

A second perspective views valence as a bivariate construct (e.g., Norris et al., 2010; Briesemeister et al., 2012), relating it to notions of reward and behavioral activation (positivity) vs. punishment and behavioral inhibition (negativity). In this perspective, positivity is neuroanatomically most often associated with the basal ganglia (BG) including the ventral striatum (VS), left frontal pole (lFP), mOFC, vmPFC, pCC, and SMA, whereas negativity is rather associated with insula, right amygdala (rAmy), PAG, rdACC, lOFC, dmPFC, and deep cerebellar areas (Maddock et al., 2003). We are not aware of studies answering the When question of positivity vs. negativity activation, but Norris et al. (2010) summarize behavioral, peripheral-physiological, and ERP research supporting the negativity bias and positivity offset hypotheses of this perspective and thus providing indirect evidence for it.

# Interactive Perspective

Finally, a third theoretical perspective merits discussion, because some results suggest that valence and arousal affect processing of emotional stimuli in an interactive way (Herbert and Kissler, 2010; Citron et al., 2014). According to this perspective, stimuli with negative valence (e.g., bitter taste) or high arousal (e.g., a loud noise) elicit a withdrawal tendency and corresponding mental set, because they represent a possible threat. In contrast, stimuli with positive valence (e.g., sweets) or with low arousal (e.g., a newsletter) elicit an approach tendency because they are perceived as safe (Briesemeister et al., 2013). These two tendencies are hypothesized to be initiated independently at a pre-attentive level and subsequently integrated in order to evaluate the stimulus for further action. This perspective predicts that positive low-arousal and negative high-arousal stimuli are easier to process, because they elicit congruent tendencies (approach and withdrawal, respectively), whereas positive high-arousal and negative low-arousal stimuli are more difficult to process because they elicit conflicting approach-withdrawal tendencies. At the neurofunctional level, Citron et al. (2014) recently reported evidence for this perspective showing greater neural activation within right insular cortex in response to stimuli evoking conflicting approach-withdrawal tendencies (i.e., positive high-arousal and negative low-arousal words; PosHi; NegLo) compared to stimuli evoking congruent approach vs. withdrawal tendencies (i.e., positive low-arousal and negative high-arousal words; PosLo; NegHi). Further supporting evidence comes from ERP studies in favor of the approach-withdrawal assumption and the idea of the emotional and motivational embodiment of words (Herbert and Kissler, 2010; Herbert et al., 2012, 2013).

<sup>6</sup>The notions pre- and post-lexical can easily be interpreted in the framework of serial stage models of word recognition, but require a more differentiated definition when using nonlinear dynamic processing models from, e.g., the IAM family. This is because in IAM-type models, the lexical feedback loop changes activation at the sub- or prelexical levels (e.g., letters, syllables). Thus, in this model context "prelexical" would mean a pure bottom-up effect, not affected by any lexical feedback. Since lexical feedback typically requires four to seven processing cycles to show sublexical effects, any "prelexical" effect predicted from such models must occur extremely early.

FIGURE 4 | (A–C) Diagrams showing hypothetical relations between affective word variables and their effects at the behavioral, brain-electrical, and neurofunctional levels. Continuous-line arrows assume strong relations, interrupted and dotted lines weaker, more questionable ones. Abbreviations: (A) EDA, electrodermal activity; Amy, amygdala; aIns, anterior insula; EPN, early posterior negativity; LPC, late positivity complex; OFC, orbitofrontal cortex; vACC, ventral anterior cingulate cortex. (B) BG, basal ganglia; VS, ventral striatum; lFP, left frontal pole; mOFC, medial orbitofrontal cortex; vmPFC, ventromedial prefrontal cortex; pCC, posterior cingulate cortex; SMA, supplementary motor area; rAmy, right amygdala; PAG, periaqueductal gray; rdACC, right dorsal anterior cingulate cortex; lOFC, left orbitofrontal cortex; dmPFC, dorsomedial prefrontal cortex; Cereb, cerebellum. (C) PosLoAro, positive valence, low arousal; NegHiAro, negative valence, high arousal; PosHiAro, positive valence, high arousal; NegLoAro, negative valence, low arousal; rIns, right insula; lphG, left parahippocampal gyrus.
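The congruency logic of the interactive perspective can be spelled out in a minimal Python sketch. It assumes, purely for illustration, a valence scale centred on zero and an arousal scale from 1 to 5 with a hypothetical cut-off at the midpoint; the function names and example values are not taken from the cited studies.

```python
AROUSAL_MIDPOINT = 3.0  # hypothetical cut-off on a 1-5 arousal scale


def valence_tendency(valence):
    """Positive valence -> approach, negative valence -> withdrawal."""
    if valence > 0:
        return "approach"
    if valence < 0:
        return "withdrawal"
    return None  # neutral words carry no valence-based tendency here


def arousal_tendency(arousal, cutoff=AROUSAL_MIDPOINT):
    """Low arousal -> approach, high arousal -> withdrawal."""
    return "withdrawal" if arousal > cutoff else "approach"


def tendencies_conflict(valence, arousal, cutoff=AROUSAL_MIDPOINT):
    """True if the two tendencies disagree, False if they agree, None if neutral."""
    from_valence = valence_tendency(valence)
    if from_valence is None:
        return None
    return from_valence != arousal_tendency(arousal, cutoff)


# Hypothetical example values for the four word types.
for label, valence, arousal in [
    ("PosLoAro", 2.0, 1.5),
    ("NegHiAro", -2.0, 4.5),
    ("PosHiAro", 2.0, 4.5),
    ("NegLoAro", -2.0, 1.5),
]:
    print(label, tendencies_conflict(valence, arousal))
```

Under these assumptions the first two word types yield no conflict and the last two do, which mirrors the prediction that PosHiAro and NegLoAro words are harder to process.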

These considerations are sketched in the hypothetical diagrams of **Figure 4**. **Figure 4A** sketches the bipolar model of valence. **Figure 4B** sketches the bivariate interpretation of valence, arousal being left out, because it plays no key role in this perspective, which also makes no specific hypotheses with regard to differential effects of bivariate valence on RTs or ratings<sup>7</sup> (Norris et al., 2010). Finally, **Figure 4C** sketches the interactive view. Note that all models incorporate the view that valence and arousal are affective super-features derived from experiential and/or distributional word properties including discrete and embodied features processed during an earlier phase (Briesemeister et al., 2014a,b).

# Conclusion

The present paper offers an overview about the lessons we learned from previous versions of the BAWL, and discusses some future perspectives characterizing the affective connotation of words on embodied, developmental, discrete-emotion, and aesthetic dimensions of meaning. This enriched perspective on word processing is further complemented by analyses based on the co-occurrence of words that either reduce the dimensions of meaning or explain positivity by semantic processes. These approaches provide a first step toward neuro-computationally concrete models of affective word, sentence, and text processing, which we see as the major challenge for the future (Jacobs, 2015a,b).

<sup>7</sup>With the exception of mood ratings.

# Acknowledgments

This research was supported by the DFG-funded Cluster of Excellence "Languages of Emotion," Freie Universität Berlin. We thank Teresa Sylvester for the idea to apply the BAWL to children and her help in stimulus preparation, and Anne Maria Kiesewetter, Alexander Darrall, Tobias Bernklau, Valentina Elias, Marvin Franke, and Jan Pütz for their help in data acquisition. We also thank Sascha Tamm and Michael Kuhlmann for technical support.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00714/abstract



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Jacobs, Võ, Briesemeister, Conrad, Hofmann, Kuchinke, Lüdtke and Braun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Affective Norms for 4900 Polish Words Reload (ANPW\_R): Assessments for Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability and, Age of Acquisition

Kamil K. Imbir\*

Faculty of Psychology, University of Warsaw, Warsaw, Poland

In studies that combine understanding of emotions and language, there is growing demand for good-quality experimental materials. To meet this expectation, a large set of 4905 Polish words was assessed by 400 participants in order to provide a well-established research method for everyone interested in emotional word processing. The Affective Norms for Polish Words Reloaded (ANPW\_R) is designed as an extension to the previously introduced ANPW dataset and provides assessments for eight different affective and psycholinguistic measures of Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability, and subjective Age of Acquisition. The ANPW\_R is now the largest available dataset of affective words for Polish, including affective scores that have not been measured in any other dataset (concreteness and age of acquisition scales). Additionally, the ANPW\_R allows for testing hypotheses concerning dual-mind models of emotion and activation (origin and subjective significance scales). Participants in the current study assessed all 4905 words in the list within 1 week, at their own pace in home sessions, using eight different Self-assessment Manikin (SAM) scales. Each measured dimension was evaluated by 25 women and 25 men. The ANPW\_R norms appeared to be reliable in split-half estimation and congruent with previous normative studies in Polish. The quadratic relation between valence and arousal was found to be in line with previous findings. In addition, nine other relations appeared to be better described by a quadratic rather than a linear function. The ANPW\_R provides well-established research materials for use in psycholinguistic and affective studies in Polish-speaking samples.

Keywords: affective norms, duality of emotion, duality of activation, Polish language, psycholinguistic indexes

Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

José Antonio Hinojosa, Universidad Complutense of Madrid, Spain; Monika Riegel, Nencki Institute of Experimental Biology, Poland

> \*Correspondence: Kamil K. Imbir kamil.imbir@gmail.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 30 October 2015 Accepted: 01 July 2016 Published: 18 July 2016

#### Citation:

Imbir KK (2016) Affective Norms for 4900 Polish Words Reload (ANPW\_R): Assessments for Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability and, Age of Acquisition. Front. Psychol. 7:1081. doi: 10.3389/fpsyg.2016.01081

# INTRODUCTION

# Affective Norms for Verbal Research Stimuli

The affective nature of stimuli is an important issue when the consequences of emotions are the point of interest (Osgood et al., 1957; Russell, 2003). This applies to language and emotion relations. Therefore, with the use of Lang's (1980) Self-assessment Manikin (SAM) scale, the Affective Norms for over 1000 English Words (ANEW: Bradley and Lang, 1999) dataset was introduced and stimulated the development of analogous datasets in numerous languages and cultures (for a review, see Table 1 in Riegel et al., 2015). The list of affective norms datasets is still growing because of the importance of such stimuli for all researchers interested in the interplay between language and emotion. Such datasets allow researchers to manipulate certain dimensions (e.g., valence) and to control for the potential effects of other dimensions (e.g., arousal, dominance, or concreteness). Different affective and psycholinguistic dimensions have been shown to shape the processing of stimuli in the mind (Citron et al., 2016). Taking this into account, all classical and some additional measures were included in the Affective Norms for Polish Words Reload (ANPW\_R) dataset. The number of words assessed in the ANPW\_R was increased in order to provide the largest dataset among word norms in the Polish language. In the next two sections, the importance of the affective and psycholinguistic dimensions included in the ANPW\_R, as shown in previous research, is described in detail.

# Affective Qualities of Stimuli: Valence, Dominance, Origin, Arousal, and Subjective Significance

Valence is the most intuitive property of an affective state (Kagan, 2007) and describes the pleasantness vs. the unpleasantness of feelings toward an object (Lang, 1980; Russell, 2003). This determines many of the processes in the cognitive domain ranging from memory modulation during stress (Smeets et al., 2006) to associations with vertical positions (Meier and Robinson, 2004), found to be up for positively valenced words but down in the case of negatively valenced stimuli. In addition, many electroencephalography (EEG) studies have shown that valence modulates cortical correlates of word processing (e.g., Citron, 2012; Kaltwasser et al., 2013; Imbir et al., 2015a). Norms collected for valence dimensions are the most reliable in terms of stability when assessed in test–retest and split-half estimation methods (c.f. Soares et al., 2012; Montefinese et al., 2014; Imbir, 2015a; Riegel et al., 2015).
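How a split-half estimate of this kind can be obtained is sketched below in Python. This is a generic illustration, not the procedure of any of the cited norming studies: the odd/even split of raters, the Spearman-Brown step-up, the function names, and the toy ratings are assumptions introduced here.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(ratings_by_word):
    """Correlate per-word means of two rater halves.

    ratings_by_word: dict mapping word -> list of ratings, one per rater,
    with raters in the same order for every word (an assumption of this sketch).
    Returns (half-vs-half correlation, Spearman-Brown corrected estimate).
    """
    first_half_means, second_half_means = [], []
    for ratings in ratings_by_word.values():
        first_half_means.append(mean(ratings[0::2]))   # raters 1, 3, 5, ...
        second_half_means.append(mean(ratings[1::2]))  # raters 2, 4, 6, ...
    r = pearson_r(first_half_means, second_half_means)
    return r, 2 * r / (1 + r)


# Toy ratings from two raters for three words, for illustration only.
toy = {"a": [1, 3], "b": [2, 2], "c": [3, 4]}
print(split_half_reliability(toy))
```

The Spearman-Brown correction estimates the reliability of the full rater sample from the correlation between its two halves; whether a given norming study applies it should be checked in that study's method section.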

Much less experimental work has been performed with the dominance dimension (c.f. Fontaine et al., 2007; Moors et al., 2013; Imbir, 2015a), which represents a measure of control toward perceived feelings evoked by stimuli, and varies from being under the influence of affect to being in charge of controlling ourselves. Dominance has also been operationalized in different ways in several studies. For example, Moors et al. (2013) used power or control (Fontaine et al., 2007) as an example of the dominance dimension (ranging from weak/submissive to strong/dominant). Dominance dimension, as well as valence and arousal, was found to reflect brain activity connected with current mood in a more coherent way than the traditional approach in mood description based on discrete emotional states (e.g., Wyczesany and Ligeza, 2015).

Last, the origin dimension, recently introduced by Jarymowicz and Imbir (2015), is the purely affective quality of stimuli. This represents the duality-of-mind-based distinction between two mechanisms of affective reaction formation. The SAM scale (Imbir, 2015a; c.f. Figure 1) consists of a bimodal scale representing the perceived origination of feelings from the heart or from the mind. The heart metaphor describes states that are automated and require fewer cognitive operations. Automatic emotional states appear to be spontaneous, quick and subjectively certain. In the formation of these states, a biological value criterion of evaluation (Damasio, 2010) is very important. The mind metaphor is defined as feelings that are deliberative, requiring a lot of cognitive operations, thus not spontaneous, but resulting from careful consideration. Such consideration is subjectively not free of doubt (due to the underlying multidimensional appraisals) and based on evaluative standards (Reykowski, 1989), representing verbalized criteria of evaluation. The original concept derives from the duality-of-mind theories perspective (for a review, see Gawronski and Creighton, 2013) and describes engagement of the mental system in the formation of the affective state (Automatic or Reflective Evaluating System: c.f. Jarymowicz and Imbir, 2015). Although origin is newly proposed, some experimental results show that it is worth investigating its consequences for cognition. For example, origin was found to modulate cognitive control in the Emotional Stroop Task and the Antisaccade Task (Imbir and Jarymowicz, 2013), making it hard to maintain control after the presentation of automatic-originated (both negatively and positively valenced) words or sentences. 
Other results concerning the scope of attention suggest that reflective-originated stimuli widen while automatic-originated stimuli narrow the scope in the visual field measured with the detection of stimuli that appear closer to or more distant from a center of visual field (Imbir, 2013). In addition, electrophysiological data (Imbir et al., 2015a) indicate that origin is useful in describing the mechanisms of emotional word processing and producing differences in amplitudes of evoked potentials that are independent from previously discovered effects of valence, arousal, frequency of use in language and concreteness. The SAM scale, developed to measure origin, appears to be a stable and reliable method of assessing this dimension (Imbir, 2015a).

Arousal is defined as an energetic reaction to stimuli varying from calm (sleep, no activation) to completely excited (extreme activation). In other words, arousal describes the energetic side of an affective state at a particular time and is sometimes referred to as the intensity or energy level. This energy expresses the degree of excitement or activation an individual feels toward a given stimulus (Lang, 1980); thus, the arousal level can be treated as a property of the stimulus that influences the current affective state (Russell, 2003). Arousal was found to modulate flanker competition in the flanker task (Freitas et al., 2007; Kuhbandner and Zehetleitner, 2011; Imbir, 2015b), cognitive control in the Emotional Stroop Task (e.g., Nigg, 2000; McKenna and Sharma, 2004) and electrophysiological correlates of word processing (e.g., Hofmann et al., 2009). Arousal best describes activation mechanisms for simple processes that do not require much cognition (Epstein, 2003) and was found to disturb high-order systematic processing (Kahneman, 2003, 2011) and to shift the balance between experiential and rational minds toward the experiential one (Epstein, 2003).

Taking into account the duality-of-mind perspective, the question arises: what is the activation mechanism for rational and systematic effortful processing? This ought to be based on conscious attitudes toward stimulation concerning the significance of a situation in the context of subjective goals and expectations (Imbir, 2016a). From that point of view, subjective significance (Imbir, 2015a) was proposed and operationalized in the SAM scale analogous to arousal SAM (Lang, 1980). Some data suggest that subjective significance modulates the way in which arousal impairs cognitive control in the Emotional Stroop Test. Reaction latencies for highly arousing stimuli were shorter for words of low and high subjective significance in comparison to words of medium significance (Imbir, 2016b). Subjective significance may be compared to impact operationalized for picture stimuli (Ewbank et al., 2009). Impact is defined as a visual media-related term describing the potency of a certain stimulus to influence people, catch their attention and be remembered. Both concepts refer to the ability of stimulation to cause an intense reaction. Such intensity is analogous to arousal but engages more conscious- and more subjective-based processes, and thus should be considered in the dual-mind perspective as the reflective aspect of the intensity of the reaction to stimuli. Pictures of high-impact dimension values were found to be responsible for increased amygdala activation, compared to neutral and low-impact stimuli (Ewbank et al., 2009). Another concept close to subjective significance is salience (e.g., Kahnt and Tobler, 2013), which describes the importance of outcomes. Considering decision making and risk, gains and losses associated with options given are different in valence but similar in salience. This means that people perceive some outcomes as important in comparison to neutral outcomes that are perceived as nonsalient. 
Salience itself is not a quality of stimuli but the relation between stimuli in a task that requires decision making. Salience was found to modulate the neural response in decision-making procedures (c.f. Kahnt and Tobler, 2013). Since the concept of rational mind activation is a rather new one in psychology (see Imbir, 2016b), ANPW\_R provides a unique measure of this property of stimuli.

# Psycholinguistic Qualities: Concreteness, Imageability, and Subjective Age of Acquisition

Some qualities of words provided in the ANPW\_R are not affective, but they may have a potential impact on word processing (Moors et al., 2013). The decision on their inclusion was based on the potential role for an alternative explanation for affective dimension outcomes in order to provide a comprehensive dataset for researchers. The concreteness dimension describes the type of stimuli in the case of words related to concrete vs. abstract objects. In other words, concreteness refers to the ability to see, hear, and touch something (Bird et al., 2001). Concreteness was measured for verbal stimuli several times (e.g., Kanske and Kotz, 2010; Ferré et al., 2012; Montefinese et al., 2014; Hinojosa et al., 2016) and was found to modulate the event-related potential (ERP) correlates of emotional word processing (c.f. Kanske and Kotz, 2007; Barber et al., 2013; Palazova et al., 2013). What is more, concreteness interplays with valence in the way that abstract words were found to be perceived in a more valenced way than concrete words (c.f. Vigliocco et al., 2014).

Imageability represents the degree of how easy it is to imagine the objects or states represented by the stimulus (Bird et al., 2001). From a theoretical point of view, imageability could be similar to concreteness, but imageability involves not only the cognitive aspect of stimuli concreteness perception but also the active imagination connected with mental representation creation and perhaps the number of interactions with word designates. Imageability has been measured for verbal stimuli several times (e.g., Bird et al., 2001; Cortese and Fugett, 2004; Võ et al., 2006, 2009; Janschewitz, 2008; Citron et al., 2014; Monnier and Syssau, 2014; Schmidtke et al., 2014; Riegel et al., 2015) and was found to be involved in word recognition processes (e.g., Davelaar and Besner, 1988) and memory (e.g., Sadoski and Paivio, 2001).

Subjective age of acquisition (AoA), which represents the subjectively perceived difficulty of words, was found to be correlated with word frequency (Bird et al., 2001). High-frequency words tend to be learned early in life. Subjective age of acquisition has been measured in some affective norms studies (e.g., Moors et al., 2013; Warriner et al., 2013; Citron et al., 2014) and was found to be the most important factor determining word recognition response times, after frequency, length, similarity to other words and word onset (Kuperman et al., 2012). In addition, in a Dutch-speaking sample, frequency and AoA left no variance for imageability in visual word recognition (Brysbaert et al., 2000).

# Polish Affective Norms for Datasets of Words

Until now, only two datasets contain affective norms for Polish verbal stimuli (Imbir, 2015a; Riegel et al., 2015). The first dataset, the ANPW (Imbir, 2015a), provides the norms for six dimensions (valence, arousal, dominance, origin, subjective significance, and source) for 1586 Polish words and compound expressions collected from a large group of participants (more than 1600) with the use of a standard paper-and-pencil procedure. The ANPW list was based on ANEW (Bradley and Lang, 1999) translated and extended by additional words considered good representations of extreme origin and subjective significance values. The second dataset in the Polish language is the Nencki Affective Word List (NAWL; Riegel et al., 2015), a dataset that provides assessments for valence, arousal and imageability for 2902 words assessed by 266 Polish participants in a computerized procedure. The NAWL is a Polish adaptation of the Berlin Affective Word List-Reloaded (BAWL-R; Võ et al., 2009). As a supplement to the NAWL, assessments of compliance with basic emotions (happiness, anger, sadness, fear, and disgust) were developed (Wierzba et al., 2015).

# Aim and Hypothesis

The motivation for introducing the ANPW\_R was to provide research materials for scientists interested in the interplay between language and emotions (e.g., Citron, 2012; Kaltwasser et al., 2013; Imbir et al., 2015a). The areas of interest for affective norms for words are not limited to emotional scientists but also extend to researchers interested in psycholinguistics, including more complex processes such as morphosyntactic processing (Martín-Loeches et al., 2012; Hinojosa et al., 2014; Díaz-Lago et al., 2015) or phonological processes during language production (Hinojosa et al., 2010; White et al., 2016). The main aim of the current work was to extend the recently introduced ANPW (Imbir, 2015a) dataset to a greater number of words, as well as to assess the properties of stimuli using new scales such as concreteness and subjective age of acquisition. These two dimensions have never been assessed in Polish language normative studies for words. An additional aim was to check whether ratings collected with a low number of participants assessing a large number of stimuli are as reliable as the traditional paper-and-pencil procedure used with a large number of participants assessing a small number of stimuli.

The ANPW\_R dataset was expected to be reliable (in terms of split-half estimates) and stable (in terms of correlations with the ANPW (Imbir, 2015a), a previously conducted normative study for a Polish language sample, for valence, arousal, dominance, origin, and significance, as well as correlations with the NAWL (Riegel et al., 2015) for valence, arousal, and imageability). In addition, a quadratic relation between valence and arousal (e.g., Ferré et al., 2012; Soares et al., 2012; Monnier and Syssau, 2014; Riegel et al., 2015), as well as between dominance and arousal (Montefinese et al., 2014), was expected. Furthermore, in light of the literature (e.g., Ferré et al., 2012; Monnier and Syssau, 2014; Montefinese et al., 2014; Riegel et al., 2015), differences between female and male assessments of words were expected for affective and psycholinguistic variables, especially more polarized assessments of valenced stimuli by women.

# METHODS

# Participants

The study involved 400 participants (200 females) aged from 18 to 32 (M = 21.89, SD = 1.91), students from different Warsaw universities and colleges of natural sciences (32%, N = 128), social sciences (excluding psychology students) and humanities (36%, N = 144) and technical sciences (32%, N = 128). The proportion of sexes across faculty types was balanced (50% female in each case) in order to avoid any sex bias over affective evaluations. Participation was voluntary in nature and was rewarded by a small prepaid gift card (about €20 each). Participants were recruited via Internet faculty sites and via traditional posters placed in different departments. Participants provided informed consent to participate; written consent was not collected as the participants were assured anonymity. Informed consent was given via the Internet to the lab member who recruited the participants and was documented in a research diary. This procedure was suggested by the bioethical committee that approved the research. No personal data were collected from the participants. The design, the experimental conditions and the consent procedure for this study were approved by the bioethical committee of the Maria Grzegorzewska University. Contact with participants was maintained via email. After the assessments were completed, a single laboratory meeting took place.

# Materials and Design

# Self-Assessment Manikin (SAM) Scales

To measure five affective as well as three psycholinguistic variables, the SAM scales were applied. In the case of the classical affective dimensions (valence, arousal, and dominance), the original Lang (1980) SAMs were used. To measure origin and subjective significance, both describing variables from the emotional duality model (Jarymowicz and Imbir, 2015), the scales introduced in the ANPW (Imbir, 2015a) were used. To measure psycholinguistic variables (concreteness, imageability and subjective age of acquisition), three new SAM scales were created in order to assure formal similarities with affective ratings. **Figure 1** presents SAMs used in the current study.

Because some scales were easier to understand for naïve participants (e.g., valence, imageability) and some others could be more difficult (e.g., dominance, origin, significance), additional descriptions of scales were provided (c.f. Imbir, 2015a; submitted). Those descriptions explained in detail the meaning of the scales and provided examples of both ends of the scales. The words presented as examples were chosen in a manner that presented different aspects of each scale end. For example, in the case of origin, both automatic and reflective origins were exemplified by negative and positive instances. **Table 1** presents descriptions of each scale used in the current study. Those for valence, arousal, dominance, origin, and subjective significance scales were identical to those used in the ANPW dataset creation (c.f. Imbir, 2015a).

# List of 4905 Polish Words

The list of stimuli used in the current experiment was based on two main sources. First of all, 1586 words were taken from the ANPW (Imbir, 2015a). The aim of this decision was to estimate similarities in using different methods of obtaining affective ratings (the classical paper-and-pencil method used on a large number of participants, and the new method, based on a large number of assessments done by a much smaller number of participants; c.f. Moors et al., 2013) and to collect new assessments of a psycholinguistic nature not included in the ANPW dimensions. The remainder of the words was taken from the Moors et al. (2013) Dutch Affective Words Norms list of 4299 items translated into Polish. The 4299 words were presented in their original list in two different languages (Dutch and English translations), thus the computerized translation of the Google Translate engine was applied in the first stage. The algorithm was simple; the Dutch and English lists were translated separately into Polish and then compared, in line with the translation procedure. In 3270 cases, the Polish translation was the same in both lists, thus this was accepted as valid. The remaining 1029 words were carefully inspected by a bilingual person who specializes in the English language. Unfortunately, there was no person bilingual in Dutch and Polish available at the time of translation, thus at this stage, the Polish Google machine translations from Dutch and English, the English version of words and the Dutch part of speech (data provided in Moors et al., 2013) were used as the basis for further decisions. It appeared that in 678 cases, both computer translations from Dutch and English differed in Polish inflection (nouns and verbs have many inflected forms), so the translations were corrected to their base form and accepted. The remaining 351 cases were translated by an English language philologist who specializes in translations. In the final list of 4299 Polish words, 1057 duplicates were found (321 among the translations and 736 in comparison with the ANPW), thus only 3242 new words were added to the previously collected 1586 words. Other Polish words were also included: some neutral terms (nouns describing actions) from earlier studies conducted by this author (N = 28), Polish vulgarisms (N = 5) and names of European or world states and nations (N = 44). All this comes to 4905 words included for assessment in the ANPW\_R dataset. The whole list consists of 2907 nouns (59%), 1126 verbs (23%), 768 adjectives (15%), 44 adverbs (0.8%), and 60 others (including two compound word expressions).
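
The first, automatic stage of the translation procedure can be sketched as follows. This is a minimal illustration rather than the script actually used: the function names, the toy word identifiers and the `normalize` step (trimming whitespace and ignoring case) are assumptions introduced here, and the machine translations are taken as already available in two dictionaries.

```python
def normalize(word):
    """Hypothetical normalization (an assumption): trim whitespace, ignore case."""
    return word.strip().casefold()


def split_by_agreement(from_dutch, from_english):
    """Partition items into accepted translations and items needing human review.

    from_dutch / from_english map an item id to the Polish machine translation
    obtained from the Dutch and from the English version of the same item.
    """
    accepted, to_review = {}, []
    for item_id, pl_from_nl in from_dutch.items():
        pl_from_en = from_english.get(item_id)
        if pl_from_en is not None and normalize(pl_from_nl) == normalize(pl_from_en):
            # Both translation routes agree, so the translation is accepted.
            accepted[item_id] = normalize(pl_from_nl)
        else:
            # Disagreement (e.g., a different inflected form) goes to a human.
            to_review.append(item_id)
    return accepted, to_review


# Toy data, not taken from the dataset.
from_dutch = {1: "dom", 2: "biegać", 3: "kot"}
from_english = {1: "dom", 2: "biega", 3: "Kot "}
accepted, to_review = split_by_agreement(from_dutch, from_english)
```

In this toy example, item 2 is routed to manual review because the two routes yield different inflected forms, which mirrors the 678 cases described above that were corrected to their base form.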

# Questionnaires Prepared

To make the assessments more accessible to participants, a computerized Excel spreadsheet questionnaire, similar to those used by Moors et al. (2013), was prepared. The whole questionnaire consisted of four different spreadsheets. The first explained the aim of the study, the importance of the results obtained and what was involved in completing the questionnaire. At this stage, the SAM scale was described in terms of its idea of emotional states presented in a non-word, pictorial nature that helps in intuitive judgments of feelings and current states. Participants were also informed that there would be a description of the scale provided in order to clarify the meaning of both ends of the scale. The required type of response to the words was described as placing numbers (from 1 to 9 in the case of seven different measures, and from 3 to 18 in the case of subjective age of acquisition) next to the assessed word. It was highlighted that this was a subjectively based validation, thus there was no question of responses being judged as "bad" or "good" answers. In addition, instruction was provided to encourage quick validation and to split the whole work into 5–7 short sessions, one a day each. Participants were also asked to leave empty spaces and not to assess words they do not know themselves. The second spreadsheet consisted of sociodemographic data (sex, age, number of years at university, department type). The third spreadsheet presented the training session. The SAM scale and its description were placed at the top of the page. Below this, three example words were placed (not included in the 4905 dataset) and the task was to evaluate them using the SAM scale. The last spreadsheet presented a SAM scale with its description and below a full list of the 4905 stimuli presented in a unique random order that was different for each participant. 
The SAM scale was visible at all times at the top of the spreadsheet during the assessment process, in order to provide a continuous reference point.

# Procedure

The task for the participants was to evaluate a list of 4920 words (15 were doubled in order to provide additional estimation of reliability; c.f. Imbir, 2015a) using a single SAM scale described in detail at the beginning of the procedure. At the end of a week, the researcher sent some recruited volunteers the Excel spreadsheets to collect the assessments. Participants were instructed to perform the procedure at their own pace in short sessions over the whole week. They were asked to perform their assessments in a stable environment without any distractions. Confirmation of having fulfilled these procedure requirements was mandatory after sending the results back. In the following week, participants were invited to the laboratory to collect their reward. At this stage, all participants' questions were answered and the procedure was explained in detail. Interviews also checked whether the procedure requirements had been fulfilled. About 10 participants were excluded because they had not fulfilled the procedure requirements and their assessments were replaced by those of other, additional participants.

# RESULTS

# Data Treatment and Analytic Strategy

The first step was to enter data into the database. Only questionnaires from participants who had fulfilled the criteria of responding within 1 week and who did not report any abnormalities during their work were included. Then descriptive statistics [number of assessments (N), Mean (M), Standard Deviation (SD), Range (Min and Max values)] were calculated for each word, separately for each of the 8 scales. All analyses were carried out using IBM SPSS 22 statistical software. The Supplemental Material (Appendix 1) includes all values for valence, arousal, dominance, origin, significance, concreteness, imageability, and subjective age of acquisition assessments. Each word was rated by 400 participants in total, with each of the eight scales assessed by 50 participants (25 females). Participants were instructed to leave words without an assessment in the case of words not familiar to them. The number of participants indicating that they did not know a certain word varied from 0 to 244 (M = 2.29, SD = 13.52). For that reason some ratings are calculated based on a lower number of assessments.
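
The per-word descriptive statistics described above can be illustrated with a short sketch. The analyses themselves were run in SPSS; the function below is only an illustration, and its name, the use of `None` for words a participant left unrated, and the choice of the sample standard deviation (n − 1 in the denominator) are assumptions.

```python
from statistics import mean, stdev


def describe(ratings):
    """Return N, M, SD, Min and Max for one word on one scale.

    `ratings` holds one entry per participant; None marks a word the
    participant left unrated because it was unfamiliar (an assumption
    about how missing assessments are coded).
    """
    valid = [r for r in ratings if r is not None]
    n = len(valid)
    return {
        "N": n,
        "M": mean(valid) if n else None,
        "SD": stdev(valid) if n > 1 else None,  # sample SD, n - 1 denominator
        "Min": min(valid) if n else None,
        "Max": max(valid) if n else None,
    }


# Toy ratings on a 1-9 scale; the last participant did not know the word.
stats = describe([1, 5, 9, None])
```

Excluding unrated entries before computing the statistics is what makes N vary between words, as noted in the paragraph above.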

Data were analyzed in order to achieve: (1) verification of the ANPW\_R dataset reliability, (2) understanding of the impact on assessments of other factors, such as participants' sex, and (3) verification of the relations between measured dimensions. First of all, the properties of measures were assessed with descriptive statistics. Secondly, to validate assessments collected in the current study, the reliability and stability of assessments were estimated with the use of four different approaches based on the current dataset (split-half correlations and congruency of assessments for words doubled in the list) and earlier studies (congruencies in ratings for certain words between the ANPW\_R and the ANPW or the NAWL). Also, sex differences were assessed with the use of r-Pearson correlations and ANOVA analyses in order to check if the perception of words in affective as well as psycholinguistic variables differs across genders. Finally, the relations between measures were analyzed with the use of linear (r-Pearson correlation) as well as curvilinear (regression analyses) models.

# Descriptive Statistics

**Table 2** presents descriptive statistics for the assessments of all affective and psycholinguistic variables used and the lexical dimensions such as the number of letters in a word (length) and frequency estimations based on two sources: Subtlex\_pl, a dataset created on the basis of movie and television program subtitles (Mandera et al., 2014), and the Kazojć (2011) dataset, based on large collections of literature, electronic texts and web pages.

**Figure 2** shows the distributions of eight measures. The distributions for valence and concreteness are bimodal, while imageability is flat and biased toward high scale values. Dominance most closely approximates a normal distribution centered over the middle of the scale. In the case of arousal and subjective significance the distribution is approximately normal with a negative bias (toward low scale values), whilst in the case of origin, the approximately normal distribution is positively biased (toward high scale values).

TABLE 2 | Summary of variables included in the word list with means (M), standard deviations (SD) and ranges for all participants.

**Figure 3** shows homogeneity of ratings in terms of means plotted against their standard deviations for each measure applied in the ANPW\_R. Additionally, regression lines with R<sup>2</sup> and p-values for each case are provided. The distribution of ratings in M × SD space gives us information concerning to what extent assessments were congruent. It is especially important for neutral / moderate (around the middle of the scale) assessments that may be the result of (a) neutral or moderate properties of the stimulus when SD is low or (b) incongruent assessments, when some participants rate the stimulus as low whereas other participants rate it as high in certain measures. For example, in the valence dimension among neutral stimuli some have low SD whereas others have high SD values. In most of the cases (apart from dominance) the relationships plotted were better explained by a quadratic function rather than a linear one (in terms of a bigger R<sup>2</sup> and a significant R<sup>2</sup> change). The most frequent relationship observed is a reversed "U" shaped relation, suggesting that neutral / moderate stimuli are in fact more incongruent in assessments. This is not surprising, taking into account that a word can obtain an extreme mean value only when most of the assessments are as extreme as the mean itself, thus extreme stimuli are more congruent than moderate ones. Surprisingly, in the case of valence, the relation is "U" shaped, not reversed "U" shaped. There is a group of neutral stimuli that were very low in SD values (c.f. **Figure 3**). A similar pattern was found in the case of an Italian adaptation of ANEW (Montefinese et al., 2014).
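
The reason why extreme means imply congruent ratings can be made explicit. As an illustration that is not part of the original analysis, the Bhatia–Davis inequality bounds the variance of any set of ratings confined to a 1 to 9 scale by a function of its mean M. The bound below is stated for the population variance, so sample SDs computed with n − 1 in the denominator may exceed it slightly, and it applies to the seven scales rated from 1 to 9 only.

```latex
% Upper bound on the variance of ratings in [1, 9] with mean M
\[
  SD^2 \le (9 - M)(M - 1)
\]
% M = 5:   SD^2 <= 16,   so SD <= 4
% M = 8.5: SD^2 <= 3.75, so SD <= 1.94 (approximately)
```

The bound is largest at the scale midpoint and shrinks to zero at both ends, which by itself produces a reversed "U" shaped ceiling in M × SD space; the "U" shaped pattern for valence therefore reflects the data rather than this constraint.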

# Reliability of Measurement

To measure reliability two types of estimations were applied. The first was the split-half method based on splitting the entire number of participants into two separate subsets. The split was based on the participants' numbering (odd or even) with respect to gender balance for both subsamples. The second was introduced in the ANPW (Imbir, 2015a) dataset and was based on including some randomly chosen doubled stimuli in the list of assessed words. In the current study 15 words were repeated and placed in random positions in the 4905 words list. Participants were not aware that some words were repeated and afterwards nobody indicated that fact. This was probably because participants assessed words on different days during the week.

With respect to the split-half estimate, Pearson correlations were applied. Because each half contained fewer participants than the whole sample, the Spearman–Brown formula was applied to adjust the correlations. In all cases the correlations were high and significant, varying from 0.828 (0.906 with S-B formula adjustment) for origin to 0.979 (0.986) for valence. **Table 3** presents the pattern of correlation for each of the eight measures.
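
The Spearman–Brown adjustment mentioned above can be sketched as follows. The function name and the `factor` parameter are introduced here for illustration only; the standard prophecy formula is assumed, with a lengthening factor of 2 because the full sample is twice the size of each half.

```python
def spearman_brown(r_half, factor=2.0):
    """Predict full-sample reliability from a split-half correlation.

    Standard Spearman-Brown prophecy formula; factor=2 corresponds to
    doubling the number of raters relative to each half.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Split-half correlation reported for the origin scale.
origin_adjusted = round(spearman_brown(0.828), 3)
```

For the origin scale, the split-half correlation of 0.828 yields the adjusted value of 0.906 reported in the text.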

To measure whether the 15 repeated random words were assessed in the same way, ANOVA was applied. Repetition (first vs. second) and paired words' number (1–15) were treated as within-subject factors. Eight different ANOVAs (one for each dimension measured) were conducted. Only the main effects of repetition, which are interesting from a theoretical point of view, are presented here. In all cases word pairs differed significantly from one another, but this is an obvious effect and is therefore omitted. In all cases the ANOVA showed no significant differences between the first and second assessment of the 15 repeated words for valence: F(1, 49) = 1.36, p = 0.25, η<sup>2</sup> = 0.027; arousal: F(1, 48) = 1.45, p = 0.23, η<sup>2</sup> = 0.029; dominance: F(1, 49) = 0.06, p = 0.81, η<sup>2</sup> = 0.001; origin: F(1, 49) = 0.48, p = 0.5, η<sup>2</sup> = 0.01; significance: F(1, 48) = 2.1, p = 0.15, η<sup>2</sup> = 0.042; concreteness: F(1, 49) = 1.41, p = 0.24, η<sup>2</sup> = 0.029; imageability: F(1, 49) = 0.27, p = 0.6, η<sup>2</sup> = 0.006; or the subjective age of acquisition: F(1, 49) = 0.44, p = 0.5, η<sup>2</sup> = 0.009.
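
The text does not state which variant of η<sup>2</sup> is reported. Assuming it is partial eta squared, as is typical for repeated-measures output, the effect sizes can be approximately recovered from the F statistics and their degrees of freedom; small deviations are expected because the reported F values are rounded. The worked example below uses the valence result.

```latex
% Partial eta squared from an F statistic (assumed variant of eta squared)
\[
  \eta_p^2 = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
\]
% Valence: F(1, 49) = 1.36
\[
  \eta_p^2 = \frac{1.36 \cdot 1}{1.36 \cdot 1 + 49} \approx 0.027
\]
```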

# Stability of Measurement

To measure the stability of affective ratings, Pearson correlations were applied for words from the ANPW (N = 1585) repeated in the ANPW\_R for five affective variables measured in both studies: valence, arousal, dominance, origin, and significance. The two studies used different methodologies of assessment collection: paper-and-pencil was run over a large sample in the ANPW case, and a computerized method was used over a much smaller sample in the ANPW\_R case. It appears that both methods generated very similar results. All correlations were significant and ranged from 0.738 in the case of the subjective significance scale to 0.927 in the case of the valence scale.

Correlation analyses with another existing Polish word norms dataset of 2902 words (NAWL: Riegel et al., 2015) including valence, arousal and imageability assessments were performed. It appears that 1274 words from the NAWL were included in the ANPW\_R, so for this subset stability of ratings was checked. Correlations were high: 0.947 for valence, 0.732 for arousal, and 0.827 for imageability. **Table 3** presents obtained results for both existing datasets and the ANPW\_R dataset.

# Sex Differences

In order to compare perception of affective words included in the ANPW\_R across both sexes two methods were applied. The first was a Pearson correlation of ratings given by females and males. The affective ratings were calculated separately for all women and men participating in the final data. All correlations were significant (p < 0.001) and varied from 0.749 for significance to 0.964 in the case of valence. The last column in **Table 3** presents results for each dimension.

#### TABLE 3 | Reliability estimations for each variable.

<sup>a</sup>Split-half correlations (r-Pearson's) estimation for all words and Spearman–Brown adjustments;

<sup>b</sup>Correlations (r-Pearson's) with 1586 ANPW dataset;

<sup>c</sup>Correlations (r-Pearson's) with 1274 words from NAWL dataset (Riegel et al., 2015);

<sup>d</sup>Correlations (r-Pearson's) between female and male assessments.

#### TABLE 4 | Mean assessments for female and male participants in case of each analyzed dimension.

The second approach used to measure gender differences was to search for differences in average ratings for all of the eight measured dimensions. To do so, eight different analyses of variance (one for each dimension) were applied. Sex was treated as a within-words factor and valence was treated as a between-words factor. Valence was divided into three categories based on sentence average scores (negative: 1–4; neutral: 4–6; positive: 6–9; c.f. Ferré et al., 2012; Monnier and Syssau, 2014) and used in each analysis as the easiest and most intuitive dimension to search for more subtle effects. Such an approach was used earlier to assess gender differences (e.g., Monnier and Syssau, 2014). **Table 4** presents the mean assessments for female and male participants for each analyzed dimension. **Table 5** presents results of ANOVA analyses. Valence effects were checked with the post-hoc Scheffé test. Significant (p < 0.05) differences between valence categories are shown in a separate column.

# Relations between Measures

For all affective norms studies it is especially important to search for patterns in relations between assessed measures. Those relations, if repeatable across cultures and languages, can tell us more about the theoretical status of the affective meaning of stimuli. To check for a correlation pattern in the case of the ANPW\_R dataset, r-Pearson correlation was applied in the case of affective, psycholinguistic and linguistic variables. The correlation pattern is presented in **Table 6**. To check the nature of inspected relations, additional regression analyses were conducted. In **Table 6**, cases in which a quadratic function explained more variance are represented by lighter-shaded cells.

Here only significant (p < 0.001) and large (r > 0.35, sharing more than 10% of common variance) correlations are discussed. It appears that valence correlates negatively with arousal (r = −0.464), which suggests that negative stimuli are more arousing than positive ones. It is quite a common finding that the valence and arousal relationship is quadratic in nature and forms a "U" shaped curve. For further investigation of this correlation, a regression analysis with Valence as the independent factor and Arousal as the dependent factor was carried out to test both the quadratic and the linear models of the valence and arousal relationship. This analysis showed that the Valence and Arousal relationship in the ANPW\_R is better explained by the quadratic function y = 0.227x<sup>2</sup> − 2.493x + 10.503: R<sup>2</sup> = 0.48, F(2, 4902) = 2253.4, p = 0.001, rather than the linear relationship: R<sup>2</sup> = 0.22, F(1, 4903) = 1346.22, p = 0.001, which accounted for less variance. Also, the R<sup>2</sup> change due to inclusion of the quadratic term was highly significant: F(1, 4902) = 2478.4, p = 0.001. **Figure 4** presents the dimensional distributions of ratings as well as the function best fitting the data.
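
As a rough consistency check on the model comparison reported above, the sketch below locates the minimum of the reported quadratic function and recovers F statistics from R<sup>2</sup> values. The function names are hypothetical, and because the R<sup>2</sup> values in the text are rounded to two decimals, the recovered F values only approximate the reported ones.

```python
def parabola_vertex(a, b, c):
    """Return (x, y) at the extremum of y = a*x**2 + b*x + c."""
    x = -b / (2 * a)
    return x, a * x * x + b * x + c


def f_from_r2(r2, df_model, df_resid):
    """Overall F statistic of a regression model given its R squared."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


def f_change(r2_full, r2_reduced, df_added, df_resid_full):
    """F statistic for the R squared gain of a nested (full vs. reduced) model."""
    return ((r2_full - r2_reduced) / df_added) / ((1 - r2_full) / df_resid_full)


# Coefficients and (rounded) R squared values as reported in the text.
x_min, y_min = parabola_vertex(0.227, -2.493, 10.503)
f_quadratic = f_from_r2(0.48, 2, 4902)
f_linear = f_from_r2(0.22, 1, 4903)
f_delta = f_change(0.48, 0.22, 1, 4902)
```

Under these assumptions the fitted curve reaches its minimum at a valence of about 5.5, close to the scale midpoint, with a predicted arousal of roughly 3.7, which is the "U" shape described above: arousal rises toward both the negative and the positive end of the valence scale.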

Taking into account affective variables, dominance is highly positively correlated with valence (r = 0.693), which means that positive words are perceived as evoking controllable experiences, while negative ones are perceived as evoking uncontrollable experiences. Arousal is negatively correlated with origin (r = −0.46), which means that automatic-originated stimuli are more arousing than reflective-originated ones. Arousal is positively correlated with significance (r = 0.378), which suggests that more arousing stimuli are also perceived as more crucial and subjectively significant. Taking into account relations between affective and psycholinguistic measures, concreteness correlates positively with arousal (r = 0.378) and subjective significance (r = 0.685), which means that abstract stimuli are more arousing and subjectively significant than concrete ones. Imageability is negatively correlated with subjective significance (r = −0.448), which means that easier-to-imagine words are perceived as less significant. Taking into account psycholinguistic variables, imageability is negatively correlated with concreteness (r = −0.8); thus easier-to-imagine words are perceived as more concrete. Subjective age of acquisition assessments were negatively correlated with imageability (r = −0.515) and with both frequency estimations (natural logarithms: LN), based on the Subtlex\_pl dataset (r = −0.449) and the Kazojć (2011) dataset (r = −0.438). Those relations mean that words that are acquired later in individual development are harder to imagine as well as less frequent. Also, concreteness was positively associated with word length (r = 0.39), which means that abstract stimuli were composed of a larger number of letters in the ANPW\_R dataset.

Additionally, in order to check the nature of the relations between measures (linear or curvilinear), regression analyses were conducted. Appendix 2 presents detailed results of the regression analyses for cases in which the relation between measures was better explained by a quadratic function (higher R<sup>2</sup> for the quadratic than for the linear function, and a significant R<sup>2</sup> change between functions; c.f. lighter-shaded cells in **Table 6**). All quadratic relationships found for valence are presented in **Figure 4**, while the remaining ones are presented in Figure 5, located in Appendix 2 in Supplementary Material.
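The screening rule used in this section (discuss only correlations larger than 0.35 in absolute value) can be illustrated with a short sketch. A correlation of 0.35 corresponds to r<sup>2</sup> = 0.1225, that is, about 12% shared variance, and the valence and arousal correlation of −0.464 corresponds to r<sup>2</sup> ≈ 0.215, in line with the linear R<sup>2</sup> of 0.22 reported above. The function name and the synthetic variables below are assumptions for illustration, and the significance criterion (p < 0.001) is deliberately left out of the sketch.

```python
import numpy as np

def screen_correlations(data, names, r_min=0.35):
    """List variable pairs whose Pearson |r| exceeds r_min, with shared variance r^2."""
    r = np.corrcoef(data, rowvar=False)       # columns are variables
    found = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > r_min:
                found.append((names[i], names[j], float(r[i, j]), float(r[i, j] ** 2)))
    return found

# Synthetic illustration: "a" and "b" are related, "c" is independent noise.
rng = np.random.default_rng(1)
a = rng.normal(size=1000)
b = a + rng.normal(scale=0.5, size=1000)
c = rng.normal(size=1000)
found = screen_correlations(np.column_stack([a, b, c]), ["a", "b", "c"])
```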

# DISCUSSION

## Distribution, Stability, and Reliability of Assessments

As shown in **Figure 2**, the assessments cover the whole scale for valence, concreteness, and dominance, while there was a relative lack of highly arousing, highly significant and acquired-later-in-age words, as well as of low-imageability and heart-originated ones. The valence distribution is very similar to that obtained in the original ANEW dataset (Bradley and Lang, 1999), other adaptations (Redondo et al., 2007; Soares et al., 2012; Montefinese et al., 2014) and norms for a greater number of words (Lahl et al., 2009; Moors et al., 2013; Warriner et al., 2013; Riegel et al., 2015). The mean and standard deviation distributions shown in **Figure 3** indicate that for valence, dominance and partly origin there is a group of neutral/moderate words that are perceived in an unambiguous way (low SD values), whereas moderate values on the other dimensions resulted from an ambiguous perception of the affective reaction (high SD values). Such findings are common in affective norms studies. For example, in ANEW (Bradley and Lang, 1999), NAWL (Riegel et al., 2015) and the Italian ANEW adaptation (Montefinese et al., 2014), stimuli that were neutral on the valence dimension comprised both low-SD and high-SD stimuli.

Split-half assessment shows that the current dataset provides highly reliable values for all measured dimensions, comparable with those of other existing datasets (Redondo et al., 2007; Soares et al., 2012; Moors et al., 2013; Montefinese et al., 2014; Imbir, 2015a; Riegel et al., 2015). Analyses of the fifteen doubled words also showed that assessments were reliable and stable within the current study. Correlations with other existing Polish-language datasets are also very satisfactory. This is the case for valence, arousal, dominance, origin and significance for the 1586 words reassessed from the ANPW (Imbir, 2015a) as well as the 1274 words shared with the NAWL (Riegel et al., 2015). This indicates that the method of assessment used (c.f. Moors et al., 2013) is as good as traditional paper-and-pencil estimations (e.g., Imbir, 2015a) collected from a large number of participants, each assessing a small number of words.
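For readers unfamiliar with split-half estimation, the general idea can be sketched as follows: raters are randomly divided into two halves, mean ratings per word are computed within each half, and the two sets of means are correlated across words. The sketch below is a generic illustration under stated assumptions, not the procedure used for the ANPW\_R. In particular, the Spearman-Brown correction, the complete words-by-raters matrix, and the synthetic ratings are all assumptions of the example.

```python
import numpy as np

def split_half_reliability(ratings, rng):
    """ratings: array of shape (n_words, n_raters) with no missing values.

    Returns the correlation between half-sample word means and its
    Spearman-Brown corrected value (an assumed, commonly used correction).
    """
    n_raters = ratings.shape[1]
    order = rng.permutation(n_raters)              # random assignment of raters to halves
    half_a = ratings[:, order[: n_raters // 2]].mean(axis=1)
    half_b = ratings[:, order[n_raters // 2:]].mean(axis=1)
    r = float(np.corrcoef(half_a, half_b)[0, 1])
    return r, 2 * r / (1 + r)

# Synthetic illustration: 200 words, 50 raters, noisy ratings around a true word mean.
rng = np.random.default_rng(2)
true_means = rng.uniform(1, 9, size=(200, 1))
ratings = true_means + rng.normal(0, 2, size=(200, 50))
r_half, r_corrected = split_half_reliability(ratings, rng)
```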

## Sex Differences

Sex differences in the perception of affective reactions to words have been found several times in affective norms studies (c.f. Soares et al., 2012; Monnier and Syssau, 2014; Montefinese et al., 2014; Riegel et al., 2015). It is often expected, based on a stereotypical picture, that women are more emotional than men (e.g., Montefinese et al., 2014). Also, arousal and dominance are expected to differ between women and men, in that men should perceive their reactions as more polarized in arousal and dominance (Montefinese et al., 2014). In the ANPW\_R, the ratings of female and male participants were found to correlate rather highly, at levels comparable with the split-half estimates of reliability (c.f. **Table 3**). Using ANOVA analyses (c.f. Ferré et al., 2012; Monnier and Syssau, 2014) with data from the ANPW\_R, all variables were found to differ between female and male ratings. In fact, women perceived valence in a more polarized way than men (c.f. **Table 4**), which means that negative words were rated as more negative and positive ones as more positive in comparison to men's ratings. Assessments of arousal, dominance, origin, significance, imageability, and subjective age of acquisition were higher, while those of concreteness were lower, for women than for men, but they were not more polarized, as was the case for valence. The interaction in the case of subjective age of acquisition revealed (c.f. **Table 4**) that negative words are perceived by men as learned earlier than they are by women. The reverse relation can be observed in the case of positive words. The results for valence, arousal, dominance, and imageability are consistent with previous findings (Montefinese et al., 2014; Riegel et al., 2015).

## Relations between Affective Variables

The pattern of correlations found in the ANPW\_R is consistent with previous findings concerning affective norms for words. For example, valence and arousal were found to follow a quadratic relationship (e.g., Redondo et al., 2007; Soares et al., 2012; Moors et al., 2013; Monnier and Syssau, 2014; Montefinese et al., 2014; Imbir, 2015a; Riegel et al., 2015), meaning that for neutral words we observe a low arousal level, while for both negative and positive stimuli the arousal level is higher. Although this is a general trend, one may find words that do not follow it and that, despite a neutral valence, are high- rather than low-arousing stimuli (c.f. **Figure 4**). Also, the arousal and dominance relationship appeared to be better described by the quadratic function. This was found earlier in the Italian adaptation of the ANEW list (Montefinese et al., 2014) and in Affective Norms for 718 Polish Short Texts (Imbir, submitted). This could be explained quite easily by the high positive correlation between valence and dominance, suggesting that both dimensions share much in common and thus correlate in a similar way with arousal.
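As a worked illustration of this U-shape, the minimum of the quadratic function reported in the Results can be located by setting its derivative to zero. This is only arithmetic on the published coefficients, with x denoting valence and ŷ the predicted arousal:

$$
\hat{y} = 0.227x^{2} - 2.493x + 10.503, \qquad
\frac{d\hat{y}}{dx} = 0.454x - 2.493 = 0
\;\Rightarrow\; x^{*} = \frac{2.493}{0.454} \approx 5.49,
$$

$$
\hat{y}(x^{*}) = 10.503 - \frac{2.493^{2}}{4 \times 0.227} \approx 3.66 .
$$

On the fitted curve, predicted arousal is therefore lowest at a valence rating of about 5.49 and rises toward both ends of the valence scale.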

In the ANPW\_R dataset, six more relations between the measured dimensions were found to be better explained by a quadratic function. For valence (c.f. **Figure 4**), those were the relations with origin and with subjective significance. Taking into account origin, the most valenced (negative and positive) words were perceived as more automatic-originated, while neutral words were seen as more reflective-originated. This is probably because of the associations of the metaphors used to describe the two ends of the origin scale: "heart" is associated with passion and emotions, while "mind" is associated with reason and much less with passion. Similar results were found in the case of Polish Short Texts (Imbir, submitted). For significance, both ends of the valence scale were perceived as more crucial (subjectively significant) than neutral words. This pattern is similar to that obtained for the valence and arousal relationship, in which valenced words were simultaneously the more arousing ones. The previously mentioned arousal and dominance relationship can also be seen for subjective significance and dominance: moderate stimuli from the dominance scale are perceived as less subjectively significant than both controllable and uncontrollable stimuli. This could support the expectation that arousal and significance are two distinct mechanisms of activation that interact in a similar way with valence but are correlated with each other only at a moderate level (r = 0.378).

The quadratic relation between origin and subjective significance was also found to be similar to the valence and arousal relationship. Moderately originated words were perceived as less significant than both automatic- and reflective-originated ones. In several cases (the relations of valence with concreteness or imageability, of arousal or dominance with concreteness, and of origin with imageability), the distribution patterns shown in **Figure 4** and in Figure 5 in Appendix 2 in Supplementary Material are much less clear, and many exceptions to the general trend can easily be observed. Even so, the quadratic function still explains these relations better than the linear one.

To sum up, the pattern of correlations supports the claim that arousal and subjective significance are both activation aspects of affective reactions to stimuli. Also, valence and origin relate to both activation mechanisms in a similar way. The relationship between origin and valence challenges the expectation of no relation between the two factors, but this is probably due to the metaphor used in the SAM scale construction. Dominance and valence relate similarly to the other dimensions; thus it is quite logical to omit dominance in affective norms creation (c.f. Riegel et al., 2015).

## Relations between Affective and Psycholinguistic Variables

Relations between affective and psycholinguistic measures are also worth interpreting (Citron et al., 2014), since they are a relatively new part of affective norms studies. The ANPW\_R, owing to the large number of assessed dimensions, gives us an opportunity for a wide inspection of the relations between the two types of measures. The results confirmed earlier findings for Spanish words (Hinojosa et al., 2016) that concreteness is negatively correlated with valence. A positive, linear correlation between concreteness and arousal was found in the ANPW\_R. This result is consistent with Hinojosa et al. (2016), but not with the Italian norms (Montefinese et al., 2014), which report a quadratic relation between those measures. The relation between imageability and arousal was found to be quadratic in the ANPW\_R, which is consistent with Montefinese et al. (2014), but differs from the findings of Citron et al. (2014) for English words. Finally, the relation of subjective age of acquisition to affective measures was found to be negative for valence, the same as in the Dutch normative study (Moors et al., 2013); negative for dominance, which is opposite to the findings of Moors et al. (2013); and negative for arousal, also opposite to the results of Citron et al. (2014). The pattern of relations described above does not allow us to draw firm conclusions, especially because the correlations between psycholinguistic and affective measures are typically low: although significant, they are rather weak (c.f. Janschewitz, 2008; Moors et al., 2013; Citron et al., 2014; Montefinese et al., 2014; Hinojosa et al., 2016).

## Current Study Limitations

It is worth highlighting that the current study has limitations. First of all, the translation procedure employed, based on a combination of machine-based and human-based bilingual steps, may not be sufficient to support word-to-word comparisons of assessments in cross-cultural studies. Also, when using the ANPW\_R, one has to watch out for the number of assessments obtained for each word, because some words received fewer than 50 assessments owing to their unfamiliarity to the participants. Those words are included in the dataset in order to allow scientists to take familiarity into account in possible uses of the ANPW\_R.

## Possible Use of the ANPW\_R

The Affective Norms of 4905 Polish Words Reload (ANPW\_R) is a research tool that is important for the development of affective research in Polish-speaking samples. It provides norms for eight different affective and psycholinguistic scales describing the perception of reactions to stimuli. Owing to the two new dimensions introduced in the ANPW (origin and significance: Imbir, 2015a), the ANPW\_R allows researchers to test hypotheses concerning new developments in the field of affective sciences using the duality-of-mind approach. Also, the inclusion of three psycholinguistic variables (concreteness, imageability, and subjective age of acquisition) makes the ANPW\_R dataset go beyond the standard approach in affective norm generation studies. Appendix 1 in Supplementary Material also presents measures of frequency based on two different Polish datasets (Kazojć, 2011; Mandera et al., 2014) as well as grammatical classes and length for each word. The dataset can be used without restriction by all scientists interested in (1) searching for word processing mechanisms or (2) manipulating the affective state of an individual. As a supplement to this list, a Polish Pseudoword List was recently prepared (Imbir et al., 2015b), providing a list of 3023 pseudo-words generated from the words used in the ANPW\_R and complementary to them in length.

## Description of the Database

The normative values of the Polish adaptation of affective norms are included in the Appendix to this article. In the first two columns, the full list of Polish words (4905) and their English translations is provided. Then, four lexical variables (two measures of frequency in the Polish language, parts of speech, and number of letters) are presented. Starting from column H, five affective dimensions (valence, arousal, dominance, origin, and significance) as well as three psycholinguistic dimensions (concreteness, imageability and subjective age of acquisition) are reported. For each variable, the number of participants assessing a single word [N], the range, represented by the minimal [Min] and maximal [Max] ratings, the mean [M], and the standard deviation [SD] are presented in subset columns of the dataset spreadsheet. The ANPW\_R is freely available to the scientific community for noncommercial use as supplemental online material.
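Given the per-variable summary columns described here ([N], [Min], [Max], [M], [SD]) and the caveat noted under the study limitations that some words received fewer than 50 assessments, a user of the norms may wish to screen words by [N] before selecting stimuli. The sketch below shows one way to do this. The record layout, the names, and the example rows are hypothetical placeholders and do not reproduce the actual spreadsheet or any values from the ANPW\_R.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    """Summary statistics for one word on one variable (hypothetical layout)."""
    n: int            # number of participants who assessed the word [N]
    minimum: float    # [Min]
    maximum: float    # [Max]
    mean: float       # [M]
    sd: float         # [SD]

def sufficiently_rated(norms, variable, min_n=50):
    """Return the words whose summary for `variable` rests on at least min_n ratings."""
    return [word for word, by_variable in norms.items()
            if by_variable[variable].n >= min_n]

# Made-up placeholder rows, not values from the ANPW_R.
example_norms = {
    "word_a": {"valence": Summary(n=52, minimum=1, maximum=9, mean=5.1, sd=1.9)},
    "word_b": {"valence": Summary(n=41, minimum=2, maximum=8, mean=4.7, sd=1.6)},
}
kept = sufficiently_rated(example_norms, "valence")
```

Keeping the threshold as a parameter lets a study choose a stricter or more lenient cut-off than 50, depending on how much rating noise it can tolerate.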

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

# FUNDING

The project was funded by the National Science Center on the basis of decision 2013/09/B/HS6/00303.

# ACKNOWLEDGMENTS

I thank Natalia Maksimowicz, Alicja Brzozowska, Katarzyna Kubińska, Dagmara Świerczewska, and Iga Parkitna for data collection and technical assistance.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.01081



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Imbir. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Event-related brain responses to emotional words, pictures, and faces – a cross-domain comparison

## *Mareike Bayer\* and Annekathrin Schacht*

Courant Research Centre Text Structures, University of Göttingen, Göttingen, Germany

#### *Edited by:*

Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tübingen, Germany

#### *Reviewed by:*

Marianna Eddy, U.S. Army Natick Soldier Research, Development and Engineering Center, USA Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tübingen, Germany Philipp Kanske, Max Planck Institute for Human Cognitive and Brain Sciences, Germany

*\*Correspondence:*

Mareike Bayer, Courant Research Centre Text Structures, University of Göttingen, Nikolausberger Weg 23, Göttingen 37073, Germany e-mail: mbayer@uni-goettingen.de

Emotion effects in event-related brain potentials (ERPs) have previously been reported for a range of visual stimuli, including emotional words, pictures, and facial expressions. Still, little is known about the actual comparability of emotion effects across these stimulus classes. The present study aimed to fill this gap by investigating emotion effects in response to words, pictures, and facial expressions using a blocked within-subject design. Furthermore, ratings of stimulus arousal and valence were collected from an independent sample of participants. Modulations of early posterior negativity (EPN) and late positive complex (LPC) were visible for all stimulus domains, but showed clear differences, particularly in valence processing. While emotion effects were limited to positive stimuli for words, they were predominant for negative stimuli in pictures and facial expressions. These findings corroborate the notion of a positivity offset for words and a negativity bias for pictures and facial expressions, which was assumed to be caused by generally lower arousal levels of written language. Interestingly, however, these assumed differences were not confirmed by arousal ratings. Instead, words were rated as overall more positive than pictures and facial expressions. Taken together, the present results point toward systematic differences in the processing of written words and pictorial stimuli of emotional content, not only in terms of a valence bias evident in ERPs, but also concerning their emotional evaluation captured by ratings of stimulus valence and arousal.

**Keywords: emotion, language, pictures, facial expressions of emotions, event-related brain potentials (ERPs), domain specificity, positivity bias, negativity bias**

### **INTRODUCTION**

A smiling face, a sad headline in a newspaper: emotional stimuli seem to easily attract our attention in everyday life. In fact, a wealth of evidence has established the preferential processing of emotional information across a variety of stimuli and has traced it back to the level of neuronal processing. It is the aim of the present study to compare emotion effects in event-related brain potentials (ERPs) between the three stimulus domains most often encountered within the visual modality, namely pictures of emotional scenes or objects, facial expressions of emotions, and written words of emotional content.

Numerous studies using ERPs reported evidence for preferential processing of emotional pictures (Cuthbert et al., 2000; Schupp et al., 2000; for review, see Olofsson et al., 2008), facial expressions of emotions (Schupp et al., 2004b; Rellecke et al., 2012), and emotional words (e.g., Herbert et al., 2008; Schacht and Sommer, 2009a,b; Bayer et al., 2012b). These studies suggest general similarities between stimulus domains in ERP effects elicited by emotional content. This seems noteworthy considering the vast differences between stimulus domains in general: In the first place, these differences concern physical stimulus properties (e.g., words as symbolic stimuli compared to pictorial stimuli, i.e., pictures and faces); furthermore, they extend to theoretical considerations. As an example, it was recently proposed that pictures of objects or scenes might provide *direct* affective information, while facial expressions of emotion only constitute *indirect* affective information since they primarily convey the emotion of the person depicted (Walla and Panksepp, 2013). Despite these differences, however, a number of ERP components were frequently shown to be similarly elicited or modulated by emotional stimulus content. At around 100 ms after stimulus onset, the P1 component reflects the perceptual encoding of visual input. Modulations for emotional as compared to neutral stimuli were reported for words (e.g., Scott et al., 2009; Bayer et al., 2012b; Keuper et al., 2012), pictures (Delplanque et al., 2004), and facial expressions (e.g., Rellecke et al., 2012). Subsequently, enhanced sensory processing of emotional stimuli is indexed by the so-called early posterior negativity (EPN, e.g., Junghöfer et al., 2001; Schupp et al., 2004a; Kissler et al., 2007; Herbert et al., 2008; Bayer et al., 2012a). Although the EPN occurs with comparable scalp distributions, its latency clearly differs between stimulus domains: For pictures and faces, the EPN usually starts around 150 ms after stimulus onset, whereas its onset for emotional words has been located at a post-lexical processing stage at around 250 ms after stimulus onset (e.g., Palazova et al., 2011). Finally, starting from ∼300–400 ms after stimulus onset, an enhanced parietal positivity for emotional stimuli was suggested to reflect higher-order stimulus evaluation [late positive complex (LPC), e.g., Cuthbert et al., 2000; Schupp et al., 2000; Herbert et al., 2008; Schacht and Sommer, 2009a; Bayer et al., 2012a]. Although enhanced LPC amplitudes have been reported for emotional words, pictures, and facial expressions alike, the scalp topography of the LPC was reported to differ between stimulus domains, thus indicating the involvement of at least partially different brain structures in the elaborate processing of emotional stimulus content (Schacht and Sommer, 2009a).

Despite the notable similarities in emotional processing mentioned above, little is known about the actual comparability of emotion effects between stimulus domains in terms of effect strength, automaticity, or possible valence biases. Clear conclusions are complicated for two reasons. First, the comparability *across* studies is severely limited by specificities of experimental designs and procedures, stimulus materials, and task demands, all of which may influence emotion effects in ERPs (Fischler and Bradley, 2006; Schacht and Sommer, 2009b). Second, only a relatively small number of studies have realized direct, that is, *within*-subject, comparisons of emotion effects between stimulus domains; to the best of our knowledge, none of them has employed all three visual domains mentioned above at the same time and under comparable task demands.

Despite these limitations, previous research has generated evidence for two diverging accounts of differences in emotion effects between stimulus domains. First, it was suggested that words might be generally less capable of triggering emotion effects than pictures and facial expressions. This difference was explained with a supposedly lower arousal level of symbolic stimuli, i.e., words, in general, which might thus elicit weaker arousal responses (De Houwer and Hermans, 1994; Hinojosa et al., 2009). In a study by Hinojosa et al. (2009), emotion effects for words in an intactness decision were limited to the LPC time window from 350 to 425 ms. In contrast, emotional pictures additionally elicited effects in reaction times (RTs) and in the earlier EPN component<sup>1</sup>. Similar results were reported for emotional facial expressions. In a study by Rellecke et al. (2011), employing superficial face-word decisions, both emotional facial expressions and emotional words elicited emotion effects already at early latencies between 50 and 100 ms after stimulus onset, but later EPN effects were limited to facial stimuli. On theoretical grounds, these results are in line with the notion of a cascaded response to stimuli according to their biological relevance and thus to their possible impact on the well-being of the observer (Lang et al., 1997). This assumption was corroborated by reduced emotion effects in facial muscle activity for words as compared to pictures and sounds (Larsen et al., 2003) and in the activity of the autonomic nervous system (for a discussion, see Bayer et al., 2011).

In contrast to these results, a number of recent studies reported similar activation patterns for emotional content of pictures, faces, and words, at least concerning the activity of the central nervous system. In an ERP study by Schacht and Sommer (2009a), both emotional facial expressions and emotional words elicited emotion effects in EPN and LPC amplitudes, albeit differing in time course and, in the case of the LPC, in scalp distributions. Similarly, Schlochtermeier et al. (2013) reported evidence for comparable emotion-related brain activity for emotional pictures and words in an fMRI study, while taking into account the differences in visual complexity between stimulus domains. Finally, both pictograms and words elicited similar modulations of the P300 component in an ERP study; anterior modulations in the LPC window were even more pronounced for words compared to pictograms (Tempel et al., 2013). Thus, it is conceivable that stimulus domains may not generally differ in their capacity to trigger emotion-related brain activity.

A further domain-specificity in emotion processing becomes evident when considering differential effects of emotional valence. Here, a large number of findings suggest the existence of valence biases, which differ between domains. For written words, a bias for positive stimuli has often been reported, which was evident not only in ERPs (e.g., Herbert et al., 2006; Kissler et al., 2009; Bayer et al., 2012b), but also in RTs (e.g., Schacht and Sommer, 2009b) and amygdala activity (Herbert et al., 2009). In the case of pictures, however, negative stimuli seem to attract preferential processing, resulting in augmented emotion effects for negative as compared to neutral or positive stimuli (Carretié et al., 2001; Smith et al., 2003; Delplanque et al., 2004). Finally, regarding emotional facial expressions, results suggest a facilitated processing of threatening faces as conveyed by angry or fearful facial expressions (Schupp et al., 2004b; Pourtois and Vuilleumier, 2006; Rellecke et al., 2012).

As in the case of domain comparisons (reporting generally reduced emotion effects in the verbal domain), these differences were again related to an assumed lower arousal level of written words (Herbert et al., 2006). Findings were interpreted in the framework of a theory proposed by Cacioppo and Gardner (1999), which suggests a preference for positively valenced stimuli at relatively low arousal levels (*positivity offset*) and a *negativity bias*, that is, a preferential processing of negative information at high arousal levels.

In summary, two major differences in the processing of emotional content between written words and pictorial stimuli (including pictures and facial expressions) arose from previous research. First, words in general were reported to be less capable of triggering emotion effects in ERPs (although a number of studies reported similar activation patterns). Second, a positive valence bias was frequently shown within the verbal domain, whereas a preference for negative content was reported for pictures and facial expressions. In both cases, these differences were supposed to originate from a generally lower arousal level of written words in comparison to pictures or facial expressions.

The present study had two major objectives. First, it aimed to investigate possible differences in emotion effects in ERPs between the three stimulus domains in a within-subject design. More precisely, it sought to answer the question of whether (i) effects of emotional content would be reduced or absent in response to words as compared to pictures or facial expressions; or (ii) there would be a positive valence bias for words and a bias for negative valence for pictures and facial expressions. Both findings were reported in previous literature and were often supposed to result from lower arousal values for words compared to pictorial stimulus domains. Therefore, the second aim of this study was to provide empirical evidence for this theoretical assumption by collecting valence and arousal ratings for all experimental stimuli using an independent sample of participants.

<sup>1</sup>Data for words and pictures were collected in separate experiments from different participant samples.

Since several emotion effects have been shown to differentially depend on task demands (e.g., Schacht and Sommer, 2009b), the same task (a silent reading/passive viewing paradigm with occasional 1-back recognition memory tests) and the same experimental design were employed for all stimulus domains in order to achieve maximal comparability between them. In addition, we decided to present stimuli in their most "naturalistic" form, thus accepting differences in physical features between stimulus domains like colorfulness, size, and complexity, in order to allow for results that are representative for each stimulus domain.

### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Data was collected from 25 native German speakers; one data set was excluded from the analyses due to the participant's left-handedness. The remaining 24 participants (mean age = 25.4 years, SD = 4.9) were right-handed (according to Oldfield, 1971), had normal or corrected-to-normal vision, and no phobias or other psychiatric or neurological disorders according to self-report. Participants received course credits or 20 Euros for participation. The study was designed according to the Declaration of Helsinki and was approved by the local institutional review board.

#### **STIMULI**

Stimulus materials consisted of 72 faces, 72 pictures, and 72 words. Within each stimulus domain, the stimulus set contained 24 positive, 24 negative and 24 neutral stimuli.

Words were selected from the Berlin Affective Word List Reloaded (Vo et al., 2009); only nouns were included. Stimulus categories were controlled regarding word frequency (Baayen et al., 1995), word length (numbers of letters and syllables) and imageability ratings, all *F*s(2,69) < 1; for stimulus characteristics, see **Table 1**. Emotion categories differed significantly in their valence ratings as expected, *F*(2,69) = 1403.45, *p* < 0.001. Regarding stimulus arousal, positive and negative words were rated as significantly more arousing than neutral words, *F*s(1,46) > 100.86, *p*s < 0.001, but did not differ from each other, *F*(1,46) = 1.43, *p* = 0.714.
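The degrees of freedom reported for these stimulus checks follow directly from the design: three emotion categories with 24 items each give F(2, 69) for the omnibus comparison (3 − 1 = 2 and 72 − 3 = 69), and a comparison of two categories gives F(1, 46). A minimal sketch of a one-way between-items ANOVA is shown below for illustration. The data are random placeholders standing in for a matched control variable, and the function is a generic textbook computation, not the authors' analysis script.

```python
import numpy as np

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of 1-D arrays, one per category."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return float(f_value), df_between, df_within

# Placeholder data: three categories of 24 items drawn from the same distribution,
# mimicking a control variable that is matched across categories.
rng = np.random.default_rng(3)
categories = [rng.normal(5.0, 1.0, 24) for _ in range(3)]
f_omnibus, df1, df2 = one_way_anova(categories)
f_pair, df1_pair, df2_pair = one_way_anova(categories[:2])
```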

Picture stimuli were chosen from the IAPS database (Lang et al., 2008). As for word stimuli, emotion categories differed significantly in their valence ratings, *F*(2,69) = 446.57, *p* < 0.001. Positive and negative pictures were matched for arousal, *F*(1,46) < 1, but were significantly more arousing than neutral pictures, *F*s(1,46) > 151.07, *p*s < 0.001. Emotion categories did not significantly differ in their luminance, apparent contrast, and physical complexity as measured by JPEG file size, all *F*s(2,69) < 1, or in the number of pictures depicting humans, *F*(2,69) = 1.08, *p* = 0.344.

Face stimuli consisted of portraits of 72 different persons with happy, neutral, or angry facial expressions (*n* = 24 per category, 12 female). Faces were chosen from previous studies by Rellecke et al. (2012). Valence ratings confirmed that angry faces were perceived as more unpleasant than happy and neutral faces, and that happy faces were rated as more pleasant than neutral and angry faces, all *F*s(1,46) > 540.99, *p*s < 0.001. A rectangular gray mask with an ellipsoid aperture was added to the portraits in order to display solely the facial area.

#### **Table 1 | Stimulus characteristics for words, pictures, and faces.**


Valence and arousal ratings are listed on their original rating scales; in addition, rating values were transformed to the scales used in the post-experimental ratings in order to allow for comparability between rating values.

#### **PROCEDURE**

Before the start of the experiment, participants signed informed consent and provided demographic information. Stimuli were presented at the center of a computer screen positioned at a distance of 60 cm from the participant. At the beginning of each trial, a mask was presented for 1 s; corresponding to the stimulus domain, the mask consisted of a scrambled word, picture, or face. Following this mask, stimuli were presented for 3 s. Words, pictures, and faces were presented in separate blocks; the order of blocks was counterbalanced. Within blocks, stimuli were presented twice in randomized order. After 10% of trials, stimuli were followed by a 1-back task in order to ensure participants' attention to the stimuli. During these test trials, a stimulus was presented within a green frame, and participants indicated by button press whether the presented stimulus was identical to the preceding stimulus or not. Importantly, the position of test trials was randomized and thus unpredictable to the participant. Furthermore, all test trials were excluded from analyses. Words were presented in Arial font at font size 28 and spanned a mean visual angle of 2.4° × 0.9°. Pictures had a size of 512 × 384 pixels, corresponding to a visual angle of 15.4° × 10.8°; faces were presented at a size of ∼300 × 350 pixels, resulting in a visual angle of 8.6° × 11.4°.
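The relation between stimulus size, viewing distance, and visual angle used above can be sketched as follows. This is a minimal illustration, assuming the standard formula angle = 2·atan(size / (2·distance)); the function names are hypothetical, and the physical stimulus sizes in centimeters are not given in the text, so the example only derives the width implied by a reported angle at the reported 60 cm distance.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of a given extent."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical extent (cm) that subtends a given angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


VIEWING_DISTANCE_CM = 60.0  # viewing distance stated in the Procedure

# Width implied by the reported 15.4 degree picture width (derived, not reported)
picture_width_cm = size_for_angle_cm(15.4, VIEWING_DISTANCE_CM)
print(f"Implied picture width: {picture_width_cm:.1f} cm")
```

The pixel pitch of the monitor would be needed to convert between pixels and centimeters; it is not reported here and is therefore left out of the sketch.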

#### **DATA ACQUISITION**

The EEG was recorded from 61 electrodes placed in an electrode cap according to the extended 10–20 system (Pivik et al., 1993); four electrodes placed at the outer canthi and below both eyes were used to record electro-oculograms. Signals were recorded with a sampling rate of 500 Hz and amplified with a bandpass filter of 0.032–70 Hz. Electrode impedance was kept below 5 kΩ. Electrodes were referenced to the left mastoid; offline, data was re-referenced to average reference. Blinks were corrected using Surrogate Multiple Source Eye Correction implemented in Besa (Brain Electric Source Analysis, MEGIS Software GmbH). Epochs containing artifacts, i.e., amplitudes exceeding –100 or +100 μV or voltage steps larger than 50 μV, were discarded, resulting in the elimination of 1.5% of trials. Overall number of discarded trials per condition (domain by emotion) ranged between 13 and 21 and did not differ between conditions, as indicated by a repeated-measures ANOVA, all *F*s < 1.33. Continuous data was segmented into segments of 1100 ms, starting 100 ms prior to stimulus onset, and referred to a 100 ms pre-stimulus baseline.
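The rejection criteria and baseline referencing described above can be illustrated with a short sketch. It is not the BESA pipeline used in the study; it assumes that a "voltage step" is the difference between adjacent samples and that the criteria are applied to baseline-corrected epochs, neither of which is specified in the text. All function and constant names are hypothetical.

```python
from typing import List

SAMPLING_RATE_HZ = 500        # reported sampling rate
BASELINE_MS = 100             # reported pre-stimulus baseline
AMPLITUDE_LIMIT_UV = 100.0    # reported absolute amplitude criterion
STEP_LIMIT_UV = 50.0          # reported voltage-step criterion

Epoch = List[List[float]]     # epoch[channel][sample], in microvolts


def baseline_correct(epoch: Epoch) -> Epoch:
    """Subtract each channel's mean over the pre-stimulus interval."""
    n_base = SAMPLING_RATE_HZ * BASELINE_MS // 1000  # 50 samples
    corrected = []
    for channel in epoch:
        base = sum(channel[:n_base]) / n_base
        corrected.append([v - base for v in channel])
    return corrected


def has_artifact(epoch: Epoch) -> bool:
    """True if any channel violates the amplitude or step criterion."""
    for channel in epoch:
        if any(abs(v) > AMPLITUDE_LIMIT_UV for v in channel):
            return True
        # Assumption: a voltage step is the jump between adjacent samples.
        if any(abs(b - a) > STEP_LIMIT_UV for a, b in zip(channel, channel[1:])):
            return True
    return False


def clean_epochs(epochs: List[Epoch]) -> List[Epoch]:
    """Baseline-correct all epochs and keep those without artifacts."""
    corrected = [baseline_correct(e) for e in epochs]
    return [e for e in corrected if not has_artifact(e)]
```

At 500 Hz, a 1100 ms segment corresponds to 550 samples per channel, of which the first 50 form the baseline.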

#### **DATA ANALYSIS**

Behavioral performance in the 1-back task was analyzed by a repeated-measures ANOVA including the factors domain (words, pictures, faces) and trial type (repeated stimulus, new stimulus). P1 amplitudes were determined by an automated peak-detection algorithm as maximal positive deflection between 50 and 150 ms after stimulus onset at occipital electrodes PO9, PO7, PO8, and PO10; electrode PO8 was used as reference channel. Modulations of the EPN were assessed as mean ERP amplitudes at a group of posterior electrodes (TP9, TP10, P9, P7, P8, P10, PO9, PO10, Iz); the LPC was quantified at a group of centro-parietal electrodes (CP1, CPz, CP2, P3, Pz, P4, PO3, POz, PO4). Since stimulus domains did show considerable differences in their general processing [see **Figure 1** for global field power (GFP) amplitudes], and in order to account for domain-related differences in the time course of emotion effects (Schacht and Sommer, 2009a), time windows for the analyses of EPN and LPC within each stimulus domain were determined by visual inspection of grand mean waveforms. The EPN was analyzed between 250 and 400 ms for words, between 180 and 250 ms for pictures, and between 170 and 300 ms for faces. For the LPC, time windows for analyses ranged from 500 to 650 ms (words), from 400 to 800 ms (pictures), and from 400 to 600 ms (faces). In the EPN time windows, we additionally analyzed anterior activations at electrode locations AFz, F3, Fz, F4, FC1, FCz, FC2. Within each stimulus domain, the influence of emotional content on P1, EPN, anterior positivity and LPC amplitudes was analyzed by repeated-measures ANOVAs including the factors emotion (positive, neutral, negative) and electrode (see above for specific electrode numbers and locations per region of interest); only significant main effects of emotion will be reported. In order to specify the onsets of EPN effects in each stimulus domain, onset analyses were performed on significant *post hoc* comparisons. To this aim, we applied running *t*-tests on grand averages of ERP differences between emotion conditions; in order to prevent spurious results, only activations with a minimum length of 10 consecutive significant data points were considered. Degrees of freedom in ANOVAs were adjusted using Huynh–Feldt corrections. Results will be reported with uncorrected degrees of freedom, but corrected *p*-values. Within *post hoc* tests, Bonferroni-corrections were applied to *p*-values; all significant and marginally significant results (<0.1) are reported.
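The onset criterion (at least 10 consecutive significant data points in a running *t*-test) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one difference wave per participant (emotion condition minus comparison condition), a one-sample *t*-test against zero at each sample, and a critical *t* value supplied by the caller (for example, about 2.07 for 23 degrees of freedom at a two-tailed alpha of 0.05). The names `t_statistic`, `effect_onset_ms`, and `t_crit` are hypothetical.

```python
import math
from typing import List, Optional


def t_statistic(differences: List[float]) -> float:
    """One-sample t statistic of condition differences against zero."""
    n = len(differences)
    mean = sum(differences) / n
    var = sum((d - mean) ** 2 for d in differences) / (n - 1)
    if var == 0.0:
        # Degenerate case: identical differences for all participants.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def effect_onset_ms(
    diff_waves: List[List[float]],   # diff_waves[participant][sample]
    t_crit: float,                   # critical |t|, chosen by the caller
    min_run: int = 10,               # required consecutive significant samples
    sampling_rate_hz: int = 500,
    epoch_start_ms: float = -100.0,  # epochs start 100 ms before onset
) -> Optional[float]:
    """Latency (ms) of the first run of min_run significant samples."""
    run_start = 0
    run_length = 0
    for i in range(len(diff_waves[0])):
        t = t_statistic([wave[i] for wave in diff_waves])
        if abs(t) > t_crit:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length >= min_run:
                return epoch_start_ms + run_start * 1000.0 / sampling_rate_hz
        else:
            run_length = 0  # run interrupted; shorter bursts are ignored
    return None
```

At a sampling rate of 500 Hz, the 10-sample criterion corresponds to a sustained difference of 20 ms, so isolated significant samples do not produce an onset.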

#### **POST-EXPERIMENTAL RATING OF EMOTIONAL VALENCE AND AROUSAL**

All stimuli used in the main experiment were rated for emotional valence and arousal by an independent sample of 67 participants (mean age = 24.1, SD = 3.4; 48 female) using a computerized version of the self-assessment manikin (Bradley and Lang, 1994). As in the main experiment, stimuli were presented block-wise for each domain, in randomized order within each block; the order of blocks was counterbalanced. Furthermore, the sequence of ratings (valence and arousal) was counterbalanced. Ratings were aggregated over conditions and analyzed by ANOVAs including the factors emotion (positive, negative, neutral) and stimulus domain (words, pictures, faces). Alpha levels in *post hoc* comparisons were Bonferroni-corrected. For interactions between emotion and stimulus domain, only emotion comparisons across stimulus domains will be reported.

#### **RESULTS**

#### **BEHAVIORAL DATA**

Performance in the 1-back task was at 96.87%. An ANOVA on percentage of correct classifications yielded no significant differences between stimulus domains (words, pictures, faces) or trial type (repeated vs. new stimulus).

#### **EVENT-RELATED BRAIN POTENTIALS**

#### *Words*

No emotion effects were visible in P1 peak amplitudes, *F*(2,46) < 1. In the EPN time window, ANOVAs revealed a significant main effect of emotion, *F*(2,46) = 6.48, *p* < 0.01, $\eta_p^2$ = 0.220, reflecting larger amplitudes of the EPN for positive compared to neutral words, *F*(1,23) = 13.81, *p* < 0.01, $\eta_p^2$ = 0.375; the onset of this effect was located at 250 ms. Analyses of the anterior positivity in the EPN time window showed a main effect of emotion, *F*(2,46) = 4.85, *p* < 0.05, $\eta_p^2$ = 0.174, reflecting larger amplitudes for positive than for neutral words, *F*(2,46) = 9.19, *p* < 0.01, $\eta_p^2$ = 0.285. Furthermore, there was a main effect of emotion in the LPC time window, *F*(2,46) = 4.14, *p* < 0.05, $\eta_p^2$ = 0.152, which was based on a more pronounced positivity at centro-parietal electrodes for positive compared to negative words, *F*(1,23) = 7.43, *p* < 0.05, $\eta_p^2$ = 0.244, and, as a trend, for positive versus neutral words, *F*(1,24) = 5.17, *p* = 0.075, $\eta_p^2$ = 0.200. ERP results are depicted in **Figure 2**; for an overview, see **Table 2**.
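The partial eta squared values reported alongside the *F* statistics follow from the standard identity $\eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2)$. The sketch below, with a hypothetical helper name, shows how a reader could reproduce an effect size from a reported *F* value and its degrees of freedom; it is offered as a reading aid and is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# EPN main effect of emotion for words: F(2,46) = 6.48
epn_main = round(partial_eta_squared(6.48, 2, 46), 3)

# Post hoc comparison positive vs. neutral words: F(1,23) = 13.81
epn_post_hoc = round(partial_eta_squared(13.81, 1, 23), 3)

print(epn_main, epn_post_hoc)
```

Both values agree with the effect sizes reported for the EPN in the verbal domain (0.220 and 0.375).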

#### *Pictures*

Analyses of P1 amplitudes revealed no significant effects of emotion, *F*(2,46) = 1.44, *p* = 0.247. For the EPN, ANOVAs revealed a main effect of emotion, *F*(2,46) = 5.53, *p* < 0.05, $\eta_p^2$ = 0.194. This effect was based on larger EPN amplitudes for negative relative to neutral pictures, *F*(1,23) = 9.22, *p* < 0.05, $\eta_p^2$ = 0.286, with an onset at 174 ms, and for positive compared to neutral pictures, *F*(1,23) = 7.20, *p* < 0.05, $\eta_p^2$ = 0.238, which started at 194 ms. At anterior electrodes, analyses revealed no significant activations, *F*(2,46) = 1.89, *p* = 0.162. A pronounced effect of emotion was observed in the LPC, *F*(2,46) = 5.98, *p* < 0.001, $\eta_p^2$ = 0.610. This effect was based on larger LPC amplitudes for negative pictures compared to both neutral pictures, *F*(1,23) = 53.89, *p* < 0.001, $\eta_p^2$ = 0.701, and positive pictures, *F*(1,23) = 24.47, *p* < 0.001, $\eta_p^2$ = 0.516. Furthermore, positive pictures elicited larger LPC amplitudes than neutral pictures, *F*(1,23) = 17.85, *p* < 0.001, $\eta_p^2$ = 0.437.

**FIGURE 2 | Grand mean waveforms for positive, neutral, and negative words, pictures, and faces at selected electrode locations.** Highlighted areas show time windows of analyses for EPN and LPC in the respective stimulus domain. Scalp distributions of significant emotion effects are depicted as differences between indicated emotion categories; time windows correspond to the highlighted areas on the left side. The voltage scale of −1 to 1 μV applies to all topographies but the LPC for pictures, where the corresponding scale is depicted underneath the scalp distributions.


**Table 2 | Overview of emotion effects in EPN and LPC, including anterior positivities in the EPN time window.**

The table shows significant results of *post hoc* tests.

#### *Faces*

As for words and pictures, there were no emotion effects in P1 amplitudes, *F*(2,46) < 1. A main effect of emotion occurred for the EPN, *F*(2,46) = 15.87, *p* < 0.001, $\eta_p^2$ = 0.408; angry faces elicited larger EPN amplitudes relative to both positive faces, *F*(1,23) = 18.1, *p* < 0.001, $\eta_p^2$ = 0.440, and neutral faces, *F*(1,23) = 24.57, *p* < 0.001, $\eta_p^2$ = 0.516. The onsets of these effects were located at 176 ms (negative vs. positive) and at 168 ms (negative vs. neutral). In the same time window, a significant emotion effect was evident at anterior sites, *F*(2,46) = 15.78, *p* < 0.001, $\eta_p^2$ = 0.407, reflecting larger positive amplitudes for both negative and positive faces compared to neutral faces, *F*(1,23) = 24.87, *p* < 0.001, $\eta_p^2$ = 0.520, and *F*(1,23) = 21.32, *p* < 0.001, $\eta_p^2$ = 0.480, respectively. In the LPC time window, analyses showed a main effect of emotion, *F*(2,46) = 7.50, *p* < 0.01, $\eta_p^2$ = 0.246, reflecting larger LPC amplitudes for negative relative to positive faces, *F*(1,23) = 15.57, *p* < 0.01, $\eta_p^2$ = 0.404.

#### **RATINGS OF EMOTIONAL VALENCE AND AROUSAL**

#### *Arousal*

Arousal ratings showed no main effect of stimulus domain, *F*(2,207) = 1.829, *p* = 0.163. There was a main effect of emotion category, *F*(2,207) = 329.11, *p* < 0.001, $\eta_p^2$ = 0.761, indicating that positive and negative stimuli were rated as more arousing than neutral stimuli; additionally, negative stimuli received higher arousal ratings than positive stimuli, all *p*s < 0.001. An interaction of emotion and stimulus domain, *F*(4,207) = 7.26, *p* < 0.001, $\eta_p^2$ = 0.123, indicated that differences for emotion categories across stimulus domains were limited to negative stimuli, where pictures received higher arousal ratings than words, *p* < 0.05. For rating results, see **Table 3** and **Figure 3**.

#### *Valence*

As expected, results showed a main effect of emotion category, *F*(2,207) = 1526.762, *p* < 0.001, $\eta_p^2$ = 0.937. Positive stimuli were rated as more pleasant than neutral and negative stimuli, *p*s < 0.001; likewise, negative stimuli received lower valence ratings than neutral stimuli, *p*s < 0.001. A main effect of stimulus domain, *F*(2,207) = 9.528, *p* < 0.001, $\eta_p^2$ = 0.084, revealed that words were overall rated as more pleasant than pictures and faces, *p*s < 0.01, whereas the latter domains did not differ from each other. Additionally, an interaction of emotion category and stimulus domain, *F*(4,207) = 3.94, *p* < 0.01, $\eta_p^2$ = 0.071, was due to the fact that the differences between stimulus domains were limited to positive stimuli (where words received higher valence ratings than pictures, *p* < 0.05) and neutral stimuli (with higher valence ratings for words and pictures in comparison to faces, all *p*s < 0.050).

**Table 3 | Results of post-experimental ratings for words, pictures, and faces.**

#### **DISCUSSION**

The present study aimed to investigate emotion effects in the processing of words, pictures, and facial expressions. Emotion effects in the form of EPN and LPC modulations were visible in all stimulus domains. Interestingly, results furthermore showed valence-specific biases depending on stimulus domain: within the verbal domain, ERPs indicated a preferential processing of positive content, whereas a bias for negative valence became evident for both pictures and facial expressions. In disagreement with previous assumptions (Hinojosa et al., 2009), however, words did not receive lower arousal ratings than pictures or facial expressions.

The finding that emotion effects in EPN and LPC amplitudes occurred in all three stimulus domains is in line with previous studies reporting comparable emotion-related activity for emotional words and pictorial stimuli. First, Schacht and Sommer (2009a) reported EPN and LPC modulations for both emotional words and emotional facial expressions. In a second study, words elicited similar emotion-related activations in P300 amplitudes compared to pictograms and pronounced emotion effects in the LPC time window (Tempel et al., 2013). Finally, in an fMRI study by Schlochtermeier et al. (2013), emotional words and pictures elicited comparable emotion-related activity. Nonetheless, these and the present results seem to be at odds with a number of studies suggesting a reduced capability of words as compared to pictures or faces to elicit emotion effects in ERPs (e.g., Hinojosa et al., 2009). A possible explanation for this pattern of results was recently proposed by Rellecke et al. (2011), who suggested that emotional processing in faces is automated to a higher degree than in words: In a highly superficial face-word decision task, emotion effects beyond 100 ms after stimulus onset were limited to face stimuli, although the exact same words had elicited EPN and LPC effects in another study using a lexical decision task (Palazova et al., 2011). These findings thus point to the importance of taking differences in task-dependence between stimulus domains into account. Notably, studies reporting *comparable* emotion-related activity for words and faces/pictures employed tasks that required lexico-semantic processing of verbal stimuli, such as a lexical decision task (Schacht and Sommer, 2009a), a valence judgment task (Schlochtermeier et al., 2013; Tempel et al., 2013), or an occasional recognition memory task in the present study. In contrast, investigations reporting *reduced* emotion effects for words used superficial tasks that could be performed on the basis of coarse perceptual features without requiring lexico-semantic processing, e.g., a face-word decision task (Rellecke et al., 2011), or a discrimination of intact stimuli within a series of scrambled distractors (Hinojosa et al., 2009). In line with these findings, Frühholz et al. (2011) reported that EPN effects for emotional faces occurred for both implicit (color naming) and explicit (emotion judgment) tasks, whereas EPN modulations for emotional words were limited to the valence judgment task. Thus, it seems that emotional words do not generally show a reduced capability to elicit emotion effects, but that their emotional processing is automated to a much lesser degree than that of facial expressions and pictures. This suggestion was also corroborated by task comparisons within the verbal domain, which suggested a high task-dependence of emotion effects, especially for the LPC (Fischler and Bradley, 2006; Schacht and Sommer, 2009b), but – more recently – also for the EPN (Hinojosa et al., 2010; Bayer et al., 2012b).

Considering previous findings, the present study also points to the importance of context effects for emotion effects in word processing. In a study investigating the influence of font size on emotion processing (Bayer et al., 2012a), the same stimulus words as in the present study did not show a positive valence bias, but elicited EPN and LPC modulations for both positive and negative stimuli. Importantly, this difference occurred although the same task was employed in both studies. Furthermore, stimuli were presented in a blocked design and not intermixed with other stimuli (pictures and faces in the present study and words in large font size, respectively), a design that is likely to even reduce context effects. When (single) words are actually presented in direct context with "competing" stimuli, the impact on emotional processing in words seems to be even larger. This was shown for face stimuli (Rellecke et al., 2011), but also for linguistic context information (Bayer et al., 2010). Taken together, emotion effects for written words might not only depend on the immediate task at hand, but also on the broader experimental context as provided by previously presented stimuli.

Although emotion effects in ERPs were evident in all stimulus domains, results showed clear differences in valence biases between stimulus domains. For facial expressions, EPN and LPC effects were limited to negative stimuli. In the case of pictures, sensory processing of emotional content as evidenced by the EPN was visible for negative compared to neutral pictures, and, with a later onset, also for positive compared to neutral pictures. In the later LPC interval, however, emotion effects were largest for negative stimuli. In contrast, emotion effects were limited to positive stimuli in the verbal domain. Furthermore, words overall received more positive ratings than pictures or facial expressions. These results are in line with a bias for positive valence for words (e.g., Herbert et al., 2009; Bayer et al., 2012b) and a negative valence bias for pictures (e.g., Carretié et al., 2001) and facial expressions (e.g., Pourtois and Vuilleumier, 2006) as evidenced in previous reports. Interestingly, analyses of the anterior positivity in the EPN time window only partly corresponded to EPN effects. In the verbal domain, enhanced ERP amplitudes to positive compared with neutral words were in accordance with both the EPN as well as with previous literature on anterior (P2) effects of emotional content (Kanske and Kotz, 2007). In contrast, discrepancies were more pronounced for facial expressions. Here, both positive and negative faces differed from neutral faces at anterior electrode sites, while EPN effects were limited to negative faces, both in comparison to neutral and positive faces. Enhanced frontocentral positivities were previously reported in a similar time window (155–200 ms) by Eimer and Holmes (2002) for fearful compared to neutral facial expressions; comparisons, however, are limited by the fact that this study did not include positive stimuli. 
Concerning the EPN in response to emotional facial expressions, the time course of the component is of special interest since it temporally coincides with the N170 component. Although the dissociation between emotion effects in the EPN and the N170 has been a matter of debate, recent research suggested the involvement of at least partially dissociable neural generators (Rellecke et al., 2011, 2013). Lastly, anterior emotion effects were absent for pictures. Taken together, these results suggest that anterior effects are not merely counterparts of the posterior EPN effects in the same time window, suggesting a domain-specific involvement of multiple neural sources already at early stages of emotion processing.

On a theoretical level, the positive valence bias for words was related to a so-called positivity offset, describing the notion of a preference for positive stimuli at rather low levels of emotional activation, which was suggested to be the basis of approach motivation in neutral contexts (Cacioppo and Gardner, 1999). In contrast, results for pictures and faces are in line with the idea of a negativity bias for stimuli at higher arousal levels, which was supposed to prepare the organism for rapid responses to threatening or dangerous stimuli (see Cacioppo and Gardner, 1999). The assumption of generally lower arousal of words in comparison to pictorial stimuli is plausible considering the arbitrary nature of written language, which requires the translation of symbols into meaningful concepts. Interestingly, however, this assumption was not corroborated by arousal ratings collected in addition to the present ERP study, where words did not receive reduced arousal ratings in comparison to pictures or facial expressions. In our opinion, this finding warrants careful interpretation. Instead of assuming that words do actually hold the same potential to elicit arousal reactions as pictures or facial expressions, we tentatively suggest that arousal ratings for words might reflect different aspects of the arousal concept than in the case of pictorial stimuli. Arousal ratings for words might thus – to a higher degree than in the case of pictorial stimuli – reflect a mainly cognitive evaluation of an underlying concept (cf. Bayer et al., 2011). In contrast, pictures hold much more imminent information (e.g., about possible dangers emanating from a stimulus) and might thus enable a more realistic assessment of their actual arousal value, i.e., the potential of a stimulus to elicit arousal reactions, by accounting for a bodily aspect of arousal.
Further evidence for this assumption arises from the present data when relating emotion effects in ERPs to arousal ratings within stimulus domains. In the case of pictures and facial expressions, arousal ratings accurately mirror emotion effects in ERPs, with higher arousal ratings for negative pictures or facial expressions as compared to neutral or positive stimuli. For words, however, there is less agreement between ratings and ERP effects. Although a clear bias for positive valence was evident in ERPs, arousal ratings revealed no significant differences between negative and positive words, and even showed numerically larger values for the negative stimulus category. Undoubtedly, future research will need to corroborate the assumption of systematic differences in arousal ratings between pictorial and symbolic stimuli, and should elucidate whether these assumed differences are influenced by the presentation mode of stimuli, that is, whether they are presented block-wise (as in the present study) or fully randomized. Furthermore, it would require developing instruments able to capture different aspects of the arousal concept, which then might also shed new light on ambiguous findings concerning the role of stimulus arousal in previous research.

As discussed above, emotional processing within words, pictures, and facial expressions exhibits a number of notable differences concerning valence processing and task dependence. When considering these differences and their possible causes, one cannot neglect dissimilarities between the stimulus domains themselves. As already mentioned, a major *processing* difference between written words and pictorial stimuli concerns the symbolic nature of the former stimulus class. In ERPs, this difference is reflected in EPN onsets, which were located at around 170–190 ms for pictures and facial expressions, but only started at 250 ms for words, reflecting the increased time necessary to gain access to lexico-semantic information. Beyond that, stimuli of these domains differ notably in their basic physical features. At this point, it seems noteworthy to make a distinction between basic domain-specific physical features and emotion-specific features. Concerning the former, written language usually consists of highly similar symbols without variability in size or color. Likewise, faces (of a given ethnicity) exhibit a highly distinctive arrangement of features with rather small differences between individuals. In contrast, pictures of objects or scenes show a high variability concerning visual complexity, colorfulness, or scope. Since stimuli were presented in a naturalistic form in the present study and thus differed in size, color, and complexity between domains, these differences became obvious in GFP activations averaged over emotion conditions within each domain, where pictures elicited by far the largest activations. Given these fundamental differences in basic activations between stimulus domains, the present study avoided any analyses of emotion by domain interactions.
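For readers unfamiliar with the measure, global field power (GFP) is commonly defined as the standard deviation of the potential across all electrodes at each time point. The sketch below assumes this common definition and a simple channels-by-samples data layout; the function name is hypothetical, and the computation is not taken from the present study's analysis software.

```python
import math
from typing import List


def global_field_power(erp: List[List[float]]) -> List[float]:
    """GFP per sample for erp[channel][sample] (values in microvolts)."""
    n_channels = len(erp)
    n_samples = len(erp[0])
    gfp = []
    for t in range(n_samples):
        column = [erp[ch][t] for ch in range(n_channels)]
        mean = sum(column) / n_channels  # zero for average-referenced data
        variance = sum((v - mean) ** 2 for v in column) / n_channels
        gfp.append(math.sqrt(variance))
    return gfp


# Toy example: four electrodes, two samples
toy_erp = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
print(global_field_power(toy_erp))
```

Because GFP summarizes the overall strength of the scalp field independently of its topography, larger GFP for one domain indicates stronger overall activation rather than a difference at any particular electrode.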

Apart from general physical differences, it is interesting to consider the level at which the distinction between emotional and neutral stimuli becomes manifest: Within written words and pictures, these distinctions are presumably related to the emotional *meaning* (given careful stimulus selection). In contrast, within facial expressions, distinctions between neutral and emotional expressions are *determined* by specific arrangements of facial features (as, for example, described by facial action units) and thus completely depend on differences in physical properties. Furthermore, as stated in the introduction, it was discussed that facial expressions of emotion comprise only indirect affective information (Walla and Panksepp, 2013), most likely depicting an emotional reaction toward a direct affective stimulus. As a consequence, facial expressions of emotion are usually classified using the concept of basic emotions, while pictures and words are most often described via two-dimensional constructs comprising valence and arousal. For these reasons, it seems impossible to realize an experimental design with fully matching semantic information across domains – while one can include a picture of a cat as well as the word "cat," it is impossible to select a matching facial expression<sup>2</sup>. Taken together, these points illustrate that domain-specific differences in physical properties and processing requirements should be taken into account when interpreting similarities and differences between emotion effects in specific stimulus domains.

<sup>2</sup>In the present study, we approached this problem by focusing solely on angry facial expressions, accepting that this would result in reduced emotional variability in comparison to negative pictures and words, since they were not pre-selected in this regard. On the other hand, angry facial expressions were supposed to be particularly well suited for activating the human fear system (see Schupp et al., 2004b), and were thus frequently used in previous research (e.g., Schupp et al., 2004b; Rellecke et al., 2011, 2012). Moreover, recent research suggested that differences in EPN and LPC between (negative) emotion categories in face processing seem to be negligible (Recio et al., 2014).

In summary, the present study compared emotion effects in ERPs elicited by words, pictures, and facial expressions. In order to maximize their comparability, stimuli were presented in a within-subject design using a task that ensured attentive processing with mostly identical demands on perceptual and cognitive resources across domains. Results showed that emotion effects in the form of EPN and LPC modulations occurred in all stimulus domains, but revealed pronounced differences in valence processing between stimulus domains. While emotion effects were limited to positive stimuli in the verbal domain, they were predominant for negative pictures and faces. In addition, words received generally higher valence ratings than pictures and facial expressions. Interestingly, assumed differences in arousal level between stimulus domains were not reflected in arousal ratings collected in the present study, possibly due to the involvement of different evaluative aspects in these ratings. Taken together, the present results point toward systematic differences in the processing of written words and pictures or facial expressions and thus advise caution in the interpretation and comparison of both results and underlying concepts across stimulus domains.

#### **ACKNOWLEDGMENTS**

The authors thank Werner Sommer for fruitful discussions on experimental design, Thomas Pinkpank, Ulrike Bunzenthal and Rainer Kniesche for technical support, and Ramona Kopp and Sibylla Brouër for help with data collection. We acknowledge support by the German Research Foundation and the Open Access Publication Funds of the Göttingen University.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 June 2014; accepted: 12 September 2014; published online: 06 October 2014.*

*Citation: Bayer M and Schacht A (2014) Event-related brain responses to emotional words, pictures, and faces – a cross-domain comparison. Front. Psychol. 5:1106. doi: 10.3389/fpsyg.2014.01106*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Bayer and Schacht. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# *Marina Palazova\**

*General Psychology and Neurocognitive Psychology, International Psychoanalytic University Berlin, Berlin, Germany \*Correspondence: marina.palazova@ipu-berlin.de*

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Germany*

#### *Reviewed by:*

*Aimee Mavratzakis, University of Newcastle, Australia*

**Keywords: emotions, visual word recognition, linguistic representation, emotional valence, lexical access**

Although emotional processing in words has recently become a strong focus of research, less attention has been given directly to the question of the functional localization of emotion effects in the stream of visual word recognition. Here, the impact of the emotional connotation of words on different processing stages of reading (pre-lexical, lexical, or semantic) is investigated. Put differently: How is emotional valence represented within the linguistic representational system?

From a psycholinguistic perspective there are at least two types of linguistic representations which are central to visual word recognition. These are lexical and semantic representations. It is a challenging endeavor to define the term lexical: Whether low-level lexical representations (pure orthographic processing or the visual word form) should be differentiated from higher-level lexical representations (denoting, e.g., word frequency), is for example an open issue. Furthermore, orthographic processing may comprise sublexical processing on the level of letters and syllables, and lexical processing on the level of the complete word form. The term semantic commonly refers to the meaning of words, presumed as internally represented concepts made of smaller elements of meaning organized by semantic similarity.

In psycholinguistics, separate lexical and semantic representations are presumed. Accordingly, most models of visual word recognition assume that lexical representations are retrieved (lexical access) after basic low-level visual perception of line forms and colors, which then culminates in the activation of semantic knowledge. Models of word recognition differ with respect to their assumptions about the discreteness of the processing stages and about the mechanisms of accessing lexical and semantic representations. While early models of visual word recognition postulated discrete processing stages (e.g., Forster, 1976), more recent computational approaches (e.g., Coltheart et al., 2001) assume interactive processing stages organized in a cascaded manner. To my knowledge, however, no single model of visual word recognition simulates both lexical and semantic effects. Emotional valence is therefore an interesting factor, since there is an ongoing debate about whether it should be understood as a lexical or as a semantic factor. Insights into the linguistic representations related to emotional valence would have important implications for visual word recognition models in general.

To compare the time course of emotion effects with that of visual word recognition, the prominent event-related potential (ERP) components in visual word processing should first be considered irrespective of emotion. Higher-level lexical representation effects (e.g., of word frequency) are observed as early as 100 ms post-stimulus. Since word frequency is broadly accepted to be a lexical factor, such modulations imply that lexical access is already underway in the time course of the P1 (Assadollahi and Pulvermüller, 2003; Hauk et al., 2006; Palazova et al., 2011). The earliest effects reported for semantic factors start at 160 ms (Hauk et al., 2012). Nevertheless, a more conservative view on word recognition postulates a timeline of about 150 ms for pre-lexical and low-level lexical processing, 250 ms for lexical access, and 400 ms for semantic access (e.g., Grainger and Holcomb, 2009). Such results have some very important implications for the understanding of word recognition processes: (i) there seems to be a certain variability in the onsets of separate linguistic processing stages, and (ii) the early effects may also indicate feedback mechanisms even at sublexical/low-level lexical processing stages (Carreiras et al., 2014). A current proposal points to a possible key role of the ventral occipitotemporal cortex in feedback mechanisms in reading (Price and Devlin, 2011). Most models of word recognition, however, assume a feedforward mechanism, at least in very early processing stages, without any feedback from high-level to very early processing stages.

Dimensional models of emotion have a long tradition in psychology and are among the most influential theories of emotion processing. These models suggest two main dimensions that describe the emotional space: (i) emotional valence denotes whether a stimulus is perceived and experienced as positive or negative, and (ii) arousal constitutes the intensity of the appraisal process. I will limit the article to a discussion of valence effects, which can be understood as the dimension that underlies the quality of emotional experience. Considering the time course of emotional valence effects, three different ERP components have been observed with words. Very early emotion effects have been observed in the time course of the P1 (Bernat et al., 2001; Hofmann et al., 2009; Bayer et al., 2012) or N1 (Kissler and Herbert, 2013), presumably reflecting activation of visual cortex. Recently, a temporal area, the left middle temporal gyrus (MTG), has also been discussed as the neural source underlying emotional P1 modulations (Keuper et al., 2014). The earliest emotion effects have been observed starting as early as 50 ms after stimulus onset in the C1 component, conceivably reflecting first responses in the primary visual cortex (Rellecke et al., 2011). The second prominently reported component for emotional words is the early posterior negativity (EPN), starting approximately 200 ms after stimulus onset (Kissler et al., 2007; Herbert et al., 2008; Schacht and Sommer, 2009; Palazova et al., 2011, 2013). The EPN is an augmented negativity to emotional as compared to neutral stimuli at occipito-temporal sites, which is thought to reflect attention allocation to intrinsically relevant stimuli involving an extended network of occipital, temporal, and parietal areas (Keuper et al., 2014). The late positive complex (LPC), the third emotional ERP component, has been observed at latencies of 350 ms and later, and consists of an increased centro-parietal positivity for emotional relative to neutral stimuli. An LPC has often been found in studies with written words in tasks demanding higher-level lexico-semantic processing (Herbert et al., 2006, 2011; Carretie et al., 2008; Kissler et al., 2009; Schacht and Sommer, 2009; Hinojosa et al., 2010).

The timing of the separate emotion components indicates an impact of emotion on several word recognition stages. While the time course of very early emotion effects seems too early to reflect fully accessed word meaning, the time course of the EPN does not allow for such a clear conclusion. A comparison of the time course of the EPN with the lexical and semantic stages of visual word recognition alone does not deliver much insight into the underlying functional mechanisms. According to the described time course of the EPN in visual word recognition, both a lexical and a semantic locus would be conceivable. Under the conservative view on word recognition, the EPN would fully coincide with lexical processing stages from 200 ms onwards; on the other hand, evidence for semantic access before 200 ms post-stimulus would indicate a semantic functional locus of emotional valence effects.

#### **ARE EFFECTS OF EMOTIONS SEMANTIC IN NATURE?**

There is clear evidence in favor of a semantic locus of emotion effects. By now a differentiated picture of results has emerged for the EPN component. As mentioned, the EPN time course is not sufficient to determine whether emotion effects are semantic, that is, whether these effects are a consequence of retrieved semantic representations as proposed by Kissler et al. (2007). A lexical locus may also be conceivable, since a component related to semantic processing such as the N400 peaks later than the latency of the EPN. Furthermore, according to the conservative view, lexical processing is underway during the time course of the EPN.

Another possibility to address this question is to orthogonally combine emotional valence with other factors that are either lexical or semantic and to track their interactions in accordance with the additive factor method (Sternberg, 2011). Palazova et al. (2013) followed this logic and examined the time course of emotion effects within concrete and abstract words. Word concreteness is a semantic factor that refers to whether the real-world referent of a mental concept can be perceived by the senses, and it has been observed to alter response times and late ERP components such as the N400. Importantly, emotion effects interacted with concreteness within the EPN, with concrete words eliciting an earlier EPN than abstract words. Following the same line of argument, Palazova et al. (2011) orthogonally combined emotional valence with word frequency, a factor that is broadly accepted to be lexical in nature. In contrast to the emotion × concreteness interaction, no interaction of the factors emotion and frequency was observed for the EPN. At the same time, long-lasting main effects of both factors were observed. Together, these two studies deliver direct evidence for a semantic functional locus of the processes reflected in the EPN. That is, the presumed increased attention evoking the EPN depends on retrieval of the semantic meaning of the words. Considering that the LPC is generally observed after the EPN and interpreted as elaborate processing of emotional connotation, the LPC can likewise be interpreted as based on the retrieved meaning of emotional words.

#### **ARE EFFECTS OF EMOTIONS LEXICAL IN NATURE?**

The very early emotion effects in words cannot be easily explained with the semantic locus hypothesis and have generated much debate. Two hypotheses were established to explain why and how they do emerge. First, very early emotion effects have been interpreted as a marker for facilitated and accelerated lexical access of emotional compared with neutral words (Hofmann et al., 2009). The second refers to the idea that very early emotion effects can be explained by conditioned responses to word form of emotional connotation (Palazova et al., 2011); please see also Keuper et al. (2014) for a related account on very early emotion effects at the level of lexical processing.

Speeded lexical access is conceivable since response times are shorter and lexicality effects (the first ERP difference between words and pseudowords) exhibit a shorter latency for emotional than for neutral words (Kissler and Herbert, 2013). The underlying mechanisms are still elusive. Importantly, however, the idea of speeded lexical access and the conditioning hypothesis are not mutually exclusive: conditioned responses to emotional words could account for facilitated retrieval of (sub-)lexical representations. The question that has not yet been answered is at which level of lexical representation exactly emotion exerts its influence. First, speeded lexical access may depend on emotion as a part of the lexical representation, which in analogy to word frequency would facilitate retrieval of higher-level lexical representations. An alternative would be feedback processing from rapidly accessed semantic representations of words. Emotional valence may be the first retrieved semantic feature of a word (Palazova et al., 2013), and may therefore exert a facilitating feedback influence on the lexical level without emotion being represented as a part of lexical representations. This alternative seems less plausible, since the arguments for a lexical locus outweigh those against it: very early emotion effects in the C1 and P1 seem too early to reflect feedback processing from semantic processing stages, and early interactions with word frequency are a direct indication of higher-level lexical representations. On the other hand, as recently shown by Keuper et al. (2014), MTG involvement in P1 effects points to lower-level lexical representation, i.e., the visual word form. The earliest observed emotion effects (starting as early as 50 ms post-stimulus) and the variability across observed very early emotion effects (only to negative words: Hofmann et al., 2009; or to positive words: Palazova et al., 2011; Bayer et al., 2012; or to both: Keuper et al., 2014) would even indicate the sublexical level on the basis of syllables. That is, sublexical entities, e.g., prefixes, may serve as conditional cues for emotional valence information. In morphologically rich languages such as German it is conceivable that some prefixes would carry some valence information in case they are more frequently related to negative or to positive than to neutral words. The exact level of linguistic representation is still an open question and requires future research.

Taken together, it can be assumed that emotional valence is a semantic feature, possibly the first semantic feature to be retrieved from semantic memory when reading words. A growing body of evidence is pointing to a second possible locus of emotion in the lexical linguistic representations. The exact level of lexical representation and the underpinning learning mechanisms are open issues. The conclusion that emotional valence impacts word recognition on multiple stages and might be both part of lexical and of semantic representations is pinpointing future challenges for models of visual word recognition, that is, first, the need for integration of models that either have a focus on lexical or on semantic processing, and second, the integration and prediction of word dimensions like emotion within such models.

#### **REFERENCES**


'hate' differ from 'sleep': using combined electro/magnetoencephalographic data to reveal the sources of early cortical responses to emotional words. *Hum. Brain Mapp.* 35, 875–888. doi: 10.1002/hbm.22220


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 01 April 2014; accepted: 12 September 2014; published online: 30 September 2014.*

*Citation: Palazova M (2014) Where are emotions in words? Functional localization of valence effects in visual word recognition. Front. Psychol. 5:1105. doi: 10.3389/fpsyg.2014.01105*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Palazova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Neural correlates of an early attentional capture by positive distractor words

# *José A. Hinojosa<sup>1,2</sup>\*, Francisco Mercado<sup>3</sup>, Jacobo Albert<sup>1,4</sup>, Paloma Barjola<sup>3</sup>, Irene Peláez<sup>3</sup>, Cristina Villalba-García<sup>1</sup> and Luis Carretié<sup>4</sup>*

<sup>1</sup> Instituto Pluridisciplinar, Universidad Complutense de Madrid, Madrid, Spain

<sup>2</sup> Facultad de Psicología, Universidad Complutense de Madrid, Madrid, Spain

<sup>3</sup> Facultad de Ciencias de la Salud, Universidad Rey Juan Carlos, Madrid, Spain

<sup>4</sup> Facultad de Psicología, Universidad Autónoma de Madrid, Madrid, Spain

#### *Edited by:*

Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tuebingen, Germany

#### *Reviewed by:*

Lars Kuchinke, Ruhr Universität Bochum, Germany

Johanna Maria Kissler, University of Bielefeld, Germany

#### *\*Correspondence:*

José A. Hinojosa, Instituto Pluridisciplinar, Universidad Complutense de Madrid, Paseo Juan XXIII, 1, Madrid 28040, Spain e-mail: hinojosa@pluri.ucm.es

Exogenous or automatic attention to emotional distractors has been observed for emotional scenes and faces. In the language domain, however, automatic attention capture by emotional words has been scarcely investigated. In the current event-related potentials study we explored distractor effects elicited by positive, negative and neutral words in a concurrent but distinct target distractor paradigm. Specifically, participants performed a digit categorization task in which task-irrelevant words were flanked by numbers. The results of both temporo-spatial principal component and source location analyses revealed the existence of early distractor effects that were specifically triggered by positive words. At the scalp level, task-irrelevant positive compared to neutral and negative words elicited larger amplitudes in an anterior negative component that peaked around 120 ms. Also, at the voxel level, positive distractor words increased activity in orbitofrontal regions compared to negative words. These results suggest that positive distractor words quickly and automatically capture attentional resources diverting them from the task where attention was voluntarily directed.

**Keywords: emotion, positive distractors, anterior N1, word processing, event-related potentials**

#### **INTRODUCTION**

In order to maintain coherent behavior in a continuously changing environment, attentional processes are controlled endogenously to allow for keeping goal-directed behaviors in spite of distracting events. At the same time, organisms need to be able to effectively process novel, unexpected events, that could be either advantageous or dangerous, so as to ensure appropriate responses with either approach or avoidance behavior (Egeth and Yantis, 1997; Chica et al., 2013). The mechanism that is able to detect the appearance of these new events is called exogenous attention (also referred to as bottom-up, involuntary or stimulus-driven attention). It may be described as an adaptive mechanism for the rapid detection and processing of biologically relevant events, even when individuals are engaged in a resource-consuming task (Carretié, 2014). Exogenous shifts are reflexive, with attention being automatically pulled by external stimulation. According to different theoretical views (see Yantis, 2000 for a review), exogenous attention involves several processes such as the spatial automatic orientation of processing resources toward those events that deserve further processing (Sokolov, 1963; Graham and Hackley, 1991; Corbetta and Shulman, 2002; Posner et al., 2007), or the modulation of perceptual neural mechanisms that potentiate the processing of those stimuli capturing attention (Serences and Yantis, 2007; Asplund et al., 2010).

The results of several event-related potential (ERP) studies have revealed that some components may be related to distinct mechanisms involved in exogenous attention. In this sense, an anterior N1 component peaking around 100 ms has been associated with an attentional mechanism of the prefrontal cortex, which directs attention and generates a bias signal that either enhances or suppresses sensory representations in visual pathways (Hillyard and Anllo-Vento, 1998; Barceló et al., 2000; Di Russo et al., 2003). Perceptual potentiation seems to be reflected by modulations in posterior P1 and N1 components peaking around 100 and 150 ms, respectively (Hillyard et al., 1998; Vogel and Luck, 2000; Di Russo et al., 2005; Natale et al., 2006). Several dorsal and ventral brain areas in the frontal and parietal cortex have been proposed to subserve attentional networks implicated in exogenous attention (see Corbetta and Shulman, 2002, and Corbetta et al., 2008 for reviews). These regions seem also to exert a modulatory control over the activity of occipital visual cortices (Kastner et al., 1998; Brefczynski and DeYoe, 1999).

Emotional stimuli are particularly relevant for an organism's survival. Indeed, enhanced shifts in attention from the target stimulus toward competing emotional as compared to neutral faces or scenes presented as distractors are consistently observed (see Carretié, 2014, and Pourtois et al., 2013, for reviews). Capture of exogenous attention by emotional distractors increases reaction times and/or errors (e.g., Schimmack and Derryberry, 2005; Hodsoll et al., 2011). Also, depending on task demands and the current stimuli used, modulations by emotional compared to neutral distractors affect relatively early and/or late ERP components, including the P1 and the N1, as well as the so-called early posterior negativity (EPN), the P2 and the late positive component (LPC; e.g., digit-categorization tasks: Carretié et al., 2009; perceptual discrimination tasks: Doallo et al., 2006; De Cesarei et al., 2009; Pourtois et al., 2010). Finally, studies providing spatial information on brain activity have revealed the involvement of visual cortices and fronto-parietal attentional networks in the processing of task-irrelevant emotional stimuli (e.g., Vuilleumier et al., 2001; Pessoa and Ungerleider, 2004; Mitchell et al., 2007; Carretié et al., 2012).

Correlates of exogenous attentional capture by emotional task-irrelevant stimuli have also been observed in the language domain, although the mechanisms that operate for the processing of verbal distractors have been much less explored than for pictorial materials. Several studies have used experimental paradigms in which targets and emotional distractor words were not concurrent in time, such as the dot probe task (e.g., MacLeod et al., 1986), affective variants of the cue-target paradigm (e.g., Stormark et al., 1995; Amir et al., 2003), or the attentional blink paradigm (e.g., Keil and Ihssen, 2004; Arnell et al., 2007). Overall, these studies have provided important information on exogenous attention processes, which suggests that emotional verbal distractors elicit an involuntary capture of attention. However, some limitations have been noted since orienting toward and disengaging from a stimulus are processes that may be difficult to differentiate in these paradigms (Salemink et al., 2007; Cisler et al., 2009). Similar concerns have been raised about using tasks in which emotional distractors and targets engaging voluntary attention are not physically segregated. Examples of these paradigms are those exploring the emotional Stroop effect (e.g., McKenna and Sharma, 1995; Thomas et al., 2007; González-Villar et al., 2014), those using affective lexical decision tasks (Hofmann et al., 2009; Hinojosa et al., 2010b), or those where specific non-emotional aspects (e.g., letter font detection) of words have to be identified (e.g., Schacht and Sommer, 2009a; Hinojosa et al., 2014). It has been claimed that these tasks may not trigger some of the processes involved in exogenous attention, such as spatial reorienting mechanisms (Carretié, 2014). Indeed, they have been most commonly used to explore lexical or conflict-related processes rather than exogenous attention.

A different source of evidence comes from studies using *concurrent but distinct target distractor paradigms* (CDTD) or *directed attention tasks* (MacNamara et al., 2013; Carretié, 2014). In these tasks, elements on the screen to which voluntary attention must be directed to perform a task (targets) and elements that are task-irrelevant (distractors) appear at the same time but are physically segregated. The use of CDTD tasks may be a suitable tool to explore exogenous attention mechanisms since both orienting of attention and sensory enhancement processes seem to be operating in these paradigms (Carretié, 2014). To the best of our knowledge, however, only three studies have compared the processes triggered by emotional and neutral distractor words with CDTD tasks (see also Rampone et al., 2014, who did not include neutral distractor words). The results of these studies suggest that emotional words capture attention to a lesser extent than do scenes or faces (Carretié, 2014), which is in line with reports showing differences in the processing of emotional pictorial and verbal stimuli (Hinojosa et al., 2009; Schacht and Sommer, 2009b; Frühholz et al., 2011; Schlochtermeier et al., 2013). In this sense, Harris and Pashler (2004) found slowed reaction times to negative distractors only after the first presentation of task-irrelevant words using a digit categorization task. Also with this paradigm, Aquino and Arnell (2007) reported increased reaction times to sexually explicit distractors compared to neutral words, but not between threatening or school-related items and neutral words. Finally, Trauer et al. (2012) used a visual foreground perceptual task to investigate distraction effects by emotional words on steady-state visual evoked potentials (SSVEPs). Behavioral data and SSVEP amplitudes showed no differences regardless of the emotional content of distractor words, which was taken to suggest an absence of attentional modulation in early visual areas.
Lexico-semantic effects in middle and late latency ERP components were also explored. The authors found enhanced amplitudes in the P2 and N400 components to negative task-irrelevant words and concluded that emotional distractor words captured lexico-semantic processing resources.

The heterogeneity of the findings suggests that more studies are needed in order to clarify how the distinct processes involved in exogenous attention modulate the processing of task-irrelevant emotional words. In this sense, compared with behavioral measures, ERPs make it possible to determine which stages are being affected by a specific experimental manipulation. Another advantage over behavioral methods is that they can provide a measure of stimulus processing even when there is no behavioral change. In the only prior ERP study, Trauer et al. (2012) focused their ERP analyses on the stage of elaborated meaning evaluation (P2, N400, and LPC components) due to some limitations of the SSVEP procedures to explore early latency components. Thus, the involvement of orienting mechanisms and/or enhanced sensory processing that occur at early attentional processing stages remains unexplored with ERPs. The present study sought to clarify the mechanisms involved in exogenous attention to verbal stimuli. To this end, emotional and neutral words were presented as distractors while participants carried out a demanding digit categorization CDTD task. We expected effects to arise in those ERP components that have been associated with the automatic orientation of processing resources and/or the modulation of perceptual neural mechanisms in prior literature, namely the P1 and the N1 (Hillyard et al., 1998; Di Russo et al., 2005). Additionally, we examined those components (the P2, the EPN, the N400, and the LPC) that have been modulated by emotional content in word processing studies with a variety of experimental paradigms including lexical decision tasks (Kanske and Kotz, 2007; Scott et al., 2009; Méndez-Bértolo et al., 2011), silent reading (Kissler et al., 2007, 2009; Herbert et al., 2008), structural decision tasks (i.e., identification of italicized letters, Schacht and Sommer, 2009a), or grammatical decision tasks (i.e., counting of nouns or adjectives, Kissler et al., 2009).

As a second goal, we explored the neural origin of exogenous attention to emotional distractor words, a question that has not been addressed in previous research. To this aim, source location analyses were performed using exact low resolution brain electromagnetic tomography (eLORETA; Pascual-Marqui, 2007). According to previous literature, activation of those brain regions underlying attentional networks and emotional processing was hypothesized, namely frontal, parietal and/or extrastriate visual cortices (Vuilleumier, 2005).

#### **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Thirty undergraduate students (23 females and 7 males) from the *Universidad Rey Juan Carlos*, with an age range between 18 and 26 (mean = 18.96, SD = 1.92), participated in this experiment. Participants were native speakers of Spanish and right-handed, as assessed with the Edinburgh Handedness Inventory (Oldfield, 1971): LQ > +72. All subjects gave written informed consent and reported normal or corrected-to-normal visual acuity. The study was approved by the Ethics Committee of the *Universidad Rey Juan Carlos*.

#### **STIMULI AND PROCEDURE**

Three types of distractor words were presented to participants in a digit categorization task: negative, positive and neutral words. The complete set of verbal stimuli consisted of 150 Spanish nouns (50 per emotional category). These words were selected from a pilot study that comprised 720 nouns. In this study, 45 individuals (different from those participating in the current study) rated valence, arousal, and the level of concreteness of each word on a 9-point Likert scale (for a detailed description of the pilot study see Hinojosa et al., 2009). Equal numbers of negative, positive and neutral distractor words were selected according to several criteria that were contrasted with analyses of variance (ANOVAs; see **Table 1**): (a) negative and positive words were matched in arousal rating but both differed from neutral words; (b) negative, positive and neutral nouns differed in valence ratings; (c) all nouns had similar concreteness, word length and frequency of use (Alameda and Cuetos, 1995). **Table 1** summarizes mean values in arousal, valence and concreteness for nouns, as well as mean word frequency and word length.

Participants sat in an electrically and acoustically isolated room in a comfortable chair. The stimuli were presented on a computer monitor that was positioned at eye level about 60 cm in front of the participant. Words were presented in lower case letters at fixation with digits in the left and the right periphery (10° eccentricity). Word size ranged between 2.86° and 7.64° (width) × 0.95° (height). Only digits from 2 to 8 were used (0.95° height). Words and digits appeared in black against a light gray background. The sequence of events in each trial is represented in **Figure 1**. First, a fixation cross appeared in the center of the screen and remained there for 500 ms. This fixation cross was followed by a blank screen interval of 300 ms and then words flanked by the two digits were presented for 150 ms and were followed by a 1700 ms blank interval. The intertrial interval was 2650 ms.
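
The visual angles reported above can be converted to approximate on-screen sizes with standard trigonometry. The following is a minimal sketch, assuming a flat screen viewed head-on at the stated 60 cm; the function names are illustrative and do not come from the original study.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # viewing distance reported in the text


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of a stimulus centered on the line of sight
    that subtends `angle_deg` degrees of visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from fixation of a point located at `angle_deg`
    degrees of eccentricity."""
    return distance_cm * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    print(f"stimulus height (0.95 deg): {extent_cm(0.95):.2f} cm")
    print(f"word width (2.86 to 7.64 deg): "
          f"{extent_cm(2.86):.2f} to {extent_cm(7.64):.2f} cm")
    print(f"digit eccentricity (10 deg): {eccentricity_cm(10.0):.2f} cm")
```

Under these assumptions, words and digits were roughly 1 cm tall, and the digits appeared a little more than 10 cm to the left and right of fixation.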

As indicated, participants performed a digit categorization task. They were told to press, 'as accurately and rapidly as possible,' one key of a response device if both digits were either even or odd (i.e., if they were 'concordant'), and a different key if one digit was even and the other was odd (i.e., if they were 'discordant'). In half of the trials digits were concordant whereas they were discordant in the other half. The same combination of digits was repeated across emotional conditions in order to ensure that task demands were identical in trials with negative, positive and neutral distractors. The order of presentation of the 150 trials (50 trials for each of the three emotional categories) was pseudorandomized so that no more than three trials of the same emotional or numerical category appeared consecutively. Stimuli were presented in two runs of 75 stimuli with a brief resting period between them. Participants were requested to avoid blinking as much as they could. A training block of nine trials was provided at the beginning of the session to familiarize participants with the task.
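
One way to obtain a trial order that satisfies the run-length constraint described above is to build the sequence step by step and restart whenever no admissible trial type remains. The sketch below is hypothetical and is not the authors' actual procedure; it assumes 25 concordant and 25 discordant trials per emotional category, and all names are illustrative.

```python
import random
from collections import Counter

EMOTIONS = ("negative", "neutral", "positive")
PARITY = ("concordant", "discordant")
MAX_RUN = 3  # at most three trials of the same category in a row


def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    best = run = 0
    prev = object()
    for lab in labels:
        run = run + 1 if lab == prev else 1
        best = max(best, run)
        prev = lab
    return best


def _would_exceed(seq, index, value, max_run=MAX_RUN):
    """True if appending `value` would create a run longer than max_run."""
    tail = seq[-max_run:]
    return len(tail) == max_run and all(t[index] == value for t in tail)


def make_trial_order(n_per_cell=25, rng=None, max_restarts=10_000):
    """Return a list of (emotion, parity) tuples respecting MAX_RUN."""
    rng = rng or random.Random()
    for _ in range(max_restarts):
        remaining = Counter({(e, p): n_per_cell
                             for e in EMOTIONS for p in PARITY})
        seq = []
        while remaining:
            allowed = [c for c in remaining
                       if not _would_exceed(seq, 0, c[0])
                       and not _would_exceed(seq, 1, c[1])]
            if not allowed:
                break  # dead end: discard this attempt and restart
            # Weight by remaining count so that cells deplete evenly.
            weights = [remaining[c] for c in allowed]
            cell = rng.choices(allowed, weights=weights, k=1)[0]
            seq.append(cell)
            remaining[cell] -= 1
            if remaining[cell] == 0:
                del remaining[cell]
        else:
            return seq  # all trials placed without violating a constraint
    raise RuntimeError("no valid trial order found")


if __name__ == "__main__":
    order = make_trial_order(rng=random.Random(1))
    print(len(order), longest_run([e for e, _ in order]))
```

Simply reshuffling all 150 trials until the constraint holds would rarely succeed, because unconstrained random orders almost always contain a run of four or more identical parity categories; the sequential construction avoids this.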

#### **EEG RECORDING AND PRE-PROCESSING**

Continuous electroencephalographic (EEG) activity was recorded using an electrode cap (ElectroCap International) with 60 homogeneously distributed scalp electrodes. All electrodes were referenced to the linked mastoids. Electrooculographic (EOG) data were recorded supra- and infraorbitally (vertical EOG), as well as from the left versus right orbital rim (horizontal EOG). Electrode impedances were kept below 5 kΩ. An online bandpass filter from 0.1 to 40 Hz was used (3 dB points for -6 dB/octave roll-off), and the digitization sampling rate was set to 250 Hz. Off-line pre-processing was performed using Brain Vision Analyzer software (Brain Products). The continuous EEG recording was divided into 1000-ms epochs for each trial, beginning 200 ms before stimulus onset. Baseline correction was made using the 200-ms period prior to stimulus onset. Trials in which subjects responded erroneously or did not respond were eliminated. EOG-artifact removal was carried out following the procedure described by Gratton et al. (1983). A careful visual inspection of the EEG was then performed, and epochs with artifacts were eliminated from further analyses. This artifact and error rejection procedure led to an average admission of 86.6% positive, 90% neutral, and 91.8% negative trials. The ERP averages were categorized according to each distractor category (negative, neutral, and positive).
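
The epoching and baseline-correction steps described above can be illustrated with a short NumPy sketch. It is not the Brain Vision Analyzer procedure used in the study; the array layout, the function name, and the assumption that onsets are given as sample indices lying far enough from the recording edges are all illustrative.

```python
import numpy as np

SFREQ_HZ = 250    # sampling rate reported in the text
PRE_MS = 200      # pre-stimulus baseline
EPOCH_MS = 1000   # total epoch length, including the baseline


def epoch_and_baseline(eeg, onsets, sfreq=SFREQ_HZ,
                       pre_ms=PRE_MS, epoch_ms=EPOCH_MS):
    """Cut epochs around stimulus onsets and subtract the baseline mean.

    eeg    : array of shape (n_channels, n_samples), continuous recording
    onsets : iterable of stimulus-onset sample indices
    returns: array of shape (n_trials, n_channels, n_epoch_samples)
    """
    pre = int(round(pre_ms * sfreq / 1000))     # 50 samples at 250 Hz
    n = int(round(epoch_ms * sfreq / 1000))     # 250 samples at 250 Hz
    epochs = np.stack([eeg[:, o - pre:o - pre + n] for o in onsets])
    # Mean of the pre-stimulus interval, per trial and channel.
    baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)
    return epochs - baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_eeg = rng.normal(size=(60, 5000)) + 5.0   # 60 channels with DC offset
    ep = epoch_and_baseline(fake_eeg, onsets=[500, 1200, 3000])
    print(ep.shape)
```

After this step, averaging the retained epochs within each distractor category yields the ERP for that category.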

**Table 1 | Means and SD of valence (1 highly unpleasant, 9 highly pleasant), arousal (1 highly calming, 9 highly arousing), concreteness (1 highly abstract, 9 highly concrete), frequency of use (per one million), number of syllables, and number of letters.**


d.f. = 2,98; n.s., non-significant, \*p < 0.001.

#### **DATA ANALYSIS**

#### *Behavioral analysis*

Mean reaction times (RTs) of correct responses and error rates (omissions and commissions) were analyzed. Repeated-measures ANOVAs on each measure were carried out with respect to Distractor type (three levels: negative, neutral, and positive). The Greenhouse–Geisser epsilon correction was applied when the assumption of sphericity was violated. *Post hoc* pairwise comparisons were two-tailed, paired-samples *t*-tests with Bonferroni correction for multiple comparisons. As a measure of effect size, partial eta-squared (η<sup>2</sup><sub>p</sub>) is reported for significant effects.
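For readers less familiar with these quantities, the following sketch computes the *F* ratio, partial eta-squared, and the Greenhouse–Geisser epsilon for a one-way repeated-measures design. The analyses in the study were run in SPSS; this NumPy function, its name, and the toy data are illustrative assumptions only, and no *p*-value is computed.

```python
import numpy as np


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA for data of shape (n_subjects, k_levels)."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()   # condition effect
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()   # subject effect
    ss_error = ((data - grand) ** 2).sum() - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    eta_p2 = ss_cond / (ss_cond + ss_error)                  # partial eta-squared

    # Greenhouse-Geisser epsilon from the double-centred covariance matrix
    cov = np.cov(data, rowvar=False)
    centred = (cov - cov.mean(axis=0, keepdims=True)
               - cov.mean(axis=1, keepdims=True) + cov.mean())
    epsilon = np.trace(centred) ** 2 / ((k - 1) * (centred ** 2).sum())
    return {"F": f_value, "df": (df_cond, df_error),
            "eta_p2": eta_p2, "epsilon": epsilon}


# Toy data: 3 subjects x 3 conditions (values invented for illustration).
toy = [[1, 2, 6],
       [2, 4, 6],
       [3, 3, 9]]
result = rm_anova_oneway(toy)
```

When sphericity is violated, both degrees of freedom are multiplied by epsilon before the *p*-value is looked up; with three levels, epsilon ranges from 0.5 to 1.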

#### *ERP analysis*

Detection and quantification of ERP components was carried out through covariance-matrix-based temporal principal component analysis (tPCA). All analyses were performed using IBM SPSS v20. The main advantage of tPCA over traditional procedures based on visual inspection of recordings and on 'temporal windows of interest' is that it presents each ERP component separately and with its 'clean' shape, extracting and quantifying it free of the influences of adjacent or subjacent components (Chapman and McCrary, 1995; Dien and Frishkoff, 2005). Indeed, the waveform recorded at a site on the head over a period of several hundred milliseconds represents a complex superposition of different overlapping electrical potentials. Such recordings can stymie visual inspection. In brief, tPCA computes the covariance between all ERP time points, which tends to be high between those time points involved in the same component, and low between those belonging to different components. The solution is therefore a set of independent factors made up of highly covarying time points, which ideally correspond to ERP components. The *temporal factor score*, the tPCA-derived parameter by which extracted temporal factors (TFs) are quantified, is linearly related to amplitude. In the present study, the decision on the number of components to select was based on the scree test (Cattell, 1966). Extracted components were submitted to Promax rotation, as recommended (Dien, 2010, 2012).
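The core of a covariance-based tPCA can be sketched in a few lines. This is a simplified illustration, not the SPSS procedure used in the study: it assumes a data matrix with one row per observation (e.g., each subject, condition, and electrode) and one column per time point, it omits the scree test and the Promax rotation, and the function name and simulated waveforms are hypothetical.

```python
import numpy as np


def temporal_pca(X, n_factors):
    """Unrotated covariance-based PCA over time points.

    X : array of shape (n_observations, n_timepoints)
    Returns loadings (n_timepoints, n_factors), standardized factor scores
    (n_observations, n_factors), and the proportion of variance explained.
    """
    Xc = X - X.mean(axis=0)                 # centre each time point
    cov = np.cov(Xc, rowvar=False)          # covariance between time points
    eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec * np.sqrt(eigval)     # time course of each factor
    scores = Xc @ eigvec / np.sqrt(eigval)  # unit-variance factor scores
    explained = eigval / np.trace(cov)
    return loadings, scores, explained


# Simulated ERPs: two temporally separate components plus noise.
rng = np.random.default_rng(0)
t = np.arange(50)
early = np.exp(-0.5 * ((t - 15) / 3) ** 2)
late = np.exp(-0.5 * ((t - 35) / 3) ** 2)
X = (rng.normal(size=(200, 1)) * early
     + rng.normal(size=(200, 1)) * late
     + 0.05 * rng.normal(size=(200, 50)))
loadings, scores, explained = temporal_pca(X, n_factors=2)
```

Unrotated factors may blend components of similar variance, which is one reason a rotation such as Promax is applied in practice before the factors are interpreted.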

Given that signal overlap may also occur in the spatial domain, we performed subsequent spatial PCAs on every temporal factor. At any given time point, several neural processes (and hence, several electrical signals) may concur, and the recording at any scalp location at that moment is the electrical balance of these different neural processes. While temporal PCA "separates" ERP components along time, spatial PCA (sPCA) separates ERP components along space, each spatial factor ideally reflecting one of the concurrent neural processes underlying each temporal factor. Additionally, sPCA provides a reliable division of the scalp into different recording regions, an advisable strategy prior to statistical contrasts, since ERP components frequently behave differently in some scalp areas than in others (e.g., they present opposite polarity or react differently to experimental manipulations). This method of analysis is reference-independent since the configuration of the scalp topography is independent of the reference electrode position (Pourtois et al., 2008). Basically, each region or spatial factor is formed with the scalp points where recordings tend to covary. As a result, the shape of the sPCA-configured regions is functionally based, and scarcely resembles the shape of the geometrically configured regions defined by traditional procedures. Moreover, each spatial factor can be quantified through the *spatial factor score*, a single parameter that reflects the amplitude of the whole spatial factor. Also in this case, the decision on the number of factors to select was based on the scree test, and extracted factors were submitted to Promax rotation.

Finally, repeated-measures ANOVAs on temporospatial factor scores were carried out with respect to Distractor type (three levels: negative, neutral, and positive). The Greenhouse–Geisser epsilon correction was applied when the assumption of sphericity was violated, and *post hoc* pairwise comparisons were two-tailed, paired-samples *t*-tests with Bonferroni correction for multiple comparisons. Effect sizes are again reported as partial eta-squared (η<sup>2</sup><sub>p</sub>).
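As a brief illustration of the Bonferroni correction, the sketch below multiplies each uncorrected *p*-value by the number of comparisons and caps the result at 1. This is the usual way adjusted *p*-values are reported, and it explains how an adjusted value of exactly 1 can arise. The function name and the input values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three pairwise comparisons among negative, neutral, and positive
# distractors (uncorrected p-values invented for illustration).
adjusted = bonferroni([0.01, 0.02, 0.5])
```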

#### *Source localization analysis*

In order to three-dimensionally locate the cortical regions that were sensitive to the experimental effects observed at the scalp level, exact low-resolution brain electromagnetic tomography (eLORETA; Pascual-Marqui, 2007; Pascual-Marqui et al., 2011) was applied to relevant temporal factor scores. eLORETA is a 3D, discrete linear solution for the EEG inverse problem, which provides inverse solutions that are reference-independent (Pascual-Marqui et al., 2011; Michel and Murray, 2012). Although, in general, solutions provided by EEG-based source-location algorithms should be interpreted with caution due to their potential error margins, LORETA solutions have shown significant correspondence with those provided by hemodynamic procedures in the same tasks (Dierks et al., 2000; Vitacco et al., 2002; Mulert et al., 2004). Moreover, the use of tPCA-derived factor scores instead of direct voltages (which leads to more accurate source-localization analyses: Dien et al., 2003, 2004; Carretié et al., 2004) contributes to reducing this error margin. In its current version, eLORETA computes the current density at each of 6239 voxels mainly located in the cortical gray matter of the digitized Montreal Neurological Institute (MNI) standard brain.

Specifically, three-dimensional current–density estimates for relevant temporal factor scores were computed for each participant and each experimental condition. Subsequently, the voxel-based whole-brain eLORETA-images (6239 voxels) were compared between conditions using the statistical non-parametric mapping (SnPM) tool, as implemented in the sLORETA/eLORETA software package. As explained by Nichols and Holmes (2002), the non-parametric methodology inherently avoids multiple comparison-derived problems and does not require any assumption of normality. Voxels that showed significant differences between conditions (log-F-ratio statistic, two-tailed corrected *p* < 0.05) were located in anatomical regions and Brodmann areas (BAs).
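The logic by which a permutation test controls for multiple comparisons can be sketched with a maximum-statistic approach. The sketch is a simplified stand-in for the SnPM tool of the sLORETA/eLORETA package: it uses a paired *t* statistic and random sign flips rather than the log-F-ratio statistic reported in the text, and the function name, voxel count, and simulated data are all hypothetical.

```python
import numpy as np


def max_stat_permutation(a, b, n_perm=1000, seed=0):
    """Paired sign-flip permutation test with max-statistic correction.

    a, b : arrays of shape (n_subjects, n_voxels), one per condition
    Returns observed |t| per voxel and family-wise corrected p-values.
    """
    rng = np.random.default_rng(seed)
    diff = a - b
    n = diff.shape[0]

    def t_stat(d):
        return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))

    observed = np.abs(t_stat(diff))
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # Under the null hypothesis, condition labels are exchangeable
        # within each subject, i.e., the sign of each difference is arbitrary.
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        null_max[i] = np.abs(t_stat(diff * signs)).max()
    # Compare every voxel against the distribution of the largest statistic
    exceed = (null_max[:, None] >= observed[None, :]).sum(axis=0)
    p_corrected = (1 + exceed) / (n_perm + 1)
    return observed, p_corrected


# Simulated data: 30 subjects, 100 voxels, a true effect in voxel 0 only.
rng = np.random.default_rng(42)
cond_a = rng.normal(size=(30, 100))
cond_b = rng.normal(size=(30, 100))
cond_a[:, 0] += 3.0
observed, p_corrected = max_stat_permutation(cond_a, cond_b)
```

Because each voxel is compared against the null distribution of the maximum across all voxels, the resulting *p*-values are already corrected for the number of voxels tested.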

#### **RESULTS**

#### **BEHAVIORAL RESULTS**

Average values for RTs and for omission and commission error rates for each emotional word category are shown in **Table 2**. Three repeated-measures ANOVAs were conducted on RTs, omission and commission error rates, including Distractor type as a factor. Although RTs for positive distractor trials were slower than for the rest of trials, statistical analyses did not reach significance [*F*(2,58) = 0.883, *p* = 0.372]. Also, no significant results were found for error rates [*F*(2,58) = 1.715, *p* = 0.191, for omissions, and *F*(2,58) = 1.359, *p* = 0.265 for commissions].

#### **ERP RESULTS**

**Figure 2** shows a selection of grand averages once the baseline value (prestimulus recording) was subtracted from each ERP. As described later, experimental effects were observed at around 120 ms (N1) over anterior electrode positions (see F1 and F2 locations). **Figure 3** represents the topographic distribution of this effect.

As a consequence of the application of the tPCA, several TFs were extracted from the ERPs (see **Figure 4**). Factor peak-latency and topography characteristics revealed TF8 as the component associated with both posterior P1 and anterior N1, which typically overlap in time (Di Russo et al., 2003). Indeed, tPCA revealed that the two components were evoked at the same latency (peaking at 120 ms). However, differential characteristics of the posterior P1 and the anterior N1 were patent in both polarity and scalp topography (as described later, see also **Figure 3**). Furthermore, TF7 (peaking at 140 ms), TF5 (peaking at 192 ms), TF6 (peaking at 270 ms), and TF2 (peaking at 380 ms) were related to posterior N1, P2, EPN, and N400 components, respectively. Finally, the LPC was decomposed into two centroparietal factors: TF9, peaking at 525 ms, and TF1, peaking at 730 ms.

As can be observed in **Table 3**, the sPCA decomposed TF8 into one anteriorly distributed factor (corresponding to the anterior N1) and two factors with posterior distributions (corresponding to the P1). Also, sPCA extracted three spatial factors for each of the remaining TFs. Therefore, the temporospatial PCA yielded a total of 24 factor combinations (three spatial factors extracted for each of 8 TFs).

Repeated-measures ANOVAs on these temporospatial factors with respect to Distractor type (three levels: negative, neutral, and positive) were carried out as previously described. **Table 3** provides the statistical details of these analyses. As can be appreciated, the effect of Distractor type was only significant for the anterior N1. *Post hoc* tests with Bonferroni correction for multiple comparisons showed enhanced anterior N1 amplitudes for positive compared to neutral and negative distractor words (*p*s < 0.05). The anterior N1 amplitude did not differ between neutral and negative distractor words (*p* = 1). As **Table 3** shows, no significant effects were found on other ERP components.

#### **SOURCE LOCALIZATION RESULTS**

The last analytic step consisted of three-dimensionally localizing the cortical regions that were responsible for the differences observed in the anterior N1. To achieve this, N1 temporal factor scores of each subject, electrode, and condition were submitted to eLORETA. Then, the voxel-based whole brain eLORETA-images (6239 voxels) were compared between conditions using the SnPM approach. N1-related activation in response to positive distractor words was associated with enhanced activity compared to negative distractor words in several voxels. As illustrated in **Figure 5**, these voxels were located in the orbitofrontal cortex (OFC; peak MNI coordinates: *X* = 45, *Y* = 55, *Z* = −5; BAs 11/10/47). Activation differences between positive and neutral distractor words did not reach significance. Consistent with results from scalp ERPs, no activation differences were found between neutral and negative distractor words in any voxel.

#### **DISCUSSION**

In the current study we investigated the processing of emotional distractor words while participants performed a digit categorization task. In line with previous studies using CDTD tasks, we did not observe any sign of attentional capture by emotional distractor words in behavioral measures. Along the same lines, Trauer et al. (2012) found no behavioral evidence that emotional word content interfered with a perceptual foreground task. Weak effects were found in other studies. In particular, delayed reaction times for emotional with respect to neutral distractor words have been reported only after the first occurrence of a negative word (Harris and Pashler, 2004), or for sexually explicit words (Aquino and Arnell, 2007). Since behavioral correlates of attentional capture by task-irrelevant emotional pictures and faces are usually observed (e.g., Vuilleumier et al., 2001; MacNamara and Hajcak, 2009; Calvo and Nummenmaa, 2011; Carretié et al., 2013b; but see Holmes et al., 2006; Carretié et al., 2013a), our data fit well with the idea that word distractors may be able to interrupt ongoing processing to a lesser extent than pictorial distractors (Carretié, 2014). Nonetheless, it should be remarked that behavioral measures are the final single output of a large set of neural processes that may not always be convergent. Notably, one advantage of using ERPs is that the components can be examined in the absence of an overt behavioral response (Luck, 2005). Indeed, current results corroborate the greater sensitivity of ERPs to the effects of certain experimental manipulations. In this respect, neural results clearly showed that the emotional content of the distractor words modulated processing resources devoted to a primary ongoing task, as suggested both by scalp and source-location data. In particular, positive distractor nouns compared to both neutral and negative distractor words were associated with enhanced amplitudes in an anteriorly distributed negative component peaking around 120 ms. Activity in the OFC was identified as the neural origin of this scalp-recorded component. Latency, amplitude and source-location analyses suggest that this component would be associated with attentional capture by positive distractor words. These results are discussed in detail below.

**Table 2 | Means and SD (in parentheses) of reaction times (RTs) and error rates (commission/omission) for each word category (positive, negative, and neutral).**

As indicated in the Results section, a wave peaking around 120 ms after trial presentation was subdivided into two components by spatial principal component analyses. A posterior P1 deflection showed no amplitude differences between neutral and emotional distractors. Interestingly, however, positive distractor words elicited larger anterior N1 amplitudes than both negative and neutral task-irrelevant words. Similar modulations in a frontal N1 component for emotional task-irrelevant pictures have been recently found when participants' attention was engaged in a counting task (Zhang et al., 2014). Prior studies linked this component to involuntary orientation of attention to relevant stimuli (Luck et al., 1993; Di Russo et al., 2003). Specifically, it has been suggested that the anterior N1 may reflect a prefrontal attentional mechanism that regulates sensory processing in visual cortices (Barceló et al., 2000; Pérez-Edgar and Fox, 2003).

TF, temporal factor; SF, spatial factor; d.f., degrees of freedom.

The neural origin of our anterior N1, which seems to be generated in the OFC (BAs 11/10/47), argues in favor of the involvement of this region in attentional capture by positive distractor words. The OFC has been critically implicated in both the modulation of emotion and attentional control (Vuilleumier, 2005; Domínguez-Borrás and Vuilleumier, 2013). Neuroanatomical studies indicate that the OFC is reciprocally connected with the amygdala and extensive areas of prefrontal, motor and sensory cortices (Pandya and Yeterian, 1996; Cavada et al., 2000; Rolls, 2000). Specifically, it has been suggested that early activation of the OFC would modulate sensory cortices via direct feedback or indirect projections to attention and object-recognition systems in prefrontal, parietal and temporal cortices (Amaral et al., 2003; Vuilleumier, 2005). In particular, Bar et al. (2006) reported that object recognition elicited activity in the OFC around 130 ms, that is, 50 ms before it developed in recognition-related fusiform regions. Also, activations to emotional cues in this prefrontal region have been reported around 120 ms, using intracranial (Kawasaki et al., 2001) and scalp recordings (Pourtois et al., 2004). In line with these findings, a recent proposal postulates that the medial part of the OFC is involved in the generation of affective predictions that initiate appropriate reactions to visual information, whereas the lateral regions of the OFC seem to be implicated in computing and sending predictions about the identity of visual stimuli to the visual system (Chaumon et al., 2014). Interestingly, enhanced activity in the OFC while exogenous attention is directed to task-irrelevant emotional pictures and faces has been previously reported (Vuilleumier et al., 2001; Bishop et al., 2004; Zhang et al., 2014). Thus, our current finding provides additional evidence supporting the implication of the OFC in exogenous attention to emotional verbal distractors, which may be triggered in part by the activation of predictive mechanisms involving the processing of affective and identity-related information.

The selective enhancement of detection sensitivity to positive distractor words deserves further consideration. This finding agrees with the results of a growing body of research indicating that the OFC is a key structure in the neural circuitry of positive emotions and the processing of reward (Rolls, 2000; Burgdorf and Panksepp, 2006). Along these lines, activation of the OFC has been found when mothers viewed pictures of their own compared to unfamiliar children (Nitschke et al., 2004), when participants received financial reward in a gambling task (Elliott et al., 2000), or when pleasant taste stimuli were delivered to participants (O'Doherty et al., 2001). Also, patients with OFC lesions responded faster to targets subsequent to positive distractors in a lateralized visual discrimination task (Hartikainen et al., 2012). Crucially, the results of an fMRI study by Lewis et al. (2007) showed a selective role of the OFC in the processing of valence during word processing. Thus, our data suggest that activation in the OFC underlies selective attention to positive word distractors in CDTD tasks. Furthermore, the present results can be interpreted in terms of the positivity offset. This represents a tendency of the positive motivational system to respond more than the negative emotional system to comparably low levels of evaluative input, which seems to be the case for the processing of word distractors (Cacioppo et al., 1997; Ito and Cacioppo, 2005). Indeed, there is recent evidence indicating that as early as the 80–120 ms time interval, the processing of positive and negative words implicates neural activity in different networks. Specifically, the processing of positive words was associated with activations in language and attention-related regions in left temporal, frontal and visual association cortices, whereas negative words activated the anterior cingulate cortex (Keuper et al., 2013). These effects were interpreted in terms of an "emotional tagging" of word forms associated with different processing strategies developed during language acquisition. These strategies include enhanced lexical processing of positive words and a fast language-independent alert response to negative words (Keuper et al., 2013). In agreement with this view, several studies reported valence-dependent effects at different processing stages that show facilitated lexical processing for positive words with both behavioral and ERP measures (Kissler and Koessler, 2011; Kuchinke and Lux, 2012; Kissler and Herbert, 2013). This processing advantage has been linked to the orbitofrontal reward system (Kuchinke and Lux, 2012).

Resembling current findings, increased attentional capture by positive distractor words compared to negative and neutral task-irrelevant words has been observed in a prior study with a similar digit categorization task (Aquino and Arnell, 2007), whereas effects for negative distractor words have been reported when participants carried out a perceptual primary task (Trauer et al., 2012). Following the proposal made by Keuper et al. (2013), it may be speculated that processing requirements imposed by the primary task may determine valence-dependent effects elicited by distractor words. In this sense, the processing of positive distractor words would be more evident in tasks demanding conceptual analysis to some extent (as in the current and in Aquino and Arnell's studies), given the greater implication of lexico-semantic processing in digit categorization tasks (see below). In contrast, activity associated with the processing of negative distractor words would be preferentially observed with primary tasks that do not require conceptual processing (e.g., the perceptual task used by Trauer et al., 2012), since the processing of negative content in words seems to rely on language-independent mechanisms according to the proposal by Keuper et al. (2013).

On another level, our results complement prior findings with CDTD tasks in several aspects. They suggest that ERP modulations triggered by task-irrelevant emotional words may emerge at different processing stages. On the one hand, in convergence with the results of Trauer et al. (2012) with an SSVEP paradigm, we did not observe that emotional compared to neutral distractor words enhanced sensory processing in visual areas. This claim seems to be supported by the lack of amplitude differences in the posterior P1, which is mainly elicited in visual cortices (Di Russo et al., 2003, 2005). On the other hand, we only found modulations at early processing stages, which disagrees with the effects during meaning derivation – in P2 and N400 components – reported in Trauer et al.'s (2012) study. Tentatively, these discrepant results may again be related to the functionally different processes involved in the primary task in both studies (see above). In the experiment by Trauer et al. (2012), participants attended an array of squares in order to detect brief coherent movements in one direction, a task that mainly implies early perceptual processing. In contrast, we used a digit categorization task that relies on numerical skills that require more elaborated conceptual knowledge at the stage of meaning evaluation (Delazer, 2003). Thus, it could be speculated that the emotional content of word distractors interrupted ongoing task performance by capturing those processing resources that were involved to a lesser extent in the processing of target stimuli. The foreground task in our experiment may also account for the lack of effects in other components such as the EPN or the LPC. In this regard, although similar EPN modulations were found in tasks placing different processing demands, such as structural analysis or lexico-semantic processing (e.g., Kissler et al., 2009; Schacht and Sommer, 2009b), there is some evidence indicating that the EPN is more likely to be elicited when emotional words are deeply processed (e.g., Hinojosa et al., 2010b; Rellecke et al., 2011; Bayer et al., 2012). Similarly, task effects have been found to modulate the amplitude of the LPC (e.g., Fischler and Bradley, 2006; Schacht and Sommer, 2009b). Therefore, emotional modulations in these components seem to be more evident as the level of attention to valence increases, although this idea requires further confirmation. Nonetheless, the results of a recent meta-analysis (Carretié, 2014) emphasized the nature of the primary task, as well as the characteristics of the distractors and individual differences, as a modulatory factor mediating attentional capture by emotional task-irrelevant stimuli (see also Mogg and Bradley, 1998).

The anterior N1 effects indicate that the processing of positive content in distractor words may operate at very early stages of processing, as proposed by automatic vigilance models (Pratto and John, 1991) or the affective-primacy hypothesis (Zajonc, 1980; Delaney-Busch and Kuperberg, 2013), at least when the primary task implicates conceptual processing to some extent. However, the early latency of our effects raises the question of the mechanism underlying such a fast activation of emotional meaning from written words. Current findings suggest that some of the processes involved in word recognition become evident around 100 ms (Hauk et al., 2006). Indeed, ERP evidence has been reported suggesting a rapid access to the affective content of words as early as 80 ms using the semantic differential technique (Skrandies, 1998). Also, the finding of specific ERP effects for positive words between 100 and 150 ms with lexical decision (Bayer et al., 2012) or picture naming tasks (Hinojosa et al., 2010a) suggests that the analysis of emotional meaning has already started at 100 ms after word onset. An alternative explanation, however, might be outlined based on the proposal made by Bayer et al. (2012; see also Kissler et al., 2009, for similar arguments). These authors suggested that instead of fast semantic processing, non-linguistic mechanisms may contribute to early emotion effects in words. They argued that early emotional responses to words may originate from associative learning that does not depend on the semantic system, given the results of previous studies that reported very early ERP modulations for non-linguistic stimuli associated with threat-related pictures (Stolarova et al., 2006) and reward (Schacht et al., 2012). Additional support for this view, with verbal stimuli, comes from recent evidence showing that the activity elicited by emotionally and neutrally conditioned pseudowords differed in a negative component between 80 and 120 ms (Fritsch and Kuchinke, 2013). Interestingly, the OFC seems to be critically involved in rapid stimulus-reinforcement association learning for positive reinforcers (Rolls, 2000; Gottfried et al., 2003). This leaves open the possibility that associative learning mechanisms that are non-linguistic in nature underlie anterior N1 effects to positive distractor words.

The current study has several potential limitations. First, the absence of jitter between the fixation cross and the stimulus onset may have increased early attentional processes as a result of the expectation generated when the cross appeared on the screen. Also, the presentation of the blank screen following rather short stimulus durations (150 ms) may have interfered with, and thus interrupted, subsequent stimulus and attentional processing. Future research can address these issues by randomly varying the time between the fixation cross and the stimulus and by directly comparing the processing of stimuli with different presentation durations. Finally, although the main focus of the current study was on early latency components (N1 and P1), the relatively high number of temporospatial factors that we explored may have increased the probability of finding significant effects.

In sum, several conclusions can be derived from current results. First, complementing previous findings with pictorial stimuli in CDTD tasks, our data show that salient but task-irrelevant words disrupt processes involved in a primary digit categorization task. Second, positive distractor words are able to engage automatic attentional resources at early stages of the processing, as reflected by modulations in an anterior N1 component. Third, activation of the OFC underlies exogenous attentional mechanisms devoted to the processing of task-irrelevant emotional words. Finally, the fact that attentional capture was selectively triggered by positive words emphasizes the involvement of this brain structure in the processing of positive emotion.

#### **AUTHOR CONTRIBUTIONS**

Conception and design of the work: José A. Hinojosa and Luis Carretié. Acquisition, analysis, or interpretation of data for the work: José A. Hinojosa, Francisco Mercado, Jacobo Albert, Paloma Barjola, Irene Peláez, Cristina Villalba-García and Luis Carretié. Drafting the work or revising it critically for important intellectual content: José A. Hinojosa, Francisco Mercado, Jacobo Albert, Paloma Barjola, Irene Peláez, Cristina Villalba-García and Luis Carretié. Final approval of the version to be published: José A. Hinojosa, Francisco Mercado, Jacobo Albert, Paloma Barjola, Irene Peláez, Cristina Villalba-García and Luis Carretié. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: José A. Hinojosa, Francisco Mercado, Jacobo Albert, Paloma Barjola, Irene Peláez, Cristina Villalba-García and Luis Carretié.

#### **ACKNOWLEDGMENTS**

This work was supported by grants PSI2012-37535 and PSI2011-26314 from the Ministerio de Economía y Competitividad (MINECO) of Spain, and grant PI13/01759 from the Institute of Health Carlos III (ISCIII) of Spain.

#### **REFERENCES**


the neural network for social cognition?. *Neuropsychologia* 41, 517–522. doi: 10.1016/S0028-3932(02)00310-X


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 September 2014; accepted: 07 January 2015; published online: 26 January 2015.*

*Citation: Hinojosa JA, Mercado F, Albert J, Barjola P, Peláez I, Villalba-García C and Carretié L (2015) Neural correlates of an early attentional capture by positive distractor words. Front. Psychol. 6:24. doi: 10.3389/fpsyg.2015.00024*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2015 Hinojosa, Mercado, Albert, Barjola, Peláez, Villalba-García and Carretié. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Implicit and Explicit Attention to Pictures and Words: An fMRI-Study of Concurrent Emotional Stimulus Processing

Tobias Flaisch<sup>1</sup> \*, Martin Imhof <sup>1</sup> , Ralf Schmälzle<sup>1</sup> , Klaus-Ulrich Wentz <sup>2</sup> , Bernd Ibach<sup>3</sup> and Harald T. Schupp<sup>1</sup>

<sup>1</sup> Department of Psychology, University of Konstanz, Konstanz, Germany, <sup>2</sup> Department of Radiology, Kantonsspital Münsterlingen, Münsterlingen, Switzerland, <sup>3</sup> Department of Psychiatry, Psychiatrische Dienste Thurgau, Münsterlingen, Switzerland

Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

Francesca M. M. Citron, Lancaster University, UK Sebastian Schindler, University of Bielefeld, Germany

> \*Correspondence: Tobias Flaisch

tobias.flaisch@uni-konstanz.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 31 January 2015 Accepted: 17 November 2015 Published: 18 December 2015

#### Citation:

Flaisch T, Imhof M, Schmälzle R, Wentz K-U, Ibach B and Schupp HT (2015) Implicit and Explicit Attention to Pictures and Words: An fMRI-Study of Concurrent Emotional Stimulus Processing. Front. Psychol. 6:1861. doi: 10.3389/fpsyg.2015.01861

The present study utilized functional magnetic resonance imaging (fMRI) to examine the neural processing of concurrently presented emotional stimuli under varying explicit and implicit attention demands. Specifically, in separate trials, participants indicated the category of either pictures or words. The words were placed over the center of the pictures and the picture-word compound-stimuli were presented for 1500 ms in a rapid event-related design. The results reveal pronounced main effects of task and emotion: the picture categorization task prompted strong activations in visual, parietal, temporal, frontal, and subcortical regions; the word categorization task evoked increased activation only in left extrastriate cortex. Furthermore, beyond replicating key findings regarding emotional picture and word processing, the results point to a dissociation of semantic-affective and sensory-perceptual processes for words: while emotional words engaged semantic-affective networks of the left hemisphere regardless of task, the increased activity in left extrastriate cortex associated with explicitly attending to words was diminished when the word was overlaid over an erotic image. Finally, we observed a significant interaction between Picture Category and Task within dorsal visual-associative regions, inferior parietal, and dorsolateral, and medial prefrontal cortices: during the word categorization task, activation was increased in these regions when the words were overlaid over erotic as compared to romantic pictures. During the picture categorization task, activity in these areas was relatively decreased when categorizing erotic as compared to romantic pictures. Thus, the emotional intensity of the pictures strongly affected brain regions devoted to the control of task-related word or picture processing. These findings are discussed with respect to the interplay of obligatory stimulus processing with task-related attentional control mechanisms.

Keywords: emotion, language, pictures, attention, perception, fMRI

# INTRODUCTION

Multiple processes determine the regulation of selective attention processes. On the one hand, selective attention can be regulated voluntarily (i.e., "explicitly") if attention is focused on goal-relevant stimuli in the environment. On the other hand, inherent features of a stimulus may also regulate attention processes (i.e., "implicitly") such as when novel stimuli appear suddenly in the environment or when pictures grab attention due to the emotional significance conveyed by the image<sup>1</sup>. A large array of studies was conducted to examine the interaction among implicit and explicit processes in the regulation of selective attention processes. Interaction effects were detailed with respect to implicit emotion and explicit goal relevance in conditions of cooperation and competition for processing resources, as well as in conditions of implicit emotion significance in different sensory modalities. To extend these lines of research, the present study investigated effects of both cooperation and competition among emotionally arousing and neutral stimuli by directing the task focus to either words or the scene of the image presented concurrently in a compound stimulus.

# Selective Attention: Implicit and Explicit Processes

Explicitly directed attention toward visual features, objects, and higher-order semantic categories revealed accentuated activations in occipital and inferior temporal cortical regions preferentially engaged by specific stimulus attributes such as color, stimulus orientation, or object category (Kastner and Ungerleider, 2000; Vuilleumier, 2005; Jehee et al., 2011). Additionally, the activity in these regions is also modulated by explicit spatial attention. Specifically, directing attention toward a lateralized stimulus in either visual hemifield enhances activity in corresponding areas contralaterally to the location of the stimulus (Heinze et al., 1994; Mangun et al., 1998; Kastner and Ungerleider, 2000). Thus, explicit attention toward visual stimuli regulates selective attention processes in sensory-perceptual brain regions.

A similar pattern of findings was seen in studies examining the implicit regulation of attention processes by varying emotional arousal of the stimuli. Specifically, a large array of studies consistently demonstrated that the processing of emotionally arousing (pleasant and unpleasant) as compared to nonemotional picture stimuli leads to increased activations in extended regions of the visual system including the extrastriate visual cortex and widespread regions of the inferior temporal cortex (Lang et al., 1998; Junghöfer et al., 2005; Sabatinelli et al., 2005; Flaisch et al., 2009). Of note, in these studies those effects are also reliably observed when participants view pictures passively and when the task does not require them to actively process the stimulus' emotional connotation. In sum, explicit task-relevancy, as well as the emotional significance of pictures regulate attention processes in brain regions devoted to visual stimulus processing.

Beyond pictures, there is also robust evidence for the preferential processing of emotional words (reviewed in Citron, 2012). Specifically, emotional (positive and negative) as compared to neutral words (i.e., nouns and adjectives) elicited increased activations in the inferior and middle frontal gyrus, middle temporal gyrus, dorso-medial prefrontal cortex, and inferior parietal lobe (Cato et al., 2004; Kensinger and Schacter, 2006; Herbert et al., 2009; Hoffmann et al., 2015). Furthermore, these effects are often obtained most reliably in left-hemispheric regions (Kotz and Paulmann, 2011). Thus, emotional words regulate attention processes in a brain network devoted to semantic processing with the left-hemispheric focus being consistent with a large array of studies examining non-emotional language processing (Price, 2012). As with pictures, visually presented emotional words also engage extrastriate visual areas (Kensinger and Schacter, 2006). In one instance this occurred exclusively in the left hemisphere (Herbert et al., 2009), suggesting an overlap in neural regions for the visual processing of emotional pictures and words.

However, the mechanism of preferential stimulus processing seems to be at least partially different for implicit emotional and explicit task-related attention processes. In many studies, the amplified processing of emotional pictures is accompanied by activation increases in limbic and para-limbic regions, i.e., the amygdala, orbitofrontal cortex, cingulate gyrus, and dorso-medial prefrontal cortical regions (Junghöfer et al., 2005; Sabatinelli et al., 2011; Lindquist et al., 2012). Similarly, limbic structures also respond to the emotionality of words, most prominently the amygdala (Hamann and Mao, 2002; Cato et al., 2004; Kensinger and Schacter, 2006; Herbert et al., 2009; Kanske and Kotz, 2011; Straube et al., 2011; Hoffmann et al., 2015). While the specific outcome of an individual study may vary, possibly due to differences in experimental design, used stimuli, or technical constraints, recent meta-analyses largely confirmed the involvement of these regions (Sabatinelli et al., 2011; Lindquist et al., 2012). On the other hand, explicit attention studies usually reveal the activation of distinct neural structures which are thought to regulate selective attention processes. Specifically, the regulation of attention has been associated with activity in frontal cortical regions, including frontal and supplementary eye fields as well as the dorso-lateral prefrontal cortex accompanied by regions of the superior and inferior parietal lobe (Desimone and Duncan, 1995; Kastner and Ungerleider, 2000; Corbetta et al., 2008). In sum, while implicit emotional and explicit task-related attention processes share common neural substrates such as enhanced sensory-perceptual processing, they are also characterized by distinct activations in limbic brain areas implicated in emotion processing and prefrontal regions associated with the volitional regulation of selective attention, respectively.

# Selective Attention: The Interaction Among Implicit and Explicit Processes

Studying the interaction of implicit emotional and explicit attention processes was spurred by examining the hypothesis that emotion processing occurs automatically. In a first study, Vuilleumier et al. (2001) presented multiple stimuli, i.e., faces (fearful and neutral) and houses aligned vertically and horizontally, and directed the participants' explicit attentional focus either toward the faces or the houses by asking them to decide whether the respective stimulus dimension showed the same pictures or not. Supporting the notion of automaticity, the selective processing of fearful and neutral faces was maintained in the amygdala and fusiform cortex even when the focus of attention was on the house stimuli. There were also neural regions responsive to fearful faces only when the stimuli were the focus of attention, e.g., anterior cingulate and orbitofrontal cortex. Thus, while selective emotion processing in some brain regions appears to depend on explicit task-focus, others seem to respond to stimulus emotionality automatically, i.e., even if they are processed outside the explicit focus of attention. However, the notion of automaticity has been challenged by subsequent studies. For instance, Pessoa et al. (2002) reported emotionally enhanced activity in the amygdala and visual cortex only if the emotional faces were actively attended. Since then, numerous studies have confirmed the finding that implicit attention to emotion competes with explicit attentional demands not only in the amygdala but also in other brain regions, consequently decreasing preferential emotion processing under conditions of heightened task-load and/or distraction (Blair et al., 2007; Hsu and Pessoa, 2007; Mitchell et al., 2007; Van Dillen et al., 2009; McRae et al., 2010; Yates et al., 2010; Kanske and Kotz, 2011).

<sup>1</sup>Here, the terms "explicit" and "implicit" are used to refer to selective attention processes which need not be specifically aimed at the emotional connotation of a stimulus. Accordingly, they should not be confused with the direct vs. indirect processing of the emotional connotation of a stimulus, for which the same labels are often used in the domain of emotion research.

In addition to studying the interaction of implicit emotion and explicit attention processes, multisensory studies enable examining the interaction of multiple implicit processes by concurrently presenting emotional stimuli in different sensory modalities (for recent reviews see Klasen et al., 2012; Gerdes et al., 2014). In such studies, participants view, e.g., emotional facial expressions while simultaneously listening to human voices with emotionally modulated prosody. The findings demonstrate the concurrent preferential processing of emotional stimuli in different modalities. Specifically, visual emotional stimuli elicited increased activity in primary and associative visual cortical regions and, simultaneously, auditory emotional stimuli enhanced activity in primary and higher-order auditory cortices (e.g., Ethofer et al., 2006). This finding suggests that the brain is able to process the concurrent call for preferential processing in parallel when the different sources of emotional significance demand resources from different processing channels. Accordingly, this is consistent with the notion put forward by Lavie (2005) maintaining that competition effects are primarily a function of competition for shared processing resources. On the other hand, this also implies that competition effects should be more pronounced when several concurrent sources of implicit emotional significance within the same sensory modality demand shared processing resources.

# The Present Study

The present study was designed to further detail the emotion-attention relationship by exploring how the brain processes concurrently presented visual emotional stimuli under varying explicit and implicit attention demands. Toward this end, the different lines of research, i.e., explicit attention and preferential processing of emotional words and pictures were brought together in the present study with the intent to capitalize on the finding that the preferential processing of emotional pictures and words is associated both with shared, as well as distinct brain regions. Specifically, while implicit emotional attention conveyed by either stimulus class is associated with enhanced perceptual processing, emotional words in particular are characterized by stimulus-specific activation increases in semantic brain regions associated with word processing. This allowed us to assess effects of implicit and explicit attention on stimulus-specific and shared brain regions by presenting the two stimulus classes simultaneously. A task varying between trials manipulated the focus of attention by asking participants to indicate either the pre-defined category of the pictures as "erotic" vs. "everyday," or of the words as "positive" or "neutral." Consequently, when attention was directed toward one class of stimuli, i.e., picture or word, the other stimuli were task-irrelevant. The main goals of the present study were to assess neural structures implicated in regulating explicit attention toward pictures and words and to examine the interaction of attention with emotional stimulus significance. A first set of hypotheses regarded main effects of emotional intensity and explicit task instruction. Based on previous findings on picture and word processing, it was predicted that emotionally arousing pictures and words are preferentially processed as compared to control stimuli in regions of the extended visual cortex for pictures and (left-hemispheric) regions of the semantic network for words. In the present study design, simple main effects of the task indicate the net effect between the attention focus toward and away from either the picture or word stimuli. The phrase "a picture is worth a thousand words" indicates that pictures are more salient than words. Accordingly, it was predicted that the demand of attention regulation is most pronounced for the picture categorization task. In addition, the overlap of task activations with regions sensitive to the emotional significance of stimuli would suggest that such effects are associated with selective attention, per se, rather than reflecting attention control regions which should only be observed as a function of the task manipulation. Finally, the need for attention control is presumed to vary for emotional and neutral stimuli serving as target and distracter stimuli. Specifically, diverting attention away from erotic stimuli seems most challenging, leading to an interaction of Picture Category by Task most likely observed in pre-frontal and parietal regions associated with attention regulation and showing greater activation for word categorization trials presented over task-irrelevant erotic pictures.

# MATERIALS AND METHODS

# Participants

Thirty-one volunteers (18 females; 1 left-handed) between 18 and 34 years of age (M = 21.8) with normal or corrected-to-normal vision participated in the study. Behavioral data for two participants were lost due to technical problems. Thus, data from 29 participants entered behavioral analysis. All participants were native German speakers. They were recruited at the University of Konstanz and received either course credits or €8 per hour. All participants provided informed consent to the study protocol, which was approved by the ethical review board of the University of Konstanz. All participants were healthy at the time of measurement and reported no history of neurological or psychiatric disorders.

# Stimulus Materials, Tasks, and Experimental Procedure

Word stimuli were selected from the Berlin Affective Word List Reloaded (BAWL-R; Võ et al., 2009) and included 22 emotionally positive and 22 neutral German nouns<sup>2</sup> referring to different categories of human experience. According to normative ratings, the categories differed in terms of valence (positive: M = 8.1, SD = 0.32; neutral: M = 5.0, SD = 0.32; p < 0.001) as well as arousal (positive: M = 5.7, SD = 1.13; neutral: M = 3.5, SD = 0.82; p < 0.001)<sup>3</sup>. The two word categories were matched for word length (3–6 letters), number of syllables (1–3), imageability, and word frequency (Võ et al., 2009).

Picture selection comprised 22 images of nude couples in erotic poses and 22 images of dressed couples in romantic situations. Previous research provides strong evidence that the activation of visual-associative as well as subcortical limbic structures is driven by the emotional arousal dimension and accentuated for erotic stimuli (Junghöfer et al., 2005; Sabatinelli et al., 2005). The "romantic" control category was selected to promote the comparability of the two picture categories in terms of picture composition and categorical homogeneity. Specifically, pictures did not differ in complexity, color, or number of people, i.e., all pictures were black and white and showed heterosexual dyads of socially interacting couples. Subjective ratings collected from an independent sample of 16 participants (8 females) revealed that the two picture categories did not differ regarding valence (self-assessment manikin; Bradley and Lang, 1994; erotic: M = 5.8, SD = 1.16; romantic: M = 6.3, SD = 1.13; n.s.), but that erotic images were rated as significantly more arousing (erotic: M = 6.3, SD = 0.99; romantic: M = 2.7, SD = 1.09; p < 0.001).

The compound stimulus was constructed by centrally overlaying the respective word, in gray-blue capital letters and Consolas font, over the respective erotic or romantic pictures (**Figure 1**). For each participant, the respective pairings of specific words and pictures were randomly assigned for each experimental cell of the Picture Category-by-Word Category interaction (i.e., erotic-positive, erotic-neutral, romantic-positive, romantic-neutral). This assignment was then kept constant across the Task factor, i.e., each participant viewed the same word-picture combinations twice, once under the word and once under the picture categorization instruction, respectively. This resulted in eight experimental cells overall. The stimuli were displayed on a back-projection screen and participants viewed them via a mirror attached to the head-coil. The pictures subtended a vertical visual angle of 16.1° and a horizontal visual angle of 21.5°; the words subtended vertically 3.9° and horizontally between 9.8° (3-letter word) and 19.6° (6-letter word). A white rectangle on a black background served as pre-stimulus response cue and its size was matched to the picture or word stimulus-dimension to signal an upcoming picture or word categorization trial.
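For readers who want to relate the reported visual angles to on-screen sizes, the standard geometric relation can be sketched as follows. The viewing distance is not stated in the text, so any distance passed to these functions is a hypothetical value, and the function names are illustrative only.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of a given size at a given distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse mapping: on-screen size (cm) that yields the requested visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Hypothetical viewing distance; the actual mirror-to-screen distance is not reported.
ASSUMED_DISTANCE_CM = 100.0
picture_height_cm = size_for_angle_cm(16.1, ASSUMED_DISTANCE_CM)
picture_width_cm = size_for_angle_cm(21.5, ASSUMED_DISTANCE_CM)
```

The two functions are exact inverses of each other, so converting a reported angle to a size and back returns the original angle for any assumed distance.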

To minimize effects of task difficulty and to avoid categorical ambiguity, participants were familiarized with the entire stimulus set and each stimulus' categorical assignment before scanning. Toward this end, participants were shown each exemplar of the two picture and two word categories in separate blocks and the distinct labels for the picture (erotic or everyday) and the word (positive or neutral) categories were introduced. The order of blocks during familiarization was randomized across participants. Afterwards, participants received the instructions and then worked through 12 practice trials for which random stimuli were drawn from the regular stimulus set. The task was to categorize either the background picture or the overlaid word as fast and as accurately as possible. To minimize effects of response conflict, each response alternative was assigned to a specific finger, respectively, and differing verbal descriptions for picture and word categories were deliberately chosen to avoid direct semantic mapping onto each other. Participants responded by pressing the corresponding right and left index and middle fingers, respectively. Hereby, picture category had to be categorized with one, and word category with the other hand, balanced across participants. "Erotic picture" and "positive word" as well as "everyday picture" and "neutral word" were always mapped onto either the index or the middle fingers, which was again balanced across participants.

Each trial began with the presentation of a pre-stimulus cue for 516 ms indicating the stimulus dimension to be categorized, i.e., word or picture, followed by the main compound stimulus for 1516 ms, and a black inter-trial-interval (ITI) whose duration was exponentially distributed with a mean of 2500 ms and a range of 2000–4000 ms (Dale, 1999; **Figure 1**). The main experiment comprised 352 trials (44 per experimental cell) which were presented consecutively in a single session lasting approximately 29 min. Hereby, order of trials was randomized and the same picture or word could not appear in succession.
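The trial timing described above can be sketched in a few lines. This is a minimal illustration, not the authors' presentation script: it assumes the ITI was drawn from an exponential distribution shifted to start at 2000 ms and truncated at 4000 ms by resampling, which is only one of several ways to obtain the stated range. The scale parameter and the function names are hypothetical.

```python
import random

CUE_MS = 516        # pre-stimulus cue duration (from the text)
STIM_MS = 1516      # compound stimulus duration (from the text)
ITI_MIN_MS = 2000   # lower bound of the ITI range (from the text)
ITI_MAX_MS = 4000   # upper bound of the ITI range (from the text)
ITI_SCALE_MS = 500  # assumed scale so that the shifted mean lies near 2500 ms
N_TRIALS = 352      # 8 experimental cells x 44 trials (from the text)


def sample_iti_ms(rng: random.Random) -> float:
    """Draw one ITI from a shifted exponential, resampling values above the maximum."""
    while True:
        iti = ITI_MIN_MS + rng.expovariate(1.0 / ITI_SCALE_MS)
        if iti <= ITI_MAX_MS:
            return iti


def session_duration_min(seed: int = 0) -> float:
    """Summed duration (minutes) of cue, stimulus, and ITI over one simulated session."""
    rng = random.Random(seed)
    total_ms = sum(CUE_MS + STIM_MS + sample_iti_ms(rng) for _ in range(N_TRIALS))
    return total_ms / 60000.0
```

Under these assumptions, the trial events alone sum to roughly 26 to 27 min per simulated session.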

# Data Acquisition and Analysis

Scanning was conducted using a 3-Tesla Siemens Verio MR-System. For functional imaging, a T2<sup>∗</sup>-weighted gradient single-shot echo planar imaging (EPI) sequence was acquired. In-plane resolution was 3.0 × 3.0 mm and slice thickness was 3.5 mm (36 axial slices; no gap; FOV = 240 mm; acquisition matrix: 80 × 80 voxels; TE = 30 ms; flip angle = 90°; TR = 2500 ms). In addition, a high-resolution T1-weighted structural scan was obtained for each participant.

<sup>2</sup>Full List of word stimuli. Positive: Liebe, Sex, Sonne, Urlaub, Herz, Sieg, Held, Party, Freude, Schatz, Feier, Sommer, Charme, Spaß, Erfolg, Erotik, Gewinn, Gefühl, Lust, Ferien, Glück, Chance; Neutral: Boden, Uhr, Fahne, Neubau, Rede, Form, Test, Besen, Gegend, Treppe, Kabel, Karton, Lesung, Wand, Stelle, Ordner, Urteil, Metall, Note, Klinke, Meter, Inhalt.

English translation in same order. Positive: Love, Sex, Sun, Holiday, Heart, Victory, Hero, Party, Joy, Treasure, Celebration, Summer, Charm, Fun, Success, Erotic, Prize, Feeling, Lust, Vacation, Luck, Chance; Neutral: Floor, Clock, Flag, Reconstruction, Talk, Form, Test, Broom, Area, Stairs, Cable, Box, Reading, Wall, Place, Folder, Opinion, Metal, Note, Handle, Meter, Content.

<sup>3</sup>To promote comparability between valence and arousal ratings of pictures and words, respectively, the reported values for words were transformed to a 9-point-Likert scale as utilized for the SAM.

Statistical analyses of the functional images were conducted using Statistical Parametric Mapping (SPM8; Wellcome Department of Imaging Neuroscience, University College London, UK; http://www.fil.ion.ucl.ac.uk/spm/software/spm8; Friston et al., 1994). Preprocessing included slice-time correction and realignment without unwarping. Additionally, the functional images were spatially normalized to the standard EPI-template and smoothed with a kernel of FWHM = 8 × 8 × 8 mm. On the fixed-effects level, the data were analyzed in an event-related design comprising eight covariates-of-interest classifying each trial in terms of Picture Category (erotic vs. romantic), Word Category (positive vs. neutral), and experimental Task (picture categorization vs. word categorization). To improve model fit, additional covariates-of-no-interest were included: the time and dispersion derivatives of the modeled covariates-of-interest, six movement parameters obtained during realignment, and one covariate providing an overall intercept for the model. A high-pass filter with a cutoff period of 128 s was applied to the data. To avoid a bias of the global signal from the emotionally intense erotic picture category, no global scaling was applied (Junghöfer et al., 2005). BOLD-activity associated with each experimental condition was determined by contrasting each covariate-of-interest with the implicit baseline.
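As a bookkeeping aid, the regressors of the first-level model described above can be enumerated as follows. This sketch only labels and counts columns; it does not reproduce how SPM8 actually builds or orders its design matrix, and all label strings are hypothetical.

```python
from itertools import product

PICTURE = ("erotic", "romantic")
WORD = ("positive", "neutral")
TASK = ("picture_task", "word_task")

# Eight covariates-of-interest: one per Picture Category x Word Category x Task cell.
conditions = [f"{p}_{w}_{t}" for p, w, t in product(PICTURE, WORD, TASK)]

# Covariates-of-no-interest: time and dispersion derivatives for each condition,
# six movement parameters, and one intercept.
derivatives = [f"{c}_{d}" for c in conditions for d in ("time_deriv", "disp_deriv")]
movement = [f"move_{i}" for i in range(1, 7)]
columns = conditions + derivatives + movement + ["intercept"]

# The 128 s high-pass cutoff period expressed as a frequency (about 0.0078 Hz).
HIGH_PASS_CUTOFF_S = 128.0
high_pass_cutoff_hz = 1.0 / HIGH_PASS_CUTOFF_S
```

Counted this way, the model contains 8 covariates-of-interest and 23 covariates-of-no-interest, i.e., 31 columns in total.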

Random-effects analysis was implemented by calculating a flexible-factorial model including the within-subject main effects of Picture Category (erotic vs. romantic), Word Category (positive vs. neutral), and Task (picture categorization vs. word categorization), as well as all possible two-way interactions. Additionally, a subject factor was included in the model to account for between-subject variance. Activated voxels were determined by means of bi-directional F-contrasts for interactions and directed T-contrasts for main effects and were considered meaningful if they reached a statistical threshold of p < 0.05 (FDR-corrected at voxel level, cluster size k > 15). Figures were created using MRIcron software (http://www.mccauslandcenter.sc.edu/mricro/mricron/; Rorden and Brett, 2000) displaying activations in neurological orientation. Coordinates in **Tables 1**–**5** are reported in MNI space, and the respective labels of their anatomical locations were obtained using the maximum probability tissue atlas from the OASIS project (http://www.oasis-brains.org/) as provided in SPM12 by Neuromorphometrics, Inc. under academic subscription (http://neuromorphometrics.com/).
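To illustrate what a voxel-level FDR threshold does, the following sketch applies the Benjamini-Hochberg step-up rule to a list of p-values. It is an assumption that this rule corresponds to the correction applied in SPM8 here; the text does not spell out the procedure, and the additional cluster-size criterion (k > 15) is not modeled. The function name is hypothetical.

```python
from typing import Sequence


def fdr_threshold(p_values: Sequence[float], q: float = 0.05) -> float:
    """Largest p-value satisfying the Benjamini-Hochberg criterion, or 0.0 if none does."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    threshold = 0.0
    for rank, p in enumerate(p_sorted, start=1):
        # Step-up rule: the rank-th smallest p-value must not exceed (rank / n) * q.
        if p <= rank / n * q:
            threshold = p
    return threshold


def significant(p_values: Sequence[float], q: float = 0.05) -> list:
    """Boolean flag per input value: True where the p-value passes the FDR threshold."""
    cutoff = fdr_threshold(p_values, q)
    return [p <= cutoff and cutoff > 0.0 for p in p_values]
```

In a whole-brain analysis the input would be one p-value per voxel for a given contrast; voxels flagged `True` would then still have to satisfy the cluster-size criterion.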

One research objective was to identify brain regions which are modulated both by implicit emotional, as well as explicit task-directed attention. Accordingly, to find voxels displaying main effects that are common to, as well as distinct from Task and Picture Category, respectively, conjunction plots were created by overlaying both thresholded main effects<sup>4</sup>. Regarding the interactions, significant activations were only found for the Task-by-Picture Category contrast. To assess whether the according main effects were also qualified by this interaction a further conjunction plot was created overlaying these activation maps with the interaction contrast. Finally, to assess the exact pattern of the interaction in voxels showing main and interaction effects, the averaged beta values across the main clusters of common activation were extracted for each participant and then submitted to repeated-measures ANOVAs.
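The overlay logic described above amounts to set operations on thresholded maps, followed by averaging beta values within the common voxels. The sketch below represents each thresholded map as a set of voxel coordinates; this representation and the function names are assumptions made for illustration and do not reflect the software actually used to create the conjunction plots.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def overlay(mask_a: Set[Voxel], mask_b: Set[Voxel]) -> Dict[str, Set[Voxel]]:
    """Split two thresholded maps into common and map-specific voxels."""
    return {
        "common": mask_a & mask_b,
        "only_a": mask_a - mask_b,
        "only_b": mask_b - mask_a,
    }


def mean_beta(betas: Dict[Voxel, float], cluster: Set[Voxel]) -> float:
    """Average beta value across the voxels of one cluster (one participant, one condition)."""
    if not cluster:
        raise ValueError("cluster contains no voxels")
    return sum(betas[v] for v in cluster) / len(cluster)
```

Calling `mean_beta` once per participant and condition on the `"common"` voxels would yield the values that enter a repeated-measures ANOVA.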

Reaction time (RT) data provide a behavioral test of response preferences. Error trials and outliers (i.e., trials faster than 300 ms or slower than three standard deviations above the RT mean) were excluded from the RT analyses, resulting in an average of 41 trials per cell. These trials were entered into repeated-measures ANOVA incorporating the factors Picture Category (erotic vs. romantic), Word Category (positive vs. neutral), and Task (picture categorization vs. word categorization). Error rates were very low (M = 4.8%) and were not examined further.
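The outlier criterion described above can be sketched as a simple filter. The text does not state whether the mean and standard deviation were computed per participant, per cell, or across the whole sample; this sketch assumes they are computed over whatever list of correct-trial RTs is passed in, using the sample standard deviation. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List


def clean_rts(rts_ms: List[float], min_rt_ms: float = 300.0, sd_cutoff: float = 3.0) -> List[float]:
    """Drop RTs below min_rt_ms or more than sd_cutoff SDs above the mean of the input."""
    # The upper bound is derived from the unfiltered input (an assumption of this sketch).
    upper = mean(rts_ms) + sd_cutoff * stdev(rts_ms)
    return [rt for rt in rts_ms if min_rt_ms <= rt <= upper]
```

Error trials are assumed to have been removed before the function is called; at least two RTs are required for the standard deviation to be defined.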

# RESULTS

# Reaction Times

Participants responded faster to pictures (M = 712.1 ms) than to words (M = 817.2 ms), Task: F(1, 28) = 68.2, p < 0.001. This main effect, however, was qualified by a Task by Picture Category interaction, F(1, 28) = 29.7, p < 0.001. Post hoc tests revealed that participants responded faster to erotic (M = 692.7 ms) than to romantic (M = 731.5 ms) pictures during the picture categorization task, t(28) = 4.4, p < 0.001. However, when participants had to categorize words, erotic pictures (M = 823.5 ms) prompted relatively slower responses compared to romantic control images (M = 811.0 ms), t(28) = 2.3, p < 0.05.

# fMRI

#### Emotion Main Effects

Contrasting erotic with romantic images ([erotic > romantic]) yielded sizeable activations in bilateral extrastriate cortical areas (**Figure 2A**, **Table 1**). These clusters covered large portions of lateral occipito-temporal cortex, reaching from fusiform areas ventro-laterally up to superior occipital cortex dorsally. Another large cluster was found in medial prefrontal cortex, almost exclusively in the left hemisphere. This activation included the anterior cingulate cortex as well as regions of the frontal pole. Further clusters were located in the left-sided superior frontal gyrus and in the precuneus.

<sup>4</sup>No activations were found common to word categorization and word emotionality. Accordingly, only the conjunction of the main effects for picture categorization and picture emotionality is presented in **Figure 4**.

#### TABLE 1 | Activated voxels from contrast [erotic > romantic pictures].

T-contrast, p < 0.05, FDR-corrected at voxel level, cluster size > 15. Local maxima more than 8 mm apart were extracted and reported. Side indicates hemisphere in which peak voxel is located (R, right; L, left). Labels provided by Neuromorphometrics, Inc. under academic subscription. Entries in italics indicate sub-peak regions within the cluster indicated above. \*Indicates nearest gray matter approximation, Voxels indicates N voxels, T indicates peak t-values, p indicates peak p-values.

<sup>a</sup>For extended clusters (>1000 Voxels) we extracted and reported local maxima > 20 mm apart in order to illustrate the cluster adequately.

Contrasting positive with neutral words ([positive > neutral]) predominantly resulted in activation clusters located in the left hemisphere (**Figure 2B**, **Table 2**). The largest was found in left parietal regions, mostly covering areas in the vicinity of the intraparietal sulcus and neighboring angular gyrus and reaching into superior parietal lobe. Two further clusters were located in the left inferior frontal gyrus: the larger located in the anterior portion, the smaller more posteriorly. Further clusters were apparent in the left-hemispheric medial superior frontal cortex and posterior superior frontal gyrus as well as in the right cerebellum and temporal lobe. Most notably, a final cluster was found in anterior regions of the left hippocampus, extending into the left amygdala<sup>5</sup>.

<sup>5</sup>For the pictures, the reversed contrast revealed a single cluster of increased activity to romantic, as compared to erotic pictures ([romantic > erotic]). This cluster was located in early visual cortex bilaterally, mainly including occipital pole regions but also extending into cuneus and calcarine cortex (Supplementary Figure 2). In contrast, no further activations were found when comparing neutral with positive words ([neutral > positive]).

#### TABLE 2 | Activated voxels from contrast [positive > neutral words].

T-contrast, p < 0.05, FDR-corrected at voxel level, cluster size > 15. Local maxima more than 8 mm apart were extracted and reported. Side indicates hemisphere in which peak voxel is located (R, right; L, left). Labels provided by Neuromorphometrics, Inc. under academic subscription. Entries in italics indicate sub-peak regions within the cluster indicated above. \*Indicates nearest gray matter approximation, Voxels indicates N voxels, T indicates peak t-values, p indicates peak p-values.

#### Task Main Effects

The contrast [picture categorization > word categorization] resulted in a large contiguous cluster encompassing posterior, frontal, temporal, and subcortical regions (**Figure 3A**, **Table 3**). In posterior areas, this included extended activations in bilateral occipito-temporo-parietal regions, reaching into inferior parietal areas and incorporating broad activations in dorso-medial extrastriate regions. It also reached into postero-medial areas covering almost the whole extent of the precuneus and posterior cingulate cortex. Furthermore, this cluster also included strong and sizeable activations of medial regions of the ventral visual stream, including lingual and medial fusiform gyri, parahippocampal areas, and the hippocampus. In frontal regions, this cluster covered large areas of the bilateral medial prefrontal cortex, which included the anterior cingulate cortex and reached into frontal pole regions. Additionally, it extended into left and right lateral prefrontal cortex, including superior and middle frontal gyri. The cluster also included anterior temporal lobe regions exclusively in the right hemisphere, mostly covering middle temporal gyrus, but also reaching into superior and inferior temporal cortex. Finally, subcortical areas were also covered by this extensive cluster. Specifically, this included the posterior thalamus and antero-ventral striatum bilaterally as well as the amygdala, which was activated to a considerably larger extent in the left hemisphere. Further clusters were found in dorsal areas of the left post-central gyrus, in the inferior and orbito-frontal cortex on the right side, and in the left temporal gyrus.

#### TABLE 3 | Activated voxels from contrast [picture > word categorization].

T-contrast, p < 0.05, FDR-corrected at voxel level, cluster size > 15. Local maxima more than 8 mm apart were extracted and reported. Side indicates hemisphere in which peak voxel is located (R, right; L, left). Labels provided by Neuromorphometrics, Inc. under academic subscription. Entries in italics indicate sub-peak regions within the cluster indicated above. \*Indicates nearest gray matter approximation, Voxels indicates N voxels, T indicates peak t-values, p indicates peak p-values.

<sup>b</sup>For extended clusters (>1000 Voxels) we extracted and reported local maxima more than 20 mm apart in order to illustrate the cluster adequately. Regions with several sub-peaks were summarized and only the largest peak of the sub-region is reported.

The contrast [word categorization > picture categorization] revealed only a single cluster located in the early extrastriate cortex in the left hemisphere (**Figure 3B**, **Table 4**).

#### Overlap of Main Effects

Comparing the main effects of picture categorization and picture emotionality showed that the regions activated during the processing of erotic pictures were to a large degree also activated when participants had to categorize pictures (**Figure 4**). Specifically, the vast extrastriate activations for both main effects largely overlapped, although they were generally even more extended for the picture categorization contrast. Only relatively few, more inferiorly located voxels in the lateral occipito-temporal cortex were exclusive to erotic picture viewing. All additional clusters found for picture emotionality in the cuneus as well as the frontal regions also largely overlapped with activity associated with the picture categorization task.

T-contrast, p < 0.05, FDR-corrected at voxel level, cluster size > 15. Local maxima more than 8 mm apart were extracted and reported. Side indicates hemisphere in which peak voxel is located (R, right; L, left); Voxels indicates N voxels; T indicates peak t-values; p indicates peak p-values. Labels provided by Neuromorphometrics, Inc. under academic subscription. Entries in italics indicate sub-peak regions within the cluster indicated above.

In contrast, the main effects of word categorization and word emotionality did not yield any commonly activated voxels at all.
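The overlap comparison described above can be illustrated with a minimal sketch. It assumes two statistical maps stored as NumPy arrays of equal shape and a hypothetical fixed threshold per map; the function name `overlap_summary`, the threshold values, and the toy data are illustrative assumptions and do not reproduce the authors' actual pipeline, which relied on voxel-level FDR correction and a cluster-size criterion.

```python
import numpy as np

def overlap_summary(map_a, map_b, thr_a, thr_b):
    """Count suprathreshold voxels in two maps and the voxels they share."""
    sig_a = map_a > thr_a          # voxels significant in contrast A
    sig_b = map_b > thr_b          # voxels significant in contrast B
    both = sig_a & sig_b           # voxels significant in both contrasts
    n_a = int(sig_a.sum())
    return {
        "n_a": n_a,
        "n_b": int(sig_b.sum()),
        "n_overlap": int(both.sum()),
        # share of contrast A that is also covered by contrast B
        "frac_of_a": float(both.sum() / n_a) if n_a else 0.0,
    }

# Toy one-dimensional "maps" with made-up t-values (not data from the study).
t_emotion = np.array([0.5, 3.2, 4.1, 2.9, 0.1, 3.8])
t_task = np.array([3.5, 3.9, 4.4, 0.2, 0.3, 5.0])
summary = overlap_summary(t_emotion, t_task, thr_a=3.0, thr_b=3.0)
print(summary)
```

In this toy case every suprathreshold voxel of the first map is also suprathreshold in the second, which mirrors the qualitative pattern reported for pictures; an empty intersection would correspond to the pattern reported for words.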

### Interactions Between Picture Category, Word Category, and Task

As illustrated in **Figure 5A** (**Table 5**), a significant interaction between Task and Picture Category was obtained, consisting of widespread bilateral activations in the dorsolateral prefrontal cortex, inferior parietal cortex, frontal eye-fields, cerebellum, and the precuneus and cuneus. Further clusters were detected in the right antero-ventral striatum and the right anterior insula, extending into the adjacent inferior frontal cortex and right posterior thalamus. Additional clusters were also found in the pons, pre-SMA, and anterior cingulate cortex. To further detail these findings, we conducted directed interaction T-contrasts for the activated voxels. These confirmed that all voxels were characterized by the same directed interaction pattern. Specifically, in the word categorization task these voxels showed increased activation when the words were overlaid on erotic as compared to romantic pictures. In contrast, this differentiation reversed under the picture task instruction. Here, activity in these voxels was relatively decreased when participants categorized erotic as compared to romantic pictures.
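The logic of a directed interaction contrast in a 2 × 2 design can be sketched as follows. The cell ordering, the function names, and the numerical values are assumptions made for illustration only; they are not taken from the authors' analysis. The contrast weights [1, -1, -1, 1] test whether the erotic-minus-romantic difference in the word task exceeds the same difference in the picture task.

```python
import numpy as np

# Assumed cell order (hypothetical): word/erotic, word/romantic,
# picture/erotic, picture/romantic.
INTERACTION_WEIGHTS = np.array([1.0, -1.0, -1.0, 1.0])

def simple_effects(cell_means):
    """Erotic-minus-romantic difference within each task."""
    word_er, word_ro, pic_er, pic_ro = cell_means
    return word_er - word_ro, pic_er - pic_ro

def directed_interaction(cell_means):
    """Contrast value: (word erotic - romantic) - (picture erotic - romantic)."""
    return float(INTERACTION_WEIGHTS @ np.asarray(cell_means, dtype=float))

# Made-up cell means that follow the pattern described in the text:
# erotic > romantic in the word task, erotic < romantic in the picture task.
toy_means = [1.2, 0.8, 0.5, 0.9]
word_effect, picture_effect = simple_effects(toy_means)
value = directed_interaction(toy_means)
print(word_effect, picture_effect, value)
```

A positive contrast value together with simple effects of opposite sign corresponds to the crossover pattern reported for the interaction voxels.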

To determine whether the effects of picture emotionality were qualified by this interaction, we compared them with the observed interaction pattern. **Figure 5B** shows that there was no substantial overlap between this interaction and brain regions showing a significant main effect of Picture Category, i.e., increased activation to erotic as compared to romantic pictures.

In contrast, the effects of Task yielded several regions of overlap with the observed interaction (**Figure 5B**). Most notably, these included large portions of the left-hemispheric extrastriate activations, for the word categorization task, and sizeable regions of the precuneus and both the left and right inferior parietal cortex, for the picture categorization task. Region-of-interest assessment of these voxels (**Figure 5C**) revealed that precuneus and inferior parietal regions only showed task-related activation differences when participants viewed romantic images. In contrast, the extrastriate region was always more activated during the word task than during the picture categorization task, although this difference was more pronounced when the words were overlaid onto erotic images.

No significant interactions including the factor Word Category were observed.

# DISCUSSION

The present study examined the interplay of implicit emotion and explicit task relevance on the processing of concurrently presented word and picture stimuli. Consistent with the notion of the flexible tuning of processing resources, i.e., benefits of being the focus of attention and cost effects when shared processing resources are taxed, four main findings emerged. First, differential activation of attentional control regions was specific to the picture categorization task, suggesting a pronounced difference between words and pictures in demanding attention regulation. Second, a significant interaction of task and picture category was observed covering large-scale neural networks including dorsal visual associative cortex regions and inferior parietal and dorsolateral prefrontal cortices, indicating differential activity to romantic and erotic pictures as a function of task. Third, the selective processing of emotionally arousing pictures and words was independent of task relevance. Fourth, explicit attention enhanced sensory-perceptual processing of pictures and words. Interestingly, only extrastriate activation to words showed effects of competition with picture emotionality as indicated by relatively decreased activity when the words were overlaid on erotic images. Overall, these data suggest the flexible entrainment of large-scale neural networks depending on current behavioral goals and the processing demands of the stimulus, i.e., word or picture, and the emotional intensity of the distracter.

# Task Effects: Word and Picture Categorization

The present findings suggest a pronounced difference in processing demands associated with the regulation of attention toward pictures and words. Extended activations were observed in corresponding brain regions when the focus of attention was directed toward picture processing. In contrast, none of the neural regions implicated in regulating the allocation of attention to stimuli showed larger activations during the word recognition task. Importantly, these differences were obtained during the processing of stimuli which were physically identical. Furthermore, the task to classify the stimuli was structurally similar for pictures and words, requiring participants to sort the stimuli into two categories defined by emotion. Notably, differences in task difficulty do not seem to account for the pronounced and widespread activations observed for the picture categorization task. Specifically, error rates were low and pictures were classified faster than words, with erotic stimuli showing the fastest reaction times. The need to regulate selective attention processes is presumed to depend on demanding task conditions and processing load (Luck et al., 2000; Lavie, 2005). With regard to selectively focusing on either the foreground word or the background picture, the processing of words showed neither benefits nor cost effects, suggesting little cognitive demand by word processing and indicating automaticity (Augustinova and Ferrand, 2014). In contrast, there was a strong need to regulate processing resources during the picture task, reflecting the flexible tuning of attention processes according to processing goals.

#### TABLE 5 | Activated voxels of Task x Picture Category interaction.

F-contrast, p < 0.05, FDR-corrected at voxel level, cluster size > 15. Local maxima more than 8 mm apart were extracted and reported. Side indicates hemisphere in which peak voxel is located (R, right; L, left). Labels provided by Neuromorphometrics, Inc. under academic subscription. Entries in italics indicate sub-peak regions within the cluster indicated above. \*Indicates nearest gray matter approximation; Voxels indicates N voxels; F indicates peak F-values; p indicates peak p-values.

<sup>a</sup>For extended clusters (>1000 Voxels) we extracted and reported local maxima more than 20 mm apart in order to illustrate the cluster adequately.

The picture as compared to the word categorization task not only elicited activity in widespread areas of medial and lateral parietal as well as dorso-lateral prefrontal cortices but also in subcortical limbic structures and right temporal areas. While it is difficult to determine whether these effects primarily reflect enhanced activation during the picture task or reduced engagement during the word task, it is clear that the activity in these structures is highly dependent on processing goals. Specifically, the posterior parietal cortex, including the precuneus and lateral parietal areas, has been implicated in visuo-spatial processing, often by using tasks that require visuospatial attention shifting (Kastner and Ungerleider, 2000; Simon et al., 2002; Molenberghs et al., 2007; Chica et al., 2013). Additionally, frontal regions in the vicinity of the superior frontal sulcus have also been shown to be involved in voluntary attention shifting and as acting in concert with medial and lateral parietal areas to provide voluntary attentional control in the perceptual as well as the mnemonic domain (Tamber-Rosenau et al., 2011). This conforms well to the present results and it may accordingly be presumed that the processing of pictures invoked attention shifts to a larger degree than word stimuli. From a broader perspective, widespread activity has also been reported for goal-directed stimulus processing and successful recognition memory in neural networks that show a striking overlap to the pattern of findings observed here. For instance, a supramodal limbic-paralimbic-cortical network has been identified by contrasting the processing of Go and NoGo stimuli (Laurens et al., 2005). Furthermore, Keightley et al. 
(2011) reported regions associated with successful recognition of visual stimuli including ventral prefrontal areas, subcortical structures such as the amygdala and hippocampus, and regions of the anterior temporal lobe which were also restricted to the right hemisphere. Overall, focusing attention on pictures was associated with modulations in cortical and subcortical limbic regions implicated in goal-directed picture processing and recognition memory.

The present findings concur with the notion that selective attention enhances sensory-perceptual stimulus processing. This was apparent regarding the intentional processing of both words as well as pictures. Here, left-lateralized areas of early extrastriate cortex responded most strongly when words were the focus of attention. This result relates to previous reports of visual word processing (Wandell, 2011; Price, 2012) as well as to the present finding of extended bilateral extrastriate activations during picture categorization. This finding, in turn, aligns well with previous studies, suggesting that selective attention to pictures or to specific features of a picture amplifies the perceptual encoding of these features in extrastriate visual cortex (Kastner and Ungerleider, 2000; Pessoa et al., 2002; Jehee et al., 2011). Overall, selective attention to pictures was associated with increased activity in higher-order temporo-occipital visual areas related to object recognition (Grill-Spector and Malach, 2004) while attention to words was reflected in left-lateralized areas devoted to visual word processing.

# Interaction Effects: Task by Picture Category

Amplifying the pronounced differences in the engagement of attention-related regions by the goal to process the pictures, interaction effects of task and emotional intensity were only seen for pictures but not words. The posterior parietal cortex and precuneus belonged to the regions in which the main effect of task was further qualified by an interaction with Picture Category. Detailed assessment of this interaction revealed that the main effect was largely carried by relative activation increases to romantic pictures in the picture task as compared to the word task, while no differential response was apparent to erotic images (see also Supplementary Figure 1). Previous research has shown that emotional images automatically direct saccades (Calvo and Lang, 2005; Nummenmaa et al., 2009) and facilitate spatial orienting toward these stimuli (Ohman et al., 2001; Koster et al., 2004; De Houwer and Tibboel, 2010). Furthermore, the posterior parietal cortex and precuneus are believed to be important regions involved in the regulation of visuo-spatial attention (Vossel et al., 2014). One hypothesis is accordingly that the interaction observed in these regions reflects that erotic images inherently direct visual attention toward features facilitating recognition and categorization regardless of the task requirements while spatial attention needs to be voluntarily directed toward relevant features when romantic pictures have to be categorized.

A number of regions were observed which revealed interaction effects without overlapping task effects. These included sizeable activations in the bilateral dorso-lateral prefrontal cortex, frontal eye-fields, intra-parietal regions, and midline regions, including pre-SMA, the anterior cingulate cortex, and the right anterior insula. Follow-up analyses characterized the interaction pattern as relatively enhanced activation toward romantic pictures during the picture categorization task and relatively enhanced activation toward erotic pictures during the word categorization task. With regard to the understanding of potentially underlying processes, a previous study by Wessa et al. (2013) appears particularly informative (see also Iordan et al., 2013). Specifically, the authors examined the effects of emotional pictorial distracters on mental arithmetic. Assessing task-execution under the presence of emotional as compared to neutral pictures, they report a strikingly similar pattern of brain networks and emphasize these regions' importance for the upholding of task goals under conditions of emotional distraction. Their experiment directly corresponds to the word categorization task in the present study, in which the picture stimuli are task-irrelevant. Here, the picture stimulus dimension effectively acts as a distracter and this appears to be particularly pronounced for erotic stimuli. However, this may at first seem to be at odds with increased activity toward romantic pictures under the picture categorization task. Conceivably, while acting as distracters during word categorization, erotic pictures may instead facilitate categorization under the picture task instruction. Under this premise, the found interactions likely reveal the differential activation of brain networks involved in maintaining task goals under differential demands for executive control. 
The reaction time data also corroborate this notion as they indicate a response benefit of erotic pictures in the picture task which apparently translates into a disadvantage in the word task. Additionally, this conclusion is further supported by research utilizing visual Stroop tasks. In related studies, networks largely compatible with the present observations are often implicated in conflict processing (Roberts and Hall, 2008). Interestingly, exclusively right-hemispheric activation of the anterior insula, as observed here, has previously been associated with conflicting approach-withdrawal reaction tendencies brought forward by highly-arousing, positive stimuli (Citron et al., 2014). Finally, the anterior insula has also been suggested to be associated with emotional awareness by integrating bottom-up and top-down information (Gu et al., 2013). This aligns well with the present study in which participants had to cognitively evaluate a stimulus while this stimulus's emotional salience called upon involuntary physiological reactions. The observation that the emotionality of words apparently did not affect task-related activation underscores the pre-eminence of processing pictorial information. In sum, the networks brought forward by the task-by-picture category interaction likely reflect task-related processing which may be facilitated or impeded depending on the emotional intensity of the pictures.

# Stimulus Effects: Processing of Emotional Pictures and Words

Previous research indicated that the processing of emotional pictures and words is seen in distinct brain regions. The present study confirmed these findings by presenting these two stimulus classes concurrently (see also Kensinger and Schacter, 2006). With regard to pictures, the processing of high-arousal erotic as compared to low-arousal control pictures was associated with increased activations in extended regions of the extrastriate visual and inferior temporal cortices. Previous research observed that the sensory-perceptual processing of emotional stimuli varies with the availability of processing resources (Pessoa et al., 2002; De Cesarei et al., 2009; Schupp et al., 2014). However, given the strong and sizeable effects observed both for the interaction between task and picture category, as well as for erotic picture viewing, modulations of the latter by task focus seen in visual processing regions were comparably minute. This presumably reflects little competition by words for processing resources claimed by erotic pictures. Given that no interaction with word category was found, this attests to a strong attentional bias toward erotic pictures and highlights the automaticity and expertise in extracting semantic meaning from pictures and words (Thorpe et al., 1996; Augustinova and Ferrand, 2014). Furthermore, larger activations in regions of the dorso-medial prefrontal cortex and the precuneus using erotic stimuli replicated previous research investigating emotional stimulus processing (Sabatinelli et al., 2011; Lindquist et al., 2012). However, the present study did not observe a differential response to the picture categories in sub-cortical limbic structures, most notably the amygdala, which has often been observed to be associated with erotic stimulus processing. The difference in findings may relate to the control category. 
Specifically, the picture control category depicted couples in pleasant romantic contexts, and the affective distance between the stimulus categories may have been suboptimal in bringing forward emotional differentiation in the amygdala and other limbic regions. This interpretation possibly relates to findings that these regions respond to both highly and mildly arousing social stimuli (Goossens et al., 2009; Vrticka et al., 2013). This reasoning is also broadly consistent with the observation in the present study that the amygdala was activated when attention was explicitly directed toward pictures, regardless of picture category. This may be taken as an indication for competition between explicit task demands and implicit attention in the amygdala (Pessoa et al., 2003; Hsu and Pessoa, 2007).

The processing of positive as compared to neutral words led to increased activations in several left-lateralized clusters, including the inferior and medial superior frontal gyri, left parietal cortex, left hippocampus, and amygdala. These findings largely replicate strongly left-lateralized activation patterns reported in previous studies of emotional word processing (Kensinger and Schacter, 2006; Herbert et al., 2009; Hoffmann et al., 2015) and are consistent with the view of left-lateralized language functions in humans (Price, 2012). More specifically, areas in left ventrolateral prefrontal, mesial superior frontal and inferior parietal regions have all been connected to semantic and evaluative processing of language (Devlin et al., 2003; Salmelin and Kujala, 2006; Binder et al., 2009; Price, 2012). Interestingly, in the present study the finding of enhanced activations in extrastriate visual cortex associated with the processing of words depended on task focus and the goal-directed allocation of attention. Specifically, although cortical brain regions related to semantic stimulus processing and limbic regions related to affective evaluation responded to word emotionality irrespective of task, increased activations in extrastriate regions to words were only seen when participants were conducting the word categorization task. This observation relates to a recent study examining neural correlates of reading (Hillen et al., 2013). In that study, activation in the corresponding extrastriate regions was associated with the visual scanning of written language but not with semantic, syntactic, or orthographic processing. These processes, in contrast, were most notably associated with activation in areas of left-lateralized prefrontal cortex.
This study's results are highly reminiscent of the present observations regarding word processing and suggest a dissociation of sensory-perceptual and affective-semantic processing in extrastriate and prefrontal/subcortical regions, respectively. While affective-semantic evaluation of the words seems to be automatic and undisturbed by task demands or picture emotionality, perceptual processing of words during reading is affected by both processes as indicated by the interaction in extrastriate cortex (**Figure 5C**). In addition, considering that other research reported similar activations to words also during cognitively undemanding silent reading (Herbert et al., 2009), extrastriate activity to visually presented words may thus not depend on task focus per se. Rather, these observations are consistent with the view of competition for shared resources in extrastriate visual cortex while activity in stimulus-specific semantic and limbic word processing regions is preserved (Lavie, 2005). One may accordingly speculate that the increased activation in extrastriate cortex reflects recurrent processing loops flexibly engaged depending on behavioral goals and the availability of processing resources. Overall, regarding emotional word processing the present data suggest a dissociation of semantic and affective evaluative processes, on the one hand, and sensory processing, on the other hand, when explicit attention is directed toward pictures.

# Limitations

While the present design was successful at detailing common and specific brain responses to the implicit emotional significance of pictures and words as well as to explicit attentional demands, some characteristics of the used stimuli require further consideration. Specifically, the emphasis on stimulus selection was on the emotional arousal dimension and the comparability of the stimulus categories in terms of linguistic parameters of the words, i.e., word length, number of syllables, imageability, and word frequency as well as stimulus characteristics of the image, i.e., picture complexity, color, number of people and categorical homogeneity. High control on some stimulus properties led to differences in other characteristics. Specifically, pictures were drawn from selected categories of human experience while words represented a broad range of experiences. Furthermore, while both stimulus classes differed in emotional arousal, the strong physical and semantic control exerted for the pictures made it not feasible to select a control category differing both in arousal, as well as valence. Thus, while emotional modulation of word processing may be attributed either to variations in arousal or valence, differentiations due to picture category may only be associated with arousal. This may account in part for the lack of congruency and/or incongruency effects between picture and word categories in the present results (Klasen et al., 2011). In addition, extensive previous research has demonstrated that the preferential processing of emotional stimuli is associated both with common, but also with distinct brain regions depending on emotional valence and arousal, as well as specific emotional content (Vytal and Hamann, 2010; Sabatinelli et al., 2011; Citron, 2012). 
With regard to erotic pictures, the regions found in the present study are not characterized by high content specificity (Sabatinelli et al., 2011) and thus likely reflect attentional processes evoked by a large variety of emotionally arousing pictures. Regarding words, previous research has detailed differentiations according to valence and arousal of the stimulus materials but also according to whether the emotional connotation of the words had to be processed directly (reviewed in Citron, 2012). However, only one study addressed both issues utilizing fMRI (Straube et al., 2011). Most notably, in this study none of the regions reported here were found to be modulated by task or by stimulus valence. Another study by Citron et al. (2014) orthogonally manipulated both the arousal and the valence of words using an indirect lexical decision task. Of note, in this study none of the regions reported here were modulated by valence. In addition, several previous studies reported comparable left-lateralized semantic and subcortical limbic regions associated with the processing of both positive and negative emotional words as observed here (Hamann and Mao, 2002; Cato et al., 2004; Kensinger and Schacter, 2006; Herbert et al., 2009; Straube et al., 2011). Thus, the present results most likely reflect selective processing associated with the emotional arousal of the words. However, the present study is not conclusive in this regard and future research should strive to further detail the involvement of specific brain regions in the processing of valence, arousal and emotional task by selecting experimental stimuli which systematically vary with regard to semantic categories, valence (including negative stimuli), and arousal (including low and high arousing stimuli) of the word and picture stimuli.

# CONCLUSION

The present study examined costs and benefits of the processing of emotionally arousing pictures and words when the stimuli were either task-relevant or task-irrelevant. The implicit significance of emotional stimuli was reflected in distinct brain regions for the processing of pictures and words, respectively. Of note, the activity in these regions was similar when the stimuli were task-relevant or irrelevant, suggesting that there was no competition for processing resources in respective brain regions. However, effects of competition were observed in the left-lateralized visual cortex between explicit attention to words and implicit attention to picture emotionality. Finally, widespread fronto-parietal networks were apparent as a function of the interaction between explicit task demands and picture category, specifically. Overall, these results attest to the brain's ability to process emotional information from different visual sources in parallel when these do not share common resources and suggest the flexible entrainment of large-scale neural networks depending on processing goals, obligatory processing demands of the stimulus type, and the emotional intensity of distracter stimuli.

# ACKNOWLEDGMENTS

We thank Martina Nuding, Nicole Roth, René Göller, Manuela Reichen, Anna Kenter and Bea Heger for their assistance in data acquisition and stimulus selection. This work was supported in part by the German Research Foundation [DFG, Schu 1074/10-3 and RE 3430].

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01861


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Flaisch, Imhof, Schmälzle, Wentz, Ibach and Schupp. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The emotion potential of simple sentences: additive or interactive effects of nouns and adjectives?

Jana Lüdtke<sup>1,2</sup>\* and Arthur M. Jacobs<sup>1,3,4</sup>

<sup>1</sup> Department of Education and Psychology, Experimental and Neurocognitive Psychology, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin, Germany, <sup>3</sup> Languages of Emotion, Freie Universität Berlin, Berlin, Germany, <sup>4</sup> Dahlem Institute for Neuroimaging of Emotion, Berlin, Germany

#### Edited by:

Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Germany

#### Reviewed by:

Christoph W. Korn, University of Zurich, Switzerland; Sascha Schroeder, Max Planck Institute for Human Development, Germany

#### \*Correspondence:

Jana Lüdtke, Department of Education and Psychology, Freie Universität Berlin, Habelschwerdter Allee 45, 14195 Berlin, Germany jana.luedtke@fu-berlin.de

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 04 December 2014; Accepted: 21 July 2015; Published: 11 August 2015

#### Citation:

Lüdtke J and Jacobs AM (2015) The emotion potential of simple sentences: additive or interactive effects of nouns and adjectives? Front. Psychol. 6:1137. doi: 10.3389/fpsyg.2015.01137

The vast majority of studies on affective processes in reading focus on single words. The most robust finding is a processing advantage for positively valenced words, which has been replicated in the rare studies investigating effects of affective features of words during sentence or story comprehension. Here we were interested in how the different valences of words in a sentence influence its processing and supralexical affective evaluation. Using a sentence verification task, we investigated how comprehension of simple declarative sentences containing a noun and an adjective depends on the valences of both words. The results are in line with the assumed general processing advantage for positive words. We also observed a clear interaction effect, as can be expected from the affective priming literature: sentences with emotionally congruent words (e.g., The grandpa is clever) were verified faster than sentences containing emotionally incongruent words (e.g., The grandpa is lonely). The priming effect was most prominent for sentences with positive words, suggesting that both early processing and later meaning integration and situation model construction are modulated by affective processing. In a second rating task we investigated how the emotion potential of supralexical units depends on word valence. The simplest hypothesis predicts that the supralexical affective structure is a linear combination of the valences of the nouns and adjectives (Bestgen, 1994). Overall, our results do not support this: The observed clear interaction effect on ratings indicates that negative adjectives in particular dominated supralexical evaluation, i.e., a sort of negativity bias in sentence evaluation. Future models of sentence processing thus should take interactive affective effects into account.
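The distinction between an additive and an interactive account of sentence valence can be made concrete with a small sketch. It assumes a hypothetical coding of noun and adjective valence as -1 (negative) or +1 (positive) and uses made-up mean ratings; the function name `fit_valence_model` and all numbers are illustrative assumptions and are not the statistical analysis reported in the article. The additive model predicts the rating from the two word valences alone, whereas the interactive model adds their product.

```python
import numpy as np

def fit_valence_model(noun_val, adj_val, rating, with_interaction):
    """Least-squares fit of sentence ratings on noun and adjective valence."""
    columns = [np.ones_like(noun_val), noun_val, adj_val]
    if with_interaction:
        columns.append(noun_val * adj_val)   # noun x adjective product term
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, rating, rcond=None)
    residual = rating - design @ coef
    return coef, float(residual @ residual)  # coefficients, squared error

# Hypothetical coding: -1 = negative word, +1 = positive word.
noun = np.array([-1.0, -1.0, 1.0, 1.0])
adj = np.array([-1.0, 1.0, -1.0, 1.0])
rating = np.array([-2.0, 0.5, -1.5, 2.0])    # made-up mean ratings

coef_add, sse_add = fit_valence_model(noun, adj, rating, with_interaction=False)
coef_int, sse_int = fit_valence_model(noun, adj, rating, with_interaction=True)
print(sse_add, sse_int, coef_int)
```

If the purely linear hypothesis held, the product term would contribute nothing and both fits would leave the same error; a non-zero interaction coefficient, as in this toy example, is the kind of pattern the abstract describes as evidence against a simple linear combination.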

Keywords: sentence comprehension, affective sentence structure, emotional valence, supralexical evaluation, neurocognitive poetics model, affective congruency effect, sentence verification, situation model building

# Introduction

In contrast to a comprehensive neurocognitive poetics model of literary reading (Jacobs, 2011, 2015a,b) most theories of word recognition and sentence processing disregard the role of affective content and emotional experiences. Nevertheless, there is much empirical evidence showing that at the three main levels of psychological description, i.e., experiential, behavioral, and neuronal, the processing of emotion-laden words differs from that of neutral words (for review: Citron, 2012; Jacobs et al., 2015, this issue). Despite differences in experimental designs and measures, "emotional" words are typically understood as words expressing emotions (e.g., sad, lonely, proud, jolly) or possessing "emotional connotations" (e.g., betrayer, nasty, family, successful). Such words are usually characterized within the framework of dimensional models of emotion along the two axes of arousal and valence. Studies on single word processing, which constitute the vast majority of research on affective text processing, have highlighted various processing differences for emotional compared to neutral words in various time windows following word onset (e.g., Kuchinke et al., 2005, 2007; Kissler et al., 2006, 2009; Herbert et al., 2008; Hofmann et al., 2009; Kousta et al., 2009; Schacht and Sommer, 2009; Scott et al., 2009; Palazova et al., 2011; Citron, 2012; Sheikh and Titone, 2013; Kuperman et al., 2014; Recio et al., 2014). In most studies differences are more pronounced for positive compared to negative words (Kuchinke et al., 2005, 2007; Kanske and Kotz, 2007; Estes and Verges, 2008; Larsen et al., 2008; Hofmann et al., 2009; Schacht and Sommer, 2009; Scott et al., 2009; Palazova et al., 2011; Recio et al., 2014).

# Affective Word Processing in Context

In order to develop sufficiently precise neurocognitive models of affective word processing (Jacobs et al., 2015, this issue), such studies on single word processing need to be complemented by research with higher degrees of ecological validity, as shown in recent attempts at more natural approaches to language use and reading (e.g., Altmann et al., 2012; Hsu et al., 2014, 2015a,b,c; Wallot, 2014; Willems, 2015). Up to now, only a few studies have examined affective word processing in sentence contexts. Pioneers in this respect were Fischler and Bradley (2006), who investigated the processing of coherent adjective noun phrases. They largely replicated the ERP differences between emotional (negative and positive) and neutral words in different time windows that are usually found in studies on single word processing. The enhanced processing of emotional compared to neutral words was also replicated for words embedded in whole meaningful sentences (De Pascalis et al., 2009; Holt et al., 2009; Bayer et al., 2010; Martín-Loeches et al., 2012; Scott et al., 2012; Delaney-Busch and Kuperberg, 2013; Ding et al., 2014). Other studies demonstrated that the pronounced effect for positive words could also be observed within phrases and sentences. For example, Schacht and Sommer (2009) examined the processing of positive, negative and neutral verbs in a minimal semantic context (i.e., word pairs) and found that the ERPs for emotional verbs following a single noun reflect enhanced processing compared to neutral ones. Moreover, their behavioral data showed an advantage for positive verbs across different tasks. Furthermore, Jiang et al. (2014) demonstrated that sentences containing a high pleasure adjective lead to shorter reaction times in a valence decision task than sentences containing low pleasure adjectives. This effect occurred irrespective of sentence polarity.
Even when containing a negation, reaction times for sentences with high pleasure adjectives were shorter, although the negation changed the valence of the whole sentence. The behavioral effects were accompanied by an early ERP effect for the emotional adjectives indicating advanced processing for high pleasure adjectives. Besides the main effect of valence, the authors described a significant interaction between word valence and sentence polarity in later time windows, which were more strongly associated with contextual integration. They assumed that after the rapid extraction of word valence, further processing of emotional words, such as their integration into a mental representation of the whole phrase (e.g., a situation model), is influenced by the sentence context. There is empirical evidence that not only does context information influence the processing of emotional words, but the emotional salience of a word can also modulate its integration into sentences or discourse (León et al., 2010; Moreno and Vázquez, 2011; Leuthold et al., 2012; Ding et al., 2014; Hsu et al., 2014, 2015a,b,c). Wang et al. (2013) presented plausible question-answer pairs while varying the emotional salience of the target word and its linguistic focus. Besides typical early ERP effects indicating an initial highly automatic processing advantage of emotional words, an interaction between emotional salience and information structure in the later N400 component was observed. In line with Jiang et al.'s (2014) results, the authors interpreted this interaction as evidence for an attention-emotion interaction at later processing stages associated with the integration of the emotional meaning into the mental representation of the whole sentence (see also De Pascalis et al., 2009; Holt et al., 2009; Martín-Loeches et al., 2012). Whether the emotionality of words influences the processing of following neutral words was tested by Ding et al. (2014).
They presented orthographically correct and incorrect neutral object nouns after emotional and neutral verbs in the context of simple declarative sentences. The ERP effects observed for the orthographic violation differed for emotional compared to neutral context. Whereas, after neutral verbs orthographically incorrect nouns elicited a smaller P2 and a larger N400 compared to correct nouns, only a late positive effect starting at 500 ms was observed after emotional verbs. The authors' interpretation was that emotional words captured and held more attentional processes compared to neutral ones and thus compromised the early processing of following neutral words leading to a general reanalysis especially on perceptual and lexico-semantic levels.

# Does the Processing of Emotional Words Influence the Processing of other Emotional Words?

Despite the growing number of studies investigating effects of embedded emotional words, it is not yet clear to what extent the processing of such words influences the processing of other emotional words presented in the same sentence. In single word processing, effects of the emotional connotation of one word on the processing of a following word are usually described as affective priming (cf. Fazio, 2001; De Houwer et al., 2009). Since Fazio et al. (1986) first described a processing advantage for emotional target words following an emotionally congruent prime in an evaluative decision task, subsequent research has replicated and extended the original findings many times (see Klauer and Musch, 2003, for a review). Typically, faster and less error-prone responses are observed when prime and target are affectively congruent (i.e., positive–positive, negative–negative) than when they are incongruent (i.e., positive–negative, negative–positive). Most importantly, affective priming effects have also been found for (implicit) tasks not focusing on the processing of the emotional meaning, e.g., naming and lexical decision (Hill and Kemp-Wheeler, 1989; Bargh et al., 1996; De Houwer and Randell, 2004; Spruyt et al., 2007). These studies support the view that affective priming effects can be explained by a pre-activation of evaluatively congruent targets through spreading activation within a semantic network or by semantic pattern priming in a distributed memory system (Bargh et al., 1996; Fazio, 2001; Spruyt et al., 2007; see Hofmann and Jacobs, 2014, for a neurocomputational model implementing such a mechanism). Especially in evaluative decision tasks, processes related to response priming also seem relevant. Here it is assumed that affective primes automatically activate the corresponding evaluative response, which is the correct one in congruent trials but the incorrect one in incongruent trials (Klauer and Musch, 2003). Klauer et al.
(2005) assumed that response-related priming should be larger than semantically mediated priming effects. This is in line with the observation that in tasks in which the affective prime-target congruency is unrelated to the response and the task at hand, affective priming effects could not be observed reliably (Klauer and Musch, 2003). Primarily because of this last finding, it remains unclear whether the processing of emotional words is mutually interrelated when they are embedded in sentences. Ding et al. (2014) demonstrated that emotional words influenced the processing of upcoming neutral words. Whether these influences differ for upcoming emotional words, as suggested by semantically mediated affective priming effects reported for single word processing, is an empirical question. Up to now only Fischler and Bradley's (2006) study has explored possible interaction effects. They reported no significant congruence effects on ERPs recorded for positive, neutral, and negative nouns following positive, neutral, or negative adjectives. The ERPs observed for the nouns did not differ as a function of the emotional meaning of the preceding adjective, irrespective of whether the two consecutive words were processed as phrases or as single words. As discussed by the authors, one reason for the absence of any congruence effects could be the presentation mode, i.e., the ERP-typical serial presentation of words separated by a blank monitor. As shown by Hermans et al. (2001), affective priming effects are moderated by the stimulus onset asynchrony (SOA) between prime and target. They appear to be based on fast-acting automatic processes, are quite short-lived, and thus should only be observed reliably for SOAs below 300 ms. Fischler and Bradley, however, used longer SOAs of 750 ms between the emotional words. As discussed by the authors, these circumstances might have minimized the possibility of observing any interaction.

In spite of the possibility that affective priming is limited to tasks focusing on valence evaluation (Klauer and Musch, 2003), results from a recent study on sentence comprehension indicated that the processing of emotional words embedded in sentences could be influenced by emotional information delivered by preceding words. León et al. (2010) measured ERPs for positive or negative adjectives qualifying the emotion of protagonists that were introduced in narratives describing emotional episodes presented before the critical sentences. The protagonist's emotions mentioned in the critical sentences were either consistent or inconsistent with the preceding story. Inconsistent emotions were found to elicit larger N1/P2 and N400 complexes than consistent emotions, indicating clear interactions between the emotional valence of the critical word and the emotional meaning of the context. Given the significant temporal interval between the reading of the context story and the on-line sentence processing, the authors interpreted the congruency effects as a discourse level phenomenon. Although the authors argued that such effects could not be observed in isolated words, it is unclear whether these effects could also be due to some form of long-term affective priming based on an automatic spread of evaluative activation (e.g., Eder et al., 2012).

# Affective Meaning Making at the Supralexical Level

Most of the studies presented above focus on the early processing of emotional words or their integration into sentence context. The issue of how the affective meaning of phrases or sentences as supralexical units is constructed from the words constituting this unit was not explored. Based on the logico-philosophical tradition since Frege, according to which the literal meaning of a sentence can be determined by the meanings of its parts and their syntactical combination, it can be assumed that the emotion potential of supralexical units is a (linear or non-linear) function of the emotion potential of the words included therein (Hsu et al., 2015c; Jacobs, 2015a,b). Accordingly, the simplest model predicting the emotion potential of a sentence should take into account only the emotional or connotative meaning of its component words while neglecting other potentially relevant influences like their syntactic role or the constituents' order (Jacobs, 2015a). Following this account, a simple declarative sentence containing a positive noun and a negative adjective like "The mother is bad" should, on average, be evaluated as neutral. First empirical evidence for this "null-model" of supralexical affective meaning was obtained by Bestgen (1994) and Whissell (1994), both demonstrating that the valence of supralexical units could be predicted, to a considerable extent, as a function of the emotional or connotative meaning of their component words.
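
The additive "null-model" described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `null_model_valence` is hypothetical, the unweighted mean stands in for the linear combination, and the word valences are made-up values on a −3 to +3 scale, not ratings taken from the BAWL.

```python
from statistics import mean


def null_model_valence(word_valences):
    """Predict sentence valence as the unweighted mean of its word valences.

    Assumption: equal weights for noun and adjective; syntactic role and
    word order are ignored, as in the simplest additive account.
    """
    if not word_valences:
        raise ValueError("need at least one word valence")
    return mean(word_valences)


# Hypothetical valences on a -3..+3 scale (illustrative, not BAWL ratings).
mother = 2.0   # positive noun
bad = -2.0     # negative adjective

# A positive noun and an equally negative adjective cancel out.
print(null_model_valence([mother, bad]))
```

Under this sketch, a sentence with two positive words is predicted to be rated higher than a mixed sentence, which in turn is rated higher than a sentence with two negative words; any systematic departure from that ordering would speak against a purely additive account.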

This simplest model does not take into account other potentially relevant variables, e.g., different syntactic roles of words, which could also be relevant for affective meaning construction. The simple sentence "The mother is bad" contains a noun and an adjective. While nouns occur as the head of a noun phrase and refer to concrete entities such as people or things, adjectives characterize noun phrases and modify their meaning. Therefore, especially evaluative and emotive adjectives denoting specific features of possible noun referents may induce deeper elaborative affective processing than nouns. Empirical evidence for this assumption comes from an ERP study by Palazova et al. (2011) reporting differences in emotional effects in single word processing of nouns and adjectives. Besides an early effect for emotional compared to neutral words for both nouns and adjectives, Palazova et al. observed a second emotion effect around 450 ms, but this time only for adjectives. They interpreted this Late Positivity Complex (LPC) observed for positive compared to both neutral and negative adjectives as an index of sustained and elaborate processing of the emotional aspects of adjectives, which was possibly not induced by nouns. Thus, if an adjective is presented not in isolation but as a predicative adjective modulating the meaning of a noun, it can be assumed that the emotional meaning of the adjective dominates the affective meaning of the supralexical unit as a whole. A recent study by Liu et al. (2013) demonstrated such a dominance effect for emotional adjectives. They compared valence evaluations for positive, neutral, and negative nouns, which were read either after a positive or negative adjective or in a non-context condition without a preceding word. Although the participants evaluated only the emotional valence of the noun, preceding emotional adjectives modulated the results.
Positive adjectives biased the noun evaluation toward stronger positive ratings compared to evaluations for isolated nouns whereas negative adjectives led to stronger negative noun evaluations compared to the non-context condition. The modulation effect was greatest when the preceding adjective was negative and the to-be-evaluated noun was positive. This superiority effect for negative adjectives is in line with the often observed negativity bias describing the stronger impact of negatively valenced compared to positively valenced events on different evaluation and attention related processes (for an overview see Baumeister et al., 2001). It is assumed, that the negativity bias operates especially at the evaluative-categorization stage (Ito et al., 1998) and that negative information therefore dominates the evaluation of combinations of negative and positive entities yielding more negative evaluations than the algebraic sum of individual valences would predict (Rozin and Royzman, 2001). Taken together, the assumed dominance of emotional adjectives in the evaluation of simple supralexical units and the ubiquitous negativity bias lead to the prediction, that valence ratings of simple declarative sentences like "The mother is bad" should be characterized by a negativity bias especially for adjectives. Challenging the simple null model outlined above, this prediction includes an interactivity assumption of affective word processing.

# Aims of the Present Study

The present study was designed to investigate to what extent the processing of emotional words within a sentence context shows interactive effects. It is now a well-established result that both the early processing of a word as well as the following integration into the context of a phrase, sentence, or short discourse can be modulated by the emotional connotation of that word. Whether or not the processing of an emotional word is also influenced by the processing of other emotional words presented in the same sentence remains, however, an open question. Results from the field of affective priming suggest that interactive effects are possible even in tasks not focusing on the emotional meaning of the words or phrases. However, possible congruency effects are very short-lived. To observe reliable interaction effects, it seems necessary that the crucial words are processed within a time window of about 300 ms. In order to test such interaction effects between emotional words we therefore presented simple declarative sentences containing a noun and an adjective (e.g., The grandpa is clever) separated only by a short auxiliary verb. Combined with a self-paced reading paradigm this ensured that the critical time window was obtained. To investigate the influence of the emotional connotation the valence of the nouns and adjectives was manipulated using the Berlin Affective Word List (BAWL; Võ et al., 2009; Jacobs et al., this issue). To ensure processing at sentence level participants had to decide whether a sentence was meaningful (e.g., The grandpa is clever) or not (e.g., The cheese is intelligent).

Besides the question of whether the early processing of emotional words presented in sentences is interrelated, we were also interested in how the affective meaning of single words influences the interpretation of the sentence as a supralexical unit. To investigate this aspect of meaning construction, we extended the present study with a second task requiring explicit and deep processing of the affective meaning of words. The meaningful emotional sentences used in the sentence verification task were presented again, but this time participants had to rate the emotional valence and arousal of the whole sentences as supralexical units.

# Hypotheses

Although a sentence verification task does not require explicit processing of the affective meaning of words to yield correct responses, we generally expected that the emotional valence of nouns and adjectives automatically influences the reaction times. Empirical research on single word processing provides ample demonstrations that even a superficial semantic elaboration as required for lexical decisions is sufficient for observing emotional effects (Jacobs et al., 2015, this issue). Based on the above-mentioned literature, we therefore anticipated shorter verification times for sentences containing emotional adjectives compared to sentences with neutral ones. Moreover, due to the often-reported processing advantage of positive over negative words, we also expected shorter verification times for sentences with positive compared to negative adjectives and nouns, i.e., a positivity superiority effect. If the emotionality of a word basically influenced its early processing (e.g., Recio et al., 2014)—and if the positivity superiority was a general phenomenon—the following rank order of verification times should be obtained: emotionally congruent sentences with no positive word > emotionally incongruent sentences with only one positive word > emotionally congruent sentences with two positive words. If, on the other hand, the processing of emotional nouns and adjectives interacted, as suggested by the affective priming literature, verification times should be faster for sentences with emotionally congruent words (e.g., sentences with positive nouns followed by positive adjectives) than for emotionally incongruent sentences and sentences with neutral adjectives. This rank order should be reflected in a significant interaction between the two variables indicating the valence of the noun (valence group—noun) and the adjective (valence group—adjective). 
To ensure that a potential interaction effect depends primarily on the emotional relation between nouns and adjectives, we controlled the semantic associations between them. As shown neurocomputationally by Hofmann and Jacobs (2014), direct first-order co-occurrences of two words are valid indices of their semantic association. We thus matched the sentence-based first-order co-occurrences for nouns and adjectives in all six experimental conditions.

For the evaluation task the simple "null-model" presented above predicts that the valence ratings conducted in the second part of the study should depend equally on the valence of both of their constituents, the noun as well as the adjective. More precisely, lower valence ratings should be observed for sentences with negative compared to positive adjectives and nouns. This should lead to the following rank order of the sentence valence ratings: sentences with two negative words < sentences with only one negative word < sentences with only one positive word < sentences with two positive words.

Taking into account other potentially relevant variables especially the syntactic role, different predictions arise. The assumed dominance of emotional adjectives in the evaluation of simple supralexical units and the ubiquitous negativity bias would predict that the valence ratings of our simple declarative sentences show a negativity bias especially for adjectives. We thus hypothesized that sentences with a negative adjective were evaluated as strongly negative, while the valence of the nouns should have only a minor influence. For sentences with neutral and positive adjectives, the overall valence evaluation should also be influenced by both the emotional connotation of the adjectives and the nouns. If these adjectives were preceded by negative nouns, the overall valence ratings should be biased in a more negative direction compared to sentences with positive nouns.

# Methods

#### Participants

Thirty-six participants (21 female; age: M = 28.36, SD = 6.93, range = 20–47), all native German speakers, were recruited at the Freie Universität Berlin. All had normal or corrected-to-normal vision and were paid for participation. The whole experiment followed the ethical guidelines of the German Psychological Society (DGPs, 2004, CIII). Participants were informed that they were taking part in research, that they could quit the experiment at any time without disadvantage, and that all data were collected and analyzed anonymously. They provided informed consent and allowed us to use their collected data anonymously for publications.

#### Stimuli

Ninety-six item sets were constructed, each containing six different meaningful declarative sentences. Each of the resulting 576 different experimental sentences contained a noun in the subject position, the auxiliary verb be, and an adjective describing a feature of the noun. To construct the different sentences, we selected a positive and a negative noun as well as a positive, a neutral, and a negative adjective for each item set from the extended version of the BAWL-Reloaded (Conrad et al., unpublished), which included valence ratings on a seven-point rating scale (ranging from −3 to 3) for over 6000 words. The selection was based on the following criteria: (1) mean rating of emotional valence in one of three emotional valence categories: negative (mean rating < −1.0), neutral (−0.5 < mean rating < 0.5), or positive (mean rating > 1.0); (2) no differences in the standard deviation of single word ratings between groups of positive and negative words; and (3) the combination of the positive and the negative noun with the positive, neutral, and negative adjective should yield six different meaningful sentences per item set. **Table 1** gives examples for the six combinations in one item set. **Table 2** reports means and standard deviations for important features of the words used in the emotional valence categories of nouns and adjectives.
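
Selection criterion (1) amounts to a simple threshold rule on the mean valence rating. The sketch below illustrates it; the function name `valence_category` is hypothetical, and returning `None` for ratings that fall between the categories is an assumption about how unusable words would be handled.

```python
def valence_category(mean_rating):
    """Map a mean valence rating (-3..+3 scale) to a category.

    Thresholds follow criterion (1): negative < -1.0, neutral strictly
    between -0.5 and 0.5, positive > 1.0. Ratings in the gaps between
    categories return None (assumed here to mean "not selected").
    """
    if mean_rating < -1.0:
        return "negative"
    if -0.5 < mean_rating < 0.5:
        return "neutral"
    if mean_rating > 1.0:
        return "positive"
    return None


# Illustrative ratings, not taken from the BAWL.
for rating in (-1.5, 0.0, 0.8, 1.2):
    print(rating, valence_category(rating))
```

The gaps between the categories (−1.0 to −0.5 and 0.5 to 1.0) keep the three groups clearly separated on the valence scale.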

Nouns in the two emotional valence categories did not differ significantly in mean frequency [t(95) = −0.42, p = 0.68], number of letters [t(95) = −0.83, p = 0.41], number of syllables [t(95) = −0.19, p = 0.85], mean imageability ratings [t(86) = 0.55, p = 0.58], and standard deviations for valence ratings [t(71) = −1.43, p = 0.16]. In line with our selection criteria, the mean valence ratings for the nouns differed significantly [t(95) = 50.12, p < 0.0001]. Due to the typical asymmetrical inverted U-shaped relation between valence and arousal ratings of the BAWL (e.g., Võ et al., 2009), the mean arousal ratings for the group of positive and negative nouns also differed significantly [t(93) = −8.71, p < 0.001]. Adjectives in the three different emotional valence categories did not differ significantly in mean frequency [F(2, 94) = 0.64, p = 0.53], number of letters [F(2, 94) = 0.88, p = 0.42], number of syllables [F(2, 94) = 0.04, p = 0.96], and mean imageability ratings [F(2, 74) = 0.01, p = 0.99]. The standard deviations of the valence ratings were equal for positive and negative adjectives [t(62) = 1.83, p = 0.14], whereas neutral adjectives had significantly higher standard deviations [negative vs. neutral: t(69) = −3.33, p = 0.001; positive vs. neutral: t(69) = −5.16, p < 0.001]. As for the nouns, mean valence ratings differed across the three emotional valence categories [positive vs. negative: t(95) = 55.84, p < 0.0001; positive vs. neutral: t(95) = 29.77, p < 0.0001; negative vs. neutral: t(95) = −33.81, p < 0.0001]. The arousal ratings differed between positive and negative adjectives [t(95) = −6.67, p < 0.0001], and neutral vs. negative adjectives [t(95) = 6.14, p < 0.0001], but not for positive vs. neutral adjectives [t(95) = 0.09, p = 0.93]. To control for different semantic relations between nouns and adjectives in the sentences of each condition, sentence-based co-occurrence measures were collected from the German corpus of the "Wortschatz" project (http://corpora.informatik.uni-leipzig.de/, status: December 2006; Quasthoff et al., 2006). There were no significant differences between the six conditions [F(5, 567) = 0.60, p = 0.70; all pairwise comparisons using Tukey's honestly significant difference tests were also not significant (all p > 0.69)]. Moreover, the mean sentence-based co-occurrences (see **Table 1**) indicated no significant semantic associations between nouns and adjectives in most sentences: in only 4% of all sentences did the co-occurrence measures exceed the critical value indicating a semantic association (Hofmann et al., 2011). Ninety-six additional nonsense filler sentences were constructed. All of them followed the structure of the meaningful declarative sentences with the exception that nonsense "meanings" were generated by combining animate nouns with adjectives describing features of inanimate objects or vice versa (e.g., The milk is careful).

#### TABLE 1 | Example sentences and mean co-occurrence measures (Ms and SDs) for each of the six conditions.

Sentence-based co-occurrence measures were taken from the German corpus of the "Wortschatz" project (http://corpora.informatik.uni-leipzig.de/, status: December 2006; Quasthoff et al., 2006).

#### TABLE 2 | Stimulus characteristics (Ms and SDs).

<sup>a</sup>Word frequencies were taken from the dlexDB database (Heister et al., 2011).

<sup>b</sup>Ratings were taken from the extended version of the Berlin Affective Word List–Reloaded (Conrad et al., unpublished).

#### Design and Procedure

The study was divided into two parts: a sentence verification task followed by a rating task. In both parts each participant read only one of the six sentences of an item set. The 96 experimental item sets were assigned to six groups, the 36 participants to six groups, and the assignment of versions to both groups followed a 6 × 6 × 6 Latin square. We employed a 2 (valence group–noun) × 3 (valence group–adjective) design with both variables being manipulated within participants and item sets. Each participant verified 16 sentences in each of the six conditions. Each of the six sentences per item set was verified by eight participants. The sentences used in the verification task were presented again in the rating task. To prevent fatigue and increase the reliability and ecological validity of the evaluations, the participants rated only half of the experimental stimuli. We therefore divided each of the six participant groups further into two subgroups. One subgroup rated the first half of the experimental sentences for this participant group, the other subgroup rated the second half. Each participant evaluated eight sentences per condition and each sentence was rated by four participants.
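
The counterbalancing logic of such a Latin square can be sketched as follows. The modulo-based grouping of participants and item sets and the function names are assumptions made for illustration; the text does not state how groups were actually formed. The sketch only checks the one property stated above: each participant sees 16 sentences in each of the six conditions.

```python
from collections import Counter

N_VERSIONS = 6       # 2 noun valences x 3 adjective valences
N_ITEM_SETS = 96
N_PARTICIPANTS = 36


def version_for(participant, item_set):
    """Return the sentence version (0-5) shown to a participant for an item set.

    Assumption: groups are formed by taking indices modulo 6, and the version
    is rotated across participant and item groups in Latin-square fashion.
    """
    participant_group = participant % N_VERSIONS
    item_group = item_set % N_VERSIONS
    return (participant_group + item_group) % N_VERSIONS


def versions_seen(participant):
    """Count how many item sets a participant sees in each version."""
    return Counter(version_for(participant, i) for i in range(N_ITEM_SETS))


# Every participant should see 96 / 6 = 16 sentences per condition.
print(versions_seen(0))
```

Because each participant reads exactly one version per item set, no one encounters the same noun or adjective set in two different conditions.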

In the sentence verification task, each trial started with the presentation of a blank monitor for 1000 ms, followed by a fixation cross in the center of the screen for 800 ms. After an additional blank screen of 800 ms, a sentence appeared in the center in black letters on a white background (in 20 point Arial font). All sentences appeared in one line. The participants were instructed to decide as quickly and accurately as possible whether the presented sentence made sense or not. They indicated their responses by pressing the left or the right arrow key with the left or right index finger. Upon response registration, the sentence disappeared. During the experimental trials participants were not given any feedback on their responses. However, the verification task started with 10 practice trials, which included feedback. Then the 96 experimental sentences were presented in random order intermixed with the 96 nonsense filler sentences. Sentence order was fully randomized individually for each participant.

The sentence ratings started with two examples. Emotional valence was rated on a nine-point scale ranging from very negative (−3) to neutral to very positive (+3). In addition to the verbal anchor, the valence scale of the Self-Assessment Manikin (SAM; Bradley and Lang, 1994) was presented. Arousal was rated on a five-point scale ranging from 1 [ruhig (very calm)] to 5 [aufregend (exciting)], again using the corresponding SAM-scale (cf. Schmidtke et al., 2014). Each trial in the rating part started with the presentation of a sentence at the top of the screen together with the valence scale. Participants used the numbers of the keyboard to indicate their response. Then the arousal scale appeared below. Both ratings and reaction times were recorded. An experimental session took approximately 35 min.

#### Analysis

All analyses are based only on the results of the experimental sentences. Results for filler sentences were discarded. Following recent recommendations for psycholinguistic experiments, statistical analyses were conducted using linear mixed effects regression models (e.g., Baayen, 2008; Baayen et al., 2008; Jaeger, 2008). These were run in R version 3.10 (R Core Team, 2014) employing models with crossed effects of subjects and item sets using the lme4 package (Bates et al., 2014). For the analysis of verification times, as well as valence and arousal ratings of the whole sentences, the fixed effects in the models included the categorical variables valence group-noun (VG-N: positive vs. negative) and valence group-adjective (VG-A: negative vs. neutral vs. positive). Moreover, due to the reported arousal differences between valence categories, the arousal ratings for nouns (ARO-N) and adjectives (ARO-A) taken from the extended BAWL (Conrad et al., unpublished) were included as metrical covariates. To avoid collinearity and maximize the likelihood of model convergence, both covariates were centered prior to analysis (Baayen, 2008). Fixed effects were checked with Wald F-tests with a Kenward–Roger approximation of degrees of freedom. Random intercepts were included for subjects and item sets with, if possible, maximal random slopes (Barr et al., 2013). Error rates were analyzed using a logistic link function (Jaeger, 2008). For the sake of conciseness, only significant tests associated with fixed effects are reported, as these are directly relevant to our hypotheses.
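
The model structure described above can be written out as a rough sketch. The notation is an assumption for illustration only: $y_{si}$ is the dependent measure for subject $s$ and item set $i$, the adjective factor is dummy-coded with negative as an arbitrarily chosen reference level, $u_s$ and $w_i$ are the random intercepts for subjects and item sets, and the random slopes mentioned in the text are omitted for brevity. The coding scheme actually used is not stated in the text.

$$
\begin{aligned}
y_{si} ={}& \beta_0 + \beta_1\,\text{VG-N}_{si} + \beta_2\,\text{VG-A}^{\text{neu}}_{si} + \beta_3\,\text{VG-A}^{\text{pos}}_{si} \\
&+ \beta_4\,\text{VG-N}_{si}\,\text{VG-A}^{\text{neu}}_{si} + \beta_5\,\text{VG-N}_{si}\,\text{VG-A}^{\text{pos}}_{si} \\
&+ \beta_6\,\bigl(\text{ARO-N}_{si} - \overline{\text{ARO-N}}\bigr) + \beta_7\,\bigl(\text{ARO-A}_{si} - \overline{\text{ARO-A}}\bigr) \\
&+ u_s + w_i + \varepsilon_{si}
\end{aligned}
$$

In this sketch, $\beta_4$ and $\beta_5$ carry the noun × adjective interaction that the hypotheses concern, while $\beta_6$ and $\beta_7$ are the slopes of the centered arousal covariates.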

To further test our hypotheses, single contrasts based on the glht-function of the multcomp package of R (Hothorn et al., 2008) were calculated for verification times and sentence evaluation ratings. To test the assumption of an affective priming effect in the sentence verification task, we first compared verification times of both types of emotionally congruent sentences (sentences with only positive or only negative words) with those for both types of incongruent ones (positive nouns followed by negative adjectives and vice versa). Afterwards, further single contrasts were calculated to test the affective priming for positive and negative adjectives. A direct comparison of the priming effects for positive vs. negative adjectives was done with the testInteraction-function of the phia package (De Rosario-Martinez, 2013). Both the test of single contrasts and the testInteraction-function were also used to analyze the valence ratings of the evaluation task.

# Results

A first analysis of the accuracy data from the verification task showed that six of the 576 experimental sentences were falsely verified by more than 80% of the participants. These were excluded from further analysis. After the elimination, mean accuracy in the verification task was 94.53% (SD = 2.75) for the experimental sentences and 92.07% (SD = 5.18) for the filler sentences.

# Sentence Verification

To analyze the accuracy data, models with a logistic link function and random intercepts for subjects and item sets were calculated. To test the fixed effects of the two categorical variables VG-N and VG-A, their interaction, and the two covariates ARO-N and ARO-A, Wald chi-squared statistics were calculated. We observed neither significant main effects for the two categorical variables VG-N (χ<sup>2</sup> = 1.81, p = 0.18) and VG-A (χ<sup>2</sup> = 4.61, p = 0.10), nor a significant interaction effect between them (χ<sup>2</sup> = 2.34, p = 0.31), and also no significant main effects for the covariates (ARO-N: χ<sup>2</sup> = 2.00, p = 0.16; ARO-A: χ<sup>2</sup> = 0.48, p = 0.49). Hence, the plausibility of the experimental sentences appears comparable across all six conditions.

Only correct responses were included in the analysis of verification times. After eliminating extreme verification times of over 15,000 ms, response times more than 3 standard deviations above a participant's and an item's mean were excluded (0.38% of correct responses). Because the distribution of verification times was right-skewed, the Box–Cox transformation test was conducted to identify an optimal transformation to improve normality of the distribution (Box and Cox, 1964). The test strongly suggested that reciprocal RTs, but not log-transformed RTs, are in a metric compatible with the normal-distribution assumption. Therefore, the linear mixed models were performed on 1/RT transformed verification times. We repeated the LMM analyses reported below using the untransformed response times instead of the reciprocal transformation and found essentially the same results.
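The trimming and transformation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the field names (`subject`, `item`, `rt`) and the example trials are hypothetical, and a trial is dropped here if it exceeds either the by-subject or the by-item cutoff, which is one possible reading of the exclusion rule.

```python
from collections import defaultdict
from statistics import mean, stdev

ABSOLUTE_CUTOFF_MS = 15_000  # extreme-value cutoff stated in the text
SD_CRITERION = 3             # standard-deviation criterion stated in the text


def upper_cutoffs(trials, key):
    """Return mean + 3 SD of RT for each level of `key` (subject or item)."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[trial[key]].append(trial["rt"])
    cutoffs = {}
    for level, rts in grouped.items():
        # With a single observation no SD exists, so nothing is trimmed.
        if len(rts) > 1:
            cutoffs[level] = mean(rts) + SD_CRITERION * stdev(rts)
        else:
            cutoffs[level] = float("inf")
    return cutoffs


def trim_and_transform(trials):
    """Apply both exclusion steps, then add the reciprocal RT (1/RT)."""
    kept = [t for t in trials if t["rt"] <= ABSOLUTE_CUTOFF_MS]
    by_subject = upper_cutoffs(kept, "subject")
    by_item = upper_cutoffs(kept, "item")
    kept = [
        t for t in kept
        if t["rt"] <= by_subject[t["subject"]] and t["rt"] <= by_item[t["item"]]
    ]
    return [dict(t, inv_rt=1.0 / t["rt"]) for t in kept]


# Hypothetical correct-response trials from one participant.
trials = [{"subject": "s1", "item": f"i{k}", "rt": 1000.0} for k in range(11)]
trials.append({"subject": "s1", "item": "i11", "rt": 5000.0})   # above mean + 3 SD
trials.append({"subject": "s1", "item": "i12", "rt": 20000.0})  # above 15,000 ms

cleaned = trim_and_transform(trials)
print(len(cleaned), "trials kept")
```

Note that the standard-deviation cutoffs are computed only after the absolute cutoff has been applied, mirroring the order given in the text.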

In a first step, the appropriate random effect structure was determined, starting with a model containing a maximal random effects structure with by-subject and by-item set intercepts, as well as by-subject and by-item set slopes for VG-A, VG-N, their interaction, and the covariates ARO-N and ARO-A (Barr et al., 2013). Slopes were eliminated stepwise: at each step, the fit of the models with and without the random effect in question was compared with a chi-square test using the R function anova (Crawley, 2007). If the removal of a slope caused no significant difference in fit, the random effect was eliminated. In the end, the analysis of the fixed effects was done with a model including both intercepts and the by-item set slopes for VG-A and VG-N.

Estimates of the fixed effects based on effect coding for the two categorical predictors are reported in **Table 3**. The analysis yielded significant main effects for the factors VG-N [F(1, 120.65) = 4.69, p = 0.03] and VG-A [F(1, 101.80) = 3.93, p = 0.02]. In line with the hypothesized positivity superiority effect, verification times for sentences with positive nouns (M = 1454.54, SD = 792.82) were shorter than those for sentences with negative nouns (M = 1485.562, SD = 792.82). Moreover, sentences containing a positive adjective (M = 1440.58, SD = 751.59) were also verified faster than sentences with neutral (M = 1497.46, SD = 758.43) and negative adjectives (M = 1473.01, SD = 808.46). Pairwise comparisons with Tukey's contrast showed that only the difference between sentences with positive vs. neutral adjectives was significant (z = 2.76, p = 0.02). The differences between sentences with positive and negative adjectives (z = 1.71, p = 0.20), and with neutral and negative adjectives (z = −1.04, p = 0.55) were not significant. The estimates for the two metric covariates ARO-N and ARO-A indicated slightly positive relationships, but both effects were only marginally significant [ARO-N: F(1, 178.13) = 3.51, p = 0.06; ARO-A: F(1, 263.18) = 2.87, p = 0.09]. Most important, there was a highly significant interaction effect between the categorical variables VG-N and VG-A, as illustrated in **Figure 1**. The biggest differences occurred for sentences containing positive adjectives. Sentences were verified faster when positive adjectives were read after positive nouns (M = 1376.18, SD = 811.71) than after negative ones (M = 1505.35, SD = 681.30). Sentences with neutral adjectives following positive nouns (M = 1481.26, SD = 687.02) were verified slightly faster than sentences with neutral adjectives after negative nouns (M = 1513.56, SD = 823.67). For sentences containing negative adjectives, a reverse effect was observed. They were verified faster when the negative adjectives were read after negative nouns (M = 1437.27, SD = 739.31) than after positive ones (M = 1507.68, SD = 869.59).

\*Effect coding was used for the categorical predictors VG-N and VG-A. Factor VG-A has three factor levels. Therefore, two fixed effects are reported, labeled VG-A1 and VG-A2.

<sup>a</sup>Verification times were 1/RT transformed. The random effects included the intercepts for item set and subject, together with by-item set slopes for VG-N, VG-A1, and VG-A2.

<sup>b</sup>Valence ratings and arousal ratings were squared transformed. The random effects included the intercepts for item set and subject, together with by-subject slopes for VG-A1 and VG-A2, and by-item set slopes for VG-N, VG-A1, VG-A2, and the interactions between VG-N and VG-A.

To test the assumption of affective priming, verification times of both types of emotionally congruent sentences were compared with those for both types of incongruent ones. Emotionally congruent sentences were verified significantly faster than emotionally incongruent ones (z = −2.66, p = 0.008). Further contrasts showed that the priming effect occurred only for sentences with positive adjectives (z = 4.37, p < 0.0001), indicating emotional priming for positive adjectives after positive nouns. Although the described differences for negative adjectives were also compatible with an emotional priming effect after an emotionally congruent noun, the difference was not significant (z = −0.13, p = 0.90). Moreover, the affective priming effect for positive adjectives was significantly stronger than that for negative adjectives (χ<sup>2</sup> = −15.41, p < 0.0001). The difference observed between both types of sentences with neutral adjectives was not significant (z = 0.55, p = 0.58). To test whether the observed emotional priming effect for positive adjectives after positive nouns indicated a processing advantage, we compared verification times for sentences with positive adjectives after positive nouns to those for sentences with neutral adjectives after positive nouns. Again, this difference was significant (z = −4.23, p < 0.0001).
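The logic of these contrasts can be made concrete with a small Python sketch that applies contrast weights to the raw cell means reported above. It is descriptive only: the reported tests were computed on model estimates for 1/RT transformed times, so the millisecond differences below illustrate the direction and pattern of the effects, not the test statistics. The variable and function names are illustrative.

```python
# Raw cell means (ms) reported in the text, keyed by
# (noun valence, adjective valence).
cell_means = {
    ("positive", "positive"): 1376.18,
    ("negative", "positive"): 1505.35,
    ("negative", "negative"): 1437.27,
    ("positive", "negative"): 1507.68,
}

# Congruent cells get +1/2, incongruent cells -1/2, so the weights sum to zero.
weights = {
    cell: (0.5 if cell[0] == cell[1] else -0.5) for cell in cell_means
}


def contrast(means, w):
    """Weighted sum of cell means for one contrast."""
    return sum(w[cell] * means[cell] for cell in means)


# Overall congruency contrast: congruent minus incongruent (negative = faster).
congruency = contrast(cell_means, weights)

# Priming separately for positive and for negative adjectives.
priming_positive = (
    cell_means[("positive", "positive")] - cell_means[("negative", "positive")]
)
priming_negative = (
    cell_means[("negative", "negative")] - cell_means[("positive", "negative")]
)

print(f"congruent - incongruent: {congruency:.2f} ms")
print(f"priming, positive adjectives: {priming_positive:.2f} ms")
print(f"priming, negative adjectives: {priming_negative:.2f} ms")
```

On the raw means, the congruency advantage is about 100 ms overall, roughly 129 ms for positive adjectives and roughly 70 ms for negative adjectives.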

#### Valence Ratings

Since valence ratings for the whole sentences were not normally distributed, analyses were performed on squared-transformed values as indicated by the Box–Cox transformation test (Box and Cox, 1964). Again, the first step was the identification of the appropriate random effect structure, starting with a model containing a maximal random effects structure with by-subject and by-item set intercepts, as well as by-subject and by-item set slopes for VG-A, VG-N, their interaction, and the covariates ARO-N and ARO-A (Barr et al., 2013). The stepwise elimination of the slopes, together with a comparison of the fit of the model with and without each random effect, indicated that only the by-subject slope for VG-A and the by-item set slopes for VG-N, VG-A, and the interaction between both should be included in the analysis of the fixed effects.

Estimates of the fixed effects based on effect coding for the two categorical predictors are reported in **Table 3**. There was a significant main effect for the factor VG-N [F(1, 119.11) = 61.63, p < 0.0001], indicating lower valence ratings for sentences with negative (M = −0.98, SD = 2.38) compared to positive nouns (M = 0.06, SD = 2.56). The main effect for VG-A was also significant [F(1, 35.27) = 73.97, p < 0.0001]. Sentences with negative adjectives (M = −1.92, SD = 2.10) yielded lower ratings than sentences with neutral ones (M = −0.36, SD = 2.22), which were rated lower than sentences with positive adjectives (M = 0.89, SD = 2.40). Pairwise comparisons with Tukey's contrast showed that all differences were significant (z > 7.80, p < 0.0001). The covariates ARO-N [F(1, 265.33) = 0.20, p = 0.65] and ARO-A [F(1, 175.34) = 0.01, p = 0.91] had no predictive power. Again, as for verification times, there was a highly significant interaction between VG-N and VG-A [F(1, 88.81) = 21.72, p < 0.0001], illustrated in **Figure 1**. Sentences with negative adjectives following negative nouns (M = −1.91, SD = 2.17) were rated as negatively as sentences with negative adjectives after positive nouns (M = −1.94, SD = 2.03; single contrast: z = 0.07, p = 0.95). Sentences with neutral adjectives after positive nouns were rated significantly more positively (M = 0.20, SD = 2.13) than sentences with neutral adjectives after negative nouns (M = −0.94, SD = 2.17; single contrast: z = 6.38, p < 0.0001). The same pattern was observed for sentences with positive adjectives. Ratings were higher for sentences with positive adjectives after positive (M = 1.91, SD = 1.87) than after negative nouns (M = −0.12, SD = 2.45; single contrast: z = 9.05, p < 0.0001). The differences between the two sentence types with positive adjectives were stronger than those between the two types of sentences with neutral adjectives (χ<sup>2</sup> = 9.14, p = 0.003).
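The shape of this interaction can be seen by computing the effect of noun valence separately at each adjective level from the raw cell means reported above. The Python sketch below is purely descriptive: the reported analyses used squared-transformed ratings and model-based contrasts, and the names used here are illustrative. If noun and adjective valence combined purely additively, the three differences would be equal.

```python
# Raw mean valence ratings reported in the text, keyed by
# (noun valence, adjective valence).
ratings = {
    ("positive", "negative"): -1.94,
    ("negative", "negative"): -1.91,
    ("positive", "neutral"): 0.20,
    ("negative", "neutral"): -0.94,
    ("positive", "positive"): 1.91,
    ("negative", "positive"): -0.12,
}


def noun_effect(adjective):
    """Rating difference between positive and negative nouns for one adjective level."""
    return ratings[("positive", adjective)] - ratings[("negative", adjective)]


effects = {
    adjective: round(noun_effect(adjective), 2)
    for adjective in ("negative", "neutral", "positive")
}

for adjective, effect in effects.items():
    print(f"{adjective:>8} adjectives: noun effect = {effect:+.2f}")
```

The noun effect is close to zero for negative adjectives (about −0.03), intermediate for neutral adjectives (about 1.14), and largest for positive adjectives (about 2.03), which is the non-additive pattern the interaction term captures.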

# Discussion

We presented simple declarative sentences with positive and negative nouns followed by either positive, neutral, or negative adjectives to test whether the processing of emotional words embedded in a sentence context is interactive. In part I of the study, participants read the sentences and decided as quickly as possible whether they were meaningful. In part II they read half of the sentences again and rated the valence and arousal of the sentences as supralexical units.

The verification task yielded three main results. First, we replicated the positivity superiority effect often observed in single word processing (e.g., Hofmann et al., 2009; Citron, 2012) and in EEG studies exploring emotional effects in sentence processing (e.g., Fischler and Bradley, 2006; Bayer et al., 2010; Ding et al., 2014; Jiang et al., 2014). This was indicated by significant main effects for VG-N and VG-A. Second, we observed a significant interaction between these two variables. Third, single contrasts showed that shorter verification times for sentences with emotionally compatible words were observed only for positive words. In the following, these effects are discussed in more detail.

Although a priori the sentence verification task does not require processing of the affective meaning of words to yield correct responses, we replicated the positivity superiority effect indicating a clear processing advantage for sentences with positive words compared to sentences with negative and/or neutral words. The assumed enhanced attention allocation for emotional compared to neutral words, and especially for positive words compared to neutral and negative ones, likely facilitated their early processing and the subsequent meaning-based decisions. When sentences contained a neutral or a negative word, no facilitation occurred and participants needed more time to decide about the sentences' meaningfulness.

In contrast to Fischler and Bradley (2006), we observed a clear processing advantage for emotionally congruent sentences. Verification times for sentences with words from the same valence category (e.g., positive adjectives after positive nouns) were shorter than verification times for sentences with words from different valence categories (e.g., positive adjectives after negative nouns). This result corresponds with affective priming effects reported for single word processing (cf. Fazio, 2001; Klauer and Musch, 2003) and also with the assumed discourse-dependent congruency effect reported by León et al. (2010). Two mechanisms can be hypothesized to explain the observed interaction: one operating at the lexical level and a second one at the supralexical level. At the lexical level, the benefit for congruent sentences may be related to an automatic spread of semantic activation (Hofmann et al., 2011; Eder et al., 2012; Hofmann and Jacobs, 2014). Several studies have convincingly demonstrated that such processes operating at an early encoding level contribute to the affective priming effect (e.g., Spruyt et al., 2007). Although sentence verification does not require explicit processing of affective word meanings, the observed positivity superiority effect supports the assumption that participants automatically processed affective stimulus dimensions, as is also known from lexical decision studies. Moreover, the short distance between nouns and adjectives provided a quasi-optimal condition for observing short-lived effects of preactivating memory representations of affectively related words (Hermans et al., 2001). When an emotional noun was followed by an emotionally congruent adjective, spreading activation presumably facilitated its early processing.

At the supralexical level, the emotional congruency between noun and adjective could also facilitate the integration of the emotional words into a meaningful mental representation of the described state of affairs, i.e., a situation model. Some authors have interpreted the larger N400 response to affectively incongruent trials in standard affective priming paradigms in terms of integration difficulties of affective information in incongruent conditions (Eder et al., 2012). However, empirical studies on context- or discourse-related integration effects of affective congruency are rare. León et al. (2010), for example, observed an early automatic ERP response (N100/P200), followed by a later discourse-level N400 for emotionally incongruent sentences. Whether the observed ERP effects were related to prolonged reading times was not assessed. Behavioral studies on elaborative inferences about the emotional states of story protagonists reported shorter reading times for emotionally congruent compared to incongruent sentences (cf. Gernsbacher et al., 1992; de Vega et al., 1996). The authors interpreted the prolonged reading times, at least in part, in terms of integration difficulties during situation model construction and updating. Because such integration is also a necessary step for understanding and verifying single sentences, we propose that the interaction observed in our study is based on both faster early processing and facilitated later integration in the congruent conditions. To obtain more information regarding the underlying processes, we will use neurocognitive methods (EEG, fMRI) in future replications of this study.

Apart from the overall congruency effect, we observed stronger affective priming for congruent positive than congruent negative sentences: Sentences with positive adjectives after positive nouns were verified faster than sentences with positive adjectives after negative nouns. Mean verification times for sentences with negative adjectives after negative nouns were also shorter than those for sentences with negative adjectives after positive nouns, but this effect was smaller than the priming effect for congruent positive ones and was not significant. Such an unbalanced priming effect only for congruent positive sentences was suggested neither by the affective priming literature nor by the literature on supralexical consequences of emotional congruency. In both fields, congruent and incongruent trials are usually not differentiated with respect to the emotional connotation of the prime. We can reasonably rule out the possibility that this unbalanced priming for positive words was due to a confound of the valence manipulation with the associative strength between nouns and adjectives. As described in the Methods Section, semantic association strength was kept constant across all conditions. One explanation for the observed unbalanced priming effect is suggested by the recently reported phasic affective modulation hypothesis of Topolinski and Deutsch (2013). This hypothesis rests upon the assumption that affect regulates the breadth or extent of spreading activation from a prime to close and remote semantic associates, with positive mood fostering semantic spread and negative mood inhibiting it (cf. Storbeck and Clore, 2008). Topolinski and Deutsch demonstrated that this affective modulation can be observed not only at a tonic temporal level, but also at a phasic level. For example, the presentation of a positive tone or a positive face in one trial increased semantic priming, particularly for weak associations, even if the prime was presented simultaneously with the affect-inducing stimuli. They therefore concluded that an affective prime not only induced a spread of activation, but also a phasic affective modulation.

Thus, this phasic affective modulation might also play a role in the sentence verification paradigm. Sentences with positive nouns might induce a positive phasic mood modulation, increasing spreading activation between semantic word units. This could lead to the stronger affective priming for congruent positive sentences observed in our study. Still, this modulation effect should also be at play in sentences with neutral adjectives. Semantic associations between positive and negative nouns and positive adjectives were as high as those between both noun types and neutral adjectives. If positive phasic modulation increased semantic priming, sentences with neutral adjectives after positive nouns should also be verified faster than sentences with neutral adjectives after negative nouns. However, we found no evidence for this. Thus, the phasic mood modulation hypothesis cannot fully account for the unbalanced priming effect for congruent positive sentences. This might be the case for all approaches focusing on mechanisms that influence early processing stages, like automatic spread of semantic activation. We assume that mechanisms at the supralexical level offer an alternative or complementary approach. In contrast to the standard affective priming paradigm, which does not require deeper semantic processing and especially integration of prime and target, the sentence verification task clearly requires the construction of a situation model to yield correct answers.

A promising account of the unbalanced priming effect is based on recently described valence effects related to the distribution and/or frequency of affective words and the semantic cohesiveness hypothesis (cf. Hofmann and Jacobs, 2014). Westbury et al. (2014) demonstrated for a very large corpus of affective words that positive words are usually characterized by high frequencies, whereas negative words are on average more extreme in absolute valence magnitude. The authors concluded that positive and negative words should be interpreted as distinct sets with possible differences on dimensions other than valence and frequency. The fact that negative events tend to be more finely differentiated than positive events is one example (Rozin and Royzman, 2001; Rozin et al., 2010), best illustrated by discrete emotion theories, which since Darwin have contained more negative than positive emotions. It could be assumed that finer differentiation of negative events hampers processing at the supralexical level, especially semantic integration and situation model construction: if negative words are less homogeneous, building coherent situation models for sentences with two negative words could be harder than for sentences with two positive words. Recently reported results according to which positive words provide a greater number of semantic associations than negative words are in line with this account (Hofmann et al., 2011; Hofmann and Jacobs, 2014). For sentences with two positive words, semantic activation can spread across wider associative pathways, and thereby elicit a positivity bias during meaning construction. First evidence for this assumption is reported by Jacobs et al. (2015, this issue). In a study on the comprehension of affectively uni- and bi-valent noun-noun-compounds (NNCs), Jacobs et al. reported that comprehensibility ratings for NNCs containing two positive nouns were significantly higher than ratings for bipolar NNCs (containing both a positive and a negative noun), and also higher than univalent NNCs with two negative nouns. Based on the assumption that meaning construction is a necessary step for sentence verification, we thus would like to propose that the processing advantage observed for emotionally congruent sentences with two positive words also occurs at the supralexical level. Nevertheless, future studies are necessary to test this hypothesis directly.

In addition to the online processing of sentences, we were also interested in the emotional evaluation at the supralexical level, as expressed by sentence-level affective ratings. In general, our valence ratings indicated that the affective meanings of both nouns and adjectives contributed to the valence of the whole sentences, since we observed simple main effects for both. Sentences with negative nouns were rated more negatively than sentences with positive nouns. The same was observed for adjectives. Sentences with negative adjectives were rated more negatively than sentences with neutral adjectives, which were rated more negatively than sentences with positive adjectives. While these main effects are in line with the simple model predicting the emotion potential of supralexical units to be an algebraic sum of constituent word valences (Bestgen, 1994; Whissell, 1994), this is not the case for the observed strong interaction effect for the supralexical valence evaluations. Sentences with positive and neutral adjectives after negative nouns were evaluated more negatively than sentences with positive and neutral adjectives after positive nouns. For sentences with negative adjectives, no differences in supralexical valence ratings were observed. These sentences were generally evaluated as strongly negative, independently of whether they contained a positive or negative noun. These interactions are not in line with the simple model, but fit well with the assumed negativity bias, especially for adjectives, described in the Hypothesis Section. The assumed dominance of emotional adjectives in the evaluation of supralexical units and the often reported negativity bias (Baumeister et al., 2001) predicted that supralexical valence ratings for sentences with negative adjectives are biased in a more negative direction. Our results demonstrated more than a simple bias in a more negative direction: negative adjectives dominated the supralexical valence evaluation, as indicated by the fact that noun valence had no notable influence on the overall valence evaluation for sentences with negative adjectives. Our results for the supralexical valence ratings correspond to the results reported by Liu et al. (2013), who also observed a strong influence of negative adjectives, especially on positive noun evaluation. Liu et al. discussed the general negativity bias as a main explanation for this effect. It is widely believed that negative stimuli attract more attention than positive ones (for review see Rozin and Royzman, 2001) and have a stronger influence on evaluation processes (Cacioppo et al., 1997; Ito et al., 1998). We thus propose that, at least in part, the superiority effect for negative adjectives observed in this study can be explained by the negativity bias as well. Nevertheless, the fact that we observed the negativity bias only for negative adjectives points toward the importance of the specific syntactic role of adjectives. Normally, adjectives are used as noun phrase modifiers, a fact that underlines the relevance of adjectives for the meaning construction of whole sentences. But while this is true for all six conditions used in this study, a superiority effect was observed only for negative adjectives. For sentences with neutral and positive adjectives, the valence of the noun still influenced the evaluation of the whole sentence. We therefore conclude that the observed superiority effect for negative adjectives indicates an interaction between the case role and the emotional valence of adjectives. Alternatively, the serial position of adjectives, which were always presented at the end of the sentences, could also be relevant. Further studies thus have to be conducted to fully explore the contributions of valence, syntactic role, and order to affective evaluations of supralexical units.

Besides the negativity bias for negative adjectives, we also observed a difference in the valence ratings for sentences with positive and neutral adjectives. The difference between the valence evaluations of sentences with positive and negative nouns was stronger for sentences with positive than for sentences with neutral adjectives. In other words, the condition in which the shortest verification times were observed received the highest valence ratings. It can therefore be assumed that ease of processing and meaning construction are positively related to valence evaluation. This kind of relationship is discussed with regard to aesthetic responses and perceived beauty (Reber et al., 2004). The hedonic fluency hypothesis states that simply because a stimulus is processed faster or more fluently, it is accompanied by a positive affective evaluation leading to more positive aesthetic responses (Winkielman and Cacioppo, 2001; Reber et al., 2004). The results of two studies using pictorial stimuli are in line with this. Kuchinke et al. (2009) showed that the time to recognize a depicted object was shortest for high processing fluency paintings, which also received higher preference ratings. Similarly, Albrecht and Carbon (2014) demonstrated, especially for pictures with initially positive valence, that highly fluent pictures were rated more positively than pictures of low processing fluency. For supralexical units, Bohrn et al. (2013) reported higher beauty ratings for familiar compared to unfamiliar and therefore harder-to-process German proverbs. Nevertheless, the same study demonstrated that the hedonic fluency hypothesis is insufficient to explain other effects observed in the processing of supralexical units, especially at the neuronal level. Hence, future studies should try to shed light on the interaction of processing fluency in encoding and meaning construction and the emotional evaluation of simple sentences and longer supralexical units.

# Conclusion

Our study introduces the sentence verification task as a paradigm for investigating the influence of emotional features of single words on the processing of supralexical units. We replicated both the processing advantage for positive compared to neutral and/or negative words often described in studies of single word processing, and an affective priming effect within a sentence context. Sentence verification was easiest for sentences containing emotionally congruent words with positive valences. We interpret this as initial evidence for easier semantic integration and situation model construction for sentences containing positive words. Sentence valence evaluations showed a negativity bias especially for negative adjectives, indicating an interaction of the general negativity bias reported in the emotion processing literature and the syntactic role of words during sentence processing. The comparison of verification times and valence evaluations pointed to an interrelationship between ease of processing, meaning construction, and affective evaluation, because easy-to-process sentences were rated more positively than harder-to-process sentences. Nevertheless, although we controlled for possible arousal effects in all our analyses, we could not fully exclude an independent effect of arousal as reported, for example, in Bayer et al. (2010). Further research should therefore manipulate both valence and arousal simultaneously.

Taken together, all emotional effects indicated that the comprehension of simple supralexical units is a highly interactive process (e.g., McClelland et al., 1989). If these results can also be replicated with more complex sentences, it would seriously challenge future models of sentence comprehension and text processing to go beyond "cold" information processing and include "hot" affective and aesthetic processes, and provide stronger constraints for very general frameworks like the neurocognitive poetics model of literary reading, which includes such processes (Jacobs, 2011, 2015a,b). The model has received empirical support at the experiential, behavioral and neuronal levels for word, phrase, poem, and story processing (Altmann et al., 2012; Bohrn et al., 2013; Hsu et al., 2014, 2015a,b,c; Lüdtke et al., 2014; Aryani et al., 2015; Jacobs, 2015b; Lehne et al., 2015) but is still underspecified for predicting effects in tasks like sentence verification.

# References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Lüdtke and Jacobs. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Appendix

# (A) The Lmer Specification for the Model Predicting Verification Times


# (B) The Lmer Specification for the Model Predicting Valence Ratings


# Approach and Withdrawal Tendencies during Written Word Processing: Effects of Task, Emotional Valence, and Emotional Arousal

#### Francesca M. M. Citron<sup>1,2,3</sup>\*<sup>†</sup>, David Abugaber<sup>4</sup> and Cornelia Herbert<sup>5,6,7,8</sup>\*<sup>†</sup>

<sup>1</sup> Department of Psychology, Lancaster University, Lancaster, UK, <sup>2</sup> Cluster of Excellence "Languages of Emotion", Free University of Berlin, Berlin, Germany, <sup>3</sup> Psychology Department, Humanities Council, Princeton University, Princeton, NJ, USA, <sup>4</sup> Department of Theoretical and Applied Linguistics, University of Cambridge, Cambridge, UK, <sup>5</sup> Applied Emotion and Motivation Research, Institute of Psychology and Education, Ulm University, Ulm, Germany, <sup>6</sup> Department of Psychiatry, University of Tübingen, Tübingen, Germany, <sup>7</sup> Department of Biomedical Magnetic Resonance Imaging, University of Tübingen, Tübingen, Germany, <sup>8</sup> Department of Psychology, University of Würzburg, Würzburg, Germany

#### Edited by:

Marcela Pena, Pontifical Catholic University of Chile, Chile

#### Reviewed by:

Erich David Jarvis, Duke University School of Medicine, USA; Ivilin Peev Stoianov, Centre National de la Recherche Scientifique, France

#### \*Correspondence:

Francesca M. M. Citron fmm.citron@gmail.com; Cornelia Herbert cornelia.herbert@uni-ulm.de

† These authors have contributed equally to this work and shared first authorship.

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 26 March 2015 Accepted: 01 December 2015 Published: 06 January 2016

#### Citation:

Citron FMM, Abugaber D and Herbert C (2016) Approach and Withdrawal Tendencies during Written Word Processing: Effects of Task, Emotional Valence, and Emotional Arousal. Front. Psychol. 6:1935. doi: 10.3389/fpsyg.2015.01935

The affective dimensions of emotional valence and emotional arousal affect processing of verbal and pictorial stimuli. Traditional emotional theories assume a linear relationship between these dimensions, with valence determining the direction of a behavior (approach vs. withdrawal) and arousal its intensity or strength. In contrast, according to the valence-arousal conflict theory, both dimensions are interactively related: positive valence and low arousal (PL) are associated with an implicit tendency to approach a stimulus, whereas negative valence and high arousal (NH) are associated with withdrawal. Hence, positive, high-arousal (PH) and negative, low-arousal (NL) stimuli elicit conflicting action tendencies. By extending previous research that used several tasks and methods, the present study investigated whether and how emotional valence and arousal affect subjective approach vs. withdrawal tendencies toward emotional words during two novel tasks. In Study 1, participants had to decide whether they would approach or withdraw from concepts expressed by written words. In Studies 2 and 3 participants had to respond to each word by pressing one of two keys labeled with an arrow pointing upward or downward. Across experiments, positive and negative words, high or low in arousal, were presented. In Study 1 (explicit task), in line with the valence-arousal conflict theory, PH and NL words were responded to more slowly than PL and NH words. In addition, participants decided to approach positive words more often than negative words. In Studies 2 and 3, participants responded faster to positive than negative words, irrespective of their level of arousal.
Furthermore, positive words were significantly more often associated with "up" responses than negative words, thus supporting the existence of implicit associations between stimulus valence and response coding (positive is up and negative is down). Hence, in contexts in which participants' spontaneous responses are based on implicit associations between stimulus valence and response, there is no influence of arousal. In line with the valence-arousal conflict theory, arousal seems to affect participants' approach-withdrawal tendencies only when such tendencies are made explicit by the task, and a minimal degree of processing depth is required.

Keywords: approach, withdrawal, valence, arousal, emotion, words, polarity effects

# INTRODUCTION

According to dimensional models of emotion, valence describes the extent to which a stimulus is positive or negative whereas emotional arousal refers to its degree of physiological activation, i.e., how calming or exciting/agitating a stimulus is (Russell, 1980, 2003; Reisenzein, 1994; Lang et al., 1997). This two-dimensional approach to emotions originates from work by Osgood et al. (1957) in which large word samples were rated on several stimulus dimensions that could methodologically be reduced to three underlying common factors: emotional evaluation (i.e., valence), potency (i.e., arousal), and activity (i.e., dominance). The first two dimensions could account for most of the variance in ratings. When stimuli are mapped in affective space according to their subjective ratings, emotional valence and arousal ratings typically show a quadratic relationship, whereby highly positive or negative stimuli are also rated higher in their level of arousal; furthermore, negative stimuli tend to be rated higher in arousal than positive stimuli (Bradley and Lang, 1999; Lang et al., 1999; Võ et al., 2009; Montefinese et al., 2013; Citron et al., 2014b). Despite this relationship, the two dimensions of valence and arousal are considered distinct affective dimensions; in fact, they are associated with different physiological and affective behavioral responses (Lang et al., 1990, 1993), activate partially-dissociable brain networks (Small et al., 2003; Lewis et al., 2007; Wilson-Mendenhall et al., 2013) and are correlated with different lexico-semantic properties such as familiarity, imageability, and concreteness (Kousta et al., 2011; Montefinese et al., 2013; Citron et al., 2014b; Schmidtke et al., 2014).
In line with this dimensional view, empirical research has shown that a wide range of emotional stimuli including pictures, faces, words (denoting emotions, personality traits, or other concepts), and even short scenarios describing specific emotions, can be successfully mapped onto this two-dimensional affective space and distinguished by their position within that space (Abelson and Sermant, 1962; Russell, 1980; Cacioppo and Berntson, 1994; Barrett and Russell, 1999; Lang et al., 1999; Wilson-Mendenhall et al., 2013).

Crucially, it has been proposed that the position within the affective space determined by the two major dimensions of valence and arousal reflects the activation of two motivational systems of approach and withdrawal: positive valence elicits the activation of the approach system and negative valence the activation of the withdrawal system. In contrast to valence, the arousal dimension indicates the intensity of physiological activation in either of the two systems (for an overview, see Lang et al., 1997).

In recent years, several studies aimed to find behavioral evidence for this assumption by investigating the processing of highly arousing positive and negative stimuli compared to neutral stimuli, either during viewing of emotional pictures or reading of emotional words. These included reaction time (RT) studies (e.g., Algom et al., 2004; Larsen et al., 2006; Kousta et al., 2009; Nasrallah et al., 2009) as well as studies on motivational priming of the startle reflex during picture viewing or word processing (e.g., Herbert et al., 2006, 2011, 2014; Herbert and Kissler, 2010; for an overview, see Lang et al., 1997; Bradley et al., 2006). The startle reflex and its affective modulation is one of the most prominent and basic bio-psychological measures of approach and withdrawal. In sum, research on affective startle reflex modulation suggested that the higher a stimulus is rated on the emotional arousal dimension, the stronger the physiological motivation to approach or withdraw from it (Lang et al., 1997). This suggests that stimulus valence and arousal do have additive effects on affective processing, such that responses to positive and negative stimuli should be more pronounced if their level of arousal is high.

However, Robinson et al. (2004) suggested a different view: based on their own empirical observations, they proposed the valence-arousal conflict theory and suggested that the two affective dimensions of a stimulus (i.e., its valence and its arousal) automatically elicit a specific response tendency independently from the other. In particular, they proposed that, independently from their valence, stimuli rated high in emotional arousal will be appraised as negative/unpleasant, whereas stimuli rated as low in stimulus arousal will be more likely appraised as positive/appetitive, thereby producing a conflict in responding if stimulus valence and arousal do not match, i.e., positive valence and high emotional arousal (PH) or negative valence and low emotional arousal (NL). A number of behavioral studies support this assumption. For instance, Robinson and colleagues themselves demonstrated, in several experiments employing emotional pictures or words, that RTs to conflicting stimuli (PH and NL) are slower in comparison to stimuli producing no conflict, i.e., positive valence, low arousal (PL) and negative valence, high arousal (NH). In these studies, participants had to appraise the valence of the stimulus in a valence judgment task (positive vs. negative) (see also Eder and Rothermund, 2010). Similar effects have been reported in an arousal judgment task (high vs. low), using pictures as well as words (Purkis et al., 2009). These tasks explicitly direct the attention to the emotional content of the stimuli. However, similar effects have also been reported during more implicit tasks such as color judgment (Feng et al., 2012), the affective Simon task (Eder and Rothermund, 2010), or the lexical decision task (LDT; Larsen et al., 2008; Hofmann et al., 2009; Eder and Rothermund, 2010; Feng et al., 2012; Citron et al., 2014c).

Together, these findings suggest that stimulus valence and stimulus arousal both influence participants' evaluation of emotional stimuli across a variety of tasks; this does not occur linearly or in an additive way as predicted by traditional emotion models, but interactively, as suggested by Robinson et al.'s conflict theory. This interactive view is also suggested by neuroscientific studies that showed significant processing differences between high- and low-arousal positive and negative stimuli already during early stages of cortical processing, associated with sensory stimulus processing and attention orientation (Hofmann et al., 2009; Feng et al., 2012; Citron et al., 2013; Recio et al., 2014). Furthermore, there is some evidence that brain regions associated with the integration of physiological and cognitive responses such as the insula are modulated differently by high- and low-arousal positive and negative stimuli (Citron et al., 2014a).

Nevertheless, it is still an open issue whether differential processing effects reported for PH and NL vs. PL and NH stimuli in previous studies are actually related to approach and withdrawal tendencies. In fact, none of the aforementioned studies that investigated the preparation of approach and withdrawal tendencies by using biological measures and paradigms such as the affective startle modulation found evidence in favor of this thesis. Furthermore, some empirical evidence suggests a higher cognitive accessibility of the valence than the arousal dimension (Nicolle and Goel, 2012). In addition, studies that used more response-focused tasks such as the stop-signal task in combination with verbal material suggest serial effects of perception, attention and action (e.g., Herbert and Sütterlin, 2012), which contrasts with the idea that approach and withdrawal tendencies are primed effortlessly before a minimal degree of linguistic processing has taken place. However, according to Robinson et al. (2004), PH and NL stimuli are expected to elicit conflicting response tendencies that slow down RTs irrespective of whether the task requires stimulus evaluation or a motor response (e.g., a key press associated with moving one's finger or hand forward or backward; see Robinson et al., 2004, Study 4). This assumption should hold for a broad range of stimuli including abstract and symbolic stimuli such as words, which have been shown to evoke similar affective responses as pictures (Citron, 2012; Tempel et al., 2013).

Building upon and extending these previous findings, the current study aimed to address two main research questions: (1) Do high- and low-arousal, positive and negative stimuli elicit differential responses when these are explicitly associated with approach and withdrawal tendencies? (2) Will these effects arise even in an implicit task that does not necessarily require either explicit evaluation of approach- or withdrawal-related action tendencies or linguistic analysis of the words?

In order to address the first question, participants in Study 1 were presented with single written words and asked whether they would approach or withdraw from the concept expressed by each word. We predict faster RTs to PL and NH words than to PH and NL words if valence and arousal contribute to the evaluation of a stimulus' approach and withdrawal tendencies in an interactive manner, as suggested by Robinson and colleagues. If, on the other hand, arousal contributes to the evaluation of a stimulus' approach and withdrawal tendencies linearly for positive and negative stimuli, faster RTs to highly arousing words (PH and NH) than to low-arousal words (PL and NL) will be expected. We also predict that participants will more often decide to approach positively valenced words than negative ones and to withdraw from negative words than positive ones. This response pattern should be independent from the arousal dimension of the stimuli and is based on empirical research that showed higher cognitive accessibility of the valence than the arousal dimension (Nicolle and Goel, 2012).

In order to address the second experimental question, participants in Study 2 were asked to respond to each word by either pressing an upper or lower button on the keyboard. Here, reaction tendencies were implicitly associated with spatial information, which has previously been shown to carry implicit embodied meaning: in fact, positive and negative verbal information is conceptually associated with vertical position in space, i.e., with upper vs. lower space, respectively (e.g., to cheer up, to feel down; Lakoff and Johnson, 1980). In particular, Meier and Robinson (2004) presented emotionally valenced words either at the top or at the bottom of a computer screen and asked participants to judge their valence (positive or negative) by pressing one of two buttons. They showed faster RTs to congruent stimuli, i.e., to positive than negative words when presented in the upper position, and to negative than positive words when presented at the bottom of the screen. This result demonstrates activation of a conceptual mapping between valence and vertical position, and was replicated in a more implicit task, i.e., responding to target letters in different positions, after having judged the words' valence (see also Rotteveel and Phaf, 2004; Casasanto and Dijkstra, 2010 for associations between valence and vertical position). In line with these findings, we predict that participants will more often respond to positive words by pushing the upper button and to negative words by pushing the lower button.

Most importantly, if implicit approach and withdrawal tendencies are primed automatically during reading and modulated by valence and arousal as suggested by Robinson and colleagues, participants' decisions should be faster for PL and NH words than for PH and NL words. If, on the other hand, stimulus arousal contributes to the priming of approach and withdrawal tendencies linearly for positive and negative stimuli, faster RTs to high-arousal words (PH and NH) than to low-arousal words (PL and NL) will be expected.
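The two competing RT predictions can be stated compactly. The following is an illustrative sketch only: the condition labels are taken from the text, whereas the function names and the coarse "fast"/"slow" coding are assumptions introduced here for exposition.

```python
# Illustrative encoding of the two competing RT predictions.
# Function names and the "fast"/"slow" coding are hypothetical.

CONDITIONS = {
    "PH": ("positive", "high"),
    "PL": ("positive", "low"),
    "NH": ("negative", "high"),
    "NL": ("negative", "low"),
}


def conflict_prediction(valence: str, arousal: str) -> str:
    """Valence-arousal conflict account: positive valence and low arousal
    both signal approach, negative valence and high arousal both signal
    withdrawal. Mismatching signals (PH, NL) should slow responses."""
    valence_tendency = "approach" if valence == "positive" else "withdraw"
    arousal_tendency = "approach" if arousal == "low" else "withdraw"
    return "fast" if valence_tendency == arousal_tendency else "slow"


def linear_prediction(valence: str, arousal: str) -> str:
    """Linear account: arousal intensifies whichever tendency valence sets,
    so high-arousal words should be faster regardless of valence."""
    return "fast" if arousal == "high" else "slow"


predictions = {
    label: {
        "conflict": conflict_prediction(valence, arousal),
        "linear": linear_prediction(valence, arousal),
    }
    for label, (valence, arousal) in CONDITIONS.items()
}
```

The two accounts agree only for NH (fast) and NL (slow) words; they diverge for PH and PL words, which is why the Valence × Arousal interaction is the critical test.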

# STUDY 1: EXPLICIT APPROACH AND WITHDRAWAL TASK

# Methods

# Ethics Statement

The present studies (1–3) were approved by the Ethics Committee of the Free University of Berlin and are in line with the guidelines of the American Psychological Association. All participants gave written informed consent before taking part in any of the experiments, in accordance with the Declaration of Helsinki.

# Participants

Nineteen native speakers of German were recruited (16 women, 3 men; age range: 21–67 years, M = 33, SD = 12). Participants were all right-handed except one and had normal or corrected-to-normal vision; 12 of them were students and seven were workers. They were either paid €5 or given course credit for their participation.

# Materials

One hundred and sixty German nouns were selected from the BAWL-R (Võ et al., 2009): 40 positive, high-arousal words (PH), 40 positive, low-arousal words (PL), 40 negative, high-arousal words (NH), and 40 negative, low-arousal words (NL). Word examples and descriptive statistics are reported in **Tables 1, 2**, respectively. A full list of the stimuli can be found in Appendix A of the Supplementary Material. Words in the four conditions were matched for length in letters, phonemes, and syllables, logarithm of frequency of use, neighborhood (N) size and frequency, and imageability [all Fs(3, 156) < 1.03, ns], according to the values provided by the BAWL-R.

PH words had significantly higher arousal ratings than PL words [t(78) = 17.54, p < 0.0001], but PH and PL did not differ in valence [t(78) = 0.10, ns]. Similarly, NH words had significantly higher arousal ratings than NL words [t(78) = 14.61, p < 0.0001], but did not differ in valence [t(78) = 0.18, ns]. As can be seen in **Figure 1**, PH and NH as well as PL and NL words were not exactly matched for arousal because negative words tend to naturally be more arousing than positive words (e.g., Võ et al., 2009; Montefinese et al., 2013; Citron et al., 2014b; Schmidtke et al., 2014). In addition, regarding stimulus arousal, positive words were distributed within a greater range than negative words. This stimulus selection was intended to mimic the natural distribution of affective ratings of words.

TABLE 1 | Examples of stimuli used for all studies, broken down by condition.

#### Procedure

The experiment was programmed with Presentation software (Neurobehavioral Systems, Inc.) and run on a desktop computer. Participants were seated in front of the monitor (monitor screen size 15 inches) at a distance of ∼70 cm. The stimuli were presented in the center of the screen in capitalized white letters on a black background (25-point Arial font).

Participants were asked to silently read single words (nouns) that could describe events, sensations, objects, or abstract things (**Table 1**), and to decide for each word whether they would approach it or withdraw from it. Two response buttons on a German keyboard (F and J, i.e., left and right) corresponded to "approach" vs. "withdrawal" responses. The response buttons were counterbalanced across participants to control for any implicit relations between the valence of the word (positive or negative), the instructions, and the response given (left vs. right): thus, each type of response would be given by half of the participants on one side and by the other half on the opposite side. Key presses had to be made using the index fingers of the right and left hand.

FIGURE 1 | Distribution of stimulus ratings of emotional valence and arousal (from the BAWL database) for the four experimental conditions: positive high-arousal (PH), positive low-arousal (PL), negative high-arousal (NH), and negative low-arousal (NL).

PH, positive valence, high arousal; PL, positive valence, low arousal; NH, negative valence, high arousal; NL, negative valence, low arousal; SEM, standard error of the mean; N-Size/Frequency, neighborhood-size/frequency.

At the start of each trial, a fixation cross appeared in the center of the screen for a variable duration between 400 and 800 ms, followed by a word which remained either until participants gave a response or for a maximum duration of 2000 ms. The screen was then blank for 600 ms; after that, a new trial would start. This short inter-trial interval (ITI) was chosen in order to prompt rapid decisions that prevent participants from reflecting too much about the meaning of the stimulus. Stimuli were presented randomly in order to avoid carryover effects from one trial to the other.
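The timing of a single trial can be sketched as follows. This is not the Presentation script used in the study; the function name is hypothetical, and drawing the fixation duration uniformly in whole milliseconds is an assumption, since the text only gives the 400–800 ms range.

```python
import random

FIXATION_RANGE_MS = (400, 800)  # variable fixation duration
WORD_MAX_MS = 2000              # word stays until response or timeout
BLANK_MS = 600                  # blank screen before the next trial


def trial_timeline(response_rt_ms, rng):
    """Return (event, duration_ms) pairs for one trial.

    response_rt_ms is None when no response was given, in which case the
    word stays on screen for the maximum duration.
    """
    # Assumption: uniform jitter; the text specifies only the range.
    fixation = rng.randint(*FIXATION_RANGE_MS)
    if response_rt_ms is None or response_rt_ms >= WORD_MAX_MS:
        word = WORD_MAX_MS
    else:
        word = response_rt_ms
    return [("fixation", fixation), ("word", word), ("blank", BLANK_MS)]


rng = random.Random(0)
timeline = trial_timeline(650, rng)           # response after 650 ms
timeout_timeline = trial_timeline(None, rng)  # no response given
```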

A fifteen-trial practice run was followed by an experimental run divided into two sessions with a short break in between. The experimental run contained three filler words at the beginning (which were not used in the analyses) and 160 target words. Word order and condition order (i.e., PH, PL, NH or NL) were pseudo-randomized across participants, i.e., we made sure the same condition would not occur more than three times consecutively in order to avoid carryover effects for a specific emotional category (e.g., Võ et al., 2009; Citron et al., 2014b; Schmidtke et al., 2014). RTs and accuracy (i.e., % of approach vs. total key presses) were recorded for each item. The experiment lasted ∼15 min.
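The constraint that no condition occurs more than three times in a row can be met by simple rejection sampling, sketched below. The article does not say which procedure was actually used, so the function names, the placeholder word labels, and the reshuffle-until-valid strategy are all assumptions.

```python
import random


def max_run_length(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = current = 0
    previous = object()  # sentinel that equals no label
    for label in labels:
        current = current + 1 if label == previous else 1
        previous = label
        longest = max(longest, current)
    return longest


def pseudo_randomize(trials, max_run=3, rng=None, max_attempts=10000):
    """Shuffle (word, condition) pairs until no condition repeats more
    than max_run times consecutively."""
    rng = rng or random.Random()
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if max_run_length([cond for _, cond in order]) <= max_run:
            return order
    raise RuntimeError("no valid order found; relax the constraint")


# Placeholder items: 40 hypothetical words per condition, 160 in total.
trials = [(f"{cond}_{i}", cond)
          for cond in ("PH", "PL", "NH", "NL") for i in range(40)]
order = pseudo_randomize(trials, rng=random.Random(1))
```

Giving each participant a differently seeded generator yields a different valid order per participant.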

# Data Analysis

For each participant, outlying RTs exceeding ±3 SDs from the participant's mean RT, as well as trials with no response, were excluded from the analysis, i.e., 1.5% of trials overall. Mean RTs, mean percentage of approach divided by total responses, and SDs for each participant and each condition (i.e., PH, PL, NH, and NL), as well as for each stimulus, were calculated. As a standard procedure in psycholinguistic research, we performed all inferential statistical analyses by participant and by item, in order to consider both sources of variability (Clark, 1973). The results of the analyses by item should confirm those obtained in the analyses by participant and allow generalization of the findings on the specific word sample to a broader set of words. However, given the large number of variables that influence word recognition (length, frequency, etc.), item analyses tend to show less significant or weaker effects than the participant analyses. Discrepancies between the two will index less robust effects. As such, confirmation of the findings through careful control for possibly confounding variables will strengthen the reliability of the findings.
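The trimming and aggregation steps for a single participant can be sketched as follows. Several details are assumptions not stated in the text: the mean and SD are computed over all of the participant's valid trials pooled across conditions, the sample SD (n − 1) is used, and trimming is applied in a single pass. The function name and the example data are hypothetical.

```python
from statistics import mean, stdev


def condition_means(trials, n_sd=3.0):
    """Mean RT per condition for ONE participant.

    trials: list of (condition, rt_ms) pairs, with rt_ms = None for
    trials without a response. Trials with no response and RTs further
    than n_sd SDs from the participant's mean RT are excluded.
    """
    valid = [(cond, rt) for cond, rt in trials if rt is not None]
    rts = [rt for _, rt in valid]
    m, sd = mean(rts), stdev(rts)  # assumption: sample SD, pooled trials
    kept = {}
    for cond, rt in valid:
        if abs(rt - m) <= n_sd * sd:
            kept.setdefault(cond, []).append(rt)
    return {cond: mean(values) for cond, values in kept.items()}


# Made-up data: one extreme RT and one missed trial are dropped.
example = ([("PL", 600)] * 15 + [("NH", 620)] * 14
           + [("PH", 5000), ("NL", None)])
means = condition_means(example)
```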

For both dependent variables, ANOVAs by participant (indexed by a subscripted 1) and by item (subscripted 2) were conducted, with factors Valence (positive, negative) × Arousal (high, low). Effect sizes were calculated for significant effects and reported as Pearson's r coefficients: 0.10 ≤ r < 0.30 represents a small effect size, 0.30 ≤ r < 0.50 medium and r ≥ 0.50 large.
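For the by-participant analysis, the Valence × Arousal interaction in a 2 × 2 within-participant design can be computed from a single difference-of-differences contrast, whose squared one-sample t statistic equals the interaction F with (1, n − 1) degrees of freedom. The sketch below shows this together with a conversion from F to r. The article does not state which conversion was used; the formula r = √(F / (F + df_error)) is a common convention for single-df effects and is an assumption here, although it reproduces, for example, r = 0.67 for F1(1, 18) = 14.93. Function names and the toy data are hypothetical, and the by-item analysis (a between-items design) is not covered.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(ph, pl, nh, nl):
    """F and error df for the Valence x Arousal interaction, given
    per-participant condition means (one list per condition)."""
    # Difference of differences: arousal effect for positive words
    # minus arousal effect for negative words.
    contrast = [(a - b) - (c - d) for a, b, c, d in zip(ph, pl, nh, nl)]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / sqrt(n))
    return t * t, n - 1


def r_from_f(f_value, df_error):
    """Effect size r for an effect with one numerator df (assumed formula)."""
    return sqrt(f_value / (f_value + df_error))


# Toy data for three hypothetical participants.
f_value, df_error = interaction_f(
    ph=[10, 12, 11], pl=[8, 9, 9], nh=[7, 8, 9], nl=[9, 11, 10]
)
r_study1 = r_from_f(14.93, 18)
```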

# Results

## Reaction Times

A significant interaction between the factors "Valence" and "Arousal" was found [F1(1, 18) = 14.93, p = 0.001, r = 0.67; F2(1, 156) = 6.23, p = 0.014, r = 0.62]. PH and NL words were responded to more slowly than PL and NH words (**Figure 2A**). No main effects of "Valence" [F1(1, 18) = 0.001, ns; F2(1, 156) = 0.04, ns] or "Arousal" [F1(1, 18) = 1.25, ns; F2(1, 156) = 0.31, ns] were found.

# Response Type

As predicted, we found a main effect of "Valence" [F1(1, 18) = 18.19, p = 0.0001, r = 0.71; F2(1, 156) = 745.90, p = 0.0001, r = 0.91]. Participants pushed the button corresponding to "approach" more often to positive than negative words and vice versa. No effect of the factor "Arousal" and no interaction between the factors "Valence" and "Arousal" [Fs1(1, 18) < 1.48, ns; Fs2(1, 156) < 1.21, ns] were found (**Figure 2B**).

# STUDY 2: IMPLICIT APPROACH AND WITHDRAWAL TASK

# Methods

### Participants

Twenty native speakers of German from the Berlin area were initially recruited. None of them took part in Study 1. The data from two participants had to be excluded from the analyses because they responded "up" to all trials and had extremely fast RTs, possibly suggesting that they did not follow the instructions. The remaining 18 participants (8 women, 10 men; age range: 19–67 years, M = 35, SD = 12) were all right-handed and had normal or corrected-to-normal vision; eight of them were students, eight were workers, one was unemployed, and one retired. They were either paid €5 or given course credit for their participation. The excluded participants were right-handed female students, aged 30 and 27 years.

# Materials and Procedure

The linguistic material was exactly the same as in Study 1. The procedure was identical except for the type of task and the response buttons. For the responses, the number keyboard was used: the number 8 was covered with an arrow pointing upwards and the number 2 with an arrow pointing downwards. Participants were asked to keep their index finger on the central button (number 5, covered with paper) and then to spontaneously decide whether to move it upward to press the "up" button or downward to press the "down" button as soon as they saw a word. After each response, they were required to locate their finger on the central button again, in order to avoid facilitation or cost effects due to the position chosen for the last response. Since reading occurs automatically and even against one's own intention (Stroop, 1935), we expected that participants will read the words before responding. Words were presented with the same duration and ITI as in Study 1.

#### Data Analysis

The statistical analyses were the same as in Study 1. Detection of outliers led to the exclusion of 2.3% of trials overall.

# Results

#### Reaction Times

A main effect of "Valence" was found [F1(1, 17) = 18.64, p = 0.0001, r = 0.72; F2(1, 156) = 63.26, p < 0.0001, r = 0.54]: overall, positive words were responded to faster than negative words. No effect of arousal was found [F1(1, 17) = 1.76, ns; F2(1, 156) = 0.50, ns]. In contrast to Study 1, no interaction between the factors "Valence" and "Arousal" was observed [F1(1, 17) = 2.49, p = 0.13, r = 0.36; F2(1, 156) = 1.75, p = 0.19, r = 0.10; **Figure 2C**].

# Response Type

There was a main effect of valence [F1(1, 17) = 113.06, p < 0.0001, r = 0.93; F2(1, 156) = 1782.44, p = 0.0001, r = 0.96]: participants responded by pushing the "up" button more often to positive than negative words and vice versa. No effect of arousal and no interaction were found [Fs1(1, 17) < 1.56, ns; Fs2(1, 156) < 1.28, ns; **Figure 2D**].

# STUDY 3 (FOLLOW UP ON STUDY 2): IMPLICIT TASK WITH DIFFERENT INSTRUCTIONS

RTs were slower in the implicit task compared to the explicit task (**Figures 2A,C**), which may reflect larger uncertainty regarding the type of response to be given. However, in contrast to the explicit task, no interactive effects of stimulus valence and arousal were found. This suggests that when participants are unaware of the association between stimulus content and response ("good is up" and "bad is down"), responses are based on stimulus valence, and stimulus arousal does not have any additional effect on response selection. In order to further investigate this, we conducted a third study in which the instructions of the implicit task were slightly modified in order to ensure that participants would fully read the words before responding.

# Methods

### Participants

Twenty native speakers of German from the Berlin area were recruited (16 women, 4 men; age range: 23–58 years, M = 33, SD = 12). None of them took part in either Study 1 or 2. Participants were all right-handed and had normal or corrected-to-normal vision; 13 of them were students and seven were workers. They were either paid €5 or given course credit for their participation.

# Materials and Procedure

The linguistic material used was exactly the same as in Studies 1 and 2 and the procedure identical to Study 2, except for a slight variation in the instructions. In Study 3, participants were asked "to read each word" (while keeping their index finger on the central button, as previously) "and then" to spontaneously decide whether to move their finger upward to press the "up" button or downward to press the "down" button. At the end of the experiment, we also asked participants whether they used any particular strategy for their response decisions.

# Data Analysis

The statistical analyses were the same as in Studies 1 and 2. Detection of outliers led to the exclusion of 0.7% of trials overall.

# Results

#### Reaction Times

We confirmed a main effect of "Valence": as in Study 2, positive words were responded to faster than negative words [F1(1, 19) = 4.90, p < 0.05, r = 0.45; F2(1, 156) = 18.09, p < 0.0001, r = 0.32]. Akin to Study 2, no effect of "Arousal" [F1(1, 19) = 0.96, ns; F2(1, 156) = 0.26, ns] and no interaction between the factors "Valence" and "Arousal" were observed [F1(1, 19) = 0.58, ns; F2(1, 156) = 0.003, ns; **Figure 2E**].

# Response Type

The analyses showed a main effect of "Valence" [F1(1, 19) = 50.43, p < 0.0001, r = 0.85; F2(1, 156) = 6.72, p < 0.01, r = 0.20], no effect of "Arousal" [F1(1, 19) = 2.16, ns; F2(1, 156) = 0.18, ns], and no interaction between the factors "Valence" and "Arousal" [F1(1, 19) = 3.05, p = 0.10; F2(1, 156) = 1.04, ns; **Figure 2F**].

At the end of the experiment, upon enquiry by the experimenter most of the participants reported having associated positive words with upper position and negative words with lower position during the task.

# ANALYSIS OF POTENTIALLY CONFOUNDING FACTORS: AGE OF ACQUISITION AND FAMILIARITY

Two further factors need a more thorough consideration as they could have influenced the results of the present experiments. Age of acquisition (AoA) and subjective frequency of encounter with a word (often labeled familiarity) have been shown to be positively correlated with a word's emotional valence (Kousta et al., 2011; Citron et al., 2014b). These variables had not been matched during stimulus selection as they were not available in the BAWL-R database. In order to reject these variables as possibly confounding factors, we collected AoA and subjective frequency ratings for our experimental stimuli and re-ran the analyses of our three studies by partialling out their effects. A replication of the current results would reject possible confounding effects of these variables in the present study and further strengthen our findings.

# Methods

AoA was rated by 46 native German speakers (36 women, 10 men; age range: 21–66 years, M = 32, SD = 11) whereas familiarity was rated by a different group of 48 native speakers (37 women, 11 men; age range: 21–66 years, M = 31, SD = 11), using seven-point Likert scales<sup>1</sup>.

The main analyses of the three experiments presented were conducted again by partialling out the effects of these two variables. Specifically, in the analysis by participant, raw response types and RTs for each participant were regressed onto item AoA and familiarity ratings separately, and then the resulting standardized residuals were used as the dependent variables; in the analysis by item, the two continuous variables were used as covariates. Since Studies 2 and 3 showed the same results, we merged these two data sets in the current analyses.
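The partialling-out step in the by-participant analysis can be illustrated with a single covariate. This is a sketch under stated assumptions: the text does not specify whether each covariate was entered in its own regression or both together, nor how the residuals were standardized. Here one predictor is used at a time and the residuals are scaled by their population SD; statistical packages may instead scale by the root mean squared error. The function name and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardized_residuals(y, x):
    """Regress y on a single predictor x (ordinary least squares with an
    intercept) and return the residuals scaled to unit SD."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = pstdev(residuals)  # assumption: population SD as the scaling unit
    return [res / sd for res in residuals]


# Made-up RTs (ms) for five words and their mean AoA ratings (1-7 scale).
rts = [500, 560, 610, 700, 640]
aoa = [2, 3, 4, 5, 6]
residual_rts = standardized_residuals(rts, aoa)
```

The residuals are uncorrelated with the covariate by construction, so any Valence or Arousal effect that survives in them cannot be attributed to a linear effect of that covariate.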

# Results

# Explicit Task

In the RTs we replicated a significant interaction between "Valence" and "Arousal," which was marginally significant in the analysis by item [F1(1, 18) = 9.87, p < 0.01, r = 0.59; F2(1, 154) = 2.98, p = 0.086, r = 0.14; Appendix B, a in Supplementary Material]. No main effects of either "Valence" [F1(1, 18) = 0.90, ns; F2(1, 154) = 2.06, ns] or "Arousal" [F1(1, 18) = 0.67, ns; F2(1, 154) = 0.02, ns] were found.

In the analysis of the type of response, after partialling out the effects of AoA and familiarity, we replicated a main effect of "Valence" [F1(1, 18) = 18.94, p = 0.0001, r = 0.72; F2(1, 154) = 692.72, p < 0.0001, r = 0.90], no effect of "Arousal" and no interaction [Fs1(1, 18) < 1.04, ns; Fs2(1, 154) < 0.75, ns; Appendix B, b in Supplementary Material].

<sup>1</sup>The scale for AoA had the following intervals: 0–2, 3–4, 5–6, 7–8, 9–10, 11–12, more than 13 years; whereas the scale for familiarity ranged from 1 (I never read/hear this word) to 7 (I very often read/hear this word). Age ranges for AoA were transformed into point values between 1 and 7 for the analyses. Mean AoA and familiarity ratings and SDs across participants were calculated and two continuous variables obtained.

#### Implicit Tasks Merged

In the RTs, we replicated a significant main effect of "Valence" [F1(1, 37) = 14.52, p < 0.001, r = 0.53; F2(1, 154) = 39.74, p < 0.0001, r = 0.45], no "Arousal" effect [F1(1, 37) = 0.01, ns; F2(1, 154) = 0.01, ns], and no interaction [F1(1, 37) = 0.43, ns; F2(1, 154) = 0.67, ns; Appendix B, c in Supplementary Material].

In the analysis of the type of response, we replicated a significant main effect of valence [F1(1, 29) <sup>2</sup> = 40.51, p < 0.0001, r = 0.76; F2(1, 154) = 463.22, p < 0.0001, r = 0.87] but no main effect of arousal [F1(1, 29) = 0.25, ns; F2(1, 154) = 0.97, ns]. A significant interaction between valence and arousal was only found in the analysis by participant [F1(1, 29) = 6.75, p < 0.05, r = 0.43; F2(1, 154) = 1.87, ns; Appendix B, d in Supplementary Material].

# DISCUSSION

The present study investigated reaction times and response type to high- vs. low-arousal positive and negative words in order to test the hypothesis that emotional valence and emotional arousal can affect explicit and implicit approach vs. withdrawal tendencies. In line with the idea proposed by Robinson et al. (2004) that stimulus valence (positive vs. negative) and stimulus arousal (low vs. high) can elicit conflict in processing if the two dimensions do not match (i.e., positive in valence but high in arousal, PH, or negative in valence but low in arousal, NL), we found slower reaction times in response to PH and NL compared to PL and NH words. Interestingly, this response pattern was found only in Study 1, in which participants were explicitly asked whether they would approach or withdraw from the presented words. Thus, this finding extends previous research, which used a range of tasks such as valence or arousal judgment, lexical decision, or the Affective Simon task (Robinson et al., 2004; Purkis et al., 2009; Eder and Rothermund, 2010; Citron et al., 2014c), by showing that both valence and arousal affect participants' explicit decisions on whether to approach or withdraw from the stimulus, as predicted by Robinson et al.'s valence-arousal conflict theory.

Nevertheless, we found that the decision to respond "approach" vs. "withdrawal" through button press was mainly driven by the more cognitively accessible dimension of emotional valence (e.g., Nicolle and Goel, 2012). In fact, participants responded "approach" significantly more often to positive words than to negative ones, and vice versa, irrespective of whether words were high or low in arousal. This is in line with the idea that the valence dimension of a stimulus is associated with higher-order cognitive and evaluative processes (such as the ones involved in decision making, as in the present task), while arousal is associated with more automatic physiological reactions (e.g., Herbert et al., 2008; Kissler et al., 2009; for an overview, see Citron, 2012), which are less cognitively accessible (Nicolle and Goel, 2012). Thus, it appears that participants' decisions about whether to approach or withdraw from positive and negative stimuli are strongly influenced by the valence of the stimuli, whereas stimulus arousal appears to influence the speed of this decision in case of conflict (i.e., for PH and NL words). According to Robinson et al. (2004), this conflict occurs because arousal by itself carries emotional and motivational information in such a way that stimuli of high arousal are appraised as negative and elicit a withdrawal orientation, whereas stimuli of low arousal are more likely appraised as positive and elicit approach. Thus, conflict processing should primarily affect evaluation speed as reflected in reaction times, which slow down in response to PH and NL words.
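The logic of the valence-arousal conflict account summarized above can be expressed as a small sketch: each dimension is mapped onto the tendency it is assumed to elicit, and a condition counts as conflicting when the two tendencies disagree. The function name and string labels are illustrative assumptions, not part of the original analyses.

```python
def is_conflict(valence, arousal):
    """True when valence and arousal imply opposite action tendencies."""
    # Positive valence is assumed to elicit approach, negative withdrawal.
    by_valence = "approach" if valence == "positive" else "withdraw"
    # Low arousal is assumed to elicit approach, high arousal withdrawal.
    by_arousal = "approach" if arousal == "low" else "withdraw"
    return by_valence != by_arousal

conditions = {
    "PH": ("positive", "high"),
    "PL": ("positive", "low"),
    "NH": ("negative", "high"),
    "NL": ("negative", "low"),
}

for label, (valence, arousal) in conditions.items():
    status = "conflict" if is_conflict(valence, arousal) else "no conflict"
    print(label, status)
```

Under these assumptions, PH and NL are the conflicting conditions, which are the ones for which slower reaction times were found in Study 1.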

Next, we investigated whether the interactive effects of emotional valence and arousal found during the explicit task would still arise during an implicit task which requires neither explicit evaluation of approach or withdrawal action tendencies (Studies 2 and 3), nor deep linguistic analysis of the words (Study 3). The task employed required participants to respond to visually presented words by pressing either an upper or a lower button (with arrows pointing upward vs. downward). This allowed us to test whether positive and negative words would automatically "push" participants' reactions into a specific direction in space that, according to previous research, would be associated with positive (up) or negative (down) meaning. In both studies (2 and 3) participants responded more often with "up" to positive words and with "down" to negative words, confirming that a mapping of valence onto spatial position is automatically activated. Akin to Study 1, we found that the type of response was not affected by the words' arousal level (low vs. high). Moreover, in contrast to Study 1, reaction times did not differ between PH and PL or NH and NL stimuli, suggesting that spontaneous responses to positive and negative words are primarily driven by "up" and "down" decisions, irrespective of the possibly conflicting information elicited by the arousal dimension of the stimulus. Rather, in Studies 2 and 3 the reaction time data showed a main effect of valence: positive words high and low in arousal were responded to faster than high- and low-arousal negative words, suggesting no interference of the arousal dimension with evaluation speed. Thus, in order to activate conflict processing between information conveyed by the valence and the arousal dimension, it seems necessary to employ either an explicit approach vs.
withdrawal evaluation task, or a task in which stimulus valence is completely irrelevant and a minimal degree of processing depth is required, such as in the Simon task or the LDT (Eder and Rothermund, 2010; Citron et al., 2014c). In the LDT, real words must be distinguished from pseudowords, i.e., orthographically legal letter strings that could be real words but do not possess any meaning in the target language. In this case, a minimum degree of processing depth is required in order to identify the words. Previous research has shown that, during a word identification task, if words are intermixed with nonrecognizable stimuli, no effects of a word's emotional content on either early or late ERP components associated with processing of the emotional content of verbal or pictorial stimuli are elicited, despite behavioral responses being at ceiling (Hinojosa et al., 2010). Thus, words can be identified correctly with no necessary access to their affective connotation.

However, one might doubt whether participants' "up" and "down" responses to positive and negative words actually reflect their implicit motivation to approach or avoid, despite a clear response bias to respond "up" to positive words and "down" to negative words, in line with research showing implicit activation of the conceptual mapping of emotional valence onto vertical position (Lakoff and Johnson, 1980; Meier and Robinson, 2004; Rotteveel and Phaf, 2004; Casasanto and Dijkstra, 2010). The task we employed might carry a confound: whereas up- and down-pointing arrows may be associated with high and low spatial position, the finger movement needed to press the upper vs. lower button may require finger extension and contraction and therefore be associated with withdrawal (i.e., pushing a concept away) and approach (i.e., pushing a concept toward oneself), respectively (Solarz, 1960; Cacioppo et al., 1993; Chen and Bargh, 1999). Hence, these two possible associations would lead to opposite predictions. Similarly, there exists research showing that limb extension vs. contraction can be associated with opposite tendencies, i.e., approach vs. withdrawal, respectively, depending on whether the task and instructions require an object-centered or participant-centered perspective (see for instance Lavender and Hommel, 2007; Eder and Rothermund, 2008; Seibt et al., 2008). Whether simple finger movements as required in the present study (instead of whole body or arm movements as required in previous studies) are actually associated with such tendencies regardless of their context is unclear. A task that congruently maps vertical spatial position onto approach could employ a vertically oriented axis with a central button and require participants to push either the upper or lower button by moving their whole hand and then pushing the button; in this way, no difference between finger extension or contraction would be present.

<sup>2</sup>For some participants, it was not possible to calculate the residuals of the logistic regression because they had more than two missing values.

Nevertheless, from the present study we can still confidently conclude that the response decision in our implicit tasks is solely driven by the valence dimension (Nicolle and Goel, 2012) and its implicit mapping onto vertical space. Moreover, we can conclude that this decision is not differentially influenced by the arousal dimension (Meier and Robinson, 2004), therefore extending previous research.

So far, no previous study has investigated whether up and down responses are automatically primed by emotional stimuli and how both valence and arousal dimensions affect participants' responses to emotional stimuli in such tasks. For example, prior studies have either ignored stimulus-response compatibility effects (e.g., see a detailed discussion in Lynott and Coventry, 2014), and manipulated both valence and space associations (e.g., words presented either above or below on a computer screen) without considering arousal, or only compared positive and negative stimuli rated high in arousal with neutral, low-arousal stimuli.

Interestingly, in the implicit tasks positive words were responded to faster than negative ones. At first glance, this result could be due to the fact that finger movements that require stretching of the fingers are generally faster than finger movements that require a flexion of the fingers. An alternative interpretation of this finding comes from Lakens (2012), who proposed a polarity-based framework. According to this framework, various structural dimensions involved in a task can be considered as dimensions with a +polar and a −polar end, such that stimuli that fall under the "high" +polar end of a scale are processed preferentially and significantly faster than stimuli that fall under the "low" −polar end of this scale. A processing advantage for +polar vs. −polar ends has been demonstrated recently for various stimuli including positive (+polar) vs. negative (−polar) stimuli, spatial dimensions (up vs. down), and moral stimuli (e.g., Clark and Brownell, 1975); furthermore, polar elements of a stimulus or a task can be added together and predictions about processing advantages can be made (Lakens, 2012; Lynott and Coventry, 2014). Notably, even the finger movements required in Studies 2 and 3 could be grouped according to the polarity account into +polar and −polar movements. Thus, responses to positive words are characterized by at least two and up to three +polar ends, i.e., positive valence, "up" position, and finger stretching, whereas responses to negative words have 2 up to 3 −polar ends, i.e., negative valence, "low" position, and finger flexion, therefore causing faster RTs for the former stimuli. Still, this reaction time advantage for +polar over −polar stimuli is not affected by arousal.

The successful replication of all of our results (including Studies 1, 2, and 3) after having partialled out possibly confounding effects of additional variables known to correlate with emotional valence strengthens the effects found in our three experiments, although the interactive effect in the explicit task seems to be somewhat less robust than the main effects. However, the replication of the valence effect only in the implicit tasks seems to strengthen the validity of our findings and shows that these are not spurious effects due to imbalance in AoA or familiarity ratings.

Nevertheless, one possible limitation of the present study might concern the stimulus selection. We aimed to reduce the perceived discrepancy between the arousal level of positive and negative words. In fact, in well-matched experimental selections, negative stimuli with high arousal (matched with positive stimuli) may very likely be perceived as only mildly negative; this is because, in our natural environment, negative stimuli have a clearly higher arousal level than positive ones. By mimicking this natural distribution in our experimental manipulation, on the one hand we have the advantage of reducing or eliminating a perceptual bias (Citron et al., 2014a,c) and of using more ecologically valid stimuli. However, on the other hand we also use a numerically unbalanced 2×2 manipulation, e.g., PH stimuli do not differ from NH stimuli only in valence, but also in arousal. However, in case of absence of genuine interactive effects in our data, we would expect a very large arousal effect based on the "numerically unbalanced" stimulus manipulation, i.e., a very large RT difference between the NH and PL words (highest and lowest arousal level, respectively), with an advantage for the former condition. This was not the case in our data.

In addition, the lack of neutral stimuli could be considered a weakness as the emotionality of the material comes out as more obvious. The inter-stimulus interval is relatively short, possibly causing transfer effects of affective variables from one trial to the subsequent one. However, our stimuli were randomized differently across participants, so transfer effects cannot be systematic. Furthermore, Studies 2 and 3 would have benefited from an occasional control task to make sure participants were reading the words. Nevertheless, the systematic mapping between valence and vertical position confirms that participants must have read the words. In fact, the participants we excluded from the analyses responded by pressing the same button for all trials. Finally, the participant samples in Studies 1 and 3 have an unbalanced gender proportion (i.e., more women), unlike Study 2. This might limit the comparison across studies and generalization of the results to a larger, balanced population.

Finally, there exists research showing that individual differences in empathy can affect the strength of emotion-embodiment associations. Specifically, faster reaction times to feeding disgusted faces compared to happy or neutral faces and slower feeding times for happy than disgusted or neutral faces were found; crucially, higher scores on an empathy scale among participants led to stronger effects (Ferri et al., 2010). Therefore, in the present study it would have been interesting to investigate how individual differences in empathy would affect the results found. This is something that may be addressed in future research.

To conclude, the present work furthers empirical research on affective processing and our understanding of the interaction between emotional valence and arousal. While prior research showing interactive effects of emotional valence and arousal during explicit as well as implicit tasks assumed that such effects were due to an integration of implicit approach-withdrawal tendencies (Robinson et al., 2004; Larsen et al., 2008; Hofmann et al., 2009; Purkis et al., 2009; Eder and Rothermund, 2010; Feng et al., 2012; Citron et al., 2014a,c; Recio et al., 2014), the present work explicitly tested this assumption and found evidence for this account only in tasks that require explicit approach-withdrawal decisions for positive and negative words but not in implicit tasks characterized by spontaneous decisions (up vs. down) to positive and negative meaning.

# ACKNOWLEDGMENTS

Francesca M. M. Citron was funded by an Einstein Visiting Fellowship awarded to Professor Adele Goldberg by the Einstein Foundation of Berlin, in conjunction with the Cluster of Excellence "Languages of Emotion," Free University of Berlin, further supported by the DFG-HE5880/3-1 grant awarded to Cornelia Herbert, and by the University of Ulm in the funding programme Open Access Publishing. The authors would like to thank Michael Kucharski and Nora Michaelis for their help with data collection as well as two reviewers for their helpful and constructive comments, which substantially contributed to improving our manuscript.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01935



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Citron, Abugaber and Herbert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# N450 and LPC Event-Related Potential Correlates of an Emotional Stroop Task with Words Differing in Valence and Emotional Origin

Kamil K. Imbir<sup>1</sup>\*, Tomasz Spustek<sup>2</sup>, Joanna Duda<sup>2</sup>, Gabriela Bernatowicz<sup>2</sup> and Jarosław Żygierewicz<sup>2</sup>

<sup>1</sup> Faculty of Psychology, University of Warsaw, Warsaw, Poland, <sup>2</sup> Faculty of Physics, University of Warsaw, Warsaw, Poland

Affective meaning of verbal stimuli was found to influence cognitive control as expressed in the Emotional Stroop Task (EST). Behavioral studies have shown that factors such as valence, arousal, and emotional origin of reaction to stimuli associated with words can lead to lengthening of reaction latencies in EST. Moreover, electrophysiological studies have revealed that affective meaning altered the amplitude of some components of evoked potentials recorded during EST, and that this alteration correlated with performance in EST. Emotional origin was defined as processing based on automatic vs. reflective mechanisms that underlie the formation of emotional reactions to words. The aim of the current study was to investigate, within the framework of EST, correlates of processing of words differing in valence and origin levels, but matched in arousal, concreteness, frequency of appearance, and length. We found no behavioral differences in response latencies. When controlling for origin, we found no effects of valence. We found an effect of origin on ERPs in two time windows: 290–570 and 570–800 ms. The earlier effect can be attributed to cognitive control, while the latter is more likely a manifestation of explicit processing of words. In each case, reflective originated stimuli evoked more positive amplitudes compared to automatic originated words.

Keywords: duality of emotions, emotional stroop task, mechanisms of cognitive control, ERP, emotional words

#### Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

Jianfeng Yang, Shaanxi Normal University, China; Nathaniel Delaney-Busch, Tufts University, United States

> \*Correspondence: Kamil K. Imbir kamil.imbir@gmail.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 17 February 2016 Accepted: 15 May 2017 Published: 30 May 2017

#### Citation:

Imbir KK, Spustek T, Duda J, Bernatowicz G and Żygierewicz J (2017) N450 and LPC Event-Related Potential Correlates of an Emotional Stroop Task with Words Differing in Valence and Emotional Origin. Front. Psychol. 8:880. doi: 10.3389/fpsyg.2017.00880

# EMOTIONAL STROOP TASK

The Emotional Stroop Task (EST) is a modification of the standard procedure introduced by Stroop (1935), which allows measuring cognitive control in the case of interference control (Nigg, 2000). Interference in EST is produced by the affective content of a word and arises as a result of the competition of two processes (Imbir, 2016a). The first process is the automated reading and understanding of the semantic meaning of words. This captures participants' attention and generates a slowdown in the second, controlled process. The second process is related to the task of naming the font color. In the classical Stroop Test, the interference is caused by understanding the meaning of words, which are the names of colors. The interference is observed when comparing congruent (e.g., responding "red" to the word "RED" written in red font) and incongruent (responding "red" to the word "BLUE" written in red font) trials. The congruent trials are perceived as easier and are performed faster. The EST differs from the classical Stroop Task in the nature of the interference measured (Nigg, 2000; Larsen et al., 2006). In the EST, incongruent and congruent trials are constructed by carefully choosing words so that they differ only in one affective factor (e.g., valence or arousal), while they are matched with respect to other properties (e.g., frequency, grammatical class, or length). This allows an unambiguous conclusion to be drawn from the collected data. Trials with extreme levels of the chosen factor are treated as incongruent, since they involve the automatic disposition of attention toward a task-irrelevant lure (supposed to be analogous to the classical Stroop effect), while trials with moderate or neutral levels of the factor are treated as congruent. For example, as far as valence is considered in the EST, trials with positive or negative words are considered incongruent while trials with neutral words are considered congruent (Burt, 2002; Imbir and Jarymowicz, 2013).

# Factors Underlying Behavioral Effects in EST

The behavioral phenomenon of EST was discovered in clinical trials. Subjects experiencing particular trauma had longer reaction times for trauma-related words than for other words (Watts et al., 1986; McKenna and Sharma, 1995, 2004). The EST effects were also demonstrated in clinical and subclinical samples suffering from anxiety disorders (see Williams et al., 1996 for a review). The EST appeared to be a useful tool to detect the source of anxiety, because longer reaction times were observed for words connected in meaning with a particular source of threat. Subsequently, it was reported that EST slowdown could be observed in a normal population with no trauma experience (Nigg, 2000; Larsen et al., 2006; Siakaluk et al., 2014). Valence was shown to influence reaction latencies in cases of words with negative valence (e.g., Williams et al., 1996; McKenna and Sharma, 2004) and also with positive valence (e.g., Pratto and John, 1991; Richards et al., 1992; McKenna and Sharma, 1995). The effect for positive words was usually smaller than for negative words. Also, individual experience with objects or states represented by the words was shown to boost the slowdown (Reiman and McNally, 1995).

Careful inspection of valence effects revealed that other factors, not controlled in advance, could explain the behavioral effects (Burt, 2002; Larsen et al., 2006). The factors identified so far were arousal load and frequency of appearance in language. It appeared that arousal causes a slowdown that is independent of valence (Dresler et al., 2009; Imbir, 2016a). Highly arousing words result in a greater slowdown in reaction latencies than low-arousing stimuli. Also, less frequent words cause a greater slowdown than more frequent ones (Burt, 2002). Some recent results suggest that the origin of an affective state may be another factor responsible for the slowdown observed in EST (Imbir and Jarymowicz, 2013); thus the question concerning factors underlying EST performance is still open. Since the valence effect was shown to be confusing and appeared to be blurred by the effects of other factors, we decided to ask a question concerning the nature of emotion and its influence on EST. To find the answer, we applied the perspective of dual-mind theories, especially the recently introduced framework concerning emotion and cognition interactions viewed from a dual-mind perspective (Imbir, 2016b).

# Duality of Mind and EST

Recently, the duality of emotion framework was proposed in order to explain the diversity of emotions (Jarymowicz and Imbir, 2015) as well as emotion-cognition interactions (Imbir, 2016b). This proposition is based on duality of mind theories (for a broad review see Gawronski and Creighton, 2013), distinguishing between so-called automatic and controlled processes. There is a huge diversity of dual-mind theories, but all of them highlight the main role of the above-mentioned processing modes. Taking into account Epstein's (2003) proposition of the existence of two aspects of mind, namely experiential and rational, we argue that both have their cognitive (c.f. Strack and Deutsch, 2004, 2014; Kahneman, 2011) manifestations in the form of associative (or heuristic) vs. systematic (or rational) processes and emotional (Jarymowicz and Imbir, 2015) manifestations in the form of automatic vs. reflective originated emotional states (c.f. Imbir, 2016b). In the classical view of emotions seen from the dual-mind theory framework, they were thought to be associated only with the simplified processing of the experiential mind (c.f. Epstein, 2003; Kahneman, 2011), but such a view was not sufficient to describe more complex, self-conscious emotions (Weiner, 2005). The emotion duality model (Jarymowicz and Imbir, 2015) states that emotional experiences themselves can originate from either automatic or reflective evoking mechanisms. This implies that affective processes are not necessarily automatic and may be understood as a result of controlled and rational mind processing (c.f. Reykowski, 1989; Strack and Deutsch, 2004, 2014). What is more, according to the classical theories only one of the mind systems could be tested in a given experimental protocol with the use of a single task, attention manipulation, or stimulus presentation parameters, while the emotional duality model applied to the processing of emotional words allows the activation of either system to be studied within the same experimental protocol, for the same subject, by treating the ability to evoke automatic or controlled mind processes as an inherent property of individual word meanings. This addresses the most important critique directed at the dual-mind perspective (c.f. Ferguson et al., 2014).

In the case of experiential mind, so-called automatic originated emotions are characteristic for the direct affective responses to environmental stimulation. Such responses do not need language to appear and we assume that they are based on evaluation of criteria of biological value (Damasio, 2010). Certain objects help in maintaining life and thus are automatically evaluated as pleasant (e.g., fatty and sweet foods), while other things are threats to life and thus are evaluated as unpleasant (e.g., smelly or sour meals). The experiences of automatic originated emotions can be labeled with words (e.g., pain) and thus are widely represented in language (Rolls, 2000). In the case of rational mind, so-called reflective originated emotions are postulated (Jarymowicz and Imbir, 2015). Their most characteristic feature is that formation of reflective originated emotional states requires language (Reykowski, 1989; Strack and Deutsch, 2004, 2014). The propositional mechanisms in the form of evaluative standards (Reykowski, 1989) serve as a source of reflective emotions. Reflective emotion arises when a situation or behavior is compared to a standard represented in the mind. It is clear that the evaluative standards are both subject dependent and plastic; thus different reflective emotions toward single situation can arise in different subjects, or even in the same subject at different times (Jarymowicz and Imbir, 2015).

To measure the nature of an affective reaction (automatic or reflective), the origin dimension was proposed and used in assessing affective reactions to words (Imbir, 2015, 2016b). On the theoretical level (c.f. Epstein, 2003; Strack and Deutsch, 2014; Jarymowicz and Imbir, 2015; Imbir, 2016b) emotional origin is a clearly dichotomous factor, but direct measurement of the processing style underlying emotion formation is not yet possible; thus we have to rely on subjective perception of the processing mechanisms. The origin dimension is measured on a scale constructed as a type of Self-Assessment Manikin (SAM) scale (Lang, 1980). This scale allows for non-verbal assessments of feelings connected with presented stimuli (c.f. **Figure 1**). The SAM scale was supplemented with a description of its meaning in order to provide an unambiguous interpretation of the "origin" concept. We think that origin is not an intuitive dichotomy when emotions are considered (Jarymowicz and Imbir, 2015). Origin is rather a hidden underlying mechanism (Russell, 2003); thus, in creating the SAM scale, we used the heart vs. mind metaphor (c.f. **Figure 1** legend), widely represented in culture, as a good exemplification of the dual-mind dichotomy of underlying processes (Imbir, 2015). Heart represents immediate reactions that do not require hesitation, in contrast to mind, which represents careful inspection of all opportunities and interpretations of the situation. We argue that some emotions are based on nonverbalized criteria of evaluation, such as Damasio's (2010) biological value. Other emotions require cognitive resources and language to interpret and appraise reality. Mechanisms underlying those more cognitive-based emotions are evaluative standards (Reykowski, 1989) or propositional thinking (Strack and Deutsch, 2004) based on processing with the use of sentences and rules of logic. The first case describes so-called automatic emotions and the second case reflective ones.
The origin SAM scale allowed for reliable measurement of the perception of automatic vs. reflective origins of affective reactions to words (Imbir, 2015, 2016c). Nevertheless, it appeared that not all stimuli had unambiguous associations with their origins. In the real world most states result from the activation of both mental systems (Epstein, 2003; Kahneman, 2011; Jarymowicz and Imbir, 2015). This aspect was also found in the SAM scale measures collected. Some words received moderate assessments (based on ambiguous interpretations made by different people), thus in fact not allowing a certain origin to be specified. We treat those words as stimuli with no specified origin, because no particular and clear associations were drawn (cf. Imbir et al., 2016). Distinct mechanisms underlying the formation of affective processes, which are reflected in word connotations defined from a dual-mind perspective, can be compared in a single experiment, owing to the high level of similarity of materials specific to both mind systems (Imbir, 2017). In the traditional view of dual-mind systems, they were expected to be operationalized as distinct, especially because the experiential system works with visual representations (Epstein, 2003), while the reflective system is based on verbalizations (Strack and Deutsch, 2004). The lack of a possibility to create a single experiment testing the consequences of both mind systems was the main weakness raised against the dual-mind perspective (Ferguson et al., 2014). The origin factor, found to be reliably measured for words (Imbir, 2016c), offers an important advantage for understanding the role of dual-mind processes in word processing and the influence of affect on cognitive control. **Figure 1** presents the SAM scale for origin assessments.

EST is a type of task that involves two types of processes: automated and controlled (Imbir, 2016a). The explicit task is the controlled one. It is not a standard action to ignore the meaning of a word and focus on its font color instead. Performing this action requires effort (Kahneman, 2011) in order to prevent (Nigg, 2000) the more automated reading of words and subsequent understanding of their semantic meaning (Imbir, 2016a). The current study tests a hypothesis, posited in the dual-mind model of emotion-cognition interactions (Imbir, 2016b), stating that cognitive and emotional processes are in fact results of broader mental systems (experiential and rational). If this is true (c.f. Imbir, 2017), triggering automatic emotions should activate experiential mind processing (the automatic one), while reflective emotions should activate rational mind mechanisms (the controlled ones). Taking into account the dual nature of EST, one may conclude that processing in this type of task should be specific to the nature of the stimuli presented, stimulating one or the other system and thus influencing the pool of resources available for completion of the task (c.f. Imbir, 2016a; p. 4, **Figure 2**). Automatic originated stimuli, being associated with the experiential mind, should activate or enhance the automated part of EST and thus lengthen reaction latencies. Simply put, automatic originated stimuli should make the reading reflex stronger, because decoding of those stimuli triggers the experiential mind responsible for automated actions. Opposite effects should be observed for reflective originated stimuli. They should activate the controlled part of EST, as they are associated with the rational mind governing controlled processing. Understanding the meaning of reflective originated stimuli should trigger the rational mind, and thus controlled processing should be stronger.

In line with the expectations described above, the behavioral results of an early EST study involving words that differed in levels of origin and valence (Imbir and Jarymowicz, 2013) showed that valence effects disappear when stimuli are controlled for origin (contrasted orthogonally with valence). A slowdown in reaction latencies was observed when automatic originated words were used in the EST, but not when neutral or reflective words were considered. It is important to highlight that the words used in the Imbir and Jarymowicz (2013) study were selected by competent judges (who assessed compliance with the definitions of the automatic and reflective systems of evaluation) and thus were explicitly connected with both investigated origins (automatic vs. reflective). The words from the Imbir and Jarymowicz (2013) study were contrasted by valence and origin levels and matched with respect to frequency, but were not matched for arousal. Therefore, we decided to probe the effects with a new set of words, carefully chosen on the basis of assessments performed in normative studies for words (Imbir, 2015, 2016b).

Using Russell's (2003) concept of bimodal affective space, composed of two orthogonal dimensions of (1) valence, representing pleasantness vs. unpleasantness of an affective state, and (2) arousal, representing the activation underlying an affective state, we decided to check each dimension in the context of the duality of emotion framework. The dual-mind framework of emotion-cognition relations (Imbir, 2016b) postulates mind-system-specific aspects of both dimensions, namely the origin of an affective state for valence and subjective significance for arousal. We treat the origin of an affective state as a process that influences affective processing but should be thought of as lying outside the valence and arousal affective space. Valence and arousal as defined by Russell (2003) are the simplest consciously accessible affective feelings. Origin is not an intuitive factor and, as mentioned earlier, the measurement of origin was not unambiguous (c.f. Materials and Methods Section). Apart from this, both automatic and reflective originated states should possess two distinct valences, and they should also differ from one another in levels of arousal. Origin reflects the basic distinction between two types of processing (based on automatic vs. controlled mechanisms) in the domain of emotion formation, while the valence and arousal affective space defines subjectively perceived pleasantness and activation. As far as the activational aspect of an affective reaction is concerned, we distinguish between arousal-like activation, which is specific to the experiential system (Epstein, 2003), and a postulated subjective significance (Imbir, 2015), which is an activation for reflective mind processes (Imbir, 2016b). It was shown (c.f. Imbir, 2016a) that subjective significance shaped the arousal effect (the slowdown of reaction latencies observed in the EST) in such a way that latencies were reduced for both low and high subjectively significant and high arousing stimuli compared to medium subjectively significant and high arousing stimuli. An analogous pattern of behavioral results was observed in an electrophysiological (EEG) study focusing on Event-Related Potential (ERP) correlates of processing both dimensions in the EST (Imbir et al., submitted, second list described in Procedure Section). Although a comparable manipulation of origin and valence did not reveal behavioral effects (unpublished research report), we decided to check whether ERP measurements would uncover effects of the manipulation of origin and valence, since ERP measurements are more sensitive and can reveal underlying processes even when the behavioral level of analysis shows no differences (Thomas et al., 2007).

# ERP Correlates of Word Processing and EST Performance

Studies conducted so far identified a number of ERP components that are altered by processing of emotional words. These are typically labeled as P1, N1, P2, Early Posterior Negativity (EPN), P3, N450, and Late Positive Complex (LPC) components (Van Hooff et al., 2008; Citron, 2012). Each of them correlates with a specific aspect of task processing or interference control required while performing the task. The first component is P1, typically observed at around 80–130 ms after stimulus onset, with the maximum located at the occipital areas (Hillyard et al., 1998; Van Hooff et al., 2008). The early timing and location suggest that P1 is the component related to early visual processing and attention employment (Citron, 2012). It has maximal amplitude over occipital regions, which suggests that it originates from the extrastriate areas of visual cortex (Sass et al., 2010). Amplitudes of P1 were found to be larger for attended than unattended stimuli (Hillyard et al., 1998). In addition valence can influence the amplitude of this component, and enlarge amplitudes for negative words compared to neutral (Van Hooff et al., 2008). The next deflection is called N1 and was found to differentiate valence of words used in a Posner-cued attention task (Pérez-Edgar and Fox, 2003). Observed amplitudes were smaller for negative than for positive and neutral words (in N1, but also in the N2 component).

The P2 component, with maximum amplitude observed at about 200–250 ms (Van Hooff et al., 2008), was found in several studies to be sensitive to the emotional meaning of words. Unfortunately, the pattern of results for this component is rather inconsistent. Enlarged amplitudes have been observed for positive words only (Schapkin et al., 2000), negative words only (Huang and Luo, 2006), or both positive and negative words (Carretié et al., 2004; Herbert et al., 2006). In the EST paradigm, P2 was shown to be sensitive to threat-related words, which elicited larger amplitudes than neutral words (Thomas et al., 2007). In our previous study (Imbir et al., submitted) the P2 component was found to follow the behavioral results strictly, so it is probable that the control of inhibition is manifested in the P2 time range (Nigg, 2000). The EPN is the last of the early components associated with word processing rather than cognitive control. The EPN is a negative deflection appearing at occipitotemporal sites and peaking between 200 and 300 ms after stimulus onset (Citron, 2012). During silent reading the EPN amplitude was found to be larger for emotionally valenced words (positive and negative) than for neutral words (Kissler et al., 2007; Herbert et al., 2008). This component is therefore treated as an indicator of motivated attention.

In the literature examining ERP correlates of cognitive control, the traditional version of Stroop Task is more popular (Duncan-Johnson and Kopell, 1981; Rebai et al., 1997; West and Alain, 1999, 2000; Liotti et al., 2000; West, 2003) than the modified one, including the emotional version of this task (Metzger et al., 1997; Pérez-Edgar and Fox, 2003; Thomas et al., 2007; Van Hooff et al., 2008; Taake et al., 2009). The first component associated especially with EST performance is the N450 (West and Alain, 2000). It occurs at about 350–500 ms after stimulus onset. This component is most pronounced in fronto-central locations, but may also have a form of broadly distributed negativity (Van Hooff et al., 2008). Amplitude of this component is more negative for incongruent than congruent trials (West, 2003; West et al., 2004). The underlying mechanism might be the activation of the anterior cingulate cortex (Liotti et al., 2000). The N450 in the EST was found to be sensitive to the valence of presented words, showing greater negativity of amplitude after negative words and causing behavioral slowdown in reaction latencies (Van Hooff et al., 2008).

The second component found to be influenced by interference control in the Stroop Task is P3, sometimes identified with the LPC (Sass et al., 2010). The P3 in the EST shows a centro-posterior localization within the 340–600 ms time range. This component was originally detected in the oddball paradigm and was thought to be a manifestation of surprise when less frequent stimuli appear (Luck, 2005); alternatively, it may reflect an update of the content of recalled memories or the process of event categorization (Coles et al., 2000). P3 is supposed to reflect automatic attention shifted to stimuli that have meaning in the context of task requirements, in other words stimuli that are motivationally relevant to the task (Hajcak et al., 2010). Polarized affective valences are thought to indicate the validity and significance of stimulation; they therefore draw attention toward such stimuli and elicit more positive P3 amplitudes (Naumann et al., 1992). The LPC is claimed to have a predominantly parietal distribution, peaking ∼500–800 ms after stimulus onset (Citron, 2012). Its amplitude is higher for threatening words than for neutral ones, even when reaction latencies do not differentiate between the categories in healthy (no trauma reported) individuals (Thomas et al., 2007). This component was found to be sensitive to valence, reward, and the motivational significance of the experimental procedures, as well as to tasks requiring more controlled, explicit cognitive processes (Citron, 2012). Some evidence suggests that LPC effects can accompany even automatic actions during the execution of evaluative priming tasks (Herring et al., 2011). Nevertheless, the LPC is claimed to be a manifestation of later stages of semantic processing (Sass et al., 2010; Zhang et al., 2014) associated with conscious recognition of the stimulus (Hajcak et al., 2010). For that reason the LPC may be interpreted as a manifestation of understanding the connotations of a word (Citron, 2012), but the scientific debate over this issue is still open, especially because the results for word processing in the LPC time range are rather inconsistent. Some authors (e.g., Cuthbert et al., 2000; Herbert et al., 2006, 2008) found that processing of positive words evoked a more positive LPC amplitude than neutral or negative words, while others reported the opposite pattern (e.g., Kanske and Kotz, 2007; Hofmann et al., 2009; Schacht and Sommer, 2009; Gootjes et al., 2011): a more positive LPC amplitude to negative words than to neutral or positive words. Those inconsistencies might be due to other differences in the materials used, such as concreteness (Kanske and Kotz, 2007) or the origin of the affective response (Imbir, 2015; Imbir et al., 2015).

# Aim and Hypothesis

The aim of our current study was to check whether two factors, valence and emotional origin of stimuli, modulate ERP correlates of EST processing. Valence and origin were operationalised with the use of SAM scales (c.f. **Figure 1** and Imbir, 2015). Using SAM scale ratings we created a factorial manipulation of both factors. The current study is based on the same stimuli as our previous experiment concerning the Lexical Decision Task (Imbir et al., 2016); thus it is worthwhile to compare the results of both. The LDT involves involuntary semantic processing, but this processing does not interfere with task performance. The EST also involves involuntary semantic processing, but here it causes interference and a slowdown in reaction latencies. The results of the LDT may allow us to form some expectations concerning origin effects in the EST. Correlates of involuntary semantic processing in the LDT (Imbir et al., 2016) were found in two time ranges: 290–375 and 375–670 ms after stimulus onset. The task was to discriminate words from pseudo-words; thus no meaning processing was required. The first time range was identified as the FN400 component, found to be a manifestation of stimulus familiarity, greater for words than for non-existing pseudo-words (c.f. Curran, 2000). We found a main effect of valence in the centro-frontal ROI, showing more positive amplitudes for positive words than for neutral and negative words (Imbir et al., 2016). The subsequent 375–670 ms time range was identified as the LPC component, and a main effect of origin was found in the left-parietal ROI. Amplitudes for automatic and reflective originated words were more positive than amplitudes for control words (Imbir et al., 2016).

We intended to search for amplitude differences in components typically reported for the EST, such as P2 and N450, as well as in those associated with processing of the connotations and associations of stimulus meaning, such as the LPC. We expected amplitude differences in the P2 component to be related to behavioral differences. This expectation was based on results concerning a list of words differing in arousal and subjective significance levels (Imbir et al., submitted). Other previous work indicated that automatic-originated words interfered with task performance on an EST more than reflective-originated words (Imbir and Jarymowicz, 2013). This could be due to the meanings of automatic-originated words capturing attention and/or requiring more resources to suppress. If so, we might expect automatic (vs. reflective) words to elicit a larger N450 amplitude, indicating that more cognitive control resources were required to resolve the conflict induced by the triggered deviation from task demands. We might also expect reflective (vs. automatic) words to elicit a larger LPC amplitude, indicating broader (multicriteria-based) contextual evaluative processes characteristic of the reflective evaluative system (Jarymowicz and Imbir, 2015), which are distinct from the word complexity represented by concreteness (Kanske and Kotz, 2007; Palazova et al., 2013).

# MATERIALS AND METHODS

# Participants

The subjects (female = 16, male = 16), aged from 19 to 26 years (M = 21.63, SD = 1.98), were students at different Warsaw colleges and universities. They took part in the experiment voluntarily, for a small reward. All of the participants were right-handed, native Polish language speakers with normal or corrected-to-normal vision. Participants provided their verbal informed consent to participate in the presence of at least two lab members, which was documented in a research diary. We did not collect any personal data from our participants, to assure their anonymity. This procedure was suggested by the bioethical committee. The design, experimental conditions and consent procedure for this study were approved by the bioethical committee of the Maria Grzegorzewska University.

# Design

We investigated the behavioral and electrophysiological measures related to the reading of emotional words. We manipulated the factors of valence (3 levels) and origin (3 levels), while controlling the following properties of words: arousal, concreteness, frequency of appearance in language and length. The distribution of variables: response accuracy and number of correct and artifact-free trials was not Gaussian, therefore the significance of effects concerning these variables was assessed by means of the Friedman test for replicated block design. The effects concerning other variables, with approximately normal distribution, were assessed by means of ANOVA with repeated measures.

# Linguistic Materials

Linguistic materials were chosen from the Affective Norms for Polish Words Reload (ANPW\_R: Imbir, 2016c) dataset, which comprises 4900 Polish words. The stimuli selection aimed to create a 3 (valence: negative, neutral and positive) × 3 (origin: automatic, not specified and reflective) factorial manipulation with control for other potentially important factors, such as arousal, concreteness, frequency, and word length. Valence of feelings toward stimuli was measured with a bipolar scale ranging from 1 (negative feelings) to 9 (positive ones). The origin scale was also bipolar and ranged from 1 (of automatic origins) to 9 (of reflective origins). Only nouns from ANPW\_R were selected. For the three levels of valence and origin we selected words rated, respectively, below −1 SD, from −0.5 to 0.5 SD, and above 1 SD from the average rating on the corresponding dimension. Further, the selected words had medium ratings (between −0.5 and 0.5 SD) for arousal and for concreteness. The selection procedure also ensured that the frequency of appearance and length (number of letters, NoL) of words were equalized. Frequency estimations were based on online Polish texts (Kazojć, 2011) and represented the number of occurrences of each word in the whole database used. The distribution of values in this database was right-skewed, but this was corrected by a natural logarithm (LN) transformation, enabling the application of parametric statistics. Thus, all analyses we conducted used the LN of the frequency estimation. This procedure led us to select 15 words in each of nine categories (c.f. Supplementary Material). **Table 1** presents mean values of the manipulated as well as the controlled factors for each of the 9 experimental groups of words. **Table 2** presents the list of words in each category.
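The selection bands described above can be illustrated with a short sketch. It is not the authors' selection script: the function names are hypothetical, and standardizing against the sample SD of the whole word pool is an assumption. For valence, "low" would correspond to negative words and "high" to positive ones; for origin, "low" would correspond to automatic and "high" to reflective.

```python
import math
from statistics import mean, stdev


def z_scores(ratings):
    """Standardize ratings against the mean and sample SD of the pool (assumed)."""
    m, sd = mean(ratings), stdev(ratings)
    return [(r - m) / sd for r in ratings]


def level_from_z(z):
    """Map a z-score onto the three selection bands given in the text."""
    if z < -1.0:
        return "low"
    if -0.5 <= z <= 0.5:
        return "medium"
    if z > 1.0:
        return "high"
    return None  # between bands: the word is not eligible for any level


def log_frequency(count):
    """Natural-log (LN) transform applied to right-skewed frequency counts."""
    return math.log(count)
```

A word would enter the manipulation only if both its valence and origin bands are defined and its arousal and concreteness fall in the "medium" band.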

The properties of construction of the manipulation were assessed by means of 3 (valence levels) × 3 (origin levels) ANOVA analyses for each dimension measured. In the case of manipulated variables we have found for **valence ratings** significant differences for valence levels: F(2, 126) = 607.44, p < 0.001, η <sup>2</sup> = 0.91, but not for origin levels: F(2, 126) = 1.88, p = 0.16, η <sup>2</sup> = 0.03, nor for interaction between valence and origin levels: F(4, 126) = 2.09, p = 0.086, η <sup>2</sup> = 0.062. For **origin ratings** we have found significant differences for origin levels: F(2, 126) = 254.55, p < 0.001, η <sup>2</sup> = 0.80, but not for valence levels: F(2, 126) = 1.27, p = 0.28, η <sup>2</sup> = 0.02, nor for interaction between valence and origin levels: F(4, 126) = 0.5, p = 0.74, η <sup>2</sup> = 0.016.
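As a reading aid, the effect sizes reported for the manipulation check appear to be recoverable from the F statistics and their degrees of freedom with the standard partial eta-squared formula, η<sup>2</sup> = (F × df<sub>effect</sub>) / (F × df<sub>effect</sub> + df<sub>error</sub>). That the reported values were computed in exactly this way is an assumption; the sketch below only shows that the numbers are consistent with it.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the valence-rating and origin-rating checks in the text.
valence_on_valence = partial_eta_squared(607.44, 2, 126)   # reported as 0.91
origin_on_origin = partial_eta_squared(254.55, 2, 126)     # reported as 0.80
interaction_valence = partial_eta_squared(2.09, 4, 126)    # reported as 0.062
```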

In the case of controlled variables, no statistically significant effects were found for three dimensions. **Arousal ratings**: no statistically significant effects were found [statistics summary: between valence levels: F(2, 126) = 1.98, p = 0.14, η<sup>2</sup> = 0.02, origin levels: F(2, 126) = 1.44, p = 0.24, η<sup>2</sup> = 0.02, interaction between valence and origin levels: F(4, 126) = 0.5, p = 0.72, η<sup>2</sup> = 0.016]. **Concreteness ratings**: no statistically significant effects were found [statistics summary: for valence levels: F(2, 126) = 1.19, p = 0.31, η<sup>2</sup> = 0.02, for origin levels: F(2, 126) = 0.4, p = 0.67, η<sup>2</sup> = 0.006, for interaction between valence and origin levels: F(4, 126) = 0.12, p = 0.98, η<sup>2</sup> = 0.004]. **Frequency of the words' appearance** in the Polish language (after logarithm transformation, with data taken from the Kazojć, 2011, dataset) showed no statistically significant effects [for valence levels: F(2, 126) = 2.3, p = 0.11, η<sup>2</sup> = 0.04, for origin levels: F(2, 126) = 1.0, p = 0.37, η<sup>2</sup> = 0.016, for interaction between valence and origin levels: F(4, 126) = 0.44, p = 0.78, η<sup>2</sup> = 0.014]. **Average length of the words** revealed no significant effect of valence levels [F(2, 126) = 2.01, p = 0.14, η<sup>2</sup> = 0.03] or of the interaction between valence and origin levels [F(4, 126) = 0.82, p = 0.52, η<sup>2</sup> = 0.025], but there was a difference between origin levels [F(2, 126) = 3.48, p = 0.034, η<sup>2</sup> = 0.052]. The post-hoc analysis showed that the difference concerned words of an automatic origin vs. words of no particular origin: t(132) = 2.62, p = 0.01. Words of automatic origin were M = 7.3 (SEM = 0.3) letters long, while words of no particular origin were M = 6.2 (SEM = 0.3) letters long. The other differences were not significant. The linguistic materials are the same as in our previous studies (e.g., Imbir et al., 2016); thus more details concerning the properties of the linguistic materials can be found there (c.f. **Table 1**, Imbir et al., 2016). A full list of stimuli used in the experiment and their affective assessment values is presented in Appendix 1 (Supplementary Material).

TABLE 1 | Descriptive statistics (M, SD) for groups of words used in factorial manipulation (Source: Imbir et al., 2016).

# Procedure

Subjects were seated in a comfortable chair. The words were displayed on a 15.6-inch LCD screen at a distance of ∼1 m from the subjects' eyes. The font was Helvetica, 50 point size. Simultaneously with the target word, cues indicating the initial letters of the Polish names of the possible colors, P = orange (pomarańczowy), C = red (czerwony), Z = green (zielony), N = blue (niebieski), were displayed at the bottom of the screen. Each participant performed a training session to learn what the task was and how to perform it correctly. The training consisted of 20 initial trials (naming color squares displayed in one of the four target colors, reading color-meaning words) followed by 60 standard Stroop Test trials (Stroop, 1935), i.e., naming the font color of congruent and incongruent items presented in random order. After those training sessions, the main experiment was introduced, based on the EST with the selected emotional words. Each time participants were encouraged to respond as quickly and as accurately as possible. The subject's task in the main part of the experiment was to indicate the font color of the emotional words by pressing a response key labeled with one of the letters P, C, Z, N. The experimental protocol is depicted in **Figure 2**.

The timing of a single trial in the main part of experiment was the following: a fixation cross was displayed for 700 ms; next a word was presented for as long as it took the subject to read and respond to it (no timeout was implemented, the exceptionally long responses were excluded from the offline analysis); after detecting the response the screen went blank for 300–400 ms. The trials were grouped, so that 15 words of homogeneous properties (i.e., the same level of valence and origin) were presented consecutively. We decided on a block design because EST effects are more pronounced in this type of presentation, in fact larger behavioral effects were found for block design in comparison to fully random presentation of words (c.f. Bar-Haim et al., 2007). The subject could rest for 3 s after the presentation of each group. There were altogether nine groups, one for each possible combination of factor levels (3 valence × 3 origin), comprising a list of 9 groups (9 × 15 = 135 words).


The order of groups on the list, as well as the order and font color of words within each group, was fully randomized for each participant in each repetition. The experimental session had three repetitions of the list, separated by a longer break whose duration was set by the subject. This means that a single group of words (e.g., negative of reflective origins) consisted of 45 trials (3 × 15). When considering main effects, a single group of words (e.g., automatic originated) consisted of 135 trials (3 × 45).
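The block structure and the resulting trial counts can be sketched as follows. This is an illustration only: the word labels are placeholders rather than the actual stimuli, the function name `build_list` is hypothetical, and drawing each font color independently at random is an assumption (the text states only that font color was fully randomized).

```python
import random

VALENCE = ("negative", "neutral", "positive")
ORIGIN = ("automatic", "not specified", "reflective")
COLORS = ("orange", "red", "green", "blue")
WORDS_PER_BLOCK = 15
REPETITIONS = 3


def build_list(words_by_condition, rng):
    """Return one repetition: 9 homogeneous blocks presented in random order."""
    conditions = [(v, o) for v in VALENCE for o in ORIGIN]
    rng.shuffle(conditions)
    blocks = []
    for valence, origin in conditions:
        words = list(words_by_condition[(valence, origin)])
        rng.shuffle(words)  # random word order within the block
        blocks.append([
            {"word": w, "color": rng.choice(COLORS),  # assumed: independent draws
             "valence": valence, "origin": origin}
            for w in words
        ])
    return blocks


# Placeholder stimuli standing in for the 135 selected nouns.
words_by_condition = {
    (v, o): [f"{v[:3]}_{o[:3]}_{i:02d}" for i in range(WORDS_PER_BLOCK)]
    for v in VALENCE for o in ORIGIN
}

rng = random.Random(0)
session = [build_list(words_by_condition, rng) for _ in range(REPETITIONS)]

trials_per_list = sum(len(block) for block in session[0])
trials_per_cell = sum(
    len(block) for rep in session for block in rep
    if (block[0]["valence"], block[0]["origin"]) == ("negative", "reflective")
)
trials_per_origin_level = sum(
    len(block) for rep in session for block in rep
    if block[0]["origin"] == "automatic"
)
```

Under these assumptions the counts reproduce those given in the text: 135 trials per list, 45 per valence-by-origin cell, and 135 per level of a main effect.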

The whole experiment was composed of three repetitions of 2 distinct lists of words. First list (described above) was designed to measure valence-origin factors influence on EST. Second list was designed to operationalize factorial manipulation of arousal and subjective significance, the two activational factors postulated in dual-mind model of Emotion-Cognition interactions (Imbir, 2016b). The second list was the same as used in earlier behavioral study with dual-mind approach to understanding of EST (c.f. Imbir, 2016a). One hundred and thirty-five items from the second list contrasted orthogonally 3 levels of arousal (low, medium, high) and 3 levels of subjective significance (low, medium, high). Also valence, concreteness, frequency of appearance and length were controlled. The tasks, related to each list did not interfere with each other because they were separated in time, due to block design used for stimuli presentation. The order of lists presentations was randomized between subjects.

# EEG Material

# Apparatus

Stimuli were displayed on a 15.6-inch LCD display controlled by a PC. A second PC was used for recording EEG data. Stimuli and EEG data were synchronized using a custom-made hardware trigger. The trigger consisted of a light sensor measuring the brightness of a small rectangular portion of the screen, which was covered by the sensor. Brightness of that part of the screen was modulated simultaneously with the stimulus presentation. The signal from the sensor was recorded, together with the EEG signal, on an auxiliary input of the amplifier. This auxiliary signal was later used to align trials. EEG activity was recorded from 19 derivations of the 10–20 system: Fz, Cz, Pz, Fp1/2, F7/8, F3/4, T3/4, C3/4, T5/6, P3/4, O1/2, referenced to linked earlobes, grounded on the clavicle. The impedances of electrodes were below 5 kΩ. The signal was acquired using a Porti7 (TMSI) amplifier at 256 Hz sampling frequency.
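One way such an auxiliary sensor trace could be turned into trial onsets is rising-edge detection, sketched below. The source does not describe the alignment algorithm, so everything here is an assumption: that the screen patch brightens at stimulus onset, and the hypothetical `threshold` and `min_gap` parameters (the latter guards against double detections within one stimulus).

```python
def detect_onsets(aux, threshold, min_gap):
    """Return sample indices where the sensor signal rises through `threshold`.

    aux       -- sequence of sensor samples recorded alongside the EEG
    threshold -- brightness level separating "dark" from "bright" (assumed)
    min_gap   -- minimum number of samples between two accepted onsets (assumed)
    """
    onsets = []
    last = -min_gap
    for i in range(1, len(aux)):
        rising = aux[i - 1] < threshold <= aux[i]
        if rising and i - last >= min_gap:
            onsets.append(i)
            last = i
    return onsets


# Toy trace: two bright periods starting at samples 10 and 25.
toy_aux = [0] * 10 + [1] * 5 + [0] * 10 + [1] * 5
toy_onsets = detect_onsets(toy_aux, threshold=0.5, min_gap=3)
```

At the 256 Hz sampling rate reported above, each detected index corresponds to a timing resolution of about 3.9 ms.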

# Offline EEG Signal Processing

The offline processing of the signal was performed in Matlab® with the EEGLAB (Delorme and Makeig, 2004) toolbox. The signal was zero-phase filtered with Butterworth high- and low-pass filters (2nd order, corresponding to 12 dB/octave roll-off, with half-amplitude cut-off frequencies of 0.1 Hz and 30 Hz, respectively), and with an IIR notch filter at 50 Hz to remove line noise. Epochs from −200 ms pre-stimulus to 850 ms post-stimulus were extracted and baseline-corrected (baseline data taken from −200 to 0 ms).
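The epoching and baseline-correction step can be illustrated for a single channel as follows. This is a plain-Python sketch rather than the EEGLAB pipeline the authors used; filtering is omitted, the function names are hypothetical, and rounding window edges to the nearest sample is an assumption.

```python
FS = 256                    # Hz, sampling rate reported in the text
PRE_MS, POST_MS = 200, 850  # epoch limits relative to stimulus onset


def ms_to_samples(ms, fs=FS):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000))


def extract_epoch(signal, onset, fs=FS):
    """Cut -200..850 ms around `onset` and subtract the pre-stimulus mean.

    Returns None when the epoch would run past either end of the recording.
    """
    pre = ms_to_samples(PRE_MS, fs)
    post = ms_to_samples(POST_MS, fs)
    if onset - pre < 0 or onset + post > len(signal):
        return None
    segment = signal[onset - pre:onset + post]
    baseline = sum(segment[:pre]) / pre  # mean over the -200..0 ms interval
    return [x - baseline for x in segment]


# Toy channel: a step from 2.0 to 5.0 at sample 100 (the assumed onset).
toy_signal = [2.0] * 100 + [5.0] * 300
toy_epoch = extract_epoch(toy_signal, onset=100)
```

With these conventions an epoch spans 51 pre-stimulus and 218 post-stimulus samples at 256 Hz.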

The statistical tests were implemented using the appropriate R procedures (R Development Core Team, 2008, available from http://www.R-project.org). Trials with erroneous responses, or corrupted with artifacts (e.g., eye blinks or muscle activity), or with extremely short (shorter than the 2.5th percentile of the distribution of all response latencies) or long (longer than the 97.5th percentile of the distribution) response latencies were excluded from the ERP analysis. The mean number of trials remaining in each of the 9 manipulation conditions (from the initial 45) was M = 37 (SEM = 0.3). The Friedman test for replicated block design did not indicate significant differences in the average number of trials per condition for the origin groups with valence as a blocking variable [χ<sup>2</sup>(2) = 3.4, p = 0.2], nor for the valence groups with origin as a blocking variable [χ<sup>2</sup>(2) = 0.46, p = 0.8].

TABLE 2 | Full list of stimuli used in each category of experimental manipulation.
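The trial-exclusion rule can be sketched as below. The helper names are hypothetical, and the linear-interpolation definition of a percentile is an assumption, since the text does not say which definition was used.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def keep_trial(rt, correct, artifact, lo_cut, hi_cut):
    """Keep a trial only if it is correct, artifact-free and not an RT outlier."""
    return correct and not artifact and lo_cut <= rt <= hi_cut


# Toy latencies 1..101 (arbitrary units) to show where the cut-offs fall.
toy_rts = list(range(1, 102))
lo_cut = percentile(toy_rts, 2.5)
hi_cut = percentile(toy_rts, 97.5)
n_kept = sum(keep_trial(rt, True, False, lo_cut, hi_cut) for rt in toy_rts)
```

By construction the rule removes about 5% of correct, artifact-free trials, which is consistent with the "5% most extreme RT values" mentioned in the behavioral analysis.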

# RESULTS

# Behavioral Measures

The mean response accuracy was M = 90% (SEM = 0.4). The Friedman test for replicated block design did not indicate significant differences in the average accuracy per condition for the valence groups with origin as a blocking variable [χ<sup>2</sup>(2) = 1.08, p = 0.58], or for the origin groups with valence as a blocking variable [χ<sup>2</sup>(2) = 5.88, p = 0.053]. The response latency was analyzed for the trials that were accepted for ERP analysis (i.e., artifact free, correct responses, without the 5% most extreme RT values). Analysis by means of a 3 (valence levels) × 3 (origin levels) ANOVA with repeated measures applied to log-transformed reaction latencies did not reveal any significant effects [factor valence: F(2, 62) = 0.84, p = 0.44; factor origin: F(2, 62) = 0.65, p = 0.53; valence × origin interaction: F(4, 124) = 2.0, p = 0.1]. The average response latency was M = 820 (SEM = 8.6) ms.

# Electrophysiological Data

#### Selection of Time Windows and Regions of Interest

The following time windows were selected for evaluation of ERP effects: 50–150, 150–290, 290–570, 570–800 ms. This selection is based on the global field power curve GFP (**Figure 3**). The GFP is evaluated as spatial standard deviation. It quantifies the sum of electrical activity over all electrodes at a given time point. The latencies of GFP maxima may be interpreted as the latencies of evoked potential components (Lehmann and Skrandies, 1980; Skrandies, 1990). Since we do not expect lateralization effects, three regions of interest (ROI) were selected as follows: frontal (F) (electrodes: F3, Fz, F4), central (C) (electrodes: C3, Cz, C4), and parietal (P) (electrodes: P3, Pz, P4). A similar approach can be found in the other studies focusing on neural correlates of EST task (e.g., Schirmer and Kotz, 2003; Thomas et al., 2007; Taake et al., 2009) and gives us a chance to investigate a distribution of effects in a front-to-back dimension.
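The GFP computation described above (the spatial standard deviation at each time point) can be sketched in a few lines. The data layout and function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import pstdev


def global_field_power(erp):
    """GFP per time point: standard deviation of amplitudes across electrodes.

    erp -- list of time points, each a list with one amplitude per electrode
    """
    return [pstdev(sample) for sample in erp]


def local_maxima(curve):
    """Indices of local peaks, candidate latencies of ERP components."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] >= curve[i + 1]]


# Toy grand average: 5 time points x 3 electrodes, strongest field at index 2.
toy_erp = [[0, 0, 0], [1, -1, 0], [2, -2, 0], [1, -1, 0], [0, 0, 0]]
toy_gfp = global_field_power(toy_erp)
toy_peaks = local_maxima(toy_gfp)
```

Because the standard deviation subtracts the mean across electrodes at each time point, the measure does not depend on the choice of reference electrode.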

# Analysis of ERP Effects

The analysis was performed by applying a three-factor repeated measures analysis of variance (origin × valence × ROI) to the mean amplitude from each subject, in each of the time windows. For each of the time windows there was a significant main effect of ROI, but this finding is not relevant to our hypotheses and will not be discussed further. There was no significant interaction between ROI and the other two variables in any of the time windows; therefore, subsequent analyses used amplitudes averaged across ROIs. No statistically significant effects were observed for the time windows 50–150 and 150–290 ms. No effects were obtained for valence levels in any time window (**Figure 4A**). For two time windows, however, there were significant effects of origin levels. Namely, in the time window **290–570 ms** a main effect of origin [F(2, 62) = 5.078, p < 0.01] was obtained. The amplitude was less negative for stimuli with reflective origin (M = −0.27, SEM = 0.33) than for those with automatic origin (M = −0.72, SEM = 0.30) and for those with no specific origin (M = −0.81, SEM = 0.30); corresponding t-test results [t(31) = 2.37, p < 0.05; t(31) = 2.83, p < 0.02]. In the time window **570–800 ms** a main effect of origin [F(2, 62) = 4.78, p < 0.016] was obtained too, but the pattern of differences was slightly different: the amplitude for reflective stimuli (M = 0.56, SEM = 0.27) was more positive only in comparison with automatic origin (M = 0.13, SEM = 0.26); corresponding t-test result [t(31) = 2.92, p < 0.02]. No statistically significant interactions between valence and origin were observed in these time windows [**290–570 ms:** F(4, 124) = 1.37, p = 0.25; **570–800 ms:** F(4, 124) = 0.46, p = 0.77]. The time course of the ERPs, which illustrates the results, is shown in **Figure 4B**. Since no significant interactions between origin levels and ROI were observed, the curves in **Figure 4** are ERP amplitudes averaged across all three ROIs.
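The dependent measure entering the ANOVA, the mean amplitude in a time window averaged within and then across ROIs, can be sketched as follows. The data layout (one list of samples per channel, starting at −200 ms and sampled at 256 Hz) and the treatment of the window end as exclusive are assumptions; the function names are hypothetical.

```python
FS = 256               # Hz, sampling rate reported in the text
EPOCH_START_MS = -200  # first sample of each epoch relative to stimulus onset
WINDOWS_MS = [(50, 150), (150, 290), (290, 570), (570, 800)]
ROIS = {
    "F": ("F3", "Fz", "F4"),
    "C": ("C3", "Cz", "C4"),
    "P": ("P3", "Pz", "P4"),
}


def window_slice(start_ms, end_ms, fs=FS, epoch_start_ms=EPOCH_START_MS):
    """Sample range of a time window inside an epoch (end treated as exclusive)."""
    first = int(round((start_ms - epoch_start_ms) * fs / 1000))
    last = int(round((end_ms - epoch_start_ms) * fs / 1000))
    return slice(first, last)


def mean_amplitude(erp_by_channel, window, rois=ROIS):
    """Mean amplitude in `window`, averaged within each ROI and then across ROIs."""
    sl = window_slice(*window)
    roi_means = []
    for channels in rois.values():
        values = [v for ch in channels for v in erp_by_channel[ch][sl]]
        roi_means.append(sum(values) / len(values))
    return sum(roi_means) / len(roi_means)


# Toy ERP: constant 1, 2 and 3 microvolts in the frontal, central, parietal ROIs.
toy_erp = {ch: [level] * 269
           for level, roi in zip((1.0, 2.0, 3.0), ROIS.values())
           for ch in roi}
toy_mean = mean_amplitude(toy_erp, WINDOWS_MS[2])
```

One such value per subject and condition would then be submitted to the repeated measures analysis.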

# DISCUSSION

This study focused on investigating the role of the origin and valence of emotional words in EST processing and involuntary word processing. We hoped to find electrophysiological correlates of the underlying mechanisms, even when behavioral outcomes of the task are not visible (c.f. Thomas et al., 2007). The results confirmed that there are no differences in response latencies, whereas ERP amplitude differs between levels of the origin of the emotion conveyed by word meaning, but not between valence levels. The differences appeared in time windows characteristic of components thought to be manifestations of cognitive control (N450: **290–570 ms**) and of involuntary word processing, including more controlled and explicit cognitive processing of word meaning (LPC: **570–800 ms**). Surprisingly, the electrophysiological results were not localized at a particular site, but rather generally distributed over all the ROIs analyzed.

# Behavioral Results

As expected, we did not find behavioral differences in reaction latencies due to the type of word processed in the EST. These results are coherent with our other unpublished negative results concerning behavioral-only measures for the same list of words as used now. This lack of difference is also consistent with EEG measures of the P2 component, found in our earlier study (Imbir et al., submitted) to be a strict correlate of behavioral differences. The lack of behavioral differences related to variations in the valence dimension can be attributed to the role of other dimensions found to be more crucial (c.f. Burt, 2002). Previous behavioral findings showed that arousal (Dresler et al., 2009), frequency of appearance (Burt, 2002), or origin (Imbir and Jarymowicz, 2013) could account for effects shown in early studies with the EST (c.f. Burt, 2002; Larsen et al., 2006; Imbir, 2016a). This is also consistent with other EEG studies showing no behavioral differences (c.f. Thomas et al., 2007) for valence, while reporting ERP amplitude differences.

In this study, we did not replicate the effect of the origin of the affective state reported in our earlier studies involving the EST (c.f. Imbir and Jarymowicz, 2013), where a slowdown was caused by stimuli of automatic origin, but not by stimuli of reflective origin or neutral ones (with no effects of the valence of emotion). The most probable reason for this difference between the two studies is the method of stimulus selection. Imbir and Jarymowicz (2013) based the selection on competent judges' decisions about compliance with the definitions of automatic and reflective origin. This makes the stimuli explicitly connected with automatic or reflective origins, but not controlled for arousal or concreteness. In addition, the content of the words was specified as labels of emotional states characteristic of each origin, or of objects causing those states. This means that participants could explicitly work out the actual aim of the experiment. Our current study is based on a more precise selection of stimuli from a large pool of words, with properties checked in advance by a different group of participants (c.f. Imbir, 2015, 2016a,c). For that reason, a participant could not explicitly attribute the choice of stimuli to a certain origin, although the context of the whole list still gave meaning to a specific category. Another difference is the precise control for arousal, concreteness, and frequency differences among the stimuli presented in the current study (c.f. Supplementary Material). It is possible that the behavioral results shown in the EST were caused mostly by these dimensions (Burt, 2002; Larsen et al., 2006; Thomas et al., 2007), and not by valence or origin themselves. This could mean that the EEG results should be interpreted more in the context of involuntary word processing than of cognitive control measured in the EST. Nevertheless, the lack of behavioral results does not remove the need to search for electrophysiological correlates of processing in the EST experiment (Thomas et al., 2007).

# ERP Results

Electrophysiological correlates of the EST allowed us to inspect the stages of task processing and the role of the valence and origin dimensions of the words used in this process. Most interestingly, we found no effects of valence during the whole time course analyzed. It seems that if the origin of an affective state is controlled and aligned across all valence conditions, the traditional effects may disappear (Imbir and Jarymowicz, 2013). We claim that origin is one of the properties of an affective reaction (Imbir, 2015) that can be attributed to distinct mind systems underlying the formation of this reaction (Gawronski and Creighton, 2013; Jarymowicz and Imbir, 2015). Since an affective reaction can be described in the bimodal affective space of valence and arousal (Russell, 2003), mostly arousal differences were claimed to cause the EST effect at both the behavioral (Burt, 2002; Larsen et al., 2006; Dresler et al., 2009; Imbir, 2016a) and electrophysiological levels (Metzger et al., 1997; Pérez-Edgar and Fox, 2003; Thomas et al., 2007; Van Hooff et al., 2008; Taake et al., 2009; Imbir et al., submitted).

The lack of amplitude differences in the P2 time range observed in this study corresponds with the lack of behavioral differences in response latencies. The previously mentioned second list used in this experimental protocol [contrasting 3 levels of arousal and 3 levels of subjective significance (Imbir et al., submitted)] yielded differences in reaction latencies due to both factors, as well as amplitude differences in the P2 (150–290 ms) component that closely resembled the pattern of behavioral results (longer reaction times ∼ more positive amplitude). Thomas et al. (2007) suggested that P2 amplitude might be a more sensitive measure of inhibitory control than behavioral responses. The results for the valence and origin word list reported in the current paper support this claim, as do the results for the arousal and significance word list (Imbir et al., submitted).

The results concerning the consequences of the origin of an emotional state for EST processing (c.f. **Figure 4B**) are interesting. We found global effects in two time ranges that can be attributed to the N450 and LPC components. The first component, the N450 (c.f. West and Alain, 2000), peaks at about 350–500 ms after stimulus onset and is frequently reported in studies with the EST paradigm (Van Hooff et al., 2008; Taake et al., 2009). The N450 is located over the frontal regions of the head (Sass et al., 2010). Although no interaction with ROI was found for amplitudes in this study, the topography of the average amplitude distribution for this time window (c.f. **Figure 3**, bottom graphs) suggests that the strongest negativity is indeed located over the frontal regions of the head. The amplitude of the N450 component was found to be more negative for incongruent than for congruent trials (West, 2003; West et al., 2004). The underlying mechanism is possibly associated with conflict detection (West, 2003; West et al., 2004) or selection among competing responses (West and Alain, 1999). The N450 component is visible in the ERP waveform (c.f. **Figure 4B**, 290–570 ms). The conditions of reflective origin generated less negative amplitude than the automatic-origin and control words. This means that the presentation of stimuli of reflective origin was less incongruent; thus one may expect that the rational mind indeed reduces the interference of automated reading and understanding of stimulus meaning, even without behavioral outcomes in response latencies. The lack of correspondence with behavioral results may suggest that the decision concerning the type of answer is made earlier (in the P2 time range), while the N450 in this study reflects the conflict cost appearing after the decision was made. Alternatively, these effects may be interpreted in the context of involuntary processing of the meaning of the words included in the manipulations (see below).

The last component, the LPC, is a manifestation of explicit and controlled cognitive activation of the content and connotations of a word, although in the context of the current experiment it occurred without explicit instruction. This is late word-processing-related activity rather than a manifestation of cognitive control; in particular, we may assume that it is a kind of post-semantic processing (Jończyk, 2016) of stimulus meaning. In the LPC time range, some specific differentiation between distinct valences is manifested (Zhang et al., 2014), but the debate over the nature of emotional valence remains an open question. We may claim that the origin of an affective state is a kind of emotional complexity, derived from underlying mechanisms proposed by the duality of mind approach, but clearly distinct from concreteness, which was precisely controlled in our studies (Imbir et al., 2016, submitted). Stimuli differing in complexity have to be processed in a different way when semantic meaning is considered; thus differences in the LPC are expected and probable. In the literature (for a review see Citron, 2012), the LPC is claimed to be sensitive to valence differences, but up to this point the emotional complexity of valenced stimuli has not received special attention from the scientific community. We mentioned in the Introduction that the LPC results were rather inconsistent; thus the replication of origin effects with no valence-related differences may suggest that origin is a more salient factor for this stage of processing (c.f. Imbir et al., 2015, 2016; Imbir et al., submitted). An interesting insight into this problem may be given by studies focusing on the consequences of the concreteness dimension for word processing, conducted mostly in the LDT paradigm (e.g., Kanske and Kotz, 2007; Palazova et al., 2013). They showed that abstract words (both nouns and verbs) elicited more positive LPC amplitudes than concrete nouns and verbs. This is in line with the current results showing that words of reflective origin (more emotionally complex) elicited more positive amplitude than words of automatic origin (less emotionally complex), even when both groups are matched for concreteness (cognitive complexity). Our analyses of the words used in this experiment (c.f. Imbir et al., 2016) showed that the origin and concreteness dimensions share no more than 10% common variance when all 4,905 words from ANPW\_R (Imbir, 2016c) are considered.
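As a quick arithmetic aid for the shared-variance figure (a reader's check, not a result from the study): if shared variance is read as the squared Pearson correlation $r$ between the origin and concreteness ratings, which is an assumption about how the figure was computed, an upper bound of 10% corresponds to

$$
r^2 \le 0.10 \quad\Longrightarrow\quad |r| \le \sqrt{0.10} \approx 0.32 .
$$

Under this reading, the correlation between the two dimensions would be at most about 0.32 in absolute value.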

It is worth highlighting that the results of our study were not localized at a specific site or ROI. This differs from other studies showing localized effects. It may be partially due to the semi-exploratory analytical strategy chosen in this study. When the ANOVA showed no significant interaction between either of the main factors and the ROI factor, we did not pursue further analyses within ROIs, despite the fact that the average amplitude distribution showed some topographical variations (c.f. **Figure 3**). In fact, the effects found are distributed in the same way all over the head. This strategy allows us not to miss potentially important findings connected with the newly proposed origin dimension. Although some results for origin already exist (c.f. Imbir et al., 2015, 2016; Imbir et al., submitted), the differences in the tasks used still make us cautious about potentially unpredicted effects.

It is worth comparing the results of the current study with those of a study applying the LDT paradigm to the same list of stimuli (c.f. Imbir et al., 2016). The EST and the LDT have in common that processing of word meaning is not required by the task and is thus rather implicit and involuntary. In the LDT participants have to react to the stimulus type, while in the EST the task is to ignore the stimulus at the lexical level; thus the two tasks differ to some extent. Comparing the results of the current study with those concerning the LDT, one can see that no valence effects are visible in the LPC, but the origin differences look different in the two paradigms: words of automatic origin had amplitudes similar to those of words of reflective origin in the LDT, but lower amplitudes in the EST. We argue that stimuli of reflective origin should activate the resources for the controlled part of the EST (c.f. Introduction; Imbir, 2016a); thus the cortical response to them should be larger and should elicit a more positive LPC. It is likely that in the LDT the origin construct has nothing in common with task performance; thus both origins generated similar amplitudes. The differences found earlier were associated with different components, specific to the tasks used, and are thus hard to compare directly. In the LDT the valence factor was present in the pattern of results, while in the EST no valence effects were found. We may assume that valence is the intuitive dimension on a subjective level (Russell, 2003), but probably not as important as one might expect in the EST phenomenon.

Finally, we would like to note that in the current study we used a relatively low number of stimuli, each repeated three times. This is a compromise between careful selection of stimuli, balanced on all controlled dimensions, and the need to have enough events for averaging ERPs. The repetition of stimuli may potentially lead to attenuation of behavioral and ERP effects. The lack of valence-related differences observed in the results might therefore be attributed to this methodological issue. However, the fact that we observe an effect of origin in these circumstances supports the postulated higher importance of the origin dimension even more strongly.

# CONCLUSION

The results of this study showed that, despite the lack of behavioral effects, processing in the EST produces differences in the electrophysiological correlates of this task. We have demonstrated that there were no differences in the P2 component, found in another study (Imbir et al., submitted) to be a strict manifestation of behavioral differences. We also found that only origin, but not valence, shaped cortical responses of the brain while processing words in the EST. It is possible that valence differences can be detected when the origin factor is not included in the experimental design. When origin is controlled, the differences in amplitudes for negative as well as positive words can disappear.

# AUTHOR CONTRIBUTIONS

All authors contributed to the final version of the manuscript. Theoretical proposition: KI; Design: KI, JŻ; Method (words): KI; Method (EEG measures): JŻ, TS; Experimental procedure programming: TS, JŻ; Experiment execution: TS, JŻ; Statistical analyses: JŻ, KI, TS, JD, GB; Results description: JŻ, JD; Results discussion: KI; Figures: JŻ, TS, KI, JD, GB.

# FUNDING

The project was funded by the National Science Center on the basis of decision DEC-2013/09/B/HS6/00303. KI was supported by the Foundation for Polish Science (FNP).

# ACKNOWLEDGMENTS

We would like to express our thanks to Alicja Brzozowska for participation in data collection and technical assistance. We would also like to thank the reviewers for their insightful comments and recommendations for manuscript improvements.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00880/full#supplementary-material

# REFERENCES


early facilitative processing of negative but not positive words. Cogn. Affect. Behav. Neurosci. 9, 389–397. doi: 10.3758/9.4.389


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Imbir, Spustek, Duda, Bernatowicz and Żygierewicz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phonological Iconicity Electrifies: An ERP Study on Affective Sound-to-Meaning Correspondences in German

#### Susann Ullrich<sup>1,2</sup>\*, Sonja A. Kotz<sup>1,3,4</sup>, David S. Schmidtke<sup>1,2</sup>, Arash Aryani<sup>1,2</sup> and Markus Conrad<sup>1,5</sup>

<sup>1</sup> Languages of Emotion Research Cluster, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Experimental and Neurocognitive Psychology, Freie Universität Berlin, Berlin, Germany, <sup>3</sup> Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, Netherlands, <sup>4</sup> Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, <sup>5</sup> Department of Cognitive, Social, and Organizational Psychology, University of La Laguna, Tenerife, Spain

### Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

Yang Zhang, University of Minnesota, USA

Michael Wolmetz, Johns Hopkins University, USA

> \*Correspondence: Susann Ullrich susann\_ullrich@msn.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 29 February 2016 Accepted: 28 July 2016 Published: 18 August 2016

#### Citation:

Ullrich S, Kotz SA, Schmidtke DS, Aryani A and Conrad M (2016) Phonological Iconicity Electrifies: An ERP Study on Affective Sound-to-Meaning Correspondences in German. Front. Psychol. 7:1200. doi: 10.3389/fpsyg.2016.01200

While linguistic theory posits an arbitrary relation between signifiers and the signified (de Saussure, 1916), our analysis of a large-scale German database containing affective ratings of words revealed that certain phoneme clusters occur more often in words denoting concepts with negative and arousing meaning. Here, we investigate how such phoneme clusters that potentially serve as sublexical markers of affect can influence language processing. We registered the EEG signal during a lexical decision task with a novel manipulation of the words' putative sublexical affective potential: the means of valence and arousal values for single phoneme clusters, each computed as a function of respective values of words from the database these phoneme clusters occur in. Our experimental manipulations also investigate potential contributions of formal salience to the sublexical affective potential: Typically, negative high-arousing phonological segments—based on our calculations—tend to be less frequent and more structurally complex than neutral ones. We thus constructed two experimental sets, one involving this natural confound, while controlling for it in the other. A negative high-arousing sublexical affective potential in the strictly controlled stimulus set yielded an early posterior negativity (EPN), in a similar way as an independent manipulation of lexical affective content did. When other potentially salient formal features at the sublexical level were not controlled for, the effect of the sublexical affective potential was strengthened and prolonged (250–650 ms), presumably because formal salience helps make specific phoneme clusters efficient sublexical markers of negative high-arousing affective meaning. These neurophysiological data support the assumption that the organization of a language's vocabulary involves systematic sound-to-meaning correspondences at the phonemic level that influence the way we process language.

Keywords: sublexical, lexical, affect, language, EEG, ERPs, phonological iconicity, sound-to-meaning correspondences

# INTRODUCTION

Most people would probably agree that not all words sound "neutral." But is it just personal taste or idiosyncratic individual experience that some words sound nicer and others rather harsh to us? Or do, on the contrary, sublexical phonological patterns possess systematic affective connotations? And if so, might these relate systematically to the meaning of words? A potential associative or even physical resemblance between sound and meaning of a word is called phonological iconicity in terms of Peirce's typology of semiotic elements (Peirce, 1931; see also Perniss et al., 2010; Aryani et al., 2013; Schmidtke et al., 2014a), challenging the conventional linguistic view that the relationship between the signifier and the signified be arbitrary (de Saussure, 1916). Note that our use of the term "sound" in this paper refers exclusively to phonological constituents of words themselves, not to speaker related issues such as prosody or the speaker's identity or affective state (for research on the latter ones see, for example, Belin et al., 2011; Hellbernd and Sammler, 2016). This conforms with the traditional literature on sound symbolism, which also posits that specific speech-sounds—phonemes—words are made of, may carry specific meaning (Jakobson, 1937; Allott, 1995).

Internal relations between phonological aspects and semantic meaning of words show most directly and prominently in onomatopoetic expressions (that typically describe acoustic phenomena by mimicking them): e.g., bears growl, snakes fizz, babies babble, or water splashes, sprinkles, squirts, drops, or drizzles. On a more abstract level, e.g., phonaesthemes involve the correspondence of specific sublexical patterns (typically word initial phoneme clusters) to specific semantic word fields (Firth, 1930). For instance, many English words related to vision and light start with "gl-": glance, glitter, gloom, glisten, glare, or gloss—while many words related to the nose start with "sn-": snore, sniff, snort, snuff, snoop, or sneeze (Wallis, 1699; Bloomfield, 1933). Although the reasons for the evolution of phonaesthemes remain somewhat opaque, Bergen (2004) could show in priming experiments that these subtle statistical associations influence language processing. Other systematic sound-to-meaning correspondences have also been found to support word learning (Nygaard et al., 2009; Lockwood et al., 2016).

That the sound of a word and its signified semantic concept may, in general, share a common quality has already been discussed by Socrates in Plato's Cratylus (Plato, 1892). Throughout the last century, a number of empirical psychological studies have investigated how potential correspondences between sublexical language sounds and attributes of meaning influence human perception of, e.g., size, shape, lightness, pleasantness, or excitement. For instance, back vowels (a, o) are perceived as bigger, heavier, or darker than front vowels (i, e), as has been shown, for example, by Sapir (1929) who asked people to connect pseudowords such as MAL and MIL with either a large or a small object. Other researchers replicated and refined these findings on vowels and extended them to consonants, showing, for example, that people perceive front consonants as smaller and more pleasant than back consonants, or voiced consonants as darker and larger than unvoiced consonants (Newman, 1933; Folkins and Lenrow, 1966). In general, such phenomena subsumed under the terms sound symbolism or phonological iconicity (for reviews see Perniss et al., 2010; Perniss and Vigliocco, 2014; Schmidtke et al., 2014a; Dingemanse et al., 2015) involve the view that the sound of a word and the signified concept share a common quality (see already von Humboldt, 1836, or Plato, 1892). As a potential cause, it has been proposed that language may have phylogenetically evolved from the imitation of natural sounds (Darwin, 1871; Plato, 1892). Cross-language replications of, e.g., the kiki-bouba phenomenon—people, including toddlers, consistently match pseudowords such as kiki or takete preferentially to spiky shapes, vs. bouba or baluma to rounded shapes (Köhler, 1929; Werner, 1934, 1957; Davis, 1961; Maurer et al., 2006; also see Westbury, 2005)—suggest phonological iconicity to be a common feature of language in general, spurring theories about the biological origin of language (Ramachandran and Hubbard, 2001).

As communication of affect could be seen as a primordial feature of human communication (Jackendoff, 2002), phonological iconicity may well extend to affective meaning communicated through language—potentially since its very origins (see Darwin, 1871; Morton, 1977; Kita, 2008; Perniss and Vigliocco, 2014). The basic dimensions of affective meaning in the most influential emotion models (Wundt, 1896; Russell, 1978, 1980, 2003; Watson and Tellegen, 1985; Bradley et al., 1992) are those of valence and arousal, accounting also for a major amount of variance of semantic meaning according to semantic differential techniques (Osgood and Suci, 1955). Interestingly, analyzing the phonological content of 1000 English words rated for valence and arousal, Heise (1966) found that certain phonemes occur significantly more often in words of a specific affective meaning (see also Whissell, 1999, 2000). Conrad et al. (in preparation) recently applied this approach to a large-scale database of over 6000 German words rated for valence and arousal (see also Aryani et al., 2015). Their analyses reveal systematic sound-to-meaning correspondences concerning the use of certain phonemes or phoneme clusters in words of specific valence and arousal ranges—in particular representing a combination of high arousal and negative valence that might be summarized as denoting potential threat. To quantify these patterns, they computed sublexical affective values (SAVs) for single sub-syllabic phoneme clusters—representing syllabic onsets, nuclei, and codas—by averaging valence and arousal values of all words these units are part of in the database. The choice of these subsyllabic phonological segments instead of single phonemes is motivated by linguistic theories of syllable segmentation (Davis, 1982; Hall, 1992; Wiese, 1996). Accordingly, both experimental (Nuerk et al., 2000; Brand et al., 2007) and simulation studies (Jacobs et al., 1998) of language processing support the importance of those segments as perceptual units encoding phonology in terms of syllabic onsets, nuclei and codas. Within the German database, SAVs for a number of such phonological segments show significant deviations from neutral global means (Conrad et al., in preparation), suggesting an intrinsic affective potential of specific language sounds, which might accordingly serve as sublexical markers of affect, in particular concerning threat. Following this rationale, the average of SAVs for all phonological segments in a word—henceforth called sublexical affective potential—might predict the affective appeal of the whole phonological word form at a sublexical level. Indeed, Conrad et al. (in preparation) reveal significant correlations of this sublexical affective potential with lexical valence and arousal ratings across the entire respective word database. These findings interestingly point toward phonological iconicity with regard to affective content as a systematic feature determining the organization of language (see also Aryani et al., 2015).
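The two averaging steps described above (SAVs per segment, then the sublexical affective potential per word) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Conrad et al.: the toy lexicon, its segmentations and ratings, and the function names `compute_savs` and `sublexical_affective_potential` are all hypothetical. The sketch also assumes that each word contributes once per segment and that onsets, nuclei, and codas share one namespace, neither of which is specified in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon (NOT values from the German database):
# word -> (subsyllabic segments, valence rating, arousal rating)
toy_lexicon = {
    "word_a": (["kr", "a", "x"], -2.0, 4.0),
    "word_b": (["kr", "i", "k"], -1.0, 3.0),
    "word_c": (["l", "a", "m"], 1.0, 2.0),
}


def compute_savs(lexicon):
    """Average valence and arousal over all words containing each segment."""
    ratings = defaultdict(list)
    for segments, valence, arousal in lexicon.values():
        # Assumption: a word counts once per segment, even if the segment repeats.
        for segment in set(segments):
            ratings[segment].append((valence, arousal))
    return {
        segment: (mean(v for v, _ in pairs), mean(a for _, a in pairs))
        for segment, pairs in ratings.items()
    }


def sublexical_affective_potential(segments, savs):
    """Average the SAVs of all segments in one word: (valence, arousal)."""
    pairs = [savs[s] for s in segments]
    return mean(v for v, _ in pairs), mean(a for _, a in pairs)


savs = compute_savs(toy_lexicon)
potential = sublexical_affective_potential(["kr", "a", "x"], savs)
print(savs["kr"], potential)
```

In this toy example the segment "kr" occurs only in the two negatively rated words, so its SAV is more negative and more arousing than that of "a", which also occurs in the positively rated word.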

# The Present Study

In this study, we address the question of whether these numerical measures of SAVs (derived from a large-scale normative database for the German language and reflecting systematic sound-to-meaning correspondences within this database) possess any psychological reality concerning the perception of language. In particular, we ask whether these sound-to-meaning correspondences, or the underlying affective phonological iconicity of the German language, have any neural correlates in EEG measurements during a standard lexical decision task. If anything like sublexical markers of affective content, in particular threat, exists, those phonological segments typically occurring in words of high arousal and negative content should leave an impact on brain activity strong enough to be traceable with neuroscientific methods during the time course of language perception.

Furthermore, our study focuses on the potential role of formal salience for processes related to phonological iconicity. Concerning sublexical phonological units presumably encoding—according to the analyses of our database—negative high-arousing content, we consistently found structurally rather complex phonological segments (i.e., more than one consonant in a syllabic onset or coda) and phonological segments of low frequency of occurrence to appear preferentially in words of negative and high-arousing meaning. As high arousal is thought of as an early alert indicator attracting attention to potentially relevant stimuli (see Recio et al., 2014, for ERP effects disentangling valence and arousal effects during visual word recognition), it seems intuitive that formal salience could be crucial for making a sublexical unit a most efficient "sign of threat" at the conceptual level.

Event-related potentials (ERPs) obtained via EEG measurement with its high temporal resolution are most suitable to study if, when, and how such phenomena influence cognitive processes. A number of psycholinguistic studies have already investigated effects of lexical affective content during visual word recognition using ERPs. Two main ERP components were found to be modulated by the affective meaning of words: The early posterior negativity (EPN), a component that is larger for emotion-laden words compared to neutral ones (Kissler et al., 2007, 2009; Herbert et al., 2008; Schacht and Sommer, 2009; Conrad et al., 2011; Keuper et al., 2014), appears around 200–300 ms after stimulus onset. It was first reported in the context of emotional face and picture processing (Junghöfer et al., 2001; Schupp et al., 2003, 2004), hence presumably reflecting general, modality-independent affective processing. The EPN is assumed to mirror fast and effortless detection of emotionally significant stimuli and thereby indexes natural selective attention (Olofsson et al., 2008). MEG studies reported that the neural loci of cognitive functions such as semantic memory, attention, and evaluation of emotional stimuli are involved in the formation of the EPN (Keuper et al., 2014). Furthermore, the late positive complex (LPC), appearing around 400–700 ms after stimulus onset, also proved sensitive to differences in the affective meaning of words (Dillon et al., 2006; Kissler et al., 2009; Schacht and Sommer, 2009; Conrad et al., 2011). This late component is assumed to indicate more elaborated and task-dependent cognitive processing of affective or emotional stimuli. This includes, for example, continued stimulus evaluation such as categorization or memory updating. Useful reviews on ERP emotion effects in visual word recognition have been provided by Citron (2012) and Kotz and Paulmann (2011).

To investigate potential effects of affect encoded at the sublexical phonological level within the framework of known general emotion effects during visual word recognition, we used a design including a classical manipulation of lexical affective content together with a novel manipulation of sublexical affective potential in a standard visual lexical decision task.

Most theoretical reasoning on phonological iconicity assumes phonology as the source of respective effects. If these effects exist, they should, though, also show and might most effectively be studied during silent reading which has been shown to involve mandatory phonological processing (e.g., Van Orden, 1987; Abramson and Goldinger, 1997; Ziegler et al., 2001; Conrad et al., 2007; Braun et al., 2009). The visual lexical decision task is the most standardized and most used research tool in the field of psycholinguistics. German is a shallow orthography with high grapheme-to-phoneme consistency, i.e., the presentation of specific German letter strings would evoke unambiguous phonological activations regardless of context and of whether a letter string is a word or not. Using a standard visual lexical decision task appears thus a reasonable initial step for the investigation of phonological iconicity effects in German. It provides both a methodological match to the available literature on emotion effects quoted above as well as an optimally standardized experimental context excluding potential distortion through auditory effects of, e.g., affective prosody or speaker identity.

At both the lexical and the sublexical level, our manipulations of affective content or potential involve the contrast between high arousal in combination with negative valence on the one hand, and low arousal combined with neutral valence on the other hand. This has both pragmatic and theoretical reasons: As already evident from Võ et al. (2009) and Schmidtke et al. (2014b), valence and arousal values of German words are characterized by a very tight correlation within the range of overall negative valence, but not within the positive valence range. That is, increasingly negative valence of concepts is generally associated with increasing arousal, whereas positive concepts can be either calm or exciting. As the SAVs we use for the operationalization of the sublexical affective potential represent the average values of words containing a given phonological segment, it follows, to some extent, that comparable correlations hold for SAVs. That is, the majority of phonological segments with negative valence also have rather high arousal levels, whereas positive valence and arousal SAVs are less related. Further, the combination of negative valence and high arousal best fits the assumed reason underlying these phonological iconicity phenomena: the encoding of threat at a sublexical level (see Conrad et al., in preparation). Most of the phonological segments in the database of German words that might in general serve as icons of affective content, displaying statistically significant deviations from global neutral means, indeed follow this pattern of combining negative valence with high arousal. That is why the combination of negative valence and high arousal contrasted against neutral valence and low arousal allows for a most pronounced contrast, potentially leading to most pronounced effects, for this novel manipulation of sublexical affective potential taking into account both dimensions of the affective space.

As already mentioned, when considering phonological segments of syllabic onsets, nuclei, and codas rather than single phonemes, affectively deviant segments of negative valence and high arousal often also are structurally more complex—i.e., contain more phonemes—and of lesser frequencies of occurrence as compared to affectively neutral ones. To account for both types of effects—intrinsic SAVs on the one hand and formal salience on the other—as two potentially additive sources of phonological iconicity influencing affective processing during language perception, we prepared two separate experimental stimulus sets to be presented in one and the same experimental session (see Conrad et al., 2007, 2009, for detailed elaboration of the methodological advantages of this approach):


We predict effects of the sublexical manipulation to be strongest when SAVs are allowed to co-vary with formal salience. Further, if any effects at all would still be obtained for the sublexical manipulation controlling for formal salience, these effects might—with even more confidence—be considered evidence for sublexical encoding of affectivity, especially if they resembled ERP effects established so far for general emotion processing during lexical decision, and predicted for our second factor affective content at the lexical level. In particular, such effects might be expected similar to an EPN, because sublexical effects should occur rather early during the time course of the reading process—or at least not later than lexical effects.

# MATERIALS AND METHODS

# Participants

Forty-one native speakers of German, university students of the Freie Universität Berlin, participated in the experiment after giving informed consent. All were right-handed (Oldfield, 1971) with normal or corrected-to-normal vision. None of them reported neurological or language problems. Six participants were excluded from the final data analysis due to a poor signal-to-noise ratio of the ERP data, so that data from 35 subjects (21 women; age range: 18–36 years, M = 26.7 years, SD = 4.2) were submitted to analyses. All participants received financial compensation.

# Stimuli and Design

We selected two separate sets (set 1: maximally manipulated; set 2: maximally controlled) of 312 German words each, containing between one and three syllables and no more than nine letters, from the extended BAWL database (Võ et al., 2009; publication of the extended version in preparation) as stimuli for the two experimental sets. Both sets involved twofold, independent manipulations of these two factors (each factor cell comprised 156 stimulus words):

– Lexical affective content (negative valence and high arousal vs. neutral valence and low arousal)

and

– Sublexical affective potential (negative valence and high arousal vs. neutral valence and low arousal, based on mean SAVs per word)

Lexical affective content was closely controlled for between the two cells of sublexical affective potential and vice versa.

Lexical affective content is operationalized in the database in form of rating values of valence on a scale from −3 to 3, and of arousal on a scale from 1 to 5. A word was entered in the negative high-arousing lexical affective content condition when the mean of its valence ratings in the database was more negative than −0.8 (furthermore, the sum of mean and standard deviation of the valence ratings for a word did not exceed 0) and its arousal ratings higher than 2.8. For the neutral low-arousing lexical affective content condition the valence ratings of the words had to be between −0.8 and 0.8 (and the standard deviation below 1) and the arousal ratings lower than 2.8.
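The selection rule for lexical affective content can be written down as a small classifier. The following is only a minimal sketch of the thresholds stated above; the function name and the treatment of the exact boundary values (whether −0.8 and 0.8 are inclusive) are assumptions, since the text does not specify them.

```python
from typing import Optional


def lexical_condition(val_mean: float, val_sd: float,
                      arousal_mean: float) -> Optional[str]:
    """Assign a word to a lexical affective content condition.

    Hypothetical helper: returns 'neg-high', 'neut-low', or None when the
    word meets neither set of criteria.
    """
    # Negative/high arousal: mean valence below -0.8, mean + SD not above 0,
    # and mean arousal above 2.8.
    if val_mean < -0.8 and val_mean + val_sd <= 0 and arousal_mean > 2.8:
        return "neg-high"
    # Neutral/low arousal: mean valence between -0.8 and 0.8 (boundaries
    # assumed inclusive), SD below 1, and mean arousal below 2.8.
    if -0.8 <= val_mean <= 0.8 and val_sd < 1 and arousal_mean < 2.8:
        return "neut-low"
    return None
```

Words whose ratings fall between the two conditions (for example, mildly negative but highly arousing words) return `None` and would simply not be considered as stimuli.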

The factor sublexical affective potential was operationalized as follows: We computed hypothetical affective values for sublexical segments (the aforementioned sublexical affective values, SAVs) as a function of the affective values of the words in which they occur in our database of over 6000 German words (Conrad et al., in preparation): We calculated valence and arousal SAVs for all given syllabic onsets, nuclei, and codas by averaging the rating values of the words they form part of. We then averaged these values for all segments found in a single given word to obtain an estimate of the sublexical affective potential of this word. Naturally, the resulting scale widths for valence (−0.7 to 0.7) and arousal (2.5 to 3.2) of these sublexical affective potential values per word were much narrower than those of the lexical affective content rating scales. A word was entered in the negative high-arousing sublexical affective potential condition when its valence value was more negative than −0.05, and its arousal value higher than 2.9. For the neutral low-arousing sublexical affective potential condition the valence value of a word had to be between −0.04 and 0.45, and the arousal value lower than 2.9. Specifically for the sublexically neutral low-arousing words, additional attention was paid to the following selection criterion: If words contained single very negative or high-arousing phonological segments, even though the overall mean fit the neutral low-arousing category, they were excluded, because we assume that such single salient phonological segments could already attract enough attention to prevent the whole word from sounding affectively "neutral". Stimulus characteristics are shown in **Table 1**. While our manipulation of sublexical affective potential is based on numerical mean SAVs across all phonological segments in a word, this certainly implies that specific segments are more likely to occur in one condition, e.g., negative/high arousal sublexical affective potential, than in the other (neutral/low arousal). To make our manipulation more transparent to the reader, **Table 2** lists how many times specific phonological segments were used across conditions.
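The two averaging steps behind the sublexical affective potential (first across words per segment, then across segments per word) can be illustrated as follows. This is a sketch under stated assumptions, not the authors' actual pipeline: the data layout, the function names, and the representation of a segment as a `(position, phonemes)` pair are hypothetical, and the text does not say how a segment occurring twice in one word is counted.

```python
from collections import defaultdict
from statistics import mean


def segment_savs(lexicon):
    """Average word ratings per segment.

    lexicon: dict mapping word -> (segments, valence, arousal), where
    segments is a list of (position, phonemes) pairs (assumed layout).
    Returns a dict mapping segment -> (valence SAV, arousal SAV).
    """
    ratings = defaultdict(list)
    for segments, valence, arousal in lexicon.values():
        for seg in segments:
            ratings[seg].append((valence, arousal))
    return {seg: (mean(v for v, _ in rs), mean(a for _, a in rs))
            for seg, rs in ratings.items()}


def word_potential(segments, savs):
    """Mean valence and arousal SAV across all segments of one word."""
    return (mean(savs[s][0] for s in segments),
            mean(savs[s][1] for s in segments))


def sublexical_condition(valence, arousal):
    """Apply the thresholds stated in the text (boundaries assumed inclusive)."""
    if valence < -0.05 and arousal > 2.9:
        return "neg-high"
    if -0.04 <= valence <= 0.45 and arousal < 2.9:
        return "neut-low"
    return None
```

The additional exclusion of neutral words containing a single very negative or high-arousing segment is not modeled here, because no numerical cutoff for "very negative" is given.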

In both sets a large number of variables that are known to influence visual word processing (see Graf et al., 2005, for an overview) were controlled for between cells of the two factors (see also **Table 1**):


In the maximally controlled set we further controlled for the following sublexical variables:


To assure best overall comparability between data for the two sets, all stimuli were presented in a unique experimental session to the same participants. Overlapping items, i.e., stimuli that were used in both manipulations, entered the final stimulus set only once to avoid repetition. Thus, a total set of 521 stimulus words was presented together with 535 pseudowords that were matched to word stimuli in length and number of syllables. Pseudowords included pseudohomophones to assure a sufficiently difficult overall task environment where participants actually had to achieve lexical access for stimulus words. The pseudoword material involved a different experimental manipulation not addressed in the present study. All results presented in this paper refer exclusively to the word material possessing affective values at both the lexical and (hypothetically) the sublexical level.

# Procedure

All stimuli were presented visually in randomized order using "Times New Roman" font, size 24, in white letters on a black background in the center of a 17′′ computer screen at a distance of 80 cm from the participant's eyes. Each trial began with the presentation of a fixation cross (500 ms) followed by a blank screen of 500 ms. The pseudo-randomized single word and pseudoword items were presented for 500 ms each and were followed by a blank screen that lasted until the key response had been carried out, followed by a jittered inter-stimulus interval of 700–1500 ms. The task of the participants was to decide whether the presented stimulus was a "word" or a "non-word" by pressing one of two respective push-buttons on a Playstation remote control. The labels "Wort" (word) and "Nichtwort" (non-word) were counterbalanced between left and right hand responses across participants. Participants were encouraged to respond as fast but also as accurately as possible. Before the actual experiment started, 10 initial practice trials (5 words, 5 pseudowords) were run. The whole experiment contained 1056 trials and was split into four blocks which lasted about 10–12 min each. In between these blocks participants were allowed to rest as long as they wished.

# EEG Recording and (Pre-)Processing

The EEG was recorded from 61 AgCl-electrodes (Fp1, Fpz, Fp2, AF3, AF4, F5, F3, F1, Fz, F2, F4, F6, FT7, FC3, FC1, FCz, FC2, FC4, FT8, T7, C5, C3, C1, Cz, C2, C4, C6, T8, TP7, CP5, CP3, CP1, CPz, CP2, CP4, CP6, TP8, P9, P7, P5, P3, P1, Pz, P2, P4, P6, P8, P10, PO9, PO7, PO3, POz, PO4, PO8, PO10, O1, Oz, O2, Iz, M1, M2) fixed to the scalp via an elastic cap using two 32-channel amplifiers (BrainAmp, Brain Products, Germany). Electrodes were arranged according to the International 10–20 system (Jasper, 1958; American Electroencephalographic Society, 1991) and average impedances were kept below 2 kΩ. The electrooculogram (EOG) was monitored by two electrodes at the outer canthi of the participant's eyes and two electrodes above and below the right eye. EEG and EOG signals were recorded with a sampling rate of 500 Hz, referenced to the right mastoid, but re-referenced offline to linked mastoids. The AFz electrode was used as ground electrode. Later offline filtering included a bandpass filter of 0.1–20 Hz and a notch filter of 50 Hz. Independent component analysis (ICA; Makeig et al., 1996; Jung et al., 1998) was carried out to identify and remove eye movement artifacts. The continuous EEG signal was cut into segments of 950 ms total length, consisting of a 150 ms pre-stimulus baseline and an 800 ms post-stimulus interval. After baseline correction, trials containing artifacts were excluded from further analysis using an automatic artifact rejection: differences >80 µV in intervals of 70 ms or amplitudes >50 µV or <−50 µV were considered artifacts. Segments containing correctly answered word trials were averaged per condition, participant and electrode, before grand averages were computed across all participants. To visually compare the ERP signals of different conditions, the (sublexically) neutral low-arousing words were always subtracted from the (sublexically) negative high-arousing words.

TABLE 2 | Phoneme (segments) distribution (in DISC Phonetic Encoding Convention; Burnage, 1990) across the conditions of sublexical affective potential in both stimulus sets.

neg-high, combination of negative valence and high arousal; neut-low, combination of neutral valence and low arousal.
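The baseline correction and automatic artifact rejection described for the EEG segments can be sketched for a single-channel epoch as below. This is an illustration only, not the software actually used: the function names are hypothetical, and reading "differences >80 µV in intervals of 70 ms" as the max-minus-min range within any sliding 70 ms window is an assumption.

```python
SFREQ_HZ = 500.0      # sampling rate stated in the text
BASELINE_MS = 150.0   # pre-stimulus baseline
EPOCH_MS = 950.0      # total segment length


def n_samples(duration_ms, sfreq_hz=SFREQ_HZ):
    """Number of samples covering duration_ms at the given sampling rate."""
    return int(round(duration_ms * sfreq_hz / 1000.0))


def baseline_correct(epoch_uv, n_baseline):
    """Subtract the mean of the first n_baseline samples from the epoch."""
    base = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - base for v in epoch_uv]


def has_artifact(epoch_uv, sfreq_hz=SFREQ_HZ, max_range_uv=80.0,
                 range_window_ms=70.0, max_abs_uv=50.0):
    """Flag an epoch if it violates either rejection criterion."""
    # Criterion 1: any amplitude above +50 or below -50 microvolts.
    if any(abs(v) > max_abs_uv for v in epoch_uv):
        return True
    # Criterion 2 (assumed reading): range above 80 microvolts within any
    # sliding window of 70 ms.
    win = max(1, n_samples(range_window_ms, sfreq_hz))
    last_start = max(0, len(epoch_uv) - win)
    for start in range(last_start + 1):
        window = epoch_uv[start:start + win]
        if window and max(window) - min(window) > max_range_uv:
            return True
    return False
```

At 500 Hz, a 950 ms segment corresponds to 475 samples, of which the first 75 form the baseline, and a 70 ms window spans 35 samples.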

# Data Analysis

# Behavioral Data

Mean correct response latencies and error rates of the word stimuli were submitted to separate ANOVAs—testing whether a potentially given effect generalizes over subjects (F1 analysis) and over items (F2 analysis)—for the factors lexical affective content (2) and sublexical affective potential (2).
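The difference between the F1 and F2 analyses lies in the unit over which trials are aggregated before the ANOVA: participants for F1, stimulus words for F2. A minimal sketch of that aggregation step is given below; the trial record layout and the function name are hypothetical, and the ANOVA itself is not reproduced.

```python
from collections import defaultdict
from statistics import mean


def condition_means(trials, unit):
    """Mean correct RT per (unit, condition) cell.

    trials: list of dicts with keys 'subject', 'item', 'condition', 'rt',
            and 'correct' (assumed layout).
    unit:   'subject' for the F1 analysis, 'item' for the F2 analysis.
    """
    cells = defaultdict(list)
    for trial in trials:
        if trial["correct"]:  # only correct responses enter the RT means
            cells[(trial[unit], trial["condition"])].append(trial["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}
```

In the by-subject case every participant contributes one mean per condition, whereas in the by-item case every word contributes a single mean to the one condition it belongs to.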

# EEG Data

Time windows for the expected ERP components of the lexical affective content of words were defined based on the literature (see Citron, 2012) and visual inspection of the grand averages: 200–300 ms for the EPN, and 400–700 ms for the LPC.

For potential effects of the sublexical affective potential of the word stimuli, there are no prior studies to base hypotheses on. We thus used an exploratory approach where a time-line analysis with 20 ms time windows (starting from each data point) was carried out. To reduce the chances of false positives potentially arising through consecutive testing, only total time windows of at least 50 ms length—consisting of consecutively significant single time windows revealed by the time-line analysis—were used for further analysis (based on the approach suggested by Guthrie and Buchwald, 1991).
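The run-length criterion of the time-line analysis can be sketched as follows: given one p-value per 20 ms window (one window starting at each data point), keep only stretches of consecutively significant windows that cover at least 50 ms. The function name, the alpha level of 0.05, and the choice to measure a stretch from the start of its first window to the end of its last window are assumptions made for illustration.

```python
def significant_spans(p_values, step_ms=2.0, window_ms=20.0,
                      alpha=0.05, min_span_ms=50.0, t0_ms=0.0):
    """Return (start_ms, end_ms) spans of consecutive significant windows.

    p_values[i] belongs to the window starting at t0_ms + i * step_ms.
    step_ms=2.0 corresponds to the 500 Hz sampling rate.
    """
    spans = []
    run_start = None
    # A trailing non-significant sentinel closes a run that reaches the end.
    for i, p in enumerate(list(p_values) + [1.0]):
        if p < alpha:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            first_ms = t0_ms + run_start * step_ms
            last_end_ms = t0_ms + (i - 1) * step_ms + window_ms
            if last_end_ms - first_ms >= min_span_ms:
                spans.append((first_ms, last_end_ms))
            run_start = None
    return spans
```

Shorter runs are discarded, which is what limits false positives from testing many overlapping windows in sequence.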

Repeated-measures ANOVAs were conducted with the mean activity [µV] values of the selected time windows using the software IBM SPSS Statistics. The ANOVAs involved the within-subject factors lexical affective content (2) or sublexical affective potential (2). In order to assess topographical potential distributions of relevant effects over the scalp through an a priori designed, hypothesis-independent approach using data from a maximum of electrodes, the ANOVAs further included the topographic factors left-mid-right (3) and anterior-central-posterior (3). For these topographic analyses the scalp electrodes were subdivided into the following 9 clusters of 6 electrodes each: right anterior (Fp2, AF4, F4, F6, FC4, FT8), mid anterior (F1, Fz, F2, FC1, FC2, FCz), left anterior (Fp1, AF3, F3, F5, FC3, FT7), right central (C4, C6, T8, CP4, CP6, TP8), mid central (C1, Cz, C2, CP1, CPz, CP2), left central (C3, C5, T7, CP3, CP5, TP7), right posterior (P4, P6, P8, PO4, PO8, O2), mid posterior (P1, Pz, P2, POz, Oz, Iz), and left posterior (P3, P5, P7, PO3, PO7, O1).

Furthermore, a region of interest (ROI) for the EPN was defined using a cluster of the 11 most posterior electrodes (PO9, PO7, PO3, POz, PO10, PO8, PO4, O1, Oz, O2, Iz), based on earlier topographic data regarding EPN effects in our research group (Conrad et al., 2011; Recio et al., 2014). If the visual topography patterns suggested so, data of the EPN ROI were submitted to paired t-tests between the affective conditions. The combination of these two approaches toward topographic analysis, one unbiased and one guided by hypotheses, should offer a most comprehensive insight in this novel research topic. All topographic clusters and the ROI are displayed in **Figure 1**.

Greenhouse-Geisser corrected p-values (Greenhouse and Geisser, 1959) are reported for all ANOVA results. Significant interactions with topographic factors were followed up by paired t-tests within the respective topographic clusters. The p-values of multiple post-hoc t-tests were Bonferroni-Holm adjusted (Holm, 1979) and are marked as padj. As measure of effect size, η<sub>p</sub><sup>2</sup> is reported for the ANOVAs (Keppel, 1991; Tabachnick and Fidell, 2001) and Pearson's r for the t-tests (Clark-Carter, 2003; Field, 2009).
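For readers who wish to check reported values, the Bonferroni-Holm adjustment and the two effect size measures can be computed with the conventional formulas sketched below. The function names are hypothetical, and the conversions from F and t are the standard textbook ones, assumed here rather than taken from the analysis scripts.

```python
from math import sqrt


def holm_adjust(p_values):
    """Bonferroni-Holm step-down adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def r_from_t(t_value, df):
    """Pearson's r as an effect size for a t statistic."""
    return sqrt(t_value ** 2 / (t_value ** 2 + df))
```

As a usage example, `partial_eta_squared(8, 1, 34)` rounds to 0.19 and `r_from_t(-2.17, 34)` rounds to 0.35, which agrees with the corresponding values reported in the Results.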

# RESULTS

# Behavioral Results

# Maximally Manipulated Stimulus Set

The analysis of reaction times (RTs) for the sublexical affective potential yielded no significant differences between the RTs to sublexically negative high-arousing words and to sublexically neutral low-arousing words [F1(1, 40) = 3.66, p = 0.06, η<sub>p</sub><sup>2</sup> = 0.08; F2(1, 306) = 1.45, p = 0.23, η<sub>p</sub><sup>2</sup> = 0.01]. For the lexical affective content, we found a significant F1 effect (with slower responses to negative high-arousing words than to neutral low-arousing words), but the F2 analysis remained non-significant [F1(1, 40) = 6.35, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.14; F2(1, 306) = 1.01, p = 0.32, η<sub>p</sub><sup>2</sup> = 0.003]. Regarding error rates, again we do not find a significant effect for sublexical affective potential [F1(1, 40) = 3.43, p = 0.07, η<sub>p</sub><sup>2</sup> = 0.08; F2(1, 306) = 1.28, p = 0.26, η<sub>p</sub><sup>2</sup> = 0.004]. There is also no effect for error rates regarding the lexical affective content [F1(1, 40) = 0.01, p = 0.91, η<sub>p</sub><sup>2</sup> = 0.00; F2(1, 306) = 0.00, p = 0.99, η<sub>p</sub><sup>2</sup> = 0.00].

#### Maximally Controlled Stimulus Set

Although the F1 analysis of RTs yields a significant effect for the sublexical affective potential with faster responses to the sublexically neutral low-arousing words, the F2 analysis is non-significant [F1(1, 40) = 5.56, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.12; F2(1, 305) = 1.29, p = 0.26, η<sub>p</sub><sup>2</sup> = 0.004]. For the lexical affective content, there is likewise no significant effect in the RT analysis [F1(1, 40) = 2.96, p = 0.09, η<sub>p</sub><sup>2</sup> = 0.07; F2(1, 305) = 1.93, p = 0.17, η<sub>p</sub><sup>2</sup> = 0.01]. Looking at the error rates, a significant F1 difference between lexical affective conditions (more errors on negative high-arousing words compared to neutral low-arousing ones) is not accompanied by a significant F2 analysis [F1(1, 40) = 5.52, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.12; F2(1, 305) = 1.03, p = 0.31, η<sub>p</sub><sup>2</sup> = 0.003]. Further, there is no significant effect for the sublexical affective potential [F1(1, 40) = 1.67, p = 0.2, η<sub>p</sub><sup>2</sup> = 0.04; F2(1, 305) = 0.27, p = 0.6, η<sub>p</sub><sup>2</sup> = 0.001].

# ERP Results

# Maximally Manipulated Stimulus Set

# **Lexical affective content**

An early effect of the lexical affective content was found in the time window of the EPN between 200 and 300 ms in interaction with the topographic factor left-mid-right [F(2, 68) = 3.7, p = 0.03, η<sub>p</sub><sup>2</sup> = 0.1]. T-tests within each of the three laterality clusters only showed a trend toward a difference between neutral low-arousing and negative high-arousing words in the left cluster [t(34) = −2.17, padj = 0.12, r = 0.35] with a larger negativity for negative high-arousing words. Yet, the topographic map (**Figure 2**) reveals that this negativity has a shape that is not well captured by the cluster formation of the exploratory topographic analysis. Rather, the most distinct negativity shows in a left posterior area, as would be hypothesized for the expected EPN. Results of the EPN ROI analysis were: t(34) = −1.87, p = 0.07, r = 0.31. Although here again we find only a trend toward significance, in both analyses the postulated effect is of medium size and should not be neglected (see discussion for why the effect might not be as strong as in previous literature).

A late positive complex (LPC) can be found between 400 and 700 ms as a significant main effect for the lexical affective content [F(1, 34) = 8, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.19] with more positive values for the negative high-arousing words compared to the neutral low-arousing words. Furthermore, we find a significant interaction of this lexical effect with the topographic cluster division anterior-central-posterior [F(2, 68) = 7.88, p = 0.004, η<sub>p</sub><sup>2</sup> = 0.19]: t-tests within each of these clusters revealed significant differences between the two lexical affective content conditions in the anterior [t(34) = 4.12, padj < 0.003, r = 0.58] and the central cluster [t(34) = 2.74, padj = 0.02, r = 0.43]. This fronto-central positivity is also reflected in the topographic map as shown in **Figure 2**.

### **Sublexical affective potential**

Visual inspection already suggested a robust and long-lasting negativity between 250 and 650 ms that proved to be a significant main effect of the sublexical affective potential [F(1, 34) = 7.77, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.19] with sublexically negative high-arousing words eliciting a larger negativity over this whole time interval than sublexically neutral low-arousing words. The 3-fold interaction of sublexical affective potential × topographic factor anterior-central-posterior × topographic factor left-mid-right also turns out significant [F(4, 136) = 4.76, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.12]. Of the t-tests conducted within each of the nine topographic clusters, after correction for multiple testing one turned out significant [in the right central cluster with t(34) = −3.06, padj = 0.036, r = 0.46], one marginally significant [in the right posterior cluster with t(34) = −2.86, padj = 0.056, r = 0.44], and four more neighboring clusters still showed trends [left anterior cluster with t(34) = −2.42, padj = 0.11, r = 0.38, mid anterior cluster with t(34) = −2.57, padj = 0.09, r = 0.4, right anterior cluster with t(34) = −2.32, padj = 0.11, r = 0.37, and mid central cluster with t(34) = −2.74, padj = 0.07, r = 0.43], always with a larger negativity for sublexically negative high-arousing words. **Figure 3** displays the topography of this right-central negativity and the ERP graphs at selected electrodes.

# Maximally Controlled Stimulus Set

### **Lexical affective content**

In the EPN time window between 200 and 300 ms an early effect of lexical affective content exists in interaction with the topographic factor anterior-central-posterior [F(2, 68) = 8.23, p = 0.003, η<sub>p</sub><sup>2</sup> = 0.2] as well as in a 3-fold interaction also including the left-mid-right factor [F(4, 136) = 3.68, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.1]. T-tests within the respective topographic clusters reveal a significant difference between neutral low-arousing and negative high-arousing words in the whole posterior cluster [t(34) = −2.71, padj = 0.03, r = 0.42] as well as trends in the single posterior clusters: left posterior [t(34) = −2.88, padj = 0.06, r = 0.44], mid posterior [t(34) = −2.46, padj = 0.13, r = 0.39], and right posterior [t(34) = −2.5, padj = 0.14, r = 0.39], always showing a higher negativity for the lexically negative and high-arousing words. A t-test within the EPN ROI shows a significant difference between the two lexical affective conditions [t(34) = −3.17, p = 0.003, r = 0.48] going in the same direction. The topography of this effect reflects the expected EPN pattern well. It is shown together with the EEG graphs at selected electrodes in **Figure 4** (upper part).

A late positive complex (LPC) shows between 400 and 700 ms as a significant main effect for lexical affective content [F(1, 34) = 6.16, p = 0.02, η<sub>p</sub><sup>2</sup> = 0.15] with more positive values for the negative high-arousing words compared to neutral low-arousing words. The interaction of lexical affective content with the topographic division anterior-central-posterior is also significant [F(2, 68) = 8.04, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.19], with a significant t-test result in the anterior cluster [t(34) = 3.71, padj = 0.003, r = 0.54] as well as a trend showing within the central cluster [t(34) = 2.22, padj = 0.07, r = 0.36]. This fronto-central positivity with negative high-arousing words displaying a higher positivity than neutral low-arousing words is displayed in the lower topographic map of **Figure 4**.

#### **Sublexical affective potential**

The exploratory time-line analysis revealed contiguous significant time windows between 226 and 276 ms for the interaction of the sublexical affective potential with the topographic factor anterior-central-posterior. Thus, we analyzed this time window as a whole, which yields a significant interaction of sublexical affective potential with the anterior-central-posterior clustering [F(2, 68) = 6.67, p = 0.01, η<sub>p</sub><sup>2</sup> = 0.16]. Resolving this interaction only leads to a rough trend within the whole posterior cluster [t(34) = −1.9, padj = 0.2, r = 0.31] with a more negative amplitude for the sublexically negative high-arousing words, yet of medium effect size. Visual inspection of the topographic map (see **Figure 5**) reveals that this posterior negativity looks quite similar to the lexical EPN. Hence, we also tested for significance within the EPN ROI: the t-test shows a significant difference between sublexically negative high-arousing words and sublexically neutral low-arousing words [t(34) = −2.68, p = 0.01, r = 0.42]. The topography and ERP graphs at selected electrodes are displayed in **Figure 5**.

# DISCUSSION

The present study investigates whether systematic sound-to-meaning correspondences that we had detected in the German language influence the neural processes of language perception assessed by EEG recordings during the most standard task used in psycholinguistic research: visual lexical decision.

There is a longstanding debate in theoretical linguistics oscillating between the well-known axiom of arbitrary relations between the signifier and the signified on the one hand, and numerous studies on phenomena of sound symbolism and phonological iconicity on the other hand (for reviews see Perniss et al., 2010; Schmidtke et al., 2014a; Dingemanse et al., 2015).

Here, we focused on sound-to-meaning correspondences assumed to represent phonological iconicity with regard to a sublexical encoding of affect: Certain phonological segments—syllabic onsets, nuclei or codas—were found to occur particularly often in words of negative and/or high-arousing semantic meaning. As these findings proved statistically reliable across a large-scale database of over 6000 German words, we assume they might represent a certain degree of iconic organization of language rather than merely idiosyncratic "Gestalt" features of single words (Conrad et al., in preparation).

Based on this assumption, we calculated:


We then tested—using EEG measurements—whether apparent sound-to-meaning correspondences represent anything more than a hard-to-interpret "intriguing finding" arising from statistical analyses of large-scale lexical databases. We used these measures of sublexical affective potential—derived directly from the large-scale database—as an experimental factor distinguishing between words that "should" sound according to these sound-to-meaning correspondences in the database—highly arousing and negative vs. words with rather neutral phonological affective qualities.

Our data suggest that these sound-to-meaning correspondences or statistical regularities of German with regard to sublexical phonology and affective content of words are rooted in phenomena that crucially influence basic online reading processes: Regardless of the actual lexical affective content of stimuli, words that were composed of phonological segments typically occurring in words of negative high-arousing meaning caused a very robust and long-lasting negativity in the ERP signal when participants simply tried to lexically access these words—compared to words consisting of affectively "neutral" phonological segments. As the most important finding of our study, this effect is strong evidence for the psychological relevance of affective sound-to-meaning correspondences in the German language at the level of sublexical units.

However, it is more difficult to attribute this effect to a specific type of processing. This is because those phonological segments typically occurring in words with threatening affective content (high arousal and negative valence) tend to be formally salient as well: their frequency of occurrence is considerably low and/or they are phonologically rather complex, i.e., combining several consonants in syllabic onsets or codas. Note that this makes perfect sense from an evolutionary perspective: If language chose a specific phonological segment as a sublexical sign of threatening affective content, it should not use this sign too often, to avoid inflation or decay of the alerting sign character. Further, the alerting character of the sign would clearly benefit from salient perceptive characteristics such as, for instance, complex phonological structure requiring increased articulatory effort for several consonants combined in one syllabic onset or coda. In a strict sense, this confound with structural saliency makes it difficult to interpret our robust effect for the manipulation of sublexical affective potential in the maximally manipulated set as anything other than an effect of general sublexical encoding processes during silent reading, arising from the complexity and/or low frequency of the sublexical units (see Nuerk et al., 2000, for phonological/subsyllabic component frequency; Goslin et al., 2006, for syllabic structure; Barber et al., 2004; Hutzler et al., 2004, for syllable frequency; Hauk et al., 2006a,b, for bigram frequency). According to a two-fold representation of phonological units comprising an auditory as well as a motor template (Hickok, 2012), articulatory activations are possibly involved as well, especially with regard to the complex phonological clusters. Neuroimaging studies, indeed, show motor circuits responsible for articulatory movements to be activated in response to visually presented word stimuli (Hagoort et al., 1999; Burton et al., 2005).

To control for the influence of these potential intervening factors we had prepared and presented an additional, maximally controlled stimulus set involving the same manipulations but controlling for the confounds of sublexical affective potential with formal complexity and frequency. In this set, although the control massively reduced the natural variance of the manipulated variable and, accordingly, the strength of the manipulation, the sublexical affective potential of stimulus words still produced a small but significant effect in the ERP signal, of non-negligible medium effect size. More interestingly, the distribution of this effect across the scalp and the moment it appears during the reading process closely resemble what is typically reported (and also present in our data) for manipulations of affective content at the lexical level: an increased negativity at posterior electrode sites arising at around 200 ms after stimulus onset (EPN). Yet, although this topographic and temporal coincidence with the lexically driven EPN appears somewhat striking, this novel finding, obtained through explorative time-line analysis, certainly calls for corroboration in future studies that should also explore which brain regions may be involved in these processes.

Note also that both EPN and LPC effects of lexical affective content manipulations appear somewhat diminished in our data when compared to previous experimental reports focusing on general emotion effects during visual word recognition (e.g., Conrad et al., 2011; Recio et al., 2014; just to quote two from the same lab). In our study, these manipulations of lexical affective content only served as control measures, allowing us to relate both the moment when effects of the sublexical affective potential would arise and what their morphology would look like to more classical effects of lexical affective content within one and the same experimental context. Such simultaneous manipulations of different factors that have to be kept independent from each other clearly have the consequence that the strength of each manipulation is attenuated as compared to when it is manipulated alone. In consequence, the resulting empirical effects may have been attenuated too.

Further, our specific manipulations of affective content combining negative valence with high arousal may not have favored lexical affective effects to show up in most robust ways, as these effects have been shown to be stronger for positive as compared to negative valence (Recio et al., 2014). We assume that this restriction to negative affective content may be responsible for the lack of effects in our behavioral data. Whereas a processing advantage for positive stimuli is consistently reported in the literature, the picture is more heterogeneous for negative contents: On the one hand, the automatic evaluation hypothesis predicts faster processing of positive or negative words compared to neutral words, supported by several lexical decision studies (Hofmann et al., 2009; Kousta et al., 2009). On the other hand, opposite findings have also been reported, where reaction times for negative words are not different from neutral words (Briesemeister et al., 2012; Recio et al., 2014) or even longer compared to neutral or positive words (Carretié et al., 2008; Estes and Adelman, 2008). Such findings are explained by the automatic vigilance hypothesis (Pratto and John, 1991), according to which fast and automatic evaluation of especially negative stimuli directs attention away from the actual task, e.g., lexical decision, causing prolonged response times and higher error rates due to a deeper processing of the negative word content or even because of a tendency to withdraw from negative stimuli.

The same may, of course, explain the absence of sublexical affective potential effects in our behavioral data. But note also that even though our ERP data show that this sublexical affective potential, together with its formal salience, does play a role for automatic reading processes, we do not see why this should necessarily bias (speed or delay) the tendency to decide that a given stimulus is a word or not. We clearly do not posit that these phenomena should, besides potentially attracting attention at some point of the reading process, trigger a fundamental general cognitive bias, and sublexical and lexical affective content are, further, unrelated in our stimuli. Taken together, the contrast between significant ERP effects and the lack of such effects at the behavioral level in our study may best serve as a good example of how RT effects only represent the final point of a decision process, whereas ERPs may better reveal fine-grained and potentially contradicting processes that precede a final response, whose opposing effects on response latency may have canceled each other out.

Whereas the topographical potential distribution of our early ERP effects aligns well with homogeneous reports on classical EPN effects, the topography of the LPC effects deserves more discussion, as in some studies the LPC has been found to be more posterior (Herbert et al., 2008; Kissler et al., 2009). Yet in general, the amplitude, latency, and topographic dispersion of the LPC have been found to be task-dependent (Fischler and Bradley, 2006; Schacht and Sommer, 2009). Whereas a word counting task yielded a posterior LPC (Kissler et al., 2009), it appeared somewhat more central when subjects just had to passively listen to words (Herbert et al., 2008). With lexical decision tasks, the LPC is usually found in a fronto-central position (Schacht and Sommer, 2009; Conrad et al., 2011; Recio et al., 2014), and even further frontal when participants are asked to rate the words on affective dimensions (Dillon et al., 2006); all of the latter reports are compatible with our findings for lexical affective content. On the other hand, we found no such typical LPC-like component for the contrast of sublexical affective potential. The reason for this is probably that this component generally appears linked to higher-cognitive elaborative processing, whereas our sublexical manipulation taps into more basic processing stages.

Our data were obtained with highly controlled experimental manipulations and provide an excellent signal-to-noise ratio, involving more than 150 stimuli per condition and 35 participants. What they suggest is that specific phonological segments can already trigger at the sublexical level what is classically observed and reported as (lexical) emotion effects during the reading process: an EPN at around 200 ms after stimulus onset. In combination with the finding of the long-lasting negativity in the less controlled stimulus set, our data thus represent novel neurophysiological evidence for phonological iconicity as a principle systematically influencing both the organization of the vocabulary and the online processing of a language like German. The reading system appears to be sensitive to the transport of affective information via sublexical signs of affective meaning. The EPN is usually interpreted as evidence for an early automatic attention shift toward emotionally relevant stimuli. So far, this emotional relevance was determined by the lexical affective meaning (or content) of word stimuli in a number of previous ERP studies (see Citron, 2012, for a review). In the case of our study, the same effect might already be elicited by sublexical phonological segments alone. One possibility of how this effect might arise can be seen in statistical learning: the sound-to-meaning correspondences our experimental manipulations are based upon could represent such well-learned regularities that the presentation of certain phonological segments is sufficient to elicit the same emotional attention processes as whole word forms representing emotion-laden concepts. Phonological segments, in that case, would have acquired symbolic affective values via associative links across the lexicon.
However, an alternative explanation would refer more directly to an internal relation between acoustic or phonological properties of specific phonological segments and affective meaning at the conceptual level. As we outline in Conrad et al. (in preparation), phonemes occurring more frequently in words of high arousal (and negative valence) tend to possess phonemic features (e.g., sibilants or unvoiced stops) that go along with increasing arousal at the level of acoustic impressions, according to the distinctive features theory of Jakobson et al. (1952). Therefore, it might have been the increasing arousal at the level of phonemic features typically occurring in words of high arousal and negative valence that triggered the EPN in our data. This interpretation aligns with the general assumption that phonological iconicity represents an internal relation between the conceptual and the sublexical level: certain phonological segments that are iconic for high arousal could provoke the same pattern of electrophysiological activity (reflected by the EPN) as emotion-laden words, because the phonemic features of these segments are of similar affective salience. The fact that the respective ERP effects of the sublexical affective potential appear clearly diminished in the maximally controlled stimulus set compared to the maximally manipulated stimulus set is probably mainly due to the constraint of controlling for the major covariation of sublexical affective potential with formal salience. But it has to be kept in mind that this empirical confound per se already sheds light on phonological iconicity effects: to encode threatening affective meaning, the German language has apparently made use of phonological segments that leave most impressive "footmarks" in neural correlates of language processing, as is evident from the robust effects of our maximal manipulation of sublexical affective potential.
Taken together, this pattern of findings strongly points toward an internal relation between sublexical signs and affective meaning at the conceptual level and is in clear opposition to the arbitrariness axiom of linguistic theory concerning the relation between a signifier and the signified.

Finally, note that processes of production or articulation preparation may also have influenced our ERP data for sublexical affective potential, even though the task was visual lexical decision. Phonological iconicity may well be rooted in articulation processes determining an internal relation between the conceptual and the sublexical level. This appears even more plausible considering the relation between SAVs and structural complexity of consonant syllabic segments (increasing complexity of negative/high arousal segments). As the motor theory of speech perception (Liberman and Mattingly, 1985) states, perception and articulation are highly entangled during neural processing of language (Pulvermüller et al., 2006; D'Ausilio et al., 2009), and our design does not allow us to distinguish clearly between perception and articulation preparation as potential sources of effects, which, in turn, appears to be a most fruitful field for future research.

Language comparisons could provide interesting insights concerning potentially "universal" vs. language-dependent features of phonological iconicity. In particular, as our data involve "phonological" iconicity effects after visual presentation using orthographic codes from a shallow orthography, it might be interesting to see whether similar effects could be obtained in languages with less transparent orthographies, e.g., using English words. If effects persisted for both consistent and inconsistent grapheme-to-phoneme mappings, this would suggest that iconicity with regard to affective content might have already generalized from the phonological to the orthographic domain.

# AUTHOR CONTRIBUTIONS

MC, as principal investigator, developed the idea for the project, obtained the funding, and made major contributions to all aspects of the work, from stimulus selection and data analysis to writing the manuscript. SU conducted the experiment and analyzed the data, helped with stimulus preparation, and also wrote major parts of the manuscript. SK was involved in developing the idea, fundraising, interpreting the data, and preparation of the manuscript. DS was mainly involved in the corpus analyses behind this study and the calculation of the new SAVs. AA assisted through all steps of the experiment with his critical thinking and important feedback. AA and DS also contributed to conducting the EEG experiment. All authors contributed substantially to the conception of the experiment and the interpretation of the data, revised and approved the final manuscript and agreed to be accountable for all aspects of the work.

# ACKNOWLEDGMENTS

This research was funded by two grants from the German Research Foundation (DFG) via the Cluster of Excellence "Languages of Emotion" at the Freie Universität Berlin: one to MC (201 "Bilingualism and affectivity in reading") and one to MC and SK (410 "Sound physiognomy in language organization, processing, and production"). We thank Luisa Egle, Maren Luitjens, Mariam Murusidze, Kathrin Schreiter, Susanne Löhne, Johannes Ecker, Hauke Blume, Chun-Ting Hsu, and Gesche Schauenburg for their help in conducting the EEG experiments. This study was approved by the ethics committee of the Freie Universität Berlin and was conducted in compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). We acknowledge support by the Open Access Publication Funds of the Freie Universität Berlin.

# REFERENCES

Bloomfield, L. (1933). Language. New York, NY: Holt.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ullrich, Kotz, Schmidtke, Aryani and Conrad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Emotion word processing: does mood make a difference?

Sara C. Sereno<sup>1,2</sup>\*, Graham G. Scott<sup>3</sup>, Bo Yao<sup>4</sup>, Elske J. Thaden<sup>1</sup> and Patrick J. O'Donnell<sup>1</sup>

<sup>1</sup> School of Psychology, University of Glasgow, Glasgow, UK, <sup>2</sup> Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK, <sup>3</sup> Applied Psychology Research Group, School of Media, Culture and Society, University of the West of Scotland, Paisley, UK, <sup>4</sup> School of Psychological Sciences, University of Manchester, Manchester, UK

Visual emotion word processing has been a focus of recent psycholinguistic research. In general, emotion words provoke differential responses in comparison to neutral words. However, words are typically processed within a context rather than in isolation. For instance, how does one's inner emotional state influence the comprehension of emotion words? To address this question, the current study examined lexical decision responses to emotionally positive, negative, and neutral words as a function of induced mood as well as their word frequency. Mood was manipulated by exposing participants to different types of music. Participants were randomly assigned to one of three conditions: no music, positive music, or negative music. Participants' moods were assessed during the experiment to confirm the mood induction manipulation. Reaction time results confirmed prior demonstrations of an interaction between a word's emotionality and its frequency. Results also showed a significant interaction between participant mood and word emotionality. However, the pattern of results was not consistent with mood-congruency effects. Although positive and negative mood facilitated responses overall in comparison to the control group, neither positive nor negative mood appeared to additionally facilitate responses to mood-congruent words. Instead, the pattern of findings seemed to be the consequence of attentional effects arising from induced mood. Positive mood broadens attention to a global level, eliminating the category distinction of positive-negative valence but leaving the high-low arousal dimension intact. In contrast, negative mood narrows attention to a local level, enhancing within-category distinctions, in particular, for negative words, resulting in less effective facilitation.

Keywords: emotion, mood induction, valence, arousal, word frequency, visual word recognition, lexical decision

#### Edited by:

Cornelia Herbert, University Hospital of Tübingen, Germany

#### Reviewed by:

Francesca M. M. Citron, Lancaster University, UK

Kevin B. Paterson, University of Leicester, UK

#### \*Correspondence:

Sara C. Sereno, Institute of Neuroscience and Psychology, School of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK

sara.sereno@glasgow.ac.uk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 13 December 2014; Accepted: 28 July 2015; Published: 24 August 2015

#### Citation:

Sereno SC, Scott GG, Yao B, Thaden EJ and O'Donnell PJ (2015) Emotion word processing: does mood make a difference? Front. Psychol. 6:1191. doi: 10.3389/fpsyg.2015.01191

# Introduction

For several decades, research into visual word recognition has sought to identify and delineate the factors affecting the access of word meaning. One focus of more recent research has been on the processing of written emotional words. In general, such research has established that emotion words provoke differential responses in comparison to neutral words. Words, however, are typically recognized not in isolation, but within a context. A context can be the prior sentence or paragraph that makes a word more or less predictable. Alternatively, a context can be the inner emotional state of the comprehender. The current study investigates the effect of induced mood on the recognition of emotional and neutral words. We begin by reviewing recent advances in emotion word recognition. We then consider studies that have investigated how mood affects word recognition. Our study attempts to address some of the perceived limitations of the research that has been conducted to date.

Emotion words are typically characterized within a two-dimensional framework of valence, a measure of value or worth, and arousal, a measure of internal activation (e.g., Osgood et al., 1957; Russell, 1980). Because extreme valence is correlated with higher arousal (e.g., Bradley and Lang, 1999), positive and negative words, when compared with neutral words, also tend to have higher associated levels of arousal. In terms of their semantics, emotion words, broadly construed, can either express an emotional state (e.g., happy, panic) or elicit one (e.g., puppy, shark).

Investigations of emotion word processing have often examined different categories of emotion words, controlled for different lexical variables, and used diverse experimental paradigms, making direct comparisons and generalizations difficult (Scott et al., 2009, 2012). Until more recently, most studies did not compare positive, negative, and neutral words within an experiment, but instead examined only two of these three categories or sometimes words comprising specific emotional categories (e.g., happiness, sadness). Nonetheless, a processing advantage for positive over neutral words is generally demonstrated (e.g., Kanske and Kotz, 2007; Kuchinke et al., 2007; Kousta et al., 2009; Schacht and Sommer, 2009; Scott et al., 2009, 2012, 2014; Sheikh and Titone, 2013; Knickerbocker et al., 2015). Some studies have shown an advantage for negative over neutral words (e.g., Tabert et al., 2001; Windmann et al., 2002; Nakic et al., 2006; Kanske and Kotz, 2007; Kousta et al., 2009; Schacht and Sommer, 2009; Knickerbocker et al., 2015). Others have shown an advantage for positive over negative words (e.g., Kiehl et al., 1999; Wentura et al., 2000; Dahl, 2001; Atchley et al., 2003; Estes and Adelman, 2008; Citron et al., 2014; Kuperman et al., 2014).

Research of ours and of others has investigated the interaction of emotion with word frequency (Kuchinke et al., 2007; Scott et al., 2009, 2012, 2014; Méndez-Bértolo et al., 2011; Sheikh and Titone, 2013). A word frequency effect represents the behavioral advantage in recognizing commonly used high frequency (HF) words over low frequency (LF) words that occur less often (e.g., Hand et al., 2010, 2012). A word frequency effect is considered to be a reliable indicator of lexical access (e.g., Sereno and Rayner, 2003). Consequently, an interaction between word emotionality and frequency would imply that a word's emotional quality can influence the early, lexical stages of word recognition. Scott et al. (2009, 2012, 2014) have found such an interaction in lexical decision reaction times (RTs), in brain electrophysiology measures, and in eye fixation durations during fluent reading. The pattern of behavioral effects is as follows: for LF words, positive and negative word responses are faster than neutral word responses; for HF words, positive word responses alone are faster than negative or neutral word responses (which do not differ from each other). The differential pattern of responses to negative words across frequency may be able to account for the different patterns of emotion word effects in the literature in that different studies may have used different ratios of higher and lower frequency negative words within their stimulus sets. Nevertheless, converging evidence from recent brain electrophysiological studies has confirmed an early, lexical (i.e., before ∼250 ms) locus of emotion in word recognition tasks (Herbert et al., 2006, 2008; Kissler et al., 2007, 2009; Scott et al., 2009; Bayer et al., 2012; Kissler and Herbert, 2013; Keuper et al., 2014; Zhang et al., 2014).

Another factor that influences the recognition of emotion words is the mood state of the reader. According to Bower's (1981) notion of mood congruency, there is a link between mood state and cognitive processes such as attention and memory, whereby processing is facilitated when the affective tone of received information matches the valence of the mood. A mood can be reliably induced in individuals via several different laboratory procedures (Martin, 1990), including the self-statement or Velten (1968) technique, music listening (Västfjäll, 2002), film watching, and hypnotic suggestion. For word recognition experiments, inducing mood via the nonverbal method of listening to instrumental music is generally preferred.

A number of studies have investigated mood effects on the recognition of emotion words (e.g., Small, 1985; Halberstadt et al., 1995; Niedenthal et al., 1997; Olafson and Ferraro, 2001; Ferraro et al., 2003). In these studies, a mood is first induced in participants by having them listen to either "happy" or "sad" music, and this is followed by a word recognition task (sometimes the music is also played in the background during the task). In general, these studies find that mood-congruent words are facilitated relative to mood-incongruent words. However, there are certain methodological concerns which may weaken the generalizability of the findings. We focus on the three studies that used lexical decision as the response time measure (Niedenthal et al., 1997; Olafson and Ferraro, 2001; Ferraro et al., 2003). In Halberstadt et al. (1995), participants wrote down auditorily presented words that were purposely selected as homophones having both emotional and non-emotional realizations (e.g., won, one). In Small (1985), words were presented tachistoscopically for increasing durations until they were identified.

Our concerns with the lexical decision studies were as follows. First, relatively few stimuli were used and lexical specifications of the stimuli were not always controlled or presented. Niedenthal et al.'s (1997) Experiments 1 and 2 used either six or eight words, respectively, within each of their four conditions ("happy words," "sad words," "love words," and "anger words") and equal numbers of neutral words (24 or 32, respectively). Olafson and Ferraro (2001) used 25 "happy words" and 25 "sad words" (which included homophones from Halberstadt et al., 1995); no neutral words were included. Ferraro et al. (2003) replicated Olafson and Ferraro (2001) with identical stimuli, extending the original experiment by testing older adults (N.B., the stimuli are not listed in either study). In fact, neither of these studies presented any lexical characteristics of their stimuli (e.g., frequency, length, valence, arousal). In Niedenthal et al. (1997), the stimuli were not explicitly controlled for arousal: happy words had numerically higher arousal values than the sad words (according to the norms of Bradley and Lang, 1999). In terms of mood induction, Niedenthal et al.'s (1997) Experiment 2 was the only study to include a control group of participants that were not exposed to any mood-inducing music. The selection of music chosen to induce the different moods is another concern. Across all studies, many of the happy and sad music pieces used are relatively well-known (e.g., Mozart's "Eine kleine Nachtmusik" and Barber's "Adagio for Strings"). As such, individuals' own affective associations may or may not be consistent with the desired mood that was to be induced. In addition, the tempo of the sad music is much slower than that of the happy music, which should correspondingly affect RTs (e.g., Kämpfe et al., 2010; Bottiroli et al., 2014).
In all three studies, only the discrete emotions of happy and sad were examined [although it may be that Olafson and Ferraro's (2001) happy and sad words could be classified more generally as positive and negative words]. It is possible that implementing the broader positive and negative categories, derived from the dimensions of valence and arousal, may also demonstrate facilitation within a mood-induction framework (e.g., Eerola and Vuoskoski, 2011).

The current study attempted to address these concerns. As in our prior research (Scott et al., 2009, 2012, 2014), we implemented an Emotion (Positive, Negative, Neutral) × Frequency (LF, HF) design. We used a total of 240 words, with 40 words in each of the 6 conditions. Words were matched across conditions on an item-by-item basis for word frequency and length. We also used several sets of published norms to obtain values on all our stimuli for valence and arousal, as well as imageability and age of acquisition (AoA). We induced positive and negative mood and also had a control condition in which no mood was induced. Positive and negative music clips were selected from a variety of sources that we anticipated would make them less recognizable (e.g., from movie soundtracks), with the deliberate selection of positive and negative clips having similar tempos. Music clips were normed ahead of time to ensure that they were equally intense in valence and arousal. Finally, we sought to broaden the scope of both the induced mood and emotional stimuli from discrete to categorical emotions (i.e., from "happy" and "sad" to "positive" and "negative").

# Methods

# Participants

A total of 144 members of the University of Glasgow community participated in this study. All were native English speaking, had not been diagnosed with dyslexia, had normal or corrected-to-normal vision, had normal hearing, and were naïve as to the purpose of the experiment. An additional nine participants took part in the study, but their data were excluded due to a high amount of data loss from word errors, non-word errors, and/or very slow responses. Participants were compensated for their time with either experimental credits or £5. All participants gave written informed consent and the experimental procedure was approved by the College of Science and Engineering Ethics Committee at the University of Glasgow.

Participants were opportunistically assigned to one of the three mood groups—Control, Positive, and Negative. All groups comprised 48 participants. The average age and number of females within each group were as follows: 22 years and 35 females for the Control group; 23 years and 31 females for the Positive group; and 24 years and 39 females for the Negative group.

# Design and Materials

A 3 (Mood: Control, Positive, Negative) × 3 (Emotion: Positive, Negative, Neutral) × 2 (Frequency: LF, HF) mixed design was used. Mood was the between-participants factor and was implemented via a mood-induction procedure for Positive and Negative groups (no mood induction was used for the Control group). Emotion and Frequency were within-participant factors and the different levels of these factors were achieved via stimulus selection based on existing norms and databases.
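The factorial structure described above can be enumerated in a few lines. The following is only an illustrative sketch, not part of the original materials; the variable names are hypothetical, while the factor levels, the 40 words per cell, and the 48 participants per group are the values reported in the text.

```python
from itertools import product

# Factor levels as reported in the text.
MOODS = ("Control", "Positive", "Negative")      # between-participants factor
EMOTIONS = ("Positive", "Negative", "Neutral")   # within-participants factor
FREQUENCIES = ("LF", "HF")                       # within-participants factor

WORDS_PER_CELL = 40          # words per Emotion x Frequency condition
PARTICIPANTS_PER_GROUP = 48  # participants per mood group

# Every participant sees all Emotion x Frequency word conditions.
word_cells = list(product(EMOTIONS, FREQUENCIES))

# The full mixed design crosses those conditions with the mood groups.
design_cells = list(product(MOODS, EMOTIONS, FREQUENCIES))

n_words = len(word_cells) * WORDS_PER_CELL
n_participants = len(MOODS) * PARTICIPANTS_PER_GROUP

print(len(word_cells), len(design_cells), n_words, n_participants)
```

Under these values the sketch yields 6 word conditions, 18 design cells, 240 words, and 144 participants, matching the totals given in the Methods.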

#### Mood Induction Stimuli

In accordance with previous studies, pieces of music were used to induce positive or negative mood (e.g., Eerola et al., 2009; Eerola and Vuoskoski, 2011). In order to select the appropriate music, a norming study was run on a set of 28 participants (mean age 20 years; 19 females), none of whom (later) took part in the main experiment. In view of constraints of the main experiment, it was necessary to have a large selection of musical pieces to contribute to the mood induction procedures. The participants were run in small groups in sessions lasting ∼1.5 h. They were presented with 52 music clips, each lasting around 1 min. Participants were asked to rate each clip in terms of its valence and arousal, both on 9-point scales. Valence ranged from 1 (low, negative) to 9 (high, positive) and arousal ranged from 1 (low) to 9 (high). For each clip, participants were also asked to indicate whether they recognized it.

Based on the average valence and arousal ratings, individual pieces were then chosen for inclusion in the main experiment for mood induction. Positive music selections had valence ratings greater than 6 and negative music selections had valence ratings less than 4. Both positive and negative music selections had comparable arousal ratings of around 6. Since the main experiment included a large number of trials, we wanted to ensure that participants' moods were maintained throughout the experiment. Thus, we opted for three separate musical mood-induction exposures, each lasting around 5 min. Each 5-min set of music comprised five different pieces, with a total of 15 pieces for each mood induced. For these pieces (15 positive, 15 negative), participants' recognition rate was 19%. Thus, on average, participants reported recognizing just under three of the 15 pieces for each mood set. A complete list of the selected music is presented in Appendix A. The valence and arousal ratings (with SDs) from the final sets of positive and negative music are presented in **Table 1**.
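The selection rule can be expressed as a simple filter over the normed ratings. The sketch below is hypothetical: the clip names and ratings are invented for illustration, and the arousal tolerance of ±1 is an assumption, since the text only states that arousal was "around 6". The valence cut-offs (greater than 6, less than 4) and the 19% recognition rate are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clip:
    name: str        # hypothetical identifier
    valence: float   # mean rating on the 1-9 valence scale
    arousal: float   # mean rating on the 1-9 arousal scale


def classify(clip: Clip,
             arousal_target: float = 6.0,
             arousal_tol: float = 1.0) -> Optional[str]:
    """Return 'positive', 'negative', or None if the clip is not selected.

    The arousal tolerance is an assumed value, not one reported in the study.
    """
    if abs(clip.arousal - arousal_target) > arousal_tol:
        return None
    if clip.valence > 6:
        return "positive"
    if clip.valence < 4:
        return "negative"
    return None


# Invented example ratings, for illustration only.
clips = [
    Clip("clip_a", valence=7.2, arousal=6.1),
    Clip("clip_b", valence=3.1, arousal=5.8),
    Clip("clip_c", valence=5.0, arousal=6.0),
    Clip("clip_d", valence=7.5, arousal=3.0),
]
selected = {c.name: classify(c) for c in clips}

# A 19% recognition rate over 15 pieces corresponds to just under 3 pieces.
expected_recognized = 0.19 * 15
```

With these invented ratings, only `clip_a` and `clip_b` pass the filter; `clip_c` is too neutral in valence and `clip_d` is too low in arousal.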

#### Lexical Decision Stimuli

The 3 (Emotion: Positive, Negative, Neutral) × 2 (Frequency: LF, HF) design gave rise to 6 conditions. With 40 words in each of the 6 conditions, the lexical decision experiment comprised a total of 240 words, ranging from 3 to 9 characters in length. Non-words comprised 240 pronounceable, orthographically legal pseudowords that were matched to word stimuli in terms of string length (e.g., wid, felp, chire, narvey, bruddle, durledge, slamperic). Words were matched across the 6 conditions on an item-by-item basis for word frequency (occurrences per million) and word length (number of letters). The complete list of 240 words is presented in Appendix B. The specifications of the words in terms of length, frequency, valence, and arousal are presented in **Table 2**. Other word characteristics that were not directly controlled for, but were matched as best as possible across conditions, are also presented in **Table 2**. These include number of syllables, imageability (i.e., whether a word represents something that is easy or difficult to imagine or picture), age of acquisition (AoA; i.e., the age at which a word was initially learned), and grammatical class.

TABLE 1 | Means (with SDs) of music specifications for Positive and Negative mood conditions.

Units of measurement are as follows: Duration in seconds; Valence on a scale from 1 (low, negative) to 9 (high, positive); Arousal on a scale from 1 (low) to 9 (high).

TABLE 2 | Means (with SDs) of target specifications across experimental conditions.

LF, low frequency; HF, high frequency; AoA, age of acquisition; PoS, part of speech. Units of measurement are as follows: Length in number of letters; Frequency in occurrences per million; Arousal on a scale from 1 (low) to 9 (high); Valence on a scale from 1 (low, negative) to 9 (high, positive); Syllables in number of syllables; Imageability on a scale from 1 (low) to 7 (high); AoA on a scale from 1 (early) to 7 (late). For PoS, the grammatical class of each word was determined (some words were classified as belonging to more than one class), and the average frequencies of Adjective, Noun, and Verb usage across conditions are listed.

The different types of emotion words were determined by their valence values from the Affective Norms for English Words (ANEW), a database of 1000 words (Bradley and Lang, 1999). Each word has associated ratings for valence, from 1 (low, having a negative meaning) to 9 (high, having a positive meaning), and for arousal, from 1 (low) to 9 (high). As extreme valence values correlate with higher levels of arousal (Bradley and Lang, 1999), Positive and Negative words also tended to have higher arousal ratings. Mean valence and arousal values (with SDs) across all word conditions are presented in **Table 2**.

Word frequencies were obtained from the British National Corpus (BNC; http://www.natcorp.ox.ac.uk), a corpus of 90 million written-word tokens, using the on-line resource provided by Davies (2004; http://corpus.byu.edu/bnc). Word frequencies (with SDs) across all conditions are presented in **Table 2**.

While the chief variables affecting the speed of recognizing a word are its length, frequency, and contextual predictability, several other lexical variables are also known to influence the processing of words (e.g., Sereno et al., 2009; Yao et al., 2013). For example, highly imageable or early-AoA words are facilitated relative to less imageable or late-AoA words (e.g., Juhasz and Rayner, 2003; Balota et al., 2004; Sereno and O'Donnell, 2009). In addition, the grammatical class of a word affects its processing (e.g., Sereno, 1999; Palazova et al., 2011). Means (with SDs) of these variables across all conditions are presented in **Table 2**. Imageability ratings were obtained from five sources: the Bristol Norms (Stadthagen-Gonzalez and Davis, 2006), the MRC Psycholinguistic Database (Wilson, 1988), and norms of Bird et al. (2001), Clark and Paivio (2004), and Cortese and Fugett (2004). AoA ratings were obtained from the first four sources listed for imageability as well as the norms of Morrison et al. (1997).

#### Apparatus

Stimuli were presented via E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA) on a Dell OptiPlex GX520 desktop computer and a 17-inch LCD flat-screen monitor (1024 × 768 resolution; 75 Hz). Letter strings appeared in Courier New, 24-point bold font (black characters on a white background). At a viewing distance of ∼84 cm, 2.3 characters of text subtended 1° of visual angle. Responses were made via the computer keyboard and were recorded with millisecond accuracy. Music was played through headphones and was adjusted to a comfortable volume.
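The viewing geometry can be worked through with the standard visual-angle relation, width = 2 · d · tan(θ/2). The sketch below is illustrative only: the function names are hypothetical, and the on-screen character width is derived from the reported figures (84 cm, 2.3 characters per degree) rather than stated in the text.

```python
import math

VIEWING_DISTANCE_CM = 84.0   # approximate viewing distance reported in the text
CHARS_PER_DEGREE = 2.3       # characters per degree of visual angle (reported)


def width_for_angle(angle_deg: float, distance_cm: float) -> float:
    """On-screen width (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def angle_for_width(width_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by width_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(width_cm / (2.0 * distance_cm)))


# Character width implied by the reported geometry (a derived value).
char_width_cm = width_for_angle(1.0, VIEWING_DISTANCE_CM) / CHARS_PER_DEGREE

# Approximate extent of the shortest (3-letter) and longest (9-letter) strings.
shortest_deg = angle_for_width(3 * char_width_cm, VIEWING_DISTANCE_CM)
longest_deg = angle_for_width(9 * char_width_cm, VIEWING_DISTANCE_CM)

print(round(char_width_cm, 2), round(shortest_deg, 1), round(longest_deg, 1))
```

On this reading, each character would be roughly 0.6 cm wide, and the stimuli would span roughly 1.3° to 3.9° of visual angle, assuming a monospaced font as reported (Courier New).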

### Procedure

Participants were tested individually. They were given written information about the experiment and a consent form. Participants were assigned (in order of their arrival) to one of the three mood groups. Participants were given mood assessment sheets to rate the current state of their mood via the dimensions of valence and arousal (described below). Participants in all groups rated their mood at the beginning of the experiment. They were then given instructions for the lexical decision task. They were told that half of the stimuli were words and half were non-words and that they should respond as quickly and as accurately as possible. They were instructed to make word responses using their right forefinger on the "L" key (labeled "W") and non-word responses with their left forefinger on the "S" key (labeled "NW"). They were then presented with a short practice block of items (N = 12) to become accustomed to the task.

Each trial consisted of the following events. A blank screen was initially presented for 1000 ms. A fixation cross (+) then appeared in the center of the screen for 200 ms, replaced by another blank screen for 500 ms. A letter string was then presented centrally until the participant responded. Experimental trials (240 words, 240 non-words) were presented in a different random order for each participant.

The lexical decision experiment was presented in three equal blocks of trials. Participants in the Control mood condition performed the experiment with short break periods preceding each block. The procedure for participants in the Positive and Negative mood conditions was as follows. For each of the three blocks, they first listened to a set of mood-appropriate music (∼5 min), rated their mood, then proceeded with a block of lexical decision trials. Positive and Negative mood condition participants were not asked whether they recognized any of the music (a total of 15 clips over the course of the experiment). We thought this would disrupt the flow of the experiment. Moreover, as these participants were selected from the same participant pool as those who had provided ratings for the pieces (none were the same), we assumed that recognition rates would be similarly minimal. The experiment lasted ∼30 min for the Control mood participants and 45 min for Positive and Negative mood participants.

The mood rating sheets provided the following information to participants. Valence was described as a measure of value or worth and used a 9-point scale from 1 (very negative) to 5 (neutral) to 9 (very positive). Scale endpoints of "very positive" and "very negative" would indicate that they felt very good and very bad, respectively. Arousal was described as a measure of excitement vs. calmness and used a 9-point scale from 1 (very low arousal) to 5 (intermediate arousal) to 9 (very high arousal). The scale endpoint of "very high arousal" would indicate that they felt stimulated, excited, frenzied, jittery, or wide-awake, and that of "very low arousal" would indicate feeling relaxed, calm, sluggish, dull, or sleepy.

# Results

#### Mood Induction Manipulation Check

At the outset of the experiment (prior to any mood induction procedure), all participants provided valence and arousal ratings of their current mood. Mean ratings (with SDs) across the participant groups are presented in **Table 3**. A 1-factor analysis of variance (ANOVA) was carried out on the valence and arousal rating data comparing the three mood groups. No differences in ratings between mood groups were found either for valence [F1(2, 141) = 1.09, p > 0.30] or for arousal [F1 < 1].

For the Positive and Negative mood groups, participants listened to positive and negative music, respectively, before each of the three blocks of lexical decision trials. Participants in these mood groups provided additional ratings of their mood on each of these occasions. Mean valence and arousal ratings (with SDs) for Positive and Negative mood groups are presented in **Table 3**. Paired-sample t-tests were carried out separately for Positive and Negative mood groups, comparing their pre-experiment to post-music valence and arousal mood ratings. The Positive mood group showed a significant increase in valence (+0.4) [t(47) = 2.46, p < 0.05], and a marginal increase in arousal (+0.5) [t(47) = 1.97, p = 0.055]. The Negative mood group showed a significant decrease in valence (−1.4) [t(47) = −7.44, p < 0.001], as well as a significant increase in arousal (+0.8) [t(47) = 3.36, p < 0.01].

TABLE 3 | Means (with SDs) of valence and arousal ratings across mood groups during the experiment.

Units of measurement are as follows: Valence on a scale from 1 (low, negative) to 9 (high, positive); Arousal on a scale from 1 (low) to 9 (high).

## Lexical Decision Data

For correct word responses (97.77% of the data), items having RTs less than 250 ms or greater than 1500 ms were first excluded. In addition, for each participant in each condition, items with RTs beyond two standard deviations of the mean were then excluded. These trimming procedures resulted in an average data loss of 5.78% per participant (∼2 items per condition). Overall, participants on average provided RT data on 37 of the 40 possible items per condition.
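The two-step trimming described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' actual script: the trial format, the function name `trim_rts`, and the demo values are hypothetical, and the use of the sample standard deviation, computed after the absolute cutoffs have been applied, is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(trials, low_ms=250, high_ms=1500, n_sd=2.0):
    """Return the trials that survive both trimming steps.

    Each trial is a dict with keys 'participant', 'condition', 'rt', 'correct'.
    """
    # Step 1: keep correct responses inside the absolute RT window.
    kept = [t for t in trials if t["correct"] and low_ms <= t["rt"] <= high_ms]

    # Step 2: group the remaining trials by participant and condition.
    cells = defaultdict(list)
    for t in kept:
        cells[(t["participant"], t["condition"])].append(t)

    trimmed = []
    for cell in cells.values():
        rts = [t["rt"] for t in cell]
        if len(rts) < 2:
            # An SD cannot be computed from a single value; keep the trial.
            trimmed.extend(cell)
            continue
        m, sd = mean(rts), stdev(rts)  # sample SD (assumption)
        # Drop RTs more than n_sd standard deviations from the cell mean.
        trimmed.extend(t for t in cell if abs(t["rt"] - m) <= n_sd * sd)
    return trimmed


# Hypothetical data for one participant in one condition.
demo_rts = [500, 510, 520, 530, 540, 550, 560, 570, 580, 1400, 200, 1600]
demo = [{"participant": 1, "condition": "HF-Positive", "rt": rt, "correct": True}
        for rt in demo_rts]
demo.append({"participant": 1, "condition": "HF-Positive", "rt": 545, "correct": False})

surviving_rts = sorted(t["rt"] for t in trim_rts(demo))
```

In the demo, the error trial and the 200 ms and 1600 ms responses are removed by the first step, and the 1400 ms response is removed by the second step because it lies more than two standard deviations above the mean of the remaining trials.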

The mean RT data across experimental conditions are presented in **Table 4**. The RT means (with standard error bars) are presented in **Figure 1**. A three-way mixed design ANOVA was performed on the RT data both by participants (F1) and by items (F2). Mood (Control, Positive, Negative) was the between-participant factor; within-participant factors were the word variables of Emotion (Positive, Negative, Neutral) and Frequency (LF, HF). A summary of all RT main effects and interactions is presented in **Table 5**. The mean percent error (%Error) data are also presented and similarly analyzed (see **Tables 4**, **5**, and **Figure 2**). However, as errors only comprised 2.23% of the total data, our focus is on the RT data.
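The difference between the by-participants (F1) and by-items (F2) analyses lies in how trial-level RTs are averaged before the ANOVA is run. The sketch below illustrates only that aggregation step, under assumed data: the helper `cell_means`, the trial tuples, and the example words are hypothetical and are not taken from the stimulus set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trials: (participant, mood, item, emotion, frequency, rt_ms).
trials = [
    ("p1", "Control",  "joy",  "Positive", "HF", 500),
    ("p1", "Control",  "glee", "Positive", "HF", 520),
    ("p2", "Control",  "joy",  "Positive", "HF", 540),
    ("p2", "Control",  "glee", "Positive", "HF", 560),
    ("p3", "Positive", "joy",  "Positive", "HF", 510),
    ("p3", "Positive", "glee", "Positive", "HF", 530),
]
FIELDS = ("participant", "mood", "item", "emotion", "frequency", "rt")


def cell_means(rows, unit, factors):
    """Average RT for each (unit, factor levels...) cell."""
    cells = defaultdict(list)
    for row in rows:
        rec = dict(zip(FIELDS, row))
        key = (rec[unit],) + tuple(rec[f] for f in factors)
        cells[key].append(rec["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# F1: one mean per participant per Emotion x Frequency cell.
# Mood varies between participants, so it is not a within-unit factor here.
by_participants = cell_means(trials, "participant", ("emotion", "frequency"))

# F2: one mean per item per Mood group.
# Every item is seen by all groups, so Mood varies within items.
by_items = cell_means(trials, "item", ("mood",))
```

Because each item contributes a mean to every mood group, Mood is a repeated factor in the items analysis, whereas in the participants analysis each person belongs to a single group.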

# RTs

#### **Main effects**

The between group factor of Mood was not significant by participants, but was significant by items (see **Table 5**). This disparity resulted from the much higher level of variance among participants than items (evidenced in the MSEs). Unlike participants, items were matched across groups. Bonferroni pairwise comparisons in the items analysis showed that participants in the Control mood condition (571 ms) were slower than those in both the Positive (557 ms) and Negative (552 ms) mood conditions [p2s < 0.001], which did not differ from each other [p2s > 0.30].


TABLE 4 | RT and %Error means (with SDs) as a function of Mood (Control, Positive, Negative), Emotion (Positive, Negative, Neutral), and Frequency (LF, HF).

RT in ms; LF, low frequency; HF, high frequency.

The main effect of Emotion was significant (see **Table 5**). Bonferroni pairwise comparisons by participants and items demonstrated reliable differences between all word types, with Positive words (548 ms) responded to faster than both Negative (559 ms) and Neutral (571 ms) words, which also significantly differed from each other [all ps < 0.001].

The main effect of Frequency was also significant (see **Table 5**). Responses to HF words (544 ms) were faster than those to LF words (575 ms).

#### **Interactions**

Two of the interactions were significant: Emotion × Frequency and Mood × Emotion (see **Table 5**). The associated RT means (with standard error bars) for these interactions are presented in **Figures 3**, **4**, respectively. Neither the Mood × Frequency nor the Mood × Emotion × Frequency interaction was significant (see **Table 5**).

For the Emotion × Frequency interaction (see **Figure 3**), participant and item Bonferroni pairwise comparisons examined frequency effects for each type of emotion word and emotion word differences within each level of frequency. Word frequency effects were significant for all types of emotion words [all ps < 0.001]. RTs to HF Positive, Negative, and Neutral words (533, 550, and 550 ms, respectively) were faster than those to their LF counterparts (564, 568, and 592 ms, respectively). For LF words, RTs to Positive (564 ms) and Negative (568 ms) words were faster than those to Neutral words (592 ms) [ps < 0.001]. The LF Positive-Negative contrast was marginal by participants [p1 = 0.099], and not significant by items [p2 > 0.25]. For HF words, a different pattern emerged. RTs to HF Positive words (533 ms) were significantly faster than those to both HF Negative (550 ms) and Neutral (550 ms) words [all ps < 0.001], which did not differ from each other [all ps = 1].

TABLE 5 | Main effects and interactions by participants (F1) and by items (F2) for RT and %Error measures.

MSE, mean squared error.

For the Mood × Emotion interaction (see **Figure 4**), participant and item Bonferroni pairwise comparisons examined mood effects for each type of emotion word as well as emotion word differences within each level of mood. By participants, Control, Positive, and Negative mood groups did not differ significantly in their responses to Positive words (557, 551, and 536 ms, respectively), Negative words (571, 553, and 553 ms, respectively), or Neutral words (583, 566, and 563 ms, respectively) [all p1s > 0.35]. The lack of significance (given apparent differences) is due to the high variability in RTs across participants. Item variability, in contrast, is much lower because items are matched across groups (cf. the main effect of Mood). By items, significant differences did emerge. The Control mood group was significantly slower than the Positive and Negative mood groups in response to Negative words (571 ms vs. 553 and 553 ms, respectively) [p2s < 0.001] and to Neutral words (583 ms vs. 566 and 563 ms, respectively) [p2s < 0.01]. Positive and Negative mood groups did not differ in response to either Negative or Neutral words [p2s = 1]. In partial contrast, both the Control and Positive mood groups were significantly slower than the Negative mood group in response to Positive words (557 and 551 ms vs. 536 ms, respectively) [p2s < 0.001]. The difference between Control and Positive mood groups to Positive words was not significant [p2 > 0.15].

FIGURE 4 | Mean RT (ms), with SE bars, on words as a function of Mood (Control, Positive, Negative) and Emotion (Positive, Negative, Neutral).

Participant and item Bonferroni pairwise comparisons also examined emotion word differences within each level of mood. Within the Control mood group, Positive words (557 ms) were responded to faster than Negative words (571 ms), and both types of words were responded to faster than Neutral words (583 ms) [all ps < 0.001]. A similar pattern emerged for the Negative mood group: Positive words (537 ms) were responded to faster than Negative words (553 ms), and both types of words were responded to faster than Neutral words (563 ms) [all ps < 0.01]. Within the Positive mood group, however, there was no difference between Positive (551 ms) and Negative (553 ms) words [all ps = 1], although both types of emotion words were responded to faster than Neutral words (566 ms) [all ps < 0.001].

# %Error

#### **Main effects**

The between-group effect of Mood was not significant (see **Table 5**). Similar to the RT findings, the within-participant effects of Emotion and Frequency were both significant (see **Table 5**). For Emotion, Bonferroni pairwise comparisons by participants and items demonstrated that more errors were reliably made with Neutral (3.02%) compared to both Positive (1.61%) and Negative (2.06%) words [all ps < 0.001]. Errors to Positive and Negative words differed significantly by participants [p1 < 0.05], but marginally by items [p2 = 0.071]. For Frequency, participants made fewer errors on HF (1.24%) than on LF (3.21%) words.

#### **Interactions**

The only interaction that was significant was Emotion × Frequency (see **Table 5**). Participant and item Bonferroni pairwise comparisons examined frequency effects for each type of emotion word and emotion word differences within each level of frequency. Word frequency effects were significant for all types of emotion words [all ps < 0.001]. The percentage of errors on HF Positive, Negative, and Neutral words (0.97, 1.41, and 1.35%, respectively) was less than that on their LF counterparts (2.24, 2.71, and 4.69%, respectively). For LF words, significantly fewer errors were made on both Positive (2.24%) and Negative (2.71%) words in comparison to Neutral words (4.69%) [all ps < 0.001]. There was no difference between errors on Positive and Negative words [all ps > 0.20]. For HF words, none of the comparisons reached significance. The %Error on Positive words (0.97%) was marginally less than that on Negative words (1.41%) [p1 = 0.062, p2 = 0.086], and no different than that on Neutral words (1.35%) [all ps > 0.10]. Negative and Neutral words did not differ in %Error [all ps = 1].

# Discussion

The current study investigated effects of mood on emotion word recognition. While past studies have demonstrated mood-congruency effects (e.g., Niedenthal et al., 1997; Olafson and Ferraro, 2001; Ferraro et al., 2003), they may be limited by the methodologies that were employed. For example, tight experimental control over lexical variables associated with the stimuli was not always implemented, baseline conditions (i.e., neutral words, no mood induction) were not always used, happy and sad mood-inducing music differed in tempo and arousal, and effects were restricted to discrete emotions (i.e., happy, sad). We attempted to address these concerns. In our study, our between-group factor of mood was induced via positive and negative music equated for intensity of valence and arousal. A no-mood control group was also included. In line with recent emotion word studies, we used an Emotion (Positive, Negative, Neutral) × Frequency (LF, HF) stimulus design. Word stimuli (N = 240) varied systematically in valence and arousal and were explicitly controlled for word frequency and length. In contrast to the prior mood-induction studies, we also attempted to match stimuli as closely as possible for imageability, AoA, and grammatical class, although strict equivalences of these variables were not always achieved (see **Table 2**), which could limit the generalizability of our findings.

We found main effects of Mood (significant only by items due to inter-participant variability), Emotion, and Frequency. Positive and Negative mood groups were faster overall in their responses than the Control (no music) group. This was most likely due to participants' relatively higher levels of arousal produced by the mood-inducing music (see **Table 3**) as well as a possible consequence of the music's tempo (e.g., Husain et al., 2002; Kämpfe et al., 2010; Bottiroli et al., 2014). The Emotion-Frequency results are similar to what we have found in the past (Scott et al., 2009, 2012, 2014). For Emotion, Positive words were responded to faster than Negative words, and both had faster responses than Neutral words. For Frequency, HF words were responded to faster than LF words. The Emotion × Frequency interaction arose from the pattern associated with Negative words: responses to LF Negative words were as fast as Positive words (both faster than Neutral words), whereas responses to HF Negative words were as slow as Neutral words (both slower than Positive words). The relative slowing of responses to negative (vs. positive) stimuli has often been explained by differential effects at different stages of stimulus processing. Two-stage models of emotion word processing, namely Taylor's (1991) mobilization-minimization hypothesis and Pratto and John's (1991) automatic vigilance hypothesis, propose that all emotionally valenced words enjoy an initial facilitation relative to neutral words because of their high arousal, but that negative words are subsequently inhibited due to their low valence and, hence, inherent threat. This would predict a consistent advantage in processing for positive over neutral words, and an advantage for negative over neutral words under some circumstances. Scott et al. (2009) suggested that salience in the form of word frequency may be one such moderating factor.
Various models of this process have been reviewed by Kuperman (2015) who distinguished between the "motivated attention" account, explaining equal speeding of positive and negative words, and the "automatic vigilance" account, which argues for fast attention capture in negative words but slower disengagement, producing a relative advantage for positive over negative words.

The main aim of our study, however, was to investigate the effect of mood on the processing of emotion words. We had expected to find mood-congruency effects within the more general categories of "positive" and "negative." Although we found a significant Mood × Emotion interaction, it did not appear to be the result of mood-congruency effects (see **Figure 4**). Instead, we found that Control and Negative mood conditions behaved similarly, mirroring the main effect of Emotion (with fastest responses to Positive words, followed by Negative, then Neutral words). In the Positive mood condition, the relative advantage for Positive words disappeared: responses to Positive and Negative words did not differ, but both were faster than responses to Neutral words. From these findings, we are left with two patterns of data to explain. First, for the Positive mood group, mood congruency would predict that responses to Positive words should be even faster than those found in a baseline (Control mood) condition. In fact, there was no difference between responses to Positive and Negative words. It is not clear whether this represents a relative slowing down of responses to Positive words or a relative speeding up of responses to Negative words. Second, for the Negative mood group, mood congruency would again predict that responses to Negative words should be speeded in comparison to the Control mood condition. On this view, responses to Negative words should be as fast as or faster than those to Positive words. However, our results showed that the Negative mood group behaved no differently than the Control group, with the exception that the overall response time was speeded.

It has been proposed that internal affective cues can direct our attention, with positive mood focusing attention on the metaphorical forest and negative on the trees (e.g., Easterbrook, 1959; Gasper and Clore, 2002; Fredrickson and Branigan, 2005; Huntsinger, 2013). This is traditionally attributed to a broadening or narrowing of attention to the global or local level, respectively. Within this context, it becomes possible to account for the pattern of our findings. For the Positive mood group, a broadening of attention could diminish the impact of any negative content of words in the second stage of a two-stage processing mechanism, removing the need to inhibit the processing of negative stimuli and eliminating the difference in response time between positive and negative emotional stimuli. A positive mood might act as a buffer against potential threat inherent in negative stimuli (e.g., Das et al., 2012). Under such circumstances, the initial processing advantage (or "mobilization") enjoyed by negative words would not only be maintained, but would be preserved because the subsequent inhibition (or "minimization") stage would not be prompted. In this way, positive mood could eliminate the category distinction of positive-negative valence but leave the high-low arousal dimension intact. For the Negative mood group, a narrowing of attention could enhance distinctions between words within each of the categories of Positive, Negative, and Neutral. Traditionally, emotions have been classified into six subtypes: "happiness," "surprise," "sadness," "anger," "fear," and "disgust" (e.g., Ekman and Friesen, 1971). As such, negative emotions comprise a broader range of subtypes. Moreover, Unkelbach et al. (2008) have suggested that positive information is more densely clustered in semantic space than negative information, and this leads to processing benefits such as speeded access. As a consequence, a negative mood may only serve to enhance the intrinsic diversity of "negative" as a category and, thus, it may lose its potency as a facilitative agent, in particular, for negative words.

In sum, our study sought to investigate the effect of mood on emotion word recognition, notably by employing strict experimental controls over both the mood-inducing music and the word stimuli. Past studies have found mood-congruency effects, but only for the discrete emotions of "happy" and "sad." We tried to extend these findings to the more general categories of "positive" and "negative." Our findings did replicate prior studies in terms of the pattern of Emotion × Frequency effects. However, our Mood × Emotion interaction was not driven by mood-congruency effects. Instead, it seemed that mood-induced attentional effects differentially modulated responses to emotion words when situated within the context of categories defined only by their valence and arousal.

# Acknowledgments

We would like to thank A. M. Murray for her assistance with some of the data collection.

# References

and acceptance of quit-smoking messages. Psychol. Health 27, 116–127. doi: 10.1080/08870446.2011.569888


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Sereno, Scott, Yao, Thaden and O'Donnell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Appendix

Appendix A | Music Stimuli.


<sup>a</sup>Adapted from Eerola and Vuoskoski (2011). <sup>b</sup>Looped.

#### Appendix B | Word Stimuli.


LF, low frequency; HF, high frequency.

# Brief learning induces a memory bias for arousing-negative words: an fMRI study in high and low trait anxious persons

#### *Annuschka S. Eden1,2\*, Vera Dehmelt1,2, Matthias Bischoff3, Pienie Zwitserlood4, Harald Kugel5, Kati Keuper6, Peter Zwanzger7 and Christian Dobel1,8,9\**

*<sup>1</sup> Institute of Biomagnetism and Biosignalanalysis, University Hospital of Münster, Münster, Germany, <sup>2</sup> Institute of Psychology, University of Münster, Münster, Germany, <sup>3</sup> Institute of Sport and Exercise Sciences, University of Münster, Münster, Germany, <sup>4</sup> Department of Psycholinguistics and Cognitive Neurosciences, Institute of Psychology, University of Münster, Münster, Germany, <sup>5</sup> Department of Clinical Radiology, University of Münster, Münster, Germany, <sup>6</sup> University of Hong Kong, Hong Kong, Hong Kong, <sup>7</sup> kbo-Inn-Salzach Clinic, Academic Hospital of Psychiatry, Psychotherapy and Neurology, Wasserburg am Inn, Germany, <sup>8</sup> Department of Psychology, University of Bielefeld, Bielefeld, Germany, <sup>9</sup> Department of Otolaryngology, Jena University Hospital, Jena, Germany*

Persons suffering from anxiety disorders display facilitated processing of arousing and negative stimuli, such as negative words. This memory bias is reflected in better recall and increased amygdala activity in response to such stimuli. However, individual learning histories were not considered in most studies, a concern that we meet here. Thirty-four female persons (half with high-, half with low trait anxiety) participated in a criterion-based associative word-learning paradigm, in which neutral pseudowords were paired with aversive or neutral pictures, which should lead to a valence change for the negatively paired pseudowords. After learning, pseudowords were tested with fMRI to investigate differential brain activation of the amygdala evoked by the newly acquired valence. Explicit and implicit memory was assessed directly after training and in three follow-ups at 4-day intervals. The behavioral results demonstrate that associative word-learning leads to an explicit (but no implicit) memory bias for negatively linked pseudowords, relative to neutral ones, which confirms earlier studies. Bilateral amygdala activation underlines the behavioral effect: Higher trait anxiety is correlated with stronger amygdala activation for negatively linked pseudowords than for neutrally linked ones. Most interestingly, this effect is also present for negatively paired pseudowords that participants could not remember well. Moreover, neutrally paired pseudowords evoked higher amygdala reactivity than completely novel ones in highly anxious persons, which can be taken as evidence for generalization. These findings demonstrate that few word-learning trials generate a memory bias for emotional stimuli, indexed both behaviorally and neurophysiologically. Importantly, the typical memory bias for emotional stimuli and the generalization to neutral ones is larger in high anxious persons.

Keywords: trait anxiety, fMRI, emotions, memory bias, consolidation, statistical word-learning, amygdala

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Tübingen, Germany*

#### *Reviewed by:*

*Christoph Scheepers, University of Glasgow, UK; Hsu-Wen Huang, National Taiwan Normal University, Taiwan*

#### *\*Correspondence:*

*Annuschka S. Eden, Institute of Biomagnetism and Biosignalanalysis, University Hospital of Münster, Malmedyweg 15, 48143 Münster, NRW, Germany annuschka.eden@uni-muenster.de; Christian Dobel, Department of Psychology, University of Bielefeld, P.O. Box 10 01 31, D-33501 Bielefeld, Germany Christian.Dobel@Uni-Bielefeld.de*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 13 February 2015 Accepted: 03 August 2015 Published: 21 August 2015*

#### *Citation:*

*Eden AS, Dehmelt V, Bischoff M, Zwitserlood P, Kugel H, Keuper K, Zwanzger P and Dobel C (2015) Brief learning induces a memory bias for arousing-negative words: an fMRI study in high and low trait anxious persons. Front. Psychol. 6:1226.*

*doi: 10.3389/fpsyg.2015.01226*

# Introduction

Emotionally arousing situations and stimuli are processed preferentially. This has been shown by a vast body of studies, with methods ranging from simple behavioral measures to state-of-the-art imaging techniques (e.g., Junghöfer et al., 2001; Koster et al., 2006; Laeger et al., 2012; Eden et al., 2014; Weierich and Treat, 2015; for a meta-analysis, see Bar-Haim et al., 2007; for a review, see Cisler and Koster, 2010). Preferential processing leads to a memory bias, apparent in enhanced memory for emotional as compared to neutral stimuli. This has been shown for many stimulus types, such as pictures (Bradley et al., 1992; Schupp et al., 2004a; Touryan et al., 2007; Yegiyan and Yonelinas, 2011), faces (Schupp et al., 2004b), scenes (Heuer and Reisberg, 1990), gestures (Flaisch et al., 2011), and words (Kissler et al., 2007, 2009; Herbert et al., 2008; Scott et al., 2009; Laeger et al., 2012; Keuper et al., 2013, 2014; Eden et al., 2014). The memory bias, especially for stimuli that are negative and arousing, seems more prominent in persons suffering from an anxiety disorder (Calvo et al., 1994; Friedman et al., 2000; Dalgleish et al., 2003; Eysenck et al., 2007) or from a subclinically anxious personality (Norton et al., 1988; McCabe, 1999; Russo et al., 2006; Mühlberger et al., 2009; Eden et al., 2014). The latter group exhibits high levels of trait anxiety, does not meet the criteria for an anxiety disorder, but is prone to develop one (e.g., McCabe, 1999; Russo et al., 2006; Mitte, 2008; Waldhauser et al., 2011). Increased preferential processing of negative stimuli, difficult disengagement from such stimuli, attentional avoidance, and their underlying mechanisms explain part of the anxiety disorders' etiology (e.g., Eden et al., 2015; for a review, see Cisler and Koster, 2010).
Thus, for a better understanding of this large group of disorders, and for the improvement of therapeutic treatments, it is crucial to understand the mechanisms that underlie the acquisition and processing of arousing-negative stimuli. To this aim, we compare persons with high and low (subclinical) anxiety in an associative learning paradigm, using behavioral and neuroimaging measures.

Learning and memory have explicit and implicit components that are indexed by different measures and methods. Explicit memory involves conscious recollection of previous experience and information (semantic, episodic, and autobiographic). Memory is typically measured by explicit recall or recognition of learned material. In contrast, memory access remains unaware in implicit memory, as is the case for procedural information used in tie-knotting or bike-riding. Implicit memory reveals itself through priming, measured in tasks such as word-fragment completion (e.g., Eysenck and Byrne, 1994), or valence judgment of recently acquired words – when explicit knowledge about their meaning is absent. Implicit and explicit memories differ, and have different neurobiological correlates (e.g., Starr and Phillips, 1970; Cohen and Squire, 1980; Nissen and Bullemer, 1987; Gabrieli et al., 1995; Rugg et al., 1998). The memory bias for emotional stimuli is more reliable in explicit than in implicit measures (Eysenck and Byrne, 1994; Russo et al., 2006; for a review, see Mitte, 2008). Mitte (2008) showed that (trait) anxiety has an impact on explicit measures such as recall, but not on recognition. The existence of an implicit memory bias for emotional information is still under debate. While some provide support for its existence (e.g., Williams et al., 1988, 1996, 1997; Eysenck and Byrne, 1994), others could not replicate their findings (McCabe, 1999; Russo et al., 1999). Our results might contribute to this controversy.

The memory bias can be detected with neuroimaging measures such as functional magnetic resonance imaging (fMRI), which is sensitive enough to even reveal emotional responses to items that cannot be recalled explicitly (Dannlowski et al., 2007b). As neural correlates of emotion processing, many fMRI studies have focused on limbic structures, particularly the amygdalae (e.g., Davis and Whalen, 2001; Phelps and LeDoux, 2005). It has been repeatedly demonstrated that the amygdala is hyperactive in response to threatening and negative emotional stimuli. This hyperactivity is evident for faces (e.g., Birbaumer et al., 1998; Sheline et al., 2001; Phan et al., 2006; Dannlowski et al., 2007a,b; Evans et al., 2008; Hall et al., 2014), scenes (e.g., Kryklywy et al., 2013; Frank and Sabatinelli, 2014; Radua et al., 2014), and words (e.g., Isenberg et al., 1999; Strange et al., 2000; Tabert et al., 2001; Hamann and Mao, 2002; Cunningham et al., 2004; Herbert et al., 2011; Kanske and Kotz, 2011; Straube et al., 2011; Laeger et al., 2012; Hoffmann et al., 2015), and is assumed to explain part of the pathogenesis and the maintenance of anxiety (disorders; e.g., Etkin and Wager, 2007; Etkin et al., 2010; Shin and Liberzon, 2010; Xu et al., 2013; Binelli et al., 2014; but see Dresler et al., 2013 for an alternative view). Given the ongoing debate about the neurobiological basis of (trait) anxiety, it is crucial to incorporate imaging measurements into the study design, especially when stimuli are presented below threshold, or trained in a shallow learning paradigm that does not guarantee robust explicit access. The latter is the case in our study, in which persons with high or low levels of anxiety should learn to associate meaning with pseudowords after only a few learning instances. Pseudowords were either paired with neutral or with emotionally negative content, and tested with explicit as well as implicit memory measures.
We investigated the memory bias for novel emotional (and neutral) words in persons with subclinically high and low trait anxiety. A criterion-based approach during learning allowed for distinguishing between novel words whose meaning could be accessed explicitly, and words that were learned less well.

Despite the many studies on emotion, our approach allows us to address some important issues. First, we investigate memory bias for neutral stimuli that gain their emotional connotation via associative learning (similar to Eden et al., 2014). These stimuli possess no specific innate or learned valence (in contrast to a picture of an attacking tiger, or of a weapon). With meaningless and association-free pseudowords, we aim at better control of the individual learning histories and the depth of encoding. Second, we use explicit and implicit memory measures to represent these differing aspects of memory. Third, we applied a criterion-based learning approach, ensuring that participants learned the same amount of word-picture pairs, and were presented with equal sets of explicitly learned and less well-learned items in the fMRI measurement, independent of their learning history. Fourth, we included an fMRI measurement into our design to analyze the amygdala's response to the newly acquired and to completely novel pseudowords.

In the following, we describe the rationale and the background of our learning design in more detail. With a statistical learning paradigm (e.g., Saffran et al., 1996; Breitenstein and Knecht, 2002; Saffran, 2002; Breitenstein et al., 2007; Cunillera et al., 2010; Eden et al., 2014; Laeger et al., 2014a), neutral pseudowords were paired with either arousing-negative or neutral pictures. This paradigm implements an increased conjoint probability of two events ("correct pairings") throughout the training, compared to two events with a random contingency ("incorrect pairings"). Participants extract relevant information without receiving feedback and without knowing the underlying learning principle – a shallow type of learning. By repeated presentation of stimulus combinations, highly robust long-term learning is possible. The more repetitions, the stronger the associations between the stimuli, which results in a typical learning curve. Hebbian cell assemblies probably provide the neural bases of these processes (Pulvermüller, 1999). The paradigm has some ecological validity and can be taken as a model for language acquisition in children and adults (Dobel et al., 2009, 2010). It also allows the investigation of explicit and implicit effects of shallow learning (as investigated similarly in our prior study, Eden et al., 2014).

Two groups of participants, one high and one low in trait anxiety, were tested. Various measurements were used to evaluate and analyze potentially biased memories: a valence rating, a cued-recall test and an fMRI assessment. The behavioral tests were presented directly after, 4, 8, and 12 days after learning, to assess effects of consolidation and forgetting. The valence rating entailed a spontaneous evaluation of the pseudowords' (gained) valence. Valence ratings do not require explicit memory and can thus tap into implicit memory, indexing the valence transferred from pictures to formerly neutral pseudowords. We chose the valence rating because it has been effectively applied after only a few learning instances, when explicit measures are not yet sensitive to learning (e.g., Steinberg et al., 2012; Eden et al., 2014). The cued-recall test measured explicit knowledge about the acquired meaning of the pseudowords. It is comparable to a vocabulary test.

In Eden et al. (2014), we used a very similar design to investigate learning of emotional words in persons with high anxiety. We obtained very little evidence for a memory bias during learning, but explicit and implicit measures revealed a bias after learning. High-anxious persons displayed a stronger memory bias than low-anxious individuals. In fact, they even judged neutrally paired words as negative when their meaning could not explicitly be recalled. We took this effect as evidence for generalization (Eden et al., 2014). Given these findings and the current literature, we expected for the current study enhanced memory effects (i.e., memory bias) for aversive stimulus material immediately after learning and a stabilization of this effect after a time delay allowing consolidation. In line with Mitte (2008), we predicted a stronger bias in explicit cued-recall and fMRI measurements than in (implicit) valence ratings. We also expected a more pronounced bias in high-anxious individuals, who should also show a generalization effect, with increased emotional valence for neutrally paired pseudowords after training. We included completely novel pseudowords for particular fMRI contrasts (see below).

In sum, we instantiated shallow learning via the combination of pseudowords with arousing-negative or neutral picture content, which should nevertheless lead to a memory bias. In addition to behavioral effects, we focused on neurophysiological consequences of learning, as indexed by amygdalar activity. For this, we contrasted the following conditions:


In a second step, we tested whether effects are mediated by trait anxiety. This was done by entering trait anxiety as a covariate.

# Materials and Methods

# Ethics Statement

All procedures were cleared by the ethical review board of the Ärztekammer Westfalen-Lippe and subjects gave informed consent to participate. All clinical investigation was conducted according to the principles of the Declaration of Helsinki.

# Participants

The Spielberger State Trait Anxiety Inventory (Spielberger et al., 1983) was completed via online survey by 310 non-clinical participants. On the basis of individual scores, 17 participants scoring thirty or below in the trait-anxiety inventory (range: 20–80) were assigned to the low-anxiety group (mean trait score = 27.76, SD = 4.15; mean age 26.53, SD = 6.21). Another 17 subjects scoring fifty or above were assigned to the high-anxiety group (mean trait score = 57.18, SD = 4.33; mean age 26.12, SD = 5.60). Both groups consisted only of women, who were matched for age and years of schooling. All participants were native speakers of German, right-handed (as assessed by the Edinburgh Handedness Inventory, Oldfield, 1971) and exhibited no current axis I disorders, as diagnosed by the Mini-International Neuropsychiatric Interview (M.I.N.I., Sheehan et al., 1998). None took part in our earlier study (Eden et al., 2014).
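
The cutoff rule described above can be written down in a few lines. This is only an illustrative sketch: the function name and the example scores are hypothetical, and the actual selection additionally involved matching for age and schooling, which is not modeled here.

```python
from typing import Optional

LOW_CUTOFF = 30   # "thirty or below" -> candidate for the low-anxiety group
HIGH_CUTOFF = 50  # "fifty or above" -> candidate for the high-anxiety group


def assign_group(trait_score: int) -> Optional[str]:
    """Return 'low', 'high', or None for a STAI trait score (valid range 20-80)."""
    if not 20 <= trait_score <= 80:
        raise ValueError("STAI trait scores range from 20 to 80")
    if trait_score <= LOW_CUTOFF:
        return "low"
    if trait_score >= HIGH_CUTOFF:
        return "high"
    return None  # mid-range scorers belong to neither group


# Hypothetical screening scores, for illustration only (not study data).
example_scores = [24, 30, 31, 49, 50, 63]
groups = [assign_group(score) for score in example_scores]
```

Scores strictly between the two cutoffs yield `None`, reflecting that only the extremes of the screened sample were invited.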

# Materials

Forty-four pseudowords (e.g., "muxo," "alep") served as learning materials, presented visually during learning and testing. (Stimulus material and result files are available from the corresponding author.) All pseudowords were disyllabic and phonotactically legal (in German). They were taken from Breitenstein and Knecht (2002; Breitenstein et al., 2007), who tested the stimuli for emotional neutrality and low similarity to existing German words. The selected 44 pseudowords were randomly assigned to 44 pictures depicting concrete objects. Half displayed neutral objects such as a bucket or a chair, and the other half showed arousing-negative objects such as a gun or a shark. Pictures were color photos taken from Hemera Photo Objects, Wikimedia Commons<sup>1</sup> and the International Affective Picture System (Lang et al., 1999). Some pictures were cropped to ensure that only one object was visible and positioned in the center. (See Appendix in Supplementary Material for a list of all pseudowords and matched concepts. The picture material can be requested from the author.)

A pre-test assessed neutral or arousing-negative appraisal of the pictures. Thirty participants (psychology students from the University of Münster) were presented with 100 pictures (50 subjectively judged to be negative and arousing, 50 subjectively judged neutral, non-arousing). Subjects rated valence and arousal of all pictures via Self-Assessment Manikin (SAM)-scales (Bradley and Lang, 1994), ranging from one (very pleasant or low arousal) to nine (very unpleasant or high arousal). The 22 most negatively rated pictures (valence: Mean = 7.85, SD = 0.30; arousal: Mean = 5.77, SD = 0.38) differed significantly from the 22 most neutrally rated (valence: Mean = 4.76, SD = 0.21; arousal: Mean = 1.90, SD = 0.21) pictures [valence: *t*(42) = 38.908, *p <* 0.001; arousal: *t*(42) = 42.149, *p <* 0.001]. These 44 pictures served as materials in the experiment. According to the German version of CELEX-Database (Baayen et al., 1995), the frequency of object names did not differ between arousing-negative and neutral concepts, *t*(42) = –0.032, *p* = 0.975. Participants who performed the pre-test rating did not take part in the main experiment.
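
As a plausibility check, the reported group comparison can be approximated from the summary statistics alone. The sketch below assumes a pooled-variance (Student) *t*-test for two independent sets of 22 pictures, which is consistent with the reported 42 degrees of freedom but is not stated explicitly in the text. Because the means and SDs are rounded, the results are close to, but not identical with, the reported *t* values.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Rounded means and SDs as reported for the 22 negative and 22 neutral pictures.
t_valence, df_valence = pooled_t(7.85, 0.30, 22, 4.76, 0.21, 22)
t_arousal, df_arousal = pooled_t(5.77, 0.38, 22, 1.90, 0.21, 22)

print(f"valence: t({df_valence}) = {t_valence:.2f}")
print(f"arousal: t({df_arousal}) = {t_arousal:.2f}")
```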

#### Design and Procedure, Analysis

During training, the subject's task was to decide intuitively by button-press whether a visually presented pseudoword and object (color picture) matched. Training stopped as soon as the participant reached criterion, that is, a predefined level of knowledge concerning the pseudoword-picture associations (this allowed for a balanced block procedure in the fMRI task; see below). Participants were not informed about the upcoming recall and valence tests and received no feedback on their responses during training. The training consisted of at least six learning passes. During each learning pass, each pseudoword appeared in one matching "correct" and one mismatching "incorrect" pseudoword-picture pair, separated by at least one other pair. Hence, after learning pass eight, for instance, participants had seen each pseudoword sixteen times, eight times paired correctly (the same pseudoword-object combination) and eight times paired incorrectly (the pseudoword paired with eight different other objects). Note that all pseudowords used in "incorrect" pairings were "correctly" paired with other pictures. Thus, all presented pseudowords could be associated with meaning, and all pseudowords and pictures appeared equally often. There were 88 pseudoword-picture pairs per learning pass (22 correct arousing-negative, 22 correct neutral, 22 incorrect arousing-negative, 22 incorrect neutral).

<sup>1</sup>http://commons.wikimedia.org
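
The structure of a learning pass can be made concrete with a small sketch. It is not the original Presentation script: the function name, the cyclic-shift rule for choosing mismatching pictures, and the rejection-sampling shuffle are assumptions chosen only to satisfy the constraints stated above (one correct and one incorrect pair per pseudoword, a different wrong picture in every pass, every picture shown equally often, and no pseudoword on two adjacent trials). Valence is not modeled.

```python
import random


def build_learning_pass(n_items: int, pass_index: int, rng: random.Random):
    """Return shuffled (pseudoword_id, picture_id, is_correct) trials for one pass."""
    if not 1 <= pass_index < n_items:
        raise ValueError("pass_index must be between 1 and n_items - 1")
    correct = [(i, i, True) for i in range(n_items)]
    # Cyclic shift: a different wrong picture per pass, each picture used exactly once.
    incorrect = [(i, (i + pass_index) % n_items, False) for i in range(n_items)]
    trials = correct + incorrect
    while True:
        rng.shuffle(trials)
        # Reject orders in which the same pseudoword occurs on adjacent trials.
        if all(a[0] != b[0] for a, b in zip(trials, trials[1:])):
            return trials


rng = random.Random(0)  # fixed seed, so the example is reproducible
passes = [build_learning_pass(44, k, rng) for k in range(1, 9)]
```

After the eight passes built here, each pseudoword has occurred sixteen times: eight times with its own picture and once with each of eight different other pictures. This is the increased conjoint probability that the statistical learning paradigm relies on.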

Training ended when 11 arousing-negative and 11 neutral pseudoword-picture pairs had each been accurately identified (hit or correct rejection) eight times; it continued until this criterion was reached. The criterion approach ensured equal learning for all participants. Participants completed 8.26 learning passes on average (range: 7–12). The choice of 11 pairs per valence condition and eight correct answers ensured that enough pairings were learned well enough to be detectable in explicit/implicit behavioral and imaging measures. On the other hand, we wanted to avoid a ceiling effect, to be able to investigate pseudowords whose meaning was not learned to criterion. Thus, the total number of trials was not fixed and depended on the learners' pace of learning. Left/right assignment of "correct" and "incorrect" answers to response buttons was counterbalanced across participants. All participants finished within an hour (44 min on average). Presentation font size was 48, on a 15-inch monitor. All stimuli were presented centered, in white against a black background. All trials began with a fixation cross (500 ms), followed by a pseudoword (1000 ms). Another fixation cross (300 ms) and a picture (1000 ms) followed. After 3000 ms, a red exclamation mark ended the trial, providing sufficient time for the subjects to decide whether pseudoword and pictured object matched. If no answer was given, the next trial was initiated. If a button was pressed within the 3000 ms interval, the next trial began immediately. The training and the fMRI paradigm for this study (described below) were programmed with Presentation<sup>®</sup> software<sup>2</sup> (Version 12.1, Neurobehavioral Systems, Inc., Albany, CA, USA).
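
The stopping rule can be sketched as a simple check run after each learning pass. The sketch assumes that accurate responses are tallied per pseudoword and that an item counts as learned once its tally reaches eight; the text does not specify the bookkeeping in this detail, and all names and toy tallies are hypothetical.

```python
from collections import Counter


def criterion_reached(correct_counts, valence_of, n_pairs=11, n_correct=8):
    """True once n_pairs items per valence each have at least n_correct accurate responses."""
    learned = Counter(
        valence_of[item]
        for item, count in correct_counts.items()
        if count >= n_correct
    )
    return all(learned[valence] >= n_pairs for valence in ("negative", "neutral"))


# Toy bookkeeping: items 0-21 are arousing-negative, items 22-43 are neutral.
valence_of = {i: ("negative" if i < 22 else "neutral") for i in range(44)}
counts = Counter({i: 8 for i in list(range(11)) + list(range(22, 33))})

done = criterion_reached(counts, valence_of)      # 11 + 11 items at eight responses
counts[0] = 7                                     # one negative item falls short
not_done = criterion_reached(counts, valence_of)  # only 10 negative items remain
```

Because the criterion is evaluated per valence condition, a participant who learns neutral items quickly still continues training until enough arousing-negative items have been learned as well.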

Explicit knowledge of all picture-pseudoword pairings was assessed via cued-recall. Subjects were presented with the pseudowords in written format (cues) and were asked to write down the corresponding German word (comparable with a translation or vocabulary test). The cued-recall test was administered four times: directly after, 4 days after, 8 days after, and 12 days after learning. A pseudoword-valence rating assessed the transfer of valence from objects to pseudowords. Subjects were asked to spontaneously and intuitively rate the pseudowords in terms of valence, on a scale ranging from minus five (very negative) to five (very positive), with zero marked as neutral. The valence rating was administered five times: directly before, directly after, 4 days after, 8 days after, and 12 days after learning. Note that the last three cued-recall tests and the last three valence ratings were carried out online<sup>3</sup>, while all assessments on the first day took place in the Institute for Biomagnetism and Biosignalanalysis (Faculty of Medicine, University of Münster). Participants received written instructions but were not informed that their memory for the pseudowords would be tested.

<sup>2</sup>www.neurobs.com

The fMRI measurement took place in the Department of Clinical Radiology (Faculty of Medicine, University of Münster) 2 days after the training, allowing for memory consolidation through sleep (Payne and Kensinger, 2010, 2011; Bennion et al., 2013). The fMRI measurement used a block design with six blocks: four blocks crossing valence (arousing-negative, neutral) with learning achievement (explicitly learned, less well-learned), and two additional blocks with completely novel pseudowords. Since learning outcome varied between participants (i.e., which picture-word pairs a participant had learned), the respective blocks were individually arranged for every participant. For this purpose, the pseudowords were divided into the eleven best, i.e., "explicitly learned" and the eleven remaining "less well-learned" pseudowords, individually for each valence condition. This was done on the basis of each participant's results from the cued-recall test directly after training. The novel pseudowords, taken from the corpus of Breitenstein and Knecht (2002), had not been used during training. Each block consisted of 11 pseudowords. The presentation format was the same as during training. Each pseudoword was presented for 950 ms, with a fixed interstimulus interval of 150 ms. The six blocks (explicitly learned arousing-negative pseudowords, explicitly learned neutral pseudowords, less well-learned arousing-negative pseudowords, less well-learned neutral pseudowords, and two blocks of novel pseudowords) were presented in a pseudorandomized order, to control for sequence effects. A 12500 ms resting phase (white fixation cross centered on a black screen) followed each block. Each block was presented twice, resulting in 26 instances in each learning condition, and 52 instances of novel pseudowords. In all, the paradigm took approximately 9 min. The stimuli were projected onto a screen at the rear end of the MR tunnel, using a projector shielded against RF interference.
Participants were instructed to read the words attentively. No further instruction was given.

# Image Acquisition

Magnetic resonance imaging was performed on a 3 T whole-body scanner (Gyroscan Intera T3.0, Philips Medical Systems, Best, Netherlands) equipped with Quasar Dual gradients (nominal gradient strength in the setting used for fMRI 40 mT/m, maximal slew rate 200 mT/m/ms). For spin excitation and resonance signal acquisition, a circularly polarized transmit/receive birdcage head coil with an HF reflecting screen at the cranial end was used. T2\* functional data were acquired using a single-shot echo planar (EPI) sequence (whole brain coverage, TE = 30 ms, TR = 2.5 s, FA = 90°, slice thickness 3.6 mm, interleaved acquisition order, no gap, matrix 64 × 64, FOV 230 mm, in-plane resolution 3.6 mm × 3.6 mm). The 40 transversal slices were tilted 25° from the AC/PC line in order to minimize drop-out artifacts in the orbitofrontal and mediotemporal region.

## Cued-Recall Analyses

Answers in the cued-recall test (translation of pseudowords into German) were treated as correct if they described the intended object (e.g., sofa), were synonyms (e.g., couch), or subordinate-category responses that were correct descriptions of the depicted object (e.g., chesterfield). Responses were regarded incorrect if they described the superordinate category (e.g., furniture), semantically related objects (e.g., armchair), or unrelated objects (e.g., scissors). Incorrect answers and misses (no answer given) were excluded from further analyses. Means of correctly translated pseudowords were subjected to an ANOVA with the additional factor *session*, with four levels (immediately after, 4, 8, and 12 days after). This resulted in a 2 (*pseudoword affect*: arousing-negative versus neutral) × 4 (*session*) × 2 (*trait anxiety*: high versus low) mixed within/between design.

# Valence-Rating Analyses

The factor *session* had five levels in the analysis of pseudoword valence ratings: before, immediately after, 4, 8, and 12 days after training. Mean valence ratings were calculated for arousing-negatively and neutrally linked pseudowords. With a 2 (*pseudoword affect*) × 5 (*session*) × 2 (*trait anxiety*) mixed within/between design, the development of valence ratings was investigated over time.

### Image Analysis

The imaging data were analyzed with the Statistical Parametric Mapping software<sup>4</sup> (SPM8, Wellcome Department of Cognitive Neurology, London, UK) implemented in Matlab (Mathworks Inc., Natick, MA, USA). Preprocessing included unwarping, realignment and normalization to the standard MNI space (Montreal Neurological Institute). Smoothing was conducted with an isotropic three-dimensional Gaussian kernel of 6 mm full width at half maximum (FWHM). Afterward, on the first level, we applied a general linear model to the data (modeled with the canonical hemodynamic response function). The conditions were: explicitly learned arousing-negative, explicitly learned neutral, less well-learned arousing-negative, less well-learned neutral, and novel. The eight contrasts of interest were: (1) Explicitly learned pseudowords vs. novel pseudowords, (2) Less well-learned pseudowords vs. novel pseudowords, (3) Arousing-negative pseudowords vs. neutral pseudowords, (4) Explicitly learned arousing-negative pseudowords vs. less well-learned arousing-negative pseudowords, (5) Less well-learned arousing-negative pseudowords vs. less well-learned neutral pseudowords, (6) Less well-learned neutral pseudowords vs. novel pseudowords, (7) Less well-learned arousing-negative pseudowords vs. novel pseudowords, (8) Neutral pseudowords vs. novel pseudowords. These contrasts were also analyzed in a second analysis, taking into account the individual trait anxiety score as a covariate (regression analysis). Please note that due to the criterion-based design and the differentiation into explicitly learned and less well-learned words, there are to the best of our knowledge no (fMRI) studies with a comparable orientation. Thus, the contrasts investigating such differences (i.e., contrasts 2, 5, and 6) are more exploratory in nature.

<sup>3</sup>http://www.limesurvey.org/

<sup>4</sup>www.fil.ion.ucl.ac.uk/spm
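
One way to express the eight contrasts is as weight vectors over the five modeled conditions. The weights below are an assumption for illustration: the text names the comparisons but not the numerical weights, and the condition labels and example beta values are hypothetical. Each vector sums to zero, and conditions pooled on one side of a comparison are averaged.

```python
# Order of the five first-level conditions (labels are hypothetical).
CONDITIONS = (
    "explicit_negative",  # explicitly learned, arousing-negative
    "explicit_neutral",   # explicitly learned, neutral
    "less_negative",      # less well-learned, arousing-negative
    "less_neutral",       # less well-learned, neutral
    "novel",              # completely novel pseudowords
)

# Assumed contrast weights, keyed by the contrast numbers used in the text.
CONTRASTS = {
    1: (0.5, 0.5, 0.0, 0.0, -1.0),   # explicitly learned vs. novel
    2: (0.0, 0.0, 0.5, 0.5, -1.0),   # less well-learned vs. novel
    3: (0.5, -0.5, 0.5, -0.5, 0.0),  # arousing-negative vs. neutral
    4: (1.0, 0.0, -1.0, 0.0, 0.0),   # explicit negative vs. less well-learned negative
    5: (0.0, 0.0, 1.0, -1.0, 0.0),   # less well-learned negative vs. less well-learned neutral
    6: (0.0, 0.0, 0.0, 1.0, -1.0),   # less well-learned neutral vs. novel
    7: (0.0, 0.0, 1.0, 0.0, -1.0),   # less well-learned negative vs. novel
    8: (0.0, 0.5, 0.0, 0.5, -1.0),   # neutral (both learning levels) vs. novel
}


def contrast_value(betas, weights):
    """Weighted sum of condition estimates for one contrast."""
    if len(betas) != len(weights):
        raise ValueError("betas and weights must have the same length")
    return sum(beta * weight for beta, weight in zip(betas, weights))


# Hypothetical condition estimates for one voxel, for illustration only.
example_betas = (1.2, 0.8, 1.0, 0.6, 0.7)
value_c5 = contrast_value(example_betas, CONTRASTS[5])
```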

To control for multiple testing in the second-level (group) random-effects analysis, all group results were calculated using a combined height and extent threshold based on Monte Carlo simulations, as implemented in the AlphaSim program (Forman et al., 1995). Based on this technique, we maintained a corrected false-positive detection rate for the amygdala, our region of interest (ROI), at *p <* 0.05, with a cluster extent (k) empirically determined by computing 1000 simulations (yielding *k* = 45 for the bilateral amygdala).

According to our hypotheses, ROI analyses of the bilateral amygdala were performed for all contrasts by one-sample *t*-tests, including all individual contrast maps of the first level. For this purpose, a mask for the bilateral amygdala was created with the aid of the WFU PickAtlas (Maldjian et al., 2003) implemented in the SPM software. The mask, defined according to the AAL Atlas (Tzourio-Mazoyer et al., 2002), was dilated by 1 mm in radius. The regression analysis tested our *a priori* hypothesis concerning the relation between amygdala activity and degree of trait anxiety in the contrasts defined above. Voxelwise tests inside the ROI were performed and activity within the amygdalae was correlated with STAI-T (trait anxiety) scores separately for each subject.
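
The logic of relating a contrast estimate to trait anxiety can be illustrated with a plain correlation. This is a simplified stand-in for the voxelwise SPM regression, not a reimplementation of it: the function names and the toy values are hypothetical. The conversion from *r* to *t* uses *n* − 2 degrees of freedom, which for the 34 participants gives the 32 degrees of freedom reported for most of the regression results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Convert a correlation from n observations into a t value with n - 2 df."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df


# Toy values for illustration only (not study data).
toy_stai = [25, 28, 30, 52, 57, 61]
toy_contrast = [0.1, 0.0, 0.2, 0.3, 0.5, 0.4]
r_toy = pearson_r(toy_stai, toy_contrast)

# A correlation of 0.4 in a sample of 34 would correspond to t(32) of about 2.47.
t_example, df_example = t_from_r(0.4, 34)
```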

# Results

# Cued-Recall

**Figure 1** displays the recall rates (correct translation) immediately after, 4, 8, and 12 days after training, for both participant groups and pseudoword affects. Note that performance is displayed in percentage correct, while statistical analyses were done on absolute values (maximum = 44). As expected, the brief and shallow training yielded moderate recall results. These were highest in the first session immediately after training (about 25% correct translations), decreased to around 15% correct translations by the second session 4 days later, but remained stable in session three and four (8 and 12 days after learning). The main effect of *session* was significant: *F*(3,96) = 50.655; *p <* 0.001, and is best explained by a linear effect: *F*(1,32) = 56.493; *p <* 0.001. Overall, pseudowords linked with arousing-negative pictures were recalled significantly better than neutrally linked ones, which is reflected in a main effect for *pseudoword affect*, *F*(1,32) = 4.347; *p* = 0.045. The three-way interaction between *pseudoword affect*, *session*, and *trait anxiety* also reached significance *F*(3,96) = 6.523; *p* = 0.013.

Additional ANOVAs (*word affect* × *session*), separately for each group, further investigated this interaction. The data of the low-anxious group showed a main effect of *session F*(3,48) = 22.690; *p <* 0.001, and the interaction *pseudoword affect* × *session*, *F*(3,48) = 8.843; *p* = 0.005. The main effect of *pseudoword affect* did not reach significance *F*(1,16) = 1.652; *p* = 0.217. *Post hoc t*-tests calculated to assess the interaction yielded a significant difference between arousing-negative (Mean = 11.06, SD = 5.910) and neutral pseudowords (Mean = 9.29, SD = 5.565) at session one [*t*(16) = 2.624; *p* = 0.018]. No other session showed such a difference (all *p >* 1.444). The data for the high-anxious group also yielded a main effect of *session*: *F*(3,48) = 28.448, *p <* 0.001. No other main effects or interactions reached significance [*word affect*: *F*(1,16) = 2.698; *p* = 0.120 and *word affect* × *session*: *F*(3,48) = 0.255; *p* = 0.857].

### Pseudoword Valence Rating

**Figure 2** displays the mean valence ratings separately for participant groups and pseudoword affect, in all five sessions. The ANOVA with *session*, *pseudoword affect*, and *trait anxiety* yielded no main effect of *pseudoword affect*, *F*(1,32) = 1.872; *p* = 0.506. Ratings became significantly more negative over time, indicated by a main effect of *session F*(4,128) = 10.335; *p <* 0.001, best described as a linear trend, *F*(1,32) = 7.823; *p* = 0.009. Although the means suggest a difference between negative and neutral pseudowords for the high-anxiety group, no other main effects or interactions reached significance. (Note: An ANOVA on the same valence data where ratings of the first session were subtracted from ratings given at the second and third session (baseline correction) yielded qualitatively the same results.)

# fMRI Results

## Region of Interest Analysis Regarding Amygdala Responsiveness to Pseudowords

With contrasts one and two, we investigated a general learning effect. As expected, in contrast 1, explicitly learned pseudowords elicited more amygdala activity than novel pseudowords, bilaterally *x* = 31, *y* = −10, *z* = −14, *t*(33) = 2.26, *k* = 56 voxels, *p* = 0.015 corrected. Contrast 2 was not significant, showing similar amygdala activity for less well-learned pseudowords and novel pseudowords. Contrast 3 tested for effects of pseudoword affect, independent of learning success. As expected, arousing-negative pseudowords elicited more amygdala activity than neutrally linked pseudowords *x* = −22, *y* = −8, *z* = −9, *t*(33) = 3.19, *k* = 108, *p* = 0.002 corrected. Contrast 4 tested for effects of explicit learning. There was no difference in amygdala activity between explicitly learned and less well-learned arousing-negative pseudowords. With contrasts five to seven, effects of acquired affect were investigated for pseudowords that are not well learned or remembered. As expected (and in line with contrast 3), contrast 5 showed that arousing-negative pseudowords elicited more bilateral amygdala reactivity than neutral pseudowords *x* = −30, *y* = 4, *z* = −14, *t*(33) = 2.45, *k* = 71, *p* = 0.010. However, contrast 6 (less well-learned neutral pseudowords vs. novel pseudowords), contrast 7 (less well-learned arousing-negative pseudowords vs. novel pseudowords) and contrast 8 (neutral pseudowords vs. novel pseudowords) did not yield significant results. (See **Table 1** for an overview of all results above.)

[Figure caption fragment: pseudowords shown in dashed bars; error bars represent 1 SE.]


TABLE 1 | Region of interest analysis regarding amygdala responsiveness to pseudowords.

*Conducted at p < 0.05, uncorrected (corrected at p < 0.05 on the cluster level using the AlphaSim procedure, which resulted in an empirically determined cluster-extent threshold of k* = *45 voxels). Coordinates are given in MNI space.*

#### Trait Anxiety as Covariate

The ROI analysis of the bilateral amygdala with trait anxiety as covariate revealed a significant contrast 1 (explicitly learned pseudowords vs. novel pseudowords, see **Figure 3**). Explicitly learned pseudowords elicited more amygdala activity than novel pseudowords in the bilateral amygdala, and this effect was positively related to measures of trait anxiety *x* = 32, *y* = 5, *z* = −21, *t*(32) = 2.62, *k* = 111 voxels, *p* = 0.007. Importantly, and contrary to the analysis without covariate, contrast 2 (less well-learned pseudowords vs. novel pseudowords) also yielded significant results *x* = 31, *y* = 2, *z* = −19, *t*(32) = 2.42, *k* = 77 voxels, *p* = 0.011, positively related to trait anxiety. Contrast 3 (arousing-negative pseudowords vs. neutral pseudowords) was also significant *x* = 34, *y* = −7, *z* = −11, *t*(32) = 2.74, *k* = 55 voxels, *p* = 0.005. Hence, arousing-negative pseudowords elicited more amygdala activity than neutral pseudowords, and this was positively related to trait anxiety. As in the analysis without covariate, contrast 4 (explicitly learned arousing pseudowords vs. less well-learned arousing pseudowords) was not significant. Contrast 5, comparing arousing-negative and neutral pseudowords that were not learned well, was significant *x* = 32, *y* = −7, *z* = −9, *t*(32) = 2.16, *k* = 47 voxels, *p* = 0.019. The arousing-negative pseudowords elicited more amygdala reactivity than the neutral pseudowords, again positively related to measures of trait anxiety. Contrast 6 (less well-learned neutral pseudowords vs. novel pseudowords), however, was not significant. But contrast 7 (less well-learned arousing-negative pseudowords vs. novel pseudowords) and contrast 8 (neutral pseudowords vs. novel pseudowords) were highly significant (contrast 7: *x* = 31, *y* = 0, *z* = −23, *t*(33) = 3.25, *k* = 207 voxels, *p <* 0.001; contrast 8: *x* = 31, *y* = 4, *z* = −19, *t*(32) = 3.39, *k* = 116 voxels, *p <* 0.001).

Hence, pseudowords that were less well learned elicited more amygdala reactivity than novel pseudowords, independent of the linked affect. This was positively related to measures of trait anxiety. See **Table 2** for an overview of all regression analysis results and **Figure 3** for a visualization of the spatial extension of amygdala activation of contrast 1.

Note that the significant contrasts showed clusters within the amygdala VOIs after correction for multiple testing (see **Tables 1** and **2**). Subnuclei of the amygdala could not be distinguished by our methods.

# Discussion

We analyzed the development of a memory bias for novel and neutral stimuli before and after a learning phase, during which the pseudowords were paired with pictures with arousing-negative or neutral content. Persons with high and low trait anxiety participated. Our results demonstrate that a brief training in an associative learning paradigm suffices to elicit a memory bias for those pseudowords that were combined with arousing, negative pictures, compared to those that were linked to neutral pictures. This bias became evident in behavioral as well as in fMRI measures. A cued-recall translation test, administered directly after training and 4, 8, and 12 days later, showed better recall for pseudowords that had been combined with arousing-negative content. However, the valence ratings showed no differences between arousing-negative and neutral pseudowords. The fMRI measurement that took place 2 days after learning showed a hyperactivation of the amygdala in response to the arousing-negative pseudowords, indicating that these stimuli were processed differently from neutrally linked pseudowords.


As expected, very few (i.e., six to nine) correct and incorrect pairings in an associative learning paradigm resulted in a memory bias for pseudowords that were paired with arousing-negative pictures. This, once again, shows the effectiveness of this paradigm for word learning (Dobel et al., 2009, 2010; Eden et al., 2014). After learning, participants showed a memory bias in the cued-recall test. Thus, all participants, independent of their level of trait anxiety, were better able to translate pseudowords that had been combined with arousing-negative content than pseudowords linked with neutral content. This replicates earlier findings, where participants were better able to memorize stimuli with aversive emotional content (e.g., Mitte, 2008). Here, as in our earlier work (Eden et al., 2014), we observe this advantage even after a very brief associative training. However, the same effect was not found in the (implicit) valence rating. Participants rated arousing-negative pseudowords more negatively than neutral ones, but this difference did not reach significance (see **Figure 2**). Thus, participants showed an explicit but no implicit memory
bias. This is in line with the results of a meta-analysis by Mitte (2008), who indeed showed that implicit memory effects are seldom found. Some authors even question the existence of an implicit memory bias (e.g., McCabe, 1999; Russo et al., 1999). One explanation for why explicit tests yield the memory bias but implicit ones do not could be as follows. According to Scott et al. (2009), valence features are part of the semantic representation of words. Hence it can be assumed that the activation of such features is required in explicit tasks such as cued-recall. In this task, participants had to perform a one-to-one mapping of a pseudoword to an existing German word. In contrast, an implicit task such as the valence rating applied here does not require the activation of German words, with their semantic and valence features. This may explain why implicit memory bias effects for words are so much harder to detect than explicit ones. Note that we did obtain implicit bias effects in our earlier study (Eden et al., 2014), which had more power in terms of items and participants. The missing effect in the valence rating might at first glance invite speculation about the processing depth of learned words and seem to suggest shallow encoding. However, results of former studies from our group that used very similar associative word-learning paradigms (e.g., Breitenstein et al., 2007; Dobel et al., 2009, 2010; Liuzzi et al., 2010) strongly suggest that meaning is indeed acquired for the pseudowords. Breitenstein et al. (2007) showed with crossmodal priming that learned pseudowords primed existing words related to their acquired meaning as effectively as native-language words. This was corroborated by Dobel et al. (2010), who applied magnetoencephalography (MEG).
They showed that the N400 component [an indicator for semantic (mis)matches between word and picture] to pictures was strongly reduced when they were preceded by pseudowords whose acquired meaning corresponded to the pictured concept. Based on these and other findings, we feel confident that the learning effects found in the present study truly reflect semantic/emotional learning, and that the pseudowords do not simply present superficial mnemonic cues to existing words and/or corresponding pictures.

We now turn to the fMRI data, for which we carried out two analyses: the first analysis served to investigate the general effect of memory bias, and the second to additionally assess the influence of trait anxiety. With respect to the first analysis, we observed general effects of learning. Explicitly learned pseudowords elicited more amygdala activity than novel pseudowords, showing that only a few repetitions of pseudoword-picture pairs sufficed to activate the amygdala, even 2 days after learning. This amygdalar hyperactivation was seen for explicitly learned pseudowords only, not for the set of "not so well learned" pseudowords (the worst 11 for each participant). This suggests that they either were not learned at all, or that their superficial learning history leads to different processing or storage, evident in explicit recall and amygdala reactivity. The third contrast investigated pseudoword affect, more precisely whether arousing-negative pseudowords generally elicited more amygdala activity than neutral ones. Importantly, this was the case when explicitly learned and less well-learned pseudowords were considered together. This effect corroborates the memory bias found in prior studies (for a review, see Mitte, 2008) and the main effect in the cued-recall data of the current study. Of interest is whether this neural correlate for a memory bias is driven solely by the explicitly recalled items. This, as contrast 4 showed, was not the case: explicitly learned and less well-learned arousing-negative pseudowords elicited comparable amygdala reactivity. This clearly goes against the suggestion that nothing is learned when stimuli cannot be recalled explicitly. Rather, aversive stimuli have similar amygdalar effects, independent of their level of explicit recall. To explore this further, we contrasted less well-learned arousing-negative and neutral pseudowords (contrast 5), and observed more amygdala reactivity for the arousing-negative stimuli. This corroborates the above finding that affect is indeed acquired for these pseudowords, even though participants could not explicitly translate these words very well. These findings corroborate the observed amygdala sensitivity for emotionally arousing stimuli, even in the absence of explicit memory (e.g., Dannlowski et al., 2007a,b; Pichon et al., 2012; Suslow et al., 2013). Note, however, that the overall amygdala activation of these less well-learned pseudowords did not differ from the activation for completely novel pseudowords. This contrast was thus only significant for explicitly learned stimuli. There is ample evidence that completely novel stimuli are processed differently from items that have been seen before. Repetition decreases amygdala activity, and only explicitly learned and recalled stimuli overcome this repetition suppression (e.g., Ishai et al., 2004; Wendt et al., 2011). Finally, contrast 8 was implemented to investigate effects of generalization, comparing all neutrally linked pseudowords (explicitly learned and less well-learned) with all novel pseudowords. Both conditions elicited about equal amygdala activity. This provides no evidence for generalization.
This result suggests that neutral words were not associated or confounded with negative affect, and that the hyperactivation of the amygdala for arousing-negative pseudowords was indeed due to this negative affect.

The second fMRI-analysis, with the factor trait anxiety as a covariate, revealed the following. As expected and as in the first fMRI-analysis, explicitly learned pseudowords elicited more amygdala reactivity than novel pseudowords in the bilateral amygdala, and this effect was positively related to measures of trait anxiety. The effect increases with increasing levels of trait anxiety, showing that persons with higher levels of trait anxiety process stimuli encountered in emotionally arousing situations differently from persons with low levels of anxiety (e.g., Tolkunov et al., 2010; Asakawa et al., 2014; Burgess et al., 2014). Unlike in analysis one, the contrast was also significant for less well-learned pseudowords vs. novel pseudowords. This lends support to the amygdala's sensitivity (i.e., hyperreactivity) in persons with high levels of trait anxiety. Contrast 3 investigated the occurrence of a memory bias. As expected and as in analysis one, the arousing-negatively linked pseudowords elicited more amygdala activity than the neutrally linked pseudowords, supporting similar prior studies (Laeger et al., 2012, 2014a,b). Contrast 4, comparing the explicitly and less well-learned arousing-negative pseudowords, was – again – not significant. Thus, even when taking into account the trait anxiety levels, the amygdalae of our participants did not differentiate between explicitly and less well-learned pseudowords when both had been combined with negative content. Contrast 5 again corroborated that even the less well-learned arousing-negative pseudowords elicited more amygdala reactivity than less well-learned neutral pseudowords. In addition, this effect was more pronounced the higher the level of trait anxiety. This shows that our high trait-anxious participants processed the arousing-negative words differently from the neutral words, although they could not translate these pseudowords very well.
As in the first analysis, contrast 6 showed no differences between less well-learned neutrally linked pseudowords and completely novel pseudowords. Unlike in analysis one, contrast 7 did reveal an effect. Less well-learned arousing-negative pseudowords elicited stronger amygdala reactivity than novel pseudowords, and the effect was stronger the higher the level of trait anxiety. This finding, once again, supports the sensitivity of trait-anxious persons for emotionally aversive stimuli, even if distinct explicit memory for these is not present. Contrast 8, investigating potential generalization effects, turned out to be significant. In contrast to analysis one, neutral pseudowords elicited more amygdala activity than novel pseudowords, and this effect was stronger with higher levels of trait anxiety. We believe that this is especially interesting, since it shows the downside of sensitivity in situations with aversive content. This sensitivity, that is, the hyperactivation of amygdalae in response to aversive situations/stimuli, may have evolved to protect humans from getting killed in dangerous situations. However, the final contrast shows that the amygdala of highly trait anxious persons overreacts in response to neutrally linked stimuli. This is an effect of generalization, found in earlier studies (e.g., Eden et al., 2014; see Resnik and Paz, 2014 for an animal model of the underlying mechanisms of generalization) and probably results from a transfer of aversion from arousing-negative stimuli to neutral stimuli during learning.

In the current study, we tried to address some criticisms of earlier studies. Behavioral testing was done at various points after learning, to assess consolidation over time. We combined behavioral and imaging measures and differentiated between explicit and more implicit ("less well-learned") items. We showed that the behavioral data do not change much from the second measurement onward. Explicit recall is better, overall, for negatively paired stimuli, but this effect is not significant in high-anxious individuals. The fMRI measures revealed an interesting pattern of results, with (1) more amygdala activation for explicitly learned than for novel stimuli; (2) evidence for memory bias, with more activation for pseudowords that were combined with negative content than for those paired with neutral content, (3) evidence that this memory bias was independent of the explicit learning success, but (4) dependent on trait-anxiety measures and (5) a difference in amygdalar activation between less well-learned pseudowords and completely novel ones that depended on trait-anxiety measures.

We would like to point out some limitations, caveats and open questions that should be addressed in future studies. First, we investigated two extreme groups. This well-established approach does not allow making predictions or drawing conclusions about the memory bias in persons with moderate levels of trait anxiety. Thus, we recommend that future studies integrate a third group of persons with moderate anxiety levels, or use a design that takes individual (trait) anxiety scores into account, as was done in the analysis of our fMRI data. Second, although we controlled for individual learning histories concerning the items relevant for behavioral and fMRI analyses, by use of valence-free pseudowords, we could not control individual differences concerning the pictures used to link negative-aversive or neutral valence to the pseudowords. Hence, despite the pre-test, the pictures might have evoked varying emotions to varying degrees in our participants, which results in unknown variance in acquired emotionality of the pseudowords. This problem is common to all studies that use neutral and aversive stimuli, and an assessment of the stimuli by the study participants themselves may be of help. Third, in the behavioral part of this study we used similar stimuli and measurements as in our earlier study (Eden et al., 2014), but did not exactly replicate the results. In Eden et al. (2014), participants rated all arousing-negative stimuli as more negative than neutral ones, and ratings differed between high- and low-anxious individuals. Differences between the studies concern the number of items (44 instead of 60) and participants (34 instead of 54), and repetitions during training (7–12; i.e., 8.25 on average instead of 5). In both studies, recall is better for negative than for neutral pseudowords, but the interaction with participant group shows a different pattern. Power differences might be responsible for these differences.
Next, in contrast to our earlier study, we observed no significant differences between neutrally and negatively paired stimuli in the valence ratings of high-anxious individuals. Again, the patterns are similar but there is less power in the current study. The differing results in two similar valence ratings actually stress that the implicit memory bias is indeed hard to replicate and not robust (for a meta-analytic review, see Mitte, 2008). Fourth, there is generally a positive correlation between (trait) anxiety and depression. Both anxiety and depression have been associated with a failure to adequately regulate the amygdala via top–down mechanisms (Johnstone et al., 2007). Thus, effects reported in anxiety research might partly be due to potential depression, and vice versa. This is why we included the reliable and well-validated M.I.N.I interview in our study, ensuring that none of our participants (ever) suffered from an affective disorder. However, the M.I.N.I is a dichotomous tool (a disorder is present or not). We thus cannot rule out the existence of subclinical depression, although none of our participants showed any signs. We suggest that future studies additionally apply a continuous measurement that reveals the intensity of a potential subclinical depression [such as the Beck Depression Inventory (BDI) or the Hamilton Depression Scale (HAMD), Hamilton, 1960; Beck et al., 1961]. With such measures, potential effects can be more clearly attributed to affective or anxiety disorders. Furthermore, the current study only focuses on the negativity bias and neglects the so-called positivity bias, a self-serving attribution bias that represents a well-attested and robust phenomenon in human cognition (e.g., Bradley, 1978; Zuckerman, 1979; Campbell and Sedikides, 1999). In their meta-analytic review, Mezulis et al. (2004) investigated numerous samples and showed that the bias was smallest for anxiety and depression patients.
Since the sample of our study consists of highly trait anxious (potentially small positivity bias) and highly non-anxious persons (potentially large positivity bias), it would be an interesting research issue to assess the extent of this bias and the difference between the two groups. However, this was not the aim of our study. We followed a strict hypothesis-driven approach and compared only negative arousing and neutral stimuli. Future studies might include positive arousing stimuli in the paradigm. This would of course lengthen the learning phase, and might thus considerably change the implicit results. Besides, the current study did not investigate mood-congruency effects. Mood congruency describes the phenomenon that emotional information congruent with the current mood is more likely to be recalled than information that is incongruent with the current mood (Bower, 1981). In many studies it has been shown that mood-congruent depressive information is likely to be recalled by persons in a depressed mood (for a meta-analytic study on explicit recall, see Matt et al., 1992; for a meta-analytic study on implicit recall, see Gaddy and Ingram, 2014). Patients suffering from major depressive disorder exhibited preferential recall of negative stimuli, dysphoric persons showed no preferred recall of negative or positive stimuli, and healthy controls tended to recall positive stimuli. With the current design, we cannot decide whether non-clinical mood congruency processing played a role, because we did not implement a mood measurement. Given that anxiety and depression are highly correlated and in the absence of mood information, we cannot rule out that a depressive, an anxious or any other kind of negative mood is partially responsible for the obtained effects. A second kind of mood congruency (a congruency between behavior/symptoms and mental disorders) is certainly at stake in the study at hand.
The people in our sample showed no clinical symptoms, but since trait anxious persons are at greater risk of developing anxiety disorders, it is very likely that the effect found in our study can be traced back to this "anxious mood." Furthermore, we performed an fMRI measurement with a blocked design, because it is sensitive to small effects. A further improvement for future studies would be the use of an event-related design.

Finally, it is as yet unclear how emotion and feelings are implemented and operated at the level of words, and how emotional information conveyed by words modulates and regulates emotional experience. These questions are currently under debate (see several contributions to this Frontiers research topic). However, what we do know and were able to show here is that persons with high levels of trait anxiety exhibit dysfunctional learning and memory mechanisms for affective verbal stimuli. This might originate from evolutionarily shaped, adaptive behavior that maximizes chances of survival due to withdrawal from potentially threatening situations. However, with the tremendous changes in many modern societies during the last centuries, the advantage of this sensitivity diminishes continuously. Today, highly anxious persons, who possess this sensitivity as a character trait, primarily suffer from a higher probability of developing anxiety disorders, the most prevalent class of all psychological disorders (e.g., Kessler et al., 2005). As suggested by others (e.g., Lissek, 2012; Resnik and Paz, 2014), neural hyperactivity to items with only brief learning histories (that are explicitly not well remembered), together with generalization, might be underlying mechanisms in the development of anxiety disorders. As we hope to have illustrated here, learning of emotional words constitutes an important and experimentally well-controlled approach to investigate this further. To support individual health and to prevent a high burden on health care systems, it is crucial to better understand the processes and mechanisms that underlie the development of anxiety disorders, and to identify persons at risk.

# Acknowledgments

We acknowledge support for the article processing charge by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University. Research was supported by the Interdisciplinary Center for Clinical Research (Do3/021/10). We are grateful to Cornelia Herbert and two reviewers for their helpful comments and suggestions.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01226




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Eden, Dehmelt, Bischoff, Zwitserlood, Kugel, Keuper, Zwanzger and Dobel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Contributions of emotional state and attention to the processing of syntactic agreement errors: evidence from P600

*Martine W. F. T. Verhees<sup>1</sup>\*, Dorothee J. Chwilla<sup>1</sup>, Johanne Tromp<sup>1,2</sup> and Constance T. W. M. Vissers<sup>1,3</sup>*

*<sup>1</sup> Centre for Cognition, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands, <sup>2</sup> Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands, <sup>3</sup> Kentalis Academy, Sint-Michielsgestel, Netherlands*

#### *Edited by:*

*Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Germany*

#### *Reviewed by:*

*Yang Zhang, University of Minnesota, USA Thomas C. Gunter, Max Planck Institute for Human Cognitive and Brain Sciences, Germany*

#### *\*Correspondence:*

*Martine W. F. T. Verhees, Centre for Cognition, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Montessorilaan 3, Nijmegen 6525HR, Netherlands martine.verhees@gmail.com*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

> *Received: 14 November 2014 Accepted: 18 March 2015 Published: 09 April 2015*

#### *Citation:*

*Verhees MWFT, Chwilla DJ, Tromp J and Vissers CTWM (2015) Contributions of emotional state and attention to the processing of syntactic agreement errors: evidence from P600. Front. Psychol. 6:388. doi: 10.3389/fpsyg.2015.00388*

The classic account of language is that language processing occurs in isolation from other cognitive systems, like perception, motor action, and emotion. The central theme of this paper is the relationship between a participant's emotional state and language comprehension. Does emotional context affect how we process neutral words? Recent studies showed that processing of word meaning – traditionally conceived as an automatic process – is affected by emotional state. The influence of emotional state on syntactic processing is less clear. One study reported a mood-related P600 modulation, while another study did not observe an effect of mood on syntactic processing. The goals of this study were: First, to clarify whether and if so how mood affects syntactic processing. Second, to shed light on the underlying mechanisms by separating possible effects of mood from those of attention on syntactic processing. Event-related potentials (ERPs) were recorded while participants read syntactically correct or incorrect sentences. Mood (happy vs. sad) was manipulated by presenting film clips. Attention was manipulated by directing attention to syntactic features vs. physical features. The mood induction was effective. Interactions between mood, attention and syntactic correctness were obtained, showing that mood and attention modulated P600. The mood manipulation led to a reduction in P600 for sad as compared to happy mood when attention was directed at syntactic features. The attention manipulation led to a reduction in P600 when attention was directed at physical features compared to syntactic features for happy mood. From this we draw two conclusions: First, emotional state does affect syntactic processing.
We propose mood-related differences in the reliance on heuristics as the underlying mechanism. Second, attention can contribute to emotion-related ERP effects in syntactic language processing. Therefore, future studies on the relation between language and emotion will have to control for effects of attention.

Keywords: emotion, mood, syntactic processing, P600, attention, levels of processing

# Introduction

Emotions have an influence on how we see the world. Consider the emotion fear. Fear spreads through the human body and brain and it urges us to take actions for flight or defense. Furthermore, fear directs our attention to signs of danger or safety in the environment. The brain has, so to speak, been simplified to respond to danger and this mode is turned on by the emotional signal. The emotions we experience thus color the way we perceive the world around us. The aim of the present study was to investigate the relationship between emotional state, attention and language processing.

Emotional state, or mood, refers to a generalized affective state that is not directed at objects or events and has been proposed to be less intense and longer-lasting than emotions (Frijda, 1986; Morris, 1989; Isen, 1993; Watson and Clark, 1994; Oatley et al., 2006). Emotional state has been shown to influence perception, thinking and decision making, and the notion of mood-dependent processing styles is generally agreed upon in the emotion literature (Levenson, 1994; Fredrickson and Branigan, 2005). Positive mood is associated with flexibility of thinking and with a more global, category level information processing style. People in a good mood use heuristics (highly economical perceptual strategies), rely on their world knowledge and on their usual routines (e.g., Isen, 2001; Gasper and Clore, 2002). Negative mood, on the other hand, is associated with a more local, bottom–up, analytic and systematic information processing style, in which people rely less on heuristics and have a narrowed focus of attention (Gasper and Clore, 2002; Schwarz, 2002).

Attention is a processing system with limited capacity that can allocate its resources flexibly to one or more tasks (Kahneman, 1973). Several studies have shown that emotion influences attentional processing (e.g., Vuilleumier, 2005; see for an overview: Kaspar and König, 2012). For example, the internal state of a person (including emotional state) has been found to have an influence on visual attention (Kaspar and König, 2012). Less investigated, however, is whether attention also has an influence on emotion.

Relevant for the present study, recent fMRI studies have shown that emotion and cognition (memory, attention, language) strongly interact in the brain and that the neural basis of emotion and cognition should be viewed as interactive and non-modular (e.g., Pessoa, 2008). Likewise, research on language in interaction with other systems has revealed that processes of language comprehension do not operate in isolation but are affected by perception and action (for embodied approaches to cognition see, e.g., Glenberg, 1997; Barsalou, 1999; Pecher and Zwaan, 2005 and for event-related potential (ERP) support, e.g., Chwilla et al., 2007; Chwilla, 2013). More recently the relationship between emotion and language processing has been examined and interactions between language and a person's emotional state have been reported (e.g., Federmeier et al., 2001; Vissers et al., 2010; Chwilla et al., 2011; Pinheiro et al., 2013; Van Berkum et al., 2013). These interactions of language with other systems are of theoretical importance because they call into question modular theories of language and support interactive theories of language.

The focus of this article is on the relative contribution of emotional state and attention to the processing of syntactic anomalies. The ERP method is used to track effects of emotional state and attention on syntactic language processing online. An advantage of ERPs is that they have an excellent temporal resolution at the level of milliseconds, which allows the effects of emotion and attention on language processing to be assessed in real time. To our knowledge, the relationship between emotional state, attention and syntactic language processing has not yet been investigated.

Two language-relevant ERP components of interest for the present study are the N400 and the P600. The N400 is a negative-going brain wave peaking around 400 ms after critical word onset and is highly sensitive to semantic processing (for a review see Kutas and Federmeier, 2011). The P600 component is a positive shift peaking at around 600 ms after critical word onset. P600 has been shown to be sensitive to several syntactic anomalies and ambiguities (see e.g., Osterhout and Holcomb, 1992; Coulson et al., 1998a and for a review: Kutas et al., 2006). For instance, an increase in P600 amplitude has been observed after subject–verb agreement violations (e.g., Hagoort et al., 1993) and after phrase structure violations (e.g., Hahne and Friederici, 1999). More recently, it has been shown that a P600 can also occur to semantic anomalies in syntactically unambiguous sentences (for reviews see Kuperberg, 2007; Van de Meerendonk et al., 2009). The focus of the current study is on the effect of syntactic violations, specifically subject–verb agreement errors, on P600. The typical finding is that P600 amplitude to syntactically incorrect words is larger (i.e., more positive going) than to syntactically correct words. This difference in P600 amplitude to incorrect vs. correct sentences is referred to as the P600 effect (Osterhout and Holcomb, 1992).
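The P600 effect as defined above is a difference score, which can be sketched in a few lines of Python. This is only an illustration and not the analysis pipeline of the present study: the 500–800 ms window, the function names and the toy waveforms are assumptions made for the example.

```python
from statistics import mean

def mean_amplitude(erp, times_ms, window=(500, 800)):
    """Mean voltage (microvolts) of an averaged ERP inside a time window.

    The default window is a hypothetical P600 window chosen for this sketch.
    """
    start, end = window
    samples = [v for v, t in zip(erp, times_ms) if start <= t <= end]
    if not samples:
        raise ValueError("no samples fall inside the window")
    return mean(samples)

def p600_effect(erp_incorrect, erp_correct, times_ms, window=(500, 800)):
    """P600 effect: incorrect minus correct mean amplitude in the window."""
    return (mean_amplitude(erp_incorrect, times_ms, window)
            - mean_amplitude(erp_correct, times_ms, window))

# Toy averaged waveforms sampled every 100 ms (invented values, not data).
times_ms = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
correct = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0, 1.0, 1.0, 1.0, 0.5]
incorrect = [0.0, 0.5, 1.0, 0.5, 0.0, 3.0, 4.0, 4.0, 3.0, 1.0]

effect = p600_effect(incorrect, correct, times_ms)
print(effect)  # positive value: incorrect is more positive-going than correct
```

A positive value corresponds to the typical finding described above; a value near zero would correspond to an absent P600 effect.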

Event-related potential studies concerning the relationship between emotional state and language processing have consistently shown that emotional state influences semantic processing. In particular, effects of mood have been found on the use of semantic memory during reading (Federmeier et al., 2001; Pinheiro et al., 2013), on the standard N400 cloze probability effect (Chwilla et al., 2011) and on semantic integration in discourse comprehension (Egidi and Nusbaum, 2012).

Whether emotional state impacts syntactic language processing is a matter of debate. Given that studies on the effect of mood on syntactic processing are of direct relevance for the present article, they will be presented in more detail. In one study, a mood-related modulation of the P600 effect was found (Vissers et al., 2010). The authors manipulated mood (happy vs. sad) by showing affective film clips and studied its effect on the processing of emotionally neutral sentences with or without subject–verb agreement. Vissers et al. (2010) reported a broadly distributed P600 effect for the happy mood condition and a strong reduction in P600 effect for the sad mood condition. They propose three possible scenarios to account for the effect of emotional state on P600. The first scenario is that the mood by syntactic correctness interaction could be accounted for in terms of syntactic factors. That is, mood could affect language comprehension by increasing or decreasing syntactic processing. A second scenario is that mood could affect language processing via more general factors such as attention. On this account people in a happy mood might pay more attention to the sentences than people in a sad mood. According to a third scenario, mood selectively affects the use of heuristics. Heuristics are highly economical perceptual strategies that are usually very effective in extracting meaning and allow people to quickly solve problems and make judgments (e.g., Ferreira, 2003). On the assumption that people have a high expectancy for sentences to be syntactically correct (e.g., Coulson et al., 1998a; Vissers et al., 2013), the reduction of the P600 effect in the sad mood condition could reflect a reduced use of heuristics, whereas the increase in P600 effect for the happy mood could be due to an increased reliance on heuristics.

In contrast, in a recent study using a similar procedure to induce a happy vs. sad mood and the same type of syntactic anomaly (subject–verb agreement errors) no modulation of P600 by mood was obtained (Van Berkum et al., 2013). Note that Van Berkum et al. (2013) did observe a standard P600 effect to syntactic anomalies across mood conditions. The only difference between moods consisted of a slightly earlier onset of the P600 effect in the happy mood condition as compared to the sad mood condition. From this the authors conclude that emotional state has little impact on syntactic processing. In a third study by Jiménez-Ortega et al. (2012) mood was manipulated by presenting participants with emotionally positive, negative or neutral text paragraphs preceding a critical sentence that contained a syntactic anomaly (noun-adjective number disagreements). While these researchers did find an effect of emotional state on behavioral measures (i.e., higher error rates and reaction times (RTs) in happy mood compared to sad mood), no modulation of P600 by emotional state was observed. The authors explain the absence of a mood by syntactic correctness interaction in their study vs. presence of such an interaction in the Vissers et al. (2010) study in terms of the effectiveness of the mood induction procedure. Specifically, they propose that the emotional paragraphs might not have been effective in inducing the intended emotional state. This would also explain why this is the only ERP study in which no effect of emotional state on semantic processing – as reflected by N400 – was obtained.

Of interest for the present purposes, one study looked at the influence of depression on syntactic processing as indexed by P600 (Ruchsow et al., 2008). Patients with depression and healthy controls were presented with sentences containing syntactic mismatches. The authors found that, in contrast to healthy controls, patients with depression did not show a significant P600 effect to syntactic anomalies. Ruchsow et al. (2008) conclude that the absence of a P600 effect in depressed patients points to an altered syntactic integration process. They state that it is an open question whether this altered syntactic integration process indicates a specific language processing deficit or is part of a more general cognitive deficit. If normal sadness and depression reflect a qualitatively similar process, then based on this finding one would predict a reduction in P600 effect for sad mood.

It has been proposed that an effect of emotional state on language processing can be influenced by more general factors like attention (Vissers et al., 2010, 2013; Chwilla et al., 2011; Van Berkum et al., 2013). Previous ERP studies investigating the effects of attention on syntactic processing have found that the P600 is modulated by list composition (a high vs. low proportion of syntactically correct vs. incorrect sentences), which induces differences in expectancy for correct or incorrect sentences (e.g., Coulson et al., 1998a). Relevant for the present article, Gunter and Friederici (1999) showed that depth of processing modulates the amplitude of the P600. They used a deep processing task and a shallow processing task to induce different levels of processing. In the deep processing task, participants had to judge whether a sentence was grammatically correct or not. In the shallow processing task, participants had to judge the sentences on purely physical features, i.e., whether a sentence contained a word in a deviant letter size. This task manipulation modulated P600 amplitude: in the physical judgment task a strong reduction of the P600 effect following incorrect verb inflections was observed as compared to the syntactic judgment task (see for a similar task modulation of N400: Chwilla et al., 1995).

The goal of the present study was twofold. First, to clarify whether and if so how emotional state affects syntactic processing. If syntactic processing is reliably affected by mood, this would further challenge modular views of language comprehension according to which syntactic processing is encapsulated (see e.g., Fodor, 1983). The second goal is to shed light on the underlying mechanisms by separating possible effects of mood on syntactic processing from those of attention. The crucial question is whether the effects of emotion and attention on P600 are additive and independent or whether they interact. In other words, if emotional state modulates P600, is this modulation then a true effect of emotion or could it be influenced by more general factors like attention?

To the first aim, we induced a happy mood or sad mood and presented sentences with or without subject–verb agreement errors to participants while their EEG was recorded. It has been well-established that subject–verb agreement errors elicit a P600 effect (see for a review: Vos et al., 2001). Emotional state (happy vs. sad) was manipulated by presenting film clips. It has been shown that the presentation of a film or story with explicit instructions to enter a specific mood is one of the most effective ways to induce both positive and negative emotional states (Westermann et al., 1996). For the happy mood induction, film clips from a happy movie, Warner Brothers' "Happy Feet," were used. For the sad mood induction, fragments from a sad movie, "Sophie's Choice," were used. Fragments from the same movies successfully induced the intended mood in previous studies (Vissers et al., 2010, 2013; Chwilla et al., 2011).

To the second aim, we manipulated attention in addition to the factor mood. Attention was manipulated in the same way as in the Gunter and Friederici (1999) study, by directing attention to syntactic features vs. physical features of the sentences. Specifically, participants either had to indicate whether the sentence was syntactically correct or whether the sentence contained a word in a deviant letter size. In the present study the factors emotional state (happy vs. sad) and task (syntactic vs. physical) are crossed. This design allows a determination of the relative role of the factors mood and attention (varied by task demands) in the processing of syntactic anomalies.

Based on previous ERP studies investigating the effects of mood and attention on the processing of syntactic anomalies, the predictions were as follows. First, we predicted a standard P600 effect to syntactic anomalies across mood and task conditions. As mentioned above, it is a matter of debate whether emotional state modulates the P600 effect to syntactic violations. If mood affects the size of the P600 effect, we predict an interaction between mood and syntactic correctness, in particular a reduction in the P600 effect for the sad mood as compared to the happy mood condition (Vissers et al., 2010). In contrast, if P600 is mainly insensitive to fluctuations in emotional state, then no differences in P600 amplitude as a function of mood should be obtained (Jiménez-Ortega et al., 2012; Van Berkum et al., 2013).

Manipulation of the factor attention alongside the factor mood makes it possible to assess the (relative) contribution of attention to a mood-related modulation of P600. If general factors like attention contribute to an effect of emotional state on P600, this should be reflected in an interaction between emotional state, task and the P600 effect of syntactic correctness. On the other hand, if attention does not contribute to an effect of emotional state on P600, no interaction between emotional state, task and the P600 effect should be obtained. In the latter case an effect of emotional state and/or an effect of attention should be found on P600, in the absence of an interaction.

# Materials and Methods

# Participants

There were 38 participants (mean age = 20 years, age range = 18–26). Recent research has shown that the assumption that subject sex matters little or not at all in studies on the neurobiology of emotional memory should be abandoned (Cahill, 2006). In line with this, a previous ERP study suggested that female participants are more sensitive to mood manipulations (Federmeier et al., 2001). Therefore, only female participants were tested in this study. Furthermore, only participants that reported no drug abuse, neurological, mental or chronic bodily diseases, or medication for any of these were selected. All participants were native speakers of Dutch, did not have any reading disabilities, had normal or corrected-to-normal vision and were right-handed. Hand dominance was assessed with an abridged Dutch version of the Edinburgh Inventory (Oldfield, 1971). This study was approved by the ethical committee of the faculty of Social Sciences of Radboud University. Six participants were excluded from the analyses due to equipment failure and muscular artifacts, leaving a total of 32 participants.

# Materials

A total of 100 Dutch subject–relative (SR) sentences with center-embedded clauses were presented. Of these sentences, 68 were used in Vissers et al. (2010); the other 32 were constructed for this study. For each sentence, a syntactically correct and incorrect version were created, yielding a total of 200 sentences. The incorrect sentences contained subject–verb agreement errors: the verb ending the relative clause did not have the same grammatical number as its subject. The incorrect sentences were derived from the correct sentences by switching the two noun phrases. For example, in the correct sentence 'De kameel die op de toeristen afliep*...*' (The camel who toward the tourists walked[singular]*...*), the head of the relative clause ('de kameel') was switched with the noun phrase in the relative clause ('de toeristen') creating the incorrect sentence 'De toeristen die op de kameel afliep*...*' (The tourists who toward the camel walked[singular]*...*). Because the two noun phrases always differed in number, the switch always yielded a subject–verb agreement error. This way, the correct and incorrect sentences did not differ at the critical verb position. One half of the sentences started with a singular noun phrase, whereas the other half started with a plural noun phrase. Furthermore, the grammatical number of the noun phrases was crossed with sentence grammaticality. The verbs used in the sentences all had a past tense plural inflection that involved the addition of one syllable (e.g., 'liep' [walked, 3 singular], versus 'liepen' [walked, 3 plural]). In this way, there was maximum discriminability between the plural and singular verb-forms, while holding the length of the verbs constant across conditions.

The two versions of each sentence were counterbalanced across two lists. This means there was no repetition of the experimental sentences within participants. Each list contained 50 SR acceptable and 50 SR unacceptable sentences. 100 filler sentences were added to each list: 25 acceptable SR sentences, 25 unacceptable SR sentences, 25 right-branching sentences (e.g., 'De taxateur keek naar de schilderijen die veel waard leken' – 'The appraiser looked at the paintings that seemed worth a lot') of which half contained a subject–verb agreement error at the sentence-final verb, and 25 conjunctions (e.g., 'De huisvrouw kookte voor de kinderen en deed daarna de afwas' – 'The housewife cooked for the children and then did the dishes') of which half contained a subject–verb agreement error at the verb right after the conjunction, yielding a total of 50 acceptable and 50 unacceptable filler sentences. The experimental and filler sentences were mixed in the same pseudo-random order for each of the two lists, with the conditions distributed evenly over lists.

The sentences in both lists were allocated to four blocks. There were two blocks for each task (syntactic and physical). All blocks contained 25 experimental and 25 filler sentences, with the conditions distributed evenly over blocks. In the syntactic task, all words in all sentences were presented in uppercase letters. In the physical task, all words in all experimental sentences were presented in uppercase letters, whereas all filler sentences contained a word in lowercase letters. The physical deviation of the word in lowercase letters was only positioned in the filler sentences to avoid a confound by comparing a single violation in the syntactic task with a double violation in the physical task (i.e., an incorrect verb that was in a deviant letter size) in the experimental sentences. The location of the word in lowercase letters differed per sentence type: for the SR filler sentences, the verb ending the relative clause was in lowercase letters (e.g., 'DE PINGUIN DIE ONDER DE IJSSCHOTSEN dook BEVOND ZICH OP DE ZUIDPOOL'– 'THE PENGUIN THAT BELOW THE ICE dived WAS ON THE SOUTH POLE' [literal translation]), whereas both for the right-branching fillers and the conjunctions the last word of the sentence was printed in lowercase (e.g., 'DE HUISVROUW KOOKTE VOOR DE KINDEREN EN DEED DAARNA DE afwas' – 'THE HOUSEWIFE COOKED FOR THE CHILDREN AND THEN DID THE dishes'). We varied the location of the word in deviant letter size to make sure that participants could not predict where the word in lowercase letters would be located in the sentence.

For both lists, another list was created in which the task blocks were switched, generating a total number of four lists. This way, every sentence was assigned to the syntactic task in one of the lists and to the physical task in another list.
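The counterbalancing scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_lists`, the alternation rules for versions, and the alternating order of task blocks are hypothetical and are not taken from the study, which used a fixed pseudo-random order. The sketch only shows that crossing two sentence versions with two task assignments yields four lists in which every sentence occurs once per list.

```python
from itertools import product

N_ITEMS = 100     # experimental sentence frames (from the text)
BLOCK_SIZE = 25   # experimental sentences per block (from the text)


def build_lists(n_items=N_ITEMS):
    """Return four hypothetical lists of (item, version, task) tuples.

    version_flip swaps which version (correct/incorrect) of an item is shown;
    task_flip swaps which blocks belong to the syntactic vs. physical task.
    """
    lists = {}
    for version_flip, task_flip in product((0, 1), repeat=2):
        trials = []
        for item in range(n_items):
            # Alternate versions over items; the extra offset per half keeps
            # 25 correct sentences in each task (illustrative rule only).
            offset = item // (2 * BLOCK_SIZE)
            is_correct = (item + offset + version_flip) % 2 == 0
            # Assumption: task blocks alternate (blocks 0 and 2 vs. 1 and 3).
            block = item // BLOCK_SIZE
            is_syntactic = (block + task_flip) % 2 == 0
            trials.append((
                item,
                "correct" if is_correct else "incorrect",
                "syntactic" if is_syntactic else "physical",
            ))
        lists[(version_flip, task_flip)] = trials
    return lists


lists = build_lists()
```

Across the four lists, each sentence appears once in every combination of version and task, while no participant (who sees a single list) encounters the same sentence twice.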

# Procedure

Participants were seated in an enclosed room. A response device with three pushbuttons was set on a table in front of the participant. The sentences were presented in serial visual presentation mode at the center of a PC monitor. Word duration was 345 ms and the stimulus-onset asynchrony (SOA) was 645 ms. Sentence-final words were followed by a full stop. The inter-trial interval was 2 s. Words were presented in black letters on a white background in font size Arial 20 at a viewing distance of ∼1 m. Each sentence was preceded by a 510 ms fixation cross followed by a 500 ms blank screen. Because eye movements distort the EEG recording, participants were trained to make eye movements, e.g., blinks, only after the sentence-final word had disappeared from the screen.
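The timing parameters above imply a blank interval of 300 ms between successive words (SOA minus word duration). A minimal sketch of the resulting word onsets within a trial follows; the constant and function names are hypothetical, and time zero is assumed to be the onset of the fixation cross.

```python
WORD_MS = 345       # word duration (from the text)
SOA_MS = 645        # stimulus-onset asynchrony (from the text)
FIXATION_MS = 510   # fixation cross (from the text)
BLANK_MS = 500      # blank screen after fixation (from the text)

# Blank interval between the offset of one word and the onset of the next.
INTER_WORD_BLANK_MS = SOA_MS - WORD_MS


def word_onsets(n_words):
    """Onset of each word in ms, relative to fixation-cross onset (assumed)."""
    first_onset = FIXATION_MS + BLANK_MS
    return [first_onset + i * SOA_MS for i in range(n_words)]
```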

There were two blocks for each task. The experimental session of each task started with a training set of 10 sentences, other than those used in the experiment. Half of the participants started with the syntactic task, the other half started with the physical task.

In the syntactic task participants were instructed to direct their attention to the grammaticality of the sentences. After offset of the sentence-final word, participants had to indicate whether the sentence was syntactically correct (press right button with right index finger) or incorrect (press left button with left index finger).

In the physical task participants were instructed to direct their attention exclusively to the physical features of the sentences. After offset of the sentence-final word participants had to indicate whether the words comprising the sentence were presented in the same font (press right button with right index finger) or not (press left button with left index finger). The maximum response time in both tasks was 3 s, measured from the offset of the sentence-final word.

Immediately before the EEG recording, the mood induction procedure (MIP) was initiated with the first of the affective film clips. Between experimental blocks, new film clips were shown. Depending on the mood condition, short film clips were presented from a happy movie or a sad movie. The happy movie fragments were cut from Warner Brothers' movie Happy Feet; the sad movie fragments were cut from Universal Pictures' Second World War drama Sophie's Choice. The film clips showed unambiguous, unipolar emotions and affective situations. Participants were asked to use the situations and emotions depicted in the clips to help them enter the specified mood. The film clips were presented on the same PC monitor used for the presentation of the sentences. The length of the film clips varied between 4.13 and 12.07 min, with a mean length of 7.17 min for the happy mood condition and a mean length of 7.42 min for the sad mood condition. A total of four film clips were presented to the participants, with the aim of maintaining the intended mood throughout the entire experiment. Mood was manipulated between subjects for two reasons. First, it is difficult to switch a positive vs. negative mood on and off within a single recording session. Second, the alternative of inviting the same groups of participants for a second recording session would have resulted in repetition of the critical experimental sentences. Given that language-relevant ERP components like N400 and P600 are sensitive to stimulus repetition (Olichney et al., 2006) and that it takes a long time for stimulus repetition effects to vanish (Cave, 1997), we preferred not to present the stimulus materials twice.

To assess the effectiveness of the film clips in inducing the intended mood, participants were asked to rate their mood after each movie. The scale ranged from 'extremely sad' (−10) to 'extremely happy' (+10). In addition, to determine the effectiveness of the instruction to focus attention on syntactic vs. purely physical features of the sentences, participants were asked to fill out an attention rating after each block. They had to indicate how well they could fully direct their attention to the grammatical aspects of the sentence (after the syntactic task blocks) or to the physical aspects of the sentence (after the physical task blocks). The scale ranged from 'extremely bad' (−10) to 'extremely well' (+10).

# EEG Data Acquisition and Analyses

The electroencephalogram (EEG) was recorded from 26 electrodes mounted in an elastic cap (Acticap system) at standard 10–20 locations. Four electrodes were placed over the midline Fz, Cz, Pz, and Oz. Eleven pairs were placed over the lateral sites F7/F8, F3/F4, Fc5/Fc6, Fc1/Fc2, T7/T8, C3/C4, Cp5/Cp6, Cp1/Cp2, P7/P8, P3/P4, and O1/O2. During recording, the right mastoid served as reference. An electrode was also placed at the left mastoid. The electro-oculogram (EOG) was recorded bipolarly; vertical EOG was recorded by placing an electrode above and below the right eye and the horizontal EOG was recorded by placing two electrodes at the outer left and right canthi. The signals were amplified (time constant = 8 s, bandpass = 0.02–30 Hz), and digitized online at 200 Hz.

Before the analysis, the EEG signals were re-referenced to the mean of the left and right mastoid. EEG and EOG recordings were examined for artifacts and for excessive EOG amplitude (*>*100 µV) from 100 ms before the onset of the critical verb ending the relative clause to 1 s following its onset. Averages were aligned to a 100-ms baseline preceding the critical verb.
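These preprocessing steps can be illustrated with a short sketch. It is not the authors' pipeline: the function names are hypothetical, epochs are assumed to be plain lists of microvolt values sampled every 5 ms (200 Hz) starting 100 ms before verb onset, and the EOG criterion is assumed to be an absolute-amplitude threshold (the text does not say whether it was absolute or peak-to-peak). Because the data were recorded against the right mastoid, subtracting half of the left-mastoid signal gives the mean-mastoid reference.

```python
SAMPLE_MS = 5                          # 200 Hz sampling: one sample per 5 ms
BASELINE_SAMPLES = 100 // SAMPLE_MS    # 100-ms pre-verb baseline = 20 samples
EOG_LIMIT_UV = 100.0                   # EOG rejection threshold (from the text)


def rereference(channel, left_mastoid):
    """Re-reference right-mastoid-referenced data to the mean of both mastoids.

    Both inputs are recorded relative to the right mastoid, so
    V - (M1 + M2) / 2 equals (V - M2) - (M1 - M2) / 2.
    """
    return [x - m / 2.0 for x, m in zip(channel, left_mastoid)]


def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus baseline from every sample."""
    baseline = sum(epoch[:BASELINE_SAMPLES]) / BASELINE_SAMPLES
    return [x - baseline for x in epoch]


def has_eog_artifact(eog_epoch):
    """Flag an epoch whose EOG exceeds the limit (absolute value, assumed)."""
    return any(abs(v) > EOG_LIMIT_UV for v in eog_epoch)
```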

Time-course analyses were conducted to examine the onsets and durations of the ERP effects. To this aim, the mean amplitudes of consecutive time-windows of 100 ms were computed for the different conditions for each participant, beginning at the onset of the critical verb and ending 1 s later. Based on these time course analyses and to increase comparability with previous studies (e.g., Van Herten et al., 2005; Vissers et al., 2010), mean amplitudes in the 600–800 ms time window after critical word onset were used to quantify P600 effects.
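As a rough illustration of how such window means can be computed, the sketch below averages an epoch over consecutive 100-ms windows and over the 600–800 ms P600 window. The names are hypothetical, the epoch layout (5-ms samples starting at −100 ms) follows the assumptions stated in the comments, and half-open windows are an arbitrary choice made for the example.

```python
SAMPLE_MS = 5          # 200 Hz sampling (from the text)
EPOCH_START_MS = -100  # epoch starts 100 ms before verb onset (from the text)


def window_mean(epoch, start_ms, end_ms):
    """Mean amplitude in the half-open window [start_ms, end_ms)."""
    first = (start_ms - EPOCH_START_MS) // SAMPLE_MS
    last = (end_ms - EPOCH_START_MS) // SAMPLE_MS
    segment = epoch[first:last]
    return sum(segment) / len(segment)


def time_course(epoch):
    """Mean amplitudes for ten consecutive 100-ms windows from 0 to 1000 ms."""
    return [window_mean(epoch, t, t + 100) for t in range(0, 1000, 100)]


def p600_effect(incorrect_epoch, correct_epoch):
    """Difference score (incorrect minus correct) in the 600-800 ms window."""
    return (window_mean(incorrect_epoch, 600, 800)
            - window_mean(correct_epoch, 600, 800))
```

The same difference score, computed per participant and per electrode, is the quantity referred to as the size of the P600 effect.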

To check for early effects of attention, supplementary analyses were performed for the P1 (125–175 ms time window) and N1 (175–225 ms time window) components. The P1 and N1 components are taken to reflect perceptual and attentional processing, respectively (Mangun and Hillyard, 1991; Mangun, 1995). An effect of task for these earlier ERP components would support the view that the task manipulation led to differences in early perceptual and/or attentional processes. Moreover, an interaction including mood and correctness would indicate that the MIP led to differences in early perceptual and/or attentional processes between participants in the happy vs. sad mood condition.

Repeated-measures ANOVAs were performed to analyze the ERP data for all time windows. The repeated-measures ANOVAs were conducted separately for the midline sites and for the lateral sites, with correctness (correct vs. incorrect) and task (syntactic vs. physical) as within-subject factors and mood (happy vs. sad) as a between-subject factor. The midline analyses included the additional factor site (Fz, Cz, Pz, Oz). To further explore the scalp distribution of the ERP effects for the lateral sites we used a hemisphere by lateral site (F7/F3/Fc5/Fc1/T7/C3/Cp5/Cp1/P7/P3/O1 vs. F8/F4/Fc6/Fc2/T8/C4/Cp6/Cp2/P8/P4/O2) design. The multivariate approach to repeated measurements was used to avoid problems concerning sphericity (e.g., Vasey and Thayer, 1987). Wilks' lambda was used to test whether there were differences between the means of the groups of subjects on the (combination of) dependent variables.

To test whether the modulations in P600 amplitude by mood reported in the present study were accompanied by changes in emotional state, correlation analyses were performed. The variables for these correlation analyses were the size of the P600 effect (computed as the difference in amplitude between incorrect and correct verbs) and mean mood rating (computed over mood ratings per subject). To test whether modulations in P600 amplitude were accompanied by changes in the amount of attention a participant directed at the syntactic vs. physical features of the words comprising the sentences, additional correlation analyses were performed. The variables for these analyses were the size of the P600 effect and mean attention rating (computed over attention ratings for the syntactic task and attention ratings for the physical task separately).
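A correlation analysis of this kind can be sketched as follows. The helper names and the example values in the usage note are hypothetical and serve only to show the computation: each participant contributes one mean rating and one P600 effect size, and a Pearson coefficient is computed over participants.

```python
import math


def mean_rating(ratings):
    """Mean of one participant's ratings (e.g., one per film clip)."""
    return sum(ratings) / len(ratings)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r(mood_means, p600_effects)` would take a list of per-participant mean mood ratings and a list of per-participant P600 difference scores at one electrode; both variable names are placeholders.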

# Results

# Mood Induction Procedure

As **Figure 1** shows and supported by the statistical analyses reported below, the intended mood was effectively induced by the MIP. That is, participants were significantly happier after watching happy film clips than after watching sad film clips (*p <* 0.001). Likewise, participants were significantly sadder after watching sad film clips than after watching happy film clips (*p <* 0.001). Moreover, participants were significantly happier after watching each of the four happy film clips than they were at the baseline measurement (*ps <* 0.006). Similarly, after watching each of the sad film clips, participants were significantly sadder than they were at the baseline measurement (*ps <* 0.001). There was no difference in mood score between the participants in the happy mood condition (*M* = 4.06, *SD* = 0.53) and the participants in the sad mood condition (*M* = 4.36, *SD* = 0.52) before the film clips were presented [*t*(30) = −0.42, *p* = 0.68].

FIGURE 1 | Mood ratings on a scale from −10 (extremely sad) to +10 (extremely happy) for the four film clips comprising the mood induction procedure, separately for the participants assigned to the two mood conditions (happy vs. sad mood condition).

### Reaction Time and Error Data

The RT and error data of 2 of the 32 participants were removed from the analyses, due to equipment failure. The RT and error data are presented in **Tables 1** and **2** respectively.

For RT a main effect of task [*F*(1,28) = 12.94, *p <* 0.002] indicated overall longer RTs in the syntactic task than in the physical task. Furthermore, a main effect of correctness [*F*(1,28) = 13.05, *p <* 0.002] reflected overall longer RTs to correct sentences than to incorrect sentences. There was no main effect of mood (*F <* 1) or interaction between mood and task or mood and correctness (*F*s *<* 1). For RT, a mood by task by correctness interaction [*F*(1,28) = 4.91, *p <* 0.04] reflected differences between conditions as a function of mood. Follow-up analyses revealed that the task by correctness interaction was more pronounced in the happy mood condition [*F*(1,14) = 20.37, *p <* 0.001] than in the sad mood condition [*F*(1,14) = 4.93, *p <* 0.05]. For the happy mood condition, a correctness effect was present in the syntactic task [*F*(1,14) = 19.84, *p <* 0.002] but not in the physical task (*F <* 2). Also, for the sad mood condition, a correctness effect was present only in the syntactic task [*F*(1,14) = 7.02, *p <* 0.02] and not in the physical task (*F <* 1).



TABLE 2 | Mean error percentages.


For the error data, a main effect of task [*F*(1,28) = 53.00, *p <* 0.001] reflected that participants made more errors in the syntactic task than the physical task. No main effects of correctness or mood (*F*s *<* 4), or interactions with these factors (*F*s *<* 2) were found.

# Event-Related Potentials

Based on interactions between mood, task and correctness (see below), the waveforms are presented separately for the two mood conditions (happy vs. sad mood) and the two task conditions (syntactic vs. physical task). The grand mean ERPs to the critical verbs for the happy mood condition for the syntactic task and the physical task are presented in **Figures 2** and **3,** respectively. The grand mean ERPs for the sad mood condition for the syntactic and physical task are presented in **Figures 4** and **5,** respectively.

As the Figures show, the critical verbs elicited an early ERP response that is characteristic for visual stimuli, namely an N1 and a P2, which was preceded by a P1 at occipital sites. These early components were followed by a broad negative wave in the 250–500 ms epoch, peaking at about 400 ms, the N400. The N400 is elicited by each open class word (e.g., Kutas and Van Petten, 1994). The most distinguishing feature of the waveforms was a slow positive shift starting at about 500 ms and extending up to 1000 ms, which was largest at centroposterior sites. This positivity resembles the P600 elicited by syntactic anomalies in terms of its timing and scalp distribution (e.g., Osterhout and Holcomb, 1992; Hagoort et al., 1993). In all conditions, the P600 seemed to be modulated by syntactic correctness, with more positive amplitudes to incorrect verbs than to correct verbs. Visual inspection of the waveforms for the syntactic and the physical task for the two mood conditions suggests (a) that the P600 effect was most prominent for the happy mood condition in the syntactic task and reduced in all other conditions; and (b) the presence of a small N400 effect (i.e., more negative going amplitudes to incorrect verbs than to correct verbs) for the sad mood condition in the syntactic task (see **Figure 4**) and absence of an N400 effect in all the other conditions.

### P600 Window (600–800 ms)

The percentage of trials excluded from the analyses because of (eye-)movement artifacts was 4.78%. The aim of this study was to examine the combined effects of emotional state and attention on the standard P600 effect to syntactic anomalies. A prerequisite for assessing the effects of both factors on the P600 effect, therefore, is that an effect of correctness is obtained. The time-course analyses revealed correctness effects from 500 up to 1000 ms after critical word onset (*F*s *>* 9). The largest correctness effects were found in the 600–700 and 700–800 ms time windows (*F*s *>* 35). Therefore, the 600–800 ms time window measured from critical word onset was used to capture P600 effects.

Main effects of correctness were found for the midline [*F*(1,30) = 50.13, *p <* 0.001] and the lateral sites [*F*(1,30) = 39.75, *p <* 0.001]. These effects reflected that mean amplitudes were overall more positive for incorrect verbs than for correct verbs. No significant main effect of the factor mood was obtained (*F*s *<* 1). Main effects of task [midline *F*(1,30) = 33.23, *p <* 0.001; lateral *F*(1,30) = 22.70, *p <* 0.001] reflected overall more positive amplitudes in the syntactic task than in the physical task. An interaction between mood, task, correctness and site was obtained for the midline sites [*F*(1,30) = 4.24, *p <* 0.02]. The omnibus ANOVA including all lateral sites did not yield an interaction of mood, task and correctness or interactions of these factors with site and/or hemisphere (*F*s *<* 3). To test for reliable interactions between mood, task and correctness, region of interest (ROI) analyses were performed for all centroparietal lateral sites that typically yield P600 effects (i.e., C3, CP5, CP1, P7, P3, and O1 for the left hemisphere and C4, CP6, CP2, P8, P4, and O2 for the right hemisphere). The ROI analyses revealed an interaction between mood, task, correctness, hemisphere and site [*F*(1,30) = 3.06, *p <* 0.03]. Based on these interactions separate analyses for the two levels of mood and for the two levels of task were performed for the midline sites and for the lateral centroparietal sites.

# *Happy mood condition: interplay between task and correctness*

A main effect of correctness was obtained [midline: *F*(1,15) = 43.27, *p <* 0.001; ROI: *F*(1,15) = 52.51, *p <* 0.001], reflecting a larger mean P600 amplitude to the syntactically incorrect verbs than to the correct verbs. Main effects of task [midline: *F*(1,15) = 11.79, *p <* 0.005; ROI: *F*(1,15) = 15.51, *p <* 0.002] revealed that mean P600 amplitude was larger for the syntactic task than for the physical task. The analyses yielded correctness by site interactions [midline: *F*(1,15) = 30.47, *p <* 0.001; ROI: *F*(1,15) = 22.26, *p <* 0.001] and task by correctness by site interactions [midline: *F*(1,15) = 5.86, *p <* 0.01; ROI: *F*(1,15) = 3.24, *p <* 0.05]. The latter interactions reflected a larger correctness effect in the syntactic task [midline: *F*(1,15) = 32.31, *p <* 0.001; ROI: *F*(1,15) = 48.32, *p <* 0.001] than in the physical task [midline: *F*(1,15) = 7.07, *p <* 0.02; ROI: *F*(1,15) = 10.48, *p <* 0.007]. Furthermore, the interaction revealed the presence of a correctness by site interaction in the syntactic task [midline: *F*(1,15) = 30.35, *p <* 0.001; ROI: *F*(1,15) = 21.92, *p <* 0.001], but absence of this interaction in the physical task (*F*s *<* 2). Follow-up single-site analyses were performed separately for the syntactic task. These analyses revealed P600 effects at centroposterior midline sites (Cz, Pz, and Oz: *p*s *<* 0.002) and for all centroposterior lateral sites (*p*s *<* 0.03).

# *Sad mood condition: interplay between task and correctness*

Main effects of correctness were obtained [midline: *F*(1,15) = 13.72, *p <* 0.003; ROI: *F*(1,15) = 18.49, *p <* 0.002]. These effects reflected that mean amplitude was larger for the syntactically incorrect verbs than for the correct verbs. Also, a main effect of task [midline: *F*(1,15) = 24.71, *p <* 0.001; ROI: *F*(1,15) = 31.24, *p <* 0.001] reflected overall larger P600 amplitudes for the syntactic task than for the physical task. The analysis yielded task by site interactions [midline: *F*(1,15) = 6.65, *p <* 0.007; ROI: *F*(1,15) = 10.51, *p <* 0.002] and correctness by site interactions [midline: *F*(1,15) = 5.01, *p <* 0.02; ROI: *F*(1,15) = 8.32, *p <* 0.003]. To determine the topography of the P600 effects for the two tasks follow-up analyses were performed for the midline and for centroposterior lateral sites separately for the syntactic and the physical task. For the syntactic task a P600 effect was present at three midline sites (Cz, Pz, and Oz, *p*s *<* 0.04) and bilateral centroposterior sites (C3, Cp1, P3, O1, Cp2, P4, and O2, *p*s *<* 0.04). In the physical task a P600 effect was present at posterior midline sites (Pz and Oz, *p*s *<* 0.04) and bilateral centroposterior sites (C3, Cp5, Cp1, P7, P3, Cp2, P8, and P4, *p*s *<* 0.04).

### Comparison of the Size of the P600 Effects Across Conditions

To shed light on the nature of the interaction between mood, task and correctness, the size of the P600 effects was compared across conditions. These analyses were carried out on difference scores (incorrect–correct) that directly represented the size of the P600 effect at each site. First we tested for differences in the size of the P600 effects between the two emotional states and second, we tested for differences in the size of the P600 effects between the two tasks.

# *Comparison of the size of the P600 effects as a function of mood*

The midline analysis yielded an interaction between mood, task and site [*F*(1,30) = 4.85, *p <* 0.01]. The three-way interaction indicated the presence of a mood by site interaction in the syntactic task [*F*(1,30) = 6.79, *p <* 0.002] and absence of this interaction in the physical task (*F <* 1). Follow-up analyses revealed that in the syntactic task for Pz the P600 effect was significantly larger for the happy mood condition than for the sad mood condition [*t*(30) = 2.94, *p <* 0.01].

The ROI analysis yielded a trend toward an interaction between mood, task, hemisphere and site [*F*(1,30) = 2.36, *p* = 0.07]. This trend is probably caused by larger P600 effects in the syntactic task for the happy mood condition than for the sad mood condition at centroparietal sites (see **Figure 6**). In line with this, the difference scores for the centroposterior sites disclosed that in the syntactic task the P600 effect was significantly larger for the happy than the sad mood condition at two lateral sites (Cp1 and P3; *p*s *<* 0.05).

# *Comparison of the size of the P600 effects as a function of task*

Based on the interaction between mood, task and site [*F*(1,30) = 4.85, *p <* 0.01] reported above for the midline the analyses were performed separately for the two levels of mood. In the happy mood condition, an interaction between task and site was obtained [*F*(1,15) = 5.86, *p <* 0.01]. Follow-up analyses disclosed that the P600 effect at Pz was significantly larger in the syntactic task than in the physical task [*t*(30) = 3.10, *p <* 0.008; see **Figure 7**]. In contrast, in the sad mood condition, no task by site interaction was obtained (*F <* 2).

#### Correlation Analyses

To test whether modulations in P600 amplitude in the two mood conditions are accompanied by changes in emotional state, Pearson correlations were calculated between the size of the P600 effect and the mean mood rating (computed over four mood ratings per participant). The size of the P600 effect was computed by subtracting P600 amplitude to the syntactically correct verbs from that to the syntactically incorrect verbs; this difference score was computed for the centroparietal midline electrodes (Cz and Pz) and for the subset of lateral centroparietal electrodes (Cp5, Cp1, P3, Cp6, Cp2, and P4) showing the strongest P600 effects.

These analyses revealed significant correlations between the size of the P600 effect in the syntactic task and the mood ratings for both centroposterior midline electrodes, Cz (*p <* 0.05) and Pz (*p <* 0.001), and for five of the centroposterior lateral electrodes: Cp5, Cp1, Cp2, P3, and P4 (*p*s *<* 0.02). These correlations indicated that the happier the mood, the larger the P600 effect and likewise, the sadder the mood, the smaller the P600 effect. With correlations ranging from 0.35 to 0.60, roughly 12% to 36% of the variation in the size of the P600 effect is accompanied by variations in emotional state.

To test whether modulations in P600 amplitude in the two mood conditions are accompanied by changes in attention, Pearson correlations were calculated between the size of the P600 effect and the mean attention rating (computed over four attention ratings per participant). Correlation analyses collapsed across the two levels of mood revealed a correlation between the size of the P600 effect in the syntactic task and the attention ratings for one occipital site (Oz: *p <* 0.02). This correlation indicated that the more attention is paid to the syntactic features, the larger the P600 effect at Oz and likewise, the less attention is paid to the syntactic features of the stimuli, the smaller the P600 effect. With a correlation of 0.42, 18% of the variation in the size of the P600 effect is accompanied by variations in the amount of attention paid to the syntactic structure of the sentences.
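The shared-variance percentages quoted in this section follow from squaring the reported correlation coefficients. As a quick check of that arithmetic, using only the values given in the text (0.35 and 0.60 for the mood ratings, 0.42 for the attention ratings):

$$
r^2 = 0.35^2 \approx 0.12, \qquad r^2 = 0.60^2 = 0.36, \qquad r^2 = 0.42^2 \approx 0.18
$$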

#### Early Attentional Factors

To check for early effects of attention, supplementary analyses were performed for the P1 (125–175 ms time window) and N1 (175–225 ms time window) components.

### P1

For the P1 component, no main effects of correctness were present for the midline sites or for the lateral sites (*F*s *<* 2). For the midline sites, a correctness by mood interaction was obtained [*F*(1,30) = 5.42, *p <* 0.03], reflecting that a correctness effect (more positive amplitudes to incorrect than correct verbs) was present for the happy mood condition [*F*(1,15) = 5.60, *p <* 0.04], but not for the sad mood condition (*F <* 1). For the lateral sites, a three-way interaction between mood, correctness and site was obtained [*F*(1,30) = 3.20, *p <* 0.02]. The interaction reflected that a correctness by site interaction was present for the happy mood condition [*F*(1,15) = 4.45, *p <* 0.05], but not for the sad mood condition (*F <* 1). The correctness by site interaction as found for the happy mood condition reflected that a correctness effect was present at a few centroposterior, posterior and occipital sites.

### N1

For the N1 component, no main effects of correctness were found for the midline and lateral sites (*F*s *<* 2). Main effects of task were present both for the midline sites [*F*(1,30) = 11.61, *p <* 0.003] and the lateral sites [*F*(1,30) = 7.45, *p <* 0.02]. These effects reflected more negative amplitudes for the syntactic task than for the physical task. For the midline, a three-way interaction between mood, task and correctness [*F*(1,30) = 5.14, *p <* 0.04] was obtained, reflecting the presence of a two-way interaction between task and correctness for the sad mood condition [*F*(1,15) = 8.24, *p <* 0.02], but not for the happy mood condition (*F <* 1). The task by correctness interaction for the sad mood condition showed that the correctness effect was numerically larger, though not significant, in the syntactic task (*F <* 5) than in the physical task (*F <* 4).

# Discussion

In several fields of psychology it has been shown that a person's emotional state influences the way in which information is processed (for a review, see Clore and Huntsinger, 2007). For instance, it has been shown that a positive compared to a neutral or negative mood facilitates creative problem-solving (e.g., Greene and Noice, 1988), stereotyping (e.g., Fiedler and Walther, 2004) and recalling materials from memory (e.g., Isen et al., 1978). Because of the classic view that language processing occurs in isolation from other cognitive systems, like perception, motor action, and emotion, the interplay between emotion and language has only recently been investigated. The few ERP studies that explored the effects of mood on semantics revealed that mood affects the processing of word meaning as tapped by N400 (Federmeier et al., 2001; Chwilla et al., 2011; Egidi and Nusbaum, 2012; Pinheiro et al., 2013). The reported interactions between mood and semantic processing support interactive theories of language (e.g., Glenberg, 1997; Barsalou, 1999) and present a challenge for modular theories of language comprehension (Fodor, 1983). The effect of emotional state on syntactic processing is more controversial: one study reported a mood-related modulation in syntactic processing as reflected by P600 (Vissers et al., 2010), while two other studies did not observe an effect (Jiménez-Ortega et al., 2012; Van Berkum et al., 2013). While the absence of an effect of emotional state on syntax seems to fit well with the view that syntactic processing is of a modular nature, the finding of a mood by syntax interaction calls this view into question.

The main goals of the present article were as follows: the first goal was to clarify whether and if so how emotional state affects syntactic processing. The second goal was to shed light on the underlying mechanisms by separating possible effects of mood from those of attention on syntactic processing. To these aims, we manipulated attention next to the emotional state of the participants and investigated the joint effects of these two factors on the processing of syntactic anomalies. Different emotional states (happy mood vs. sad mood) were induced and prolonged by presenting film clips before and between task blocks. Attention was manipulated by task demands. Participants were asked to exclusively pay attention to the syntactic well-formedness of the sentences (syntactic task), or to purely physical features of the words of the sentences (physical task).

A necessary condition for investigating the relationship between mood, attention and the processing of syntactic anomalies as reflected by P600 is that the mood induction was successful. The analyses of the mood ratings show that this was the case: participants were in a significantly happier mood after watching the happy film clips and in a significantly sadder mood after watching the sad film clips (see **Figure 1**).

With these data in hand we can address the questions whether emotional state affects the P600 effect and whether attention influences the mood-related modulation of P600. The main ERP results were as follows: as predicted a standard P600 effect was elicited by subject–verb agreement errors, across mood and task conditions. More importantly, interactions between mood, attention, and correctness were obtained for the midline and the lateral centroposterior sites that typically show largest P600 effects to syntactic violations. The interactions reflected a modulation of P600 as a function of both emotional state and attention (see **Figure 8**).

# Influence of Emotion and Attention on P600

Let us first describe the influence of emotional state on the P600 effect. Emotional state only affected P600 in the syntactic task and not in the physical task. In particular, a larger P600 effect was found in the happy than in the sad mood condition. This was supported by the results of the difference scores analyses and correlation analyses. The fact that emotional state only affected P600 in the syntactic task – and not in the physical task – seems to indicate that a necessary condition for effects of emotional state on syntactic processing to occur is that (some) attention is directed at the syntactic level. Apparently, the effects of mood on syntactic processing are influenced by attentional demands.

We will now examine the influence of attention on the syntactic correctness effect on P600. The focus of attention only had an impact on the P600 in the happy mood condition and not in the sad mood condition. Specifically, for happy mood a reduction in P600 effect was found for the physical compared to the syntactic task. This was supported by the results of the difference scores analyses and correlation analyses. As stated above a task-related modulation of the P600 effect was only present in the happy mood condition, and not in the sad mood condition. In other words, while a standard P600 effect occurred in the syntactic and the physical task, directing attention at syntactic vs. physical features only affected syntactic processing – as reflected by changes in P600 amplitude – when participants were in a happy mood.
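The difference-score logic behind these two comparisons can be made concrete with a small sketch. The P600 effect in each mood by task cell is the amplitude to incorrect verbs minus the amplitude to correct verbs, and the two contrasts then compare those effects across mood and across task. The amplitude values below are placeholders arranged only to mimic the qualitative direction described above; they are not the study's data, and the function names are ours.

```python
# Placeholder mean amplitudes in microvolts, chosen only to show the
# arithmetic. They are NOT the study's data.
amplitudes = {
    ("happy", "syntactic"): {"incorrect": 6.0, "correct": 2.0},
    ("sad", "syntactic"): {"incorrect": 4.0, "correct": 2.0},
    ("happy", "physical"): {"incorrect": 4.0, "correct": 2.0},
    ("sad", "physical"): {"incorrect": 4.0, "correct": 2.0},
}


def p600_effect(cell):
    """Difference score: incorrect minus correct amplitude."""
    return cell["incorrect"] - cell["correct"]


effects = {key: p600_effect(cell) for key, cell in amplitudes.items()}


def mood_contrast(task):
    """Happy minus sad P600 effect within one task."""
    return effects[("happy", task)] - effects[("sad", task)]


def task_contrast(mood):
    """Syntactic minus physical P600 effect within one mood."""
    return effects[(mood, "syntactic")] - effects[(mood, "physical")]


for task in ("syntactic", "physical"):
    print(f"mood contrast, {task} task: {mood_contrast(task):+.1f}")
for mood in ("happy", "sad"):
    print(f"task contrast, {mood} mood: {task_contrast(mood):+.1f}")
```

With these placeholder values a P600 effect is present in every cell, yet the mood contrast is non-zero only in the syntactic task and the task contrast is non-zero only in the happy mood, which is the shape of a three-way interaction between mood, task and correctness.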

Let us summarize the main results in relation to the goals of the present article. The first goal of this study was to clarify whether emotional state has an effect on syntactic processing. As noted above, the findings in the literature are controversial. While Vissers et al. (2010) reported a mood-related modulation of the P600 effect to subject–verb agreement errors, Jiménez-Ortega et al. (2012) and Van Berkum et al. (2013) did not find an effect of mood on the processing of syntactic anomalies. The mood-related modulation of the P600 effect to syntactic anomalies reported in the present study is consistent with the results of Vissers and colleagues. Therefore, one major finding of this article is that we replicated an immediate effect of emotional state on the processing of syntactic anomalies. From this we conclude that syntactic processes – contrary to what has been proposed (Van Berkum et al., 2013) – are affected by changes in emotional state. This means that the effects of emotional state on language comprehension are not limited to semantic processing but also involve syntactic processing.

The second goal of this study was to separate effects of mood on syntactic processing from those of attention. Phrased differently, what role do more general factors like attention play in the mood-related modulation of the syntactic P600 effect? A comparison of the present P600 results with those of Vissers et al. (2010) helps to answer this question. Directing the attention of the participants in the present study modulated the mood-related P600 effect. In particular, directing attention to syntactic features diminished the effect of mood on the P600 effect compared to the previous study, in which participants read for comprehension. In the latter study the P600 effect was strongly reduced in the sad mood condition (i.e., only present at two lateral sites). In contrast, in the present study a broadly distributed P600 effect was found in the syntactic task for the sad mood condition (i.e., present at three midline and eight centroparietal lateral sites). The P600 results reveal that directing attention to the syntactic level reduced the immediate impact of emotional state on the processing of syntactic anomalies. However, most important for the present purposes, the effect of emotional state was not abolished: as **Figure 6** illustrates, the P600 effect was smaller for the sad mood than for the happy mood condition. Additionally, the focus of attention modulated the effect of emotional state on the processing of syntactic anomalies, in that no effect of emotional state was observed in the physical task. From this we draw the conclusion that attention plays a modulating role in the effect of emotional state on syntactic processing. As pointed out above, emotional state also modulated the effect of attention on the processing of syntactic anomalies. This is apparent from the fact that the task manipulation only affected the P600 effect in the happy mood condition and not in the sad mood condition. To conclude, the three-way interaction indicates a reciprocal influence of attention on emotion and of emotion on attention. Clearly, future work is needed to further our understanding of the interplay of attention and emotion in language comprehension.

The fact that more general non-linguistic factors like emotion and attention impact language processing, in particular syntactic processing, supports interactive theories of language (e.g., McDonald et al., 1994; Trueswell and Tanenhaus, 1994; Barsalou, 2008) and challenges modular views according to which language processes operate in isolation from other cognitive and linguistic sub-systems (see e.g., Forster, 1979; Fodor, 1983). The present ERP data accord well with the results of fMRI studies that have indicated that emotion and cognition (memory, attention, language) strongly interact in the brain (e.g., Mitchell and Phillips, 2007; Pessoa, 2008). Brain regions that are often associated with cognitive processing, such as the lateral prefrontal cortex, have now been shown to be strongly involved in both affective and cognitive function (i.e., Gray et al., 2002). Moreover, these fMRI results have been taken to indicate that brain regions previously viewed as purely affective, like the amygdala, hypothalamus and anterior cingulate cortex, are among the most highly connected regions of the brain and might function as important connectivity hubs (Pessoa, 2008). Where these fMRI studies disclose that the neural correlates of emotion and cognition should be viewed as interactive, the present ERP results reveal an immediate interaction of emotional state and attention on the processing of syntactic anomalies. Before turning to possible mechanisms that give rise to emotion by language interactions for P600 we will now look at effects of attention and mood on early ERP effects.

# Early ERP Effects of Attention and/or Emotional State on Syntactic Processing

The attention manipulation gave rise to early ERP effects. Main effects of task were present for N1, reflecting more negative amplitudes for the syntactic task than for the physical task. It has been well established that directing attention to some aspect of the environment leads to an enhancement of N1 amplitude (Hillyard and Anllo-Vento, 1998). In line with this, the larger N1 amplitude for the syntactic task is taken to indicate that participants paid more attention to the critical verbs in the syntactic task than in the physical task. This difference could be influenced by the task manipulation itself, because participants were asked to focus their attention on the syntactic features of the sentences in the syntactic task and on the physical features in the physical task. Note that the fact that only in the syntactic task the critical verb was task-relevant could also play a role. In order to avoid a double violation (syntactic violation plus different letter size), in the physical task only words occurring in the filler sentences were task-relevant (i.e., printed in a smaller font; see below).

The mood manipulation gave rise to early effects in the P1 window. In particular, interactions between mood and correctness were found, reflecting the presence of a correctness effect in the happy mood condition and absence of a correctness effect in the sad mood condition. Early effects of emotional meaning on P1 have been reported before (see Scott et al., 2009; Bayer et al., 2012). These effects were taken to indicate that emotion affects early stages of processing. The present results may be taken to suggest that the early emotion effects are not restricted to emotional language but generalize to neutral language, as emotionally neutral words were used in the present study. However, future studies on the effects of emotional state under attended and unattended conditions are required to understand the functional significance of these early effects of attention and emotion on the N1 and P1 component, respectively.

# Possible Mechanism(s) Behind the Emotion by Language Interactions

What could be the mechanism(s) underlying the effect of emotional state on syntactic language processing? In the introduction three possible explanations for the mood-related P600 modulation have been presented. According to one scenario emotional state influences syntactic processing. Several syntactic manipulations have been shown to elicit a P600. The P600 effect has been taken to reflect processes of syntactic reanalysis (e.g., Friederici, 1995; see for a more general reanalysis account of the P600: Kolk and Chwilla, 2007), syntactic processing *per se* (Hagoort et al., 1993), or syntactic integration difficulty (Kaan et al., 2000). On a syntactic account, the decrease in P600 effect in the sad compared to the happy mood condition could reflect reduced syntactic processing. Alternatively, happy mood could lead to an enhancement of syntactic processing (for a further discussion of the syntactic account, see below). The role of syntactic processing in bringing about effects of emotional state on sentence processing could be explored in future studies by varying syntactic factors, for instance syntactic complexity, alongside emotional state.

According to a second scenario mood might influence syntactic processing via more general factors such as attention or motivation (Vissers et al., 2010, 2013; Chwilla et al., 2011). Regarding a possible mediating role of attention, this is the first study that directly assessed the joint effects of attention and emotional state on the processing of syntactic anomalies. The present P600 results show that attention plays a modulating role in the mood-related P600 effect: directing attention to the syntactic features of the sentence reduced the influence of emotional state on syntactic processing. Importantly, however, it did not abolish the effect – that is, the P600 effect was smaller for the sad mood than for the happy mood condition. From this we can conclude that there is a genuine effect of emotional state on the syntactic P600 effect that cannot be accounted for by attention. Interpreting the interaction between mood and syntactic processing in terms of attention and/or motivation lines up with a language processing model (MRC hypothesis) proposed by Brouwer et al. (2012; see Brouwer and Hoeks, 2013, for an extended discussion of this hypothesis). According to these authors, P600 reflects word-by-word construction or updating mental representations of what is being read. P600s to syntactic anomalies can thus be taken to reflect increased effort in integrating the critical word with its prior context to form a coherent representation. On this account, the broadly distributed P600 effect for the happy mood condition could reflect a strong effort to syntactically integrate words, while the reduction in P600 effect for the sad mood condition could reflect a reduced integration effort. Put shortly, happy participants could either pay more attention to syntactic features or could be more highly motivated to process syntactic information, which could be reflected by an increase in P600.

Consistent with an explanation in terms of motivation, Van Berkum et al. (2013) found an influence of emotional state on the processing of verb-based expectancies. In particular, they investigated the effect of mood on the anticipation of referents during discourse comprehension. They presented participants with sentences in which a pronoun was highly expected ("Sarah feared Joe because he*...*") vs. sentences in which a pronoun was not highly expected ("Joe feared Sarah because he*...*"). The authors found that mood affected referential anticipation. People in a happy mood did anticipate referential information, whereas people in a sad mood did not anticipate information about a specific person. Relevant for the present discussion, as stated above in the same study Van Berkum et al. (2013) did not find an effect of mood on the processing of syntactic agreement violations. They proposed a bio-energetic explanation to account for the presence of a mood effect on the processing of verb-based expectancies vs. an absence of a mood effect on syntactic parsing. According to this explanation, mood has an influence on how willing people are to invest in costly, exploratory behavior, such as referential anticipation. On this account, people in a sad mood would be less motivated to invest in exploratory processing than people in a happy mood. For people in a sad mood, the benefits of exploratory processing would not outweigh the perceived bioenergetics costs. The authors argue that referential anticipation requires greater mental effort than syntactic parsing. In this way, they explain that emotional state does have an influence on referential anticipation but not on syntactic processing.

A third scenario is that a person's emotional state influences the use of heuristics. On this account, the decrease in P600 effect for the sad mood compared to happy mood reflects that people in a happy mood rely more on heuristic processing than people in a sad mood. Heuristics are very effective in extracting meaning and allow people to quickly solve problems and make judgments (Ferreira, 2003). The proposal that mood influences the use of heuristics fits well with the notion of mood-dependent processing styles. In happy mood, people are inclined to rely more on heuristic processing than people in a sad mood (Clore and Huntsinger, 2007). When using heuristics, people base their interpretation on a "good-enough" interpretation of the information. In other words, people do not take all information into account, but settle for a representation of the input that fits with their expectation based on their world knowledge. Ferreira (2003) and Ferreira and Patson (2007) have claimed that current models of language are missing an architectural component that explains cases in which heuristic processing is engaged. Relevant in this context, it has been shown that P600 is sensitive to heuristic factors (e.g., Coulson et al., 1998a; Vissers et al., 2007). This is indicated by the fact that semantic reversal anomalies like "The cat that fled from the mice" elicit a P600 effect compared to the event expected on the basis of world knowledge, "The mice that fled from the cat" (Kolk et al., 2003; Kim and Osterhout, 2005; Van Herten et al., 2005). Given that semantic reversals are syntactically unambiguous, they allow an assessment of the contribution of heuristic processing to the mood-related P600 modulation. Therefore, Vissers et al. (2013) investigated the effect of emotional state on the processing of semantic reversal anomalies. The main result was that for P600, a mood by semantic plausibility interaction was obtained. The interaction reflected a widely distributed P600 effect for the happy mood condition vs. absence of a P600 effect for the sad mood condition. Because semantic reversal anomalies are syntactically unambiguous, the P600 modulation by mood cannot be explained by syntactic factors (Scenario 1).
A direct statistical comparison of the emotion effect on the two kinds of anomaly revealed that the effect of mood, as reflected by modulations in P600, on the processing of semantic reversal anomalies was similar to the effect of mood on the processing of subject–verb agreement errors. Taken together, the results of Vissers et al. (2007, 2013) support the claim that heuristics play an important role in the mood by language interactions. Based on the assumption that language users expect to read syntactically correct sentences (Coulson et al., 1998a,b; Vissers et al., 2010, 2013), the increase in P600 effect in happy mood reported in the present article could thus reflect an increased use of heuristics, whereas the reduction of P600 effect in sad mood could reflect a reduced use of heuristics. Note that the mood manipulation only led to a reduction in P600 for sad as compared to happy mood when attention was directed at syntactic features (and not in the physical task), which seems to fit well with a heuristic account of the interplay between mood and syntactic processing. After all, if people expect sentences to be syntactically correct, heuristics mainly play a role in the syntactic condition and not in the physical condition.

# Caveats

The above claims can only be made if it can be shown that the observed differences in ERP pattern between emotional states and task conditions could not be attributed to other factors.

One point to discuss regarding the design of the present study, is that to avoid a double violation on the critical verb (i.e., an incorrect verb that was in a deviant letter size), the change in letter size (lowercase instead of uppercase) was only present in the filler sentences. The reason for this was that physically deviant words have been shown to elicit a P3b, a positive component with an average latency of 300 ms (see e.g., Kutas and Hillyard, 1980; Donchin, 1981). Relevant in this context, Arbel et al. (2011) investigated the influence of a double violation on the N400 and P3b. For words that were both semantically and physically deviant, a smaller P3b and a larger N400 was elicited than for words that only contained a physical violation or a semantic violation, respectively. Arbel et al. (2011) propose that this modulation of P3b and N400 effects reflected that either attentional resources were allocated to the semantic rather than the physical characteristics of the stimuli, or that fewer resources were allocated for the processing of the physical characteristics because these were task-irrelevant. In the present study, we ruled out that a physical deviation on the critical verb would result in extra attention for the syntactic correctness of the sentence. By manipulating the letter size only in the filler sentences, we made sure that the physical task manipulation indeed resulted in less attention for the syntactic structure of the sentence. This, however, leads to a difference in the present study between tasks. While in the syntactic task the critical verb was task-relevant, the same set of critical words in the physical task was not task-relevant. This likely has affected the P600 in the present study across tasks, given that the P600, like the P3b, is sensitive to task demands with larger P600 amplitudes to task-relevant stimuli (e.g., Donchin, 1981). 
The main effect of task observed in the present study could therefore partly be due to these differences in task relevance of the critical verb in the syntactic vs. physical task. However, most important for the present purposes, this difference between tasks does not affect the within-task comparison of the ERP patterns across the two mood conditions.

Another consideration is that task difficulty was not controlled across tasks. This resulted in differences in overall RTs and errors between the syntactic and the physical task. There was an increase in both RTs and errors in the syntactic task as compared to the physical task. The behavioral data show that task difficulty was not matched between tasks. Could this difference in task difficulty – as reflected by differences in RTs – explain the present ERP results? It is important to point out that a different pattern was observed for the P600- and the RT-measure. While P600 amplitude across mood conditions was reduced for correct as compared to syntactically incorrect verbs, overall RTs were longer to correct than incorrect verbs. This speaks against an explanation of the present P600 pattern in terms of RT. A similar increase in RT for correct as compared to incorrect verbs has been reported by Kolk et al. (2003) and Vissers et al. (2007). In line with these authors we propose that the difference in RTs in the present study could be explained by the fact that in the case of the incorrect sentences, participants already knew at the critical verb that the sentences were incorrect. In contrast, in the case of the correct sentences they had to wait until they had read the last word of the sentence before they could know for sure that the sentence was correct. Mood by task by syntactic correctness interactions were found for RT and P600. Closer inspection shows that the three-way interactions reflect different patterns for the two measures. For RT the interaction indicates that the correctness effect (an increase in RT for correct compared to incorrect sentences) in the syntactic task was larger for happy mood than for sad mood. Note that for the physical task, no correctness effect was found for RT, neither for the happy mood nor for the sad mood condition. In contrast, for P600 the three-way interaction reveals another picture. 
Importantly, for P600 correctness effects (smaller P600 amplitude for syntactically correct than incorrect verbs) were present both in the syntactic task and in the physical task as well as across mood conditions. Thus, although mood by task by correctness interactions were obtained for RT and P600, the underlying data patterns are different. Last but not least, we would like to point to another essential difference between behavioral and electrophysiological measures. ERPs track the language processes of interest online, time-locked to the critical verb. In contrast, the behavioral data are measured offline after the sentence-final word. The mean difference in time between the onset of the critical verb and the offset of the sentence-final word was 1666 ms. The behavioral response, thus, followed the online ERP response to the critical verb by more than 1.5 s. Based on this difference in the timing of the behavioral and ERP response – and most importantly based on the arguments presented above – we consider it highly unlikely that the differences in ERPs between conditions observed in the present study can be attributed in a simple way to differences in RTs.

# Conclusion

To our knowledge this is the first ERP study that demonstrates that emotional state and attention have interactive effects on language comprehension, in particular on the processing of syntactic anomalies. The present ERP results demonstrate that emotional state modulates the P600 effect to syntactic agreement errors. The major novel finding from the present study is that more general factors like attention play a role in the mood-related modulation of the processing of syntactic anomalies, as reflected by P600. Directing attention to the syntactic level reduced the impact of emotional state on the processing of syntactic anomalies. However, the effect of emotional state was not abolished as evident from the fact that, as in the Vissers et al. (2010) study, the P600 effect was smaller for the sad mood as compared to the happy mood condition. Also, emotional state modulated the effect that attention has on language processing. This is indicated by the fact that the task manipulation only affected the syntactic P600 effect when participants were in a happy mood and not when they were in a sad mood. That emotion and attention interact with syntactic processing supports interactive views of language and further challenges modular theories of language comprehension.

Exploration of the relationship between emotion, attention and processes of language comprehension is still in its infancy. The challenge for future studies is to shed light on the different mechanisms that mediate the influence of emotional state and focus of attention on syntactic processing. Future studies will have to take into account that attention can play a modulating role in the interplay between language and emotion.

# Acknowledgments

This research was supported by the Donders Centre of Cognition. DC and MV designed the experiment. MV and JT conducted the experiments. MV, DC and CV analyzed the data. MV, DC and CV wrote the paper. We thank Daniel Fitzgerald for enabling us to use the MIP and the ERG group for their technical support, in particular Pascal de Water and Jurjen van der Helden. We are grateful to Uli Chwilla for preparing the figures.

# References

Fodor, J. (1983). *Modularity of Mind*. Cambridge, MA: MIT Press.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Verhees, Chwilla, Tromp and Vissers. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Dynamic Effects of Self-Relevance and Task on the Neural Processing of Emotional Words in Context

*Eric C. Fields<sup>1,2</sup>\* and Gina R. Kuperberg<sup>1,2</sup>*

*<sup>1</sup> Department of Psychology, Tufts University, Medford, MA, USA, <sup>2</sup> Athinoula A. Martinos Center for Biomedical Imaging and Department of Psychiatry, Massachusetts General Hospital, Charlestown, MA, USA*

We used event-related potentials (ERPs) to examine the interactions between task, emotion, and contextual self-relevance on processing words in social vignettes. Participants read scenarios that were in either third person (other-relevant) or second person (self-relevant) and we recorded ERPs to a neutral, pleasant, or unpleasant critical word. In a previously reported study (Fields and Kuperberg, 2012) with these stimuli, participants were tasked with producing a third sentence continuing the scenario. We observed a larger LPC to emotional words than neutral words in both the self-relevant and other-relevant scenarios, but this effect was smaller in the self-relevant scenarios because the LPC was larger on the neutral words (i.e., a larger LPC to self-relevant than other-relevant neutral words). In the present work, participants simply answered comprehension questions that did not refer to the emotional aspects of the scenario. Here we observed quite a different pattern of interaction between self-relevance and emotion: the LPC was larger to emotional vs. neutral words in the self-relevant scenarios only, and there was no effect of self-relevance on neutral words. Taken together, these findings suggest that the LPC reflects a dynamic interaction between specific task demands, the emotional properties of a stimulus, and contextual self-relevance. We conclude by discussing implications and future directions for a functional theory of the emotional LPC.

Keywords: emotion, ERP, language, late positive potential (LPP), late positive component (LPC), self-relevance, perspective, task

# INTRODUCTION

Emotions have been described as "relevance detectors" (Frijda, 1986): if something in the environment is detected as being emotionally valenced or arousing, this indicates that it requires attention and further evaluation. Intuitively, however, what seems salient and worthy of further evaluation is influenced not only by the inherent properties of a particular emotional stimulus, but also by its perceived relevance to the comprehender given her current situation and goals. For example, when presented in isolation, the word "gun" may have negative connotations for some people, but positive connotations for others.<sup>1</sup> However, almost anyone will find a gun pointed

#### *Edited by:*

*Cornelia Herbert, University of Ulm, Germany*

#### *Reviewed by:*

*Yang Zhang, University of Minnesota, USA Anna Hatzidaki, University of Athens, Greece*

#### *\*Correspondence: Eric C. Fields*

*eric.fields@tufts.edu*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 26 March 2015 Accepted: 15 December 2015 Published: 13 January 2016*

#### *Citation:*

*Fields EC and Kuperberg GR (2016) Dynamic Effects of Self-Relevance and Task on the Neural Processing of Emotional Words in Context. Front. Psychol. 6:2003. doi: 10.3389/fpsyg.2015.02003*

<sup>1</sup>It has been pointed out to us that the idea of having a generally positive view of guns may seem strange in some cultural contexts, but we can assure readers that in the U.S. gun ownership is simply a hobby for many people and, for better or worse, a generally positive view of guns is not uncommon.

at him- or herself to be a negative experience, and this will likely be evaluated differently from seeing a gun pointed at a stranger. Further, our context and goals matter: a soldier will respond differently to the sight of someone with a gun than an unarmed civilian. In the present work, we used event-related potentials (ERPs), a direct online measure of neural activity, to ask how the interaction between goals (manipulated by task demands), emotion, and self-relevance influences our allocation of neural processing resources to emotional words within simple social vignettes that referred either to the comprehender or to another protagonist.

Our focus was on an ERP component known as the late positive component (LPC). The LPC has a parietal scalp distribution, begins at around 400–500 ms from stimulus onset, and extends for several hundred milliseconds (for a review, see Hajcak et al., 2012).<sup>2</sup> It is generally larger to emotional than neutral stimuli and it is seen to both pictures (Olofsson et al., 2008; Hajcak et al., 2010) and words (Kissler et al., 2006; Citron, 2012). Its amplitude is enhanced by tasks that draw attention to emotional features of stimuli (e.g., Naumann et al., 1997; Fischler and Bradley, 2006; Schupp et al., 2007; Holt et al., 2009), and, more recently, it has become clear that task demands can also influence the sensitivity of the LPC to different dimensions of emotional stimuli (Delaney-Busch et al., in press; see also Fischler and Bradley, 2006; Bayer et al., 2012). Importantly, the LPC evoked by emotional stimuli is influenced not only by their intrinsic emotional properties, but also by the context in which they are encountered. For example, the LPC evoked by emotional words can be influenced by the local sentence or discourse contexts in which they appear (e.g., Bartholow et al., 2001; Wang et al., 2013), as well as the broader contextual environment (e.g., Crites et al., 1995; Fogel et al., 2012). While the precise neurocognitive functions indexed by the emotion-sensitive LPC are still somewhat unclear, it is thought to reflect the capture and allocation of attentional resources by motivationally significant stimuli, leading to prolonged neural processing.

In a recent study, we examined the impact of contextual *self-relevance* on the LPC evoked by emotional and neutral words as participants read short social vignettes, with the goal of producing verbal continuations for each scenario (Fields and Kuperberg, 2012). The scenarios included neutral, pleasant, or unpleasant words, which potentially changed the meaning of the entire vignette, for example: "A man knocks on Sandra's hotel room door. She sees that he has a <u>gift/tray/gun</u> in his hand." In half of the scenarios the situations were made self-relevant by changing the context to the second person (see Brunyé et al., 2009), for example: "A man knocks on your hotel room door. You see that he has a <u>gift/tray/gun</u> in his hand."

As expected, ERPs recorded on the critical words (CW; underlined above) showed a main effect of emotion with the pleasant and unpleasant words evoking a larger LPC than the neutral words. However, the interaction between emotion and self-relevance showed an unexpected but interesting result: the amplitude of the LPC evoked by neutral words was larger in the self-relevant scenarios than in the other-relevant scenarios. Self-relevance, however, had no effect on pleasant or unpleasant words.

We noted that many of our neutral scenarios could be interpreted as ambiguous in valence. Consider, for example, the scenario, "After dinner, you are involved in a discussion. Many of your remarks surprise people." Here, it may not be immediately obvious why your remarks surprised people: was it because your comments were unexpectedly good, bad, or just unusual?<sup>3</sup> Consequently, we argued that the selective effect of self-relevance on the neutral words may have been driven by participants' continued attempts to assess their emotional valence (for effects of emotional ambiguity in ERPs, including the LPC, see Hirsh and Inzlicht, 2008; Gu et al., 2010; Tritt et al., 2012). In the self-relevant scenarios, participants likely invested additional processing resources to resolve this inherent valence ambiguity because of the additional demands of constructing a continuation consistent with their self-concept (Swann, 2011),<sup>4</sup> whereas they had little motivation to go beyond the first interpretation that came to mind in the other-relevant scenarios. This difference in the demands imposed by the task in response to any valence ambiguity in the neutral scenarios was reflected by increased processing in the LPC time window.

If it is indeed the case that the particular pattern of LPC modulation observed in our previous study was driven by the interaction between emotion, self-relevance, and specific task goals, then we should see a different pattern of findings with different task demands. The aim of the current study was to determine if this was the case. To this end, a different set of participants viewed the same stimuli as we used in our previous study, but with different task requirements. Instead of producing a verbal continuation for each scenario, they read each scenario for comprehension and answered intermittent questions which encouraged deep discourse comprehension, but which did not refer specifically to the valenced aspects of the scenarios (see Holt et al., 2009; Delaney-Busch and Kuperberg, 2013; Paczynski et al., 2014; Fields and Kuperberg, 2015; Xiang and Kuperberg, 2015).

We hypothesized that, without the need to produce a specific continuation, there would be less demand on participants to interpret or disambiguate valence, and that we would therefore see a different allocation of neural processing resources, as

<sup>2</sup>Some research suggests there may be multiple, related emotion-sensitive late positivities (Delplanque et al., 2006; Foti et al., 2009; MacNamara et al., 2009; Hajcak et al., 2012; Matsuda and Nittono, 2015). In practice these are often difficult to distinguish, and indeed most widely-distributed later components of the ERP have multiple underlying neural sources (Luck, 2014). Here we use the term "late positive component" as a general term for emotion-sensitive positivities peaking after approximately 400 ms.

<sup>3</sup>As another example: "You have been in your current job for over a year. You learn that you are getting a bonus/transfer/pay-cut this month." While a bonus or pay-cut are clearly good and bad respectively, a transfer is more ambiguous in this particular context. Did you want a transfer? Is the transfer in response to good or bad performance? On closer examination, it turned out that many of the neutral scenarios were ambiguous in similar ways. See **Table 1** and the Supplementary Materials to Fields and Kuperberg (2015) for more examples.

<sup>4</sup>Independent valence ratings of these responses showed that responses to self-relevant scenarios were more positive than those to non-self-relevant scenarios, consistent with the widely observed self-positivity bias (Taylor and Brown, 1988; Alicke and Govorun, 2005; Fields and Kuperberg, 2015). Notably, this also held true specifically for neutral scenarios. These results support the assertion that participants tended to produce continuations that were consistent with their self-concept.

reflected by the LPC. Specifically, we predicted that attention and processing resources would simply be directed to the most inherently motivationally relevant stimuli, in this case the self-relevant *emotional* words. We therefore hypothesized that self-relevance would amplify the classic effect of emotion on the LPC (leading to larger differences between emotional and neutral words). Such a finding would be in line with previous studies examining the effects of self-relevance in two-word noun phrases with no overt task, which reported effects of emotion in the self-relevant condition, but not the other-relevant conditions (Herbert et al., 2011a,b), as well as with other studies reporting similar interactions between self-relevance and emotion (Li and Han, 2010; Shestyuk and Deldin, 2010; Schindler et al., 2014; see Discussion later in this manuscript).

Of course, as noted above, the idea that the LPC is influenced by task demands is not new. Indeed, some have suggested that it is related to the well-known P300 ERP component (see Discussion), which is evoked by perceptual oddball stimuli, particularly when they are task-relevant. Our aim here, however, was to understand whether and how task influences prolonged neural processing of emotional words in self-relevant contexts. Addressing this question is important because in real-world contexts the self-relevance and emotional impact of stimuli (and their interaction) will vary depending on the goals we have in a particular situation.

In order to directly compare the pattern of findings on the LPC using this comprehension task with those seen in our previous study using a production task (Fields and Kuperberg, 2012), we combined both datasets in a model in which task was analyzed as a between-subjects factor. To allow for a full and complete comparison between the studies, we report information from both studies in the Sections "Materials and Methods" and "Results" that follow. However, all methods and results for the production study are the same as those reported in Fields and Kuperberg (2012) and are reported in greater detail there.

# MATERIALS AND METHODS

# Participants

Participants were recruited from postings on a university community website (tuftslife.com). As reported in Fields and Kuperberg (2012), 29 people originally participated in the production task experiment; three participants were excluded from analysis due to excessive artifact in the EEG, leaving 26 participants in the final analysis (15 females) between the ages of 18 and 29 (*M* = 20.7, *SD* = 2.30). Twenty-eight people originally participated in the comprehension task experiment; four participants were excluded from analysis due to excessive artifact in the EEG, leaving 24 participants (17 females) between the ages of 18 and 23 (*M* = 19.3, *SD* = 1.6).<sup>5</sup> No individual participated in both experiments. All participants were right-handed native English speakers (having learned no other language before age 5) with no history of psychiatric or neurological disorders. Participants were paid for their participation and provided informed consent in accordance with the procedures of the Institutional Review Board of Tufts University.

# Stimuli

Stimuli are described in greater detail in Fields and Kuperberg (2012). Briefly, 222 sets of two-sentence scenarios were developed with Emotion (pleasant, neutral, and unpleasant) and Self-Relevance (self and other) conditions crossed in a 3 × 2 factorial design. The first sentence introduced a situation involving one or more people, only one of whom was specifically named (the protagonist). The situation was always neutral or ambiguous in valence. The second sentence continued the scenario and was the same across all emotion conditions except for one word, the CW, which was pleasant, neutral, or unpleasant. The part of speech of the CW was the same across the three Emotion conditions for each scenario: 37 of the scenarios had noun CWs, 54 had verb CWs, and 131 had adjective CWs. The named protagonist was male half the time and female the other half of the time. To create the self-relevant conditions, this named person was changed to "you" (previous work has shown that grammatical person is an effective manipulation of self-relevance: Brunyé et al., 2009). See **Table 1** for examples. The same set of stimuli was used in both experiments.

#### Critical Word and Scenario Norms and Ratings

A series of norming studies of the stimuli were carried out via the internet. Inclusion criteria for participants in these ratings studies were the same as for the ERP experiments (see above). Means and standard deviations of all norms and ratings can be found in **Table 2**. Statistical analyses for CW length, CW concreteness, cloze probability, and constraint can be found in Fields and Kuperberg (2012). Briefly, stimuli were matched across conditions on all these features, except for concreteness where neutral words were slightly more concrete than pleasant and unpleasant words (this did not account for the unique effects of self-relevance on neutral words under the production task, see Fields and Kuperberg, 2012).

Valence and arousal ratings were gathered for both the CWs in isolation and the scenarios (cut off after the CW). Valence ratings were as expected for both CWs and scenarios: the pleasant condition was rated as more pleasant than the neutral condition, which was rated as more pleasant than the unpleasant condition [*F*s > 1000, *p*s < 0.001]. In the scenarios, self-relevance amplified these differences, making pleasant scenarios more positive and unpleasant scenarios more unpleasant [Emotion × Self-Relevance interaction: *F*(2,442) = 26.50, *p* < 0.001].

As expected, there was a main effect of Emotion for both the CW and scenario arousal ratings [*F*s > 70, *p*s < 0.001], with pleasant and unpleasant stimuli being rated as more arousing

<sup>5</sup>These participants, and the ERPs we report from them, are the same as those reported in another paper by Fields and Kuperberg (2015), which focused on how the N400 was modulated in response to the self-positivity bias. We decided to discuss the N400 and LPC findings in separate manuscripts because, as can be seen by comparing the present paper to Fields and Kuperberg (2015), we see the results on these two components as being relevant for two different literatures and sets of theoretical questions. However, it is important to note that the results on the N400 and the LPC displayed qualitatively different patterns that rule out component overlap explaining either effect (see discussion in Fields and Kuperberg, 2015).



*The critical word is underlined (but did not appear underlined the actual* 

#### TABLE 2 | Stimuli ratings and characteristics.


*Means are shown with standard deviations in parentheses. Cloze probability and constraint are represented as the percentage of total responses from 29 subjects. Concreteness, valence, and arousal were all rated on seven point scales from least concrete (most abstract), very unpleasant, and least arousing, to most concrete, very pleasant and most arousing, respectively. "–" indicates that, for ratings conducted on the words in isolation from the scenario contexts, the values were the same in the self conditions as in the other conditions since the identical CWs were used (except for in six scenarios in which the verb was conjugated differently).* ∗*Some words did not exist in the HAL database and these were represented as null values in our calculations.*

than neutral stimuli. The comparison of pleasant and unpleasant stimuli differed between the CW and scenario ratings: pleasant CWs were rated as more arousing than unpleasant CWs, but unpleasant scenarios were rated as more arousing than pleasant scenarios. There was no Emotion by Self-Relevance interaction in the scenario ratings [*F*(2,442) = 0.02, *p* = 0.980], but there was a main effect of Self-Relevance [*F*(1,221) = 162.71, *p* < 0.001] due to self-relevant scenarios being rated as more arousing than other-relevant scenarios.

# Procedure

#### Stimulus Presentation

In both experiments, scenarios were counterbalanced such that each scenario appeared in a different condition in each of six lists (thus appearing in all conditions across lists), and participants were randomly assigned to a list. Trials were randomly ordered within each list and the same lists with the same trial orderings were used in both experiments. All trials began with the word "READY", which remained on the screen until the participant pressed a button to begin the trial. The first sentence then appeared in full until the participant pressed a button to advance. The second sentence began with a fixation cross displayed for 500 ms, followed by an interstimulus interval (ISI) of 100 ms, followed by each word of the sentence presented individually for 400 ms with an ISI of 100 ms. The final word of the scenario appeared on the screen for a longer duration of 750 ms, followed by a 400 ms ISI.
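The counterbalancing and timing just described can be sketched in a few lines. This is a minimal illustration rather than the presentation code used in the experiments: the rotation rule that assigns conditions to lists and all function and variable names are assumptions, while the durations are those stated in the text.

```python
# Illustrative sketch only; names and the rotation rule are assumptions.
from collections import Counter

EMOTIONS = ("pleasant", "neutral", "unpleasant")
RELEVANCE = ("self", "other")
CONDITIONS = [(e, r) for e in EMOTIONS for r in RELEVANCE]  # 3 x 2 = 6 cells

N_SCENARIOS = 222
N_LISTS = len(CONDITIONS)


def condition_for(scenario, list_index):
    """Rotate conditions so a scenario is in a different condition in every list."""
    return CONDITIONS[(scenario + list_index) % N_LISTS]


def build_list(list_index):
    """Condition assigned to each of the 222 scenarios in one list."""
    return [condition_for(s, list_index) for s in range(N_SCENARIOS)]


# Durations in ms, as stated in the text.
FIXATION, ISI, WORD, FINAL_WORD, FINAL_ISI = 500, 100, 400, 750, 400


def word_onsets(n_words):
    """Onset of each word of the second sentence, in ms from fixation onset."""
    first = FIXATION + ISI
    return [first + i * (WORD + ISI) for i in range(n_words)]


def sentence_duration(n_words):
    """Total duration of the second sentence, including the final ISI."""
    return word_onsets(n_words)[-1] + FINAL_WORD + FINAL_ISI


if __name__ == "__main__":
    print(Counter(build_list(0)))
    print(word_onsets(8), sentence_duration(8))
```

Under this rotation, 222 scenarios and six conditions yield 37 scenarios per condition in every list, and each scenario appears in all six conditions across lists.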

#### Task

In the first experiment, as described in Fields and Kuperberg (2012), a production task was used. Participants were instructed to verbally produce a single short sentence that followed naturally from the sentences they had just read (i.e., that continued the story). Participants were instructed to continue second-person (self-relevant) scenarios as if they were about themselves (i.e., in the first person). After the final word of each scenario, a question mark appeared on the screen, cuing participants to produce their verbal responses. Participants spoke into a microphone so that the experimenter was able to listen to their responses to ensure that they were in keeping with the content of each scenario. In addition, after 11 scenarios (randomly interspersed among each list), a yes or no comprehension question (as described below) followed the participant's response, providing another objective measure of comprehension.

In the second experiment, the production task was eliminated and participants simply answered intermittent yes/no comprehension questions that appeared after forty of the scenarios (randomly interspersed). The question stayed on the screen until the participant gave an answer via button press. The question and its correct answer were the same across all conditions except where the self-relevance manipulation required changes. None of the questions referred to the valenced aspects of the scenarios. For example, the scenario "Casper is/You are new on campus. Everyone thinks he is/you are quite idiosyncratic/clever/dumb compared to most people." was followed by the question "Did Casper/you go to this school last year?" with the correct answer being "no".

#### ERP Acquisition and Processing

All equipment, acquisition parameters, and processing steps were the same between the two experiments. The EEG response was recorded from 29 tin electrodes in an elastic cap (Electro-Cap International, Inc., Eaton, OH; see **Figure 1**) referenced to the left mastoid. Additional electrodes were placed below the left eye and at the canthus of the right eye to monitor vertical and horizontal eye movements. The impedance was kept below 2.5 kΩ for mastoid electrodes, 10 kΩ for EOG electrodes, and 5 kΩ for all other electrodes. The EEG signal was amplified by an Isolated Biometric Amplifier (SA Instrumentation Co., San Diego, CA, USA), band pass filtered online at 0.01–40 Hz, and continuously sampled at 200 Hz.

The EEG was collected and processed using in-house software (available at: http://neurocoglaboratory.org/ERPSystem.htm). Segments from 100 ms before onset to 1100 ms after onset of each event were obtained. Trials with muscular and ocular artifact were identified and discarded using three algorithms: the first returns the number of time points within a given amplitude range of the minimum or maximum point of an epoch and is used to monitor for amplifier blocking or signal loss (i.e., a flat line); the second returns the difference between the maximum and minimum point of an epoch at the two EOG channels (independently) to monitor for horizontal and vertical eye movement; the third returns the difference of the mean difference and maximum difference between the electrode under the left eye and the electrode on the forehead above this eye and is used to identify blinks (which are characterized by opposite polarity shifts in these two channels). Appropriate thresholds for each of these algorithms were determined for each subject via visual inspection of the raw data (but were the same across all trials within each subject). Overall, 7.7% and 7.5% of trials were rejected for artifact for the production and comprehension tasks, respectively. The rejection rate did not differ across the Self-Relevance, Emotion, or Task conditions and there were no interactions between these factors [*F*s < 2.5, *p*s > 0.09].
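The three artifact measures can be approximated as follows. This is a hedged sketch under stated assumptions, not the in-house software: the function names, the dictionary of thresholds, and the exact reading of the third measure (maximum minus mean of the between-channel difference) are all assumptions, and any threshold values would be hypothetical since the thresholds were set per subject by visual inspection.

```python
# Illustrative sketch only: definitions and thresholds are assumptions;
# the in-house ERP software may compute these measures differently.

def points_near_extreme(epoch, tolerance):
    """Count samples within `tolerance` of the epoch maximum or minimum
    (whichever count is larger). A large count suggests blocking or a flat line."""
    hi, lo = max(epoch), min(epoch)
    near_max = sum(1 for v in epoch if hi - v <= tolerance)
    near_min = sum(1 for v in epoch if v - lo <= tolerance)
    return max(near_max, near_min)


def peak_to_peak(epoch):
    """Maximum minus minimum of an epoch, applied to each EOG channel separately."""
    return max(epoch) - min(epoch)


def blink_measure(lower_eye, upper_eye):
    """Maximum minus mean of the sample-wise difference between the electrode
    below the eye and the one above it (one possible reading of the text)."""
    diffs = [lo - up for lo, up in zip(lower_eye, upper_eye)]
    return max(diffs) - sum(diffs) / len(diffs)


def reject_trial(eeg_channels, heog, veog, forehead, th):
    """Return True if any of the three measures exceeds its per-subject threshold."""
    flat = any(points_near_extreme(ch, th["flat_tol"]) > th["flat_max_points"]
               for ch in eeg_channels)
    eye_movement = (peak_to_peak(heog) > th["eog_p2p"]
                    or peak_to_peak(veog) > th["eog_p2p"])
    blink = blink_measure(veog, forehead) > th["blink"]
    return flat or eye_movement or blink
```

Because a blink shifts the two vertical channels in opposite directions, their difference grows sharply during a blink, which is why a difference-based measure separates blinks from slow drift that affects both channels alike.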

#### ERP Analysis

For analysis purposes, the two studies were combined and Task was treated as a between-subjects variable. Averaged ERPs, time-locked to the CWs, were formed from trials remaining after artifact rejection and low pass filtered with a half-amplitude cutoff at 15 Hz. In order to examine how the modulation of the LPC varied across the scalp, the scalp was subdivided into three-electrode regions along its anterior–posterior distribution, at both mid-line and peripheral sites. Two omnibus ANOVAs, one covering mid-regions (dark gray in **Figure 1**) and another covering peripheral regions (light gray in **Figure 1**), were conducted with Emotion, Self-Relevance, Region, and Hemisphere (peripheral regions only) as within-subjects factors and Task as a between-subjects factor. For all tests of significance the Greenhouse and Geisser (1959) estimation of ε was used to correct the degrees of freedom (the original degrees of freedom are reported in the text). A significance level of alpha = 0.05 was used for all a priori comparisons.
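For readers unfamiliar with the correction, the following sketch computes a textbook Greenhouse-Geisser ε for a single within-subjects factor and applies it to the degrees of freedom. It is an illustration under simplifying assumptions (one factor, complete data, hypothetical function names), not the analysis code used for the multi-factor ANOVAs reported here.

```python
# Illustrative sketch only: a textbook Greenhouse-Geisser epsilon for one
# within-subjects factor; not the authors' analysis code.

def gg_epsilon(data):
    """data: one row per subject, one column per level of the factor."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sample covariance matrix of the k levels.
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Double-centre the covariance matrix (remove row, column, and grand means).
    row_means = [sum(cov[i]) / k for i in range(k)]
    grand = sum(row_means) / k
    dc = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
          for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    sum_sq = sum(dc[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(df_effect, df_error, epsilon):
    """Both degrees of freedom are multiplied by epsilon; F itself is unchanged."""
    return epsilon * df_effect, epsilon * df_error
```

The estimate lies between 1/(k − 1) and 1 for a factor with k levels and equals 1 when the factor has only two levels, so the correction affects only tests with more than one numerator degree of freedom.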

# RESULTS

The LPC was quantified by calculating the mean amplitude from 500 to 800 ms relative to a 100 ms prestimulus baseline. Analyses of other time windows are available in Fields and Kuperberg (2012) for the production task and Fields and Kuperberg (2015) for the comprehension task (see Footnote 5).
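The quantification described above amounts to a baseline-corrected window mean. A minimal sketch follows, assuming an epoch stored as a list of samples from −100 to 1100 ms at the stated 200 Hz sampling rate; the function names and the half-open treatment of window boundaries are assumptions rather than details given in the text.

```python
# Illustrative sketch only: names and boundary handling are assumptions.

SAMPLE_RATE_HZ = 200    # sampling rate stated in the Methods
EPOCH_START_MS = -100   # epochs run from -100 to 1100 ms


def ms_to_index(t_ms):
    """Convert a latency in ms (relative to word onset) to a sample index."""
    return round((t_ms - EPOCH_START_MS) * SAMPLE_RATE_HZ / 1000)


def mean_amplitude(epoch, start_ms=500, end_ms=800, baseline_ms=(-100, 0)):
    """Mean voltage in [start_ms, end_ms) minus mean voltage in the baseline."""
    b0, b1 = ms_to_index(baseline_ms[0]), ms_to_index(baseline_ms[1])
    w0, w1 = ms_to_index(start_ms), ms_to_index(end_ms)
    baseline = sum(epoch[b0:b1]) / (b1 - b0)
    window = sum(epoch[w0:w1]) / (w1 - w0)
    return window - baseline
```

At 200 Hz each sample spans 5 ms, so the 500–800 ms window covers 60 samples and the 100 ms baseline covers 20.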

Combining both datasets, the Emotion × Self-Relevance × Task interaction was significant in both the mid-regions omnibus [*F*(2,96) = 7.49, *p* = 0.001, η<sup>2</sup> = 0.135] and the peripheral regions omnibus [*F*(2,96) = 5.90, *p* = 0.004, η<sup>2</sup> = 0.109]. The Emotion × Self-Relevance × Task × Region interaction was marginally significant in the mid-regions omnibus [*F*(8,384) = 2.32, *p* = 0.063, η<sup>2</sup> = 0.046] and not significant in the peripheral regions omnibus [*F*(2,96) = 0.72, *p* = 0.489, η<sup>2</sup> = 0.015]. Neither of these effects was further modulated by hemisphere in the peripheral regions ANOVA [*F*s < 1.6, *p*s > 0.20].
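The reported effect sizes are numerically consistent with partial eta squared recovered from each *F* ratio and its degrees of freedom. This is an inference from the numbers, not something stated in the Methods, and the function name below is hypothetical.

```python
# Illustrative sketch only: assumes the reported effect sizes are partial eta squared.

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Emotion x Self-Relevance x Task interaction, mid-regions omnibus.
    print(round(partial_eta_squared(7.49, 2, 96), 3))
```

Because the Greenhouse-Geisser correction scales both degrees of freedom by the same ε, the ratio is unchanged whether corrected or uncorrected degrees of freedom are used.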

Below we follow up these interactions by examining the Emotion × Self-Relevance interaction and Emotion × Self-Relevance × Region interaction in each task group separately. Additional analyses (including all main effects and interactions both in the combined analysis and each task group separately) are available as Supplementary Materials.

# Production Task

As previously reported (Fields and Kuperberg, 2012), the Emotion × Self-Relevance interaction was significant in the mid-regions omnibus ANOVA [*F*(2,50) = 4.02, *p* = 0.026, η<sup>2</sup> = 0.138] and marginally significant in the peripheral regions omnibus [*F*(2,50) = 2.70, *p* = 0.078, η<sup>2</sup> = 0.098]. Both the mid-regions and peripheral regions omnibus ANOVAs showed significant effects of Emotion and/or significant Emotion × Region interactions in both the self-relevant and other-relevant scenarios, but these effects were larger in the other-relevant scenarios (see **Figure 2**).

This pattern, however, was driven entirely by the neutral words: self-relevant neutral words elicited a larger LPC than other-relevant neutral words (thus making them more similar to pleasant and unpleasant words) [mid-regions omnibus: *F*(1,25) = 20.18, *p* < 0.001, η<sup>2</sup> = 0.447]. In contrast, pleasant and unpleasant words did not differ by self-relevance [*F*s < 0.4, *p*s > 0.55]. The effect of self-relevance for neutral words further interacted with Region [*F*(4,100) = 5.94, *p* = 0.008, η<sup>2</sup> = 0.192] and follow-up ANOVAs in individual regions showed that the effect was strongest in the frontal region and was also significant in the prefrontal, central, and parietal regions. See Fields and Kuperberg (2012) for additional details.

# Comprehension Task

In the comprehension task study, the interaction between Emotion and Self-Relevance was significant in both the mid-regions ANOVA [*F*(2,46) = 3.73, *p* = 0.032, η<sup>2</sup> = 0.140] and the peripheral regions ANOVA [*F*(2,46) = 3.32, *p* = 0.045, η<sup>2</sup> = 0.126].

However, the pattern of the effect was quite different from that observed with the production task: the interaction was primarily driven by a significant effect of Emotion in the self-relevant scenarios [mid-regions: *F*(2,46) = 3.86, *p* = 0.029, η<sup>2</sup> = 0.144; peripheral regions: *F*(2,46) = 5.09, *p* = 0.011, η<sup>2</sup> = 0.181], with no significant effect of Emotion in the other-relevant scenarios [mid-regions: *F*(2,46) = 2.26, *p* = 0.117, η<sup>2</sup> = 0.089; peripheral regions: *F*(2,46) = 2.96, *p* = 0.064, η<sup>2</sup> = 0.114], see **Figure 3**. Fisher–Hayter pairwise comparisons within the self-relevant scenarios confirmed that both pleasant and unpleasant CWs elicited a larger LPC than neutral CWs, but that the amplitude of the LPC to pleasant and unpleasant words did not differ.

This effect of Emotion (within self-relevant scenarios) had a centro-parietally centered, but broad, distribution (see **Figure 3**). It did not interact with the Region factor in the mid-regions ANOVA [*F*(8,184) = 2.05, *p* = 0.110, η<sup>2</sup> = 0.082]. In the peripheral regions, the Emotion × Region interaction was significant [*F*(2,46) = 4.16, *p* = 0.023, η<sup>2</sup> = 0.153] and follow-ups showed that the effect of Emotion was significant in the posterior region [*F*(2,46) = 7.60, *p* = 0.002, η<sup>2</sup> = 0.248], but not the frontal region [*F*(2,46) = 1.77, *p* = 0.184, η<sup>2</sup> = 0.071]. There were no significant interactions with the hemisphere factor [*F*s < 2.6, *p*s > 0.09].

When we broke down this interaction by examining the effect of Self-Relevance at each level of Emotion, the effect of Self-Relevance did not reach significance at any level of Emotion. There was a marginally significant effect of Self-Relevance on the pleasant words [*F*(1,23) = 3.47, *p* = 0.075, η<sup>2</sup> = 0.131] and no effect on neutral or unpleasant words [*F*s < 1.4, *p*s > 0.25].

# DISCUSSION

The aim of this work was to examine the influence of task demands and self-relevance on processing emotional words. We presented two-sentence social vignettes that were either contextually self-relevant or other-relevant and that contained a neutral, pleasant, or unpleasant CW in the second sentence. We compared our previous findings (Fields and Kuperberg, 2012) using a production task, with findings using a deep comprehension task (reported here). We observed an interaction between self-relevance and emotion in both studies, but the nature of this interaction was quite different depending on the task. With the production task, we observed a larger LPC to emotional words than neutral words in both the self-relevant and other-relevant scenarios, but this effect was smaller in the self-relevant scenarios because the LPC was relatively larger on neutral words (a larger LPC to self-relevant than non-self-relevant neutral words). With the comprehension task, we only observed a larger LPC to emotional vs. neutral words in the self-relevant scenarios, and there was no effect of self-relevance on neutral words.

Previous work has shown that manipulations of both task (e.g., Naumann et al., 1997; Fischler and Bradley, 2006; Holt et al., 2009) and self-relevance (see Discussion below; Li and Han, 2010; Shestyuk and Deldin, 2010; Herbert et al., 2011a,b; Schindler et al., 2014) can enhance or attenuate the effect of emotion on the LPC. Other work has shown that which particular emotional features most strongly modulate the LPC can depend on which features a task draws attention to (Delaney-Busch et al., in press; see also Fischler and Bradley, 2006; Bayer et al., 2012). The present findings show something qualitatively different from either of these. The pattern of effects we observed went beyond simple enhancement or attenuation of the effects of emotion or even the relative effects of valence vs. arousal. Instead, our studies show that participants' goals, the social relevance of the context, and the emotional properties of stimuli can interact in complex ways with regard to how and when neural resources are allocated in processing social situations, as reflected by the LPC.

Below, we first offer an explanation of how and why the particular demands of the production and comprehension tasks led to the different patterns of LPC modulation we observed. We then discuss other differences between the two tasks that might have influenced our findings. Finally, we turn to the more general implications of our findings for a functional interpretation of the LPC, and briefly discuss some future directions of research.

# The Effect of Self-Relevance on the LPC Evoked by Emotional Vs. Neutral Stimuli

As described in the Section "Introduction" and in our previous work (Fields and Kuperberg, 2012), we suggest that the production task used in our previous study was critical in inducing the enhanced effect of self-relevance on neutral (but not emotional) words. This is because it encouraged participants to disambiguate the valence of the CW in order to produce a sensible and consistent continuation of the scenario as a whole. This was particularly important (or demanding) in the self-relevant condition because of participants' desire to produce a continuation that was consistent with their self-concept (Swann, 2011). We argued that the larger LPC reflected this prolonged enhanced neural processing of self-relevant neutral words.<sup>6</sup>

In the present study, participants were required to comprehend each sentence deeply (in order to respond to comprehension questions). However, these questions did not refer to the valenced aspects of the scenarios, and there was no additional requirement to produce a specific continuation for each scenario. Therefore, there was nothing to motivate participants to disambiguate the valence of the neutral words. In this situation, we suggest that processing resources were simply allocated to the stimuli that were most inherently motivationally relevant and attention grabbing. These were the self-relevant emotional CWs. Indeed, in the self-relevant scenarios, the LPC was larger to the emotional than the neutral words.

This effect of self-relevance enhancing the effect of emotion on the LPC in the comprehension task is consistent with previous behavioral work that reports greater changes in participants' emotional states after they read self-relevant emotional texts vs. non-self-relevant emotional texts (Brunyé et al., 2011). Similarly, the results of our rating studies (see Materials and Methods) showed that self-relevance led to pleasant stimuli being rated as more positive and unpleasant stimuli being rated as more negative. This pattern is also consistent with some previous ERP studies that have examined the interaction between self-relevance and emotion. For example, Shestyuk and Deldin (2010) saw differences between pleasant and unpleasant words (they did not include neutral words) on the LPC when words were judged for self-relevance but not when they were judged for their relevance to Bill Clinton. In a related study, Schindler et al. (2014) showed participants (with no overt task) trait adjectives under either a) a condition where a second person was supposedly judging whether the adjective applied to the participant or b) a condition where a computer was simply randomly presenting the words. They only found effects of emotion for words in the judgment (i.e., self-relevant) condition. In work more similar to our own, Herbert and colleagues (Herbert et al., 2011a,b) report two studies in which participants passively read (with no additional task) emotional and neutral words preceded by first-person and third-person pronouns. They found effects of emotion on the LPC only for words preceded by the first-person pronouns (see also Li and Han, 2010). Thus, in all these studies, just as in the present study, effects of emotion were seen in the self-relevant condition, but not in the non-self-relevant conditions. We now turn to possible reasons for this.

# The Effect of Emotion on the LPC Evoked by Non-Self-Relevant Stimuli

With the production task, we saw a larger LPC to emotional than neutral words following both the self-relevant *and* other-relevant contexts. These effects are consistent with a large body of ERP studies that have reported emotion effects on the LPC in single words (reviewed in Kissler et al., 2006; Citron, 2012) and to emotional words in non-self-relevant contexts (e.g., Bartholow et al., 2001; Holt et al., 2009; Bayer et al., 2010; Delaney-Busch and Kuperberg, 2013). Thus, it is striking that in the studies described above (Li and Han, 2010; Shestyuk and Deldin, 2010; Herbert et al., 2011a,b; Schindler et al., 2014) and in the present study using the comprehension task, a larger LPC was observed to emotional (vs. neutral) words *only* in self-relevant contexts.

<sup>6</sup>It is worth noting that this effect had a more frontal distribution than the standard posterior effect of emotion on the LPC. This suggests that the nature of the further processing induced by these particular stimulus and task conditions may have been distinct from that usually seen to emotional stimuli that are not ambiguous. For discussion of this issue, see Fields and Kuperberg (2012).

We argue that this apparent discrepancy can be explained within the broad dynamic framework we have been describing. Specifically, we suggest that allocation of attention and resources is not only a function of the inherent emotional salience of stimuli, overt task demands, and self-relevance of the immediate context, but it is *also* a function of the broader context of the environment (in this case, the surrounding stimuli within the experimental context). This influence of the broader environmental context can be understood at an intuitive level. What seems salient and important enough to garner attention in one situation may not be relevant in another: a spider discovered in your living room may dominate your attention under normal circumstances, but if your house is on fire, it's not likely to receive much of your attention. This sensitivity to broader environmental context was recently illustrated in a study by Fogel et al. (2012), who showed that differences between emotional words and neutral words on the LPC disappeared when highly salient taboo words were mixed into the stimuli, presumably because the standard emotional words lost their ability to draw special attention in the presence of the more arousing taboo words (see also Crites et al., 1995).

We suggest that, with the comprehension task in the present work, as well as in previous studies (Li and Han, 2010; Shestyuk and Deldin, 2010; Herbert et al., 2011a,b; Schindler et al., 2014), the non-self-relevant scenarios, even when emotional, lost their ability to draw additional attention in the presence of self-relevant emotional scenarios. In contrast, when emotional properties were task-relevant, as we have argued they were for the production task, attention was allocated to the emotional properties of words across conditions, leading to an enhanced LPC for emotional words regardless of their self-relevance. This explanation, of course, remains somewhat speculative. Future work is needed to systematically explore the effects of task demands on the emotional LPC to non-self-relevant stimuli in the presence of self-relevant stimuli.

# Other Differences Between the Production and Comprehension Tasks

In the discussion above, we attributed the different patterns of ERP modulation across the two tasks to the fact that the production but not the comprehension task encouraged participants to disambiguate the valence of the self-relevant neutral words. We now consider other differences between these two tasks that might have contributed to the different pattern of effects seen in the two experiments.

One possibility is that the production task encouraged deeper semantic processing of the scenarios as a whole than the comprehension task. We think that this difference is unlikely to have driven the different pattern of ERP findings for two main reasons. First, in the comprehension task, participants answered intermittent comprehension questions that required them to deeply comprehend and build a situation model of each discourse scenario. These questions were written such that they required information from different parts of the scenario and often required an inference based on the situation model described by the scenario. This meant that participants could not simply rely on any superficial semantic strategy to correctly answer these questions (see also Holt et al., 2009; Delaney-Busch and Kuperberg, 2013; Paczynski et al., 2014; Fields and Kuperberg, 2015; Xiang and Kuperberg, 2015). Second, while depth of semantic processing can influence ERPs, particularly the N400 (e.g., Chwilla et al., 1995), it is not clear why it would generate the specific effect we observed on neutral scenarios unless it was because they were harder to process, perhaps due to ambiguity, which is similar to the explanation that we provide.

A second difference between the production and comprehension tasks is that the former required participants to plan their production utterances, whereas this was not necessary with the comprehension task. It is possible that such planning overlapped with the processing reflected in the ERPs we recorded. But this simply raises the question of why such planning would require greater processing specifically in the self-relevant neutral scenarios. We have argued it did so because of the motivation to produce a continuation consistent with the participants' self-concept in the presence of ambiguity.

Third, it is possible that, in reading the self-relevant scenarios, participants failed to adopt a self-relevant perspective in the comprehension task, as they did in the production task (since in the production task they had to produce a continuation specifically about themselves). Once again, however, this does not easily account for our findings. First, there is independent evidence that participants can and do automatically adopt self-relevant perspectives in reading second-person scenarios during comprehension (Brunyé et al., 2009, 2011). In addition, it is difficult to explain other aspects of our results if participants did not interpret the second-person scenarios as self-relevant with the comprehension task: what would account for the modulation of the effect of emotion by the self-relevance factor? One might argue that our findings were driven by the requirement to explicitly plan a self-relevant response in the production task, a requirement that was enhanced in neutral scenarios, but this explanation is again very similar to the one that we offer.

There are surely other important differences between the tasks as well. But any explanation of the present findings must explain how these differences interact with both emotion and self-relevance to produce the specific pattern of findings observed across our two studies. It will be important for future studies to find novel ways to test the explanations presented here.

# Implications and Open Questions

Taken together with the previous literature, our work suggests that the allocation of resources to emotional and self-relevant stimuli reflected by the LPC is highly dynamic. While previous work has suggested that the LPC may reflect or be modulated directly by the emotional properties of stimuli (e.g., Cuthbert et al., 2000; Olofsson et al., 2008), our results and others clearly show that there is no one-to-one relationship between any given property of an eliciting stimulus (valence, arousal, self-relevance, ambiguity, etc.) and LPC amplitude. In this work, we manipulated the potential influence of three factors: (1) the "inherent" emotional salience of the stimulus itself (a function of enduring biological and social motivations), (2) the local context in which a particular incoming stimulus is encountered (in this design, the self-relevance of the discourse social vignettes), and (3) the situation-specific goals provided by a particular task. We also discussed the role of a fourth factor: the broader experimental context. As we have discussed, none of these factors is either sufficient or necessary to evoke an LPC effect; nor are their effects simply additive. Rather, they interactively influence LPC amplitude.

This, of course, raises the question of what functional neural mechanism the LPC actually reflects. That is, what neurocognitive process is being modulated by the factors discussed above and how do these factors interact to influence this process? We suggest that one clue into the nature of this mechanism comes from the striking resemblance between the factors known to affect the LPC and the factors that are known to modulate the widely studied P300 component.<sup>7</sup> The P300 is a positive-going component that is famously evoked by stimuli that are surprising or unexpected in their experimental context. This effect is modulated by multiple factors including local sequence effects, global probability, contingencies between stimuli, experimental instructions, the perceived value of a stimulus, and task relevance (for reviews of factors affecting the P300, see Donchin and Coles, 1988; Johnson, 1988; Polich, 2012). Importantly, despite its name, the P300 peaks at a range of latencies. Indeed, in response to more complex manipulations, such as those based on the semantic content of words, it tends to peak in the LPC time window (Kutas et al., 1977; reviewed by Donchin and Coles, 1988; Polich, 2012). In other words, the P300 is morphologically quite similar to the LPC, in addition to being sensitive to some of the same manipulations. While the links between the P300 evoked by oddball stimuli and LPC produced by motivationally relevant stimuli have often been noted in the literature (e.g., Citron, 2012; Hajcak et al., 2012; Weinberg et al., 2012), the relationship between these components has not been a topic of direct investigation or in-depth theoretical discussion.

To the extent that the many similarities between the P300 and the emotional LPC go beyond a superficial resemblance, the theoretical literature on the P300 will provide insights into the function of the LPC. A number of functional theories of the P300 have been proposed (Donchin, 1981; Donchin and Coles, 1988; Nieuwenhuis et al., 2005; Polich, 2007; Twomey et al., 2015). Such theories have often related the P300 to a process of maintaining an accurate model of the current environment. The P300 is thought to be evoked to the extent that incoming information leads us to *update* this internal model (the representation of the broader environmental context; Donchin and Coles, 1988). The literature on Bayesian generative models of cognition may offer a more contemporary view of this "context updating" process (see Perfors et al., 2011; Qian et al., 2012; Clark, 2013; Friston et al., 2015). In this framework, task relevance modulates the P300 because our model of the environment is tailored to our goals and motivations—that is, we are trying to build a model of the environment that helps to achieve the goals of whatever task we are engaged in.

It is intuitive that emotional stimuli might also be associated with this sort of model updating process. In complex, noisy, and ever-changing environments, we constantly need to monitor what stimuli are relevant or not, and which actions will be most useful for pursuing our goals. As noted at the beginning of this paper, emotions can act as "relevance detectors" (Frijda, 1986), telling us which information is relevant to our goals and motivations. They therefore indicate which information is most important to integrate into our context model, or when we might need to adapt our current model or switch to a new model. And, as we have shown in the present study, what is relevant will depend on many interacting factors, and the LPC will therefore be sensitive to all these factors—not simply the valence or arousal of the eliciting stimulus.

Future work should further examine the relationship between the P300, LPC, and context updating related processes. One way to do this is to carefully examine how the LPC responds to factors known to modulate the P300, such as stimulus probability. Computational modeling may be helpful to understanding how various factors affecting these components are likely to interact and why. In addition, given the difficulty of identifying when components are the same vs. distinct (Kappenman and Luck, 2012), this work will also likely be aided by examining the LPC (and comparing it to the P300) using techniques with higher spatial resolution such as MEG and fMRI (e.g., Liu et al., 2012; Sabatinelli et al., 2007, 2013), as well as complementary ways to examine the EEG such as time-frequency analyses.

# Summary and Conclusion

In sum, we have shown a complex three-way interaction between the emotional properties of a stimulus, the self-relevance of its local context, and task demands when participants process socially relevant real-world vignettes. When participants were asked to produce sentences to continue each scenario, self-relevance enhanced the amplitude of the LPC specifically on neutral words. When participants simply answered questions that did not require attention to the self-relevant or emotional aspects of the scenarios, self-relevance enhanced the typical effect of emotion on the LPC. These results suggest that there is no one-to-one relationship between the emotional properties (or self-relevance) of an eliciting event and its effects on neurocognitive processing. They suggest that we allocate attention and processing to emotional stimuli in a highly dynamic fashion that is calibrated to the demands of a given situation, and they support the view that the LPC is triggered by a highly dynamic computational mechanism. One candidate for the function of this mechanism is adapting a current model or switching to a new internal model that best represents our contextual environment in relation to our goals, enabling us to do a better job at predicting incoming information in the future, as proposed by theories of the P300.

<sup>7</sup>Following common usage, we use the term P300 to refer to the component sometimes more specifically identified as the "P3b" (see Polich, 2012).

# ACKNOWLEDGMENTS

This work was supported by NIMH (R01 MH071635) to GK, as well as the Sidney Baer Trust who supported undergraduate students, including Wonja Fairbrother, Sorabh Kothari, Camila Carneiro de Lima, Rohan Natraj, and Erich Tusch, who contributed to data collection and other aspects of the project.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.02003




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Fields and Kuperberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Bodily Reactions to Emotional Words Referring to Own versus Other People's Emotions

Patrick P. Weis<sup>1</sup>† and Cornelia Herbert<sup>1,2</sup>\*†

<sup>1</sup> Department of Psychiatry, University of Tübingen, Tübingen, Germany, <sup>2</sup> Institute of Psychology and Education, Applied Emotion and Motivation Research, University of Ulm, Ulm, Germany

#### Edited by:

Jerker Rönnberg, Linköping University, Sweden

#### Reviewed by:

Ivilin Peev Stoianov, Centre National de la Recherche Scientifique (CNRS), France

Florian Bublatzky, University of Mannheim, Germany

#### \*Correspondence:

Cornelia Herbert cornelia.herbert@uni-ulm.de

†These authors have contributed equally to this work and shared first authorship.

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 17 August 2016 Accepted: 12 July 2017 Published: 22 August 2017

#### Citation:

Weis PP and Herbert C (2017) Bodily Reactions to Emotional Words Referring to Own versus Other People's Emotions. Front. Psychol. 8:1277. doi: 10.3389/fpsyg.2017.01277

According to embodiment theories, language and emotion affect each other. In line with this, several previous studies investigated changes in bodily responses including facial expressions, heart rate or skin conductance during affective evaluation of emotional words and sentences. This study investigates the embodiment of emotional word processing from a social perspective by experimentally manipulating the emotional valence of a word and its personal reference. Stimuli consisted of pronoun-noun pairs, i.e., positive, negative, and neutral nouns paired with possessive pronouns of the first or the third person ("my," "his") or the non-referential negation term ("no") as controls. Participants had to quickly evaluate the word pairs by key presses as either positive, negative, or neutral, depending on the subjective feelings they elicit. Afterwards, they elaborated the intensity of the feeling on a non-verbal scale from 1 (very unpleasant) to 9 (very pleasant). Facial expressions (M. Zygomaticus, M. Corrugator), heart rate, and, for exploratory purposes, skin conductance were recorded continuously during the spontaneous and elaborate evaluation tasks. Positive pronoun-noun phrases were responded to the quickest and judged more often as positive when they were self-related, i.e., related to the reader's self (e.g., "my happiness," "my joy") than when related to the self of a virtual other (e.g., "his happiness," "his joy"), suggesting a self-positivity bias in the emotional evaluation of word stimuli. Physiologically, evaluation of emotional, unlike neutral, pronoun-noun pairs initially elicited an increase in mean heart rate irrespective of stimulus reference. Changes in facial muscle activity, M. Zygomaticus in particular, were most pronounced during spontaneous evaluation of positive other-related pronoun-noun phrases, in line with theoretical assumptions that facial expressions are socially embedded even in situations where no real communication partner is present. Taken together, the present results confirm and extend the embodiment hypothesis of language by showing that bodily signals can be differently pronounced during emotional evaluation of self- and other-related emotional words.

Keywords: emotion, self, language, embodiment, facial expression, heart rate, skin conductance, emotional communication

# INTRODUCTION


Theoretical accounts have long emphasized the independence of language and emotion. Semantic network models of language, for instance, consider language mainly as a cognitive phenomenon, its representation bearing no direct relation to sensory, sensorimotor or affective processes in the brain or the body (e.g., Semin and Smith, 2008; or Winkielman et al., 2015 for an overview). However, neurophysiological research has shown otherwise: language and emotion processing affect each other. This has been shown for the processing of simple words (e.g., Martín-Loeches et al., 2001; Tabert et al., 2001; Kuchinke et al., 2005; Kissler et al., 2007; Herbert et al., 2008, 2009; Scott et al., 2009) and sentences (e.g., Bayer et al., 2010; Jiménez-Ortega et al., 2012). Regarding the processing of single words, rapid serial presentation of words with emotional content facilitated, in several studies, both initial stimulus processing in the visual cortex and subsequent recall of emotional words (e.g., Kissler et al., 2007; Herbert et al., 2008). Moreover, in several studies, reading emotional words activated emotional brain structures such as the amygdala (Hamann, 2001; Tabert et al., 2001; Kuchinke et al., 2005; Hazlett et al., 2007; Herbert et al., 2009) and induced changes in affective behavior, including priming of approach and avoidance as well as defensive responses like the startle reflex (e.g., Herbert et al., 2006; Herbert and Kissler, 2010; Citron et al., 2016). Presentation of emotional words also influences the perception and appraisal of non-verbal emotional signals on a behavioral (e.g., Lindquist et al., 2006) as well as on a neural or physiological level (Lieberman et al., 2007; Moseley et al., 2012; Herbert et al., 2013a,c), which has implications for the treatment of clinical and neurological disorders (Roberson et al., 2007; Kircanski et al., 2012).

Thus, preferential processing of emotional words, activation of emotional brain structures as well as changes in affective behavior during word processing could be taken as evidence for theories of embodiment arguing that written language is able to elicit, modulate and regulate emotional processes in the brain and the body (see Niedenthal, 2007; Glenberg et al., 2009).

This, however, raises the question about the social relevance of embodied language processing. Expressing one's own emotions to others as well as inducing emotions in others is a key function of spoken and written language. This holds true even in situations in which no direct face-to-face communication is possible: for instance, we text, blog, and tweet our sentiments to others and "like/dislike" others for their affection. But to what extent is language processing embodied when we assess and appraise emotional content related to one's own self (e.g., "my fear") or the self of another person (e.g., "his fear"), especially in contexts and situations where input from non-verbal modalities is not readily available to the perceiver of the message? In other words, will bodily, peripheral-physiological reactions differ as a function of the valence or as a function of the self-other reference of a word? Crucially, what does this mean theoretically for the embodiment of language and more generally for the embodiment of emotional communication?

Regarding emotional communication, the human face has been considered an important "socio-emotional signal detector," even in the absence of direct face-to-face communication (e.g., Fridlund, 1991; Buck, 1994; Hess et al., 1995). Whether facial expressions are, however, more important for understanding one's own rather than other people's emotions is still under scientific debate. For instance, Hess et al. (1992) could demonstrate that spontaneous elicitation of facial expressions influences primarily the perception of one's own subjective emotional experiences. Other studies found that people spontaneously mimic other people's emotional expressions (Chartrand and Bargh, 1999), even if the other is only imagined as a virtual other (Fridlund, 1991). In these latter views, spontaneous simulations of emotions via facial expressions are not just reflexive readouts of one's own emotions (e.g., Buck, 1994 for an overview) but may preferentially occur in response to other-related emotional stimuli, in particular to positive stimuli (Fridlund, 1991). One purpose of this "sociality effect" (Fridlund, 1991) could be to help evaluate the hedonic quality of other-related stimuli by using one's own facial expressions as proxy.

Regarding language processing, involvement of facial expressions has been reported in several recent studies recording facial muscle activity during emotional evaluation of words and sentences (electromyography, EMG; e.g., Foroni and Semin, 2009; Niedenthal et al., 2009; Havas et al., 2010). These studies revealed that reading positive words or sentences is accompanied by activation of the main facial muscle used for smiling, M. Zygomaticus, whereas reading words, sentences, or statements with negative content is accompanied by activation of the main facial muscle used for frowning, M. Corrugator (see also Foroni and Semin, 2009; Niedenthal et al., 2009; Foroni and Semin, 2011). Moreover, negating the emotional meaning of a positive statement has been found to be associated with attenuated M. Zygomaticus activity (Foroni and Semin, 2013), suggesting that changes in facial expressions during emotional word processing are related to semantic processing and word comprehension. This is also suggested by recent observations about physiological or experimental manipulation of facial muscle activity, including studies on facial Botox treatment or suppression of the facial musculature by holding a pen with the lips or teeth indicating that inhibiting facial expressions impairs specifically the comprehension (Havas et al., 2010) and emotional evaluation of emotional statements (Niedenthal et al., 2009; also see Strack et al., 1988 using cartoons). In addition, changes in facial muscle activity have been found to be more pronounced in language tasks affording emotional instead of cognitive evaluation (Niedenthal et al., 2009) and for concrete compared to abstract emotional words (Foroni and Semin, 2009, 2013), although, overall, changes in facial muscle activity seem to be less pronounced for written words than for pictures or scenes (Larsen et al., 2003), probably due to the lower arousal of word as compared to picture stimuli.

Taken together, the aforementioned findings support the idea of facial expressions being paramount for the decoding and appraisal of the emotional meaning of language stimuli.

However, previous language studies have not considered the influence social factors may have on emotional language processing, leaving open the theoretical question of whether participants will be mimicking more during emotional evaluation of other-related than self-related emotional words, or vice versa.

Regarding the perception of one's own emotions, historically (see e.g., Sorabji, 1992; Höffe et al., 2005) and metaphorically, the heart has been proposed as the central core of one's own feelings. In fact, individuals who are able to accurately detect their own heart beats experience emotions with heightened intensity (Wiens et al., 2000). They also seem to intuitively make use of their cardio-visceral reactions for decision making (Dunn et al., 2006; Werner et al., 2009) although this does not always promote favorable decisions (Dunn et al., 2010). Additionally, changes in cardiac cycle as well as in parasympathetic tone, as measured by heart rate variability (HRV), can influence social cognition, emotional stimulus processing, and later semantic memory retrieval (Wallentin et al., 2011; Quintana et al., 2012; Garfinkel et al., 2013). Even though these studies do show that interactions between emotional, mental, and cognitive processing are accompanied by cardiac changes, regarding emotion and language processing, only a few studies have investigated stimulus-driven changes in mean HR during processing of emotional words. The studies available used a mix of spoken or written (synthesized) words, sentences, and stories, in combination with autobiographical imagery, recall, or cognitive instructions (Vrana et al., 1986; Ilves and Surakka, 2012), or presented highly selective self-relevant stimulus materials such as threat words or body words to particular samples of individuals at risk for anxiety (Thayer et al., 2000) or eating disorders (Herbert et al., 2013b), impeding the generalizability of the results.

Crucially, with a few exceptions, previous studies did not explicitly control for the words' personal reference, i.e., whether the emotional content of a word was related to the reader's own self or the self of another person. In an earlier study by Cacioppo et al. (1985), positive and negative trait adjectives were presented to healthy students who were asked to judge each word according to orthographic and grammatical rules, or to evaluate the words for hedonic pleasure ("is this word good?") and self-descriptiveness ("does this trait describe you?"). Mean HR differed during semantic (emotional and self-related) and non-semantic (orthographic and grammatical) evaluation. However, mean HR did not differ significantly between the emotional and self-referential evaluation tasks, suggesting no specific influence of the self-relatedness of the task on changes in HR during word processing. This observation contrasts with findings from text-driven imagery where often considerably strong HR acceleration patterns were reported during imagery of autobiographic, self-related emotional scenes (Vrana and Rollock, 2002). Thus, so far, no clear picture has emerged with regard to whether HR varies as a function of the personal reference of a word (i.e., self- vs. other-reference), or whether during word processing changes in HR indicate differences in emotional content (positive, negative, or neutral) and depth of stimulus elaboration (e.g., Cacioppo et al., 1985), regardless of the word's personal reference.

Regarding neurophysiological processes in the brain, self-reference seems to be uniquely linked to emotional processing (e.g., Northoff et al., 2006; D'Argembeau et al., 2012). Regarding verbal stimuli (e.g., Esslen et al., 2008; Herbert et al., 2011c), there is evidence that processing of emotional words related to the reader's self increases activity in anterior cortical midline structures (medial prefrontal cortex, including the anterior cingulate cortex and the ventromedial prefrontal cortex), i.e., brain structures involved in self-referential processing of emotional stimuli (Northoff et al., 2006 for an overview). In addition, electrophysiological studies reported sustained cortical processing and better free recall performance of especially self-related positive words (Watson et al., 2007; Herbert et al., 2011b). Mood-congruent processing has been proposed as the possible underpinning of this prioritized processing of positive stimuli related to the self; mildly positive mood being the norm in healthy Western subjects (see Herbert et al., 2011d for modulation with depression; Mezulis et al., 2004; Shi et al., 2016 for cross-cultural findings; Taylor and Brown, 1988).

In view of the observations outlined above, the present study's aims are to contribute to the so far fragmentary understanding of self-other reference and bodily involvement in language and emotion processing. To this end, a novel paradigm (see Herbert et al., 2011b,c) is deployed to investigate peripheral physiological responses to self- and other-related words with emotional and neutral content. Unlike many previous studies summarized above, in the present paradigm, the emotional valence and the personal reference of a word are altered simultaneously by using pronoun-noun pairs that are related to the reader's self (e.g., "my fear," "my joy") or other-related, i.e., related to the self of a virtual other (e.g., "his fear," "his joy"). Physiological responses are measured while participants read and quickly judge the pronoun-noun phrases for hedonic pleasure/displeasure and then evaluate them with respect to the intensity of their subjective feelings. An additional set of stimuli consisting of negated emotional and neutral words (e.g., "no fear," "no happiness," or "no book") is included as a control condition to determine whether participants' spontaneous judgments and their initial physiological reactions will be based on the evaluation of the words' semantic meaning as proposed by previous research (e.g., Foroni and Semin, 2013). The physiological measures include recording facial muscle activity (fEMG), HR, and skin conductance (electrodermal activity, EDA), the latter being included for exploratory purposes to control for physiological arousal.

Extending previous research, the following questions are addressed: How does self- versus other-reference influence emotional word processing on a behavioral, subjective and peripheral-physiological level? Is the processing of self-related positive words prioritized on a behavioral level indicating better access to one's own positive emotions in healthy subjects? Is this preference also reflected at a physiological level and associated with changes of fEMG or HR? In particular, do participants respond with differential fEMG to emotional words depending on whether the emotional content is self- or other-related? Lastly, is HR variation during emotional word evaluation sensitive to the emotional valence of a word, the self-reference of a word, or both?

# MATERIALS AND METHODS

# Participants

In total, twenty-nine young healthy adults (five males, M = 22.8 years, SD = 2.6; range: 18–28 years), all students of the University of Tübingen, native speakers of German, with normal or corrected-to-normal vision, and normal depression scores (see **Table 1** for an overview) were included in the study. Twenty-eight subjects were non-smokers and one subject reported occasional smoking with less than half a cigarette a day. Caffeine intake was controlled on the day of testing. In addition, drinking habits were assessed by self-report scales. Three subjects were left-handed. Participants had to report that they were currently taking no medication that might affect emotional functioning or interact with the acetylcholinergic system. They provided written informed consent prior to participation and were compensated with an hourly wage of eight Euros in return for participation. The study was approved by the local Ethics Committee (https://www.medizin.uni-tuebingen.de/Forschung/Ethik\_Kommission.html).

# Procedure

Participants were familiarized with the laboratory setting and the experiment was explained to them in general terms before giving informed consent. Participants were asked about social demographics and handedness (German version of Oldfield, 1971), and received written instructions. In particular, participants were instructed that words paired with the possessive pronoun of the third person "his" are related to a virtual other whereas words paired with the possessive pronoun of the first person "my" are related to themselves, e.g., describing the reader's own emotions. Participants received practice trials and had to repeat the instructions to the experimenter in their own words prior to the start of the experiment to ensure that they had understood the instructions. The main experiment, following the practice trials, lasted approximately 60 min. After experimental testing participants completed questionnaires as described subsequently. The state scale of the Positive and Negative Affect Schedule (PANAS state; Watson et al., 1988) and the Beck Depression Inventory II (BDI-II; Hautzinger et al., 2006) were administered to control for mood effects and possible risk for depression. The State-Trait Anxiety Inventory (STAI; Laux et al., 1981) was used to control for state and trait anxiety. The Toronto Alexithymia Scale (TAS-20; Bagby et al., 1994) enables identification of alexithymic individuals who should only have attenuated access to their evoked feelings. The Mehrfachwahl-Wortschatz-Intelligenztest (MWT-B, a verbal IQ test; Lehrl, 2005) allows quantification of familiarity with German language which is relevant for correct processing of the presented stimulus material. Although self-report data was assessed primarily to exclude participants scoring high on alexithymia, depression or anxiety, alexithymia, depression and anxiety scores as well as scores of positive and negative affect were later on also used in exploratory analyses assessing potential interindividual differences in behavioral and physiological measures. At the very end, subjects were asked about potential strategies they might have used during the main experiment and were debriefed if desired.

Means, standard deviations (SD), and ranges are reported. For explanations of abbreviations and more information on the questionnaires, see procedure.

# Experimental Design

Stimuli were presented on a computer screen. Participants' task was to read the words silently and to spontaneously judge the words for hedonic pleasure/displeasure (i.e., "is this word eliciting a positive, negative, or neutral feeling?") before evaluating each word in detail with respect to the intensity of the subjectively experienced feeling (i.e., "how intense is the feeling elicited by the word?"). Participants were instructed to base their judgments solely on their gut feelings and decide as quickly and as spontaneously as possible. Spontaneous judgments included a quick button press for a coarse valence judgment (negative, neutral, or positive) for which participants had to press one of three keyboard buttons. The response assignment to keys was counterbalanced across participants with the middle button remaining the neutral response for all participants and the left and right buttons alternating in response assignment between 'negative/unpleasant' and 'positive/pleasant.' The subsequent elaborate evaluation, following the spontaneous judgment, included a voice response. For the voice response, the valence scale of the nine-point self-assessment manikin (SAM; Lang, 1980) was presented to participants before the start of the experiment, to familiarize them with the scale, and also after each stimulus block during the experiment as a reminder. Participants were told to evaluate the intensity of their stimulus-evoked feelings by naming a number corresponding to the manikin that best fits the evoked feeling. Number assignments always started with '1' at the leftmost manikin, counting up to '9' at the rightmost manikin.

Each trial started with the presentation of a pronoun-noun pair. The pair was presented in upper case in the middle of the computer screen for 4000 ms. The button response had to be given while the stimulus remained on the display. Subsequently, a microphone icon was presented for 4000 ms indicating the voice response interval, in which participants were asked to elaborate the intensity of the feeling elicited by the stimulus during the spontaneous appraisal. Interstimulus intervals were uniformly distributed between 3000 and 4000 ms. An overview of the experimental task is provided in **Figure 1**.

# Stimulus Material and Stimulus Matching

Each stimulus consisted of one out of three different pronouns ("mein/e," German for "my," "sein/e," German for "his," or "kein/e," German for "no") and one out of 84 nouns categorized into three different valence categories (negative, neutral, or positive), resulting in a 3 × 3 (reference × valence) design. Nouns were used for valence manipulation; pronouns were used for reference manipulation ("my" for self-reference, "his" for other-reference, and "no" for no reference). Only the male version of the third person German possessive pronoun was used for other-reference because in German language the female version of the third person possessive pronoun could be ambiguous (referring either to "her" or "their"). Each of the 84 nouns was paired with all possible pronouns (e.g., "my fear," "his fear," and "no fear"), resulting in 252 trials in total. Trials were presented in blocks, each block consisting of four trials with the same pronoun and the same valence category. A randomized block design was chosen in order to avoid changes in physiology due to an increase in cognitive load or mental effort, which would have been likely to occur when switching one's evaluation from trial to trial. Pronoun order was randomized subject to the following rule: within three consecutive blocks, each pronoun (self-related, other-related, or no-reference) was used once and there were no adjacent blocks with the same pronoun. Valence order was randomized subject to the following rule: within every nine blocks, every noun valence category was paired with every pronoun exactly once and adjacent blocks never had the same valence.
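The two ordering rules can be illustrated with a short sketch. It is not the authors' randomization code (the experiment was run in MATLAB); it is a minimal rejection-sampling illustration in Python, assuming that the 63 blocks are organized as seven cycles of nine blocks and that the adjacency rules also hold across cycle boundaries. The names `is_valid_cycle` and `make_block_order` are hypothetical.

```python
import random

PRONOUNS = ("self", "other", "none")          # "mein/e", "sein/e", "kein/e"
VALENCES = ("negative", "neutral", "positive")


def is_valid_cycle(cycle, previous=None):
    """Check the ordering rules for one cycle of nine (pronoun, valence) blocks."""
    # Rule 1: each run of three consecutive blocks uses every pronoun once.
    for start in (0, 3, 6):
        if {pronoun for pronoun, _ in cycle[start:start + 3]} != set(PRONOUNS):
            return False
    # Rule 2: adjacent blocks never share a pronoun or a valence category.
    # Assumption: this also applies at the boundary to the preceding cycle.
    sequence = ([previous] if previous is not None else []) + list(cycle)
    return all(a[0] != b[0] and a[1] != b[1]
               for a, b in zip(sequence, sequence[1:]))


def make_block_order(n_cycles=7, rng=None):
    """Draw a block order by reshuffling each cycle until it satisfies the rules."""
    rng = rng or random.Random()
    conditions = [(p, v) for p in PRONOUNS for v in VALENCES]
    order = []
    for _ in range(n_cycles):
        previous = order[-1] if order else None
        while True:
            # A permutation of all nine conditions pairs every valence
            # with every pronoun exactly once within the cycle.
            candidate = rng.sample(conditions, k=len(conditions))
            if is_valid_cycle(candidate, previous):
                order.extend(candidate)
                break
    return order
```

Calling `make_block_order()` yields 63 blocks; with four trials per block this reproduces the 252 trials of the design. Rejection sampling was chosen here only because it keeps the constraints readable; a constructive algorithm would be equally valid.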

Nouns were taken from the German affective word list BAWL-R (Võ et al., 2009) and were matched on several dimensions including stimulus valence (−3: very negative to 3: very positive), arousal (1: low arousal to 5: high arousal), imageability (1: low imageability to 7: high imageability), and total frequency of appearance per million words (FTOT). Out of the over 800 nouns included in the BAWL-R with either negative (valence < −1.5), neutral (0.2 ≥ valence ≥ −0.2) or positive (valence > 1.5) valence, 28 were selected for each emotional category. The selection procedure was based on matching between valence groups for arousal, FTOT, word length, gender, and compatibility with the used pronouns. To control for compatibility effects between pronouns and nouns, we assessed whether the respective pronouns occurred as significant left occurrences of each of the 84 nouns in our list. For this procedure, a German linguistic corpus (Wortschatz Universität Leipzig<sup>1</sup>) was used. The procedure used for extracting significant left occurrences within the Wortschatz Universität Leipzig was described by Biemann et al. (2004).
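The valence cut-offs used for pre-selecting candidate nouns can be written as a small classifier. This is only a sketch of the thresholds stated above; the function name `valence_category` is hypothetical, and the subsequent matching on arousal, frequency, word length, gender, and pronoun compatibility is not modeled.

```python
def valence_category(valence):
    """Map a BAWL-R valence rating (-3 to 3) to a category, or None if the
    noun falls between the cut-offs and is not a candidate stimulus."""
    if valence < -1.5:
        return "negative"
    if -0.2 <= valence <= 0.2:
        return "neutral"
    if valence > 1.5:
        return "positive"
    return None
```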

Positive, negative, and neutral nouns differed significantly in arousal, F(2,81) = 38.66, p < 0.001. There was no difference in arousal between positive and negative nouns, T(54) = 0.25, p = 0.81, but arousal differed between positive and neutral nouns, T(54) = 6.86, p < 0.001, and between negative and neutral nouns, T(54) = 7.90, p < 0.001. There was no difference in word length, F(2,81) = 0.04, p = 0.96, FTOT, F(2,81) < 0.01, p > 0.99, or imageability, F(2,81) = 2.16, p = 0.12, between valence groups. The clustering into different valence categories was successful, F(2,81) = 18.62, p < 0.001. Negative nouns differed in valence from neutral, T(54) = 3.49, p = 0.002, and positive, T(54) = 4.98, p < 0.001, nouns. Positive nouns differed in valence from neutral nouns, T(54) = 3.59, p = 0.001. Means and standard deviations of all parameters, depending on valence group, are summarized in **Table 2**.

# Apparatus and Recording

Visual stimuli were presented at a distance of 100 cm on a 22-inch monitor (Eizo SX2262W, 1650 × 1024 pixel resolution, 60 Hz frame rate) using MATLAB version R2011a (The Mathworks, Inc., Natick, MA, United States) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). Button press responses were recorded using a PS/2 connected standard keyboard. Voice responses were recorded using a standard interfacial microphone (Samson Technologies, Hauppauge, NY, United States). Physiological data, i.e., data from electrocardiogram (ECG), EDA, and fEMG at M. Corrugator and M. Zygomaticus regions, was registered using the mobile Varioport amplifier (Becker Meditec, Karlsruhe, Germany), which allows data sampling at a rate of 512 Hz. The EMG electrodes were placed in accordance with methodological guidelines (Fridlund and Cacioppo, 1986). For electrocardiography, one-way electrodes were used which were placed at the upper sternum, lower sternum, and left lateral margin of the chest. This placement is thought to lead to minimal movement artifacts (Jennings et al., 1981). Skin conductance including EDA was registered using direct current stimulation with Ag/AgCl electrodes of a diameter of 4 mm. Due to hardwired settings of the amplifier, the physiological signals were filtered online. The ECG-signal was band-pass filtered at 0.9–100 Hz (−3 dB). The fEMG-signal was band-pass filtered at 70–400 Hz (−3 dB). While the attenuation of the fEMG-signal in the low frequency range leads to a reduction of power line noise, it also decreases a sizable part of the surface EMG signal which has most of the energy between 10 and 200 Hz (Tassinary et al., 2007). Usage of the 70 to 400 Hz passband attenuates especially weak signals originating from single motor unit firing as opposed to aggregated motor unit activity (Tassinary et al., 2007). The usage of the described online filter may thus lead to a distortion of the fEMG-signal, especially in periods of low fEMG activity, which, however, were not analyzed in the present study.

<sup>1</sup>http://wortschatz.uni-leipzig.de

TABLE 2 | Descriptive statistics of noun stimuli parameters.

# Preprocessing of Behavioral and Physiological Data

Only reaction times in the time window of 100 to 3900 ms after stimulus onset were considered. Trials with multiple key presses were excluded from analysis resulting on average in a loss of 1.3–2% of trials per stimulus condition. Key presses related to participants' spontaneous evaluations of the stimuli were coded offline from −1 (negative) to 0 (neutral) to +1 (positive). Participants' elaborate evaluations were coded offline in line with the SAM from 1 to 9 (1: very unpleasant, 9: very pleasant).
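The exclusion and coding rules for the spontaneous judgments can be sketched as follows. This is an illustration rather than the authors' preprocessing script: the function name `preprocess_trial`, the input format, the inclusive window bounds, and the treatment of trials without any key press are assumptions.

```python
# Offline coding of the three response keys, as described in the text.
CODES = {"negative": -1, "neutral": 0, "positive": 1}


def preprocess_trial(key_presses, rt_min_ms=100, rt_max_ms=3900):
    """Return (valence_code, rt_ms) for a usable trial, otherwise None.

    key_presses: list of (response_label, rt_ms) tuples recorded in one trial.
    """
    # Trials with multiple key presses are excluded
    # (assumption: trials without a key press are excluded as well).
    if len(key_presses) != 1:
        return None
    label, rt_ms = key_presses[0]
    # Only reaction times inside the analysis window are considered
    # (assumption: the bounds are inclusive).
    if not rt_min_ms <= rt_ms <= rt_max_ms:
        return None
    return CODES[label], rt_ms
```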

Electromyography raw data was high-pass filtered to reduce movement and blink artifacts and subsequently full-wave rectified. Continuous data was visually inspected and epochs with remaining artifacts were rejected. On average, 5.1% of trials per condition had to be rejected. Afterwards, data was segmented into epochs from −1000 ms to 11000 ms with relation to stimulus onset and 1000 ms pre-stimulus baseline-correction was performed.
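The rectification, segmentation, and baseline-correction steps can be sketched in a few lines. The sketch assumes the 512 Hz sampling rate reported above and a signal stored as a plain list of samples; it omits the high-pass filter and the visual artifact inspection, and the function name `epoch_and_baseline` is hypothetical.

```python
FS_HZ = 512  # sampling rate of the amplifier as reported in the text


def epoch_and_baseline(signal, onset_sample, fs=FS_HZ, pre_ms=1000, post_ms=11000):
    """Cut one epoch around stimulus onset, full-wave rectify it, and
    subtract the mean of the pre-stimulus interval."""
    n_pre = pre_ms * fs // 1000     # 512 samples before onset
    n_post = post_ms * fs // 1000   # 5632 samples after onset
    start, stop = onset_sample - n_pre, onset_sample + n_post
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recorded signal")
    # Full-wave rectification: take the absolute value of every sample.
    rectified = [abs(sample) for sample in signal[start:stop]]
    # Baseline correction: subtract the mean of the 1000 ms before onset.
    baseline = sum(rectified[:n_pre]) / n_pre
    return [sample - baseline for sample in rectified]
```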

R-spikes in the raw ECG signal were extracted using the QRStool (Allen et al., 2007) and raw ECG data was inspected for artifacts. Between 9.8 and 13.5% of the trials were rejected in each of the nine conditions, suggesting a uniform distribution of rejections. After artifact rejection, each condition for each individual subject included more than 70% of trials. Data was segmented into epochs starting 1000 ms before stimulus onset until 11000 ms after stimulus onset. Epochs were baseline-corrected using the 1000 ms interval before stimulus onset. The epoch length was chosen to fit the maximal length of a trial with the shortest jittering possible (i.e., 4000 ms stimulus interval, 4000 ms voice response interval, and 3000 ms inter-trial interval).

Electrodermal activity was visually inspected and epochs with noisy baseline or activity that could not be classified as stimulus-elicited SCR with regard to the criteria described by Boucsein (2012) were rejected. Afterward, data was segmented into epochs from −1000 ms to 11000 ms with relation to stimulus onset and 1000 ms pre-stimulus baseline-correction was performed. Data of eight subjects showed no reliable responses time locked to the onset of the stimulus (non-responders). Conservative inspection of artifacts resulted in the rejection of another eight subjects, such that the final sample included in the analysis comprised only 13 subjects. Therefore, the interpretation of the exploratory EDA data may have only limited validity and generalizability and results from preliminary EDA analysis are reported only in the Supplementary Material.

# Manipulation Check

After the experiment participants were asked for potential processing strategies, which revealed no differences, confirming that all participants followed the instruction given. In particular, all participants stated that they used similar encoding or appraisal strategies for words with self-related, other-related and negated pronouns.

# Data Analysis: Behavior and Physiology

Arithmetic means of reaction times, spontaneous judgments (given via key press), and elaborate judgments (given via voice) were analyzed in separate repeated measurement analyses of variance (ANOVAs) using the factors reference (self-reference, other-reference, no reference) and valence (positive, negative, neutral). Participants' key presses were coded as −1 (if associated with a 'negative/unpleasant' response), 0 (if associated with a 'neutral' response), and +1 (if associated with a 'positive/pleasant' response) and, to obtain index scores of response accuracy (ranging from −1 to +1), individual responses (positive, negative, or neutral key presses) were averaged for each word category separately<sup>2</sup>. Dependent t-tests were conducted as post hoc tests.
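The index scores described above amount to a per-condition mean of the coded key presses. A minimal sketch for one participant, assuming trials are available as `(reference, valence, code)` tuples (a hypothetical data layout; `index_scores` is likewise a hypothetical name):

```python
from collections import defaultdict
from statistics import mean


def index_scores(trials):
    """Average the coded key presses (-1, 0, +1) within each of the nine
    reference x valence conditions; scores range from -1 to +1."""
    by_condition = defaultdict(list)
    for reference, valence, code in trials:
        by_condition[(reference, valence)].append(code)
    return {condition: mean(codes) for condition, codes in by_condition.items()}
```

For example, a participant who judges three of four self-related positive items as positive and one as neutral obtains an index score of 0.75 for that condition.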

Changes in mean M. Corrugator and mean M. Zygomaticus activity (fEMG), HR, and EDA were statistically analyzed with repeated measures ANOVAs. For fEMG and HR, activity between 0 and 4000 ms after stimulus onset was analyzed. EDA activity was analyzed between 0 and 11000 ms after stimulus onset. The longer analysis window is due to the slow reactivity of this measure, i.e., changes in mean skin conductance are characterized by slow wave drifts lasting about 10 up to 16 s.

<sup>2</sup> In accordance with Norman (2010), index scores computed from ordinal items are interval scaled and can be validly analyzed in repeated measurement analyses of variance. It has also been widely argued that ANOVA analyses are robust against violations of the normality assumption that could be induced by ordinal data (e.g., Norman, 2010).

Weis and Herbert Words, Emotions, Body and Self

All physiological signals were also assessed across time to determine stimulus-locked fluctuations across the whole stimulus presentation period from 0 to 4000 ms. To this end, the continuously recorded ECG and fEMG-signals were clustered into time bins of 500 ms; EDA data was clustered into 1000 ms bins and analyzed from 0 to 11000 ms after stimulus onset. Ultimately, each repeated measures ANOVA contained the factors reference (self-related, other-related, negated), valence (pleasant, unpleasant, neutral valence), and time. The results of the time series analyses are reported in the Supplementary Material (beneath the respective time × amplitude plots in Supplementary Figures 1B, 2B, 3B, 5B).
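The time-bin clustering can be sketched as a mean over consecutive, non-overlapping windows. The sketch assumes a baseline-corrected epoch that starts 1000 ms before stimulus onset and was sampled at 512 Hz, so that a 500 ms bin holds 256 samples; `bin_means` is a hypothetical name, and how the authors aggregated samples within a bin is not stated in the text (a mean is assumed).

```python
def bin_means(epoch, fs=512, bin_ms=500, start_ms=0, stop_ms=4000, pre_ms=1000):
    """Average an epoch within consecutive time bins after stimulus onset.

    epoch: samples starting pre_ms before stimulus onset.
    Returns one mean per bin between start_ms and stop_ms.
    """
    samples_per_bin = bin_ms * fs // 1000          # 256 samples for 500 ms
    first = (pre_ms + start_ms) * fs // 1000       # index of the first analyzed sample
    n_bins = (stop_ms - start_ms) // bin_ms        # 8 bins for 0 to 4000 ms
    return [
        sum(epoch[first + i * samples_per_bin:first + (i + 1) * samples_per_bin])
        / samples_per_bin
        for i in range(n_bins)
    ]
```

For the EDA data the same function would be called with `bin_ms=1000` and `stop_ms=11000`.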

Analyses of variance results are reported Greenhouse-Geisser corrected where appropriate. Significant main and interaction effects were further analyzed by dependent t-tests. P-values of post hoc tests were controlled for multiple comparisons according to the procedure suggested by Benjamini and Hochberg (1995) which controls for false discovery rate (FDR).
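For readers unfamiliar with the Benjamini and Hochberg (1995) procedure, the following sketch computes FDR-adjusted p-values for a family of post hoc tests. It is a generic textbook implementation, not the authors' analysis code, and which tests were grouped into one family is not specified in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A test is declared significant at a false discovery rate of 5% if its adjusted p-value is at most 0.05.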

# RESULTS

# Reaction Times

Reaction time showed a main effect of valence, F(2,56) = 18.12, p < 0.001, η<sup>2</sup> = 0.39. Reaction times were significantly shorter for positive and negative words than for neutral words [negative vs. neutral: T(28) = −4.18, p < 0.001; positive vs. neutral: T(28) = −5.20, p < 0.001]. The main effect of reference was not significant, F(2,56) = 2.47, p = 0.094, η<sup>2</sup> = 0.08. However, a significant interaction effect of valence × reference, F(4,112) = 5.62, p < 0.001, η<sup>2</sup> = 0.17, was observed, supporting the hypothesis of a self-positivity bias. As shown in **Figure 2**, participants responded to self-related positive words significantly faster than to self-related negative or self-related neutral words and significantly faster than to other-related positive words [all three comparisons |T(28)| > 2.8, p < 0.01]. For self- and other-related negative or neutral words no difference in reaction times was found. Moreover, negated words without any personal reference were not responded to more slowly than other-related words [other-related vs. negated: T(28) = 0.405, p = 0.689], suggesting no considerable increase in task difficulty for the evaluation of negated compared to other-related stimuli.
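The effect sizes reported in this section can be cross-checked from the F statistics and their degrees of freedom. The text does not state which variant of eta squared was computed; the sketch below assumes partial eta squared, which is an inference from the reported numbers rather than a statement by the authors.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)
```

Under this assumption, the three reaction time effects above, F(2,56) = 18.12, F(2,56) = 2.47, and F(4,112) = 5.62, round to 0.39, 0.08, and 0.17, which matches the reported values.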

# Judgments

Spontaneous judgments were modulated by valence, F(2,56) = 88.97, p < 0.001, η<sup>2</sup> = 0.76, and reference, F(2,56) = 19.86, p < 0.001, η<sup>2</sup> = 0.42, and by a significant interaction effect of valence × reference, F(4,112) = 170.54, p < 0.001, η<sup>2</sup> = 0.86. Post hoc tests revealed that positive words were judged more often as positive when they were self-related than when they were other-related, T(28) = 4.38, p < 0.001. Participants responded also more often with a positive key press to self-related neutral words compared to other-related neutral words, T(28) > 4.5, p < 0.001. For self- vs. other-related negative words no such response bias could be observed: as shown in **Figure 3A**, participants did not judge self-related negative words more often as negative than other-related negative words, T(28) = 1.01, p = 0.334. Negated positive words were more often judged as negative compared to negated neutral words, T(28) = 6.75, p < 0.001, and negated negative words were more often judged as positive than negated neutral words, T(28) = 8.46, p < 0.001, indicating that participants' spontaneous emotional judgments were based on the semantics of the words.

The reaction time represents the time subjects needed to spontaneously judge the valence of the pronoun-noun pairs. Error bars depict SEM. ∗∗p < 0.01, ∗∗∗p < 0.001, FDR corrected.

Elaborate judgments showed a significant main effect of valence, F(2,56) = 86.01, p < 0.001, η<sup>2</sup> = 0.75, of reference, F(2,56) = 16.97, p < 0.001, η<sup>2</sup> = 0.38, as well as a significant interaction effect of valence × reference, F(4,112) = 154.27, p < 0.001, η<sup>2</sup> = 0.85. Post hoc tests showed that subjective feelings elicited by positive and negative word pairs were judged as more intense, i.e., more positive, T(28) = 6.32, p < 0.001, or more negative, T(28) = 9.85, p < 0.001, respectively, compared to neutral word pairs. Moreover, positive feelings elicited by self-related positive words (e.g., "my joy") were rated higher in intensity than were feelings elicited by other-related positive words (e.g., "his joy"), T(28) = 5.24, p < 0.001. Feelings elicited by negative words were also rated as significantly higher in intensity when they were related to the self (e.g., "my death") than when they were other-related (e.g., "his death"), T(28) = 2.61, p = 0.020, suggesting that self-reference enhances the intensity of subjective feelings for positive and negative words during elaborate judgments. Feelings elicited by negated positive words (e.g., "no joy") were rated as more negative in intensity than were feelings elicited by negated neutral words, T(28) = 6.72, p < 0.001. Likewise, feelings elicited by negated unpleasant words (e.g., "no death") were rated as more positive than were negated neutral words, T(28) = 8.90, p < 0.001, confirming that the negating pronoun reversed the valence of negative and positive words. Results are depicted in **Figure 3B**. An overview of the behavioral results is provided in **Table 3**.

FIGURE 3 | (A,B) Spontaneous and elaborate valence judgments. For the spontaneous judgment, subjects had to press one of three buttons corresponding to either negative (–1), neutral (0), or positive (1) valence. In the elaborate judgment, subjects had to verbalize a number on a 9-point scale for their rating to indicate the intensity of their feelings. The number 'one' represents very negative, the number 'five' neutral and the number 'nine' very positive. Error bars depict SEM. n.s. p > 0.1, <sup>∗</sup>p < 0.05, ∗∗∗p < 0.001, FDR corrected.

# Facial Electromyography

Mean M. Corrugator activity (0–4000 ms, see **Figure 4**) showed a main effect of the factor valence, F(2,56) = 12.84, p < 0.001, η<sup>2</sup> = 0.31, as well as an interaction of the factors valence × reference, F(4,112) = 5.75, p < 0.001, η<sup>2</sup> = 0.17. Mean M. Corrugator activity was significantly attenuated during the presentation of positive compared to neutral words, T(28) = 4.22, p < 0.001, or negative words, T(28) = 3.55, p = 0.002. This attenuation was observed particularly for positive other-related words in comparison to positive self-related words, T(28) = 3.67, p = 0.002, and at trend level for other-related positive words in comparison to negated positive words, T(28) = 2.08, p = 0.050.

Mean M. Zygomaticus activity (0–4000 ms) showed a significant main effect of valence, F(2,56) = 4.17, p = 0.039, η<sup>2</sup> = 0.13, and of reference, F(2,56) = 4.76, p = 0.012, η<sup>2</sup> = 0.15, as well as a significant interaction of valence × reference, F(4,112) = 3.46, p = 0.043, η<sup>2</sup> = 0.11. Mean M. Zygomaticus activity was significantly more pronounced for positive than for negative words, T(28) = 2.51, p = 0.022. No difference was found between positive and neutral words, T(28) = 1.11, p = 0.278. Also, changes in Zygomaticus activity were more pronounced for other-related than for self-related, T(28) = 2.48, p = 0.023, or negated words, T(28) = 2.41, p = 0.026. Crucially, changes in mean Zygomaticus activity were more pronounced for other-related than for self-related positive words, T(28) = 2.52, p = 0.022, or negated positive words, T(28) = 2.83, p = 0.013; see **Figure 5**.

Changes in mean M. Corrugator and mean M. Zygomaticus activity were not significantly correlated, irrespective of whether self- or other-related words or control stimuli were presented (all N = 29, all p > 0.1).

# Heart Rate

Changes in mean HR (0–4000 ms) were significantly modulated by the factor valence, F(2,56) = 7.99, p < 0.001, η<sup>2</sup> = 0.22, but not by reference, F(2,56) = 2.53, p = 0.089, η<sup>2</sup> = 0.08. Mean HR increased significantly during presentation of positive, T(28) = 3.57, p = 0.002, and negative words, T(28) = 2.66, p = 0.018, compared to neutral words. The interaction of the factors valence × reference was not significant, F(4,112) = 0.72, p = 0.579, η<sup>2</sup> = 0.02 (see **Figure 6**).

Analysis of electrodermal activity (exploratory analysis, N = 13 subjects) is reported in the Supplement.

## Interindividual Differences

Correlation analyses between behavioral, physiological, and self-reported data (depression, alexithymia, anxiety, and positive and negative affect) revealed no consistent pattern of interactions. Regarding behavioral data, spontaneous judgments of self-related positive words showed a negative correlation with depression (r = −0.045, p = 0.008, one-tailed) and a positive correlation with self-reported positive affect (r = 0.36, p = 0.028, one-tailed).

#### TABLE 3 | Descriptive statistics of spontaneous and elaborate valence judgments (N = 29).

Spontaneous ratings range from −1 (negative) to 1 (positive). Spontaneous reaction times are reported in seconds. Elaborate ratings range from 1 (negative) to 9 (positive). Standard deviations of all measures are reported in parentheses.

# DISCUSSION

This study investigated reaction times, emotional judgments, and changes in affective physiology, fEMG and HR in particular, during emotional evaluation of words varying in emotional valence and personal reference (self-other reference). Extending previous research supporting an embodied view of language, the present study was aimed at investigating the differential sensitivity of each of these measures to changes in valence (positive, neutral, negative) and personal reference (self, other).

# Behavioral Data (Reaction Time and Judgments)

Participants' behavioral data indicated preferential processing of positive pronoun-noun phrases, particularly when these were self-related. The preferential processing of self-related positive words was observed during spontaneous judgments and associated with faster reaction times and significantly higher response accuracy. This self-positivity bias was evident when comparing self-related positive words to self-related negative or self-related neutral words and in comparison to other-related positive words as well as control items (i.e., negated words). Elaborate judgments revealed that subjective feelings were significantly more intense when positive and negative words were self-related than when they were other-related.

The self-positivity bias in reaction times is in line with recent EEG studies reporting a processing bias for positive words in designs in which the valence of a word and its personal reference (self-other reference) were experimentally manipulated or controlled for (e.g., Watson et al., 2007; Herbert et al., 2008, 2011b; Fields and Kuperberg, 2012, 2015). The present results confirm these findings on a behavioral level and suggest that participants have faster access to self-related positive information than to self-related negative information in support of mood congruent processing, mildly positive mood being the norm in healthy subjects (Diener et al., 1997; Mezulis et al., 2004). Crucially, the present results attest that it is the self-reference of a stimulus that improves the bias toward positive information, facilitating spontaneous judgments to positive words when their content is related to the reader's self.

In general, mean reaction times appeared to be slower than reaction times reported in word processing studies using, for instance, lexical decision tasks. However, reaction times greater than one second have been reported in previous studies using emotional evaluation tasks (see for instance, Niedenthal et al., 2009). Moreover, previous EEG-ERP studies using similar stimulus material as the present one (e.g., Herbert et al., 2011b,d) found a processing advantage for self- versus other-related emotional words specifically at later cortical processing stages in the time windows of the N400 (e.g., Herbert et al., 2011d) or the late positive potential, LPP (e.g., Herbert et al., 2011b; see also Fields and Kuperberg, 2012, 2015). Hence, for abstract stimuli such as words, discrimination between self and other might appear earliest at the level of semantic stimulus integration, and thus temporally after the initial emotional content conveyed by nouns and its personal relatedness (conveyed by pronouns) have been integrated into one semantic concept. Thus, a certain degree of semantic processing is required to discriminate emotional stimuli related to the self from those related to the other. That judgments were based on semantics and thus on the meaning of the word phrases was confirmed by the judgments of the control stimuli: pronoun-noun pairs containing a negation (e.g., "no joy," "no death") reversed the direction of the valence judgment for negative and positive words, which is possible only if the negation term is semantically taken into consideration (see e.g., Kaup and Zwaan, 2003; Herbert et al., 2011a).

# Physiological Data [fEMG, HR, and Skin Conductance (EDA)]

Interestingly, despite a processing advantage of self-related positive words in the behavioral and subjective measures, this bias was not accompanied by activity changes in fEMG or HR data. Physiological data did not point toward stronger embodiment of self-related positive words in comparison to other-related positive words.

Changes in HR were modulated by the emotional valence of the stimuli, with significantly stronger HR acceleration patterns for positive and negative than for neutral words during the first 4 s of spontaneous word evaluation. Basic changes in HR in response to a word's emotional tone (positive or negative vs. neutral) may occur in anticipation of approach or defense (Bradley and Lang, 2007) and may be larger for emotional stimuli rated higher in emotional arousal than neutral words (Bradley et al., 2008). Of note, the positive and negative nouns chosen for the experiment differed from neutral nouns not only in emotional valence but also in emotional arousal. The observed HR changes therefore fit well with previous reports showing that increases in HR evoked by positive and negative stimuli are modulated by emotional arousal (Bradley et al., 2008) and by preparation for action (Bradley and Lang, 2007).

Mean M. Zygomaticus as well as mean M. Corrugator activity revealed significantly stronger activity changes during emotional evaluation of positive words. In particular, changes were more pronounced for other-related than self-related positive words. Whereas M. Zygomaticus activity showed a significant activity increase, M. Corrugator activity showed a significant decrease particularly during the evaluation of other-related positive words as compared to self-related positive words. While activity increases in mean M. Zygomaticus activity are reliable indicators of positive emotions, decreases in M. Corrugator activity below baseline have also been observed in previous studies in response to positive stimuli eliciting relaxation or surprise (Neta et al., 2009). In the present study, changes in mean activity of the M. Zygomaticus and the M. Corrugator muscles were not significantly correlated during word evaluation, suggesting that changes in both muscles reflect different facets of emotion processing.

Regarding the processing of concrete emotional stimuli such as faces, response peaks in fEMG have been reported to occur quickly after stimulus presentation (e.g., Dimberg, 1982; Dimberg et al., 2000), which may indicate spontaneous readout of the reader's own emotions. In previous studies using single emotional words, the earliest changes in fEMG activity were occasionally observed as early as 500 ms after word presentation (Foroni and Semin, 2009). However, Niedenthal et al. (2009), for instance, reported considerably longer latencies in an emotional evaluation task. Moreover, the previous EEG studies outlined above (e.g., Herbert et al., 2011b; Fields and Kuperberg, 2015) suggest that discrimination between self-related and other-related emotional words may occur during later stimulus processing stages, which is why changes in facial responses may also occur later for pronoun-noun pairs than for single words. Future research investigating the time course of changes in fEMG during the evaluation of self- vs. other-related emotional words is needed to answer this question (please see the Supplement for a first exploratory and descriptive overview regarding changes in biosignals across the time window of word presentation).

Nevertheless, positive words elicited stronger emotion-congruent changes in mean M. Zygomaticus activity when other-related than when self-related. Differential facial responses to positive words support the assumption that people spontaneously and preferentially mimic in relation to others, even if the other is only a virtual other (Fridlund, 1991). In line with this hypothesis and the present observations, it has been suggested that facial expressions as well as feedback from motor and action units of the face are particularly important for understanding other people's actions and emotional states (Rizzolatti and Craighero, 2004; Niedenthal, 2007; Gallese, 2009). The present data might therefore support the view that, as far as verbal input is concerned, facial expressions preferentially occur in response to other-related emotional stimuli, in particular to positive stimuli (Fridlund, 1991), at least when the intention is to evaluate other-related emotional words for their hedonic pleasure. Crucially, participants were not instructed to feel into the emotions of others or empathize with them during evaluation of other-related words, suggesting that it is unlikely that empathy or individual differences in empathy have influenced the results. Personality traits and interindividual differences in mood and affect may modulate facial responsivity (e.g., Ferri et al., 2010). In the present study, participants scoring high in self-report measures of depression or alexithymia were excluded from participation, which reduced the chance of finding strong correlations between these self-report, physiological and behavioral measures. Nevertheless, regarding spontaneous judgments, appraisal of self-related positive words was negatively correlated with depression scores and positively correlated with self-reported positive affect. Although these correlations support the hypothesis that mood-congruent processing is the cause of the self-positivity bias in emotional judgments in healthy subjects, they should be treated with caution and need to be validated in larger samples.

Taken together, the observed fEMG results could be a challenge for traditional associative network models of language processing (Lang, 1979; Bower, 1981). According to these models, fEMG activity during word reading would be the result of activation spread after memory activation. For instance, activation of the words happiness or joy would lead to spread of activation to associated concepts (e.g., smile), thereby leading to changes in the associated parts of the peripheral nervous system, e.g., the neurons controlling facial musculature (see e.g., Lang, 1979; Bower, 1981). Viewed from this perspective, the fEMG findings would imply that nodes are more strongly interrelated in memory for other-related than for self-related positive information. This conclusion contrasts with the behavioral results as well as with several previous findings predicting overall better memory and prioritized processing for self-related information (self-reference effect; for a meta-study, see Symons and Johnson, 1997).

Differences in cognitive versus affective appraisal strategies could be one reason for differential facial involvement in the evaluation of other- versus self-related emotional words in the present study (e.g., Niedenthal et al., 2009). This speculation is, however, unlikely because the instruction was the same for all words. In addition, in the manipulation check participants did not self-report any processing differences between word categories (self, other, no reference). Thus, possible differences in appraisal strategies (cognitive versus affective) cannot explain why participants "frowned" less and particularly "smiled" more when evaluating other- versus self-related positive words. Moreover, abstractness has been shown to affect the magnitude of facial expressions: for instance, fEMG is larger for emotion-related action words (e.g., smiling, crying, etc.) than for emotional words (e.g., adjectives such as happy, funny etc.; e.g., Foroni and Semin, 2009; Fino et al., 2016). However, abstractness cannot account for the differential effects in fEMG during evaluation of self- and other-related pronoun-noun pairs: words were carefully matched on this dimension and the same set of nouns was presented in each condition, such that stimulus reference was the critical dimension signaling whether nouns were self-related or other-related. Gender, in contrast, might have played a role, as 24 of the 29 participants were female and physiological signals including fEMG activity have been reported to be more pronounced in women than in men (Greenwald et al., 1989; Bradley et al., 2001). That said, previous studies using similar material (Herbert et al., 2011b,d) as well as pronouns to induce self- or other-reference in Western and Asian participant samples (Li and Zhou, 2010; Zhou et al., 2010; Blume and Herbert, 2014) did not report any gender effects. Nevertheless, gender differences should be examined further in future studies using larger sample sizes.

# CONCLUSION


In the present study, the personal reference (self-other reference) and the emotional valence of words were experimentally manipulated to assess the impact of these dimensions on behavioral, subjective, and physiological responses during an emotional word evaluation task. Whereas behavioral responses indicated preferential processing of self-related positive words, facial responses were most pronounced during evaluation of other-related positive words. Moreover, changes in HR occurred during evaluation of emotional compared to neutral words regardless of their personal reference. Thus, behavioral responses support a self-positivity bias in emotional judgments whereas changes in fEMG seem to support sociality effects (Fridlund, 1991). Physiologically, bodily signals may contribute differently to the emotional evaluation of verbal content with facial expressions, M. Zygomaticus activity in particular, being most pronounced during the evaluation of other-related emotional content, positive in particular (Fridlund, 1991), and HR being modulated by differences in emotional content irrespective of whom the information may refer to. Crucially, further studies are needed to scrutinize and validate these assumptions in different settings including actual conversations taking place in both laboratory and real-life settings. The paradigm used in the present study might be especially fruitful to this end.

# REFERENCES


# AUTHOR CONTRIBUTIONS

This manuscript is a shared first authorship of PW and CH. CH wrote main parts of the manuscript, designed the study, supervised the study, and analyzed the data together with PW (master student supervised by CH). The study is part of a project funded by the German Research Foundation (HE 5880/3-1), awarded to CH. PW helped write the manuscript including the method section, conducted the study (programmed the design, recorded the data) and analyzed the data together with CH.

# FUNDING

This study was funded by the German Research Foundation (DFG), grant HE 5880/3-1, awarded to CH.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.01277/full#supplementary-material




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Weis and Herbert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# It's all in your head – how anticipating evaluation affects the processing of emotional trait adjectives

### *Sebastian Schindler<sup>1,2</sup>\*, Martin Wegrzyn<sup>1,2</sup>, Inga Steppacher<sup>1</sup> and Johanna Kissler<sup>1,2</sup>*

<sup>1</sup> Department of Psychology, Affective Neuropsychology, University of Bielefeld, Bielefeld, Germany

<sup>2</sup> Center of Excellence Cognitive Interaction Technology, University of Bielefeld, Bielefeld, Germany

#### *Edited by:*

Cornelia Herbert, University Clinic for Psychiatry and Psychotherapy, Germany

#### *Reviewed by:*

Nathaniel Delaney-Busch, Tufts University, USA; Constantino Méndez-Bértolo, Universidad Politécnica de Madrid, Spain

#### *\*Correspondence:*

Sebastian Schindler, Department of Psychology, Affective Neuropsychology, University of Bielefeld, Bielefeld 33501, Germany; e-mail: sebastian.schindler@uni-bielefeld.de

Language has an intrinsically evaluative and communicative function. Words can serve to describe emotional traits and states in others and communicate evaluations. Using electroencephalography (EEG), we investigate how the cerebral processing of emotional trait adjectives is modulated by their perceived communicative sender in anticipation of an evaluation. 16 students were videotaped while they described themselves. They were told that a stranger would evaluate their personality based on this recording by endorsing trait adjectives. In a control condition a computer program supposedly randomly selected the adjectives. Actually, both conditions were random. A larger parietal N1 was found for adjectives in the supposedly human-generated condition. This indicates that more visual attention is allocated to the presented adjectives when putatively interacting with a human. Between 400 and 700 ms a fronto-central main effect of emotion was found. Positive, and in tendency also negative adjectives, led to a larger late positive potential (LPP) compared to neutral adjectives. A centro-parietal interaction in the LPP-window was due to larger LPP amplitudes for negative compared to neutral adjectives within the 'human sender' condition. Larger LPP amplitudes are related to stimulus elaboration and memory consolidation. Participants responded more to emotional content particularly when presented in a meaningful 'human' context. This was first observed in the early posterior negativity window (210–260 ms). But the significant interaction between sender and emotion reached only trend-level on post hoc tests. Our results specify differential effects of even implied communicative partners on emotional language processing. They show that anticipating evaluation by a communicative partner alone is sufficient to increase the relevance of particularly emotional adjectives, given a seemingly realistic interactive setting.

**Keywords: EEG/ERP, emotion, language, social feedback, feedback anticipation, communicative context**

#### **INTRODUCTION**

Language serves many different functions, ranging from the communication of facts and knowledge to the communication of socio-emotional evaluations. In fact, symbolic interactionism theory suggests that language meaning is derived from interaction with others (Blumer, 1969). This interaction is supposed to connect the identities of the communicating partners (Burke, 1980). For humans, communication using emotionally relevant language is of special interest (Barrett et al., 2007; Lieberman et al., 2007). Accordingly, newspapers and advertisers often select emotional words for their headlines, as their processing is prioritized (for a review see e.g., Zald, 2003; Kissler et al., 2006; Citron, 2012). However, the influence of the social communicative context on emotional word processing has not been addressed in detail. The present study aims to do so by creating an evaluative context and investigating whether processing of emotion-laden language differs in anticipation of personality evaluation.

So far processing of emotional language has been mostly investigated in the absence of communicative context. Neuroscience research has shown that brain event-related potentials (ERPs) differentiate between emotional and neutral contents during reading (Kissler et al., 2007) and in lexical (Schacht and Sommer, 2009a,b), grammatical (Kissler et al., 2009) or evaluative decision tasks (Naumann et al., 1997). Emotion effects are most consistently reflected in a larger early posterior negativity (EPN) arising from about 200 ms, which is thought to reflect mechanisms of perceptual tagging and early attention (Kissler et al., 2007; Kissler and Herbert, 2013). A more pronounced late parietal positivity (LPP) from about 500 ms after word presentation, has been implicated in elaborative evaluation and memory processing of emotional words (Herbert et al., 2006, 2008; Kissler et al., 2006, 2009; Kanske and Kotz, 2007; Hofmann et al., 2009; Schacht and Sommer, 2009b).

Previous work showed that establishing a self-referential context can alter word processing at early (Fields and Kuperberg, 2012), as well as late processing stages (Watson et al., 2007; Shestyuk and Deldin, 2010; Herbert et al., 2011a,b). This implies self-reference as one important source of plasticity in emotion word processing.

According to symbolic interactionism, the discursive context in which emotional language is embedded should likewise be an important source of plasticity in word processing. In social communication, participants have expectations about their communicative partners and react to violations of these expectations (Burgoon et al., 1983, 2000). Therefore, establishing a socially relevant communicative context, rather than solely self-relevance, can be expected to alter the way emotional language is processed.

Receiving feedback from another person regarding one's own personality represents a highly salient social context. For some people receiving feedback may even pose a social threat, since humans have a strong need to belong to a community (Baumeister and Leary, 1995), seek approval by others (Izuma et al., 2010; Romero-Canyas et al., 2010), and try to avoid unfavorable evaluations (Leary, 1983; Carleton et al., 2011). Electrophysiologically, social threat has been shown to affect early visual ERP components and frontal EEG asymmetry (Crost et al., 2008; Trautmann-Lengsfeld and Herrmann, 2013; Baess and Prinz, 2014). For example, when participants due to group pressure agreed with a wrong answer option, the P1 was reduced compared to a perceptually identical condition (Trautmann-Lengsfeld and Herrmann, 2013). The P1 is one of the first evoked visual potentials. It reflects sensory registration and it is found to be larger for attended stimuli (Mangun and Hillyard, 1991). Influence of social setting is also reported for the N1 (Baess and Prinz, 2014). In a Go/Nogo paradigm, the N1 was found to be larger when both participants had to react in Go trials (Baess and Prinz, 2014). The N1 is thought to be a marker of visual discrimination (Vogel and Luck, 2000) and decreases with repetition (Carretié et al., 2003). Like the P1, the N1 increases when stimuli are attended (Hillyard et al., 1998). P1/N1 modulations have been occasionally reported for emotional stimuli (Pourtois et al., 2004; Keil et al., 2007; Steinberg et al., 2013) and recent evidence shows that also social context may change very early sensory processing.

These event-related potential (ERP) findings are complemented by fMRI results showing regionally distinct processing of social feedback. Social feedback has been shown to activate reward system structures such as the medial prefrontal cortex and the ventral striatum as well as the anterior cingulate cortex, involved in pain processing (Somerville et al., 2006, 2010; Izuma et al., 2008, 2010; Davey et al., 2010; Eisenberger et al., 2011; Korn et al., 2012). Together, EEG and fMRI data indicate that effects of social feedback on brain physiology can be observed in artificial laboratory conditions using highly temporally and spatially resolving imaging methods.

As humans constantly make predictions about the future (Koster-Hale and Saxe, 2013; Seth, 2013), even the anticipation of socially relevant feedback, for example delivered as gestural approval or disapproval ('thumbs up' or 'thumbs down'), produces distinct cerebral activities (Kohls et al., 2013). In that study, the avoidance of social punishment and the anticipation of social reward led to enhanced activity in the ventral striatum and nucleus accumbens (Kohls et al., 2013). This indicates that both the fear of socially unfavorable evaluations and hope of acceptance are central human motives that modulate reward system biology.

The anticipation of socio-emotional language feedback, arguably the most common source of socially relevant feedback, has not yet been investigated. However, there is information on the effects of anticipatory anxiety on ERPs: research demonstrates unspecific sensitizing effects of threat of shock, reflected in more positive-going early ERPs during threat-cue processing (Bublatzky and Schupp, 2012). Trials signaling a possible electric shock lead to a larger P1 and P2, as well as a larger parietal LPP, compared to trials signaling safety (Bublatzky et al., 2010; Bublatzky and Schupp, 2012). Moreover, anticipatory anxiety has been reported to specifically accentuate the processing of emotional pictures, surprisingly leading to a larger EPN for positive pictures when trials are signaling a possible electric shock (Bublatzky et al., 2010). Using anticipation of speaking in public as a threat induction, a different study reported the arguably more intuitive finding of accentuated processing of negative stimuli: participants were told that they would have to hold a speech in public after completing a face perception task. Compared to a control condition, this led to a larger N170 and EPN for angry faces in the face perception task (Wieser et al., 2010).

Anticipation of verbal social feedback likely involves a phase of self-reflection, akin to self-referential processing, perhaps combined with anticipatory anxiety of negative feedback. The intensity of these processes may depend on both the message and the sender of the feedback. Existing studies of emotion word processing have focused on the processing of single words in psycho-linguistic tasks, devoid of social context. However, word meaning will change depending on attributed sender characteristics and the direction of communication. In ecologically valid situations, even an inferred psychological context or a psychological attribution to another individual may constitute presence or absence of an interaction. For instance, feedback in the form of the adjective 'boring' should be more important if another human is the putative sender rather than a computer. Likewise, 'boring' may be regarded as more intense when it is used to characterize oneself as a person rather than one's teaching lesson. Similarly, an adjective like 'cheap' may be relatively neutral when describing an object, but becomes highly negative when it is used to characterize a person.

Against this background, the present study examines the influence of the putative sender on processing of negative, neutral and positive written adjectives in a social evaluative context. Participants were told that either an unknown other person would evaluate them based on his/her first impression, or a computer program would randomly highlight trait adjectives. In reality, both conditions were random and perceptually identical. We expected that anticipation of feedback by another person would generally change stimulus processing (sensitizing effects, Wieser et al., 2010 or Bublatzky and Schupp, 2012) and investigated whether this occurs at early perceptual (P1, N1), mid-latency (EPN), or late (LPP) processing stages. Moreover, we examined valence-specific interactions between feedback content and evaluative context (human, computer). Generally, in the context of being evaluated by another person, negative, and positive trait adjectives can be expected to induce larger P1, N1, EPN, or LPP amplitudes, reflecting fear of unfavorable evaluations and social rejection (Somerville et al., 2006; Masten et al., 2009; Eisenberger et al., 2011) or hope of acceptance by others (Izuma et al., 2010; Romero-Canyas et al., 2010; Simon et al., 2014).

Against this background, we evaluate the sequence of early (P1, N1), mid-latency (EPN) and late visually evoked potentials in response to adjectives presented as potential trait-feedback by another human or a randomly acting computer.

# **MATERIALS AND METHODS**

#### **PARTICIPANTS**

Eighteen participants were recruited at the University of Bielefeld. They gave written informed consent according to the Declaration of Helsinki and received 10 Euros for participation. The study was approved by the Ethics Committee of the University of Konstanz. Due to experimentation errors, two datasets had to be excluded, leaving 16 participants for final analysis. The resulting 16 participants (12 females) were 24.40 years old on average (SD = 0.66). All participants were native German speakers, had normal or corrected-to-normal visual acuity, and were right-handed. Twelve participants were undergraduate students; four had already received their Bachelor's or Master's degree. Screenings with the German version of the Beck Depression Inventory and the State Trait Anxiety Inventory (Spielberger et al., 1999; Hautzinger et al., 2009) revealed no clinically relevant depression (*M* = 4.12; SD = 4.54) or anxiety scores (*M* = 35.94; SD = 3.06).

#### **STIMULI**

Adjectives were previously rated by 20 students in terms of valence and arousal using the Self-Assessment Manikins (Bradley and Lang, 1994). Raters had been specifically instructed to consider adjective valence and arousal in the context of being described by another person with this respective adjective. 150 adjectives (60 negative, 30 neutral, 60 positive) were selected and matched in their linguistic properties, such as word length, frequency, familiarity and regularity (see **Table 1**). Importantly, negative and positive adjectives differed only in their valence. As there is a lack of truly neutral trait adjectives, neutral adjectives were allowed to differ from emotional adjectives on rated concreteness next to valence and arousal.

#### **PROCEDURE**

Participants were told that they would be rated by an unknown other person or would see ratings generated randomly by a computer program. All subjects underwent both conditions. Sequence was counterbalanced across participants.

Upon arrival, participants were asked to describe themselves in a brief structured interview in front of a camera. They were told that their self-description was videotaped and would be shown to a second participant next door. The interview contained four questions encouraging the participant to talk about their strengths and weaknesses, as well as giving a short biography overview. After the interview, participants filled out a demographic questionnaire as well as BDI and STAI whilst the EEG was applied. To ensure face validity, a research assistant left the testing room a couple of minutes ahead of the fictitious feedback, guiding an 'unknown person' to a laboratory room next to the testing room.

Stimuli were presented within a desktop environment of a fictitious program, allegedly allowing instant online communication (see **Figure 1**).

Network cables and changes of the fictitious software desktop image showing a 'neurobehavioral interactive systems' environment were implemented to enhance credibility. The 60 negative, 30 neutral, and 60 positive adjectives were presented in random order, and the feedback on them was randomly generated in both conditions. All adjectives were first presented in black. After a fixed (computer) or variable (human) time interval, a color change indicated the feedback on a given adjective. The presented results relate to the pre-feedback period, when all stimuli still appeared in black. Half of all adjectives were endorsed, leading to 30 affirmative negative, 15 affirmative neutral, and 30 affirmative positive decisions. While the presented feedback was randomly generated in both conditions, twenty additionally inserted highly negative adjectives were defined to be always rejected in the ratings to further increase credibility, since it would appear very unlikely for somebody to endorse extremely negative traits in a hardly known stranger. These additional trials were excluded from further analysis. The desktop environment and stimulus presentation were created using Presentation<sup>1</sup>. In the 'human' condition, color changes indicated a decision by the supposed interaction partner between 1500 and 2500 ms after adjective onset. This manipulation simulated variable decision latencies in humans. The decision was communicated via a color change (blue or purple) of the presented adjective, indicating whether the respective adjective applied to the participant or not. Color–feedback assignments were counterbalanced. In the computer condition, corresponding color changes always occurred at 1500 ms, conveying the notion of constant machine computing time. In both conditions, color changes lasted for 1000 ms, followed by a fixation cross for 1000–1500 ms. After testing, participants responded to a questionnaire asking them to rate, on a five-point Likert scale, their confidence in truly being judged by another person in the 'human' condition.

**Table 1 | Comparisons of negative, neutral and positive adjectives by one-way analyses of variance.**

\*\*\*p ≤ 0.001. Standard deviations appear in parentheses below means; means in the same row sharing the same superscript letter do not differ significantly from one another at p ≤ 0.05; means that do not share superscript letters differ at p ≤ 0.05 based on LSD post hoc comparisons.
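The trial structure described in the Procedure can be summarized in a short Python sketch. It is only an illustration under stated assumptions, not the Presentation script used in the study: all function and field names are hypothetical, latencies are drawn as uniformly distributed whole milliseconds, and endorsement is assigned to a random half of each valence category.

```python
import random

# Timing values (ms) taken from the Procedure text; all names are hypothetical.
FEEDBACK_DURATION_MS = 1000
FIXATION_RANGE_MS = (1000, 1500)
HUMAN_DELAY_RANGE_MS = (1500, 2500)
COMPUTER_DELAY_MS = 1500


def feedback_delay(sender, rng):
    """Return the delay (ms) between adjective onset and the color change."""
    if sender == "human":
        # Variable latency mimics a human decision (uniform draw is an assumption).
        return rng.randint(*HUMAN_DELAY_RANGE_MS)
    if sender == "computer":
        # Fixed latency conveys constant machine computing time.
        return COMPUTER_DELAY_MS
    raise ValueError(f"unknown sender: {sender!r}")


def build_block(adjectives, sender, rng):
    """Build a shuffled trial list for one sender condition.

    `adjectives` maps a valence label to a list of words. A random half of
    the words in each valence category is marked as endorsed (assumption:
    the split is made per category).
    """
    trials = []
    for valence, words in adjectives.items():
        endorsed = set(rng.sample(words, len(words) // 2))
        for word in words:
            trials.append({
                "word": word,
                "valence": valence,
                "endorsed": word in endorsed,
                "feedback_onset_ms": feedback_delay(sender, rng),
                "feedback_duration_ms": FEEDBACK_DURATION_MS,
                "fixation_ms": rng.randint(*FIXATION_RANGE_MS),
            })
    rng.shuffle(trials)
    return trials
```

Passing a seeded `random.Random` instance makes a block reproducible; in this sketch the two sender conditions differ only in how `feedback_onset_ms` is drawn, mirroring the perceptually identical design.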

#### **EEG RECORDING AND ANALYSES**

Electroencephalography signals were recorded from 128 BioSemi active electrodes<sup>2</sup>. Four additional electrodes measured horizontal and vertical eye movements. The sampling rate during recording was 2048 Hz. Pre-processing was done using SPM8 for EEG<sup>3</sup>. Although perhaps best known as a toolbox for the analysis of functional magnetic resonance data, SPM provides a unitary framework for the analysis of neuroscience data acquired with different technologies, including EEG and MEG, using the same rationale (Penny and Henson, 2007; Litvak et al., 2011). Offline, data were re-referenced to average reference, downsampled to 250 Hz and Butterworth band-pass filtered from 0.166 to 30 Hz. Recorded eye movements were subtracted from EEG data. Filtered data were segmented from 100 ms before word onset until 1000 ms after word presentation. The 100 ms preceding word onset were used for baseline correction. Automatic artifact detection was used to reject trials exceeding a threshold of 160 μV. Data were averaged using the robust averaging algorithm of SPM8, excluding possible further artifacts. Overall, less than 1% of all electrodes were interpolated and on average 15.25% of all trials were rejected, leaving on average 50.85 trials for emotional words and 25.43 trials for neutral words for each communicative sender. Artifact rejection rate did not differ between both senders [*F*(1,15) = 0.32, *p* = 0.58], nor between negative, neutral and positive content [*F*(2,30) = 0.26, *p* = 0.78]. There was also no interaction between sender and emotional content regarding artifact rejection rate [*F*(2,30) = 0.09, *p* = 0.91].
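The baseline-correction and threshold-rejection steps described above can be illustrated with a minimal, dependency-free Python sketch. This is not the SPM8 implementation: the function names are hypothetical, an epoch is simplified to a single-channel list of amplitudes in μV, and the 160 μV criterion is assumed to apply to the absolute baseline-corrected amplitude (the text does not state whether absolute or peak-to-peak amplitude was used).

```python
SAMPLING_RATE_HZ = 250   # rate after downsampling
BASELINE_MS = 100        # pre-stimulus interval used as baseline
THRESHOLD_UV = 160.0     # artifact criterion named in the text


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return round(duration_ms * rate_hz / 1000)


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first `n_baseline` samples from the epoch."""
    baseline_mean = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline_mean for value in epoch]


def is_artifact(epoch, threshold_uv=THRESHOLD_UV):
    """Flag an epoch whose absolute amplitude exceeds the threshold.

    Assumption: the criterion is applied to absolute amplitude.
    """
    return max(abs(value) for value in epoch) > threshold_uv


def clean_epochs(epochs, n_baseline):
    """Baseline-correct all epochs and drop those flagged as artifacts."""
    corrected = [baseline_correct(epoch, n_baseline) for epoch in epochs]
    return [epoch for epoch in corrected if not is_artifact(epoch)]
```

At 250 Hz, the 100 ms baseline corresponds to `ms_to_samples(BASELINE_MS)` samples, which would be passed as `n_baseline`. Robust averaging, which down-weights outlying samples instead of discarding whole trials, is not reproduced here.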

#### **STATISTICAL ANALYSES**

Electroencephalography scalp data were statistically analyzed with EMEGS<sup>4</sup> (Peyk et al., 2011). Two (sender: human versus computer) by three (emotion: positive, negative, neutral) repeated-measures ANOVAs were set up to investigate main effects of the communicative sender, emotion and their interaction in time windows and electrode clusters of interest. If Mauchly's test of sphericity yielded significance, degrees of freedom were corrected according to Greenhouse-Geisser, as the Greenhouse-Geisser ε values were below 0.75. Partial eta-squared (partial η<sup>2</sup>) was estimated to describe effect sizes, where η<sup>2</sup> = 0.02 describes a small, η<sup>2</sup> = 0.13 a medium and η<sup>2</sup> = 0.26 a large effect (Cohen, 1988). Time windows were set from 50 to 100 ms to investigate P1 and from 100 to 150 ms to investigate N1 effects (Bublatzky and Schupp, 2012; Fields and Kuperberg, 2012), from 210 to 260 ms to investigate EPN effects (Kissler et al., 2007) and from 400 to 700 ms to investigate LPP effects (Schupp et al., 2004; Bublatzky and Schupp, 2012).
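For readers who want to relate reported *F* statistics to the effect-size conventions named above, partial η<sup>2</sup> can be recovered from an *F* value and its degrees of freedom via the standard identity partial η<sup>2</sup> = (*F* · df<sub>effect</sub>) / (*F* · df<sub>effect</sub> + df<sub>error</sub>). The Python sketch below is illustrative only: the function names and the "negligible" label are assumptions, and the benchmarks from the text are treated as lower bounds.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Compute partial eta-squared from an F statistic and its dfs.

    Follows from partial eta^2 = SS_effect / (SS_effect + SS_error)
    and F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and dfs must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_label(eta_sq):
    """Label an effect using the benchmarks quoted in the text.

    Assumption: 0.02, 0.13 and 0.26 act as lower bounds for small,
    medium and large; 'negligible' is a hypothetical label for smaller values.
    """
    if eta_sq >= 0.26:
        return "large"
    if eta_sq >= 0.13:
        return "medium"
    if eta_sq >= 0.02:
        return "small"
    return "negligible"
```

As a hypothetical example, an effect with *F*(1,15) = 5.0 gives 5 / (5 + 15) = 0.25, which falls just below the "large" benchmark.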

For the P1, a fronto-central cluster was investigated (13 electrodes: FFC1h, FFCz, FFC2h, FC1h, FCz, FC2h, FCC1h, FCC2h, C1, C1h, Cz, C2h, C2), while for the N1 time window a parietal cluster of nineteen electrodes was examined (CCPz, CP1h, CPz, CP2h, CPP1, CPPz, CPP2, P1, Pz, P2, PPO1, PPOz, PPO2, PO1, POz, PO2, POO1, POOz, POO2; see **Figure 2**). For the EPN time window, two symmetrical occipital clusters of eleven electrodes each were examined (left: I1, OI1, O1, PO9, PO9h, PO7, P9, P9h, P7, TP9h, TP7; right: I2, OI2, O2, PO10, PO10h, PO8, P10, P10h, P8, TP10h, TP8).

<sup>1</sup>http://www.neurobehavioralsystems.com

<sup>2</sup>http://www.biosemi.com

<sup>3</sup>http://www.fil.ion.ucl.ac.uk/spm/

<sup>4</sup>http://www.emegs.org/

Late positive potential topographies have been found to vary, with some authors reporting more parietal and others more fronto-central distributions, or even both in one study (Kissler et al., 2009). Since the present data revealed conspicuous differences at both fronto-central and parietal sites, two electrode groups of interest were analyzed for this component. For the LPP time window, a fronto-central cluster (14 electrodes: F1h, Fz, F2h, FFC1h, FFCz, FFC2h, FC1h, FCz, FC2h, FCC1h, FCC2h, C1, Cz, C2) and a centro-parietal cluster (13 electrodes: CCP1h, CCPz, CCP2h, CP1, CP1h, CPz, CP2h, CP2, CPPz, P1, Pz, P2, PPOz; see **Figure 3**) were investigated.

#### **RESULTS**

#### **QUESTIONNAIRE DATA**

After debriefing, two participants stated that they had been strongly convinced that they were rated by another person in the 'human' evaluation condition, six participants said they were quite convinced, four participants somewhat convinced, and two participants said they were little convinced. Mean credibility was 3.4 (SD = 1.02) on a Likert scale ranging from one to five.

#### **P1**

No significant main effects of sender, *F*(1,15) = 0.18, *p* = 0.68, or emotion, *F*(2,30) = 0.12, *p* = 0.89, partial η<sup>2</sup> = 0.05, and no interaction, *F*(2,30) = 0.52, *p* = 0.59, partial η<sup>2</sup> = 0.05, were observed over fronto-central regions.

#### **N1**

A significant main effect was observed for the communicative sender over the parietal sensor cluster between 100 and 150 ms, *F*(1,15) = 7.51, *p* < 0.05, partial η<sup>2</sup> = 0.33 (see **Figure 4**). The putative 'human sender' evoked a significantly larger N1 compared to the computer sender. There was no main effect of emotion, *F*(2,30) = 0.83, *p* = 0.44, partial η<sup>2</sup> = 0.05, and no interaction between sender and emotion, *F*(2,30) = 0.27, *p* = 0.76, partial η<sup>2</sup> = 0.02.

#### **EPN**

A significant interaction between sender and emotion was observed over occipital sensors during the EPN, *F*(2,30) = 3.95, *p* < 0.05, partial η<sup>2</sup> = 0.21. This interaction was based on a larger EPN for emotional adjectives within the 'human sender' compared to a larger EPN for neutral adjectives within the computer sender. However, within the 'human sender' *post hoc* comparisons showed only a trend for a larger negativity for positive compared to neutral adjectives (*p* = 0.06) and no differences between negative and neutral words (*p* = 0.55). Within the 'computer sender' neutral words elicited a trend-level larger EPN compared to negative words (*p* = 0.08) but not compared to positive words (*p* = 0.28).

There were no main effects of the sender, *F*(1,15) = 0.79, *p* = 0.38, partial η<sup>2</sup> = 0.05, or of the emotional content, *F*(2,30) = 0.91, *p* = 0.41, partial η<sup>2</sup> = 0.06, in the EPN time window.

#### **LPP**

Over the fronto-central electrode cluster, a significant main effect of emotion was observed, *F*(2,30) = 3.49, *p* < 0.05, partial η<sup>2</sup> = 0.19 (see **Figure 5**). *Post hoc* comparisons revealed that positive adjectives elicited a larger LPP compared to neutral adjectives (*p* < 0.05), while negative adjectives showed only a tendency toward a larger amplitude than neutral adjectives (*p* = 0.13). Positive and negative words did not differ from each other (*p* = 0.59). Over the fronto-central cluster there was neither a main effect of sender, *F*(1,15) = 0.30, *p* = 0.59, partial η<sup>2</sup> = 0.02, nor an interaction between sender and emotion, *F*(1.27,19.11) = 0.20, *p* < 0.83, partial η<sup>2</sup> = 0.01.

Over the centro-parietal electrode group a significant interaction between the communicative sender and emotional content was found, *F*(2,30) = 3.46, *p* < 0.05, partial η<sup>2</sup> = 0.19 (see **Figure 6**). *Post hoc* comparisons showed that within the 'human sender' negative words elicited a significantly larger LPP compared to neutral adjectives (*p* < 0.01), while the somewhat larger LPP for positive words compared to neutral words did not reach significance (*p* = 0.15). Negative and positive words did not differ from each other (*p* = 0.17). Within the 'computer sender' no differences were found in any comparison (*ps* > 0.49). Over the centro-parietal cluster there were no main effects of sender, *F*(1,15) = 0.23, *p* = 0.64, partial η<sup>2</sup> = 0.02, or emotion, *F*(2,30) = 1.31, *p* = 0.29, partial η<sup>2</sup> = 0.08.

#### **DISCUSSION**

We hypothesized that anticipating an evaluative decision from a human sender would lead to altered processing of trait adjectives by the recipient. A 'computer sender' was introduced as a source of random evaluation to provide a maximal contrast between both conditions, while maintaining identical perceptual input. The data reveal effects of sender and emotion as well as interactions. For the 'human sender,' a significantly larger N1 between 100 and 150 ms after adjective onset was detected over parietal areas. Starting with the EPN, effects of emotion interacted with perceived sender, and in the LPP window a main effect of emotion as well as an interaction between sender and emotion were observed. In the following, we will discuss these findings against the background of the current literature.

An early-onset effect of the 'human sender' condition, already in the N1 window, is in line with earlier findings of rapid effects of self-relevance (Fields and Kuperberg, 2012), as well as with sensitizing effects of social threat (Wieser et al., 2010). Within the broader context of the ERP literature, N1 effects suggest more tonic attention orienting toward stimuli supposedly sent by a human. Tonic effects of attention deployment were first observed by Eason et al. (1969), who were also the first to demonstrate similar effects of volitional attention and threat of an electric shock on visual stimulus processing.

A main effect of emotion was observed in the LPP time window over a fronto-central electrode cluster. Here, positive and, as a tendency, also negative words elicited a larger positivity compared to neutral words. Descriptively, ERPs differed earlier between emotional and neutral adjectives (see **Figure 6**), but interaction effects may have been canceled out by stronger main effects of emotion. Brain topographies in the LPP time window differed somewhat between negative and positive adjectives. For the emotion main effect over the fronto-central cluster, a larger positivity was only found for positive adjectives, while for the interaction over the centro-parietal cluster the *post hoc* comparison was only significant for negative adjectives (see **Figures 5** and **6**). LPP topographies have been found to vary within the same study (Kissler et al., 2009), although not in such a valence-dependent manner. It may be hypothesized that both arousal-dependent and valence-specific processing, relying on partly differing generator structures, exist in the LPP time window for positive and negative adjectives.

**FIGURE 4 | Results for the main effect of communicative source at the N1. (A)** Difference topographies. Blue color indicates more negativity and red color more positivity in the 'human sender' condition. **(B)** Selected electrode CPPz, displaying the time course over parietal sites.

Processing of positive and negative adjectives was expected to differ between the social evaluation and the feedback condition as reflected in an interaction between emotional content and communicative sender. Early interactions – between 210 and 260 ms – were found over the occipital region. However, *post hoc* comparisons revealed no clearly significant differences within the respective senders. Descriptively, within the 'human sender' there was a larger EPN for emotional words, while for the 'computer sender' the EPN was somewhat more pronounced for neutral words. Such early (210–260 ms) valence-specific modulations are relatively rare; previous work reported mainly arousal effects in this time window. However, Fields and Kuperberg (2012) reported very early effects of an established self-referential context on word processing. Therefore, the present effect may be specific to the experimental setting and may be further enhanced by the blocked design used here.

Between 400 and 700 ms a larger positivity for negative adjectives compared to neutral adjectives was observed over parietal sites within the 'human sender.' The comparison between positive and neutral adjectives, while qualitatively similar, did not reach significance. For the 'computer sender' no differential processing of negative, neutral, and positive adjectives could be observed over central sites and in late time windows. The interaction effects indicate that the LPP emotion main effect reported above may be driven partly by the 'human sender' (see **Figures 5** and **6**). Such emotion main effects in the LPP time window have been reported previously in typical psycho-linguistic experiments that did not explicitly manipulate context (Herbert et al., 2006, 2008; Kissler et al., 2006, 2009; Kanske and Kotz, 2007; Hofmann et al., 2009; Schacht and Sommer, 2009b). However, as some studies do not find late emotion effects (Rellecke et al., 2011), it may be helpful to consider the communicative context. The present data suggest that emotional differences largely derive from the adopted communicative context or are at least amplified by it. By contrasting a meaningless and a meaningful passive visual word processing condition, the differentiation between emotional and neutral words is heightened. Because the LPP is generally associated with elaborative processing and larger LPPs have been shown to predict better subsequent memory (Dolcos and Cabeza, 2002), one might speculate that contextual factors can determine whether emotional material is only transiently attended at early processing stages or elaborated on and committed to memory.

An interaction of emotion with the anticipatory context is in line with findings from shock-threatening (Bublatzky et al., 2010) or from socially threatening situations (Wieser et al., 2010). However, this is the first study which investigated anticipatory effects in a socially relevant communicative context, as extant studies focus on processing of the feedback decision, typically also using fMRI (Somerville et al., 2006, 2010; Izuma et al., 2008, 2010; Davey et al., 2010; Korn et al., 2012). Due to the higher time resolution of the EEG, we were able to investigate how the anticipated feedback on trait adjectives changes in response to the putative sender identity in distinct processing phases. Here, in addition to sensitizing effects due to threat or self-relevance (Bublatzky et al., 2010; Bublatzky and Schupp, 2012; Fields and Kuperberg, 2012) the anticipation of human-generated evaluations led to differential processing of negative adjectives, which was pronounced at later stages. Descriptively, larger differences between emotional and neutral words within the 'human sender' compared to the 'computer sender' condition could be observed already at the EPN. Emotional words may initially capture more attention resources, but ongoing processing led to a pronounced differentiation between emotional and neutral words, reflected in the enhanced central positivity in the LPP time window for emotional words. As sensitizing effects of threat have previously been found to accentuate selectively positive (Bublatzky et al., 2010) or negative (Wieser et al., 2010) stimulus processing, in this social communicative setting more complex motives may play a role. 
This could be explained by considerations that humans, in the absence of conflicting evidence, tend to view themselves positively (self-positivity bias), but also fear unfavorable evaluation (Leary, 1983; Somerville et al., 2006; Masten et al., 2009; Carleton et al., 2011; Eisenberger et al., 2011) and seek approval and acceptance by others (Izuma et al., 2010; Romero-Canyas et al., 2010). Perhaps these different motives play a role at distinct processing stages, maybe even via partly distinct cortical generator structures.

Overall, we cannot exclude that some relevant effects remained undetected, due to the limited number of trials in each cell resulting in limited power. Still, we observed considerable main and interaction effects, suggesting that the study design was able to detect differences between the two putative senders and their effect on processing of emotional trait adjectives during feedback anticipation. Furthermore, credibility ratings for the 'human sender' condition indicate successful experimental manipulation of the respective conditions. Self-reported credibility was not significantly correlated with N1 sender differences (two-tailed Pearson correlation *r* = −0.11, *p* = 0.70, *N* = 16; two-tailed Spearman correlation *r*<sup>s</sup> = −0.31, *p* = 0.25, *N* = 16), making it unlikely that sender main effects could be explained entirely by credibility. A limitation of the present study may be the generation of adequate neutral trait adjectives. Although all adjectives were tightly matched for all linguistic characteristics, neutral adjectives differed from negative and positive adjectives in arousal and in concreteness. Still, this could account neither for sender differences nor for the valence-specific accentuation of positive or negative contents. Remarkably, the results suggest that in spite of identical perceptual input, the processing of a message, as reflected by electro-cortical activity, changes as a function of the perceived communicative significance. Thus, subjective meaning seems to derive not only from real, but crucially also from supposed interaction with others, connecting not only real but even imaginary identities of communicating partners. In the current study the 'human sender' was the only sender able to give meaningful feedback. It would be interesting to compare a putative 'human sender' with a 'computer sender' able to give personality feedback, to specify unique effects of 'humanness' in contrast to only skill attributions. In general, this paradigm suggests many different possible sender manipulations which may contribute to our understanding of context influences on (emotional) language processing. Further, it may be worth investigating whether such very early visual modulations can be replicated in experiments not using blocked within-subject designs.

#### **CONCLUSION**

Summarizing the main results, we found an amplified N1, indicating that, regardless of content, more early attentional resources were allocated to the trait adjectives if the putative sender was another human rather than a randomly operating computer. These differences were already present in anticipation of a decision and with identical visual input across conditions. In the EPN window, an interaction suggested that emotional adjectives in the human sender condition were processed more intensely, but *post hoc* tests did not reveal clearly significant differences, precluding firm conclusions. Emotional adjectives led to a larger LPP. This interacted with sender: the LPP was particularly large when evaluations were expected from a human sender. This suggests that at early processing stages attention is allocated to all stimuli, irrespective of emotional content, and that only after (or simultaneously with) extraction of content, at an evaluative processing stage, does selective amplification of emotional content occur in the human sender condition. These findings indicate that imaginary social context has a large impact on language processing within the larger framework of symbolic interactionism.

#### **AUTHOR CONTRIBUTIONS**

Sebastian Schindler and Johanna Kissler contributed to the study design. Sebastian Schindler, Martin Wegrzyn, and Inga Steppacher carried out participant testing. Sebastian Schindler and Johanna Kissler performed the statistical analysis. Sebastian Schindler drafted the manuscript under the supervision of Johanna Kissler. Martin Wegrzyn and Inga Steppacher helped to draft and revise the manuscript. Sebastian Schindler revised the manuscript under the supervision of Johanna Kissler. All authors read and approved the final manuscript.

#### **ACKNOWLEDGMENTS**

This research was funded by the Deutsche Forschungsgemeinschaft, DFG KI1283/4-1 and by the DFG, Cluster of Excellence 277 "Cognitive Interaction Technology." We acknowledge support for the Article Processing Charge by the Deutsche Forschungsgemeinschaft and the Open Access Publication Funds of Bielefeld University Library. We would like to thank all participants contributing to this study.

**Conflict of Interest Statement:** The authors declared that they had no conflict of interest with respect to their authorship or the publication of this article.

*Received: 28 July 2014; accepted: 24 October 2014; published online: 11 November 2014.*

*Citation: Schindler S, Wegrzyn M, Steppacher I, and Kissler J (2014) It's all in your head – how anticipating evaluation affects the processing of emotional trait adjectives. Front. Psychol. 5:1292. doi: 10.3389/fpsyg.2014.01292*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Schindler, Wegrzyn, Steppacher and Kissler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Language for Winning Hearts and Minds: Verb Aspect in U.S. Presidential Campaign Speeches for Engaging Emotion

David A. Havas<sup>1</sup> \* and Christopher B. Chapp<sup>2</sup>

<sup>1</sup> Department of Psychology, University of Wisconsin-Whitewater, Whitewater, WI, USA, <sup>2</sup> Department of Political Science, St. Olaf College, Northfield, MN, USA

How does language influence the emotions and actions of large audiences? Functionally, emotions help address environmental uncertainty by constraining the body to support adaptive responses and social coordination. We propose emotions provide a similar function in language processing by constraining the mental simulation of language content to facilitate comprehension, and to foster alignment of mental states in message recipients. Consequently, we predicted that emotion-inducing language should be found in speeches specifically designed to create audience alignment – stump speeches of United States presidential candidates. We focused on phrases in the past imperfective verb aspect ("a bad economy was burdening us") that leave a mental simulation of the language content open-ended, and thus unconstrained, relative to past perfective sentences ("we were burdened by a bad economy"). As predicted, imperfective phrases appeared more frequently in stump versus comparison speeches, relative to perfective phrases. In a subsequent experiment, participants rated phrases from presidential speeches as more emotionally intense when written in the imperfective aspect compared to the same phrases written in the perfective aspect, particularly for sentences perceived as negative in valence. These findings are consistent with the notion that emotions have a role in constraining the comprehension of language, a role that may be used in communication with large audiences.

Keywords: language, emotion, embodied cognition, rhetoric, syntax, alignment

#### Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

Marc Brysbaert, Ghent University, Belgium William Hart, University of Alabama, USA

> \*Correspondence: David A. Havas havasd@uww.edu

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 16 February 2016 Accepted: 31 May 2016 Published: 22 June 2016

#### Citation:

Havas DA and Chapp CB (2016) Language for Winning Hearts and Minds: Verb Aspect in U.S. Presidential Campaign Speeches for Engaging Emotion. Front. Psychol. 7:899. doi: 10.3389/fpsyg.2016.00899

# INTRODUCTION

Language causes powerful and reliable changes in the emotions and actions of large audiences, as when a skillful politician rallies voters to the polls (Brader, 2006; Chapp, 2012). How does language interact with emotion to affect the behaviors of large audiences? Emotion theorists suggest that a fundamental function of emotion is to prioritize some actions over others in order to support adaptive responses to real world challenges (Frijda, 1986; Keltner and Gross, 1999; Levenson, 2003; Norman et al., 2014) and to promote action coordination within social groups (Hatfield et al., 1994; Keltner and Haidt, 1999; Niedenthal and Brauer, 2012). That is, emotions constrain the body in ways that facilitate adaptive, socially coordinated actions. In this article, we present initial evidence that emotions constrain processing of language delivered by skilled politicians for creating social cohesion.

According to embodied theories of cognition, language comprehension involves a process of simulation grounded in neural systems for action, perception, and emotion (Barsalou, 2010; Glenberg et al., 2013). For example, words about kicking engage a simulation grounded in neural systems for kicking (Pulvermüller, 2005; van Elk et al., 2010). Likewise, words that are strongly related to emotions engage a simulation grounded in neural systems for producing and perceiving corresponding emotional expressions (Moseley et al., 2011; Citron, 2012). Thus, one explanation for the emotional power of language is that simulation occurs at the lexical level. The extent to which a speaker is able to elicit an emotional response depends largely on the extent to which the words and combinations of words employed encourage simulation.

Evidence indicates that the emotive consequences of language operate above the lexical (individual word) level (Havas et al., 2007, 2010). For example, Havas et al. (2007) measured reading times for sentences describing pleasant or unpleasant situations while participants were in a matching or mismatching emotional state. Although the sentences were emotional, they made little or no reference to emotional states. An example pleasant sentence is, "You can tell you're executing the complex dive flawlessly." An unpleasant sentence is, "The police car rapidly pulls up behind you, siren blaring." To covertly manipulate emotional state, they used the procedure of Strack et al. (1988), which reliably influences positive and negative emotional experience in the absence of awareness. Participants held a pen in the teeth to produce a smile, or in the lips to produce a frown or pout. As predicted, processing of pleasant sentences was faster when the pen was held in the teeth (producing a smile) than when it was held in the lips (preventing a smile), and vice versa for the time to process unpleasant sentences. In a subsequent study, they employed the pen manipulation in a lexical decision task with words taken from the stimulus sentences in order to test a lexical priming account of their results. The words they used were rated as being "central to the meaning" of the pleasant and unpleasant sentences. Lexical decisions for words were speeded when preceded by semantically associated words, but not by the pen manipulation.

These findings suggest that emotion plays a role in language understanding above the lexical level, perhaps at the level of a situation model (Zwaan and Radvansky, 1998), or in the combinatorial processes involved in language understanding (see also, Lai et al., 2015; Lüdtke and Jacobs, 2015). To account for interactions of emotion in language processing above the lexical level, we have proposed that emotion states encode physiological (e.g., autonomic) constraints of the body that differentially prioritize the simulation of some actions over others, much as physiological constraints influence the preparation of real actions (Havas and Matheson, 2013).

The goal of the present study is to test an embodied account of how language and emotion interact to influence large audiences: the emotion-constraint-hypothesis. Just as emotions that emerge during real world challenges constrain the body to support adaptive responses and promote action coordination within groups, emotions that emerge during language comprehension constrain an ensuing mental simulation of actions in order to support language comprehension and promote alignment (Pickering and Garrod, 2004) in the mental states of message audiences. By alignment, we mean similarity in the mental models (Zwaan and Radvansky, 1998) communicators construct to represent the situation conveyed in the language.

If this hypothesis is correct, then we can predict emotion-language interactions where a speaker's goal is to align the mental states of a large audience for the purpose of social cohesion. In Study 1, we used a corpus analysis to investigate whether political speeches designed to enlist and mobilize the U.S. electorate make more frequent use of language that engages the constraining function of emotion than comparison speeches. In Study 2, we determined whether such language is indeed perceived as more strongly emotional than comparison language.

Support for the emotion-constraint-hypothesis comes from recent neuroimaging studies that use inter-subject synchronization analysis techniques to compare the time course of neural responses across different participants exposed to the same naturalistic language. Nummenmaa et al. (2014) found that neural similarity among participants was enhanced when they listened to the same emotionally evocative narratives, relative to unemotional narratives. The synchronization was linearly related to listeners' self-reported emotional state: participants who showed greater similarity in their moment-to-moment emotional ratings of the stories also showed enhanced similarity in the time course of brain activity while listening to the stories. The authors suggest that emotions drive neural synchronization by facilitating participants' semantic processing of the language. Inter-subject synchrony is also enhanced by powerful political speeches. Schmälzle et al. (2015) examined neural synchrony across time in the brains of participants who listened to speeches from German politicians that varied in rhetorical quality. More rhetorically powerful speeches elicited greater neural synchrony across participants, possibly because these speeches also contained more emotional words. While these studies provide strong evidence of emotion constraint, the authors do not specify precisely how language encourages synchronization. It is to this issue that we now turn.

According to the emotion-constraint-hypothesis, emotions are likely to be elicited by sentences or phrases that invite a simulation of action but underspecify the particular actions to be simulated (Havas and Matheson, 2013). The basis for this claim lies in affective neuroscience research showing that key neural structures for coordinating emotional responses (namely, the amygdala and insula) are sensitive to environmental ambiguity and uncertainty (Whalen, 2007; Singer et al., 2009). For example, the amygdala can be activated by ambiguity, unpredictability, and polysemy in sentence comprehension (Citron and Goldberg, 2014; Shibata et al., 2014; Lai et al., 2015). Although there is abundant research showing that emotional responses influence subsequent cognitive processing (for a review, see Blanchette and Richards, 2010), our hypothesis makes a functional claim that emotions contribute to language comprehension by differentially prioritizing the simulation of some actions over others (Havas et al., 2010; Havas and Matheson, 2013).

To test this hypothesis, we capitalized on the distinction between the past perfective verb aspect, which indicates an action that has happened in the past ("we were burdened by a bad economy"), and the past imperfective verb aspect, which indicates that a past action is ongoing or unfinished ("a bad economy was burdening us"). Research has shown that the imperfective aspect tends to evoke more diverse associations (Coll-Florit and Gennari, 2011) and leaves readers' mental models of described actions unconstrained relative to the perfective aspect (Madden and Zwaan, 2003). Under the emotion-constraint hypothesis, imperfective sentences should be more likely to engage audience emotions for guiding mental simulation of actions, and thus be instrumental in language designed to align the mental and emotional states of large audiences - namely, political stump speeches.

# STUDY 1

Campaign stump speeches have been distinguished as a genre of political rhetoric aimed at compelling specific actions from audience members (namely, to turn out and vote for the candidate; Hart, 2002). We compared stump speeches to the State of the Union Address (SOTU), a genre which is similarly focused on policy prescriptions and the office of the presidency, but is closer to an "essay" in structure, more concerned with initiating policy dialog than bringing about specific citizen actions and emotions (Campbell and Jamieson, 2008). Of these two genres, the stump speech is the one in which audience alignment is imperative, and where we expect the imperfective aspect to figure prominently as a means to constrain a simulation of language content. Thus, we expect the imperfective aspect to appear with greater relative frequency in stump speeches than in SOTU speeches.

# Method

We created two unique datasets: a set of speech transcripts from 149 candidate stump speeches from the 2012 presidential campaign, and a set of 48 State of the Union (SOTU) addresses (1965–2013) to serve as a comparison group. We excluded two SOTU addresses from this period that were delivered in written form. We also included five speeches delivered by recently inaugurated presidents to a joint session of Congress, though these speeches are not technically SOTU addresses. See Peters (n.d.) for discussion.

To build the stump speech database, campaign appearances were first identified using Obama and Romney's campaign schedules, regularly updated by Politico (n.d.)<sup>1</sup>. From this list of appearances, we searched two speech transcription services – Federal News Service and Congressional Quarterly Transcriptions – to build an inclusive set of speech transcriptions. Aiming to capture a comprehensive portrait of the 2012 general election campaign, we included every speech delivered between August 12, 2012 (when Paul Ryan was chosen as running mate) and November 5, 2012 (the day before the national presidential election). Each speech transcript was then cleaned to remove notation ("Jeers from audience") or words not spoken by the candidate ("AUDIENCE: USA! USA!"). State of the Union addresses were identified using the American Presidency Project's (n.d.) speech database<sup>2</sup>. We included every speech from Johnson's 1965 State of the Union through Obama's most recent address. We began with 1965 because this was the first televised evening SOTU.

We created a content analysis "dictionary" tool designed to identify the imperfective aspect. The dictionary was designed to identify was and were + VERB-ing sentence constructions, as well as negations (wasn't/was not and weren't/were not). We were careful to exclude sentences written in the present perfect aspect where a past event has present consequences, as in "Let me tell you, we have tried that" and "He's ignored them." To construct the dictionary, we first developed a corpus of 3342 commonly used verbs by combining Pennebaker's Linguistic Inquiry and Word Count software (LIWC, Pennebaker et al., 2007) dictionary verbs with others taken from online English verb lists; footnote 3 gives an example of such a list<sup>3</sup>. The full list of verbs is available from the authors upon request.

Next, we used the LIWC software to identify instances where one of these verbs was used in the imperfective aspect. LIWC computed an imperfective score for each speech that adjusts for the length of the speech ([imperfective count/total word count] ∗ 100). This approach allows us to compare how stump speeches and SOTU addresses vary with respect to the relative frequency of the imperfective aspect.
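The scoring step can be illustrated with a minimal sketch. The authors used LIWC with a custom dictionary; the regular expression, the tiny verb list, and the function names below are hypothetical stand-ins chosen for illustration, not the authors' tool.

```python
import re

# Hypothetical mini verb list (the authors' dictionary covered 3342 verbs).
VERB_FORMS = {"struggling", "apologizing", "working", "trying"}

# "was"/"were", optionally negated (wasn't / was not / weren't / were not),
# followed by a word ending in -ing.
PATTERN = re.compile(r"\b(?:was|were)(?:n't| not)?\s+(\w+ing)\b", re.IGNORECASE)


def imperfective_count(text, verbs=VERB_FORMS):
    """Count past imperfective constructions whose verb is in the list."""
    return sum(1 for m in PATTERN.finditer(text) if m.group(1).lower() in verbs)


def imperfective_score(text, verbs=VERB_FORMS):
    """Length-adjusted score: (imperfective count / total word count) * 100."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return imperfective_count(text, verbs) / len(words) * 100
```

For instance, `imperfective_score("Families were struggling to make the mortgage.")` counts one construction in seven words. A real dictionary would also need to handle typographic apostrophes and transcript-specific tokenization, which this sketch ignores.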

# Results

We used a series of OLS regressions to examine the extent to which speech genre (stump vs. SOTU) predicts the imperfective aspect relative to other potential factors such as the individual speakers or their party affiliations (**Table 1**). Model 1 regressed imperfective scores on a dummy variable for genre, Model 2 tested whether the party of the speaker is related to the use of the imperfective aspect, and Model 3 included dummy variables for each candidate to test whether individual differences are driving results. Each model also controlled for the average number of words per sentence for each speech, in the event that more elaborate or complex sentence constructions covary with either a particular genre and/or a particular verb aspect.

Consistent with predictions, stump speeches invoke the imperfective aspect with significantly more regularity than SOTU speeches. In a SOTU address of average length (5359 words), the model estimates that presidents will deploy the imperfective about 1.5 times. In an average stump speech of only 3178 words, we would expect to find 4.1 imperfective sentences. None of the speaker dummy variables revealed a detectable effect.
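As a rough check, the two expected counts can be converted back to the length-adjusted score defined in the Method (count per 100 words). This is illustrative arithmetic on the figures reported above, not an additional result, and it ignores the words-per-sentence control:

$$
\text{score}_{\text{SOTU}} \approx \frac{1.5}{5359} \times 100 \approx 0.028,
\qquad
\text{score}_{\text{stump}} \approx \frac{4.1}{3178} \times 100 \approx 0.129
$$

On these figures, the per-word rate of the imperfective is between four and five times higher in an average stump speech than in an average SOTU address.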

<sup>1</sup>http://www.politico.com/2012-election/calendar/

<sup>2</sup>http://www.presidency.ucsb.edu/

<sup>3</sup>http://www.englishclub.com/vocabulary/regular-verbs-list.htm

#### TABLE 1 | Imperfective aspect in political speech.


Entries are unstandardized regression coefficients with standard errors in parentheses. Dependent variable is the imperfective score for each speech. Genre is a dichotomous variable, where 1 = stump speech and 0 = SOTU address. Party is a dichotomous variable where 1 = a Democratic speaker, 2 = a Republican speaker. <sup>∗</sup>p < 0.05; <sup>∗∗</sup>p < 0.01.

Speeches with more words per sentence and speeches by Republican speakers tended to include more imperfective clauses; however, the addition of these variables had little impact on the effect of the genre variable, which remained the most powerful predictor across all three models.

One possible explanation for why imperfective aspect appears with more frequency in stump speeches is that stump speeches contain more past tense constructions in general than SOTU speeches. To check this, we modified our imperfective content analysis tool, changing all "was/were \_\_\_-ing" phrases to simple past tense by deleting the "was/were" and adding "\_\_\_-ed." For example, instead of scoring a text with the phrase "was apologizing" we now score it for the word "apologized". Using this method we found that simple past tense verbs occurred at a higher rate in State of the Union Addresses than in stump speeches, t(195) = 11.288, p < 0.001. This suggests that the higher incidence of past imperfective aspect in stump speeches is not due to the particular verbs included in the content analysis tool, nor is it the case that we observe more imperfective aspect in stump speeches because they tend to utilize past-tense verb constructions more generally.
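The dictionary modification described above can be sketched as follows. This is a naive illustration that assumes regular verbs only; the function names and spelling rules are assumptions for this example, and irregular verbs (e.g., "making") would need a lookup table that is not shown.

```python
import re

# "was"/"were" followed by an -ing form; the -ing word is captured.
AUX_ING = re.compile(r"\b(?:was|were)\s+(\w+ing)\b", re.IGNORECASE)


def to_simple_past(gerund):
    """Turn a regular -ing form into a simple past form (naive rules)."""
    stem = gerund[:-3]  # drop "ing"
    if stem.endswith("e"):  # agreeing -> agree -> agreed
        return stem + "d"
    if len(stem) >= 2 and stem.endswith("y") and stem[-2] not in "aeiou":
        return stem[:-1] + "ied"  # trying -> try -> tried
    return stem + "ed"  # apologizing -> apologiz -> apologized


def imperfective_to_past(phrase):
    """Replace each 'was/were VERB-ing' with the simple past of VERB."""
    return AUX_ING.sub(lambda m: to_simple_past(m.group(1)), phrase)
```

Because dropping "-ing" leaves doubled consonants and dropped final "e" in place, appending "-ed" to the remaining stem yields the correct form for most regular verbs ("stopping" becomes "stopped", "hoping" becomes "hoped").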

# Discussion

Presidential candidates' stump speeches – an important example of language used to create alignment in large audiences – were found to contain a significantly higher proportion of sentences with the past imperfective verb aspect than the SOTU speeches. These differences in speech genre overshadowed all other differences due to factors like words per sentence, individual candidate, or the candidate's political party, as adding these factors to the model had little or no impact on the contribution of genre.

Results of Study 1 support the emotion-constraint-hypothesis that language used to create alignment in mental and affective states of large audiences (U.S. presidential stump speeches) will be more likely to call on the simulation-constraining function of emotion, relative to comparison speeches. We focused our predictions on past imperfective sentences because they compel a prospective simulation of ongoing action without specifying those actions relative to the perfective sentence construction, and thus were presumed to engage the constraining function of emotion. Study 2 was designed to test this presumption by determining whether imperfective sentences, independent of their lexical emotional content, are in fact perceived as more strongly emotional than perfective sentences.

# STUDY 2

Participants were asked to evaluate perfective and imperfective aspect sentences found in the presidential stump speeches of Study 1 in terms of their emotion strength and their emotional category. To control for non-critical characteristics of the sentences, each sentence was presented in both aspectual forms. We predicted that sentences written in the imperfective aspect would be rated as more strongly emotional (in either a positive or negative valence direction) than the same sentences written in the perfective aspect.

# Method

# Participants

Participants were individuals who use Amazon's Mechanical Turk website, a crowd-sourcing Internet marketplace interface. Participants were paid \$1 for their participation, and the study was advertised as a not-for-profit study taking approximately 30 min (the actual mean duration was 23 min). Informed consent was obtained through the use of a written statement embedded in the preview page of the study, and participants gave their consent by clicking a button to proceed. The study was approved by the University of Wisconsin–Whitewater Institutional Research Board, and was conducted according to the principles expressed in the Declaration of Helsinki.

#### TABLE 2 | Example imperfective and perfective sentences.

Asterisk (<sup>∗</sup>) indicates the sentence's original aspect format.

Because a previous comparison of the mental representations induced by imperfective and perfective aspect reported small effects (Cohen's d ≈ 0.2; Madden and Zwaan, 2003), we estimated that 150 participants would be needed for 80% power. We continued collecting data until we reached this sample size.

#### Sentence Stimuli

We identified a total of 534 sentences containing the past imperfective verb aspect and 383 sentences containing the past perfective verb aspect from the stump speech database. As sentence boundaries sometimes contained several phrases, disfluencies, corrections, or repetitions, sentence lengths ranged widely (20 to 470 characters for imperfective and 11 to 602 for perfective). In such cases, we attempted to extract the smallest complete sentence containing the key verb phrase. From among those sentences with only one verb phrase, whose length was within one standard deviation of the mean sentence length, and whose content did not duplicate other already-selected sentences (presidential campaigners tended to be rather consistent in their stump speech content from appearance to appearance), we then randomly selected 25 imperfective and 25 perfective sentences. The average length for the selected sentences was 42 characters for imperfective and 49 for perfective, and this difference was not statistically significant. Sentences made little or no reference to emotions or emotion concepts.

From the 25 perfective and 25 imperfective sentences taken from the presidential stump speeches of Study 1, we wrote a matching 25 imperfective and 25 perfective sentences (respectively) by changing the aspect. Thus, each sentence was represented twice, once in the perfective aspect and once in the imperfective aspect (see **Table 2** for example sentences). A unique yes/no comprehension question was included after each sentence to ensure participants were reading the sentences for understanding. Within each sentence type, there were an equal number of questions that were answered correctly with a "yes" and with a "no" response.

# Procedure

The final list of one hundred sentences (50 imperfective and 50 perfective) was presented in a random order to participants by computer with instructions to assign each sentence to an emotion category (afraid, angry, anxious, excited, happy, or sad), and to then rate the emotional valence of each sentence using a 5-point Likert-type scale, from strongly negative to strongly positive. To guard against the possibility that responses would reflect speaker emotion and not respondents' emotional response, survey instructions clearly asked respondents to "Rate each sentence according to how effective it is at **giving you an emotional feeling**, either positive or negative. Note: Most of the sentences are not "about" emotions. So, please **rate how the sentence makes you feel**, rather than what you think the sentence is about."

# Results

Five individuals participated in the experiment twice, and we used only the first set of data from these subjects. Five participants who had extreme error rates (20% or greater) were excluded from analysis, bringing the final sample size to 140. The mean error rate for the remaining participants was 3.1%.

Subjects categorized 3.9% of sentences as "afraid", 6.6% as "angry", 16.5% as "anxious", 20.6% as "excited", 35% as "happy", and 12% as "sad." The remaining 5.3% of sentences received no response. Participants were equally likely to assign an emotional category to imperfective sentences (5.2% uncategorized) and perfective sentences (5.3% uncategorized; Pearson's χ<sup>2</sup> = 0.082).

We then quantified the amount of agreement in participants' categorizations using only trials for which participants provided responses on both measures (emotion category and strength) by calculating type C (two-way random) intra-class correlations in SPSS separately for imperfective (n = 44) and perfective sentences (n = 42). The intra-class correlation analysis treats missing data by deleting cases listwise, and this accounts for the small sample set. Although participants agreed more in their emotional categorizations of imperfective sentences than in their categorizations of perfective sentences, this difference was not statistically significant; intra-class correlations were 0.909 and 0.837, respectively, z = 1.39, p = 0.08 (one-tailed).
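The source does not name the test used to compare the two intra-class correlations. One standard option for comparing two independent correlation coefficients is Fisher's r-to-z transformation, and a minimal sketch under that assumption appears to reproduce the reported values. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for two independent correlations (one-tailed p)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the difference
    z = (z1 - z2) / se
    p_one_tailed = 1 - NormalDist().cdf(z)
    return z, p_one_tailed


# Reported ICCs and sample sizes: imperfective first, then perfective.
z, p = compare_correlations(0.909, 44, 0.837, 42)
```

Whether Fisher's transformation is appropriate for intra-class correlations of this type is a separate question that this sketch does not address.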

In order to test whether imperfective sentences were perceived as emotionally stronger than perfective sentences, we recoded participant valence ratings to reflect valence strength (absolute difference from neutral) rather than valence direction, and subjected the resulting group means to a dependent-measures t-test. As predicted, sentences written in imperfective aspect were rated as more emotionally intense (M = 0.9857, SD = 0.28986) than the same sentences written in the perfective aspect (M = 0.9743, SD = 0.28615), mean difference = 0.01140, SD = 0.06114, SEM = 0.00517, t(139) = 2.205, p = 0.029, Cohen's d = 0.189, 95% CI = 0.00118–0.02161. A post hoc set of seven dependent-measures t-tests on each emotion category (including no response) separately revealed no statistically significant differences after accommodating multiple comparisons with a Bonferroni correction (all p > 0.007).
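The recoding and test statistic described above can be retraced from the reported summary values. The sketch below assumes the 5-point scale was coded 1 to 5 with 3 as the neutral midpoint (the coding is not stated in the text), and the function names are hypothetical.

```python
import math


def valence_strength(rating, neutral=3):
    """Recode a valence rating as its absolute distance from neutral."""
    return abs(rating - neutral)


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Dependent-measures t statistic from the mean and SD of the differences."""
    sem = sd_diff / math.sqrt(n)  # standard error of the mean difference
    t = mean_diff / sem
    d = mean_diff / sd_diff  # standardized mean difference
    return sem, t, d, n - 1  # last value is the degrees of freedom


# Reported summary values for imperfective minus perfective strength ratings.
sem, t, d, df = paired_t_from_summary(0.01140, 0.06114, 140)

# Bonferroni-adjusted threshold for the seven post hoc tests.
bonferroni_alpha = 0.05 / 7
```

With the reported inputs, the standard error and t statistic agree with the text to rounding. The standardized difference computed this way comes out slightly below the reported Cohen's d, which may reflect rounding of the inputs or a different formula.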

We also conducted a post hoc test of the effect by valence by coding as "positive" sentences that were categorized by participants as either "happy" or "excited", and coding as "negative" sentences that were categorized by participants as either "afraid", "angry", "anxious", or "sad". Disaggregating positive and negative sentences resulted in a larger dataset but with a substantial number of zeroes (44, or 6.9%, with 23 for perfective sentences and 21 for imperfective sentences) making the distribution of absolute value scores non-normal. We therefore excluded these observations before conducting paired samples t-tests separately for positive and negative valence sentences. The result was a significant effect of aspect in sentences rated as negative, t(139) = 2.497, p = 0.014, mean difference = 0.026, but no effect of aspect in sentences rated as positive, t(138) = 0.236, p = 0.814, mean difference = 0.002.

# GENERAL DISCUSSION

Two studies tested the emotion-constraint-hypothesis: that emotional responses play a functional role in language processing by constraining mental simulation and thereby promoting alignment in the mental states of large audiences. In Study 1, we found that political rhetoric used to create alignment in the American electorate, U.S. Presidential stump speeches, is more likely than comparison speeches to contain sentences using the past imperfective aspect. Such sentences were hypothesized to call upon the constraining function of emotion relative to perfective sentences because they compel a prospective simulation of ongoing action without specifying those actions (Havas and Matheson, 2013).

In Study 2, we tested the hypothesis that imperfective speech sentences are more likely to engage audience emotions with an emotional rating task. As predicted, participants rated stump speech sentences in the imperfective aspect as more strongly emotional than the same sentences in the perfective aspect. An important qualification of this finding is that the observed effect size is small (Cohen's d = 0.189). However, small effects in the domain of cognition and emotion – particularly in real-world contexts of mass communications – are likely to be meaningful as small changes in emotional language can lead to large-scale shifts in emotional behavior (e.g., Kramer et al., 2014). Post hoc tests suggested this effect is somewhat larger when considering only those sentences that participants perceive as having negative valence, perhaps because negative events are associated with a greater number of action response options (Rozin and Royzman, 2001), and are therefore more open-ended in general, than positive events. However, it is not yet clear which properties of our stimulus sentences might be critical in this perception. Future research should use validated emotional stimuli to determine whether the effect is limited to a particular emotion or valence category.

These findings extend investigations of emotion-language interactions above the lexical level of processing (e.g., Havas et al., 2007, 2010; Lai et al., 2015). Using a lexical level account, we might predict that sentences in the imperfective aspect appear more frequently in stump speeches because they contain words having closer associations with emotion concepts or emotion states than do perfective sentences. The more emotionally associated imperfective past tense sentences are presumably more effective in compelling audience members to turn out and vote. This type of account is challenged by the results of Study 2 that controlled for lexical content and showed that verb aspect alone is capable of influencing emotional perceptions. Still, it is possible that other systematic lexical differences between types of speeches could influence, or interact with, verb aspect to produce emotional effects beyond those of grammatical aspect. Our data do not discount this, but they further our understanding of how words can be combined at the grammatical level to influence emotion.

Of course, our conclusions are limited by our choice of corpus, and by our methods of sampling. Future studies can explore whether these effects generalize to other contexts and other forms of ambiguous language at syntactic and situation model levels. One salient question is whether our verb phrases produce stronger or weaker effects when processed in isolation than when processed within their speech context. Verb aspect seems to play a pivotal role in situation model construction by modulating the accessibility of text-based information and world knowledge relevant to the text (Madden and Zwaan, 2003; Ferretti et al., 2007). If emotion constraint is involved in this role, then we would predict an enhanced effect for verb phrases when processed as part of a meaningful text. In addition, future research should determine whether such effects might be mediated by embodied emotions as suggested by our previous work (Havas et al., 2007, 2010; Havas and Matheson, 2013). While our predictions were derived from theory and research in embodied cognition, our study provides only indirect support for embodied theories because we did not manipulate or measure the body directly.

Why should the past imperfective grammatical aspect preferentially engage emotion? We sketch an account based on an embodied theory of emotional language comprehension (Havas and Matheson, 2013). Much as emotion constrains the preparation of real actions in order to meet environmental demands (e.g., Frijda, 1986; Barrett, 2006), emotion states differentially prioritize the mental simulation of some actions

over others in order to support language comprehension. When effective action is unspecified or underspecified by the language (i.e., the language is ambiguous), there will be a failure to complete a simulation of the sentence content. The function of emotion in such cases is to modulate the action system and influence a simulation. Take the sentence, "Families were struggling to make the mortgage." Because the imperfective aspect suggests the struggle is ongoing, any emotions associated with struggling will continue to play a role in the ensuing simulation. By contrast, in simulating the perfective "Families struggled to make the mortgage," the struggle is now over and any emotions associated with struggling need not affect the simulation. Although in both cases the language is emotional, we would predict that the first sentence would lead to greater involvement of emotional activity than the second sentence.

This mechanism complements recent findings of enhanced alignment of neural states in participants exposed to the same emotional narratives and political speeches from neuroimaging studies that have used inter-subject synchronization analysis techniques (e.g., Nummenmaa et al., 2014; Schmälzle et al., 2015). Our use of corpus data reveals one way that alignment effects could be harnessed in real-world settings. It should be noted, however, that our effort to find evidence of enhanced alignment resulting from imperfective aspect in Study 2 was unsuccessful. Future research should aim to directly test the emotion-constraint-hypothesis prediction that imperfective aspect fosters alignment in the mental states of an audience by probing the content and degree of similarity among participants' mental representations. The present data are consistent with this claim but inconclusive.

Our account may shed light on current studies of verb aspect and emotion. Hart (2013) found participants who recounted emotional autobiographical experiences using imperfective aspect experienced stronger corresponding shifts in mood than participants using perfective aspect, and speculated that imperfective aspect enhanced participants' access to the details of event memory. However, an episodic memory account is unlikely in our study where participants responded to language produced by others. Instead, our results suggest that the open-ended simulation process initiated by the imperfective aspect more effectively draws on emotion for its guiding role in language processing.

Two alternative accounts should be considered, and each is based on the notion that imperfective aspect privileges emotional responding because it describes ongoing actions that persist in time relative to completed events described by perfective aspect. The first suggests that simulating ongoing events draws attention to details about the described event. Indeed, several studies suggest that the imperfective aspect enhances access to discursive detail relative to perfective aspect, including detail about characters (Carreiras et al., 1997), events (Magliano and Schleich, 2000), locations (Ferretti et al., 2007), and visual objects (Madden and Therriault, 2009) that are implied in the language. In these studies participants were provided with a probe word to demonstrate that the probe concept was more cognitively available after reading sentences in the imperfective versus perfective aspect, and thus the probe may have served as a critical constraint needed to complete a simulation. By contrast, in studies that present participants with an open-ended task (more than one probe, or no probe) the imperfective aspect has been found to leave readers' representations less, not more, constrained relative to perfective aspect (Madden and Zwaan, 2003; Bergen and Wheeler, 2010; Coll-Florit and Gennari, 2011). For example, Coll-Florit and Gennari (2011) showed that reading language about ongoing events leads to more diverse semantic associations than language about completed events, a finding that leads us to a second alternative explanation: If ongoing events are associated with a greater diversity of emotional experiences, then our observed emotion advantage for imperfective sentences might be attributable to broader semantic priming of emotion concepts. In such a case, however, participants would have been expected to categorize imperfective sentences with less, not equal or greater, uniformity than perfective sentences.

Finally, our findings add to the research suggesting that the use of verb aspect has political ramifications. Fausey and Matlock (2011) found that describing a positive or negative behavior by a politician using the imperfective aspect (versus perfective aspect) made a larger proportion of participants feel strongly confident that the politician would or would not be reelected, respectively. This may be because imperfective aspect also made participants' action inferences more extreme (e.g., they estimated that a politician had taken a larger sum of hush money when taking hush money was described using the imperfective aspect). Our findings suggest that audiences viewing a typical stump speech are likely to have a qualitatively different experience than those viewing a SOTU, characterized by more strongly valenced emotions. We propose that the mechanism responsible for this difference is emotion constraint, brought about by subtle differences in verb aspect.

# AUTHOR CONTRIBUTIONS

DH and CC contributed equally to the conceptualization and design of the work, the acquisition and analysis of data, the drafting and revising of the manuscript, the final approval of the version to be published, and are equally accountable for all aspects of the work regarding its accuracy and integrity.

# ACKNOWLEDGMENTS

We wish to thank Sarah Sweeney and Nick Kavalec for their helpful contributions, and Art Glenberg for reviewing an earlier draft of this manuscript.

# REFERENCES



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Havas and Chapp. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Words putting pain in motion: the generalization of pain-related fear within an artificial stimulus category

*Marc P. Bennett<sup>1,2</sup>\*, Ann Meulders<sup>2,3</sup>, Frank Baeyens<sup>1,2</sup> and Johan W. S. Vlaeyen<sup>2,3,4</sup>*

*<sup>1</sup> Centre for Psychology of Learning and Experimental Psychopathology, Faculty of Psychology and Educational Sciences, University of Leuven, Leuven, Belgium, <sup>2</sup> Center for Excellence on Generalization Research in Health and Psychopathology, Faculty of Psychology and Educational Science, University of Leuven, Leuven, Belgium, <sup>3</sup> Research Group on Health Psychology, Faculty of Psychology and Educational Sciences, University of Leuven, Leuven, Belgium, <sup>4</sup> Department of Clinical Psychological Science, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands*

#### *Edited by:*

*Cornelia Herbert, University of Tübingen, Germany*

#### *Reviewed by:*

*Ian Stewart, National University of Ireland, Galway, Ireland Louise McHugh, National University of Ireland Maynooth, Maynooth, Ireland*

#### *\*Correspondence:*

*Marc P. Bennett, Centre for Psychology of Learning and Experimental Psychopathology, Faculty of Psychology and Educational Sciences, University of Leuven, Tiensestraat 102, Box 3712, 3000 Leuven, Belgium marc.pat.bennett@gmail.com*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 29 December 2014 Accepted: 13 April 2015 Published: 30 April 2015*

#### *Citation:*

*Bennett MP, Meulders A, Baeyens F and Vlaeyen JWS (2015) Words putting pain in motion: the generalization of pain-related fear within an artificial stimulus category. Front. Psychol. 6:520. doi: 10.3389/fpsyg.2015.00520*

Patients with chronic pain are often fearful of movements that never featured in painful episodes. This study examined whether a neutral movement's conceptual relationship with pain-relevant stimuli could precipitate pain-related fear; a process known as symbolic generalization. As a secondary objective, we also compared experiential and verbal fear learning in the generalization of pain-related fear. We conducted an experimental study with 80 healthy participants who were recruited through an online experimental management system (*M*<sub>age</sub> = 23.04 years, SD = 6.80 years). First, two artificial categories were established wherein nonsense words and joystick arm movements were equivalent. Using a between-groups design, nonsense words from one category were paired with either an electrocutaneous stimulus (pain-US) or threatening information, while nonsense words from the other category were paired with no pain-US or safety information. During a final testing phase, participants were prompted to perform specific joystick arm movements that were never followed by a pain-US, although they were informed that it could occur. The results showed that movements equivalent to the pain-relevant nonsense words evoked heightened pain-related fear as measured by pain-US expectancy, fear of pain, and unpleasantness ratings. Also, experience with the pain-US evinced stronger acquisition and generalization compared to experience with threatening information. The clinical importance and theoretical implications of these findings are discussed.

Keywords: symbolic generalization, pain-related fear, chronic pain disorders, fear-avoidance model, acceptance and commitment therapy

# Introduction

Over the past 30 years, health psychologists have discovered that not just the intensity of pain, but the *fear of pain* is associated with functional disability, physical inactivity, and feelings of anxiety and depression in patients with chronic pain disorder (McCracken et al., 1992; Vlaeyen et al., 1995; Asmundson et al., 1999; Crombez et al., 1999; Zale et al., 2013). Prospective studies have shown that fear of pain predicts the development of chronic pain better than other physiological complaints, such as the severity of the original injury (Jensen et al., 1994; Gheldof et al., 2010). Also, psychological treatments that foster adaptive emotional regulation strategies can lead to meaningful reductions in disability, distress and life dissatisfaction even in the absence of pain reduction (Morley et al., 1999; George et al., 2006; Leeuw et al., 2008; Wicksell et al., 2008, 2010). This evidence collectively suggests that the emotional response to pain is a significant clinical issue that deserves attention in both research and therapy.

The *fear-avoidance model of chronic pain* appeals to associative learning processes to describe how fear of pain leads to the functional disabilities experienced by some patients (Vlaeyen and Linton, 2000, 2012). Here, pain is thought of as an unconditioned stimulus (pain-US) that motivates emotional learning. Pain's impetus comes from its sensory salience and also the catastrophic cognitions that an individual might have about its consequences, e.g., a belief that pain signifies damaged nerves. Neutral bodily movements that have been paired with pain can therefore signal the possibility of more pain or (re)-injury (conditioned stimulus; CS+) and evoke *pain-related fear*. Safety behaviors might then develop in a desperate attempt to reduce pain and avoid (re)-injury, e.g., adopting a rigid gait (Volders et al., 2012). Any transient relief is likely to be attributed to these coping strategies and increase the likelihood that they will be employed again. However, safety behaviors are often so pervasive that they disrupt valued activities and this in turn has a deleterious impact on mood and sense of self.

A challenge for the fear-avoidance model has been to understand patients who are fearful of movements that never featured in pain episodes. In these cases, there appears to be a problematic (over)-generalization of fear to innocuous movements. It could be that a neutral movement evokes pain-related fear because it is proprioceptively similar to a conditioned movement; a process known as *stimulus generalization* (Meulders and Vlaeyen, 2013). To examine this possibility, Meulders et al. (2013) recently used a voluntary joystick arm movement paradigm and paired a painful electrocutaneous stimulus (pain-US) with a specific movement (e.g., moving left; CS+) and did not pair the pain-US with another movement (e.g., moving right; CS−). In a subsequent testing phase without the pain-US, participants were prompted to perform intermediate movements varying in similarity to the conditioned movements. Those that were more similar to the CS+ evoked more pain-related fear than those more similar to the CS−, such that a gradient was observed: the greater the similarity to the CS+, the greater the fear. These findings broadly indicate that proprioceptive similarity can indeed facilitate the spreading of pain-related fear. In real life, generalization could exacerbate the difficulties of chronic pain patients as an increasing number of movements come to elicit distress and avoidance behavior.

An interesting observation is that fear can spread to previously neutral events even if they are physically dissimilar from a conditioned stimulus. For instance, a conceptual sameness shared between arbitrary events might contribute to the (over) generalization in learned fear and this has recently been referred to as*category-based* or*symbolic generalization* (see Dymond et al., 2014; Dunsmoor and Murphy, 2015) 1 . For example, Dunsmoor et al. (2012) demonstrated that when members from a specific category (e.g., types of tools) are paired with a pain-US, other members spontaneously produce heightened fear in the absence of the US (also see, Boyle et al., in press). One method to study the symbolic generalization of fear involves the creation of artificial verbal categories with perceptually distinct stimuli, e.g., nonsense words or shapes. This is accomplished using a computer-based, operant learning procedure called a matchingto-sample (MTS) task. A single item (the *sample stimulus*) is presented onscreen for a few seconds and is followed by a set of other items. Participants then select one item from the set. From trial to trial, different sets are shown but there is always one correct item (the *comparison stimulus*): correct choices are reinforced ("Correct" appears onscreen) while incorrect choices are punished ("Wrong" appears onscreen). As such, a number of stimulus relations first are taught using corrective feedback wherein different comparison stimuli are mutually related to a common sample stimulus. In a later phase, the emergence of untrained (or derived) stimulus relations is examined using a similar format but without corrective feedback. This phase examines whether participants can reverse the previously trained stimulus relations: if presented with a comparison stimulus then they might select the appropriate sample stimulus from a set of items (*derived symmetry relations*). 
It is also examined whether participants can combine the previously trained stimulus relations: if presented with one comparison stimulus then they might select another comparison stimulus from a set of items (*derived equivalence relations*). Overall, physically distinct stimuli become functionally substitutable with one another and, therefore, are said to partake in a *stimulus equivalence category*. This emergent interchangeability between distinct stimuli arguably resembles a conceptual sameness between individual items in a natural language category (see, Sidman, 1971; Hayes et al., 2001; Barnes-Holmes et al., 2005). To study the extension of learned fear through these *de novo* verbal categories, an aversive US is repeatedly paired with one of the comparison stimuli (CS+). As a result, other comparison stimuli typically act as if they too predict threat and evoke fear. In this way fear generalizes to stimuli that are perceptually dissimilar to the CS+ and have not been explicitly related to the CS+ but instead share a rather abstract conceptual similarity (Augustson and Dougher, 1997; Valverde et al., 2009; Dymond et al., 2011; Vervoort et al., 2014).
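The logic of derived relations can be made concrete with a short sketch. The Python below is illustrative only: the function name `derive_relations` and the stimulus labels are hypothetical and are not taken from any of the cited procedures. It takes the trained sample-to-comparison pairs and lists the symmetry and equivalence relations that would be expected to emerge without direct training.

```python
from itertools import permutations


def derive_relations(trained):
    """Return (symmetry, equivalence) relations implied by trained pairs.

    `trained` is a set of (sample, comparison) pairs taught with feedback.
    Symmetry reverses each trained pair; equivalence links any two
    comparisons that were trained to the same sample.
    """
    symmetry = {(comparison, sample) for sample, comparison in trained}

    # Group comparisons by the sample they were trained with.
    by_sample = {}
    for sample, comparison in trained:
        by_sample.setdefault(sample, set()).add(comparison)

    equivalence = set()
    for comparisons in by_sample.values():
        # Every ordered pair of distinct comparisons sharing a sample.
        equivalence.update(permutations(sorted(comparisons), 2))
    return symmetry, equivalence


# Hypothetical labels: two samples, each trained with one word and one movement.
trained = {
    ("sample_1", "word_1"), ("sample_1", "move_1"),
    ("sample_2", "word_2"), ("sample_2", "move_2"),
}
symmetry, equivalence = derive_relations(trained)
```

In this sketch, `("word_1", "move_1")` appears in `equivalence` although the word and the movement were never presented together during training, which is the property that symbolic generalization relies on.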

Very little, if anything at all, is known about the symbolic generalization of pain-related fear. Given that visual stimuli can evoke fear based on their membership in verbal categories, it is conceivable that proprioceptive stimuli during movements could also produce fear in this manner. As a real-world example, *lifting* could be thought of as a verbal category entailing different musculoskeletal movements, e.g., raising a box with the back or picking up an infant with the arms, as well as different vocalizations and written words, e.g., "lift" or "raise." Should one member of this category become associated with pain then perhaps pain-related fear could generalize throughout this entire category. For example, a well-intended physiotherapist might advise: "*be cautious while lifting because it could damage the spine.*" Here, the category label, "lifting," becomes conceptually related to a pain-relevant threat attribute, "damage." This evaluation might then extend to specific movements in the category and precipitate pain-related fear in the absence of a discrete painful experience. Generally speaking, the realization that pain-related fear can spread in accordance with proprioceptive similarity was an important step in the development of a theoretical account of the symptoms of chronic pain disorders. Broadening the scope of inquiry to consider the complex verbal similarity that movements share might contribute to a more complete framework.

<sup>1</sup>There could be some confusion between '*stimulus generalization*' and the phenomenon of '*(over)-generalization.*' The former is a principle of learning: novel stimuli that are distinct but similar to a conditioned stimulus can evoke a conditioned response (Kalish, 1969; McLaren and Mackintosh, 2002). The latter term has been recently used to describe a clinical phenomenon whereby innocuous stimuli can evoke problematic emotional states and responses even if they never featured in an aversive learning episode (e.g., Lissek et al., 2005, 2008; Lissek and Grillon, 2010; Hermans et al., 2013; Dymond et al., 2014; Dunsmoor and Murphy, 2015). There is an obvious link between the two concepts. The (over)-generalization of emotional responses could be explained in terms of 'stimulus generalization.' However, it is important to recognize that emotional responses can still spread, or 'generalize,' to novel stimuli even in the absence of any physical overlap. For instance, the (over)-generalization of fear can be a product of shared conceptual meaning and category-membership (Dunsmoor et al., 2012; Boyle et al., in press). (For more information, see Dymond et al., 2014; Dunsmoor and Murphy, 2015; Bennett et al., in press).

The current study sought to examine if pain-related fear can emerge due to symbolic generalization. Using an MTS task, two stimulus equivalence categories were established with nonsense shapes, words and joystick arm movements. First, selecting words or performing a movement in the presence of sample shapes was rewarded. Second, derived symmetry relations between movements (or words) and shapes were tested, as were derived equivalence relations between words and movements. Using a pain-related fear conditioning paradigm, a nonsense word from one stimulus equivalence category was associated with a pain-US (CS+) while a nonsense word from the other stimulus equivalence category was not (CS−). Lastly, participants were prompted to perform movements from both equivalence categories and informed that the pain-US could follow, when in truth it never occurred. It was predicted that participants would report heightened pain-related fear for movements equivalent to the pain-relevant nonsense words. Self-reported measures of pain-related fear, retrospective US expectancy and unpleasantness ratings were administered as proxies of pain-related fear. One could also imagine that participants would be more hesitant to initiate movements that are associated with pain. For that reason, it was predicted that movements equivalent to the CS+ would take longer to initiate than movements equivalent to the CS−.

The fear learning literature clearly indicates that fear could be installed through different pathways (Rachman, 1977; Olsson and Phelps, 2004; Dymond et al., 2012), including directly experienced CS–US pairings (e.g., Grillon and Davis, 1997) and verbal threat information (e.g., Field and Schorah, 2007). As a secondary aim, the present study investigated if verbal information about pain alone could catalyze the generalization of pain-related fear to particular movements. Using a between-groups design, one group experienced the CS+ being directly paired with the pain-US while the CS− was not. In a second group, the CS+ was paired with threatening information (e.g., "painful" and "dangerous") while the CS− was paired with safety information (e.g., "gentle" and "secure"). We predicted that both groups would show generalization of pain-related fear to the actual, equivalent movements. This could mimic the real-life emergence of pain-related fear due to the conceptual relationships between movements and certain evaluative attributes. For instance, and in real life scenarios, words (e.g., "lifting") are paired with verbal information (e.g., "is dangerous") and this can prompt evaluative change in the specific referents (i.e., the musculature involved in lifting; e.g., Muris and Field, 2010).

# Materials and Methods

# Participants

Eighty healthy participants (52 female) were recruited for this study through an online experimental management system (*M*age = 23.04 years, SD = 6.80 years, range = 18–49 years) and paid €8/h remuneration. The ethical committee of the Faculty of Psychology and Educational Sciences of the University of Leuven approved the procedure (S55215). All participants signed an informed consent form. Exclusion criteria were pregnancy, cardio-pulmonary difficulties, diagnosed psychiatric disorders or neurological conditions like epilepsy, and wrist pain. Participants were randomly assigned to one of two groups: the *pain-US* group (*N* = 41, *M*age = 22.95 years, SD = 6.80 years) and the *instructed-US* group (*N* = 39, *M*age = 23.13 years, SD = 5.86 years). Due to an experimenter error, one participant was placed into the wrong experimental condition, hence the uneven group sizes. The chosen sample size was based on previous research conducted in our lab (see Vervoort et al., 2014; Bennett et al., in press).

# Apparatus

Experimental sessions were conducted in a sound-attenuated cubicle using a Dell desktop PC (17" monitor with a black background; 1024 × 768 pixels). Stimulus presentations and response recordings were controlled using Affect 4.0 (Spruyt et al., 2010). The *pain-US* group experienced an electrocutaneous stimulus. A commercial constant current stimulator (i.e., DS7A, Digitimer, Welwyn Garden City, England) delivered a 2 ms electrocutaneous stimulation (pain-US) to the wrist of the right hand, via Sensormedics electrodes (8 mm) filled with K-Y gel. An individual pain-US intensity level was decided upon during a pre-experimental calibration procedure (*M*intensity = 16.00 mA; SE = 1.70 mA). The pain-US was reliably aversive as indicated by participants' ratings using a pencil and paper Likert scale where 0 = not at all unpleasant and 10 = highly unpleasant (*M*unpleasantness = 7.33; SE = 0.35). The *instructed-US* group was shown safety and threat information in size 32 white Arial font instead of the pain-US. Five threatening terms were used: *injury*, *terrible*, *danger*, *pain*, and *hurt*. Five safety terms were used: *safe*, *secure*, *gentle*, *trust*, and *peace*.

During the MTS task, two nonsense shapes (A1 and A2), 150 × 150 pixels in white font, were used as sample stimuli (see **Figure 1**). Three nonsense three-letter words (B1, B2, and B3) were shown in size 32 white Arial font, i.e., "Ler," "Zid," and "Mau," and these acted as comparison stimuli (see **Figure 1**). These words were chosen as previous research has indicated that they are neutral and not associated with a particular evaluative state prior to conditioning (see Bennett et al., in press). Three arm movements (C1, C2, and C3) were made using a Logitech Attack 3 joystick, i.e., left, right and down, and these acted as comparison stimuli (see **Figure 1**). The joystick was operated with the participant's right arm and movements were represented as mouse coordinates on the computer screen (the cursor was not visible). Left, right, and downward arm movements were defined by the cursor moving from the middle of the screen, 0 × 0 × 0 × 0 pixels (*top* × *left* × *bottom* × *right*), into a rectangular target region (200 × 200 pixels) positioned at the left side (0 × 284 × 200 × 484 pixels), the right side (412 × 568 × 612 × 768 pixels), and bottom (824 × 284 × 1024 × 484 pixels) of the screen, respectively. Stimuli were assigned to one of two stimulus equivalence categories; A1 = B1 = C1 and A2 = B2 = C2. During some MTS trials, participants chose to perform one of the three movements and this was cued using a 1.50 s image of three intersecting white arrows pointing left, right and down (50 × 50 pixels); the *comparison-signal* (see **Figure 2**). During other MTS trials, participants were required to perform one specific movement and this was cued using a 5 s image of a white arrow that pointed either left, right, or down (50 × 50 pixels); the *movement-signal* (see **Figure 2**). Participants could only move once the signal was removed from the screen, and moving too early caused a red X, size 32 font, to appear in the center of the screen. This remained onscreen until the joystick was returned to its resting position, which was defined by a virtual circle located in the center of the screen, 512 × 384 pixels and radius 328 pixels.

## Procedure

#### Pain-US Calibration

Participants confirmed that they did not meet any of the exclusion criteria and signed an informed consent form. They were then brought to the experimental room and a work-up procedure established an intensity of electrocutaneous stimulus for the experiment. The experimenter explained that it was important for the experiment that the electrocutaneous stimulus be uncomfortable and somewhat painful. Two electrodes were placed on the participant's right wrist, 1.00 cm apart. Starting at 1.00 mA, an electrocutaneous stimulus was delivered with increasing intervals of 1.00 or 2.00 mA until the stimulus was "painful but tolerable." While progressing upward through these intensities, the experimenter asked the participant to describe aloud the painfulness of the electrocutaneous stimulus, where 0 = feel nothing and 10 = maximum tolerable pain. Once the intensity was selected, participants were asked to rate the unpleasantness of the pain-US using an 11-point Likert scale.

# Matching-to-Sample Task

#### *Pre-training*

Six practice trials were completed to familiarize participants with the MTS task and joystick arm movements. Participants were told that the electrocutaneous stimulus would not yet occur. Instructions stated that on some trials a *sample stimulus* (A1 or A2) would first appear at the center of the screen and they would then have to choose one of three movements to perform (C1, C2, or C3). Participants were told that the presentation of the *comparison-signal* in the center of the screen would indicate when they were required to choose a movement. Finally, participants were instructed to only perform a movement once the signal terminated and that moving too soon would cause a red X to appear. Over three trials, A1 or A2 randomly appeared at the center of the screen for 5 s. The offset of the sample stimulus was followed by the comparison-signal for 1.50 s in the center of the screen (e.g., **Figure 2**, A→C trials). Over three trials, the experimenter directed the participant to make each movement following the offset of the comparison signal. No feedback was given.

Instructions then stated that other trials would require performing a movement (C1 or C2 or C3) and then selecting 1 of 3 items. Participants were told that the presentation of a *movement-signal* would indicate the specific movement they needed to perform. Participants were again instructed to only make the movements once the signal disappeared otherwise a red X would appear. Over three trials, a movement-signal for C1, C2, or C3 randomly appeared in the center of the screen for 5 s. When the movement-signal terminated, the experimenter directed the participant to make the movement and this resulted in the presentation of B1, B2, and B3 in a line at the center of the screen. The experimenter explained that they could select the stimulus on the left by pressing 1, select the stimulus in the middle by pressing 2, or select the stimulus on the right by pressing 3 (e.g., **Figure 2**, C→B trials). Selecting a stimulus removed all other stimuli and started the next trial. Again, no feedback was given.

#### *Trained stimulus relations*

Participants were reminded that they should (i) press 1, 2, or 3 to select items, (ii) choose 1 of 3 movements to perform when the comparison-signal appears, and (iii) perform a specific movement when a movement-signal appears. No further instructions were given for the rest of the MTS task. In the first set of trials, A1 and A2 were *sample stimuli*, and B1, B2, and B3 were *comparison stimuli* (see **Figure 2**, A→B trials). Two trial types were presented; [A1 → **B1**, B2, B3] and [A2 → B1, **B2**, B3] (the correct comparison is shown in **bold**). Here, A1 (or A2) appeared in the center of the screen for 5 s. Its offset was then followed by the presentation of B1, B2, and B3 in a line at the center of the screen (the linear order was randomized). Selecting B1 (or B2) was reinforced by the feedback "Correct," whereas incorrect responses were followed by the feedback "Wrong." Feedback was presented for 1 s and trials were separated by a 3–5 s intertrial interval (ITI). Trials continued until 12 consecutive correct responses were made. In the second set of trials, A1 and A2 were sample stimuli, and C1, C2, and C3 were comparison stimuli (see **Figure 2**, A→C trials). Two trial types were presented; [A1 → **C1**, C2, C3] and [A2 → C1, **C2**, C3]. A1 (or A2) appeared in the center of the screen followed 5 s later by the presentation of the comparison-signal for 1.5 s. Following the offset of the comparison-signal, performing C1 (or C2) was reinforced by the feedback "Correct," whereas incorrect movements were followed by the feedback "Wrong." The trials continued until 12 consecutive correct movements were made. In a final set of training trials, participants were presented with a mix of all four trial types; [A1 → **B1**, B2, B3], [A2 → B1, **B2**, B3], [A1 → **C1**, C2, C3], and [A2 → C1, **C2**, C3]. Trials were presented quasi-randomly (with no more than two consecutive presentations of the same type) until 24 consecutive correct responses were made.
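The mastery criterion used in training (a run of consecutive correct responses) can be expressed as a small helper. This is a minimal sketch, not the logic of the experiment software: the function name `trials_to_criterion` and the example response sequence are hypothetical.

```python
def trials_to_criterion(responses, criterion=12):
    """Return the 1-based trial on which `criterion` consecutive correct
    responses is first reached, or None if it is never reached.

    `responses` is an iterable of booleans (True = correct).
    """
    run = 0
    for trial_number, correct in enumerate(responses, start=1):
        # An error resets the running count of consecutive correct responses.
        run = run + 1 if correct else 0
        if run >= criterion:
            return trial_number
    return None


# Hypothetical participant: five correct, one error, then twelve correct.
example = [True] * 5 + [False] + [True] * 12
trial_reached = trials_to_criterion(example, criterion=12)
```

Passing `criterion=24` would give the corresponding check for the mixed training block.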

#### *Derived symmetry relations*

Four trial types tested if participants would reverse the relation between the sample and comparison stimuli; [B1 → **A1**, A2, A3], [B2 → A1, **A2**, A3], [C1 → **A1**, A2, A3], and [C2 → A1, **A2**, A3]. These were presented four times each in a block of 16 trials without feedback. On some trials, B1 or B2 appeared in the center of the screen for 5 s followed by A1, A2, and A3 in a line at the center of the screen (see **Figure 2**, B→A trials). On other trials, a movement-signal appeared for 5 s and then participants performed the appropriate arm movement (C1 or C2). Once the movement was complete, A1, A2, and A3 appeared in the center of the screen. Pressing 1, 2, or 3 to select an item caused all stimuli to be removed from the screen (see **Figure 2**, C→A trials).

#### *Derived equivalence relations*

Four trial types were presented to examine the relationship between comparison stimuli; [B1 → **C1**, C2, C3], [B2 → C1, **C2**, C3], [C1 → **B1**, B2, B3], and [C2 → B1, **B2**, B3]. These were presented four times each in a block of 16 trials, without feedback. On some trials, B1 or B2 appeared in the center of the screen for 5 s followed by a 1.5 s comparison-signal. Participants then chose whether to perform C1, C2, or C3 (see **Figure 2**, B→C trials). On other trials, a movement-signal appeared for 5 s and then participants made the appropriate arm movement (C1 or C2). B1, B2, and B3 then appeared in the center of the screen and one of these was selected (see **Figure 2**, C→B trials).

### Pain-Related Fear Conditioning

For the *pain-US* group, instructions stated that nonsense words would appear in the center of the screen and that the pain-US could follow. B1 was conditioned to predict the pain-US (i.e., B1 was the CS+). B1 was presented four times for 5 s followed by the onset of the 2 ms pain-US. B1 also appeared once for 5 s without being followed by the pain-US. B2 appeared on screen five times and was never followed by the pain-US (i.e., B2 was the CS−). Trials were presented quasi-randomly, with no more than two consecutive trials with the same stimulus, and separated by a 5–9 s ITI. For the *instructed-US* group, instructions stated that extra information would be given about the nonsense words that had been seen. B1 was presented in the center of the screen five times for 5 s and followed by the onset of a 3 s threatening term, i.e., *injury*, *terrible*, *danger*, *pain*, or *hurt*. B2 was also presented five times for 5 s and followed by the onset of a 3 s safety term, i.e., *safe*, *secure*, *gentle*, *trust*, or *peace*. Trials were presented quasi-randomly and separated by a 5–9 s ITI. As such, and in both groups, a member of one equivalence category (A1 = B1 = C1) was associated with a pain-US while a member of the other equivalence category (A2 = B2 = C2) was not.
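A quasi-random order with a cap on consecutive repeats can be generated in several ways. The text does not say how the experiment software did it, so the sketch below simply assumes rejection sampling (shuffle, then discard any order that breaks the constraint). The function names and the tuple layout `(stimulus, followed_by_us)` are hypothetical.

```python
import random


def max_run_length(sequence):
    """Length of the longest run of identical consecutive items."""
    longest = run = 0
    previous = object()  # sentinel that never equals a real item
    for item in sequence:
        run = run + 1 if item == previous else 1
        previous = item
        longest = max(longest, run)
    return longest


def quasi_random_order(trials, key, max_run=2, rng=None, max_attempts=10_000):
    """Shuffle `trials` until no more than `max_run` consecutive trials
    share the same `key(trial)` value (rejection sampling)."""
    rng = rng or random.Random()
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if max_run_length([key(t) for t in order]) <= max_run:
            return order
    raise RuntimeError("no valid order found within max_attempts")


# Pain-US group schedule described above: B1 reinforced on 4 of 5 trials,
# B2 never reinforced. Each trial is (stimulus, followed_by_us).
trials = [("B1", True)] * 4 + [("B1", False)] + [("B2", False)] * 5
order = quasi_random_order(trials, key=lambda t: t[0], rng=random.Random(0))
```

Rejection sampling keeps every valid order equally likely, which is the main reason to prefer it over greedy construction for short trial lists like this one.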

#### Signaled Joystick Arm Movement Task

Instructions stated that participants were now required to make certain arm movements and that the pain-US might follow certain movements. However, at no point in this task was the pain-US presented. Participants were also reminded to wait until the movement-signals disappeared before moving otherwise a red X would appear. Here, movement-signals for C1 or C2 were presented for 5 s after which participants made the appropriate arm movements. C1 and C2 were randomly presented once each in a single block. Overall, four blocks were presented. Trials could only be completed once a movement was performed and were separated by a 5–9 s ITI.

## Outcome Measures

### Manipulation Checks

Symbolic generalization requires (i) the establishment of a stimulus equivalence category and (ii) a learned fear response. To check for the first criterion, the number of correct responses during the derived symmetry and derived equivalence phases was recorded. Accuracy scores were then calculated for each participant by expressing the total of correct responses as a percentage of the number of trials in each part. An accuracy score of at least 87.50% (14/16 correct responses) was taken to indicate the successful completion of the symmetry and equivalence phases. The mean number of MTS training trials was also calculated and a one-way analysis of variance (ANOVA) was run to examine if the pain-US and instructed-US groups differed in the number of MTS training trials. Three one-way ANOVAs were calculated to examine if the pain-US group and instructed-US group differed in performance during (i) MTS training, (ii) symmetry testing, and (iii) equivalence testing.
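The accuracy score and pass criterion amount to simple arithmetic, sketched below. The function names are hypothetical, and the threshold is treated as inclusive because the text equates 87.50% with 14 of 16 correct responses; that reading is an assumption.

```python
def accuracy_percent(n_correct, n_trials):
    """Correct responses expressed as a percentage of trials in the phase."""
    return 100.0 * n_correct / n_trials


def passed_phase(n_correct, n_trials=16, threshold=87.5):
    """True if accuracy meets the criterion (assumed inclusive: 14/16 passes)."""
    return accuracy_percent(n_correct, n_trials) >= threshold


score_14 = accuracy_percent(14, 16)  # 87.5
```

Under this reading a participant with 14 or more correct responses out of 16 passes a test phase, while 13 correct (81.25%) does not.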

To check for the second criterion, participants were asked to report the unpleasantness of the CS+ (i.e., B1) and CS− (i.e., B2). This was assessed at the very end of the experimental study. The question "*How unpleasant did you find this word?*" appeared at the top of the screen with B1 or B2 presented in the center of the screen, followed 1.50 s later by an 11-point Likert scale, where 0 = not at all, 5 = uncertain, and 10 = highly unpleasant. A repeated measures ANOVA was then calculated to examine the effect of (i) stimulus and (ii) type of US on unpleasantness ratings for the CSs. There was one within-subjects factor (*stimulus*) with two levels; CS+ and CS−. There was also one between-subjects factor (*group*) with two levels; the pain-US group (directly experienced the pain-US) and the instructed-US group (informed about threat/safety).

### Symbolic Generalization of Pain-Related Fear

#### *Self-report measures*

After the signaled joystick arm movement task, participants were informed that they would be asked a series of questions. Each question appeared at the top of the computer screen with the movement-signal for C1 or C2 in the center of the screen. Answers were provided using a mouse-click on an 11-point Likert scale (where 0 = not at all, 5 = uncertain, and 10 = definitely), which was shown at the bottom of the screen. The first two questions measured retrospective pain-US expectancy for C1 and C2; participants were asked, "*How much did you think the electrical stimulation would follow this movement?*" The next two questions measured pain-related fear for C1 and C2; participants were asked, "*How fearful were you while making this movement?*" The final two questions measured the valence of C1 and C2; participants were asked, "*How unpleasant did you find this movement?*"

The mean pain-US expectancy rating, mean pain-related fear rating and mean unpleasantness of movements were calculated for the C1 and C2 movements. A series of mixed repeated measures ANOVAs were then calculated to examine the effect of (i) stimulus and (ii) group on the self-report measures. For each ANOVA there was 1 within-subjects factor (*stimulus*) with two levels; C1 (equivalent to the CS+) and C2 (equivalent to the CS−). There was 1 between-subjects factor (*group*) with two levels; pain-US (experienced pain) and instructed-US (instructed about threat).

### *Reaction time measures*

Response latency was recorded for each movement in each block during the signaled joystick arm movement task. This was defined as the time between the termination of the movement-signal and the initiation of an arm movement (the joystick deviating from its resting position). In accordance with the recommendation of Meulders and Vlaeyen (2013), all reaction times shorter than 250 ms and longer than 3000 ms were eliminated. In addition, mean response latency scores for C1 and C2 were calculated for each participant, and latencies more than 3 SDs from the mean were eliminated (see Meulders and Vlaeyen, 2013). In total, 2.35% of the data set was discarded. A repeated measures ANOVA was calculated to compare the effects of (i) stimulus and (ii) group on the response latency. This model entailed two within-subjects factors; *stimulus*, which had two levels (C1 and C2) and *block*, which had four levels (blocks 1–4). There was one between-subjects factor (*group*), which had two levels (pain-US and instructed-US).
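The two-step latency screening can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: the function name `trim_latencies` is hypothetical, and the pool of latencies over which the mean and SD are computed is left to the caller because the text does not fully specify it.

```python
from statistics import mean, stdev


def trim_latencies(latencies_ms, lower=250, upper=3000, n_sd=3):
    """Apply the two exclusion rules in sequence.

    Step 1: drop latencies shorter than `lower` or longer than `upper` (ms).
    Step 2: drop remaining latencies more than `n_sd` sample SDs from the
            mean of the remaining values.
    """
    in_window = [rt for rt in latencies_ms if lower <= rt <= upper]
    if len(in_window) < 2:
        return in_window  # SD is undefined for fewer than two values
    centre, spread = mean(in_window), stdev(in_window)
    return [rt for rt in in_window if abs(rt - centre) <= n_sd * spread]


# Hypothetical latencies (ms): one anticipatory and one very slow response.
cleaned = trim_latencies([100, 400, 500, 600, 5000])
```

One design point worth noting: with very small pools the SD rule cannot exclude anything, since a single value can lie at most (n − 1)/√n sample SDs from the mean of n values, so the second step only has an effect when the pool is reasonably large.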

# Results

Where Mauchly's test revealed that sphericity could not be assumed, the Greenhouse–Geisser correction is reported. The alpha-level was set at 0.05 and effect size was calculated using partial eta squared (η<sup>2</sup><sub>p</sub>). Bonferroni corrections were used as the rejection criteria when pairwise comparisons were calculated.
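For readers who want to relate a reported *F* statistic to its effect size, partial eta squared can be recovered from *F* and its degrees of freedom using the standard identity η<sup>2</sup><sub>p</sub> = (*F* × df<sub>effect</sub>) / (*F* × df<sub>effect</sub> + df<sub>error</sub>). The helper below is a minimal sketch with a hypothetical function name; the example values are the main effect of stimulus on CS unpleasantness reported in this section.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1,78) = 148.22, reported with a partial eta squared of 0.66.
effect_size = round(partial_eta_squared(148.22, 1, 78), 2)
```

The same identity reproduces the effect sizes reported for the main effects of stimulus on pain-US expectancy (0.55) and fear of pain (0.48).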

#### Matching-to-Sample Task

A mean of 68.47 MTS training trials (SE = 1.56) were required and there was high accuracy of responding (*M* = 88.99%, SE = 0.57%). The one-way ANOVA indicated that the pain-US group required significantly more MTS training trials than the instructed-US group, *F*(1,78) = 4.49, *p* = 0.04, η<sup>2</sup><sub>p</sub> = 0.04. However, just four outliers in the pain-US group drove this difference. In addition, a one-way ANOVA indicated that the groups did not significantly differ in terms of accuracy during MTS training, *F* < 1.00, *p* = 0.43. More importantly, a high level of accuracy was achieved during the symmetry testing (*M* = 88.75%, SE = 2.25%). A one-way ANOVA indicated that the two groups did not significantly differ in terms of their accuracy during symmetry testing, *F* < 1, *p* = 0.33. Finally, a high level of accuracy was achieved during the equivalence testing (*M* = 89.66%, SE = 2.57%) and a one-way ANOVA indicated that the two groups did not differ in their performance, *F* < 1, *p* = 0.93. The accuracy during the symmetry and equivalence testing suggests that stimulus equivalence categories were reliably established. Therefore, the criterion of the first manipulation check was met.

## Unpleasantness of the Original CSs

The 2 (stimulus) × 2 (group) repeated measures ANOVA revealed a main effect of stimulus, *F*(1,78) = 148.22, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.66 (see **Figure 3A**). The CS+ was rated as more unpleasant than the CS− for the pain-US group, *t*(40) = 9.82, *p* < 0.001, *d* = 5.34, and the instructed-US group, *t*(38) = 7.33, *p* < 0.001, *d* = 3.36 (see **Figure 3A**). This suggests that conditioning was complete. The criterion of the second manipulation check was therefore met. Interestingly, a main effect of group was also observed, *F*(1,78) = 14.17, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.15, as was a significant interaction between stimulus and group, *F*(1,78) = 7.69, *p* < 0.01, η<sup>2</sup><sub>p</sub> = 0.09. The CS+ was rated as more unpleasant in the pain-US group than in the instructed-US group, *t*(75) = 3.86, *p* < 0.001, *d* = 2.27. This suggests that the threat value of the CS was higher when paired with the actual US as opposed to threatening information. On the other hand, the two groups did not significantly differ in terms of CS− unpleasantness ratings, *t*(78) = 0.78, *p* = 0.44.

#### Pain-US Expectancy

A 2 (stimulus) × 2 (group) repeated measures ANOVA indicated a main effect of stimulus, *F*(1,77) = 94.10, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.55 (see **Figure 3B**). In line with our predictions, movements equivalent to the CS+ prompted higher pain-US expectancy than movements equivalent to the CS− in both the pain-US group, *t*(40) = 7.91, *p* < 0.001, *d* = 4.53, and the instructed-US group, *t*(37) = 5.82, *p* < 0.001, *d* = 3.21 (see **Figure 3B**). There was no main effect of group, *F* = 1.20, *p* = 0.27, nor was there an interaction between group and stimulus, *F* = 2.76, *p* = 0.10. This indicates that the groups did not differ in their expectancy ratings for movements equivalent to the CS+, *t*(77) = 1.60, *p* = 0.12, and movements equivalent to the CS−, *t*(77) = 0.33, *p* = 0.74.

#### Fear of Pain

A 2 (stimulus) × 2 (group) ANOVA indicated a main effect of stimulus on self-reported fear of pain, *F*(1,77) = 70.75, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.48 (see **Figure 3C**). As predicted, movements equivalent to the CS+ evoked higher pain-related fear ratings than movements equivalent to the CS− in the pain-US group, *t*(40) = 6.89, *p* < 0.001, *d* = 3.71, and the instructed-US group, *t*(37) = 4.98, *p* < 0.001, *d* = 2.32. Interestingly, a main effect of group was also observed, *F*(1,77) = 5.98, *p* = 0.01, η<sup>2</sup><sub>p</sub> = 0.07, and the interaction between group and stimulus was nearing significance, *F*(1,77) = 3.78, *p* = 0.056, η<sup>2</sup><sub>p</sub> = 0.05. The pain-US group reported significantly more pain-related fear in response to movements equivalent to the CS+ than the instructed-US group, *t*(77) = 2.54, *p* = 0.01, *d* = 1.73. On the other hand, the two groups did not differ in pain-related fear in response to movements equivalent to the CS−, *t*(73) = 0.88, *p* = 0.38.

#### Unpleasantness of the Movements

A 2 (stimulus) × 2 (group) repeated measures ANOVA indicated a main effect of stimulus on the self-reported unpleasantness of movements, *F*(1,78) = 40.68, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.34 (see **Figure 3D**). As predicted, movements equivalent to the CS+ were rated as more unpleasant than movements equivalent to the CS− in both the pain-US group, *t*(40) = 5.19, *p* < 0.001, *d* = 2.83, and the instructed-US group, *t*(38) = 3.79, *p* < 0.01, *d* = 1.61. A main effect of group was also observed, *F*(1,78) = 13.31, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.15, but there was no significant interaction effect, *F* = 3.04, *p* = 0.09. Interestingly, the pain-US group rated the CS+ equivalent movements as significantly more unpleasant than the instructed-US group, *t*(78) = 3.26, *p* < 0.01, *d* = 2.07. Finally, and using Bonferroni's corrected alpha level (α = 0.01), there was no difference in how the two groups rated the unpleasantness of the CS− equivalent movements, *t*(62) = 2.18, *p* = 0.03.

### Response Latency

A 2 (stimulus) × 2 (group) × 4 (block) ANOVA indicated no main effect of stimulus, *F* < 1, *p* = 0.87 (see **Figure 4**). There was also no main effect of group, *F* = 1.98, *p* = 0.16. There was, however, a main effect of block, *F*(3,198) = 5.60, *p* < 0.001, η<sup>2</sup><sub>p</sub> = 0.08. Pairwise comparisons indicated that mean response latencies during the first block were significantly longer than those in the third block, *d* = 95.11, SE = 30.91, *p* = 0.02, and fourth block, *d* = 117.22, SE = 36.10, *p* = 0.01. Also, mean response latencies during the second block were significantly longer than those in the fourth block, *d* = 100.47, SE = 34.79, *p* = 0.03. This suggests that participants initiated the specific movements more quickly as the signaled joystick arm movement task progressed.

# Discussion

Previous research has clearly shown that a conceptual sameness between individual events can facilitate the generalization of learned fear; this has been termed symbolic generalization (Dymond et al., 2011, 2014). The present study investigated if movements could come to specifically evoke pain-related fear in this manner. The results demonstrated that pain-related fear spread from a conditioned nonsense word (CS) to joystick arm movements from within the same stimulus equivalence category. In accordance with our predictions, movements from the pain-relevant stimulus equivalence category spontaneously prompted higher pain-US expectancy ratings, fear of pain ratings and unpleasantness ratings than movements from the pain-irrelevant stimulus equivalence category. This finding is particularly interesting given that the movements themselves were never paired with the pain-US, nor were the movements in any way perceptually similar to the nonsense word stimuli that had been associated with pain. It is also interesting given that the movements and nonsense words were never explicitly related to one another. Participants derived the stimulus equivalence category without any corrective feedback during the derived symmetry and equivalence phases. Overall, it appears that movements can become conceptually related to pain-relevant words through a process of stimulus equivalence-based category formation and that this conceptual relation can facilitate the emergence of pain-related fear. To the extent that stimulus equivalence is involved in real-world verbal behavior (for further discussion see, Hayes et al., 2001; Barnes-Holmes et al., 2005; Dymond, 2014), the current study may describe a unique means for movements to evoke pain-related fear in the absence of a pain episode.

The present study also investigated whether verbal information about potential harm could promote the symbolic generalization of pain-related fear. First, nonsense words that were paired with threat information prompted higher unpleasantness ratings than those paired with safety information, suggesting a change in stimulus valence following conditioning. As predicted, movements that were equivalent to the threat-associated nonsense words then evoked higher pain-US expectancy ratings, fear of pain ratings and unpleasantness ratings than movements equivalent to the safety-relevant stimuli. This observation points to the impressive control that verbally relating movements and evaluative terms can have over emotional responding (see also Blackledge, 2007; McCracken and Morley, 2014; McCracken and Vowles, 2014). Neutral joystick arm movements evoked heightened fear because of a derived equivalence relation with nonsense words, which were themselves paired with threatening information. Overall, this indicates that conceptually linking movement terms (e.g., "lifting") to particular evaluative attributes (e.g., "danger" or "safe") can alter emotional responding to the actual movements.

Response latencies were expected to be longer for movements from the pain-relevant stimulus equivalence category relative to movements from the pain-irrelevant category. This would suggest a hesitation to perform movements associated with pain, and strengthen the claim that pain-related fear and affiliated avoidance behavior generalized through verbal relations. No such difference was observed. However, previous research suggests response latencies may be less sensitive to the generalization of pain-related fear than other fear measurements. Meulders et al. (2013) found that joystick arm movements that were paired with a pain-US elicited an elevated eye-blink startle response and that this subsequently generalized to proprioceptively similar movements. On the other hand, longer response latencies were observed for arm movements that were paired with the pain-US, but the same was not observed for proprioceptively similar movements. Future research will be required to examine why such an asymmetry is observed across different fear measurements. One commonality between our study and Meulders et al. (2013) was the use of a basic Logitech Attack 3 joystick. Perhaps future research could benefit from the use of more sensitive and informative technologies to measure arm movements (e.g., Houtsma and Van Houten, 2006). On a related note, it will also be important for future research to measure pain-related fear more directly using physiological measures like skin conductance and startle-reflex potentials (e.g., Lissek et al., 2008; Vervoort et al., 2014). This would provide clearer evidence that the conditioning procedure did indeed install fear/safety of the conditioned stimuli. A limitation of the current study is that we relied on self-reported stimulus valence for this information.

Importantly, the current findings indicated that direct experience with the pain-US is dominant over verbal information in the initial acquisition and subsequent (symbolic) generalization of pain-related fear. Participants who directly experienced the pain-US demonstrated stronger pain-related fear conditioning than those who received verbal threat information, and this heightened acquisition of pain-related fear may have led to the heightened generalization of pain-related fear. That is, movements prompted higher fear of pain and higher unpleasantness ratings when equivalent to CSs that were paired with the pain-US rather than threat information. These results are congruent with recent research. In a within-subjects design, Raes et al. (2014) first paired one stimulus (CS1+) with an electrocutaneous US and another stimulus (CS2+) with a 'placeholder' that represented the US. This placeholder was explained to participants as a way of preventing the delivery of too many shocks so early in the experiment. In a second phase, participants were instructed that both stimuli would now be followed by the actual US. Prior experience with a CS–US contingency had an additive effect over instructed fear, as CS1+ then prompted higher fear ratings than CS2+. It appears that direct experience with CS–US pairings makes a distinct contribution to fear learning over verbal information. Such nuances between different pathways for pain-related fear learning could be consequential in the assessment and treatment of chronic pain. For instance, it may be important to consider whether a patient had any (in)direct experience with pain to gauge the intensity of pain-related fear and evaluate the risk of generalization. However, we cannot discount the possibility that the between-group difference reflects a procedural artifact. During the pain-related fear conditioning, participants were given quite general information about the CS, e.g., MAU→ "hurt" and VEK→ "safe." Perhaps conditioning effects would be more comparable between the groups if threat information was more specific, e.g., "MAU will be followed by an electric stimulus" (see Raes et al., 2014).

As far as we are aware, no other study has shown that proprioceptive stimuli can partake in stimulus equivalence categories. Tierney et al. (1995), however, designed an innovative MTS task to establish stimulus equivalence categories with haptic stimuli. Three sticks, each of which had a different center of mass, were placed within the grasp of participants but beyond their visual range. Therefore, the sticks could only be discriminated by their haptic properties once they were placed in the participants' hands. During some training trials, a sample-word was presented and the selection of one stick was reinforced. During other training trials, a stick was placed in the participants' hands as the sample stimulus and the selection of a different comparison-word was reinforced. Symmetry relations emerged during the testing phase. Participants selected the appropriate previous sample-word when holding a particular stick and also selected the appropriate stick when presented with one of the comparison-words. Derived equivalence relations were also observed. Participants selected the appropriate comparison-word stimulus in the presence of one of the sample-words, and vice versa. Tierney et al. (1995) trained the baseline stimulus relations such that the comparison haptic stimulus for one relation was the sample stimulus for the next relation (a *linear* MTS task). As a result, haptic stimuli could only take part in symmetrical relations with words and not equivalence relations. In our procedure, nonsense words and proprioceptive stimuli both served as comparison stimuli to a common sample symbol (a *one-to-many* MTS task). A benefit of our approach is that proprioceptive stimuli could be observed to participate in both (i) derived symmetry with the sample symbol and (ii) derived equivalence relations with the nonsense words.
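The contrast between the two training structures can be made concrete with a small sketch. It is an illustration only, not the procedure used in either study: the stimulus labels and the function `derived_relations` are hypothetical, each structure is reduced to a single three-member class, and transitive and combined symmetry-plus-transitivity relations are lumped together as "equivalence" for simplicity.

```python
from itertools import permutations


def derived_relations(trained):
    """Split untrained relations into derived symmetry and derived equivalence.

    `trained` is a list of (sample, comparison) pairs.
    """
    trained = set(trained)
    # Symmetry: each trained relation reversed.
    symmetry = {(b, a) for a, b in trained} - trained

    # Group stimuli linked by any chain of trained relations (union-find).
    stimuli = {s for pair in trained for s in pair}
    parent = {s: s for s in stimuli}

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    for a, b in trained:
        parent[find(a)] = find(b)

    same_class = {(a, b) for a, b in permutations(stimuli, 2) if find(a) == find(b)}
    # Equivalence: same-class pairs that were neither trained nor simple reversals.
    equivalence = same_class - trained - symmetry
    return symmetry, equivalence


# Linear structure (hypothetical labels): word -> stick, then stick -> word.
linear = [("word_1", "stick"), ("stick", "word_2")]
# One-to-many structure (hypothetical labels): one sample, two comparisons.
one_to_many = [("symbol", "nonsense_word"), ("symbol", "movement")]

linear_sym, linear_eq = derived_relations(linear)
otm_sym, otm_eq = derived_relations(one_to_many)

print(sorted(linear_eq))  # only word-word pairs; the stick is absent
print(sorted(otm_eq))     # movement and nonsense word are related to each other
```

In the linear structure the non-verbal stimulus sits in the middle of the chain, so it appears only in trained and symmetry relations. In the one-to-many structure it is a comparison stimulus, so it appears in the derived equivalence pairs as well.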

A key finding in the current study is that verbally categorizing movements with pain-relevant words (through stimulus equivalence learning) can create a potential for unwarranted pain-related fear. It is worth mentioning that a very similar, if not an identical, mechanism is supposed by some to be at the core of human psychopathology. *Acceptance and Commitment Therapy* (ACT; Hayes et al., 1999) is a relatively recent addition to the behavior and cognitive therapies, and has been found to significantly improve emotional, social and physical functioning in chronic pain patients (e.g., McCracken and Vowles, 2007, 2008; Vowles and McCracken, 2008; McCracken and Velleman, 2010). A central assertion in ACT is that humans readily infer verbal rules or relationships and that this often becomes a problematic source of behavioral control that dominates over actual experiences; this is referred to as *cognitive fusion* (see Vilardaga et al., 2009; Hayes et al., 2013). As a simple example, individuals with chronic pain might conceptualize certain movements as 'pain-relevant' and 'disabling' and reify this rule, despite the fact that these movements might never have causally featured in a pain episode. Cognitive fusion is often described as a therapeutic construct that is speculated to be based on fundamental learning processes such as symmetry and equivalence relations, symbolic generalization, and Pavlovian and operant conditioning (Hayes et al., 1999, 2013). However, a drawback of this novel approach is a paucity of research that clearly describes how learning processes might relate to the components of ACT, like cognitive fusion (see Arch and Craske, 2008; Dymond et al., 2013; Vlaeyen, 2014). In the context of the current study, we demonstrated that emotional responses to physical pain can indeed be influenced by verbal relations; this experimental model might elaborate on the learning mechanisms underlying cognitive fusion in chronic pain disorders. Particular arm movements, which were never before painful, controlled pain-related fear because of their derived equivalence to words that were associated with physical pain (see also Barnes-Holmes et al., 2004; Blackledge, 2007; Dymond and Roche, 2009). This represents a first step in our research unit to investigate the role of verbal categories in the generalization of pain-related fear and chronic pain disorders. In future, it will be important for us to further explore the core learning processes underlying ACT.

# Conclusion

The present study investigated whether joystick arm movements could evoke pain-related fear due to their participation in a *de novo* verbal category. An artificial stimulus equivalence category was established in which nonsense words and joystick arm movements were equivalent. When nonsense words were associated with pain, joystick arm movements from within the same stimulus equivalence category spontaneously elicited pain-related fear. This highlights a unique pathway for the emergence of pain-related fear in the absence of a discrete pain episode. The present study also employed a between-groups design in which words were associated with pain through direct pairing with the pain-US or through verbal information about threat. While both pathways excited the symbolic generalization of pain-related fear, direct experience with the pain-US had a stronger effect. This may be valuable information when considering the etiology of pain-related fear in chronic pain disorders. Finally, and from a broad clinical perspective, we imagine that this experimental study may speak to the learning mechanisms underlying cognitive fusion in ACT. When considering these promising first results, we contend that it will be particularly intriguing for future research to further explore the role of complex verbal relations in the acquisition, and possibly even the attenuation, of pain-related fear.

# Acknowledgments

MB is a doctoral student supported by a research project funded by the Research Foundation of Flanders (FWO; grant ID# G051811N10). AM is a postdoctoral researcher of the Research Foundation Flanders (FWO; grant ID# 12E3714N). The contribution of JV was supported by the Odysseus Grant "The Psychology of Pain and Disability Research Program" funded by the Research Foundation Flanders (FWO; grant ID# G090208N) and AM was also supported by an EFIC-Grünenthal Research Grant (E-G-G ID: 169518451). The authors would like to thank Katrien Van Rijsselberge and Marijke Ruytings for their assistance in the data collection.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Bennett, Meulders, Baeyens and Vlaeyen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The role of emotionality in the acquisition of new concrete and abstract words

#### *Pilar Ferré<sup>1</sup>\*, David Ventura<sup>1</sup>, Montserrat Comesaña<sup>2</sup> and Isabel Fraga<sup>3</sup>*

*<sup>1</sup> Research Center for Behavior Assessment and Department of Psychology, Rovira i Virgili University, Tarragona, Spain, <sup>2</sup> Human Cognition Lab, CIPsi, School of Psychology, University of Minho, Braga, Portugal, <sup>3</sup> Cognitive Processes & Behavior Research Group, Department of Social Psychology, Basic Psychology and Methodology, University of Santiago de Compostela, Santiago de Compostela, Spain*

A processing advantage for emotional words relative to neutral words has been widely demonstrated in the monolingual domain (e.g., Kuperman et al., 2014). It is also well known that, in bilingual speakers who have a certain degree of proficiency in their second language, the effects of the affective content of words on cognition are not restricted to the native language (e.g., Ferré et al., 2010). The aim of the present study was to test whether this facilitatory effect can also be obtained during the very early stages of word acquisition. In the context of a novel word learning paradigm, participants were trained on a set of Basque words by associating them to their Spanish translations. Words' concreteness and affective valence were orthogonally manipulated. Immediately after the learning phase and 1 week later, participants were tested in a Basque go/no-go lexical decision task as well as in a translation task in which they had to provide the Spanish translation of the Basque words. A similar pattern of results was found across tasks and sessions, revealing main effects of concreteness and emotional content as well as an interaction between both factors. Thus, the emotional content facilitated the acquisition of abstract, but not concrete, words in the new language, with a more reliable effect for negative words than for positive ones. The results are discussed in light of the embodied theoretical view of semantic representation proposed by Kousta et al. (2011).

Keywords: emotional words, concreteness, novel vocabulary learning, translation task, lexical decision

#### *Edited by:*

*Cornelia Herbert, Clinic for Psychiatry and Psychotherapy, Germany*

#### *Reviewed by:*

*Anna Hatzidaki, University of Athens, Greece; Jelena Havelka, University of Leeds, UK*

#### *\*Correspondence:*

*Pilar Ferré, Research Center for Behavior Assessment and Department of Psychology, Rovira i Virgili University, Carretera de Valls s/n, 43007-Tarragona, Spain mariadelpilar.ferre@urv.cat*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 22 December 2014; Accepted: 29 June 2015; Published: 09 July 2015*

#### *Citation:*

*Ferré P, Ventura D, Comesaña M and Fraga I (2015) The role of emotionality in the acquisition of new concrete and abstract words. Front. Psychol. 6:976. doi: 10.3389/fpsyg.2015.00976*

# Introduction

During the last decade, studies devoted to the relationship between language and emotion have grown exponentially. A large body of research has focused on the effects of the emotional content of words on several cognitive processes, such as word recognition, attention, or memory (see Yiend, 2010; Citron, 2012; Talmi, 2013; Kuperman et al., 2014, for recent overviews). Although the results are not entirely consistent, there is now broad consensus that affectively valenced words show an advantage in processing with respect to neutral words, as revealed in word recognition tasks (e.g., lexical decision, Kousta et al., 2009; Kuperman et al., 2014), in naming tasks (Kuperman et al., 2014) or in memory tasks (e.g., Herbert et al., 2008; Talmi, 2013; Ferré et al., 2014), among others. Importantly, the effects of the affective content of words on cognition are not restricted to the native language. Indeed, they are also observed in the non-native languages of multilingual speakers who have a certain degree of proficiency in these languages (e.g., Ferré et al., 2010; Ponari et al., 2015; see also Caldwell-Harris, 2014, for a recent overview). A relevant question is whether the affective content of words also has an effect during the very early stages of foreign language acquisition. The present research addresses this issue by using a novel-word learning paradigm.

This paradigm has been typically used in the literature to identify the factors that facilitate the acquisition and consolidation of new words in the vocabulary. Usually these words belong to a foreign language, although they can also be pseudowords or very low frequency words in the native language (see De Groot, 2011, for an overview). Different training methods have been employed along with this paradigm. A popular method is paired-associate learning, in which the novel words are paired with their translations in the first language (L1, e.g., De Groot and Keijzer, 2000; Comesaña et al., 2009, 2012; Altarriba and Basnight-Brown, 2011a; Geukes et al., 2015). Other methods consist of associating new words and their concepts by means of pictures (Yu and Smith, 2007; Comesaña et al., 2009, 2012; Palmer and Havelka, 2010), or presenting novel words together with their definitions (e.g., Clay et al., 2007; Palmer et al., 2013; Tamminen and Gaskell, 2013).

To test the effectiveness of the above methods, researchers have relied on different tasks, depending on the particular goal of the study. For instance, researchers interested in assessing the number of words acquired in a prior training episode have mostly applied cued recall tasks (i.e., to produce a word in response to a specific cue, De Groot, 2011). In particular, a widely used procedure has been to ask participants to produce either the L1 translation equivalents of the novel words acquired in the foreign language (the so-called backward translation, De Groot and Keijzer, 2000; Kaushanskaya and Marian, 2009; Farley et al., 2012), or the translation equivalents in the foreign language in response to L1 words (i.e., forward translation, De Groot and Keijzer, 2000). Other studies have tried to determine the state of the new vocabulary, assessing whether the recently learned words behave as familiar words in different tasks, such as the lexical decision task (LDT; Elgort, 2010; Palmer et al., 2013), and exploring what different types of knowledge the learner has acquired about the new words.

The main focus of interest in this line of research concerns semantic knowledge. Thus, researchers have investigated whether effective links between the novel words and the corresponding concepts are established. To do so, they have used different approaches: some authors have tested whether the performance in different tasks is affected by the semantic characteristics of the new words, such as their concreteness (De Groot and Keijzer, 2000; Kaushanskaya and Rechtzigel, 2012; Palmer et al., 2013). Others have investigated whether specific semantic effects can be obtained with these new words. These include the semantic priming effect (Elgort, 2010; Tamminen and Gaskell, 2013), the Stroop effect (Geukes et al., 2015), and the interference effect produced by words semantically related to the correct translation in a translation recognition task (Comesaña et al., 2009, 2012; Altarriba and Basnight-Brown, 2011a; Poarch et al., 2014).

The results of the above research have revealed that different variables can modulate the outcome of novel-word learning procedures. A relevant factor is the training method used. Thus, although direct access from the new words to concepts seems to be achieved with methods based on lexical associations (see Comesaña et al., 2012; Poarch et al., 2014, for recent reviews), stronger lexico-semantic links are found with methods that emphasize conceptual mediation, such as those based on picture-L2 associations (Comesaña et al., 2009; Dobel et al., 2010), or on the presentation of definitions together with the novel words (Tamminen and Gaskell, 2013). Another variable to consider is the time elapsed between the acquisition phase and the testing phase. Most studies assess performance at two different moments: shortly after learning and then, again, after a period of time (i.e., several hours or days). These studies reveal that there is a decrease in performance between the two sessions, suggesting that some information has been forgotten (e.g., De Groot and Keijzer, 2000) and that a consolidation period seems to be required for several effects to appear (e.g., lexical competition with similar previously known words, Davis et al., 2008, or semantic interference, Comesaña et al., 2009).

Of relevance for the present study is the type of word, another variable that has been shown to affect new vocabulary training results. Researchers have focused on cognate status and concreteness, demonstrating that cognate words are learned faster than noncognate words (De Groot and Keijzer, 2000; Tonzar et al., 2009; Comesaña et al., 2012), and that concrete words are learned faster than abstract words (De Groot and Keijzer, 2000; Altarriba and Basnight-Brown, 2011a; Kaushanskaya and Rechtzigel, 2012; Palmer et al., 2013). In contrast, the effect of the affective properties of words, in spite of their widespread study in other fields, has been scarcely addressed in this literature.

To our knowledge, the only study that has dealt with the effect of the emotional content of novel words on their acquisition is that of Altarriba and Basnight-Brown (2011a). In this study, native speakers of English learned the Spanish translations of concrete, abstract and emotion words (i.e., words that label an emotion, e.g., *scared*), by associating them to their English translation equivalents. After training, participants were tested in a Stroop task and in a backward translation recognition task. In the Stroop task, participants were required to press a key denoting the ink color of the Spanish words presented. The authors found faster responses to emotion words than to concrete and abstract words. This result contrasts with the typical emotional Stroop effect repeatedly found in both people's native language and in bilinguals' second language [i.e., faster response times (RTs) for neutral words than for affectively charged words, which is considered a result of the attentional capture by the latter, e.g., Eilola et al., 2007]. Concerning the backward translation recognition task, participants were presented with pairs including a newly learned Spanish word and an English word that could be either its translation equivalent, an English word semantically related to the correct translation, or an unrelated word. With respect to incorrect translations, concrete, abstract and emotion words revealed the same pattern of results. That is, participants took longer to reject semantically related words than unrelated words as incorrect translations (i.e., a semantic interference effect), suggesting that the link between the novel words and concepts had been established during training. Importantly, regarding correct translations, participants responded more slowly to emotion words than to abstract or concrete ones, the latter showing the shortest RTs. The finding of slower RTs for emotion words than for neutral words is also at odds with all the literature reporting an advantage for affectively valenced words in tasks such as lexical decision (e.g., Kousta et al., 2009), naming (e.g., Kuperman et al., 2014), or free recall (e.g., Altarriba and Bauer, 2004; Herbert et al., 2008; Ferré et al., 2014). In light of these findings, Altarriba and Basnight-Brown (2011a) concluded that recently acquired emotion words do not have the same properties as familiar words. They argued that the semantic representation of the former is less rich than that of the latter. The reason would be that only familiar words would have been experienced in emotional contexts over a long period of time.

As stated above, the study of Altarriba and Basnight-Brown (2011a) is the only one that has addressed the effects of emotional content on novel-word learning, making an interesting contribution to the field. However, the authors only tested negatively valenced emotion words. As past research in word processing and memory suggests that the experimental findings obtained with negative words do not always converge with those obtained with positive words (e.g., Estes and Adelman, 2008; Herbert et al., 2008), it is relevant to compare these two types of valenced words in vocabulary acquisition paradigms. On the other hand, it should be mentioned that Altarriba and Basnight-Brown (2011a) did not completely disentangle the effects of emotional content and concreteness in their results, since some of the words included in the abstract condition might have been affectively charged too (e.g., virtue). In order to elucidate which effects are produced by emotional content and which are produced by concreteness, an orthogonal manipulation of both variables should be done.

The orthogonal manipulation of emotional content and concreteness is, in fact, relevant in light of a recent theoretical proposal of Kousta et al. (2011). These authors have proposed an embodied theoretical view of semantic representation according to which sensory-motor information would be central to the representation and processing of concrete words, whereas affective information would be more relevant in the representation of abstract words. Thus, abstract words would be more affectively loaded than concrete words. Importantly, Kousta et al. (2011) posited that the emotional content would play an important role during language acquisition, facilitating the acquisition of abstract lexical concepts and their labels during childhood. To confirm that, these researchers conducted a regression analysis on a large set of words and observed that, for abstract words, valence and age of acquisition were related by a U-shaped function. That is, abstract emotional words seem to be acquired before abstract neutral words.

If emotional content can facilitate the acquisition of abstract words, it might be possible that this modulation is not only observed when children acquire their first language, but also when adults learn vocabulary in a new language. This is the prediction we aimed to test in the present work by orthogonally manipulating emotional content and concreteness. We used a novel word learning paradigm in which participants learned the Basque translations of concrete and abstract Spanish words that were positive, negative or neutral, by associating them to their Spanish translations. (Basque is an ancient pre-Indo-European language spoken in a small area in the western part of the Pyrenees.) In particular, we investigated whether the acquisition of abstract words in a new language is modulated by their emotional content to a greater extent than the acquisition of concrete words.

To sum up, the aim of the present work was to shed further light on the characteristics of the words that can facilitate their acquisition in a paired-associate word learning paradigm. We used the paired-associate learning task because it has been commonly employed in foreign-language training programs (De Groot and Keijzer, 2000; De Groot, 2011). Furthermore, this procedure allows the inclusion of both concrete and abstract words as experimental materials, in contrast to other paradigms such as the association of novel words to pictures, which can be used only with concrete words.

Therefore, participants learned a set of novel Basque words, by associating them to their Spanish translations. Then, they were tested in two different tasks: a go/no-go LDT and a backward translation task, both immediately after acquisition and 1 week later. These two testing sessions were included to assess long-term word retention. We used the LDT to explore whether emotional content produces an advantage in the recognition of recently trained foreign words, as has been observed with words in the native language (Kousta et al., 2009; Kuperman et al., 2014) as well as with words in the L2 of proficient bilinguals (Ponari et al., 2015). Regarding the translation task, it was used as a measure of the participants' success in linking the novel Basque words to their referents in L1. In fact, the translation task is the most widely employed in the literature to assess the success of training procedures in terms of the number of learned words (De Groot and Keijzer, 2000; De Groot, 2011; Kaushanskaya and Rechtzigel, 2012).

Taking into consideration the results of past studies in novel vocabulary acquisition (e.g., De Groot and Keijzer, 2000), we expected both a concreteness advantage and a session effect (i.e., better performance during the first session than during the second session). Importantly, if emotional content modulates the acquisition of new vocabulary, we would expect better performance with emotional words than with neutral words. Furthermore, if the emotional content mainly facilitates the acquisition of abstract words, as could be predicted from the proposal of Kousta et al. (2011), we should expect an interaction between emotional content and concreteness. Finally, taking into account that, in order to decide whether a given string of letters is a Basque word (i.e., the LDT task), participants can rely on the familiarity with its form, whereas in order to produce the Spanish equivalent, they have to rely on the links established during acquisition between the translation equivalents, we expected a better performance in the LDT than in the translation task.

# Materials and Methods

#### Participants

Fifty undergraduate students of Psychology (41 women) from the University Rovira i Virgili (Tarragona, Spain) took part in the experiment (*M*<sub>age</sub> = 21.8, SD = 4.4). All of them were highly proficient and balanced Catalan-Spanish bilinguals and had no knowledge of the Basque language. They had normal or corrected-to-normal vision, and all of them received course credit for their participation.

## Materials

The stimuli used in the present study comprised a set of 48 Spanish words and their Basque translations. Of the words, 93.4% were nouns and 6.4% were adjectives. Spanish words were obtained from two normative databases: the Spanish Adaptation of ANEW (Redondo et al., 2007), and the Affective Norms of Ferré et al. (2012). The Basque translations were obtained from an on-line Spanish-Basque dictionary published by the Autonomous Basque Government (Elhuyar Online Dictionary of Spanish and Basque, 2003) and were checked by a proficient bilingual of Basque and Spanish. Words were divided into six sets: concrete positive words, concrete negative words, concrete neutral words, abstract positive words, abstract negative words, and abstract neutral words (see Data Sheet 1).
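The structure of the stimulus set can be sketched as a 2 × 3 crossing of concreteness and emotional content. The snippet below is an illustration only: it assumes the six sets were of equal size, which the text does not state explicitly, and all variable names are hypothetical. Under that assumption, the degrees of freedom it derives match those of the item analyses reported for these materials, *F*(1,42), *F*(2,42) and *t*(30).

```python
from itertools import product

concreteness = ["concrete", "abstract"]
emotional_content = ["positive", "negative", "neutral"]
n_words = 48

# Six cells of the orthogonal (fully crossed) design.
cells = list(product(concreteness, emotional_content))
words_per_cell = n_words // len(cells)  # assumption: equal-sized sets

# Degrees of freedom for a between-items 2 x 3 ANOVA.
df_concreteness = len(concreteness) - 1
df_emotion = len(emotional_content) - 1
df_error = n_words - len(cells)

# Positive vs. negative words, pooled over concreteness, for a two-sample t test.
n_positive = n_negative = len(concreteness) * words_per_cell
df_extremity = n_positive + n_negative - 2

print(len(cells), words_per_cell)            # prints 6 8
print(df_concreteness, df_emotion, df_error)  # prints 1 2 42
print(df_extremity)                           # prints 30
```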

The Spanish words in the critical conditions were matched in several variables that can affect word processing (see **Table 1**). The values of frequency of use, length, number of lexical neighbors, mean bigram frequency, concreteness, and imageability were obtained from B-Pal (Davis and Perea, 2005). We also considered the degree of orthographic similarity (OS) between the Spanish words and their Basque translations, by using the NIM database (Guasch et al., 2013). NIM computes the index of van Orden (1987), which ranges from 0 (not similar at all) to 1 (exactly the same). The words selected had values of OS lower than 0.5 (i.e., they were non-cognates) to avoid influences of cross-language similarities on novel word acquisition and processing. As participants were bilinguals of Catalan and Spanish, and they had some knowledge of English, the OS between the novel Basque words and their translations in Catalan and English was also considered. Additionally, the OS between Catalan and Spanish was taken into account to guarantee an equal distribution across conditions of cognates between these two languages. Finally, in order to rule out differences in the difficulty of the Basque words across the different experimental conditions, we obtained the Spanish frequency of the bigrams in the Basque words from B-Pal.

TABLE 1 | Mean of the lexical characteristics of the experimental stimuli (standard deviation in parentheses).

*Concr, concreteness (from 1 to 7); Imag, imageability (from 1 to 7); Val, valence (from 1 to 7); Aro, arousal (from 1 to 7); Spanlength, mean number of letters for the Spanish words; LogFreq, mean log frequency per million words; Neighbors, mean number of substitution neighbors of the Spanish words; Spanbigfreq, mean bigram frequency of the Spanish words; Basqlength, mean number of letters of the Basque words; Basqbigfreq, mean bigram frequency of Spanish in the Basque words; OS SpaBas, orthographic similarity between the Basque words and their Spanish translations; OS CatBas, orthographic similarity between the Basque words and their Catalan translations; OS EngBas, orthographic similarity between the Basque words and their English translations and OS SpaCat, orthographic similarity between the Spanish words and their Catalan translations.*

We conducted a 2 (concreteness) × 3 (emotional content) ANOVA on the above-mentioned variables. The analysis revealed, as expected, a significant effect of concreteness on words' concreteness, *F*(1,42) = 290.80, MSE = 0.24, *p <* 0.001, η<sup>2</sup> = 0.87, and imageability, *F*(1,42) = 333.95, MSE = 0.28, *p <* 0.001, η<sup>2</sup> = 0.89. That is, concrete words had higher values of concreteness (*M* = 5.83) and imageability (*M* = 6.0) than abstract words (*M* = 3.40 and *M* = 3.23 for concreteness and imageability, respectively). There was also a significant effect of emotional content on both valence, *F*(2,42) = 613.03, MSE = 0.20, *p <* 0.001, η<sup>2</sup> = 0.97, and arousal, *F*(1,42) = 160.77, MSE = 0.22, *p <* 0.001, η<sup>2</sup> = 0.88. Pairwise Bonferroni comparisons showed that positive (*M* = 7.54), negative (*M* = 1.98), and neutral words (*M* = 4.67) were significantly different concerning valence (*p <* 0.001). In addition, positive and negative words were matched in extremity (i.e., the difference between the valence of emotional words and the average valence value of neutral words), *t*(30) = 0.93, *p* = 0.36. With respect to arousal, both positive (*M* = 6.50) and negative words (*M* = 6.74) were more arousing than neutral words (*M* = 4.04, *p <* 0.001), although there was no difference between positive and negative words in that variable (*p* = 0.16). Importantly, the ANOVA revealed no effect of either concreteness or emotional content on frequency, length, number of lexical neighbors, or mean bigram frequency of the Spanish words as well as of the Basque words in Spanish. Similarly, neither concreteness nor emotional content had any effect on the different measures of OS computed (all *F*s *<* 1.99).
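As a reading aid for the effect sizes reported in this section, the following minimal sketch recovers an effect size from an *F* ratio and its degrees of freedom. It assumes that the reported η<sup>2</sup> values are partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is hypothetical and is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: (F * df1) / (F * df1 + df2)."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Effect of concreteness on rated concreteness: F(1,42) = 290.80
print(round(partial_eta_squared(290.80, 1, 42), 2))  # 0.87
# Effect of emotional content on valence: F(2,42) = 613.03
print(round(partial_eta_squared(613.03, 2, 42), 2))  # 0.97
```

Under that assumption, both values agree with the effect sizes reported for these two tests.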

We also constructed a set of 48 Basque pseudowords to be used in the LDT. These pseudowords were obtained from the Wuggy software (Keuleers and Brysbaert, 2010) and were matched to the Basque experimental words in length, mean bigram frequency, and number of lexical neighbors in Basque (data taken from E-Hitz, Perea et al., 2006).

## Procedure

The experimental procedure followed the ethical guidelines of the Faculty of Sciences of Education and Psychology of the University Rovira i Virgili. In addition, participants signed an informed consent form before starting the experiment. Participants were tested individually in separate soundproof booths. The experiment consisted of two sessions. The first session began with a learning phase. During this phase, participants were presented with pairs of Basque words and their Spanish translations in six blocks of eight words. Five different random presentation orders were created, and each participant was randomly assigned to one of them. Each block of eight pairs was presented three times using Microsoft PowerPoint. The first time, each block was displayed visually for 2 min while participants also heard the pairs of Basque-Spanish translations. They were asked to study these pairs. During the second presentation, the same six blocks appeared. The presentation of each block was as follows: initially, only the first word of each of the eight pairs (i.e., the Basque word) was displayed, and participants were asked to try to think of its Spanish translation. After 45 s, the Spanish translations appeared together with the Basque words. They remained on the screen for 2 min and participants were asked to study the pairs again. Then, the following block was displayed. During the third presentation, the same six blocks (including the eight Basque-Spanish translation pairs) were presented for 1 min and participants were again asked to study them.

Immediately after the learning phase, participants performed a go/no-go LDT. Participants were presented with the 48 Basque words mixed with 48 Basque pseudowords. They had to decide whether each sequence corresponded to a previously learned Basque word or not, by pressing the "yes" button of a keypad with the preferred hand. If they did not recognize the string as a Basque word, they had to refrain from responding. We chose this approach in order to make the task less demanding, since an advantage of the go/no-go procedure with respect to the standard LDT (i.e., faster and more accurate responses) has been reported in the literature (see Gómez et al., 2007). Presentation of the stimuli and recording of RTs and errors were controlled by using the DMDX software (Forster and Forster, 2003). On each trial, the sequence was as follows: first, a fixation point (i.e., "+") appeared in the middle of the screen for 500 ms. Immediately afterward, the letter string was presented until participants responded or for a maximum of 2000 ms. The inter-trial interval (ITI) was 1000 ms.
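The go/no-go response rule described above can be sketched as follows. This is an illustrative reconstruction rather than the DMDX script used in the study: the `Trial` class, the function names, and the outcome labels are hypothetical, and the sketch assumes that a missing key press is stored as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    is_word: bool            # True for a learned Basque word, False for a pseudoword
    rt_ms: Optional[float]   # latency of the "yes" press in ms; None if no key was pressed


def classify(trial: Trial, deadline_ms: float = 2000.0) -> str:
    """Label one go/no-go trial; only a press within the deadline counts as a response."""
    responded = trial.rt_ms is not None and trial.rt_ms <= deadline_ms
    if trial.is_word:
        return "hit" if responded else "miss"
    return "false_alarm" if responded else "correct_rejection"


def word_accuracy(trials: List[Trial]) -> float:
    """Percentage of word trials that received a "yes" press."""
    labels = [classify(t) for t in trials if t.is_word]
    return 100.0 * labels.count("hit") / len(labels)


example = [Trial(True, 850.0), Trial(True, None), Trial(False, None), Trial(False, 1200.0)]
print(word_accuracy(example))  # 50.0
```

Because participants respond only to items they recognize as words, a latency exists only for hits and false alarms, whereas accuracy for words is the proportion of hits.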

When the LDT was finished, participants performed a translation task. They were presented with a sheet of paper containing the 48 Basque words (in a different order with respect to the acquisition) and they were given 10 min to try to produce as many Spanish translations as they could (i.e., backward translation). They were encouraged to guess. We used this direction of translation because the forward direction (i.e., producing the learned Basque words in response to the Spanish words) has been demonstrated to be more difficult for people who are at the initial stages of learning a foreign language (De Groot and Keijzer, 2000; De Groot, 2011). After finishing the translation task, participants were requested to come back to the laboratory the following week in order to continue with the experiment. They were not informed of the content of the second experimental session.

The second session was conducted 1 week after the first one. Participants came back to the laboratory and were administered the same LDT and the same translation task in the same order as in the first session. When the experiment was finished, participants were thanked for their participation and debriefed.

# Results

## Lexical Decision Task

Incorrect responses and RTs lower than 200 ms were excluded from the latency analysis. There was no upper limit for RTs. Values falling more than 2 SD from the mean for a given participant in all conditions were also removed. As a result, 0.5% of the data was removed in the first session. In the second session, the percentage of rejected data was 0.3%.
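The exclusion rules above can be sketched in a few lines. The sketch rests on assumptions the text leaves open: the mean and standard deviation are computed per participant after the first filtering step, and the sample standard deviation is used. The function name `trim_rts` and its parameters are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_ms, correct, floor_ms=200.0, n_sd=2.0):
    """Return the latencies of one participant that survive the exclusion rules.

    rts_ms  -- response times in ms, pooled over all conditions
    correct -- booleans of the same length, True for correct responses
    """
    # Step 1: keep correct responses that are not faster than the 200 ms floor.
    kept = [rt for rt, ok in zip(rts_ms, correct) if ok and rt >= floor_ms]
    if len(kept) < 2:
        return kept  # too few values to estimate a standard deviation
    # Step 2: drop values more than n_sd standard deviations from the participant mean.
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= n_sd * sd]


rts = [150, 900, 950, 1000, 1050, 1100, 980, 1020, 3000, 990]
correct = [True] * 9 + [False]
print(trim_rts(rts, correct))  # [900, 950, 1000, 1050, 1100, 980, 1020]
```

In this made-up example the 150 ms anticipation, the incorrect response, and the 3000 ms outlier are removed, and the remaining seven latencies enter the analysis.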

The results are shown in **Table 2**. The analyses were restricted to the responses to Basque words (i.e., pseudowords were not considered in the analyses). We conducted separate ANOVAs by participants and by items on RT and on Accuracy (i.e., the percentage of Basque words correctly identified). The analyses included the factors Session (Session 1 vs. Session 2), Concreteness (concrete vs. abstract words) and Emotional content (positive, negative, and neutral words). All were within-participant factors in the analysis by participants. In the analysis by items, Session was a within-items factor and both Concreteness and Emotional content were between-items factors.
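The difference between the analysis by participants (*F*1) and the analysis by items (*F*2) lies in the unit over which trial-level data are averaged before the ANOVA. The sketch below illustrates that aggregation step only; the field names (`participant`, `item`, `session`, `condition`, `value`) and the function `cell_means` are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials, unit):
    """Average a measure within each (unit, session, condition) cell.

    trials -- iterable of dicts with the keys 'participant', 'item',
              'session', 'condition' and 'value'
    unit   -- 'participant' for the F1 analysis, 'item' for the F2 analysis
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t[unit], t["session"], t["condition"])].append(t["value"])
    return {key: mean(values) for key, values in cells.items()}


trials = [
    {"participant": "p1", "item": "w1", "session": 1, "condition": "abstract_negative", "value": 900},
    {"participant": "p1", "item": "w2", "session": 1, "condition": "abstract_negative", "value": 1000},
    {"participant": "p2", "item": "w1", "session": 1, "condition": "abstract_negative", "value": 1100},
]
by_participants = cell_means(trials, "participant")  # one mean per participant and cell
by_items = cell_means(trials, "item")                # one mean per item and session
```

Each participant contributes a mean to every cell of the design, so all factors are within-participant. Each item belongs to a single combination of concreteness and emotional content, which is why those two factors are between-items factors and only Session is within-items.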

TABLE 2 | Results of the Lexical decision task -LDT- (mean of RTs and mean of the percentage of correctly identified Basque words-Accuracy) and the Translation task (mean of the percentage of correctly translated Basque words-Accuracy).

*Standard deviation is presented in parentheses.*

*Concr, concrete; Abst, abstract.*

The ANOVA on RT only included correct responses. This analysis revealed a main effect of concreteness, significant only in the analysis by participants, *F*1(1,49) = 26.26, MSE = 16911.84, *p <* 0.001, η<sup>2</sup> = 0.35, *F*2(1,42) = 2.64, MSE = 27971.19, *p* = 0.11, η<sup>2</sup> = 0.06, revealing that participants responded faster to concrete words (*M* = 977.33) than to abstract words (*M* = 1031.74). Emotional content also reached statistical significance in the by-participants analysis, *F*1(2,98) = 15.78, MSE = 18957.45, *p <* 0.001, η<sup>2</sup> = 0.24, *F*2(2,42) = 1.81, MSE = 27971.19, *p* = 0.18, η<sup>2</sup> = 0.08. Bonferroni *post hoc* tests showed that participants took less time to identify negative words (*M* = 964.90) than both positive words (*M* = 1042.18, *p <* 0.001) and neutral words (*M* = 1006.52, *p <* 0.05). Furthermore, the difference between positive and neutral words approached significance (*p* = 0.07). Importantly, the interaction between concreteness and emotional content was also significant in the analysis by participants, *F*1(2,98) = 9.96, MSE = 24097.99, *p <* 0.001, η<sup>2</sup> = 0.17, *F*2(2,42) = 1.26, MSE = 27971.19, *p* = 0.29, η<sup>2</sup> = 0.06. This interaction revealed that the effect of the emotional content was restricted to abstract words, where negative words were responded to faster (*M* = 952.36) than both positive (*M* = 1093.22, *p <* 0.001) and neutral words (*M* = 1049.64, *p <* 0.01), whereas there was no difference between positive and neutral words (*p* = 0.37). Neither the effect of session nor the remaining interactions reached statistical significance (all *F*s *<* 1.96).

The ANOVA on Accuracy showed a significant effect of session, *F*1(1,49) = 6.56, MSE = 322.02, *p <* 0.05, η<sup>2</sup> = 0.12, *F*2(1,42) = 19.71, MSE = 24.54, *p <* 0.001, η<sup>2</sup> = 0.32, indicating that participants were more accurate in the first session (*M* = 74.60) than in the second session (*M* = 70.84). Concreteness was also significant in the analysis by participants, *F*1(1,49) = 18.12, MSE = 281.78, *p <* 0.001, η<sup>2</sup> = 0.27, *F*2(1,42) = 2.23, MSE = 362.38, *p* = 0.14, η<sup>2</sup> = 0.56, as participants showed a higher accuracy with concrete words (*M* = 75.64) than with abstract words (*M* = 69.80). Emotional content reached significance too in the analysis by participants, *F*1(2,98) = 4.56, MSE = 280.96, *p <* 0.05, η<sup>2</sup> = 0.08, *F*2(2,42) = 0.66, MSE = 362.38, *p* = 0.52, η<sup>2</sup> = 0.03. *Post hoc* comparisons revealed that participants were more accurate with negative words (*M* = 75.50) than with neutral words (*M* = 70.55, *p <* 0.05), whereas there was not any difference in accuracy between positive (*M* = 72.11) and neutral words (*p* = 0.13). Finally, as in RT, the interaction between concreteness and emotional content was significant in the analysis by participants, *F*1(2,98) = 16.12, *p <* 0.001, MSE = 288.07, η<sup>2</sup> = 0.25, *F*2(2,42) = 2.11, MSE = 362.38, *p* = 0.13, η<sup>2</sup> = 0.09.
This interaction indicated that the effect of the emotional content was again restricted to abstract words: participants were more accurate in responding to negative abstract words (*M* = 78.01) than to their positive (*M* = 67.12, *p <* 0.001) and neutral counterparts (*M* = 64.19, *p <* 0.001). There was not any difference between positive and neutral abstract words (*p* = 0.51). No other interaction reached statistical significance (all *F*s *<* 2.26).

## Translation Task

Response time data were not obtained in this task. The percentage of Basque words correctly translated into Spanish was recorded (Accuracy, see **Table 2**). We conducted separate ANOVAs by participants and by items on that measure including the same factors as in the analyses of lexical decision data. The analyses revealed a main effect of session, *F*1(1,49) = 133.17, MSE = 512.63, *p <* 0.001, η<sup>2</sup> = 0.73, *F*2(1,42) = 397.11, MSE = 7.20, *p <* 0.001, η<sup>2</sup> = 0.90, as participants produced a higher percentage of correct translations in the first session (*M* = 46.29) than in the second session (*M* = 24.96). Furthermore, participants performed better with concrete words (*M* = 48.87) than with abstract words (*M* = 22.38), *F*1(1,49) = 309.19, MSE = 340.69, *p <* 0.001, η<sup>2</sup> = 0.86, *F*2(1,42) = 29.27, MSE = 147.62, *p <* 0.001, η<sup>2</sup> = 0.41. The emotional content reached statistical significance in the analysis by participants, *F*1(2,98) = 11.38, MSE = 299.34, *p <* 0.001, η<sup>2</sup> = 0.19, *F*2(2,42) = 0.92, MSE = 147.62, *p* = 0.41, η<sup>2</sup> = 0.04. Pairwise comparisons revealed that both positive (*M* = 35.75, *p <* 0.05) and negative words (*M* = 39.69, *p <* 0.001) were more accurately translated than neutral words (*M* = 31.44). The difference between positive and negative words approached significance (*p* = 0.07). The three two-way interactions were significant too.
The interaction between session and concreteness, *F*1(1,49) = 5.96, MSE = 190.49, *p <* 0.05, η<sup>2</sup> = 0.11, *F*2(1,42) = 6.69, MSE = 7.20, *p <* 0.05, η<sup>2</sup> = 0.14, showed that the size of the advantage for concrete words over abstract words was higher in the first session (*M* = 29.25) than in the second session (*M* = 23.75). Besides, it revealed that the decrease in performance between the first session and the second session was higher for concrete words (Mean decrease = 24.08) than for abstract words [Mean decrease = 18.58, *t*(49) = 2.44, *p <* 0.05]. The interaction between session and emotional content also reached statistical significance, *F*1(2,98) = 7.07, MSE = 86.85, *p <* 0.005, η<sup>2</sup> = 0.13, *F*2(2,42) = 3.41, MSE = 7.20, *p <* 0.05, η<sup>2</sup> = 0.14. *Post hoc* comparisons showed that, although negative words (Mean performance in Session 1 = 52.37, Mean performance in Session 2 = 27.00) were better translated than neutral words (Mean performance in Session 1 = 41.00, Mean performance in Session 2 = 21.87) in both sessions (*p <* 0.001 and *p <* 0.05 for the first and second session, respectively), their advantage over positive words (Mean performance in Session 1 = 45.50, Mean performance in Session 2 = 26.00) was restricted to the first session (*p <* 0.01). Concerning positive words, although they failed to show a significant difference with respect to neutral words, they tended to show an advantage over them in both the first (*p* = 0.07) and the second session (*p* = 0.08).
In addition, the interaction between concreteness and emotional content, significant in the analysis by participants, *F*1(2,98) = 23.49, MSE = 237.10, *p <* 0.001, η<sup>2</sup> = 0.32, *F*2(2,42) = 1.58, MSE = 147.62, *p* = 0.22, η<sup>2</sup> = 0.07, showed that the enhancing effect of the emotional content was restricted to abstract words. Thus, negative abstract words (*M* = 32.50) were better translated than both their positive (*M* = 20.00, *p <* 0.001) and neutral counterparts (*M* = 14.62, *p <* 0.001). Positive abstract words were also better translated than neutral abstract words (*p <* 0.05). Finally, the triple interaction between session, concreteness and emotional content was also significant, *F*1(2,98) = 8.37, MSE = 85.79, *p <* 0.001, η<sup>2</sup> = 0.15, *F*2(2,42) = 3.69, MSE = 7.20, *p <* 0.05, η<sup>2</sup> = 0.15. *Post hoc* analyses revealed that, during the first session, the facilitatory effect of emotional content was observed only in abstract words. In particular, negative abstract words were better translated than both their neutral (*p <* 0.001) and positive counterparts (*p <* 0.001). Furthermore, although there was a trend for positive abstract words to be better translated than neutral ones, this difference did not reach statistical significance (*p* = 0.09). Concerning the second session, the pattern was very similar, with an effect of the emotional content restricted to abstract words. Pairwise comparisons revealed that negative abstract words were better translated than both positive (*p <* 0.05) and neutral words (*p <* 0.001). Likewise, positive abstract words were better translated than neutral ones (*p <* 0.05).
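The session by concreteness interaction reported in this section is described in two ways: as a shrinking advantage of concrete over abstract words, and as a larger decrease across sessions for concrete words. Both descriptions refer to the same contrast, as the short check below illustrates with the reported differences. The variable names are hypothetical, and the values are percentage points of correct translations taken from the text.

```python
import math

# Reported differences, in percentage points of correct translations.
advantage_concrete = {"session_1": 29.25, "session_2": 23.75}  # concrete minus abstract
decrease_over_week = {"concrete": 24.08, "abstract": 18.58}    # session 1 minus session 2

# Reading 1: how much the concreteness advantage shrinks across sessions.
contrast_by_session = advantage_concrete["session_1"] - advantage_concrete["session_2"]
# Reading 2: how much more concrete words are forgotten than abstract words.
contrast_by_word_type = decrease_over_week["concrete"] - decrease_over_week["abstract"]

print(round(contrast_by_session, 2), round(contrast_by_word_type, 2))  # 5.5 5.5
assert math.isclose(contrast_by_session, contrast_by_word_type, abs_tol=0.01)
```

In a 2 × 2 layout the interaction is a single difference of differences, so either reading yields the same 5.5 percentage points.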

To further explore the differences between sessions, we computed the magnitude of the emotional effect in abstract words (i.e., we subtracted the percentage of neutral words correctly translated from the percentage of emotional words correctly translated). Concerning positive abstract words, the magnitude of the effect was very similar in both sessions (*M* = 5.50 and *M* = 5.25 for the first and second session, respectively). Conversely, negative abstract words showed a significant decrease in the magnitude of the emotional effect between the first (*M* = 24.25) and the second session (*M* = 12.50), *t*1(49) = 4.88, *p <* 0.001, *t*2(7) = 1.96, *p* = 0.09. Similarly, the difference between positive and negative abstract words correctly translated was larger in the first session (*M* = 18.75) than in the second one (*M* = 6.25), *t*1(49) = 4.81, *p <* 0.001, *t*2(7) = 4.28, *p <* 0.005.

Finally, in order to ascertain whether participants' accuracy was affected by the type of task, we compared the percentage of correct responses in both tasks. We conducted an ANOVA including "Task" and "Session" as factors. This analysis revealed a main effect of "Task," *F*1(1,49) = 157.31, MSE = 437.41, *p <* 0.001, η<sup>2</sup> = 0.76, *F*2(1,47) = 594.53, MSE = 234.64, *p <* 0.001, η<sup>2</sup> = 0.93, as well as of "Session," *F*1(1,49) = 108.48, MSE = 72.52, *p <* 0.001, η<sup>2</sup> = 0.69, *F*2(1,47) = 200.43, MSE = 14.21, *p <* 0.001, η<sup>2</sup> = 0.81. These results showed that accuracy was higher in the LDT (*M* = 72.72) than in the Translation task (*M* = 35.62). Furthermore, there was a better performance during the first session (*M* = 60.44) than during the second session (*M* = 47.90).

The interaction between both factors also reached statistical significance, *F*1(1,49) = 58.06, MSE = 66.59, *p <* 0.001, η<sup>2</sup> = 0.54, *F*2(1,47) = 24.79, MSE = 19.99, *p <* 0.001, η<sup>2</sup> = 0.34. *Post hoc* comparisons revealed that the decrease in accuracy between the first and the second session was larger for the translation task (*M* = 21.33) than for the LDT (*M* = 3.75), *t*1(49) = 7.61, *p <* 0.001, *t*2(47) = 4.98, *p <* 0.001.

# Discussion

The aim of the present study was to test whether the affective content of words has a facilitatory effect during the early stages of foreign language acquisition. We also investigated whether this effect was modulated by words' concreteness. To do that, participants were trained on a set of Basque words by associating them with their Spanish translations. Immediately after the learning phase and 1 week later, they were tested in a go/no-go LDT and in a backward translation task. Although performance, in terms of accuracy, was higher in the LDT than in the translation task, the pattern of results was very similar across tasks as well as over time: apart from a main effect of concreteness and emotional content, there was an interaction between both factors, indicating that the emotional content facilitated the acquisition of abstract rather than concrete words in a new language. Overall, the effect was more reliable for negative words than for positive words. Finally, even though there was a significant loss of information over time, which was higher for the translation task than for the LDT, the pattern of findings was very similar in both sessions.

This work adds to the literature in novel-word training showing that some word types are more easily acquired than others. Thus, apart from cognate status (De Groot and Keijzer, 2000; Tonzar et al., 2009; Comesaña et al., 2012) and concreteness (De Groot and Keijzer, 2000; Altarriba and Basnight-Brown, 2011a; Kaushanskaya and Rechtzigel, 2012; Palmer et al., 2013, and the present study), it is apparent that emotional content facilitates word acquisition during the early stages of foreign language learning. In this field of research, this is the first time that a facilitatory effect of emotional content is reported. Importantly, the effect is restricted to abstract items, suggesting that the words that are harder to learn are those obtaining more benefits from their affective content. Although these results were only significant in the participant analyses, we would like to note that they are consistent with the proposal of Kousta et al. (2011), according to which the emotional content would play a more relevant role in the representation and processing of abstract words than in the representation and processing of concrete words.

Further studies should be conducted to establish the extent to which this pattern of findings can be generalized to other sets of items, especially because this facilitatory effect for emotional words was not observed in the study of Altarriba and Basnight-Brown (2011a). In that work, the authors failed to obtain either the emotional Stroop effect or an advantage in the translation recognition task for recently acquired Spanish emotion words. Although the reasons for this discrepancy are unclear, it is worth mentioning that the different methodological approaches limit the comparison between those two studies. On the one hand, unlike the present study, Altarriba and Basnight-Brown (2011a) did not orthogonally manipulate concreteness and emotional content. On the other, these authors compared concrete and abstract words to emotion words (i.e., words labeling an emotion, for example *scared*), whereas we mostly used emotion-laden words (i.e., words that do not refer directly to emotions but have emotional connotations, for example *success*). Some researchers have suggested that these two word types can be processed differently (e.g., Pavlenko, 2008), although the few studies which have addressed this distinction have only found slight differences between them (e.g., Altarriba and Basnight-Brown, 2011b). Thus, further research is needed in which the contribution of concreteness and emotional content to the initial stages of word acquisition is evaluated separately for emotion and emotion-laden words.

As stated above, our findings support the proposal of Kousta et al. (2011). These authors argued that whereas sensory-motor information would be central to the representation and processing of concrete words, affective information would be more relevant in the representation of abstract words. In fact, it would be through emotionality that abstract words would become embodied, as one of the functions of emotion is to initiate approach and avoidance behavior (e.g., Lang, 1995). Thus, one might expect strong sensory-motor effects for abstract words as well, if effects are mediated by emotion, as indicated for instance by the startle reflex (see Herbert and Kissler, 2010; Herbert et al., 2011 for evidence of modulation of the startle reflex by the emotional content of words)<sup>1</sup>. Importantly, as Sheikh and Titone (2013) have recently pointed out, a prediction from the embodied approach to semantic representation of Kousta et al. (2011) is that the contribution of emotionality to word processing would be more likely in conditions in which there are no other sources of information (i.e., sensorimotor) that can facilitate processing. This is the case for abstract words.

In recent years, several lexical decision studies have addressed the possible modulation of emotional effects by concreteness in the native language of adult speakers during word processing. This research has yielded mixed findings. Thus, whereas Kanske and Kotz (2007) failed to find any interaction between emotional content and concreteness, Palazova et al. (2013) did obtain it, although in this case emotional abstract words showed a disadvantage, rather than an advantage, with respect to their neutral counterparts. These inconsistencies might be explained by methodological differences. On the one hand, whereas Palazova et al. (2013) used a standard LDT, Kanske and Kotz (2007) employed a visual hemifield LDT. On the other hand, whereas the experimental stimuli of Kanske and Kotz (2007) were nouns, Palazova et al. (2013) tested verbs, and it is well known that nouns and verbs are processed differently (see for instance, Van Assche et al., 2013). Whatever the reason for the discrepancies, what is relevant is that the above studies failed to support the proposal of Kousta et al. (2011).

Kousta et al. (2011) also posited that the emotional content of words would play a role during their acquisition, facilitating the acquisition of abstract concepts. Our results suggest that this facilitation is also observed when people learn vocabulary in a new language. The modulation of the effects of the emotional content of words by their concreteness in the direction predicted by Kousta et al. (2011) found in this work contrasts with the results of the above discussed studies (Kanske and Kotz, 2007; Palazova et al., 2013). It might be that this modulation is not apparent when vocabulary is firmly established. Rather, it would be more easily observed during (first or foreign) language acquisition. During this process, emotional content might help to acquire those words that are harder to learn, namely abstract words.

Concerning the mechanism involved in the beneficial effects of emotional content, we should take into consideration that acquiring foreign words by means of paired-associate learning involves the assignment of a new name to an already existing concept. The results of this study and of past research suggest that there is something about the representation of some types of L1 words that makes it easier to attach a new name to them (see De Groot, 2011). This would be the case of concrete words with respect to abstract words. Different proposals have been made with respect to these representational differences. For instance, according to the dual-code theory (Paivio, 1986), semantic representations are richer for concrete words than for abstract words, because only the former are represented in a non-verbal system besides being represented in a verbal system. Alternatively, Schwanenflugel and Shoben (1983) considered that concrete words have stronger and denser associations to contextual knowledge than abstract words. Finally, from a connectionist point of view, Plaut and Shallice (1993) argued that the representation of concrete words is supported by a higher number of units or semantic features than the representation of abstract words. All these proposals assume that the representation of concrete words is richer than that of abstract words and this is the cause of the disadvantage for the latter in processing. The emotional content would reduce the disadvantage for abstract words in the process of labeling an existing concept with a novel word, whatever the mechanism involved: affectively valenced words having higher semantic richness (e.g., Hofmann et al., 2009), capturing more attentional resources (Pratto and John, 1991), being prioritized in the process of binding to the context (Mackay et al., 2004), or being more elaborately processed during encoding (Sharot and Phelps, 2004).

A final relevant result of the present work refers to valence effects. We obtained a more robust emotional advantage for negative abstract words than for their positive counterparts. Indeed, in the LDT only negative, but not positive, abstract words showed an advantage in processing. In the translation task, although both types of words were better translated than abstract neutral words, the advantage was higher for negative words than for positive ones. In our opinion, a possible explanation for the superiority of negative words is related to their adaptive function. According to Pratto and John (1991), negative stimuli contain more survival-relevant information than positive stimuli. For that reason, they would be preferentially attended (i.e., the so-called negativity bias) and, as a consequence, better remembered than positive and neutral words. Regarding the novel-word learning paradigm applied in this research, the preferential attention to negative words during the learning phase might have facilitated the formation of links between the Basque words and both their translation equivalents and existing concepts. We would like to note that the grammatical class of the stimuli in the present study might also have contributed to this negativity bias. Most of the experimental words were nouns. If we had used adjectives instead of nouns, a different pattern of findings might have been observed. Indeed, a positivity advantage is often reported for lists consisting of adjectives (Herbert et al., 2008), a possible reason for this advantage being the high self-relevance of adjectives. Regardless of the cause of the superiority for negative words, it is worth noting that the results obtained in the translation task suggest that this advantage is particularly strong in the first session, in which the percentage of correctly translated negative abstract words almost doubled that of neutral and positive words. This superiority is observed in both sessions, but its magnitude is reduced over time. In contrast, the advantage of positive abstract words over neutral words, although smaller, is more stable across sessions. This result is in line with other studies in the emotional memory literature demonstrating that the superiority for negative words in immediate memory tests decreases after a delay of several hours or days, whereas this decrease is much less pronounced for positive words (e.g., Toyama et al., 2014). Our findings, like those of Toyama et al. (2014), might be accounted for by the so-called fading affect bias, according to which the intensity of the negative affect produced by any event fades faster than that of positive emotion (e.g., Walker and Skowronski, 2009). This phenomenon has been described in the field of autobiographical memory and has only recently been applied to the study of emotional verbal information (Toyama et al., 2014).

<sup>1</sup>We thank Dr. Herbert for this suggestion.

To conclude, the present study shows that emotional content facilitates the acquisition of new vocabulary and that this effect is modulated by concreteness as well as by word valence. Our findings suggest that emotional content is a relevant variable to consider in novel-word learning studies. Furthermore, our results can have a practical application, as they demonstrate that foreign abstract neutral words (i.e., those lacking affective charge) are the hardest to learn. Therefore, teaching strategies in foreign language training programs should be directed at improving the acquisition of those words.

# Acknowledgments

This research was funded by the Spanish Ministry of Economy and Competitiveness (PSI2012-37623 and PSI2012-32834). It was also funded by FCT (Foundation for Science and Technology) through the state budget, with reference IF/00784/2013/CP1158/CT0013.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00976


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Ferré, Ventura, Comesaña and Fraga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Emotionality differences between a native and foreign language: theoretical implications

## *Catherine L. Caldwell-Harris\**

*Psycholinguistics Laboratory, Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA \*Correspondence: charris@bu.edu*

#### *Edited by:*

*Cornelia Herbert, University of Tübingen, Germany*

#### *Reviewed by:*

*Debbie L. Mills, Bangor University, UK*

**Keywords: bilingualism, emotion, context dependence, embodiment, language acquisition**

The topic editors, Cornelia Herbert and colleagues, have noted that language has historically been assumed to be independent from emotions. The historical backdrop to this is the long reign of faculty psychology, which viewed the human mind as composed of discrete abilities (see discussion in Barrett, 2013). The mental modularity popularized by Chomsky (1965) and Fodor (1983) continued this view following the cognitive revolution of mid-century. Emotion had no role in information processing psychology, leading to its neglect in the cognitive sciences (Cromwell and Panksepp, 2011). Indeed, the classic emotion-cognition divide has been criticized in the past decades by theorists who are otherwise not natural allies (e.g., Damasio, 1994; Cromwell and Panksepp, 2011; Lindquist, 2013). An alternative to faculty psychology is psychological construction (Lindquist, 2013). On this view, mental abilities and mental states like emotions are constructed from the dynamic interaction of physiological states, situation-specific information, and conceptual knowledge.

In the modular view of mind, emotion and language should have little overlap in their processes and representations. However, according to psychological constructivism, an emotional reaction can be influenced by any aspect of the ongoing situation, such as the language being spoken, which is the topic of this commentary.

I describe here findings on the emotionality differences between a native and a foreign language. Bilingual speakers<sup>1</sup> frequently report that swearing, praying, lying, and saying *I love you* feel different when using a native rather than a foreign language (see, e.g., Pavlenko, 2005; Dewaele, 2010). My goal is to highlight the relevance of this body of work for the theoretical assumptions regarding language-emotion independence.

### **WHEN AND WHY IS A FIRST LANGUAGE MORE EMOTIONAL?**

An emotionality advantage for native languages has been documented using diverse techniques, as recently discussed in a comprehensive review paper (Pavlenko, 2012). For example, in a European study using a variety of L1-L2 pairings, advertising slogans were judged to be more emotional when the messages were written in the native language rather than respondents' L2 (Puntoni et al., 2009). Anooshian and Hertel (1994) found emotion-memory effects for L1 but not L2 words, among Spanish-English bilinguals.

Reduced emotionality in the L2 has also been found in studies that use emotion words to interfere with processing. Colbeck and Bowers (2012) compared emotion word processing in native Chinese speakers and native English speakers using an English attentional-blink task. The native English speakers showed a strong blink following a taboo distractor, while Chinese speakers of English as a second language showed a blink that was reduced in size, consistent with being able to more easily ignore the taboo distractor. Other examples of improved performance because of reduced L2 emotionality have been found using decision-making tasks. In two studies by different research teams, bilingual speakers made slightly more rational decisions when evaluating vignettes written in a foreign language (Keysar et al., 2012; Costa et al., 2014; see also findings about moral dilemmas, Costa et al., 2014).

Laboratory studies measuring skin conductance amplitudes have corroborated these findings (Harris et al., 2003). An important qualification was obtained by studying early, sequential bilinguals, who learned Spanish first from their parents and English second from peers and schooling in American society (Harris et al., 2006). For these bilinguals, their first language was not their most proficient language. They had similar electrodermal responses for emotional phrases in their two languages. One implication (which needs additional empirical support) is that both early age of acquisition and high proficiency are required to show an emotionality advantage. That is, if age of acquisition alone were sufficient to show heightened electrodermal responses, then the heritage language learners should have shown stronger emotions to Spanish phrases. If only proficiency mattered, then this group should have shown stronger emotionality responses to L2-English. A comparison group of bilinguals for whom L1-Spanish was both the first learned and most proficient language revealed higher skin conductance responses for childhood reprimands in L1 than in L2. This suggests that L1/L2 emotionality differences are strongest when L1 is the native language and L2 is a less proficient, foreign language.

<sup>1</sup>To be as inclusive as possible, I follow the common practice of identifying bilingualism as either having good proficiency in more than one language, or as regularly using more than one language, regardless of proficiency. The first language (L1) is defined to be the chronologically first acquired language, with "second language" meaning a language acquired after the L1 (see Dewaele, 2010). A foreign language is a language acquired primarily via classroom learning, and not a language spoken in the learners' community.

In addition to early age of acquisition and high proficiency, emotional resonances are stronger when language is learned via immersion, rather than from classroom learning (Dewaele, 2010). Another important factor is high usage frequency (Degner et al., 2011). In the broader literature on L1/L2 effects, these four factors are linked in reciprocal, causal relationships, and indeed, are important for determining individual differences in bilingual experiences and abilities. Early age of acquisition typically results in high proficiency; high proficiency usually leads to frequent use. Frequency of use improves proficiency; immersive learning leads to higher frequency of use and better proficiency.

Note that there have been inconsistencies in laboratory tasks of L1/L2 emotionality differences. Several studies have failed to replicate Anooshian and Hertel's emotion-memory effects, with Ferré et al. (2010) reporting no recall effects as a function of L1/L2 status (see also Ayçiçegi-Dinn and Caldwell-Harris, 2009). Similar interference was found for L1 and L2 on an emotional Stroop task (Eilola et al., 2007). When Eilola and Havelka (2011) recorded skin conductance during a taboo Stroop task, they found similar interference effects of the taboo words in L1/L2, but L1 taboo words nevertheless elicited larger autonomic responses than did L2 taboo words.

#### **CAUSES: WHY ARE EMOTIONAL RESONANCES STRONGEST WHEN A LANGUAGE IS ACQUIRED EARLY AND LEARNED TO HIGH PROFICIENCY?**

Intuitively, it makes sense that a language learned in childhood will carry strong emotional resonances. The family context of learning means that everyday language carries the full range of human emotions. A mechanism for connecting the physical experience of emotion with specific phrases and words is amygdala-mediated learning. Early language develops at the same time as emotional regulation systems (Bloom and Beckwith, 1989). It is thus plausible that utterances that are learned early become tightly connected with the brain's emotional system. However, second languages can also come to feel emotional, if they are used frequently and are learned via immersion rather than in the classroom (Dewaele, 2010; Degner et al., 2011). This is why I proposed that the primary causal factor is the context in which a language is learned and used (Harris et al., 2006). Words and phrases come to have a distinctive emotional feel by virtue of being learned, or habitually used, in a specific emotional context.

My theoretical proposal is that using a language in emotional contexts provides it with emotional resonances because human experiences are learned and stored in a context-dependent manner. This view is consistent with episodic trace theories of memory (Hintzman, 1986), encoding specificity (Tulving and Thompson, 1973), language-specific autobiographic recall of memory (Marian and Kaushanskaya, 2004, 2008), and psychological constructivism broadly construed (Lindquist, 2013). With context-dependent learning, distributional analysis sorts out, via exposure to many examples, which aspects of the overall meaning most frequently co-occur with specific words and phrases (e.g., McClelland et al., 1986). An alternative view is that frequency of use is what matters rather than contexts of use (e.g., Puntoni et al., 2009; Degner et al., 2011). I suspect the frequency view and the contexts of learning view are actually highly similar perspectives and make different predictions only in rare cases. My reasoning is that frequent usage entails emotional usage. Human social lives, which are mediated by communication, are highly emotional. If there are situations of frequent use of an L2 in a low-emotion environment, then my theory predicts that these L2 users will experience their L2 as low in emotional resonances.

One of the strengths of the "emotional contexts of learning" hypothesis is that it accommodates the idiosyncratic learning histories of individual speakers. In a group study on emotional word processing, a word such as *snake* will elicit different emotional reactions depending on individuals' personalities, experiences with snakes and cultural backgrounds. An implication is that we can have two (complementary) ways of studying L1/L2 emotionality effects. We can take average responses across a group of bilingual speakers, by examining language that most people find emotional, such as parents scolding children (childhood reprimands), peers insulting each other (insults), or people expressing love, praise, appreciation (endearments). When my colleagues and I used these categories of emotional expressions, we thus studied common situations where these phrases are learned and used (Harris et al., 2006). But in these studies, individual experiences that deviate from group trends are ignored and treated as noise.

A second method is to interview people about their idiosyncratic experiences. What specific phrases did your parents say to you? How did authority figures speak to you as a young adult? What did a romantic partner tell you that you appreciated? The prediction is that emotionality will be greater for the language that was used and/or learned in these situations. Although this interview technique has not yet been used, immigration narratives revealed that emotional language varies with individual experiences (Marian and Kaushanskaya, 2008).

#### **THEORETICAL IMPLICATION: VOCABULARY AND GRAMMAR ARE NOT CONTEXT-INDEPENDENT**

To move beyond behaviorists' focus on imitation as the main route to learning, Chomsky (1965) and theorists of the mid-20th century emphasized that linguistic expressions are primarily a result of applying abstract rules. They characterized language as a parsimonious symbol system, a type of mental algebra. The language learner had to strip away words' context to construct context-independent vocabulary. Learners must ignore extra-linguistic aspects of sentences in order to construct an abstract grammar.

The Chomskyan theoretical view dovetailed with the intuition that many people have, which is that words are containers for meanings. Reddy (1979) has labeled this the conduit metaphor, referring to the belief that language, phrases and sentences are the containers for speakers' meanings and thoughts. These containers are then sent to conversation partners, who extract and thus possess the meaning. Examples provided by Reddy include the common request to "put your feelings into words."

An inference from the conduit metaphor is that, as long as two phrases are translation equivalents, they should deliver the same meaning. However, "same meaning" is itself open to interpretation. Consider the case where an English native speaker has learned French in a classroom context. When hearing *Je t'aime*, the phrase doesn't deliver the same emotional punch as *I love you* (as documented by Dewaele's study of multilingual speakers' report of I love you expressions; see also Caldwell-Harris et al., 2013). If "extracting and possessing the meaning" includes the totality of mental states that form as a reaction to hearing a phrase, then the I love you examples (and other emotionality effects) falsify the conduit metaphor. However, the Chomskyan tradition has generally advocated a narrow view of meaning, confining it to the sense of words, not their richer connotations. If the meaning of words is confined to what is involved in identifying translation equivalents, then the conduit metaphor can be preserved.

One reason to retain the conduit metaphor (and the narrow definition of meaning) is if the conduit metaphor is the only way we have of understanding how symbols convey information. But other conceptions are present in the research literature and in everyday use. Reddy's (1979) description of how language actually works to provide meaning is called the toolmaker paradigm. Words and phrases are not containers of meaning, but clues that hearers use to infer speakers' communicative intent. On this view, *Je t'aime* doesn't deliver the same emotional punch to the classroom French learner as *I love you*, because the phrase isn't a container for the feeling expressed by *I love you*. It's a tool speakers use to guide hearers to an interpretation. In the case of foreign language learners, L2 phrases are imperfect tools for activating the meanings that would automatically be elicited by the same phrase in a native language.

An advantage of discussing the relevance of emotionality differences to the conduit metaphor is that the conduit metaphor and objections to it are a bit abstract. L1-L2 emotionality differences lend concreteness to Reddy's classic critique of the container metaphor.

These arguments in turn have their theoretical implications, including how context is represented. A second theoretical implication of L1/L2 emotionality effects is that words and phrases gain meaning from sensorimotor and emotional embodiment. Both of these are discussed further in Caldwell-Harris (2014).

## **CONCLUSIONS**

Beyond the theoretical implications, understanding L1/L2 emotionality effects is important for bilinguals who may wonder why these effects exist, or may wonder why these effects don't exist for them. Emotionality effects are relevant for monolingual speakers whenever they interact with bilinguals who are using the language that for them is later-learned or less-proficient. And finally, they are important because they challenge us to confirm, refute, or extend our theories about the relationship between language and emotion.

# **REFERENCES**


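Marian, V., and Kaushanskaya, M. (2008). Words, feelings, and bilingualism: cross-linguistic differences in emotionality of autobiographical memories. *Ment. Lex.* 3, 72–90. doi: 10.1075/ml.3.1.06mar

Puntoni, S., de Langhe, B., and van Osselaer, S. M. J. (2009). Bilingualism and the emotional intensity of advertising language. *J. Consum. Res.* 35, 1012–1025. doi: 10.1086/595022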


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 April 2014; accepted: 03 September 2014; published online: 23 September 2014.*

*Citation: Caldwell-Harris CL (2014) Emotionality differences between a native and foreign language: theoretical implications. Front. Psychol. 5:1055. doi: 10.3389/fpsyg.2014.01055*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology.*

*Copyright © 2014 Caldwell-Harris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Erratum on: Emotionality differences between a native and foreign language: theoretical implications

Frontiers Production Office\*

Frontiers Production Office, Frontiers, Switzerland

Keywords: bilingualism, emotion, context dependence, embodiment, language acquisition

# **An erratum on**

**Emotionality differences between a native and foreign language: theoretical implications** by Caldwell-Harris, C. L. (2014). Front. Psychol. 5:1055. doi: 10.3389/fpsyg.2014.01055

Reason for Erratum: Recognition of an additional reviewer.

Due to an oversight, the following reviewer Lowri Mair Hadden (Bangor University, UK), who jointly reviewed the paper with her supervisor Debbie L. Mills (Bangor University, UK), was not acknowledged on the publication. This error does not change the scientific conclusions of the article in any way. The publisher apologizes for this mistake.

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Frontiers Production Office. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Approved by: Psychology Editorial Office, Frontiers, Switzerland

\*Correspondence:

Frontiers Production Office, production.office@frontiersin.org

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 27 March 2015 Accepted: 27 March 2015 Published: 02 April 2015

#### Citation:

Frontiers Production Office (2015) Erratum on: Emotionality differences between a native and foreign language: theoretical implications. Front. Psychol. 6:437. doi: 10.3389/fpsyg.2015.00437

# On the Relation between the General Affective Meaning and the Basic Sublexical, Lexical, and Inter-lexical Features of Poetic Texts—A Case Study Using 57 Poems of H. M. Enzensberger

#### Susann Ullrich<sup>1,2</sup>\*, Arash Aryani<sup>1,2</sup>, Maria Kraxenberger<sup>1,3</sup>, Arthur M. Jacobs<sup>1,2,4</sup> and Markus Conrad<sup>1,5</sup>

<sup>1</sup> Languages of Emotion, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Department of Experimental and Neurocognitive Psychology, Freie Universität Berlin, Berlin, Germany, <sup>3</sup> Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany, <sup>4</sup> Dahlem Institute for Neuroimaging of Emotion, Freie Universität Berlin, Berlin, Germany, <sup>5</sup> Department of Cognitive, Social and Organizational Psychology, Universidad de La Laguna, San Cristóbal de La Laguna, Spain

#### Edited by:

Cornelia Herbert, University of Ulm, Germany

#### Reviewed by:

Erich David Jarvis, Duke University, USA

Eric C. Fields, Tufts University, USA

Constantina Theofanopoulou contributed to the review of Erich David Jarvis

\*Correspondence:

Susann Ullrich, susann\_ullrich@msn.com

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 27 February 2016 Accepted: 22 December 2016 Published: 11 January 2017

#### Citation:

Ullrich S, Aryani A, Kraxenberger M, Jacobs AM and Conrad M (2017) On the Relation between the General Affective Meaning and the Basic Sublexical, Lexical, and Inter-lexical Features of Poetic Texts—A Case Study Using 57 Poems of H. M. Enzensberger. Front. Psychol. 7:2073. doi: 10.3389/fpsyg.2016.02073

The literary genre of poetry is inherently related to the expression and elicitation of emotion via both content and form. To explore the nature of this affective impact at an extremely basic textual level, we collected ratings on eight different general affective meaning scales—valence, arousal, friendliness, sadness, spitefulness, poeticity, onomatopoeia, and liking—for 57 German poems ("die verteidigung der wölfe") which the contemporary author H. M. Enzensberger had labeled as either "friendly," "sad," or "spiteful." Following Jakobson's (1960) view on the vivid interplay of hierarchical text levels, we used multiple regression analyses to explore the specific influences of affective features from three different text levels (sublexical, lexical, and inter-lexical) on the perceived general affective meaning of the poems using three types of predictors: (1) Lexical predictor variables capturing the mean valence and arousal potential of words; (2) Inter-lexical predictors quantifying peaks, ranges, and dynamic changes within the lexical affective content; (3) Sublexical measures of basic affective tone according to sound-meaning correspondences at the sublexical level (see Aryani et al., 2016). We find the lexical predictors to account for a major amount of up to 50% of the variance in affective ratings. Moreover, inter-lexical and sublexical predictors account for a large portion of additional variance in the perceived general affective meaning. Together, the affective properties of all used textual features account for 43–70% of the variance in the affective ratings and still for 23–48% of the variance in the more abstract aesthetic ratings. In sum, our approach represents a novel method that successfully relates a prominent part of variance in perceived general affective meaning in this corpus of German poems to quantitative estimates of affective properties of textual components at the sublexical, lexical, and inter-lexical level.

Keywords: general affective meaning, interlexical measures, phonological iconicity, EMOPHON, basic affective tone, neurocognitive poetics, Enzensberger

# INTRODUCTION

Emotional impact constitutes an important aspect of poetry (Turner and Poeppel, 1983; Cupchik, 1994; van Peer et al., 2007; Schrott and Jacobs, 2011)—people read poems to be amused, pleased, or emotionally and aesthetically moved (Jacobs, 2015a). The underlying affective and aesthetic processes of reading are just beginning to be tackled by research on literature reception. On the one hand, there is a tradition of explaining aesthetic sensation to literature and other works of art by foregrounding effects as deviations from a normative background (van Peer, 1986; Miall and Kuiken, 1994)—focusing mainly on structural and stylistic properties of poetry. On the other hand, the Neurocognitive Poetics Model (NCPM) of Literary Reading (Jacobs, 2011, 2014, 2015a,b) postulates that background elements facilitate emotional involvement in general, for example via mood empathy (Lüdtke et al., 2014), while foregrounding features promote aesthetic evaluation (Jacobs et al., 2013).

In this study, we will focus on the role of basic textual features in affective poetry reception—investigating how sublexical, lexical, and inter-lexical affective features determine the perception of the general affective meaning of a poem.

In narrowing our research focus to these basic textual elements, we deliberately leave aside—or aim beyond—the influence of higher-level variables such as context information, rhetorical features, or familiarity and comprehensibility of literary texts, as well as the interaction of these higher-level variables, on the overall affective perception of art (e.g., Leder et al., 2012; Bohrn et al., 2013; Menninghaus et al., 2015).

The general affective meaning of a text probably closely relates to global affective appraisals of the reader concerning the overall theme and impression of a text (see also Aryani et al., 2016). In their most basic form, such appraisals should involve the core dimensions of affect: valence and arousal (Wundt, 1896; Russell, 1978, 2003; Watson and Tellegen, 1985; Bradley et al., 1992). But they can also be captured on more discrete affective/emotional scales, or even for higher-order cognitive-aesthetic concepts, using respective rating scales (see Methods Section for details). Especially in poetry, aesthetic emotions triggered by textual features and style might crucially add to the overall affective impact—besides immersive emotions arising from the plot.

A theoretical guideline for our approach to explain variance in the perceived general affective meaning of literary works by basic textual elements from different hierarchical text levels is provided by Jakobson's postulations about the "Framework of language" (Jakobson, 1980a): "Each level above [that of language sounds] brings new particularities of meaning: they change substantially by climbing the ladder which leads from the phoneme to the morpheme and from there to words (with all their grammatical and lexical hierarchy), then go through various levels of syntactic structures to the sentence [...]. Each one of these successive stages is characterized by its clear and specific properties and by its degree of submission to the rules of the code and to the requirements of the context. At the same time, each part participates, to the extent possible, in the meaning of the whole" (1980a, p. 20). Recent brain imaging research reveals close matches between this hierarchy of linguistic structures and the respective hierarchies of brain processes during language processing (Ding et al., 2016).

Although the hierarchical processing of language applies to everyday speech or prose as well, this study focuses on poetry because this genre intertwines content and form in most intimate ways—or, as Jakobson put it when describing the general "poetic function" of language: "The message focuses on the message for its own sake" (Jakobson, 1960, 1980a).

In the following paragraphs we will introduce empirical evidence for how the affective impact of texts can depend on specific lexical, inter-lexical, and sublexical levels of processing. In the empirical part of this study we will then operationalize affective properties at these three different levels and statistically examine their relation with the perceived general affective meaning of poems from a corpus of the German author Hans Magnus Enzensberger.

# Lexical Effects on General Affective Meaning

Lexical affective meaning has been shown to be of reliable predictive potential for the affective perception of different types of texts (Anderson and McMaster, 1982; Whissell et al., 1986; Bestgen, 1994; Whissell, 1994; Hsu et al., 2015a). The importance of lexical affective meaning is increasingly stressed in sentiment analyses of online social media texts (Thelwall et al., 2010; Paltoglou, 2014). Valence and arousal ratings of words are most often employed for lexical affective analyses, as most of the variance in a word's meaning on a variety of scales can be accounted for by these two largely independent factors, as has been shown via semantic differential techniques (Osgood and Suci, 1955; Osgood et al., 1957, 1975). Furthermore, valence and arousal are also the core dimensions around which several well-established two-dimensional emotion and affect theories are built (e.g., Wundt, 1896; Bradley et al., 1992). Hence, large-scale affective word databases have been gathered to provide normative affective ratings for several thousand words from a given language (English: e.g., ANEW: Bradley and Lang, 1999; DAL: Sweeney and Whissell, 1984; Whissell, 2009; German: e.g., BAWL: Võ et al., 2006, 2009; Jacobs et al., 2015; ANGST: Schmidtke et al., 2014a; also see Schauenburg et al., 2014). For examples of usages, see, for instance, Kuchinke et al. (2005); Hofmann et al. (2009); Scott et al. (2009); Conrad et al. (2011); Palazova et al. (2011); Hsu et al. (2014, 2015a,b,c); Recio et al. (2014).

However, the general affective meaning is, most probably, more than just a direct function of lexical affective values in the text since the processing of affective words is expected to interact with the surrounding sentence context.

# Inter-Lexical Effects on General Affective Meaning

Lüdtke and Jacobs (2015) show that the succession of two words of similar valence in a sentence can lead to priming effects, with shorter sentence verification times in the case of affectively compatible words—specifically for positive words. In this vein, one might ask what effect a continuous rise or fall of affective lexical values throughout a poem could have on the affective perception by the reader. Furthermore, affective connotations of a single word can dominate the general affective meaning of a whole sentence—especially in the case of negative adjectives, which have been shown to exert a negativity bias (Liu et al., 2013; Lüdtke and Jacobs, 2015). Yet, it remains an open question whether one single word with an extreme affective value could even dominate the affective perception of a whole text paragraph or poem. Moreover, the span width between the two most extreme lexical affective values might also be of relevance for the general affective meaning: For example, the arousal span has been shown to account for about 25% of the variance in suspense ratings in the story "The Sandman" by E. T. A. Hoffmann (Lehne, 2014; Jacobs, 2015a). Furthermore, the arousal span strongly contributes to the perceived arousal level of text passages as well as to the activation of emotion-related brain areas when reading passages from Harry Potter books (Hsu et al., 2015a).
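The lexical and inter-lexical quantities discussed above (mean affective values, extreme values, the span between the two most extreme values, and a rise or fall across the text) can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the lexicon `TOY_LEXICON`, its numbers, and the helper names `linear_slope` and `affective_features` are all hypothetical and introduced here only for illustration.

```python
from statistics import mean

# Hypothetical toy lexicon: word -> (valence, arousal).
# The numbers are invented for illustration and are not taken from BAWL or ANGST.
TOY_LEXICON = {
    "sun": (2.0, 3.0),
    "friend": (1.0, 2.0),
    "wolf": (-1.0, 4.0),
    "death": (-2.0, 4.0),
}


def linear_slope(values):
    """Least-squares slope of the values against their position in the text."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def affective_features(words, lexicon):
    """Compute simple lexical and inter-lexical features for a list of words.

    Words that are missing from the lexicon are skipped.
    """
    rated = [lexicon[w] for w in words if w in lexicon]
    if not rated:
        raise ValueError("no word of the text is covered by the lexicon")
    valences = [valence for valence, _ in rated]
    arousals = [arousal for _, arousal in rated]
    return {
        # Lexical level: average affective potential of the words.
        "mean_valence": mean(valences),
        "mean_arousal": mean(arousals),
        # Inter-lexical level: peaks, spans, and change over the text.
        "valence_min": min(valences),
        "valence_max": max(valences),
        "valence_span": max(valences) - min(valences),
        "arousal_span": max(arousals) - min(arousals),
        "valence_slope": linear_slope(valences),
    }


poem = ["the", "sun", "friend", "wolf", "death"]
features = affective_features(poem, TOY_LEXICON)
print(features)
```

In this toy text the valence values fall steadily from the first to the last rated word, so the slope is negative, while the mean valence alone would suggest a neutral text. This is one way to see why inter-lexical measures might carry information beyond lexical means.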

# Sublexical Effects on General Affective Meaning

Poetry inherently involves the structuring of sound, which is why it is important to consider the phonological composition at the sublexical level—also and especially when investigating the emotional impact of poetry. Our study draws on the general theoretical assumption of phonological iconicity: Sublexical language sounds have been found to evoke highly consistent assessments of meaning dimensions—potentially relevant for affect—such as size, shape, or pleasantness (Köhler, 1929; Sapir, 1929; Taylor and Taylor, 1965; for a review on the phenomenon of phonological iconicity see Schmidtke et al., 2014b). Such findings inspired literary scholars and psychologists to compare the phonetic content of poems of opposite general affective meaning. While some of these studies indicated that, for example, plosives appear more often in positive or happy poems, whereas nasals appear rather in sad contexts (Wiseman and van Peer, 2003; Albers, 2008; Auracher et al., 2011), other studies found contradictory evidence, for example, that plosives reflect negative characteristics (Fónagy, 1961; Whissell, 1999, 2000), or that nasal vowels represent beauty (Tsur, 1992). A general problem of these studies is that they merely investigated the frequency of occurrence of the phonemes of interest, which could be misleading due to specifics of phoneme distributions in the poetic language mode. An alternative approach is to calculate the deviation of existing phonological patterns in a poem from expected standard patterns (Aryani et al., 2013). This reflects the idea of foregrounding, which is held responsible for the interruption of the automated reading process, thus leading to deeper cognitive processing and potentially aesthetic sensations (Mukařovský, 1964; van Peer, 1986; Miall and Kuiken, 1994; van Peer et al., 2007; Jacobs, 2011, 2015a,b). Aryani et al. (2013) compared the use of phonological units in a poem to the standard distribution of phonological units in prosaic language. This is based on proposed differences between poetic and prosaic language use (see Jakobson's "poetic function" as mentioned above). The resulting deviant phonological units may be responsible—by the foregrounding effects of their salience—for a specific impact of the poem's sound on the reader. The basic affective tone approach of Aryani et al. (2016) further involves intrinsic affective values of the salient phonological segments. This is inspired by the finding that certain phonological clusters tend to occur particularly often in words of specific affective meaning (e.g., high arousal and negative valence). Sublexical affective values were computed by averaging the valence and arousal values, respectively, of all words in which a particular phonological segment occurs in a normative database containing valence and arousal ratings for over 6000 German words (an extension of Võ et al., 2006, 2009, by Conrad et al. in preparation)—assuming an internal relation between the signifier and the signified. For the corpus "verteidigung der wölfe," a compendium of 57 German poems by Hans Magnus Enzensberger (1957), Aryani et al. (2016) investigated the match of the author-given affective chapter labels "friendly," "sad," and "spiteful" with the readers' affective appraisals of the poems, and connected these comparisons to an analysis of basic affective tone at the sublexical phonological level—thus connecting all three parts of Jakobson's extension of Bühler's organon model: sender (the author), message (the text), and receiver (the readers) (Bühler, 1934; Jakobson, 1960). They showed how a close match between author labels and readers' affective appraisals appears to be mediated through a specific use of phonology: the basic affective tone (term introduced by Aryani et al., 2016) alone accounted for up to 20% of the variance in readers' ratings of the general affective meaning.

Here, we will extend the analyses of Aryani et al. (2016) on the relation between text and reader for the same corpus of poems to the lexical (referring to the words in a text) and inter-lexical (concerning the relations between words) text levels in order to achieve a more comprehensive understanding of how basic textual elements may determine the affective impact of poetry.

In general, research described above has shown that affective features of different text levels can contribute to the general affective meaning of a text. It remains unclear, though, whether such effects of different text levels are independent of each other, and how much of the general affective perception of poetry could be explained via these relatively basic textual dimensions altogether. In the following, we will try to quantify these influences on readers' affective perception of poetry via multiple regression analyses.

We hypothesize, in particular, that (i) lexical variables will generally be the best predictors of general affective meanings as assessed by ratings. Nonetheless, we assume that (ii) affective features at all text levels significantly contribute to the perceived general affective meaning of poems. Partialling out the influence of lexical variables via multiple regression should reveal independent sublexical and inter-lexical contributions to the affective impact of poetry.

Consciously leaving aside important supra-lexical features such as familiarity with a literary genre (Bohrn et al., 2013), comprehensibility (Leder et al., 2012), experience with literary work in general (Winston and Cupchik, 1992), and many other personality variables (Bleich, 1978), as well as syntactic and structural characteristics of the poems, we seek to estimate how much affective potential already resides within more basic constituents of the text itself: single phonemes, words, or basic inter-relations between words.

# MATERIALS AND METHODS

# Ratings

# Poem Corpus

"die verteidigung der wölfe" ("the defense of the wolves") was written in 1957 by the contemporary German author Hans Magnus Enzensberger (<sup>∗</sup> 1929, see Astley, 2006, for an English introduction to Enzensberger's poetry). These 57 poems are partitioned by the author into three chapters of 21 "sad" ("traurig"), 19 "friendly" ("freundlich"), and 17 "spiteful" ("böse") poems. This assures a sufficient variety of affectivity across all poems, paving the way for a differentiated prediction of the variance in the perception of their general affective meaning via affective information at subjacent text levels. An advantage of this contemporary poem volume is the employed free verse poetry which should prevent our operationalization of phonological salience to be confounded with features of a strong metrical ordering and rhyme that also exert a specific influence on aesthetic judgments of poems (Obermeier et al., 2013; Menninghaus et al., 2014, 2015).

# Participants

German native speakers were recruited through a post on the institute's website and a variety of Facebook pages. More than 300 people participated, 252 of whom provided evaluable data (173 female; age range from 17 to 76, M = 35.9, SD = 12.1).

# Procedure and Variables

General affective meaning ratings were acquired via an online survey using the QuestBack Unipark software. After being welcomed, instructed, and asked to enter a few personal data (age, sex, native language, profession), people were free to read and rate as many poems as they liked (M = 4.3, SD = 5.4). The poems were presented in a pseudo-randomized order. People were asked whether they already knew each poem—only unknown poems were used in the analyses. A minimum of 15 complete ratings for each poem were acquired on each of the following eight dimensions—presented to participants in randomized order:

Ratings of Valence and Arousal—Linking our Approach to Psychological Emotion Models


Ratings on Discrete Affective Categories to Directly Assess the Labels the Author Suggested for his Poems:


The two basic levels of our approach toward capturing the perception of general affective meaning in poetic texts (dimensional and discrete aspects of emotion) are derived from the dual-process model of emotional responses to art of Cupchik and Winston (1992; also see Cupchik, 1994). While arousal and valence ratings form the reactive part of their model, the discrete affective dimensions—which require more context information (i.e., appraisals)—form the reflective part of the model. In poetry, though, immersive emotions—arising from the plot—may be less dominant than in narrative fiction (Oatley, 1994; Jacobs, 2011, 2015a; Mar et al., 2011), whereas aesthetic emotions (characterized mainly by the affective evaluation and appreciation of artistic style, beauty, etc.) play a more dominant role (Frijda, 1989; Leder et al., 2004; Marković, 2012). Hence, we extend the model of emotional responses to art by a meta-reflective layer comprising aesthetic concepts: A liking rating is assumed to assess the affective part of aesthetic judgment, referring to personal emotional experiences during poetry reception, whereas we assume poeticity ratings to capture the more cognitive aspects of aesthetic judgment, as they have been shown to be influenced by linguistic competence in general (Hoffstaedter, 1987). Such aesthetic preferences represent a much more abstract level of the perception of general affective meaning, as they strongly depend on context and personality factors as well (Bleich, 1978; Jacobs, 2011, 2015a). As our study also refers to the phenomena of phonological salience and basic affective tone at the sublexical level (see Aryani et al., 2016, and further below), we additionally collected onomatopoeia ratings. Onomatopoeia represents the use of words whose sound is suggestive of their meaning. Hence, this rating is supposed to assess how well the (imagined) sound of a poem is perceived by the reader to fit the overall meaning of a poem—as "poetry is a province where the internal nexus between sound and meaning changes from latent into patent and manifests itself most palpably and intensely" (Jakobson, 1960).

Ratings on Aesthetic Evaluations as Well as the Fit of Sound and Meaning:


# Multiple Regression

The rating variables (including the absolute value of valence) were used as dependent variables in a multiple regression approach. To provide a most extensive screening for potential effects of different phenomena at different text levels we included a considerable number of predictor variables (55 in total, listed in Table S1 in the Supplementary Materials) from the three basic levels of text processing into the forward stepwise multiple regression models. As stop criterion we used the rather conservative Bayesian information criterion (BIC), which seems an appropriate way to constrain the number of significant results—putting specific effort in avoiding false positives—for an a priori high number of predictors.
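The selection procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ols_bic` and `forward_stepwise_bic`, the Gaussian form of the BIC, and the toy data (57 rows, six made-up predictors) are all hypothetical choices made only for this example.

```python
import numpy as np


def ols_bic(X, y):
    """BIC of an ordinary least squares fit with intercept; X has shape (n, k)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    n_params = design.shape[1]
    # Gaussian-likelihood BIC up to an additive constant (assumed form)
    return n * np.log(rss / n) + n_params * np.log(n)


def forward_stepwise_bic(X, y):
    """Greedily add the predictor that lowers BIC most; stop when none does."""
    n, p = X.shape
    selected = []
    best_bic = ols_bic(np.empty((n, 0)), y)          # intercept-only baseline
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: ols_bic(X[:, selected + [j]], y) for j in candidates}
        best_j = min(scores, key=scores.get)
        if scores[best_j] >= best_bic:               # BIC stop criterion
            break
        selected.append(best_j)
        best_bic = scores[best_j]
    return selected


# Toy data (hypothetical): 57 "poems", 6 predictors, only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(57, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + rng.normal(scale=0.1, size=57)
selected_demo = forward_stepwise_bic(X_demo, y_demo)
print(selected_demo)
```

Because the BIC penalty grows with the logarithm of the sample size, a predictor enters only if it reduces the residual variance by more than that penalty, which is why the criterion is comparatively conservative when many candidate predictors are screened.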

We assume deviation to be an important precursor of foregrounding, which is supposed to be responsible for many affective and aesthetic effects while reading literary works. Hence, we tried to operationalize the degree of deviation from expected values at each text level: instead of using raw mean values of, e.g., valence or arousal values of words or subsyllabic segments (see Aryani et al., 2016) in a poem, we rather used the degree of deviation of these values from neutral global means within a representative database of everyday German language (Brysbaert et al., 2011) to predict readers' affective perception of the poems (see sections below for calculations).

To capture potential quadratic effects of variables with potentially bipolar character, we used both the standard values (including positive/negative algebraic signs) and their absolute values as predictors of ratings. This should enable us to capture effects such as, e.g., arousal ratings increasing with both more negative and more positive lexical content of poems (as compared to neutral valence)—or negative or positive deviations from neutral at the sublexical level, respectively.
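A small numerical sketch may help to see why entering a signed predictor together with its absolute value lets a linear model express U-shaped or one-sided effects. The weights and input values below are hypothetical and chosen only for illustration; `predicted_rating` is not a function from the study.

```python
import numpy as np


def predicted_rating(x, b_signed, b_abs, intercept=0.0):
    """Linear prediction from a signed predictor and its absolute value."""
    x = np.asarray(x, dtype=float)
    return intercept + b_signed * x + b_abs * np.abs(x)


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])    # hypothetical sigma factors

# U-shaped effect: only |x| carries weight.
u_shape = predicted_rating(x, b_signed=0.0, b_abs=1.0)

# One-sided effect: the two weights add up for x < 0 and cancel for x > 0.
one_sided = predicted_rating(x, b_signed=-1.0, b_abs=1.0)

print(u_shape)
print(one_sided)
```

With these illustrative weights, the second prediction rises as the predictor becomes more negative but stays flat for positive values, which is the kind of asymmetric pattern such a pair of predictors can capture.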

### Lexical Predictors

All poem texts were PoS (part-of-speech) tagged to identify the word forms and infinitives of each word. Function words were omitted because their use is determined mainly by grammatical requirements (Miller, 1954; Anderson and McMaster, 1982). For the remaining words, lexical valence and arousal values were looked up in an extended version of the BAWL database, containing more than 6000 German words with affective ratings (Võ et al., 2006, 2009; publication of the extended version in preparation). For poem words that did not appear in the database but were standard German words, we collected additional valence and arousal ratings. This yielded a 90% matching rate. The missing 10% mainly consist of names of people and places, and a few foreign or obsolete words. We determined the lexical affective predictors—in terms of valence and arousal—for a poem by their deviations from the respective affective mean of a null model. That is, we calculated the extent to which the mean of valence and arousal values of the words in a poem deviates from the affective mean of the same number of words randomly pulled from a linguistic corpus. For this, the valence and arousal values of the words in the BAWL database were first z-standardized according to their lexical frequency in the SUBTLEX-DE corpus (Brysbaert et al., 2011), resulting in a normal distribution with a mean of zero and a standard deviation of one. To calculate the standard deviation of the mean of a randomly pulled sample, given statistical independence of the values in each sample, the standard deviation of all words in the database (i.e., one) was divided by the square root of the size of the random sample (i.e., the total number of words appearing in the corresponding poem). This value represents the standard deviation of the null model.
By dividing the mean of valence or arousal values of all words in a poem by the standard deviation of its corresponding null model, we obtained a "sigma factor" which indicates the extent to which the valence or arousal mean deviates from an expected value (i.e., the null model). As an example, the formula for the sigma factor of lexical arousal looks as follows:

$$
\text{Sigma}(\text{arousal}) = \frac{M(\text{arousal})}{1/\sqrt{N}}
$$

where M(arousal) is the mean of arousal values of all words in a poem, and N is the total number of words appearing in the corresponding poem.

– These sigma factors (together with their resulting absolute values) for lexical valence and arousal of each poem served as predictors in the multiple regression models.
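The sigma factor defined above can be sketched in a few lines. The function name `sigma_factor` and the example values are hypothetical; the sketch assumes the word values have already been z-standardized as described in the text.

```python
import math


def sigma_factor(z_values):
    """Mean of z-standardized word values divided by the null-model SD, 1/sqrt(N)."""
    n = len(z_values)
    if n == 0:
        raise ValueError("a poem needs at least one rated word")
    mean = sum(z_values) / n
    null_sd = 1.0 / math.sqrt(n)        # SD of the mean of N independent z-scores
    return mean / null_sd


# Hypothetical z-standardized arousal values for the rated words of one poem
arousal_z = [0.5, 1.0, 0.0, 0.5]
sigma_arousal = sigma_factor(arousal_z)
abs_sigma_arousal = abs(sigma_arousal)  # absolute value, entered as a separate predictor
print(sigma_arousal)
```

Since the null-model standard deviation shrinks with the square root of the word count, the same mean deviation yields a larger sigma factor in a longer poem than in a shorter one.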

### Inter-Lexical Predictors

The inter-lexical variables that we included in the analysis are thought to reflect tensions and dynamics within a text. Here as well, deviation matters: Standard deviations and spans of all words' valence and arousal values may serve as a proxy for the general affective spread of a poem:


As already one single affectively deviant word can dominate the general affective meaning of a whole poem, also affective minima and maxima are included in the range of inter-lexical predictor variables:

– Minimum and maximum values of lexical valence and arousal per poem

Correlations between words' positions in the text and their affective values might reflect the development of affectivity throughout the course of a poem:

– Correlation coefficients (together with their absolute values) between words' positions (beginning to end) within a poem and arousal, valence, and the absolute valence values

In addition, the number of words per poem was also included as a predictor variable.
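The inter-lexical measures listed in this section can be sketched as follows. The function and key names are hypothetical, the use of the sample standard deviation (`ddof=1`) and of Pearson correlations is an assumption, and the four-word example is made up for illustration.

```python
import numpy as np


def position_trend(values):
    """Pearson correlation between word position (1..N) and word values."""
    values = np.asarray(values, dtype=float)
    position = np.arange(1, len(values) + 1)
    return float(np.corrcoef(position, values)[0, 1])


def inter_lexical_predictors(valence, arousal):
    """Spread, extremes, and positional trends for the rated words of one poem."""
    valence = np.asarray(valence, dtype=float)
    arousal = np.asarray(arousal, dtype=float)
    features = {"n_words": int(len(valence))}
    for name, values in (("valence", valence), ("arousal", arousal)):
        features[f"{name}_sd"] = float(np.std(values, ddof=1))   # sample SD (assumed)
        features[f"{name}_span"] = float(values.max() - values.min())
        features[f"{name}_min"] = float(values.min())
        features[f"{name}_max"] = float(values.max())
    trends = {
        "valence": position_trend(valence),
        "arousal": position_trend(arousal),
        "abs_valence": position_trend(np.abs(valence)),
    }
    for name, r in trends.items():
        features[f"{name}_trend"] = r
        features[f"{name}_trend_abs"] = abs(r)    # absolute value as a separate predictor
    return features


# Hypothetical z-standardized values for a four-word poem
demo = inter_lexical_predictors(valence=[-1.0, 0.0, 1.0, 2.0],
                                arousal=[2.0, 1.0, 0.0, -1.0])
print(demo)
```

In this toy poem, valence rises steadily from the first to the last word while arousal falls, so the two positional trends have opposite signs.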

### Sublexical Predictors

The EMOPHON tool (Aryani et al., 2013) translates an input text into its phonemic notation and then analyses the text for salient phonological units based on a probabilistic model: a reference linguistic corpus for the German language (SUBTLEX-DE; Brysbaert et al., 2011) determines confidence intervals for the frequency of occurrence of all sublexical units in a text depending on its length. If the actual frequency of occurrence of a specific sublexical unit in the text exceeds its confidence interval, it is regarded as salient. Here, we chose the tool's option to segment the texts into the subsyllabic units onsets, nuclei, and codas (instead of single phonemes)—which are used for all following analyses. For the number of salient phonological segments that exceed their confidence intervals, we used the

– Ns of salient onsets, Ns of salient nuclei, Ns of salient codas as well as the Ns of all salient subsyllabic segments altogether (in each case weighted by the length of the respective poem)
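The idea of flagging a unit as salient when its frequency exceeds a length-dependent confidence interval can be illustrated with a simple sketch. This is not EMOPHON's actual procedure: the binomial model with a normal approximation, the threshold `z=1.96`, the division by text length as the weighting, and all counts and probabilities below are assumptions made only for this example.

```python
import math


def salient_units(counts, reference_probs, z=1.96):
    """Return units whose observed count exceeds an upper confidence limit.

    counts: unit -> observed frequency in the text
    reference_probs: unit -> relative frequency in a reference corpus
    """
    n_total = sum(counts.values())                  # text length in units
    salient = []
    for unit, observed in counts.items():
        p = reference_probs[unit]
        expected = n_total * p
        sd = math.sqrt(n_total * p * (1.0 - p))     # binomial standard deviation
        upper = expected + z * sd                   # upper confidence limit (assumed form)
        if observed > upper:
            salient.append(unit)
    return salient


# Hypothetical onset counts in one poem and hypothetical reference probabilities
onset_counts = {"sch": 30, "t": 40, "m": 30}
onset_reference = {"sch": 0.10, "t": 0.50, "m": 0.40}
salient = salient_units(onset_counts, onset_reference)
# Number of salient units weighted by text length (division is an assumed weighting)
weighted_n_salient = len(salient) / sum(onset_counts.values())
print(salient)
```

Here only the onset that occurs three times as often as the reference probability would suggest is flagged, while the two onsets that occur less often than expected are not.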

The recent update of EMOPHON (Aryani et al., 2016) further provides a quantitative measure for the basic affective tone by integrating the detected salient phonological segments with affective values assigned to each of them. These sublexical affective values for subsyllabic onsets, nuclei, and codas were computed by averaging the lexical valence and arousal values of all words in which a specific phonological segment appears, using a lexical database containing rating values for over 6000 words (see Conrad et al. in preparation, for details). Again, we used the sigma factors—the extent to which the respective mean affective value of salient phonological segments in a poem deviates from the mean of the random distribution used in the model—as predictor variables. The sigma factor reflects how strongly the affective sublexical value of the poem deviates from an expected value for a text of the same length, and in which direction. As predictors, we used:


Note that the present analyses go beyond the ones presented in Aryani et al. (2016) by also addressing:


Furthermore, by letting sublexical and lexical values compete in multiple regression analyses we want to provide answers to the following research questions:


# RESULTS

# Descriptive Statistics of the Rating Variables

Means and spreads of rating variables are summarized in **Table 1**. The mean ratings for the discrete affective concepts friendliness, sadness, and spitefulness are highest in the respective poem categories. This shows that readers' perceptions of the poems' general affective meaning generally correspond well with the author-given affective categorization. Valence and arousal ratings further support the character of these discrete affective categories. For example, the author-defined spiteful poems show the strongest negative valence and highest arousal ratings (for statistical comparisons see Aryani et al., 2016).

TABLE 1 | Means (M) and standard deviations (SD) of the rating variables for each author-given affective category (N being the number of poems rated).


Bivariate correlation coefficients between all dependent variables are shown in **Table 2**. Especially valence is highly correlated with friendliness in a positive way, and with spitefulness in a negative way. An opposite pattern is found for the correlation between arousal and spitefulness (positive) compared to friendliness (negative). Whereas liking and poeticity correlate moderately with each other as well as with valence and friendliness, onomatopoeia, and sadness are the two ratings correlating least with the other ones.

# Multiple Regression Results

Results for forward stepwise regression models using all predictors on different rating dimensions of the general affective meaning as dependent variables are summarized in **Tables 3**–**5**. **Figures 1**–**9** display bivariate correlations between all rating dimensions and up to four significant predictors emerging from the multiple regression models.

Significant predictors of ratings on the basic affective dimensions valence (including also the absolute values of valence) and arousal are shown in **Table 3**: Around half of the variance (43–59%) of the ratings of the basic affective dimensions valence and arousal can be explained solely by the employed lexical, sublexical, and inter-lexical affective measures.

The variance in the valence ratings can be predominantly accounted for by the lexical valence and arousal patterns in the poems—41% of the ratings' variance can already be explained by lexical valence alone. Higher valence ratings go along with increasing lexical valence values but decreasing lexical arousal values (see **Figures 1A,B**). This pattern is in line with the general negative correlation between the two affective dimensions in, for example, words from German affective word databases (BAWL: Võ et al., 2009; ANGST: Schmidtke et al., 2014a). At the sublexical level, more salient segments in general also lead to higher valence ratings (**Figure 1D**). Regarding the sublexical arousal level of all these salient segments, the inclusion of the absolute arousal values allows a more detailed characterization of the underlying mechanisms: Generally speaking, low sublexical arousal leads to higher valence ratings (**Figure 1C**). This is confirmed by the absolute values of sublexical arousal of all salient segments showing a positive partial correlation. For high-arousing sublexical segments, however, the two variables would predict opposite patterns which cancel each other out. Hence, although very low/calming sublexical arousal values lead to higher valence ratings, very high sublexical arousal values do not coincide with more negative supra-lexical valence ratings.

Indicators of significance: \*\*\*p ≤ 0.001, \*\*0.01 ≥ p > 0.001, \*0.05 ≥ p > 0.01.

The absolute values of valence ratings, representing the intensity of valence ratings irrespective of their direction, can best be predicted by lexical arousal and the sublexical arousal values of all salient segments. At both text levels, higher arousal leads to a higher intensity of valence ratings (see **Figure 2**). This clearly reflects the U-shaped distribution of lexical valence and arousal values in affective word databases (BAWL: Võ et al., 2009; ANEW: Bradley and Lang, 1999), where the arousal levels of both positive and negative words are higher than for neutral words, even if the arousal for positive words does not reach the same height as for negative words in the German language (Schmidtke et al., 2014a). The fact that sublexical arousal explains another 12% of the variance hints at a similar distribution of valence and arousal at the sublexical level.

For the arousal ratings, no sublexical affective values appear in the regression model. The variance in these ratings is mainly accounted for by the word with the highest arousal level in the poem, but also by the overall lexical arousal level: the higher the lexical arousal and its maximum value, the higher the arousal ratings (**Figures 3A,B**). But also a changing level of lexical valence throughout the poem has an influence on the perceived arousal: a rise of valence values toward the end of the poem leads to diminished arousal ratings—marked by a negative partial correlation of the supra-lexical arousal and the correlation values of lexical valence with the word order—whereas a decline of the words' valence throughout the poem leads to a higher perceived arousal level (**Figure 3C**).

**Table 4** lists significant predictors of discrete affective concepts' ratings: Very similar to the valence ratings, friendliness is also mainly driven by the positive influence of lexical valence (**Figure 4A**) and the negative influence of lexical arousal (**Figure 4B**), together already accounting for 51% of the variance in the friendliness ratings. Additionally, if the valence values of words rise with their position in the poem, higher friendliness ratings occur—and vice versa (**Figure 4D**). At the sublexical arousal level, we find the same pattern as for the valence ratings, just that here the one-sided effect of very low arousal values leading to higher friendliness ratings—whereas high arousal does not lead to diminished friendliness—stems from salient nuclei only (**Figure 4C**), not from all types of salient segments. Furthermore, a higher intensity of the sublexical valence values of all nuclei in the text (represented by the absolute value) seems to lead to higher friendliness ratings.

TABLE 4 | Predictors of the discrete affective concepts' ratings.

Contrary to friendliness, in the spitefulness model, lower lexical valence and higher lexical arousal lead to higher spitefulness ratings (**Figures 5A,D**), as could be expected for high-arousing negative poems such as spiteful ones. At the sublexical arousal level, the more extreme the arousal of all salient segments is, regardless of direction, the higher the spitefulness ratings are (**Figure 5B**). The total number of words also seems to play a role here, with longer poems being rated as slightly more spiteful than shorter ones (**Figure 5C**). This, however, might be a specific quality of this particular poem corpus and may not generalize.

A first glance at the regression model for sadness shows that, unlike in all other analyzed models, neither lexical valence nor arousal per se is included. However, the word with the smallest valence value in a poem is the most influential predictor of the sadness ratings—the smaller its valence is, the sadder is the overall impression of the poem (**Figure 6A**). Another important inter-lexical aspect in the case of sadness is the correlation of word affectivity with the word order. For lexical valence, more negative word values toward the end of a poem raise the sadness rating, and more positive values toward the poem's end make it less sad (**Figure 6D**). In the case of lexical arousal, declining word arousal values throughout the poem account for a sad poem (**Figure 6C**), but rising arousal levels toward the end of a poem do not necessarily lead to smaller sadness ratings. This is reflected by the absolute value of the correlation of lexical arousal with the words' positions entering the regression model as well, which neutralizes the potential influence of higher lexical arousal values. At the sublexical level, the absolute value of the arousal level of all codas in the text seems to be the strongest predictor. Thus, any coda's arousal value being significantly different from the distribution's mean—no matter whether it is especially low or high-arousing—leads to higher sadness ratings. The same holds for the valence values of all types of salient segments in the poems. Regarding their arousal level, the higher it is, the sadder the poem is perceived (**Figure 6B**). Furthermore, the occurrence of many salient nuclei in a text goes along with lower sadness ratings.

**Table 5** lists significant predictors of the two aesthetic as well as the onomatopoeia ratings: The only two variables that significantly predict part of the liking ratings (23%) are lexical arousal and the sublexical arousal of all salient segments. Both types of arousal show a negative partial correlation with the dependent variable: poems appear to be "liked" less when containing words of relatively high arousal, but more when they are low-arousing (**Figure 7A**). The same holds for the arousal potential of salient phonological segments (**Figure 7B**).

TABLE 5 | Predictors of the two aesthetic and the onomatopoeia ratings.


SD, standard deviation, |...|, absolute value; Indicators of significance: \*\*\*p ≤ 0.001, \*\*0.01 ≥ p > 0.001, \*0.05 ≥ p > 0.01, <sup>+</sup>0.1 ≥ p > 0.05; Color coding: Red, lexical variables; Blue, inter-lexical variables; Yellow, sublexical variables. The bold number indicates the respective overall cumulative R<sup>2</sup> corrected for each regression model.

For the dependent variable poeticity, lexical arousal appears as a highly significant predictor variable if its absolute values are considered: The more deviant the lexical arousal values are from zero, no matter whether into a higher arousing or more calming direction, the less poetic the poem is rated (**Figures 8A,B**). Thus, poems that contain predominantly words of a rather unremarkable arousal—not significantly high- or low-arousing—are perceived as more poetic than poems with salient lexical arousal features. Moreover, the poeticity ratings are also strongly influenced by sublexical affective values. The number of salient segments, in particular of the salient codas, accounts for a reasonable part (>10%) of the ratings' variance: poems that use phonological segments more often than expected from everyday language are perceived as more poetic (**Figure 8C**). Furthermore, the arousal level of respective salient nuclei seems to play a differentiated role, as specifically the low-arousing salient nuclei lead to a higher perceived poeticity (**Figure 8D**). This results from the finding that the continuous arousal values of the nuclei are negatively correlated with the poeticity ratings, while the absolute arousal values correlate in a positive manner. Thus, for the negative range—namely the low-arousing part—the inferred statement is the same, whereas in the positive—high-arousing—range the correlation patterns oppose and hence cancel each other out. Consequently, more arousing nuclei values do not necessarily lead to diminished poeticity ratings.

The onomatopoetic perception is significantly influenced by variables from all three text levels. At the lexical level, a higher occurrence of negatively valenced words in a poem leads to increased onomatopoeia ratings. In contrast, with a higher maximum value of lexical valence in a poem, the ratings for onomatopoeia become slightly higher as well. However, this partial correlation is not a very strong one. Regarding the spread of lexical valence and arousal in each poem—depicted by their standard deviations—higher deviations involve lower onomatopoeia ratings (**Figures 9A,B**). At the sublexical level, the nuclei seem to play an important role: On the one hand, a high number of salient nuclei in a poem predicts higher onomatopoeia ratings (**Figure 9C**). On the other hand, if the arousal level of all nuclei in a poem taken together is very high or very low, the poem is perceived as less onomatopoetic (**Figure 9D**). The overall picture is further complicated by the fact that a more positive valence specifically of salient codas augments the onomatopoeia ratings.

In summary, it can be stated that in all of the regression models at least two out of the three examined levels of affective text analysis contribute significantly but differently to the variance in the respective dependent rating variable. In eight out of nine cases, at least one of the lexical variables valence or arousal is contained in the regression model, in six cases it enters the model first. Especially the inclusion of lexical arousal in seven models increases the amount of explained variance to a noticeable extent. Lexical valence supports four models significantly. The newly defined inter-lexical variables, whose task it is to represent dynamic shifts and spreads of affective lexical content, find their way into the regression equations in five out of nine models. From the huge number of sublexical predictor variables, prominently the arousal level of salient segments consistently explains variance in eight out of nine models. In addition, the pure number of salient segments in a poem, disregarding their affective values, plays a role in four of the nine regression models.

FIGURE 1 | Bivariate correlations between valence ratings and four predictors: the sigma factors for lexical valence (A), the sigma factors for lexical arousal (B), the sigma factors for sublexical arousal of all salient segments (C), and the total number of salient segments per poem weighted by its length—note that the correlation gets significant after partialling out the influence of the other predictors (D).

Regarding the different abstraction levels of the rating variables, the best goodness of fit is achieved for the discrete affective concepts ratings (47–70% of variance accounted for), closely followed by the dimensional affective ratings (43–59% of variance accounted for). Even for ratings at the most abstract level of general affective meaning—including aesthetic as well as onomatopoetic ratings—still 23–48% of the variance is accounted for by basic textual predictor variables.

# DISCUSSION

This study investigates to what extent affective connotations at the rather basic textual dimensions of phonological units and single words (or the relative positions of the latter) influence the overall affective perception of poetry. For this purpose, we used the volume "verteidigung der wölfe" by Hans Magnus Enzensberger, which is categorically divided into friendly, sad, and spiteful poems. To estimate their affective perception by the reader, we collected ratings of the poems on several affective scales, ranging from the basic dimensions valence and arousal to the author-based discrete affective dimensions friendliness, spitefulness, and sadness, to aesthetic evaluations of poeticity and liking, as well as the concept of onomatopoeia. To identify basic textual sources potentially determining these ratings, we quantified affective properties of the texts (using valence and arousal values from large-scale normative lexical databases) at three different basic text levels: sublexical, lexical, and inter-lexical. We then used these measures as predictor variables in a stepwise multiple regression approach to test how much of the variance in the perceived general affective meaning can be accounted for by these textual variables, and how these influences may vary across different rating dimensions.
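To make the screening logic of such a stepwise approach concrete, the following is a minimal sketch of forward selection over candidate predictors. It is not the procedure used in this study: the entry criterion (a minimum gain in R², here the hypothetical parameter `min_gain`) stands in for the significance tests that stepwise regression usually relies on, and the predictor names and toy values are invented for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors, y, min_gain=0.01):
    """Add, one at a time, the predictor that raises R^2 most (assumed criterion)."""
    selected, history, best_r2 = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break  # no remaining candidate adds enough explained variance
        best_r2 = scores[name]
        selected.append(name)
        history.append((name, best_r2))
        del remaining[name]
    return history


# Toy data for eight hypothetical poems (all values invented).
toy_predictors = {
    "lexical_arousal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "salient_count": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
    "unrelated": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
}
toy_ratings = [2 * a + 3 * s for a, s in zip(toy_predictors["lexical_arousal"],
                                             toy_predictors["salient_count"])]
steps = forward_selection(toy_predictors, toy_ratings)
```

In this toy case `steps` lists `lexical_arousal` first and `salient_count` second, each with the cumulative R² reached at that step, while the unrelated predictor never enters the model.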

Overall, our results from the different regression models show that a substantial portion of the variance in affective and further aesthetic and onomatopoetic ratings of our poems can be accounted for by affective features at the sublexical, lexical, and inter-lexical level. These findings suggest that very basic affective processes play a crucial role in poetry perception. Note that we do not argue that higher-level processes do not matter; they are simply not studied in our approach.

The best predictors of the perception of the general affective meaning of the poems (assessed via ratings) were the average lexical valence and arousal values of the words contained in the poems, expressed as their deviation from an expected average value. Pragmatically speaking, this would mean that it is sufficient to put words with specific affective connotations together to create half of the affective impact a poem is able to provoke in the reader. Again, while this view may appear extremely minimalistic, it is well in line with other findings from reading studies using normal sentences or passages from novels (Anderson and McMaster, 1982; Whissell et al., 1986; Bestgen, 1994; Whissell, 1994; Hsu et al., 2015a).
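As an illustration of what a "deviation from an expected average value" can look like, the sketch below standardizes the mean valence of a poem's words against the mean of a reference lexicon. The exact formula is an assumption made for this example (a z-score of the text mean, using the standard error for a text of that length); the function name `sigma_factor` and the reference values are hypothetical and not taken from the study's materials.

```python
import math


def sigma_factor(word_values, reference_mean, reference_sd):
    """Standardized deviation of a text's mean from a reference mean.

    Assumed form: (text mean - reference mean) / (reference SD / sqrt(n)).
    """
    n = len(word_values)
    text_mean = sum(word_values) / n
    standard_error = reference_sd / math.sqrt(n)
    return (text_mean - reference_mean) / standard_error


# Invented valence values for the words of a short, rather negative poem.
poem_valences = [-1.0, -2.0, -1.5, -1.5]
deviation = sigma_factor(poem_valences, reference_mean=0.0, reference_sd=1.0)
```

A strongly negative result would indicate that the poem's words are, on average, more negative than expected for a random sample of the same size from the reference lexicon.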

Beyond the single word level, our study provides a number of novel results for inter-lexical phenomena and how they contribute to the affective reading experience. From a neuroscientific perspective, Hsu et al. (2015a) and Jacobs (2014, 2015a) have already shown how inter-lexical affective features such as the span of lexical arousal values across a text passage can account for variance of arousal (Hsu et al., 2015a) and suspense ratings (Lehne, 2014) as well as elicit increased activation of brain areas associated with affective processing (Hsu et al., 2015a). In our data, for instance, the overall ratings of arousal induced by a poem were best predicted not by the average lexical arousal values but rather by specific maxima of lexical arousal. The maximum lexical arousal value in a text is a mathematical constituent of the arousal span (max–min) and probably the most relevant one, as it represents salient peaks or particularly exciting moments in a text, which fits well with the general view of this emotion dimension as an alert system reacting immediately to salient affective input. Such findings underline the importance of deviation from expected patterns (here, the outstanding arousal level of a single word in a text) for foregrounding effects (compare with the Neurocognitive Poetics model, NCPM, Jacobs, 2014, 2015a).
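The difference between the average and the extremes of lexical arousal can be shown with a small sketch. The function name and the arousal values are invented for illustration; only the definition of the span as maximum minus minimum comes from the text.

```python
def arousal_summary(arousal_values):
    """Mean, minimum, maximum, and span (max - min) of per-word arousal values."""
    peak = max(arousal_values)
    trough = min(arousal_values)
    return {
        "mean": sum(arousal_values) / len(arousal_values),
        "min": trough,
        "max": peak,
        "span": peak - trough,
    }


# Two hypothetical poems with the same mean arousal but different peaks.
even_poem = arousal_summary([2.0, 2.0, 2.0, 2.0])
peaked_poem = arousal_summary([1.0, 1.0, 1.0, 5.0])
```

Both toy poems have a mean of 2.0, but only the second contains a single highly arousing word, which is exactly the kind of difference that a mean-based predictor misses and a maximum-based predictor captures.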

Furthermore, our novel operationalization of the evolution of affective content throughout a text (correlating lexical affective values with word position) yields a number of interesting results: The respective measures for lexical valence and arousal evolution significantly contribute to predicting affective evaluations of poems' general "sadness," "friendliness," or "arousal." For instance, poems were perceived as sadder when affective values of words became increasingly negative and less arousing toward the end. In contrast, poems were perceived as more friendly when words of an increasingly positive character were used toward the final lines of a poem. We conclude that these correlations between word positions and affective values offer a good proxy for how overall affectivity is continuously created over the course of a poem, involving either a classical crescendo or a descent of affective intensity toward the end. In addition, this finding fits well with the established idea that readers naturally exert their greatest reading emphasis at the end of a sentence or passage (Gopen and Swan, 1990).
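A measure of this kind can be sketched as the Pearson correlation between a word's position in the poem and its affective value. The code below is a minimal illustration under that assumption; the function names and valence values are hypothetical, and details such as which words are included are not specified here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def affective_trend(values):
    """Correlate affective values with word positions 1..n."""
    positions = list(range(1, len(values) + 1))
    return pearson_r(positions, values)


# Invented valence values: one poem darkens toward the end, one brightens.
darkening = affective_trend([0.8, 0.4, 0.1, -0.5, -1.2])
brightening = affective_trend([-1.2, -0.5, 0.1, 0.4, 0.8])
```

A negative coefficient indicates that words become more negative toward the end of the poem, and a positive one indicates the opposite, mirroring the sadness and friendliness patterns described above.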

Last but not least, our data corroborate and extend recent findings on how sublexical phonological features influence affective processes during poetry reading. Aryani et al. (2016) have already shown for the same corpus how a sublexical, phonologically defined measure of the basic affective tone is significantly associated with both the author-given affective labels of single poems and the readers' evaluations of the general affective meaning. For instance, valence ratings of poems become more negative, or spitefulness ratings increase, when poems feature particularly many phonological segments of high arousal potential (i.e., segments occurring in many words of highly arousing lexical meaning, hence reflecting phonological iconicity of language). In the present study, using a large number of predictors from different text levels, we could show that these effects of basic affective tone indeed seem to occur independently of the lexical affective content of the poems, as effects persist even after the very robust effects of lexical affective values have been partialled out in our multiple regression models. Note also that control measures of the basic affective tone, based on all phonological segments rather than only the phonologically salient ones, only rarely account for significant amounts of variance of the ratings in our multiple regression models (and if so, only for specific subsyllabic units), while the EMOPHON measures based on phonological salience did so in eight out of nine regression models. This is strong evidence that phonological salience in combination with phonological iconicity can be considered an important poetic device. While the choice of words and their arrangement is obviously a major concern for poetic style, our data suggest that affective sublexical phonology may be crucial for choosing the words that best fit a given poetic purpose (see also "subliminal verbal patterning in poetry," Jakobson, 1980b). Importantly, our data also show that readers are evidently sensitive to phonological salience per se: Subjective ratings of poeticity and onomatopoeia were significantly associated with the number of phonological segments qualified as phonologically salient by the EMOPHON tool.

At the level of rating dimensions as dependent variables, and from a general perspective, our study offers an interesting comparison between rather global evaluations of the general affective meaning of poems using the terms of dimensional emotion models (valence and arousal), specific affective dimensions presumably best suited to the given corpus (sadness, spitefulness, and friendliness), and the more aesthetic evaluations of liking and poeticity, as well as the further evaluation of onomatopoeia. Goodness of fit for regression models trying to predict the latter three dimensions was clearly lower than for the other two groups. This is no surprise, as in the case of valence and arousal ratings, criteria and predictor variables are based on identical operationalizations of affect (as all predictors were quantified using valence and arousal values). The author-given labels of spitefulness, sadness, and friendliness deliver even more impressive fits, presumably because they might simply capture the entire variance of affective content of these poems in optimal ways. Still, our approach offers interesting insights on how more abstract evaluations of poetry (such as participants' liking of a poem or the ascription of poeticity and onomatopoeia to a text) relate to the basic affective dimensions of valence and arousal at lexical and sublexical textual levels: A remarkable finding is the decrease in general liking ratings of poems with increasing arousal, concerning both the words (or concepts dealt with) in a poem and its phonological content (also see Aryani et al., 2016, for the prominent role of sublexical arousal). Note that this might meet a general principle of emotion processing, as Fechner already related aesthetic preference to arousal states according to the "principle of the aesthetic middle," meaning that people prefer "a certain medium degree of arousal, which makes them feel neither overstimulated nor dissatisfied by a lack of sufficient occupation" (Fechner, 1876, vol. 2, pp. 217–218; also see Berlyne, 1971, and Wundt, 1874). As the general arousal level of the poems in the Enzensberger volume is on average very high, a lowered lexical arousal level, as indicated by the regression results for liking, would still be of medium value. This principle also seems to generalize to the evaluation of poeticity by our participants: Both very high and very low levels of lexical arousal go along with lesser ascriptions of poeticity to the poems. Also at the sublexical level, a rather low arousal level coincides with higher poeticity ratings. Hence, any extremes at the phonological and at the lexical content level rather seem to "disturb" the perception of poeticity. A similar pattern is present for the explicit evaluation of phonological content during onomatopoeia ratings: these increased with the number of phonologically salient segments, but decreased with deviations concerning the arousal level of these segments toward either the very exciting or the very calm end of the bipolar arousal scale. Most interestingly, they also decreased with increasing spreads of lexical valence and arousal. Again, the focus of our participants (at least the conscious one) on formal features of poetry appeared to be disturbed by an overly distracting affective variety at the level of semantic content.

FIGURE 6 | Bivariate correlations between sadness ratings and four predictors: the minimum values of lexical valence per poem (A), the sigma factors for sublexical arousal of all salient segments (B), the correlation coefficients between lexical arousal values and word positions in a poem (C; note that this correlation becomes significant after partialling out the influence of the other predictors), and the correlation coefficients between lexical valence values and word positions in a poem (D; again, note that this correlation becomes significant after partialling out the influence of the other predictors).

Taken together, while previous studies have reported a range of effects of specific text levels influencing the affective appeal of literature (e.g., Bestgen, 1994, or Whissell, 2009, for the lexical level; Lüdtke and Jacobs, 2015, for the inter-lexical level; Aryani et al., 2016, for the sublexical level), in this study we can show in one joint explorative approach how sublexical, lexical, and inter-lexical affective features combine in constituting considerable parts of the perceived general affective meaning as well as further aesthetic and onomatopoetic evaluations of poetry.

# LIMITATIONS AND FUTURE PROSPECTS

What we consider a characteristic strength of the current approach certainly represents a shortcoming when it comes to delivering a comprehensive model of poetry perception: our approach is very basic, or even minimalistic, and contrasts with the standard investigation of the affective perception of poetry, which normally involves supra-lexical context or readers' personality features as well. While this alternative approach interestingly matches current computer-based approaches to poetic writing (Kirke and Miranda, 2013; Misztal and Indurkhya, 2014), it does not take into account well-established phenomena such as familiarity (Bohrn et al., 2013) or comprehensibility (Leder et al., 2012) in poetry perception, nor does it allow for generalizing over different populations of readers. The latter might especially matter, considering that poetry may be "consumed" differently by expert readers with specific expert poetry reading strategies in comparison to inexperienced readers (see, e.g., Hanauer, 1995, on differences in literariness ratings between expert and novice readers, and Hanauer, 1996, regarding poeticity ratings), whereas our sample represents a randomly selected group of participants. For example, it is important to consider that people naïve to art may generally prefer artwork that provides them with warm, i.e., positive and low-arousing, feelings (Winston and Cupchik, 1992). Furthermore, people less experienced with poetry might be less aware of more sophisticated stylistic devices or further meanings on a meta-level. Hence, basic textual features may play a bigger role in forming the general affective meaning of poetry for lay people than for experienced poetry readers. It would be interesting to investigate through follow-up studies with expert poetry consumers whether the influence of basic textual levels on affective perception would decrease with expertise. Moreover, future studies trying to complete the "emotion potential function" (Jacobs, 2015a,b) for literary texts might have to include many further contextual and personality features of the readers to come up with a more complex account of affective perception of poetry.

Also, the wide variety of poetic œuvres certainly calls for cross-validations of findings with different text material and in different languages, including prose as well as everyday written and spoken language. Further, the choice of textual measures could still be extended (for example, by integrating morphemic and syntactic text levels) and refined (for example, in terms of the inter-lexical measures). The merit of this study might thus lie in having made first explorative steps toward investigating, or having opened initial insights on, text-based affective potential functions for several aspects of the general affective meaning. These innovative insights may also compensate for methodological disadvantages of our statistical approach, which uses a large number of predictors in stepwise multiple regression. While we opted for this specific method as it seems optimal when screening for the most influential among a wide range of possible candidate measures, future studies may apply more fine-grained methods to disentangle the details concerning the interplay of a restricted number of variables according to more specific research questions.

Future studies should, in particular, extend our investigations to (i) the works of other writers—as some of our findings may in theory result from an idiosyncratic writing style of H. M. Enzensberger, (ii) (non-) literary texts or even everyday speech in different languages, and (iii) affective ratings from different types of reader groups including expert readers.

# CONCLUSION

In this study we focused on how and to what extent affective connotations of very basic textual measures at the lexical, inter-lexical, and even sublexical level of a poem (all of which can be derived from existing normative databases) determine the perception of the general affective meaning of poetry in a way that proves quantifiable beyond the specific context of a given poem, author, or recipient. By applying an exhaustive exploratory regression analysis to a comprehensive corpus of poems and their ratings from hundreds of readers, we found that a significant amount of variance in discrete and dimensional affective ratings of poetry can be accounted for solely by text-based affective measures from different levels of processing. In all of the presented statistical models focusing on different aspects of the general affective meaning, the variance of each rating dimension is significantly accounted for by affective properties of several text levels: while the lexical one generally explains the largest amount of variance, further significant effects in explaining residual variance are found for the sublexical and inter-lexical text levels. Thus, our research brings together previous accounts on specific effects of single text levels, showing how they may co-exist each in their own right or interact to constitute the complex holistic framework of poetry perception. Taken together, the affective properties of text elements from all three text levels could account for 43–70% of the variance in the perceived general affective meaning of the poetry used here and still for 23–48% of the variance in further aesthetic and onomatopoetic evaluations of the poems, a substantial amount purely accounted for by textual elements, which should not be neglected in future affective analyses of poetry.
This mixed-level approach represents a first step toward quantifying and computationally modeling what Jakobson hypothesized about the "Framework of Language" (1980): "Each [text] level above brings new particularities of meaning . . . ." Our explorative regression models may guide the way for various future ideas on interrelations between specific textual features and the perception of general affective meaning in further poem corpora and other literary work.

# AUTHOR CONTRIBUTIONS

MC as principal investigator developed the project idea, raised the funds from the cluster "Languages of Emotion," coordinated the project, and provided major contributions to all parts from preparation of data collection and data analyses to writing of the manuscript. SU collected and analyzed the data and wrote the main part of the manuscript. AA was responsible for some of the computational aspects, especially regarding the Emophon tool developed by him, and offered important critical feedback. MK initiated the idea to use the poem corpus and gave helpful input from her philological perspective throughout the whole research process. AJ gave important input regarding the theoretical framework of the study. All authors substantially contributed to the conception and interpretation of the work, revised it critically, and agree to be accountable for all aspects of the work.

# ETHICS STATEMENT

This study was approved by the ethics committee of the Freie Universität Berlin and was conducted in compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). We conducted a non-experimental, voluntary online survey, in which people had to read and judge poems. In the instructions we told the participants that they could quit the survey at any time. If they had any questions regarding the survey, they could contact us at any time (e-mail addresses provided). Only one participant of the online survey was a minor (17 years old). We did not have any additional instructions for minors or their parents. However, we assume that rating poetry does not differ significantly between teenagers and adults.

# ACKNOWLEDGMENTS

This research was funded through the grant 410 "Sound physiognomy in language organization, processing and production" to MC from the German Research Foundation (DFG) via the Cluster of Excellence "Languages of Emotion" at the Freie Universität Berlin. We thank Kai-Michael Würzner for advice on the PoS tagging of the poems. We acknowledge support by the Open Access Publication Funds of the Freie Universität Berlin.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.02073/full#supplementary-material

# REFERENCES

of beauty and familiarity. Brain Lang. 124, 1–8. doi: 10.1016/j.bandl.2012.10.003

in reading: towards a neurocognitive poetics)," in Sprachen der Emotion (Languages of Emotion), eds G. Gebauer and M. Edler (Frankfurt: Campus), 134–154.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Ullrich, Aryani, Kraxenberger, Jacobs and Conrad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.