From Abstract Symbols to Emotional (In-)Sights: An Eye Tracking Study on the Effects of Emotional Vignettes and Pictures

Reading is known to be a highly complex, emotion-inducing process, usually involving connected and cohesive sequences of sentences and paragraphs. However, most empirical results, especially from studies using eye tracking, are either restricted to simple linguistic materials (e.g., isolated words, single sentences) or disregard valence-driven effects. The present study addressed the need for ecologically valid stimuli by examining the emotion potential of, and reading behavior in, emotional vignettes, a text type often used in applied psychological contexts and discourse comprehension. To allow for a cross-domain comparison in the area of emotion induction, negatively and positively valenced vignettes were constructed based on pre-selected emotional pictures from the Nencki Affective Picture System (NAPS; Marchewka et al., 2014). We collected ratings of perceived valence and arousal for both material groups and recorded the eye movements of 42 participants during reading and picture viewing. Linear mixed-effects models were fitted to analyze effects of valence (i.e., valence category, valence rating) and stimulus domain (i.e., textual, pictorial) on ratings of perceived valence and arousal, eye movements in reading, and eye movements in picture viewing. Results supported the success of our experimental manipulation: emotionally positive stimuli (i.e., vignettes, pictures) were perceived as more positive and less arousing than emotionally negative ones. The cross-domain comparison indicated that vignettes are able to induce stronger valence effects than their pictorial counterparts; no differences between vignettes and pictures were found regarding effects on perceived arousal. Analyses of eye movements in reading replicated results from experiments using isolated words and sentences: perceived positive text valence was associated with shorter reading times than perceived negative valence at both the supralexical and the lexical level.
In line with previous findings, no emotion effects on eye movements in picture viewing were found. This is the first eye tracking study reporting superior valence effects for vignettes compared to pictures and valence-specific effects on eye movements in reading at the supralexical level.


INTRODUCTION
Imagine a future where best-selling books are not the sole product of an author's mind but the result of a machine-learning-assisted approach: a future with personalized phrasing and e-books that can predict your reading behavior. What would be the key to such a future? For psychological reading research, it would certainly require a stronger focus on ecologically valid study materials (Jacobs, 2015a; Pinheiro et al., 2017; Xue et al., 2019). So far, most empirical results, especially from studies using eye tracking, are limited to the level of single words or experimentally controlled sentences (Clifton et al., 2007; Radach et al., 2008; Radach and Kennedy, 2013; Wallot et al., 2013). By contrast, reading, as one of our essential daily activities, commonly involves context information and is accompanied by emotional processes (e.g., Jacobs, 2011; Mar et al., 2011; Bohn-Gettler, 2019). This leads to the second key point: the future scenario calls for a better understanding of affective responses elicited by ecologically valid text stimuli. In discourse comprehension, many studies have used textual materials and shown, for example, that the emotions of protagonists are represented in situation models even when not explicitly mentioned (e.g., Gernsbacher et al., 1992; Gygax et al., 2003, 2004, 2007). However, such studies have largely neglected both readers' emotions and valence-driven effects. The scientific investigation of affective processes requires standardized stimuli that reliably elicit emotions under controlled, experimental conditions. At present, researchers have access to a variety of cross-validated, international databases that address different perceptual modalities and provide normative ratings.
However, verbal stimulus sets are again restricted to the level of words (e.g., Bradley and Lang, 1999; Redondo et al., 2007; Võ et al., 2009; Eilola and Havelka, 2010; Briesemeister et al., 2011; Soares et al., 2012; Moors et al., 2013; Söderholm et al., 2013; Warriner et al., 2013; Montefinese et al., 2014; Schmidtke et al., 2014; Riegel et al., 2015; Imbir, 2016a) or single sentences (Imbir, 2016b; Pinheiro et al., 2017). In addition, emotion induction has predominantly relied on visual stimuli such as pictures (Dan-Glauser and Scherer, 2011), while little attention has been paid to the comparison of verbal and visual stimulus domains. For example, early meta-analyses on the efficiency of emotion induction procedures neither differentiated between stories and films nor included static pictures (Gerrards-Hesse et al., 1994; Westermann et al., 1996). Even when these were differentiated and included, the heterogeneous definition of vignettes (cf. Siedlecka and Denson, 2019) made it difficult to draw conclusions about their suitability.
With respect to simpler linguistic materials, Schlochtermeier et al. (2013) highlighted potentially beneficial effects of words and phrases on evaluative judgments. More specifically, their behavioral results revealed stronger valence ratings for verbal than for pictorial stimuli. Moreover, no differences in reaction times or arousal ratings were reported, contradicting the commonly assumed privileged processing of pictures in the area of emotion induction (e.g., Azizian et al., 2006; Kensinger and Schacter, 2006; Seifert, 1997). Similarly, Bayer and Schacht (2014) showed that words, faces, and pictures all elicit early and late emotion effects, as indicated by event-related potentials. Furthermore, words and pictorial materials were perceived as comparably strong in their emotional valence and arousal.
The present study was designed to address some of the aforementioned challenges by introducing a set of ecologically valid, emotion-inducing vignettes that verbalize the semantic content of pre-selected pictures from the Nencki Affective Picture System (NAPS; Marchewka et al., 2014). For both textual and pictorial stimuli, eye movements, a sensitive measure of cognitive and affective processes (Rayner, 1998, 2009), were recorded and analyzed. Accordingly, our study aimed to extend prior findings with two main objectives: (1) to compare more complex verbal (i.e., textual) and visual affective materials in the area of emotion induction, and (2) to examine the influence of emotional content on reading behavior in ecologically valid texts.
The article is organized as follows. After a review of past research on emotion in reading and picture viewing, we first analyze effects of Valence Category (i.e., positive, negative) and Stimulus Domain (i.e., textual, pictorial) on ratings of Perceived Valence and Arousal. Second, we examine the influence of (perceived) textual valence on reading times for both supralexical (i.e., text level) and lexical units (i.e., word level). Lastly, we address the role of pictorial valence in the execution of fixations (i.e., Mean Fixation Duration, Total Number of Fixations) and saccades (i.e., Mean Saccade Amplitude). While eye tracking data are reported for both stimulus domains, the thematic focus and innovation of the present article lie mainly in the emotional processes evoked by linguistic stimuli.
Supporting evidence for differences in the processing of emotional and neutral words was first provided by behavioral studies and experiments using EEG and fMRI (Citron, 2012, for review). Eye movement studies have also indicated such differences (Scott et al., 2012; Knickerbocker et al., 2015). These studies examined emotionally valenced (i.e., positive, negative) and neutral target words embedded in a single sentence structure; Scott et al. (2012) additionally manipulated word frequency. In both experiments, early processing measures (e.g., single and first fixation duration) indicated faster reading of emotional than of neutral words. Moreover, emotional valence seemed to confer a similar advantage at later processing stages, as reflected in shorter total reading times, fewer regressions, and shorter second-pass reading times (Knickerbocker et al., 2015). However, valence-specific effects remained unexplored in this comparison. Both studies replicated results from EEG and fMRI studies indicating that emotional words are easier to process than their neutral counterparts, while highlighting some differences between emotionally positive and negative words. In this context, modulatory effects of word frequency were reported (Scott et al., 2012): negative valence was beneficial only for low-frequency targets, whereas processing advantages of emotionally positive words emerged robustly under all experimental conditions. Following a dimensional approach to emotion, a word's emotion potential can be quantified empirically and computationally in a two-dimensional space, with valence representing its polarity and arousal its intensity (Võ et al., 2006; Scott et al., 2012; Recio et al., 2014; Jacobs, 2019).
Since the two variables are strongly intercorrelated, high arousal commonly goes hand in hand with extreme valence (Bradley and Lang, 1994; Lang et al., 2008; Hofmann et al., 2009; Citron, 2012; Jacobs et al., 2015). Moreover, emotionally negative words tend to reach higher arousal values than emotionally positive ones (e.g., Võ et al., 2009). Concerning valence-specific effects, positive events (e.g., words, sentences) are often associated with accelerated reactions and facilitated word processing (Kousta et al., 2009; Briesemeister et al., 2011; Lüdtke and Jacobs, 2015). In the case of negative valence, the often inconclusive effects are mainly explained by its interaction with the arousal dimension: emotionally negative words are mainly associated with shorter reaction times when they are high in arousal (Larsen et al., 2008; Hofmann et al., 2009; Recio et al., 2014). In sum, the current evidence supports the notion of superior processing of emotionally positive and high-arousal negative words.
Can we expect similar results when manipulating valence at the supralexical, textual level? According to Bestgen (1994), we can assume a high correlation between the different processing levels: by collecting valence ratings for four texts and their constituent sentences and words, Bestgen showed significant correlations among the three levels. Similarly, Whissell (2003) demonstrated that valence and arousal ratings of words from the Dictionary of Affect in Language (Whissell and Dewson, 1986) can be used to estimate the affective tone of excerpts of romantic poetry. Finally, Hsu et al. (2015b) computed mean and spread measures of valence and arousal for the words of 120 text passages from the Harry Potter novels. Their results indicated that mean lexical valence can account for approximately 28% of the variance in subjective valence ratings of the text units. Taken together, previous results suggest that the valence of supralexical units like the present vignettes can, in its simplest form, be predicted (at least approximately, cf. Lüdtke and Jacobs, 2015) as a function of the valence of their constituent words (Jacobs, 2015b).
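The simplest form of this idea, estimating a text's valence from the valence of its constituent words, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of Hsu et al. (2015b) or of the present study: the lexicon `VALENCE_NORMS`, its scale, and its values are hypothetical placeholders for a real normative database, tokens not covered by the lexicon are simply ignored, and "spread" is assumed to be the population standard deviation. The helper `r_squared` shows how a figure such as "approximately 28% of the variance" is obtained for a simple linear regression of text ratings on mean lexical valence.

```python
from statistics import mean, pstdev

# Hypothetical word-valence lexicon (illustrative values on an assumed -3..+3 scale);
# a real analysis would use a normative database instead.
VALENCE_NORMS = {
    "sunny": 2.1, "beach": 1.8, "laugh": 2.5,
    "storm": -1.6, "funeral": -2.7, "alone": -1.2,
}


def text_valence(tokens, norms):
    """Return (mean, spread) of word valence over the tokens covered by `norms`."""
    scores = [norms[t] for t in tokens if t in norms]
    if not scores:
        raise ValueError("no token of the text is covered by the lexicon")
    return mean(scores), pstdev(scores)


def r_squared(x, y):
    """Share of variance in y explained by a simple linear regression on x (= r**2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    vignette = "you laugh with friends on a sunny beach".split()
    m, sd = text_valence(vignette, VALENCE_NORMS)
    print(f"mean valence = {m:.2f}, spread = {sd:.2f}")
```

With ratings for many texts, `r_squared(mean_lexical_valences, text_ratings)` would give the proportion of explained variance; restricting `tokens` to content words would mirror the focus on content words (cf. Bestgen, 1994).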

Vignettes as Controlled, More Natural Reading Material
As already pointed out, the majority of reading studies fail to go beyond the level of single words or non-literary constructed sentences, so-called textoids (Bailey and Zacks, 2011). Although this experimental approach makes it possible to test specific assumptions, the results can be generalized only to a limited extent (Clifton and Staub, 2011). Frazier and Rayner (1982) had already shown that fixation times are influenced by phrase structure. The context in which a word is presented plays a similarly crucial role (Kuperman et al., 2010; Clifton and Staub, 2011; Wallot et al., 2013). Hence, the use of single words neglects the entangled effects of both syntactic and supralexical semantic features (Boston et al., 2008). To overcome such limitations, existing narratives have become an attractive alternative. Short stories (e.g., Altmann et al., 2012, 2014), fairy tales (Wallentin et al., 2011), poems (Lüdtke et al., 2014; Jacobs et al., 2016b; Xue et al., 2019), excerpts of books such as the Harry Potter novels (e.g., Hsu et al., 2014, 2015a), "The House Of The Scorpion" (Wallot et al., 2013), "The Sandman" (Jacobs, 2015b; Lehne et al., 2015), "One Boy's Day" (Speer et al., 2009), "Dubliners" (Cupchik et al., 1998), and "Hurricane Hazel" (Cupchik and Laszlo, 1994), as well as newspaper articles (Kennedy and Pynte, 2005), have all been used as objects of reading research. While the results thus obtained may be high in ecological validity, they may also leave ample degrees of freedom for their interpretation (cf. Clifton and Staub, 2011).
What if we seek to combine the benefits of both short textoids and natural reading materials? In the case of prose, such a compromise can be found in the construction of vignettes. The term refers to short, written descriptions of fictitious situations and/or persons (Poulou, 2001). Vignettes usually contain background information and offer readers a basis for evaluative judgments (Huebner, 1991; Poulou, 2001). They have been used in teaching (Finch, 1987; Brophy and McCaslin, 1992; Gavrilidou et al., 1993; Poulou, 2001); in appraisal (Robinson and Clore, 2001), cognitive (Filik and Leuthold, 2013), and emotion recognition research (Camras et al., 1983; Reichenbach and Masters, 1983; Ribordy et al., 1988); to study theory of mind (Ishii et al., 2004) and situational empathy (de Wied et al., 2005); and to study emotion processing in healthy people (Wilson-Mendenhall et al., 2013), in patients with schizophrenia (Kuperberg et al., 2011), and in patients with borderline personality disorder (Levine et al., 1997). Complementary to these applications, the present study evaluates the usefulness of vignettes in the area of emotion induction.

Going Beyond Emotion Inferences
In discourse comprehension, vignettes have become prominent stimuli for studying emotional inferences, that is, inferences about the emotions experienced by the characters of a story. One often-cited set of 24 vignettes was first published by Gernsbacher et al. (1992). These short stories were constructed to examine how readers infer and represent a protagonist's emotional states that are not explicitly mentioned. Since then, the stories have been widely used and adapted in further studies investigating the specificity and content of emotional inferences (Gygax et al., 2003, 2004, 2007; Gillioz et al., 2012; Gillioz and Gygax, 2017). It has been shown that emotional inferences are part of readers' mental representations (Miall, 1989; Graesser et al., 1994), are rather general (e.g., Gygax et al., 2003, 2004), and may refer only to certain parts (e.g., behavioral descriptions) of a multi-componential emotion construct (Gygax et al., 2007; Gillioz and Gygax, 2017).
Most of the above-mentioned studies manipulated a target sentence containing either a matching or a mismatching emotion term (e.g., Gygax et al., 2003, 2004) or behavioral description (e.g., Gygax et al., 2007) and focused on reading times for the target sentences to explore effects of consistent versus inconsistent emotional information. They neither examined emotions elicited in the reader nor compared reactions to positive and negative valence. One exception was published by León et al. (2015). The authors constructed short texts of four sentences with a positive, negative, or neutral valence. The last sentence ended with a target word that was either related to the context (e.g., the word happy in an emotionally positive context), non-related, or a non-word, and participants performed a lexical decision on it. Analyses of the corresponding reaction times revealed faster reactions to both related and non-related words when these were presented in an emotionally positive rather than a negative context.
Notably, effects of emotional valence are likely to go beyond the level of inferences. In a recent study by Megalakaki et al. (2019), native French speakers read easily understandable texts varying in emotional valence and intensity. Subsequently, participants answered comprehension questions on different levels of discourse (i.e., textbase, surface level, inference level). Analyses of their answers revealed that positive valence facilitated the comprehension of textual content (surface level), whereas negative valence favored the construction of inferences (inference level). In addition, high emotional intensity promoted the understanding of emotionally positive texts but impeded comprehension in a negative valence context. Hence, textual valence was found to influence the comprehensibility of reading materials. More importantly, the role of valence needs to be considered in eye tracking studies, since eye movements have been shown to be affected by both text difficulty (Rayner et al., 2006; Lüdtke et al., 2019) and valence (e.g., Scott et al., 2012; Knickerbocker et al., 2015; Ballenghein et al., 2019).
In the present study, emotionally positive and negative vignettes were constructed to examine their emotion induction potential and to analyze effects of emotional content on reading behavior. Although emotional vignettes have been applied in research on emotional inferences, valence-specific effects have largely remained unexplored. Moreover, most of the above-mentioned studies used the onlooker perspective (i.e., the pronoun "he/she"). However, the personal perspective (i.e., the pronoun "you") was found to cause both greater internalization of emotional narratives (Brunyé et al., 2011) and stronger effects of positive emotion induction (Child et al., 2020). The empathic engagement thus provoked is assumed to facilitate immersion (Jacobs, 2015b): while reading, we start to forget about the physical world around us and feel transported into the book's fictitious setting. According to the Neurocognitive Poetics Model of literary reading (NCPM; Jacobs, 2011, 2015b), immersion leads to faster reading (i.e., shorter fixations, longer saccades), which makes it directly relevant to the analysis of eye movements. Hence, when examining online reading behavior in emotional vignettes, the immersion potential should be considered, as it might interact with both the valence manipulation and reading behavior.

Emotion in Picture Viewing
Living in the digital age, we are constantly exposed to pictorial stimuli such as personal photos or social media posts. Related research has highlighted the influence of pictures in general, and of their semantic relevance in particular, on attentional processes during visual perception (Pilarczyk and Kuniecki, 2014; Keib et al., 2016). It is now widely acknowledged that fixations are biased toward informative regions of our perceptual field (Henderson, 2003). These are areas that stand out either because they differ strongly from their surroundings in low-level visual features (i.e., high visual saliency) or because they convey emotional meaning (i.e., high semantic relevance). When both factors are compared, semantically rich regions tend to attract more fixations than visually salient ones (e.g., Pilarczyk and Kuniecki, 2014). Consequently, eye movements are substantially guided by the distribution of emotional content (cf. Budimir and Palmović, 2011).
Previous studies examining effects of emotional pictures have stressed processing differences between positive, negative, and neutral pictures (Olofsson et al., 2008, for review). In this context, the privileged processing of emotional stimuli has been explained in terms of their evolutionary and motivational relevance. Although a special role of negatively valenced stimuli (e.g., snakes) has been put forward (e.g., Fox et al., 2000; Yiend and Mathews, 2001; Calvo et al., 2006), the majority of results support arousal-driven, valence-independent effects. Accordingly, emotional compared to neutral stimuli were found to initially attract and then maintain attention (Calvo and Lang, 2004; Calvo and Avero, 2005; Nummenmaa et al., 2006; Carniglia et al., 2012).
With respect to the present study, results from previous eye tracking experiments using a free viewing paradigm with pictures presented one at a time are of particular interest, as this is the paradigm used for emotion induction. Eye movements can serve as an indicator of (overt) attentional processes, since the two are tightly coupled: an attentional shift is usually linked to the execution of a saccade (Findlay and Gilchrist, 2003). To the best of our knowledge, only a handful of eye tracking studies have applied this paradigm in the area of emotion induction (Christianson et al., 1991; Bradley et al., 2008, 2011; Niu et al., 2012; Yang et al., 2012; Lanatà et al., 2013; Henderson et al., 2014). Among them, only two allow for a comparison of eye movements on positively and negatively valenced pictures. Bradley et al. (2011) presented emotionally charged and neutral pictures for a free viewing period of 6 s. Eye movements were analyzed in terms of three parameters: number of fixations, average fixation duration, and total scan path (i.e., the summed length of all saccades). Their results showed that emotional compared to neutral pictures elicited longer scan paths and attracted more, but shorter, fixations. No valence-specific effects on eye movements were found. Niu et al. (2012) addressed the question of the extent to which gaze behavior is driven by visually salient versus affective features, and therefore performed their analyses at the level of predefined areas of interest. High arousal proved to increase the probability of fixations on affective regions, independent of pictorial valence.
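The three parameters used by Bradley et al. (2011), together with the Mean Saccade Amplitude examined in the present study, can all be derived from the fixation sequence of a single trial. The Python sketch below is illustrative only and rests on assumptions the text does not state: fixations are given in temporal order with screen coordinates in pixels, and each saccade is approximated by the straight-line distance between consecutive fixation centres. Eye tracking software typically reports saccade amplitude from detected saccade events, often in degrees of visual angle, so the numbers here are not directly comparable. The `Fixation` class and `viewing_metrics` function are hypothetical names.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    x: float         # horizontal position in screen pixels (assumed unit)
    y: float         # vertical position in screen pixels (assumed unit)
    duration: float  # fixation duration in ms


def viewing_metrics(fixations):
    """Summarize one free-viewing trial; `fixations` must be in temporal order."""
    if not fixations:
        raise ValueError("trial contains no fixations")
    n = len(fixations)
    mean_duration = sum(f.duration for f in fixations) / n
    # Approximate each saccade as the straight line between consecutive fixation centres.
    amplitudes = [hypot(b.x - a.x, b.y - a.y) for a, b in zip(fixations, fixations[1:])]
    scan_path = sum(amplitudes)  # total scan path = summed saccade lengths
    mean_amplitude = scan_path / len(amplitudes) if amplitudes else 0.0
    return {
        "n_fixations": n,
        "mean_fixation_duration": mean_duration,
        "scan_path_length": scan_path,
        "mean_saccade_amplitude": mean_amplitude,
    }


if __name__ == "__main__":
    trial = [Fixation(0, 0, 200), Fixation(3, 4, 300), Fixation(3, 10, 250)]
    print(viewing_metrics(trial))
```

In a real analysis, these per-trial summaries would then be entered as dependent variables, one row per participant and picture.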
In sum, while indicating differences between affective and neutral pictures, studies with a comparable experimental design suggest an absence of valence-specific effects on eye movements in healthy participants. It should be noted that the majority of the above-mentioned studies used pictures from the International Affective Picture System (IAPS; Lang et al., 2008). Despite the widespread use of this cross-validated database, three associated issues should be considered (Dan-Glauser and Scherer, 2011; Marchewka et al., 2014). First, the vast majority of pictures contain people as primary objects, which limits the database's usefulness for studying content-specific effects. Second, due to its frequent use, familiarity effects might occur and possibly reduce emotion-inducing effects. Third, some stimuli are outdated and of lower visual quality. The NAPS (Marchewka et al., 2014) provides a comprehensive, modern alternative that addresses these issues.

Emotion Induction: The Role of Stimulus Domains
So far, evidence supporting the privileged sensory and cognitive processing of emotional compared to neutral stimuli has been provided for both pictorial and verbal materials. However, pictures have been claimed to induce stronger emotional reactions than words (e.g., De Houwer and Hermans, 1994; Larsen et al., 2003; Bayer and Schacht, 2014, for review). This view has largely been supported by evidence from EEG (e.g., Azizian et al., 2006) and fMRI (e.g., Kensinger and Schacter, 2006) studies stressing temporal and topographical differences. According to dual coding theories (e.g., Glaser, 1992), pictorial and verbal materials differ in their processing channels and semantic representations. In this context, the reported superior processing of pictures compared to words has been explained by their more direct access to the semantic system (e.g., Seifert, 1997). Linguistic stimuli, as abstract and learned symbols, were thus assumed to require additional translational processes for the extraction of meaning (cf. Schlochtermeier et al., 2013). With respect to emotional valence, different processing biases have been reported for pictures and words (Bayer and Schacht, 2014): pictures were associated with a negativity bias, whereas verbal stimuli were claimed to show preferential processing of positive valence.
Notably, differences between the two stimulus domains have mostly been reported when the processing of mere perceptual features was sufficient to perform the task (e.g., Pegna et al., 2004; Hinojosa et al., 2009; Schacht and Sommer, 2009a; Rellecke et al., 2011). Moreover, many of these studies did not collect evaluative judgments and thus missed the analysis of perceived emotion effects. In contrast, when semantic processing was required, a comparable effectiveness of both stimulus domains has been put forward (Schacht and Sommer, 2009a; Schlochtermeier et al., 2013; Tempel et al., 2013; Bayer and Schacht, 2014). For example, the EEG study by Bayer and Schacht (2014) compared effects of emotional (positive, negative) and neutral words and pictures using a recognition memory task. Both stimulus domains elicited emotion effects at early and late processing stages. In addition, valence ratings revealed that words were perceived as more pleasant within the groups of positive and neutral stimuli. For arousal, no main effect of stimulus domain was found; that is, words were not rated as less arousing in general. Interestingly, when the complexity of pictures was reduced (e.g., by using black-and-white pictograms), superior emotion effects of words compared to their pictorial counterparts were reported (Tempel et al., 2013). In these imaging studies (i.e., EEG, fMRI), effects of emotionally positive and neutral stimuli were examined for both materials (i.e., pictures, words) while accounting for differences in stimulus complexity. At the neural level, words were found to elicit more widespread and larger activity than pictures. Moreover, positively valenced words elicited faster (Tempel et al., 2013) as well as stronger subjective ratings of emotional valence.
Most of the above-mentioned studies used words, especially nouns (e.g., Hinojosa et al., 2009, 2010; Bayer and Schacht, 2014), to represent the verbal stimulus domain. However, processing differences between words and pictures might be partly mediated by stimulus complexity. When this confounding factor is controlled for by using more complex linguistic materials (e.g., phrases), processing differences tend to disappear. In sum, previous studies suggest that both task demands and stimulus complexity play a crucial role when comparing the emotion induction potential of verbal stimuli and pictures (e.g., Hinojosa et al., 2010; Bayer and Schacht, 2014). To the best of our knowledge, this is the first article to directly compare (i.e., within subjects) emotional vignettes and pictures with shared semantic content. Moreover, the present study addresses the suggested importance of recording individual ratings in the same group of participants from whom further variables of interest (e.g., physiological activity) are collected. For example, it has been shown that evaluative judgments of arousal differ from the provided normative ratings (Olofsson et al., 2008). In line with this perspective, we aimed to operationalize valence-specific effects through perceived valence (i.e., subjective ratings) rather than experimentally manipulated valence (i.e., positive, negative). Individual differences in the perception of emotional stimuli were thus anticipated and accounted for.

Hypotheses
The present study aimed to examine effects of emotional materials (i.e., vignettes, pictures) on (1) ratings of Perceived Valence and Arousal, (2) eye movements in reading, and (3) eye movements in picture viewing. We therefore used 40 emotionally valenced (i.e., positive, negative) pictures and 40 corresponding vignettes. We assumed that our valence manipulation would influence subjective ratings of both Perceived Valence and Arousal. More specifically, based on the strong negative linear relationship between valence and arousal reported for the NAPS database (Marchewka et al., 2014), we expected that emotionally positive stimuli (i.e., vignettes, pictures) would, on average, be rated more positively (i.e., Perceived Valence) and as less arousing (i.e., Perceived Arousal) than emotionally negative ones.
Based on prior findings indicating stronger valence effects of emotionally positive words compared to pictures (e.g., Schlochtermeier et al., 2013; Tempel et al., 2013; Bayer and Schacht, 2014), we also hypothesized that emotionally positive vignettes would, on average, be rated more positively than emotionally positive pictures. Moreover, we assumed that this domain-specific effect would also apply to the negative valence category; that is, we expected that emotionally negative vignettes would, on average, be perceived more negatively than emotionally negative pictures. With respect to ratings of Perceived Arousal, there is evidence that words can induce arousal levels comparable to those elicited by pictures (e.g., Schlochtermeier et al., 2013; Bayer and Schacht, 2014). Consequently, we assumed that the stimulus domains (i.e., textual, pictorial) would not differ in their induced arousal levels.
With respect to effects of Perceived Valence on eye movements during the reading of ecologically valid stimuli, reading times for both the vignettes and their constituent words were of primary interest. Based on previous results showing faster processing of emotionally positive words, sentences, and texts (e.g., Kousta et al., 2009; Briesemeister et al., 2011; Lüdtke and Jacobs, 2015; Ballenghein et al., 2019), we assumed that vignettes perceived as emotionally positive would, on average, attract shorter reading times than vignettes perceived as emotionally negative. In this context, the first eye tracking study examining valence-specific effects at the supralexical level indicated the shortest reading times for positive narratives, followed by negative and, lastly, neutral ones.
In line with reported correlations between lexical and textual valence ratings (Bestgen, 1994; Whissell, 2003; Hsu et al., 2015b; Jacobs, 2015b), we expected that Perceived Valence would likewise affect reading times at the lexical level (i.e., words). Since our study concerns effects of an affective semantic superfeature (valence; Jacobs et al., 2016a), content words were of main interest (cf. Bestgen, 1994). Thus, we expected that content words in vignettes perceived as emotionally positive would, on average, attract shorter reading times than content words in vignettes perceived as emotionally negative. Since it remained unclear at which processing stage valence-specific effects on reading times for lexical units (i.e., words) would become evident, eye tracking measures reflecting both early (e.g., first fixation duration) and later (e.g., word total reading time) processes were considered (cf. Lüdtke et al., 2019).
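The distinction between early and later word-level measures can be illustrated with a short Python sketch that derives first fixation duration, gaze duration, and total reading time from a fixation sequence. The sketch assumes, hypothetically, that each fixation has already been assigned to a word index (numbered left to right) and follows one common convention: first-pass measures are defined only for a word that is entered before any word to its right has been fixated, so a word skipped in first pass and fixated only after a regression receives a total reading time but no first-pass values. Gaze duration is added as a further common early measure; the function name `word_measures` is hypothetical, and restricting the output to content words would be a separate filtering step.

```python
from collections import defaultdict


def word_measures(fixations):
    """Derive word-level reading times from fixations.

    `fixations` is a list of (word_index, duration_ms) tuples in temporal order.
    Returns three dicts keyed by word index: first fixation duration (FFD),
    gaze duration (GD, sum of first-pass fixations), and total reading time (TRT).
    """
    ffd, gd = {}, {}
    trt = defaultdict(float)
    rightmost = -1  # right-most word fixated so far
    i = 0
    while i < len(fixations):
        word, first_duration = fixations[i]
        # Collect the run of consecutive fixations on the same word.
        run_total, j = 0.0, i
        while j < len(fixations) and fixations[j][0] == word:
            run_total += fixations[j][1]
            j += 1
        trt[word] += run_total
        if word > rightmost:
            # First-pass visit: no word to the right has been fixated yet.
            ffd[word] = first_duration
            gd[word] = run_total
            rightmost = word
        i = j
    return ffd, gd, dict(trt)


if __name__ == "__main__":
    # Word 2 is skipped in first pass and only fixated after a regression.
    seq = [(0, 200), (1, 220), (1, 180), (3, 250), (2, 210), (3, 190)]
    ffd, gd, trt = word_measures(seq)
    print(ffd, gd, trt)
```

Summing all fixation durations in a trial would give a simple text-level reading time, although the supralexical measure actually used in the study is not specified here.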
Lastly, we aimed to examine the influence of Perceived Valence on eye movements during picture viewing. In line with previous eye tracking studies suggesting an absence of valence-specific effects (e.g., Bradley et al., 2011; Niu et al., 2012), we assumed that pictures perceived as emotionally positive would elicit scan (e.g., Mean Saccade Amplitude) and fixation (e.g., Total Number of Fixations) patterns similar to those elicited by pictures perceived as emotionally negative.

MATERIALS AND METHODS
To examine the above-stated hypotheses, eye movements of 42 participants were recorded while they read 40 emotional vignettes and viewed 40 emotional pictures. Textual stimuli were constructed based on pre-selected emotional pictures and validated in two pilot studies. For both material groups, stimuli were presented one at a time, each followed by an evaluative judgment task in which participants assessed the emotional valence and arousal of the stimulus. Linear mixed-effects models were then fitted to analyze effects of valence (i.e., Valence Category, Valence Rating) and Stimulus Domain on (1) ratings of Perceived Valence and Arousal, (2) eye movements in reading, and (3) eye movements in picture viewing. Since the present study was designed to address effects of emotion induction, individual ratings of valence (and arousal) were used to define the emotional quality of our stimuli (cf. Rubo and Gamer, 2018).

Participants
Forty-two native German speakers (33 female, 1 non-binary; M age = 23.81 years, SD age = 5.41, age range: 18-44 years) gave their informed written consent to participate and to the further use of their anonymized data. They were recruited through tutorials in the Bachelor's program in Psychology at Freie Universität Berlin and through announcements on social media. Participants either received course credit (88.1%) or took part voluntarily. All had normal or corrected-to-normal vision. Thirty-three participants (78.6%) reported a general qualification for university entrance as their highest level of education. The study was approved by the ethics committee of the Department of Education and Psychology at Freie Universität Berlin.

Recording of Eye Movements
Eye movements were recorded with an SR Research EyeLink 1000 tower-mounted eye tracker at a sampling rate of 1000 Hz (SR Research Ltd., Mississauga, ON, Canada). A chin-and-head rest kept head movements to a minimum.
Eye movements were recorded only during stimulus presentation, and only the right eye was tracked. The experiment was built using the SR Research Experiment Builder software (version 1.10.1630)¹. Stimuli were presented on a 19-inch LCD monitor with a resolution of 1024 × 768 pixels and a refresh rate of 120 Hz. The distance between the participant's eyes and the monitor was approximately 65 centimeters. At the beginning of the experiment, a standard 9-point calibration was used to ensure a spatial resolution error of less than 0.5° of visual angle. To avoid repeating this time-consuming and distracting procedure, each reading and viewing trial started with two sequentially presented fixation crosses (Times New Roman, 20-point size). They were positioned either above the first line of text at the right and left edges of the display or at the upper right and left corners of the subsequently presented picture. For each cross, a rectangular area of interest (AOI; 70 × 62 pixels) was defined. When a total fixation time of 500 milliseconds (ms) had been registered in each AOI, stimulus presentation started automatically. Fixations and saccades were identified using the EyeLink 1000 parser (velocity threshold = 30°/s, acceleration threshold = 8000°/s²).
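To relate the 0.5° calibration criterion and the 70 × 62 pixel AOIs to each other, pixels can be converted to degrees of visual angle. The sketch below is not part of the reported method and rests on stated assumptions: the 19-inch diagonal is the visible area of a 4:3 panel filled by 1024 × 768 square pixels, viewed from 65 cm. Under these assumptions, one degree corresponds to roughly 30 pixels, so each AOI spans a little over 2° in each direction.

```python
import math

# Assumptions (not reported in the study): the 19-inch diagonal is the visible
# area, the panel is 4:3 so 1024 x 768 pixels fill it, and pixels are square.
SCREEN_DIAGONAL_IN = 19.0
RES_W, RES_H = 1024, 768
VIEW_DISTANCE_CM = 65.0


def cm_per_pixel(diag_in: float = SCREEN_DIAGONAL_IN,
                 res_w: int = RES_W, res_h: int = RES_H) -> float:
    """Physical size of one pixel, derived from the screen diagonal."""
    diag_px = math.hypot(res_w, res_h)  # 1280 px for 1024 x 768
    return diag_in * 2.54 / diag_px


def pixels_to_degrees(px: float, distance_cm: float = VIEW_DISTANCE_CM) -> float:
    """Visual angle subtended by a centred extent of `px` pixels."""
    size_cm = px * cm_per_pixel()
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def degrees_to_pixels(deg: float, distance_cm: float = VIEW_DISTANCE_CM) -> float:
    """Inverse of pixels_to_degrees."""
    size_cm = 2 * distance_cm * math.tan(math.radians(deg) / 2)
    return size_cm / cm_per_pixel()


if __name__ == "__main__":
    print(f"1 deg  ~ {degrees_to_pixels(1.0):.1f} px")
    print(f"0.5 deg ~ {degrees_to_pixels(0.5):.1f} px")
    print(f"AOI: {pixels_to_degrees(70):.2f} x {pixels_to_degrees(62):.2f} deg")
```

Changing the assumed panel geometry or viewing distance changes the conversion, so these values are only an orientation for this setup.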

Stimuli
Emotional stimuli were selected and constructed following a stepwise procedure. As previously stated, pictures and vignettes were intended to provide comparable semantic information. Hence, the construction process started with the collection of 48 emotion-inducing pictures (24 emotionally positive, 24 emotionally negative) from the NAPS (Marchewka et al., 2014). This standardized, high-quality database includes normative ratings for over 1,000 realistic pictures. A major advantage for eye tracking studies is the availability of information on physical properties. Since eye movements are known to be affected by low-level visual features such as complexity, luminance, or contrast (e.g., Pilarczyk and Kuniecki, 2014), we aimed to control for these confounding factors. In sum, the following inclusion criteria were applied. First, pictures had to have normative valence ratings either below four or above six² to minimize the potential overlap between the two valence categories. Second, valence categories were not allowed to differ with respect to the following physical parameters: luminance, contrast, JPEG size, color composition (i.e., LABL, LABA, LABB), entropy, and format (landscape; 1600 × 1200 pixels). Third, valence categories had to consist of pictures similarly distributed among the provided content categories (i.e., animals, faces, people, objects, landscapes; for the final stimulus set, see Table 3).

Based on the selected pictures, 48 vignettes verbally reproducing the pictorial information were constructed by the first author and Ilai Jess. To avoid a systematic influence of narrative perspective, readers were consistently addressed in the second person singular (e.g., Miall and Kuiken, 2001; Brunyé et al., 2009). Text length was kept between 85 and 96 words. To ensure high comprehensibility and emotion induction potential, an online pilot study was conducted via SoSci Survey (Leiner, 2019)³.
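The three inclusion criteria for the pictures can be expressed as a simple filter followed by balance checks. This is a minimal sketch, not the authors' procedure: the `Picture` record and its field names are hypothetical (the NAPS spreadsheet uses its own column names), normative valence is assumed to lie on a 1-9 scale, and the Welch t statistic is only one possible way to check that a physical parameter such as luminance does not differ between the two pools.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """Hypothetical record for one NAPS picture (field names are illustrative)."""
    pid: str
    valence: float     # normative valence, assumed to be on a 1-9 scale
    width: int
    height: int
    category: str      # e.g. "animals", "faces", "people", "objects", "landscapes"
    luminance: float   # one of the physical parameters to be balanced


def split_by_valence(pictures, low=4.0, high=6.0, size=(1600, 1200)):
    """Keep landscape pictures of the given size whose normative valence lies
    below `low` (negative pool) or above `high` (positive pool)."""
    positive, negative = [], []
    for p in pictures:
        if (p.width, p.height) != size:
            continue  # format criterion
        if p.valence > high:
            positive.append(p)
        elif p.valence < low:
            negative.append(p)
        # pictures with low <= valence <= high are dropped as ambiguous
    return positive, negative


def welch_t(a, b):
    """Welch t statistic for the difference in means between two groups."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def content_profile(group):
    """Counts per content category, for comparing the two pools."""
    return Counter(p.category for p in group)
```

In practice, the check would be repeated for each listed parameter (contrast, JPEG size, LAB channels, entropy) before fixing the final set.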
Fifty-three people were recruited through announcements on social media and either received course credit or participated voluntarily (33 female, 15 non-binary; M age = 33.81 years, SD age = 14.56, age range: 17-71 years). Valence, Arousal, Comprehensibility, Immersion Potential, and Emotion Induction Potential were rated after self-paced reading of the 48 vignettes (24 emotionally positive, 24 emotionally negative), which were presented in random order. Valence and Arousal were rated on 9-point scales; the other three measures were rated on 5-point scales.
In a next step, potentially problematic vignettes were identified based on their average valence ratings. Taking Comprehensibility and the physical parameters of the corresponding pictures into account, eight vignettes were excluded, and nine others were revised and rated a second time (N = 13, M age = 35.69 years, SD age = 11.39, age range: 22-60 years). Table 1 includes the descriptive statistics of the final stimulus set (20 emotionally positive, 20 emotionally negative vignettes), and Table 2 provides an example for each valence category. Information on the corresponding pictures can be found in Table 3. The results indicated that the vignettes are easy to understand and capable of inducing negative and positive emotional responses. Furthermore, the positive and negative valence groups differed on the rated dimensions. Emotionally negative vignettes were, on average, rated higher on Arousal [t(38) = −11.95, p < 0.001, R² = 0.79] and Emotion Induction Potential [t(38) = −4.04, p < 0.001, R² = 0.30], whereas positive ones appeared to be easier to understand [t(38) = 2.52, p = 0.02, R² = 0.14] and better suited to put the reader in the perspective of the text [t(38) = 2.73, p < 0.01, R² = 0.16]. Most importantly, valence ratings supported the success of our valence manipulation: emotionally positive vignettes were, on average, perceived more positively than emotionally negative ones [t(38) = 33.97, p < 0.001, R² = 0.97]. As expected, …

³ http://www.soscisurvey.de/
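The R² values reported alongside the two-group t tests are consistent with the standard conversion R² = t² / (t² + df), here with df = 38. The short check below recomputes them from the reported t values; it is an illustration of that formula, not a re-analysis of the pilot data. The dictionary labels paraphrase the rated dimensions named in the text.

```python
def r_squared_from_t(t: float, df: int) -> float:
    """Proportion of variance explained by a two-group contrast: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Reported pilot comparisons (df = 38): t value and reported R^2
reported = {
    "Arousal": (-11.95, 0.79),
    "Emotion Induction Potential": (-4.04, 0.30),
    "Comprehensibility": (2.52, 0.14),
    "Perspective / Immersion": (2.73, 0.16),
    "Valence": (33.97, 0.97),
}

if __name__ == "__main__":
    for name, (t, r2) in reported.items():
        print(f"{name:<28} t = {t:6.2f}  R2 = {r_squared_from_t(t, 38):.2f}  (reported {r2})")
```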

Design and Procedure
A repeated measures design was implemented (i.e., each subject viewed and read the entire stimulus set). Stimulus domains were presented in separate blocks. The order of blocks was counterbalanced across participants. Within each block, stimuli were presented in a pseudo-randomized sequence so that no more than two stimuli of the same valence category were presented successively.
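The ordering constraint (no more than two stimuli of the same valence category in a row) can be met with a constrained shuffle. The following is a hypothetical sketch, not the Experiment Builder randomization actually used: it draws items one at a time, skips any item that would extend a run beyond `max_run`, and restarts on a dead end. The resulting orders are valid but not sampled uniformly from all valid orders.

```python
import random
from typing import Callable, List, Optional, Sequence


def longest_run(labels: Sequence) -> int:
    """Length of the longest block of identical consecutive labels."""
    best = run = 0
    prev = object()  # sentinel that equals no label
    for lab in labels:
        run = run + 1 if lab == prev else 1
        prev = lab
        best = max(best, run)
    return best


def pseudo_randomize(items: Sequence, key: Callable, max_run: int = 2,
                     rng: Optional[random.Random] = None,
                     max_tries: int = 10_000) -> List:
    """Random order in which no key (e.g. valence category) repeats more than max_run times in a row."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        pool = list(items)
        order: List = []
        while pool:
            tail = [key(x) for x in order[-max_run:]]
            run_full = len(tail) == max_run and len(set(tail)) == 1
            candidates = [i for i, x in enumerate(pool)
                          if not (run_full and key(x) == tail[0])]
            if not candidates:
                break  # dead end: only the blocked category is left, so restart
            order.append(pool.pop(rng.choice(candidates)))
        if not pool:
            return order
    raise RuntimeError("no valid order found")
```

Each participant would receive a fresh order, for example by seeding `random.Random` per participant.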
The study was conducted in a sound-attenuated room shielded from daylight. On arrival, participants were informed about the procedure and about the option to quit the experiment at any time without any consequences. The experiment started with a standard 9-point calibration, followed by a sequential presentation of three emotionally neutral stimuli in order to equalize the initial situation across participants. Afterward, the participant's mood was measured on a 7-point, non-verbal rating scale so that mood differences could be accounted for⁵. Ratings indicated a slightly positive mood before the presentation of both emotional vignettes (M = 5.26, SD = 0.59) and pictures (M = 5.19, SD = 0.71).
Following the rating, emotional stimuli of the first block (i.e., vignettes or pictures) were presented. Each trial started with two sequentially appearing fixation crosses, each of which had to be fixated for 500 ms. For the vignettes, reading was self-paced, allowing participants to go back and forth within a page as often as they wanted. Each vignette was presented on a single page (eight to eleven lines) and could be left with a single mouse click. Vignettes were written in a sans-serif font (Tahoma) with 17-point letter size and presented left-aligned in the center of the monitor. To maximize the accuracy of the recordings, double spacing was used. Participants were instructed to read each story for comprehension. Pictures (800 × 600 pixels) were presented for a fixed viewing period of 3 s in the center of the display. Participants were instructed to look freely at each picture for the whole presentation time. After each stimulus, participants performed an evaluative judgment task: they assessed the stimulus in terms of its emotional valence (i.e., How positive or negative do you rate the text/picture?) and arousal (i.e., How calming or exciting do you rate the text/picture?). Answers were given on 9-point Self-Assessment Manikin scales (SAM; Lang, 1980; Suk, 2006). Rating scales were displayed sequentially in the center of the monitor, without time restrictions.
After the first block, participants completed an online survey on demographic variables, reading habits, imagination, and empathy at a separate table. At their own discretion, participants returned to the eye tracker and completed the same procedure for the remaining stimulus domain, starting again with the 9-point calibration. In sum, an experimental session lasted approximately 60 minutes. Figure 1 provides an illustration of the experimental procedure.
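The gaze-contingent trial start (500 ms of cumulative fixation inside each fixation-cross AOI, one cross after the other) can be sketched as follows. This is an illustrative approximation working on a list of parsed fixations, not the Experiment Builder trigger actually used; the `AOI` class, the fixation tuple layout, and the example coordinates are hypothetical, while the 70 × 62 pixel AOI size and the 500 ms criterion come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple

Fixation = Tuple[int, int, float, float]  # (start_ms, end_ms, x_px, y_px)


@dataclass(frozen=True)
class AOI:
    cx: float          # centre of the fixation cross (px)
    cy: float
    w: float = 70.0    # AOI width (px), as reported
    h: float = 62.0    # AOI height (px), as reported

    def contains(self, x: float, y: float) -> bool:
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2


def time_to_trigger(fixations: Iterable[Fixation], aoi: AOI,
                    threshold_ms: int = 500) -> Optional[float]:
    """Time at which cumulative fixation inside `aoi` reaches the threshold, or None."""
    total = 0
    for start, end, x, y in fixations:
        if aoi.contains(x, y):
            if total + (end - start) >= threshold_ms:
                return start + (threshold_ms - total)
            total += end - start
    return None


def stimulus_onset(fixations: Sequence[Fixation], aois: Sequence[AOI],
                   threshold_ms: int = 500) -> Optional[float]:
    """Crosses are shown one after another; only fixations starting after the
    previous cross was satisfied count towards the next one."""
    t = 0.0
    for aoi in aois:
        later = [f for f in fixations if f[0] >= t]
        hit = time_to_trigger(later, aoi, threshold_ms)
        if hit is None:
            return None
        t = hit
    return t
```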

Data Analysis
In line with the tripartite structure of our hypotheses, the following sections are arranged into three subparts: (1) analysis of evaluative judgments, (2) analysis of eye movements in reading, and (3) analysis of eye movements in picture viewing.

⁵ Previous studies reported effects of participants' mood on eye movements in both picture viewing (Wadlinger and Isaacowitz, 2006) and reading (Scrimin and Mason, 2015).
All analyses refer to the data for the emotionally positive and negative stimuli. Given their experimental function and the limited number of items, data for the emotionally neutral vignettes and pictures are reported only as descriptive statistics. However, the same preprocessing steps were applied.

Data Preprocessing
For the 40 emotional vignettes and 40 emotional pictures, we collected 3,360 ratings each for Perceived Valence and Arousal (42 participants × 80 ratings). As two trials had to be excluded due to errors during data export, 3,358 individual ratings and reaction times per subject and item (i.e., pictures, vignettes) could be used for statistical analysis.
Eye tracking data were preprocessed using the EyeLink Data Viewer (version 1.11.900)⁶. Fixations shorter than 80 ms were either merged with a nearby fixation (distance of less than one degree) or removed from further analysis. Based on automatically defined AOIs, text data were exported at the level of single words (150,899 data points). Further aggregation and preprocessing were run in JMP Pro 14 for Mac OS X⁷. The selection of eye tracking parameters followed from our hypotheses on reading times for both supralexical (i.e., vignettes) and lexical units (i.e., words). For the analysis at the supralexical level, text total reading time was computed as the sum of all fixations, saccades, and blinks. For the analysis at the lexical level, we aimed to study one measure associated with early processes and one associated with both early and late processes of word recognition and comprehension (cf. Lüdtke et al., 2019). To analyze immediate effects of Perceived Valence, the commonly reported duration of the first fixation on each word was extracted (Hyönä et al., 2003; Kuperman and Van Dyke, 2011). As a late measure, word total reading time (hereafter total reading time, TRT), defined as the sum of all fixation durations on a word, was used (Boston et al., 2008). Since the positive and negative vignette groups differed slightly in text length [t(38) = 2.07, p = 0.046, R² = 0.10], we accounted for this difference by computing Reading Speed [in words per minute (wpm)], mean First Fixation Duration (mean FFD in ms), and mean Total Reading Time (mean TRT in ms) for each subject and vignette.
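The handling of short fixations (merging fixations under 80 ms with a neighbour closer than one degree, otherwise dropping them) can be sketched as below. This is not the EyeLink Data Viewer algorithm, whose exact rules are not given here. Three choices are assumptions: only the temporally adjacent fixations are considered as merge partners, the merged fixation keeps the neighbour's position and spans from the earlier start to the later end, and one degree is taken as about 30 pixels.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Fix:
    start: int   # ms
    end: int     # ms
    x: float     # px
    y: float     # px

    @property
    def dur(self) -> int:
        return self.end - self.start


def clean_short_fixations(fixes: Sequence[Fix], min_dur: int = 80,
                          max_dist_px: float = 30.0) -> List[Fix]:
    """Merge each too-short fixation into its closest adjacent fixation within
    max_dist_px (assumed to be about 1 degree), or drop it if none is close enough."""
    out = list(fixes)
    i = 0
    while i < len(out):
        f = out[i]
        if f.dur >= min_dur:
            i += 1
            continue
        cands = []
        for j in (i - 1, i + 1):  # previous and next fixation in time
            if 0 <= j < len(out):
                d = math.hypot(out[j].x - f.x, out[j].y - f.y)
                if d < max_dist_px:
                    cands.append((d, j))
        if cands:
            _, j = min(cands)
            n = out[j]
            # merged fixation keeps the neighbour's position (assumption)
            out[j] = Fix(min(n.start, f.start), max(n.end, f.end), n.x, n.y)
        del out[i]
        i = max(i - 1, 0)  # re-check the neighbour, whose duration may have changed
    return out
```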
FIGURE 1 | Graphic depiction of the experimental procedure. Participants performed an evaluative judgment task on textual and pictorial stimuli, respectively. The order of stimulus domains (i.e., pictorial, textual) was balanced across participants. To ensure comparable initial situations, each experimental session started with the presentation of three emotionally neutral stimuli. At the beginning of each trial, two serially presented fixation crosses had to be looked at for 500 ms. After assessing participants' mood, 40 emotionally valenced stimuli were sequentially displayed while recording eye movements. Pictures were viewed for a fixed period of 3 s. Reading of vignettes was self-paced. Ratings of perceived valence and arousal were measured using a 9-point Self-Assessment Manikin scale (SAM; Lang, 1980; Suk, 2006). After completion of the first presentation block, an online survey was conducted.

To calculate mean FFD and mean TRT, we first excluded all function words (articles, pronouns, conjunctions, auxiliary verbs, prepositions, particles, cardinal numbers, pronominal adverbs), as they lack, or are poor in, lexical or affective lexical meaning (cf. Segalowitz and Lane, 2000; Fiedler, 2011). Part-of-speech (POS) tagging was automated using the freeware tagger TagAnt (Anthony, 2015). Like any POS tagger, TagAnt has an error rate of approximately three percent (Manning, 2011); obviously wrong classifications were therefore corrected by hand. On the remaining 75,348 data points (see Table 4), extreme values were defined and excluded following a two-stage procedure. First, FFDs longer than 2,000 ms (six data points) and TRTs longer than 3,000 ms (five data points) were excluded. Second, outliers were defined based on the distributions of FFDs and TRTs within each subject and vignette: words with standardized residuals larger than 3 or smaller than −3 were excluded [FFD: 655 data points (0.87%); TRT: 1,174 data points (1.56%)]. Based on the remaining data points, mean FFD and mean TRT were computed for each subject and vignette, treating skipped words (mean skipping rate: 19.65%) as missing values. Taken together, the resulting data table contained 1,678 data points with information about mean FFD (in ms), mean TRT (in ms), and Reading Speed (in wpm) for each subject and item.
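The word-level exclusion and aggregation steps can be sketched in a few lines. This is a simplified illustration, not the JMP workflow used in the study: the record layout is hypothetical, the standardized residuals within each subject × vignette cell are approximated by plain z-scores (residuals from the cell mean), and trimming is done in a single pass. Reading Speed is computed from the number of words and the text total reading time.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (subject, vignette, ffd_ms, trt_ms) per content word; None marks a skipped word
Record = Tuple[str, str, Optional[float], Optional[float]]


def trim_and_average(values: List[Optional[float]], abs_max: float,
                     z_max: float = 3.0) -> Optional[float]:
    """Stage 1: drop skipped words and values above abs_max.
    Stage 2: drop values more than z_max SDs from the cell mean. Return the mean."""
    kept = [v for v in values if v is not None and v <= abs_max]
    if len(kept) >= 2:
        m, sd = statistics.mean(kept), statistics.stdev(kept)
        if sd > 0:
            kept = [v for v in kept if abs(v - m) / sd <= z_max]
    return statistics.mean(kept) if kept else None


def aggregate(records: Iterable[Record]) -> Dict[Tuple[str, str], tuple]:
    """Mean FFD (cap 2,000 ms) and mean TRT (cap 3,000 ms) per subject x vignette."""
    groups = defaultdict(lambda: ([], []))
    for subj, vig, ffd, trt in records:
        groups[(subj, vig)][0].append(ffd)
        groups[(subj, vig)][1].append(trt)
    return {k: (trim_and_average(f, 2000), trim_and_average(t, 3000))
            for k, (f, t) in groups.items()}


def reading_speed_wpm(n_words: int, text_trt_ms: float) -> float:
    """Words per minute from the number of words and the text total reading time."""
    return n_words / (text_trt_ms / 60_000)
```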
With respect to the pictures, eye tracking data were exported at the level of trials for each participant (42 participants × 40 trials; 1,680 data points). Since we aimed to investigate whether

Statistical Analysis
All statistical analyses were run in R 3.5.1 for Mac OS X⁸. Since participants and items (i.e., vignettes, pictures) represented samples of larger populations, hypotheses were tested using linear mixed-effects models (LMM; Baayen et al., 2008). Following a confirmatory approach on real data, models with by-subject and by-item random intercepts (and no random slopes) were employed (cf. Bates et al., 2015a). In R, models were fitted using the lmer-function from the lme4 package (Bates et al., 2015b) with restricted maximum likelihood estimation.
To obtain the optimal fixed-effects structure (i.e., the best trade-off between fit to the data and complexity), models were selected using a backward-elimination procedure (cf. Barr et al., 2013). Starting with random-intercepts models including all candidate fixed-effects terms, predictor variables were successively excluded, beginning with the one showing the weakest evidence for an effect (i.e., the highest p-value). If this variable was involved in an interaction with a lower p-value, the predictor with the second-highest p-value was excluded instead. In this iterative procedure, nested models differing in one degree of freedom (i.e., one fixed effect) were systematically compared using the anova-function from the stats package (R Core Team, 2019). To justify a reduction of fixed-effects terms, likelihood ratio tests were performed. Decisions were based on the statistical significance (p < 0.05) of the likelihood ratio test statistic, which is asymptotically chi-squared distributed with one degree of freedom. If the simpler model did not fit significantly worse than the more complex model (p > 0.05), the simpler model was favored.
For the analysis of evaluative judgments (i.e., Perceived Valence and Arousal), the following predictor variables were initially included: Valence Category (i.e., positive, negative), Stimulus Domain (i.e., pictorial, textual), and Mood Rating. To test for valence-specific effects, the interaction between Valence Category and Stimulus Domain was included. For the analysis of reading behavior at the supralexical (i.e., Reading Speed) and lexical level (i.e., mean FFD and mean TRT), initial models consisted of the following predictor variables: Perceived Valence (i.e., Valence Rating), Perceived Arousal (i.e., Arousal Rating), Mood Rating, and three characteristics of the vignettes collected in the pilot studies (Comprehensibility, Immersion Potential, Emotion Induction Potential). Given theoretical and empirical evidence (e.g., Võ et al., 2009; Marchewka et al., 2014), we included the interaction between Perceived Valence and Arousal. For the analysis of picture viewing (i.e., Mean Saccade Amplitude, Total Number of Fixations, and Mean Fixation Duration), the following predictor variables were initially included: Perceived Valence (i.e., Valence Rating), Perceived Arousal (i.e., Arousal Rating), and Mood Rating. Again, the interaction between Perceived Valence and Arousal was included. Detailed information on the mathematical formulation and lmer specification of all eight initial models is reported in the Supplementary Tables S4-S11.
For the categorical variables (i.e., Valence Category, Stimulus Domain), effect coding was chosen. The metric covariates were centered prior to analysis in order to avoid collinearity, increase the probability of model convergence, and facilitate interpretation (Baayen, 2008). Fixed effects were checked with Type III sum of squares statistics using the Anova-function from the car package (Fox and Weisberg, 2019). To approximate a normal distribution of the residuals as closely as possible, dependent variables were transformed as indicated by the Box-Cox transformation test from the MASS package (Box and Cox, 1964; Venables and Ripley, 2002). For all eye tracking variables, extreme values were excluded in two steps. First, an absolute criterion in the form of an upper threshold was applied based on visual inspection of the distribution of each dependent variable. Second, extreme values were defined based on intercepts-only models including only crossed random effects for subjects and items; since there were no missing values, the relative criterion was set to two standard deviations from the mean. For the evaluative judgments, extreme values were defined based on the recorded reaction times (RTs in ms), with lower and upper thresholds of 500 and 20,000 ms, respectively. All statistical analyses used a significance level of α = 0.05. For the sake of conciseness, only fixed effects of the best-fitting model are reported, as they are directly relevant to our hypotheses. Results of the entire stepwise deletion procedure are provided in the Supplementary Tables S4-S11.
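The selection rule can be sketched as a loop over likelihood ratio tests with one degree of freedom. For a chi-squared variable with one degree of freedom, P(X > x) = erfc(√(x/2)), so the p-value needs no statistics library. This is a simplified sketch, not the authors' R code: `loglik` is a hypothetical callable that returns the maximized log-likelihood of the model containing a given set of fixed-effect terms, terms are ranked directly by their LRT p-values, and the marginality rule is implemented by only allowing terms that are not part of a remaining interaction to be dropped. Note that REML fits should not be compared on fixed effects; lme4's `anova` method for merMod objects refits with maximum likelihood by default before such comparisons.

```python
import math
from typing import Callable, FrozenSet, Iterable, Tuple


def lrt_df1(ll_complex: float, ll_simple: float) -> Tuple[float, float]:
    """LR statistic and p-value for nested models differing by one parameter."""
    stat = max(0.0, 2.0 * (ll_complex - ll_simple))
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-squared(1) upper tail


def _parts(term: str) -> FrozenSet[str]:
    return frozenset(term.split(":"))  # "V:A" -> {"V", "A"}


def backward_eliminate(terms: Iterable[str],
                       loglik: Callable[[FrozenSet[str]], float],
                       alpha: float = 0.05) -> FrozenSet[str]:
    current = frozenset(terms)
    while current:
        ll_current = loglik(current)
        # marginality: a term can only be dropped if no remaining interaction contains it
        removable = [t for t in current
                     if not any(_parts(t) < _parts(o) for o in current)]
        scored = [(lrt_df1(ll_current, loglik(current - {t}))[1], t) for t in removable]
        p_max, weakest = max(scored)  # highest p-value = weakest evidence
        if p_max <= alpha:
            break  # dropping any candidate significantly worsens fit
        current = current - {weakest}
    return current
```

As a plausibility check, `lrt_df1(2.18, 0.0)` yields χ² = 4.36 with p ≈ 0.037, in line with the rounded p = 0.04 reported for Valence Rating on Reading Speed.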
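For orientation, the initial reading model can be written as a random-intercepts LMM. The notation below is ours and only a sketch; the authoritative specifications are those in the Supplementary Tables S4-S11. Here s indexes subjects and i vignettes; y is Reading Speed, mean FFD, or mean TRT; g is the transformation chosen by the Box-Cox test; V and A are the centered valence and arousal ratings; M is the mood rating; and C, I, and E are the pilot ratings of Comprehensibility, Immersion Potential, and Emotion Induction Potential. The picture-viewing models drop the three pilot terms.

```latex
g\!\left(y_{si}\right) = \beta_0
  + \beta_1 V_{si} + \beta_2 A_{si} + \beta_3 V_{si} A_{si}
  + \beta_4 M_{s}
  + \beta_5 C_{i} + \beta_6 I_{i} + \beta_7 E_{i}
  + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\;
w_i \sim \mathcal{N}(0, \sigma_w^2),\;
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

In lme4 syntax, this corresponds roughly to `g(y) ~ V * A + M + C + I + E + (1 | subject) + (1 | item)`, with hypothetical variable names.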
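The Box-Cox test chooses λ by maximizing the profile log-likelihood l(λ) = −(n/2)·log σ̂²(λ) + (λ − 1)·Σ log y over a grid, where σ̂²(λ) is the maximum-likelihood residual variance of the transformed data. Values of λ near 0 suggest a log transform, values near 0.5 a square root, and values near −1 a reciprocal. The sketch below simplifies the approach: MASS::boxcox profiles λ on the residuals of a fitted linear model, whereas this version uses an intercept-only model.

```python
import math
import statistics
from typing import List, Optional, Sequence


def boxcox(y: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform; lambda = 0 is the log transform."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_profile_loglik(y: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda for an intercept-only normal model (up to a constant)."""
    z = boxcox(y, lam)
    sigma2 = statistics.pvariance(z)  # ML variance estimate
    return -0.5 * len(y) * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(y: Sequence[float], grid: Optional[Sequence[float]] = None) -> float:
    """Grid search for the lambda maximizing the profile log-likelihood."""
    grid = grid if grid is not None else [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_profile_loglik(y, lam))
```

For data that are log-normal in shape, the selected λ is close to 0, which corresponds to the log transformation applied to Reading Speed.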

Descriptive Statistics
To illustrate effects of the experimental manipulation, descriptive statistics are provided for each valence category (i.e., positive, negative, neutral) and stimulus domain (vignettes: Table 5; pictures: Table 6). As expected, ratings of Perceived Valence and Arousal differed between valence categories. For both stimulus domains, the lowest mean valence ratings were observed for the negative, followed by the neutral and, lastly, the positive valence category. For mean arousal ratings, the following rank order emerged for both vignettes and pictures: positive < neutral < negative. As indicated by the minimum and maximum values, each valence category attracted a wide range of individual ratings on both scales. For example, negatively valenced pictures attracted individual valence ratings ranging from one to eight, with the maximum value indicating a perceived positive valence (see Table 6). For the supralexical eye tracking parameter Reading Speed, reading was fastest for positive, followed by negative and, lastly, emotionally neutral vignettes. The same rank order was observed at the lexical level (i.e., mean FFD, mean TRT). For the pictorial stimuli, average values of Mean Saccade Amplitude were shortest for the positive, followed by the neutral and, lastly, the negative valence category. For Mean Fixation Duration, average values suggested the following rank order: negative < positive < neutral. All valence groups attracted, on average, ten to eleven fixations.

Note (Table 5): Descriptive statistics were computed based on aggregated values for each participant and vignette. ¹Ratings were assessed on 9-point rating scales. ²Analyses for mean First Fixation Duration (mean FFD in ms) and mean Total Reading Time (mean TRT in ms) included only content words.

Based on the optimal lambda suggested by the Box-Cox transformation test (λ = 0.46), Valence Rating was …
With respect to ratings of Perceived Arousal, values were sqrt-transformed as indicated by the Box-Cox transformation test (λ = 0.71). Based on the reaction times, 13 extreme values (0.39%) were identified and subsequently excluded (3,345 remaining data points). The backward reduction of fixed effects resulted in a random-intercepts model (AIC = 3462.4, BIC = 3492.9, log-likelihood = −1726.2) with Valence Category as the sole predictor [χ²(1, N = 3,345) = 461.74, p < 0.001], indicating that emotionally negative stimuli (M = 6.44, SD = 1.87) were, on average, rated as more arousing than emotionally positive ones (M = 3.07, SD = 2.00).

Eye Movements in Reading
Due to its right-skewed distribution, Reading Speed was log-transformed (Box-Cox transformation test: λ = −0.14). The absolute criterion for the identification of extreme values was set to 1,000 wpm (exclusion of three data points). Fifty-nine further data points were excluded based on the relative criterion (in total: 3.69%; 1,616 remaining data points). Following the backward-elimination procedure, the best-fitting model (AIC = −1488.6, BIC = −1445.5, log-likelihood = 752.31) suggested significant main effects of Valence Rating [χ²(1, N = 1,616) = 4.36, p = 0.04], Arousal Rating [χ²(1, N = 1,616) = 3.85, p = 0.05], and Immersion Potential Rating [χ²(1, N = 1,616) = 4.44, p = 0.04]. The latter effect indicated a positive linear relationship between ratings of Immersion Potential and Reading Speed, with faster reading for higher Immersion Potential ratings (see Figure 3).

Figure caption (fragment): Valence and Arousal Rating were evaluated on 9-point rating scales by participants of the eye tracking study. Arousal Rating was split into two factor levels (i.e., low versus high) using the quantcut-function from the gtools package (Warnes et al., 2018). Colored areas indicate the 95% confidence interval of each fitted line.
Since the interaction between Valence and Arousal Rating proved to be statistically significant [χ²(1, N = 1,616) = 7.84, p = 0.01], we further analyzed simple main effects by splitting the data into two quantile-based groups of Arousal Rating (i.e., a median split) using the quantcut-function from the gtools package (Warnes et al., 2018). In this manner, the main effect of Valence Rating could be explored within two artificially constructed factor levels of Arousal Rating: one subset representing the low-arousal group (interval: [1-5]; N = 884) and the other the high-arousal group (interval: (5-9]; N = 732). The main effect of Valence Rating reached statistical significance within the low-arousal [χ²(1, N = 884) = 8.57, p = 0.003] but not the high-arousal subset [χ²(1, N = 732) = 0.02, p = 0.89] (see Figure 4). In the low-arousal subset, positively valenced vignettes were, on average, read faster than negatively valenced ones. The main effect of Immersion Potential Rating remained significant within the high-arousal [χ²(1, N = 732) = 4.44, p = 0.04] but not the low-arousal group [χ²(1, N = 884) = 3.12, p = 0.08]. Note that the results of the linear mixed-effects models within the two subsets have to be treated with caution. Since Valence and Arousal Rating were, in general, highly correlated, the artificially created arousal subsets contained items distributed disproportionately over the valence categories; for example, the high-arousal subset clearly contained more negatively than positively valenced vignettes.
For mean FFD, values were again transformed as indicated by the Box-Cox transformation test (λ = 1.43; 1/mean FFD). As the absolute criterion, an upper threshold of 300 ms was applied (exclusion of one data point). Fifty-three further data points were excluded based on the relative criterion (in total: 3.22%; 1,624 remaining data points). Following stepwise model comparisons, the model containing only the random intercepts (i.e., no fixed-effect predictors) was identified as the best-fitting model (AIC = −21595, BIC = −21573, log-likelihood = 10802). Consequently, none of the considered predictors proved to be of explanatory value for mean FFD.
With respect to mean TRT, a sqrt-transformation was applied due to the right-skewed distribution (Box-Cox transformation test: λ = −0.46). Values over 500 ms were identified as extreme values and subsequently excluded (five data points). Based on the relative criterion, 62 additional data points were removed (in total: 3.99%; 1,611 remaining data points). The backward-elimination procedure identified the random-intercepts model with Valence Rating as the only predictor as the best-fitting model (AIC = −13290, BIC = −13263, log-likelihood = 6650.0). The statistically significant main effect of Valence Rating [χ²(1, N = 1,611) = 6.05, p = 0.01] indicated that mean TRTs tended to decrease with increasing Valence Rating (see Figure 5).
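The two-level split of Arousal Rating can be mirrored in a few lines. gtools::quantcut with two groups cuts at the median (the type-7 sample quantile at 0.5 equals the ordinary median) and uses right-closed intervals, which matches the reported [1-5] and (5-9] groups. The sketch below is an illustration, not the R code used. Ratings equal to the median go to the low group; the simple-effects models would then be fitted separately within each subset.

```python
import statistics
from typing import List, Sequence, Tuple


def median_split(ratings: Sequence[float]) -> Tuple[List[str], float]:
    """Two groups with intervals [min, median] and (median, max];
    ratings equal to the median are assigned to the low group."""
    med = statistics.median(ratings)
    return ["low" if r <= med else "high" for r in ratings], med
```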

Eye Movements in Picture Viewing
As indicated by the Box-Cox transformation test (λ = 0.38), values for Mean Saccade Amplitude were sqrt-transformed. The absolute criterion for the exclusion of extreme values was set to 10° of visual angle (exclusion of four data points). Based on the relative criterion, 46 further data points were excluded (in total: 2.98%; 1,630 remaining data points). Values of Total Number of Fixations were squared as suggested by the Box-Cox transformation test (λ = 1.76). The upper threshold was set to a total of 15 fixations (exclusion of five data points). In the second step, 21 additional data points were excluded (in total: 1.55%; 1,654 remaining data points). Lastly, Mean Fixation Duration was transformed due to its right-skewed distribution (Box-Cox transformation test: λ = −1.23; 1/mean fixation duration). An upper limit of 800 ms was applied as the absolute criterion for the identification of extreme values (exclusion of nine data points). Based on the relative criterion, 63 further data points were excluded (in total: 4.29%; 1,608 remaining data points).
For all three dependent variables, the models containing only random intercepts (i.e., no fixed-effect predictors) were identified as the best-fitting models.
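The three picture-viewing measures are trial-level summaries of the parsed fixations. The sketch below shows how they could be derived from one trial's fixation list. It makes two assumptions: the fixation tuple layout is hypothetical, and saccade amplitude is approximated by the distance between consecutive fixation positions, converted with an assumed 30 pixels per degree. In the study, amplitudes come from the EyeLink parser, which measures saccades directly, so this is a rough stand-in rather than the reported computation.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple

Fixation = Tuple[int, int, float, float]  # (start_ms, end_ms, x_px, y_px)


def trial_measures(fixations: Sequence[Fixation],
                   px_per_deg: float = 30.0) -> Tuple[int, Optional[float], Optional[float]]:
    """Total Number of Fixations, Mean Fixation Duration (ms), and an approximate
    Mean Saccade Amplitude (deg) for one picture trial."""
    n = len(fixations)
    mean_dur = statistics.mean(e - s for s, e, _, _ in fixations) if n else None
    # inter-fixation distances as a proxy for saccade amplitudes (assumption)
    amps = [math.hypot(b[2] - a[2], b[3] - a[3]) / px_per_deg
            for a, b in zip(fixations, fixations[1:])]
    mean_amp = statistics.mean(amps) if amps else None
    return n, mean_dur, mean_amp
```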

DISCUSSION
The aim of the present study was to examine effects of emotional content on subjective ratings of Perceived Valence and Arousal, eye movements in reading, and eye movements in picture viewing. To this end, we asked 42 participants to assess the emotional valence and arousal of 40 emotionally valenced (i.e., positive, negative) vignettes and 40 such pictures. To the best of our knowledge, this is the first study including a cross-domain comparison between more complex verbal materials (i.e., vignettes) and pictures providing matching semantic information.
As indicated by the reported descriptive statistics, the experimental manipulation of textual and pictorial valence proved successful. The lowest ratings of Perceived Valence were observed for negative, followed by neutral and, lastly, positive stimuli. Furthermore, the wide range of individual ratings collected for each valence category stressed the necessity to go beyond the simplified categorical operationalization of emotional valence. Emotionally positive stimuli were, on average, rated as less arousing than emotionally negative stimuli. In line with previous studies on words (e.g., Võ et al., 2009; Soares et al., 2012; Söderholm et al., 2013; Montefinese et al., 2014), sentences (e.g., Pinheiro et al., 2017), and pictures (e.g., Verschuere et al., 2001; Lang et al., 2008; Dufey et al., 2011; Soares et al., 2015), the linear relationship between Perceived Valence and Arousal was found to be more pronounced for negatively than for positively valenced stimuli. This asymmetry might be due to the absence of erotic and thus high-arousal positive pictures in the NAPS database (cf. Marchewka et al., 2014). However, the strong correlation suggests that the two affective dimensions can rarely be studied apart from each other when focusing on effects of emotion induction in ecologically valid materials (cf. Citron, 2012).
With respect to the cross-domain comparison, vignettes attracted, on average, more extreme valence ratings than pictures supporting the assumed superiority of textual compared to pictorial materials. As expected, no effect of Stimulus Domain on Perceived Arousal was observed indicating that textual stimuli are able to induce arousal levels that are comparable to the ones elicited through pictures. Hence, the present study provides further evidence that verbal stimuli are at least as suitable for the induction of emotions as pictures. More specifically, the previously reported superior valence effects of emotionally positive words and phrases (e.g., Schlochtermeier et al., 2013;Tempel et al., 2013;Bayer and Schacht, 2014) applied not only to more complex linguistic material but also to negatively valenced ones. In contrast to Tempel et al. (2013), reaction times for valence ratings showed no significant differences between stimulus domains. Thus, while judgments of emotional valence required comparable processing times for both stimulus domains (cf. Schlochtermeier et al., 2013), vignettes attracted more extreme valence ratings than pictures.
Regarding emotion effects in reading, we hypothesized faster reading both for vignettes perceived as emotionally positive (i.e., supralexical level) and for their constituent content words (i.e., lexical level). At the supralexical level, additional analyses exploring the significant interaction between Valence and Arousal Rating on Reading Speed revealed that the main effect of Perceived Valence was restricted to low-arousal vignettes. Hence, low-arousal vignettes perceived as emotionally positive were, on average, read faster than low-arousal vignettes perceived as emotionally negative. Reading speeds for high-arousal vignettes were comparable to those for low-arousal, emotionally positive vignettes, independent of their perceived valence. Hence, vignettes perceived as emotionally negative required high arousal levels to show a processing advantage similar to that of vignettes perceived as emotionally positive. The observed effect conforms to previous findings from lexical decision tasks showing that the processing of emotionally negative, but not positive, words depends on their arousal levels. More specifically, responses to negatively valenced words were reported to be slower for low- than for high-arousal stimuli (e.g., Nakic et al., 2006; Hofmann et al., 2009; Recio et al., 2014). In other words, in line with studies examining reaction latencies, valence effects on Reading Speed were absent for vignettes perceived as highly arousing.
To our knowledge, there is only one other study primarily focusing on effects of textual valence on eye movements during the reading of narratives. In accordance with our results, Ballenghein et al. (2019) found the shortest reading times for emotionally positive texts. However, statistically significant differences were restricted to the comparison of mean fixation durations between emotionally positive and neutral narratives. The present study thus extends their findings by revealing statistically significant differences between positively and negatively valenced vignettes when individual ratings are used instead of valence categories. Interestingly, Ballenghein et al. (2019) likewise observed that the reading of emotionally negative texts is influenced by arousal. More specifically, the authors reported significantly shorter mean fixation durations for high-arousal, emotionally negative texts compared to their medium-arousal counterparts.
In accordance with the NCPM (Jacobs, 2011, 2015b), vignettes associated with higher ratings of Immersion Potential were, on average, read faster. The multi-dimensional phenomenon of immersive reading is related to various factors, including characteristics of the text (e.g., easy-to-recognize words; Jacobs, 2015b), the context (e.g., action-oriented descriptions; Kuijpers et al., 2014), and the reader (e.g., identification with the protagonist; Jacobs, 2015b). Since the vignettes were constructed to be easily understandable, to emotionally engage the reader, and to enhance identification with the protagonist, and since their contents, being based on pictures of daily situations, likely activated familiar situation models, the overall high ratings of Immersion Potential are not surprising. It has to be considered, however, that the reported effect of Immersion Potential is based on a comparably narrow range of values (i.e., range: 3.89-4.6 on a 5-point rating scale). Moreover, Immersion Potential was neither explicitly manipulated nor systematically measured with commonly used scales such as the Story World Absorption Scale (Kuijpers et al., 2014). Thus, while the significant main effect of Immersion Potential is in line with the assumptions of the NCPM, it needs to be replicated with materials specifically constructed or selected to study effects of immersion.
The effect of Perceived Valence reported at the supralexical level was also observed at the lexical level. As expected, content words of vignettes perceived as emotionally positive were, on average, read faster, as indicated by shorter mean TRTs. However, effects of (perceived) emotional valence were absent for mean FFD, suggesting an absence of valence-specific effects at early processing stages. Comparable findings have been provided by EEG studies examining the time course of emotional word processing (see Citron, 2012, for a review). In this regard, effects of emotional content have been shown to appear at both early and later processing stages (e.g., Herbert et al., 2006, 2008; Kissler et al., 2007; Hofmann et al., 2009; Schacht and Sommer, 2009b; Bayer et al., 2010). Whereas the early effect has been assumed to be predominantly governed by arousal, valence-driven modulations have been put forward to explain the later effect, indicating a deeper encoding of positive stimuli (e.g., Delplanque et al., 2004; Herbert et al., 2006, 2008; Kiefer et al., 2006; Conroy and Polich, 2007; Kissler et al., 2009). Whether the shorter total reading times reported here for vignettes perceived as positive and for their constituent content words are in line with the assumed deeper encoding of positive compared to negative words remains an open question for future empirical research. So far, our results conform to the positivity bias during meaning construction (cf. Jacobs et al., 2015; Lüdtke and Jacobs, 2015), which assumes easier semantic integration and construction of situation models for verbal materials containing emotionally positive rather than negative words. Further studies are needed to explore under which circumstances such a processing advantage may result in deeper encoding.
With respect to eye movements in picture viewing, descriptive statistics revealed that emotionally negative pictures tended to attract slightly shorter fixations and longer saccades than emotionally positive ones. However, neither Perceived Valence nor Perceived Arousal had explanatory value for the three eye tracking parameters examined. Hence, as hypothesized and as previously reported by studies with a comparable experimental design (Bradley et al., 2011; Niu et al., 2012), emotion effects were absent for both fixation patterns (i.e., Mean Fixation Duration, Total Number of Fixations) and scan patterns (i.e., Mean Saccade Amplitude).
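The three picture-viewing parameters named above can be illustrated with a small sketch. It assumes, hypothetically, that each fixation of a picture trial is available as pixel coordinates plus a duration in milliseconds, and it approximates each saccade amplitude by the straight-line distance between consecutive fixation centres. The eye tracking software used in the study may instead report saccade amplitudes directly and in degrees of visual angle, so this is not the authors' pipeline; the names `Fixation` and `picture_viewing_measures` are illustrative only.

```python
from dataclasses import dataclass
from math import hypot
from statistics import mean


@dataclass
class Fixation:
    x: float         # horizontal position in pixels (assumed unit)
    y: float         # vertical position in pixels (assumed unit)
    duration: float  # fixation duration in ms


def picture_viewing_measures(fixations):
    """Return (Mean Fixation Duration, Total Number of Fixations, Mean Saccade Amplitude)
    for one picture trial. Saccade amplitude is approximated by the Euclidean distance
    between consecutive fixation centres (a simplifying assumption)."""
    if not fixations:
        raise ValueError("a trial needs at least one fixation")
    mean_duration = mean(f.duration for f in fixations)
    n_fixations = len(fixations)
    amplitudes = [
        hypot(b.x - a.x, b.y - a.y)
        for a, b in zip(fixations, fixations[1:])
    ]
    # A single fixation implies no saccade within the trial.
    mean_amplitude = mean(amplitudes) if amplitudes else 0.0
    return mean_duration, n_fixations, mean_amplitude


# Hypothetical trial with three fixations.
trial = [Fixation(100, 100, 200.0), Fixation(400, 500, 300.0), Fixation(400, 200, 250.0)]
print(picture_viewing_measures(trial))  # (250.0, 3, 400.0)
```

Per-trial values of this kind would then be entered as dependent variables in the mixed-effects models; converting pixels to degrees of visual angle requires the screen geometry and viewing distance, which are not given in this section.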

Limitations and Future Directions
Regarding the pictorial materials just discussed, the results of the present study have to be interpreted within the framework of the particular viewing paradigm and the analyses performed. It therefore remains an open question whether valence-specific effects would emerge when analyzing specific areas of interest (e.g., focal object versus background; Yang et al., 2012) or when focusing on the temporal dynamics of picture viewing. With respect to the latter, differences between positive and negative valence have been reported when emotional and neutral pictures were presented simultaneously, competing for attentional resources (e.g., Calvo and Avero, 2005; Simola et al., 2013).
Complementary to studies focusing on single-word processing (e.g., Kousta et al., 2009; Briesemeister et al., 2011; Lüdtke and Jacobs, 2015), (perceived) positive valence showed comparable facilitative effects when manipulated at the text level. This hypothesized effect conforms to the aforementioned high correlation between the text and lexical levels (Bestgen, 1994; Whissell, 2003; Hsu et al., 2015b; Jacobs, 2015b). In general, the distribution of Reading Speed in words per minute (wpm) indicated that our participants read rather fast (M = 367.48, SD = 123.97), given that an average of 330 wpm is considered a threshold for fast reading (cf. Rayner et al., 2010). This observation suggests low cognitive demands and high comprehensibility of our linguistic materials (cf. Cupchik et al., 1998; Søvik et al., 2000; Rayner et al., 2006; Liversedge et al., 2011; Lüdtke et al., 2019).
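Reading speed in wpm, and its comparison with the 330-wpm threshold cited above, reduces to simple arithmetic. The sketch below assumes, for illustration, that self-paced reading time per vignette is logged in milliseconds from text onset to the participant's key press and that the word count covers all words of the vignette; whether the study used exactly these definitions is not stated in this section. Function names and example values are hypothetical, and treating "at or above 330 wpm" as fast is likewise an assumption.

```python
from statistics import mean, stdev

FAST_READING_WPM = 330  # threshold for fast reading, following Rayner et al. (2010)


def words_per_minute(n_words, reading_time_ms):
    """Reading speed for one vignette; multiplying before dividing keeps exact values exact."""
    if reading_time_ms <= 0:
        raise ValueError("reading time must be positive")
    return n_words * 60_000 / reading_time_ms


def summarize(speeds):
    """Mean and SD of reading speeds, plus whether the mean reaches the
    fast-reading threshold (">=" rather than ">" is an assumption)."""
    m = mean(speeds)
    return m, stdev(speeds), m >= FAST_READING_WPM


# Hypothetical trials: (number of words, reading time in ms)
trials = [(120, 20_000), (150, 30_000), (90, 12_000)]
speeds = [words_per_minute(w, t) for w, t in trials]
print(speeds)  # [360.0, 300.0, 450.0]
print(summarize(speeds))
```

With these made-up trials the mean is 370 wpm, which would fall above the threshold; the reported sample mean of 367.48 wpm was interpreted in the same way.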
In this context, the influential role of our evaluative judgment task ought to be considered. In particular, both the comparably low task demands (e.g., no comprehension questions) and the strong focus on clearly emotionally valenced materials (i.e., no neutral stimuli) might have influenced participants' reactions and compliance with the task (Westermann et al., 1996; Estes and Verges, 2008). Although participants could, in principle, have performed the task without reading the vignettes in full, visual inspection of fixation patterns indicated that they did not stop reading after the first sentences. Apart from the fact that evaluative judgment tasks have been widely applied in the context of emotion induction (e.g., Schupp et al., 1997; Bradley et al., 2001; Calvo and Lang, 2004; Nummenmaa et al., 2006; Brunyé et al., 2011; Schlochtermeier et al., 2013; Mouw et al., 2019; Child et al., 2020), effects of emotion have likewise been reported for tasks not explicitly focusing on affective content (e.g., Schacht and Sommer, 2009a; Rellecke et al., 2011; Lüdtke and Jacobs, 2015). These studies show that emotional valence is encoded even when affective processing is not required to perform the task. Nevertheless, the extent to which the effects of perceived emotional valence reported here generalize to other reading situations remains an open empirical question.
Since the focus of the present study was on the induction of emotions in a cross-domain comparison, neutral stimuli were largely neglected. Consequently, future research has to reveal whether the reported effects of Perceived Valence remain stable when neutral materials are included. It should be noted, however, that emotional valence is highly prevalent in objects of everyday life, making it difficult to select appropriate neutral stimuli (cf. Lebrecht et al., 2012). With respect to the comparison between vignettes and pictures, the slightly different presentation modes have to be considered. Whereas vignettes were read in a self-paced manner, pictures were presented for a commonly used fixed interval of 3 s (e.g., Calvo and Lang, 2004; Calvo and Avero, 2005; Nummenmaa et al., 2006; Yang et al., 2012; Bayer and Schacht, 2014; Marchewka et al., 2014). Nevertheless, we are convinced that the reported domain-specific effects are related more strongly to differences between the processing of pictures and vignettes than to the different presentation modes.
As a step toward the use of more ecologically valid stimuli in psychological reading research, reading behavior was examined in self-constructed vignettes. Although these do not represent natural reading materials such as excerpts from well-known books (e.g., Hsu et al., 2014, 2015a), their major advantage is that they allow various variables of interest (e.g., choice of words, number of protagonists) to be easily and systematically controlled or manipulated. The results obtained in this way can then inform future studies about potentially influential covariates that ought to be considered.
As mentioned in the introduction, the simplest model of the valence of supralexical units such as vignettes would assume that the global valence is a (linear or non-linear) function of the valence values of its constituents. Following this idea, mean TRT and mean FFD were computed by averaging the fixation durations of all content words constituting a vignette. Consequently, lexical word features such as length, frequency, word position, and repetition were neglected (see Raney et al., 2000; Kuperman et al., 2010, for reviews). Moreover, since the present study aimed to investigate natural reading processes, our study material could only be controlled for a limited set of variables. However, current computational tools for quantitative narrative and sentiment analysis, such as "QNArt" and "SentiArt" (Jacobs, 2018a,b, 2019), make it possible to describe words quantitatively on a wide range of lexical affective-semantic features. To investigate the possibly interactive impact of these features on eye movements in reading, approaches other than the linear mixed-effects models applied here are required. In this context, machine-learning-assisted methods of predictive modeling offer a promising alternative or complementary perspective (e.g., Yarkoni and Westfall, 2017; Vijayakumar and Cheung, 2018). Since this approach can handle many intercorrelated variables and non-linear data patterns (Goodman et al., 2016; Cheung and Jak, 2018), it is particularly well suited for analyzing effects of literary texts on reading behavior (Jacobs, 2018a,b; Xue et al., 2019). The combined application of QNA data and machine learning algorithms such as neural networks or decision trees has yielded promising results in previous research on the beauty of words (Jacobs, 2017), the literariness of metaphors (Jacobs and Kinder, 2017, 2018), and the comprehensibility and emotion potential of poetic texts (Jacobs, 2018a,b; Xue et al., 2019).
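The lexical-level aggregation described above, together with the simplest linear version of the constituent-valence model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: fixations are assumed to be already mapped to word indices in temporal order; FFD is taken as the duration of the first first-pass fixation on a word, so words first fixated only after a regression receive no FFD; TRT sums all fixations on a word; unfixated content words are excluded from the averages; and the global valence is a (weighted) mean of word valences, which is only one of the functions the text allows. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def word_measures(fixations):
    """fixations: list of (word_index, duration_ms) in temporal order.
    Returns (ffd, trt) dictionaries keyed by word index."""
    ffd, trt = {}, defaultdict(float)
    furthest = -1  # rightmost word index fixated so far
    for word, duration in fixations:
        # First-pass first fixation: word not yet fixated and not already passed over.
        if word not in ffd and word > furthest:
            ffd[word] = duration
        trt[word] += duration  # total reading time accumulates over all passes
        furthest = max(furthest, word)
    return ffd, dict(trt)


def vignette_means(fixations, content_words):
    """Mean FFD and mean TRT across the fixated content words of one vignette
    (None if no content word has a value)."""
    ffd, trt = word_measures(fixations)
    ffd_values = [ffd[w] for w in content_words if w in ffd]
    trt_values = [trt[w] for w in content_words if w in trt]
    return (mean(ffd_values) if ffd_values else None,
            mean(trt_values) if trt_values else None)


def linear_global_valence(word_valences, weights=None):
    """Simplest linear model: global valence as the (weighted) mean of word valences."""
    if weights is None:
        return mean(word_valences)
    return sum(w * v for w, v in zip(weights, word_valences)) / sum(weights)


# Hypothetical trial: word 2 is skipped in first pass and only read after a regression.
fixations = [(0, 200.0), (1, 250.0), (3, 300.0), (2, 180.0), (3, 120.0)]
print(vignette_means(fixations, content_words=[0, 1, 2, 3]))  # (250.0, 262.5)
```

Averaging in this way deliberately discards word length, frequency, position, and repetition, which is exactly the limitation noted above; a predictive model over QNA features would instead keep these as word-level predictors.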
Future research will have to provide comparable analyses for reading in prose (e.g., vignettes).

CONCLUSION
Given that our results indicate that emotional vignettes can induce stronger valence effects than their pictorial counterparts, the present study provides further evidence for the suitability of textual materials in the area of emotion induction. Furthermore, to our knowledge, this is the first eye tracking study to show a statistically significant difference between the effects of positive and negative texts, and not only between emotional and neutral texts. In this context, results from previous experiments using isolated words and sentences were replicated with connected text: perceived positive text valence attracted shorter reading times than perceived negative valence at both the supralexical and the lexical level.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of the Department of Education and Psychology at Freie Universität Berlin. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FU and JL contributed to the conception and design of the study. FU developed the test stimuli and collected the data.
FU performed the statistical analysis in consultation with AJ and JL. FU wrote the first draft of the manuscript. All authors contributed to the manuscript revision, read and approved the submitted version.

FUNDING
Open Access Funding was provided by the Freie Universität Berlin.