The Impact of Context on Affective Norms: A Case of Study With Suspense

The emotional response to a stimulus is typically measured in three variables called valence, arousal and dominance. Based on such dimensions, Bradley and Lang (1999) published the Affective Norms for English Words (ANEW), a corpus of affective ratings for 1,034 non-contextualized words. Expanded and adapted to many languages, ANEW provides a corpus to evaluate and to predict human responses to different stimuli, and it has been used in a number of studies involving analysis of emotions. However, ANEW seems not to appropriately predict affective responses to concepts when these are contextualized in certain situational backgrounds, in which words can have different connotations from those in non-contextualized scenarios. These contextualized affective norms have not been sufficiently contrasted yet because the literature does not provide a corpus of the ANEW list in specific contexts. On this basis, this paper reports on the creation of a new corpus of affective norms for the original 1,034 ANEW words in a particular context (a fictional scene of suspense). An extensive quantitative data analysis comparing both corpora was carried out, confirming that the affective ratings are highly influenced by the context. The corpus can be downloaded as Supplementary Material.


INTRODUCTION
The cognitive-affective theory claims that human emotional response to a stimulus is mainly determined by two different information-processing systems: (a) an automatic affective system, and (b) a cognitive processing system which evaluates the information related to the stimulus (Mischel and Shoda, 1995; Moors et al., 2013). Along with visual and acoustic resources (Baumgartner et al., 2006; Dolcos and Cabeza, 2002; Eerola and Vuoskoski, 2013; Bradley and Lang, 2000), experimental studies have typically used sets of words to identify and measure what characteristics of a stimulus trigger emotional responses, and to what extent. For instance, positive/negative valence or high arousal terms seem to have a greater emotional impact than neutral terms (Maratos et al., 2000; Hamann and Mao, 2002). These terms are also more vividly remembered (Kensinger and Corkin, 2003). Other studies report differences in speed of recognition and comprehension (Estes and Verges, 2008; Citron et al., 2014b; Kuperman et al., 2014). Furthermore, the emotional response is influenced by the context in which the terms are introduced (Bokde et al., 2001; Buchanan et al., 2006; Kousta et al., 2011). This context is decisive in attributing their semantics and, hence, the subjective values of their affective features (Sperber et al., 1979; Pearson, 1998; Shaikh et al., 2007; Barrett and Kensinger, 2010).
A conventional classification for these affective features relies on three variables called valence, arousal and dominance (Russell and Mehrabian, 1977). Valence describes the degree to which a stimulus causes a positive or a negative emotion (ranging from pleasant to unpleasant); arousal refers to the intensity or level of energy invested in the emotion (ranging from calm to excited); and dominance reflects the extent of the perceived control over the emotional response when facing the stimulus (ranging from in control to out of control) (Lang et al., 1997; Bradley, 2009; Citron et al., 2014a; Gantiva Díaz et al., 2015). Based on these affective dimensions, Bradley and Lang (1999) published the Affective Norms for English Words (ANEW), which include valence, arousal and dominance scores for 1,034 terms and their analysis. Each word was rated using a 9-point scale represented by the Self-Assessment Manikin (SAM) (Bradley and Lang, 1994), a non-verbal, pictorial assessment technique that directly measures the emotion in the three affective variables.
Also, in a study for the detection and automatic generation of suspense framed in the development of an automatic story generator, Delatorre et al. (2017) used the ANEW dataset to determine which concepts were the best candidates to evoke suspense according to preferences of the audience. The experiment consisted of asking a number of subjects to rate the suspense provoked by a short text where different words representing decorative elements were included. Additionally, the study was repeated using an interactive 3D environment. The resulting data analysis found moderate correlations between reported suspense, and ANEW valence and dominance affective ratings. In particular, suspense increased as valence and dominance decreased and, to a lesser extent, as arousal grew.
Some of these observations are in line with the general idea of suspense found in the relevant literature (Delatorre et al., 2016b): although the existing multiple definitions of suspense largely differ in the identification and importance of its fundamental features 1 (see Zillmann and Tannenbaum, 1980; Carroll, 1984; Ortony et al., 1990; Caplin and Leahy, 2001; Vorderer et al., 2001; Somanchi, 2003; Szilas, 2007; Abbott, 2008; Smuts, 2008, among others), there is a general agreement that suspense is triggered by the anticipation of an outcome which is mostly negative for the characters (Delatorre et al., 2018). This conceptualization can be found, for instance, in the definition by de Wied et al. (1992, p. 325), who describe suspense as "an anticipatory emotion, initiated by an event which sets up anticipations about a forthcoming (harmful) outcome event for one of the main characters." Such a common approach seems consistent with the values of the affective dimensions reported by the participants of the aforementioned experiment: decreased valence and dominance, and increased arousal.
However and although the model of Delatorre et al. (2017) revealed significant correlations in line with the mentioned concept of suspense, some other aspects seemed to challenge the validity of the experimental ratings prediction of the current affective dimensions.
First, the impact of the correlated dimensions depended on the narrative medium, that is, the correlation between the dimensions and suspense was different in the text story and in the interactive 3D environment. For the interactive 3D environment (the best case), the valence correlation reached r = −0.579, the arousal correlation was r = 0.345, and the dominance correlation was r = −0.498. Although this impact of valence and dominance might be considered acceptable, both failed to predict the suspense evoked by concepts whose connotation varies in contexts typically associated with suspense. Thus, terms related to sanitation (such as vomit, dirt, manure, mucus, germs) or words such as penalty were rated as less suspenseful than predicted. By contrast, the suspense score was higher than expected for other concepts such as dress or doll.
As mentioned, the situational context in which the words are used influences the affective ratings (Bokde et al., 2001; Buchanan et al., 2006; Kousta et al., 2011). Since ANEW is context-independent (i.e., the words are not contextualized), the emotional response is not bound by a restricted background. This may represent a relevant difference when comparing its performance with an evaluation based on a contextualized, constraining framework. Indeed, the general affective meaning of a word is more than just a direct function of lexical affective values, since the processing of affective words is expected to interact with the surrounding context (Westbury et al., 2013; Ullrich et al., 2017). The set of cognitive and emotional reactions resulting in the absence of a context, which means the lack of effective understanding of the information environment (Bawden and Robinson, 2009), makes it difficult to determine the relevance of the concepts (Carleton, 2012), as well as to evaluate if this relevance influences the affective assessment in the target context. This uncertainty is generated by inexactness, unreliability, and ignorance, factors which can potentially be found when no defined context is present (Funtowicz and Ravetz, 1990).
1 No complete formal specification of the constituents and impact of suspense has been proposed yet, so it is a challenge to develop formal processes. In fact, the automatic story generation systems that include suspense evaluate and implement it with a limited functional simplification, for example implementing some emotional links between the characters or reducing the chances of success for the protagonists in certain situations (Turner, 2014; Pérez y Pérez, 1999; Cheong and Young, 2006; Szilas, 2007; O'Neill and Riedl, 2014). Ultimately, the lack of a formalism results in the fact that several potential features of the different conceptualizations of the term are still not explicitly addressed, such as the quantitative measures of emotions during the referred process of anticipation of the outcome.
While several studies cover affective responses in non-specific contexts given words and sentence structures (Bradley and Lang, 2007; Bao et al., 2011; Imbir, 2016), other studies have experimented with generic contexts based on textual corpora (ANEW, LANG, BAWL), sound corpora (IADS) or visual corpora (IAPS) (Lang et al., 1997). The meaning of "context" varies in these studies: it can mean the emotional status of the participant (Kanske and Kotz, 2012a), the tone of voice in which the terms are pronounced (Bertels et al., 2009), visual image association (Al-Naser et al., 2015), or adjectives in Adjective-Noun-Pair structures (Gawronski et al., 2005), among others. Notwithstanding this heterogeneity, the experimental results indicate that, when terms are presented in contexts that are either negative or imply lack of control, the affective responses are even more negative and intense than when there is no context (Bertels et al., 2009; Lambert et al., 2010; Blaut et al., 2013; Guidry, 2005; Lehne and Koelsch, 2015; Lehne, 2014). Additionally, an increase in processing speed is observed when the valence of the term is consistent with the valence of the context, which seems to confirm the effect of the congruence of both polarity and context on attention and affective evaluation (Fazio et al., 1986; Gawronski et al., 2005; Al-Naser et al., 2015; Cummings et al., 2006; Holt et al., 2009; Erk et al., 2003).
Although these results are consistent with the emotional responses to suspense also observed in Delatorre et al. (2017), none of these previously mentioned studies includes a narrative background for the terms, which would be required for a quantitative analysis of the impact of narrative contexts in story generation. Therefore, a corpus that rates affective responses to concepts in a context of suspense 2 does not exist. Creating such a corpus is the main goal of this study, as it would allow the divergences from existing ANEW datasets to be compared when a specific situational background is introduced. More specifically, it would enable the study of a model of suspense that relies on affective ratings.
On this basis, this work describes an experiment in which the three affective dimensions for the set of words included in the Spanish adaptation of the original ANEW (Redondo et al., 2007), contextualized in a suspenseful background, are rated by N = 206 Spanish subjects. The resulting dataset is called Spanish S-ANEW, "S" standing for "suspense." Five different analyses of the gathered ratings are described: descriptive statistics, associations between the affective dimensions, gender differences, comparison with suspense ratings published in the aforementioned work of Delatorre et al. (2017), and relations with other psycholinguistic indices. Additionally, each analysis includes the comparison with the ANEW study performed by Redondo et al. (2007), also introducing other studies where relevant.

Materials and Procedure
The word set used for the experiment contained 1,034 Spanish words taken from the corpus of affective words compiled by Redondo et al. (2007). It included objective and subjective psycholinguistic indices: number of letters, number of syllables, frequency, number of orthographic neighbors, familiarity, concreteness, and imageability. Regarding the grammatical class, 713 words (68.96%) were classified as nouns, 157 words (15.18%) were adjectives, 68 words (6.58%) were verbs, and the remaining 96 words (9.28%) were included in more than one of these groups.
Participants who agreed to take part in the study received an e-mail with a spreadsheet file containing three sheets. The first sheet presented the instructions. To ensure that the participants understood them, the second sheet provided an example using the exact same set of sample words proposed by Redondo et al. (2007). Finally, the third sheet listed the 1,034 words. This sheet was divided in a first column presenting the list of the words, and three groups of columns to score valence, arousal and dominance, with the respective Self-Assessment Manikin pictogram (Bradley and Lang, 1994) in the header. Each of these three groups was composed of nine columns, visually aligned with the positions of its respective SAM figure header. Except for the headers, all the rows kept a size of 20 pixels. All the columns to score had 30 pixels, and the dimensions were separated by 40 pixels. The font used was Times New Roman, size 12 pt. The first column and row had a locked layout to ensure that subjects had the references of both the words and the SAM pictograms at all times.
Each participant was instructed to rate every word in the entire set in the three affective dimensions by placing an X in the corresponding columns, only one for each word-dimension pairing. The list of words for every file was randomly ordered, resulting in a different permutation for every participant. The evaluation had to be carried out by contextualizing each word of the set within a scene of suspense. In order to avoid limiting this context to a specific scene, scene details were omitted. In fact, the scene could be freely altered for each word at the discretion of the subject. The objective was to give freedom of choice to imagine the scene (Delatorre et al., 2017). However, to clarify the concept of "scene of suspense," it was described as a situation in which a victim is under a forthcoming harmful outcome event due to a threat. This description is based on the definition of suspense by de Wied et al. (1992).
Additionally, the instructions described the SAM model and the three affective variables (valence, arousal, and dominance) with illustrations. It was also remarked that there was no right or wrong answer, although the instructions emphasized the importance of using the entire range of ratings and of rating the words according to the first impression, without spending too much time on any word. Participants did not receive instructions regarding ambiguous words. They were asked to fill in and send in the completed file within 2 weeks. These instructions were similar to those provided in related studies (Bradley and Lang, 1999; Eilola and Havelka, 2010; Redondo et al., 2007; Moors et al., 2013; Montefinese et al., 2014), and are available as Supplementary Material to this article.
The study was carried out in accordance with the recommendations of national and international ethics guidelines, Código Deontológico del Psicólogo and American Psychological Association. The study did not present any invasive procedure, and it did not carry any risk to the participants' mental or physical health, thus not requiring ethics approval according to the Spanish law BOE 14/2007 and the ethical guidelines of authors' institutions. All subjects participated voluntarily and gave written informed consent in accordance with the Declaration of Helsinki. They were free to leave the experiment at any time.

Participants
An initial set of N0 = 318 undergraduate students from different fields in three public Spanish universities 3 participated in this study. However, after the time period for sending back the questionnaires expired, 97 volunteers either did not submit a completely filled-in answer (86) or submitted it after the deadline (11). Additionally, 15 more questionnaires were excluded because they had less than half of the questions answered. The rest of the questionnaires were sent in time, and all of them had between 98% and 100% of the questions answered. Therefore, the final set of volunteers was composed of a total of N = 206 undergraduate students (123 males (59.71%), and 83 females (40.29%)). Subjects ranged in age from 18 to 37 years old (M = 21.26, SD = 2.24). All of them were native Spanish speakers.
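The participant flow described above can be checked with a few lines of arithmetic. The sketch below only restates the counts reported in the paragraph; the variable names are illustrative and do not come from the study materials.

```python
# Participant flow, using only the counts reported in the text.
initial = 318          # students who initially took part (N0)
not_completed = 86     # did not submit a completely filled-in answer
late = 11              # submitted after the deadline
mostly_empty = 15      # fewer than half of the questions answered

final = initial - (not_completed + late) - mostly_empty
males, females = 123, 83


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


print(final)                                    # 206
print(pct(males, final), pct(females, final))   # 59.71 40.29
```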

Description of the Database
The resulting database is available as Supplementary Material to this article. For ease of comparison, it was organized in the same structure as published by Redondo et al. (2007), as follows:
• Number: A numeric identifier for each of the 1,034 words, matching the original number in the ANEW dataset (Bradley and Lang, 1999).

RESULTS AND INTERPRETATION
Once the data was gathered and ordered, a split-half test was conducted in order to check its reliability and consistency. The three indexes (valence, arousal, and dominance) were calculated on 1,000 different randomizations of the participants. The mean correlations between groups were high for all affective dimensions: 0.93 for valence, 0.87 for arousal, and 0.85 for dominance. This finding agrees with the previous studies of ANEW (Redondo et al., 2007; Eilola and Havelka, 2010; Moors et al., 2013; Monnier and Syssau, 2014). Section 3.1 below details the descriptive statistics of the analysis, carried out similarly to the related literature (Bradley and Lang, 1999; Redondo et al., 2007; Soares et al., 2012; Moors et al., 2013; Warriner et al., 2013; Monnier and Syssau, 2014; Montefinese et al., 2014). Additionally, a brief study of the words that were rated the highest and lowest for each affective dimension is included. The goal of this analysis was to study numeric and semantic differences with respect to a non-contextualized affective evaluation. The resulting deviations and their potential interpretations are presented.
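A split-half test of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: it assumes that ratings are stored as one list per participant with a shared word order, that each randomization splits the participants into two halves, and that reliability is the mean Pearson correlation between the per-word means of the two halves. The toy data and all function names are hypothetical.

```python
import random


def avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = avg(x), avg(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(ratings, n_splits=1000, seed=0):
    """ratings[p][w] is the rating given by participant p to word w."""
    rng = random.Random(seed)
    n_words = len(ratings[0])
    idx = list(range(len(ratings)))
    correlations = []
    for _ in range(n_splits):
        rng.shuffle(idx)                      # new random split of participants
        half = len(idx) // 2
        g1, g2 = idx[:half], idx[half:]
        m1 = [avg(ratings[p][w] for p in g1) for w in range(n_words)]
        m2 = [avg(ratings[p][w] for p in g2) for w in range(n_words)]
        correlations.append(pearson(m1, m2))
    return avg(correlations)


# Synthetic toy data (not the study's ratings): 40 raters, 50 words, 1-9 scale.
rng = random.Random(42)
true_scores = [rng.uniform(2, 8) for _ in range(50)]
toy = [[min(9, max(1, round(t + rng.gauss(0, 1.5)))) for t in true_scores]
       for _ in range(40)]
reliability = split_half_reliability(toy, n_splits=200)
print(round(reliability, 2))
```

Running the function once per dimension would yield three values comparable in kind to the 0.93, 0.87 and 0.85 reported in the text.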
Section 3.2 presents the study of associations between the affective dimensions for Spanish S-ANEW and in comparison to Redondo et al. (2007) ANEW. The results offer an explanation for the high correlation found in the Spanish S-ANEW scores against the suspense ratings published in Delatorre et al. (2017). This correlation is analyzed and described in section 3.4. Finally, gender differences and relations with other psycholinguistic indices are also analyzed, in line with the previous ANEW studies. Table 1 shows the central tendency and variability of both Spanish S-ANEW and Redondo et al. (2007) ANEW, which are visually displayed in Figure 1 (left).

Descriptive Statistics
The initial analysis of the central tendency revealed that ratings for valence and arousal were lower in Spanish S-ANEW than in Redondo et al. (2007) ANEW. However, the mean valence difference 4 was small (dif = 0.11, t = 2.67, p < 0.01), although the difference in terms of dispersion was high (2.05). Moreover, significant mean differences were found for arousal (dif = 0.92, t = 24.18, p < 0.0001), where Spanish S-ANEW scored almost one point lower on the nine-point SAM scale than Redondo et al. (2007) ANEW. By contrast, dominance ratings were slightly but significantly higher in Spanish S-ANEW than in Redondo et al. (2007) ANEW (dif = −0.37, t = −13.13, p < 0.0001).
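The mean comparisons above can be illustrated with a paired test over words. The text does not state which t-test was used, so the paired design below is an assumption, as is the sign convention (non-contextualized rating minus contextualized rating, which matches the signs reported for arousal and dominance). The p-value uses a normal approximation, which is only reasonable for large samples such as 1,034 words; the six ratings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(anew, s_anew):
    """Return (mean difference, t statistic, approximate two-sided p-value)."""
    d = [a - s for a, s in zip(anew, s_anew)]   # per-word differences
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    # Normal approximation to the t distribution (large-sample assumption).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return mean(d), t, p


# Hypothetical per-word mean ratings, for illustration only.
anew_toy = [5.2, 7.1, 3.4, 6.8, 4.9, 8.0]
s_anew_toy = [4.1, 6.5, 2.9, 5.2, 4.4, 6.9]
dif, t, p = paired_t(anew_toy, s_anew_toy)
print(round(dif, 2), round(t, 2))
```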
Figure 1 (right) shows the Spanish S-ANEW distributions of valence, arousal and dominance affective ratings. It illustrates that the distributions are moderately positively skewed for the valence (Sk = 0.24) and arousal (Sk = 0.39) indexes, with ratings concentrated toward the lower part of the scale, while dominance is slightly negatively skewed (Sk = −0.12). Both valence and arousal ratings tended to fall below the mean of the words (51.16% and 54.06%, respectively), while most dominance ratings fell above the mean (51.64%). As can be seen in the figure, the maximum density was reached at a valence of 3.54 (30.13% of ratings) and at an arousal of 3.64 (26.63% of ratings); in both cases the deviation was higher than one point. This difference is smaller for dominance, whose maximum density is at 5.68 (27.34% of ratings). Furthermore, the three dimensions presented a platykurtic distribution (K_val = −0.87, K_aro = −0.64, K_dom = −0.97), having a moderate to low concentration of values around the average and, consequently, a moderate to high scoring range.
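For readers who want to reproduce shape statistics of this kind, the following sketch computes moment-based skewness and excess kurtosis. Which estimator the authors used is not stated, so the population-moment formulas here are an assumption; a positive skewness means the long tail points toward high ratings, and a negative excess kurtosis corresponds to a platykurtic distribution. The sample values are hypothetical.

```python
def shape_statistics(xs):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mu) ** 3 for x in xs) / n   # third central moment
    m4 = sum((x - mu) ** 4 for x in xs) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Hypothetical ratings with most values low and a tail toward high values.
sk_tail, k_tail = shape_statistics([2, 2, 3, 3, 3, 4, 4, 5, 6, 8])
# A flat, symmetric sample: zero skewness, negative excess kurtosis.
sk_flat, k_flat = shape_statistics([1, 2, 3, 4, 5])
print(round(sk_tail, 2), round(k_flat, 2))
```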
Additionally, the distribution of the Spanish S-ANEW ratings was individually compared with the Redondo et al. (2007) ANEW scores in each of the affective dimensions. Regarding valence, both the Spanish S-ANEW and Redondo et al. (2007) ANEW curves presented a similar shape, as shown in Figure 2 (up). However, while the valence distribution in Redondo et al. (2007) ANEW covered almost the entire rating scale ([1.11, 8.54]), the range of the Spanish S-ANEW was narrower ([2.03, 7.75]). Therefore, the density of the ratings increased in the center of the curve. Furthermore, the Redondo et al. (2007) ANEW valence curve presented a bimodal distribution, with a higher concentration of mean ratings from 1.5 to 2.5 and from 5.5 to 7.5. Similarly, the peaks of the Spanish S-ANEW valence curve were located between 3 and 4, and between 4.5 and 5.5. These results revealed similarities between both distributions: Spanish S-ANEW valence ratings seemed to be distributed in a similar way to Redondo et al. (2007) ANEW valence ratings, but in a more confined range. As Figure 2 (up, right) shows, there is a strong positive linear correlation (R = 0.826, p < 0.0001), and the aforementioned small mean difference (dif = 0.11, t = 2.67, p < 0.01) supports this observation. By contrast, as presented in Figure 2 (center), the range of Spanish S-ANEW arousal ratings ([1.80, 8.56]) was wider than in Redondo et al. (2007) ANEW ([2.36, 8.36]), whose arousal ratings were concentrated mostly in the upper part of the scale, with prevalence around the mean (K = −0.45, Sk = 0.05). In this case, the best-fitting correlation between both measures is moderate (r = 0.532, p < 0.0001).
Also, as presented in Figure 2 (bottom), the dominance dimension curves presented similarities across both datasets. Both were platykurtic distributions that concentrated the ratings around the mean, slightly negatively skewed, and with a moderate to strong correlation (R = 0.709, p < 0.0001). However, Redondo et al. (2007) ANEW dominance ratings showed a tendency toward lower affective scores in the case of higher rated words, which is reflected in the resulting significant difference of means (dif = −0.37, t = −13.13, p < 0.0001).
In order to analyze semantic differences in the highest and lowest Spanish S-ANEW and Redondo et al. (2007) ANEW scores, two samples of the thirty highest rated terms were gathered from both datasets, respectively. Their valence scores were used to classify the terms as negative (from 1 to 4), neutral (from 4 to 6), or positive (from 6 to 9), in accordance with the criteria used by authors such as Ferré et al. (2012), Monnier and Syssau (2014), and Hinojosa et al. (2016). Table 2 shows this classification, including number of words, mean, standard deviations and range, for valence, arousal and dominance. The data show a smaller number of positive words in Spanish S-ANEW, where most words fall in the neutral valence range.
Table 3 shows the thirty words with the lowest and highest valence dimension ratings for each dataset, around 2.5% at each end. The lowest rated terms were mostly related to concepts that generate danger, negative emotional states, sickness, or pain, although the Redondo et al. (2007) ANEW dataset included more potential large-scale tragedies, either causes or effects, such as war, accident, massacre, bomb, misery, destruction, terrorist. Beyond that, no other semantic groups differentiated Spanish S-ANEW from Redondo et al. (2007) ANEW. Four words (13.33%) were shared between them. All selected terms in each dataset fell within the range of negative words in the other dataset.
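The valence bands used for this classification can be written as a small helper. The text gives the bands as 1 to 4, 4 to 6 and 6 to 9 without saying which band owns the boundary values, so assigning exactly 4 and exactly 6 to the neutral band below is an assumption made only for illustration.

```python
def valence_class(v):
    """Classify a mean valence rating on the 1-9 SAM scale."""
    if not 1 <= v <= 9:
        raise ValueError("valence must lie on the 1-9 scale")
    if v < 4:
        return "negative"
    if v <= 6:            # boundary handling is an assumption
        return "neutral"
    return "positive"


# The extremes of the Spanish S-ANEW valence range reported in the text.
print(valence_class(2.03), valence_class(7.75))   # negative positive
```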
Regarding the highest rated terms, nine words (30.00%) were shared between both datasets. Redondo et al. (2007) ANEW contained more terms related to interpersonal relations (30.00%) (love, kiss, friend, party, mother, family, valentine, hug), while in the Spanish S-ANEW set there were more words related to positive outcomes (20.00%) (peace, liberty, victory, freedom, win, triumph), and the means to achieve them (16.67%) (miracle, useful, lucky, knowledge, advantage).
In order to include the relation of the other dimensions with the valence, the study of the highest and lowest ratings of arousal and dominance sets of words is described in the next section.

Interpretation of Results
The results of the descriptive analysis show that the curves in both Spanish S-ANEW and Redondo et al. (2007) ANEW have similar shapes and scores, and moderate to strong correlations were found between both datasets in the three affective dimensions. The distributions of overall ratings are centered over the middle of the scale with a slight bias. The ratings also present significant differences; specifically: (a) the number of words considered positive in a context of suspense is around half the number obtained when no context is introduced, and mean values are slightly less pleasant; (b) in suspenseful contexts, subjects tend to rate pleasant terms as less pleasant, and tend to use a narrower range of valence (a deviation of about one point out of nine at each end), implying a lower concept polarization according to their semantics; (c) concepts evoke less arousal in the subjects when the words are introduced in a scene of suspense, resulting in a higher tendency to emotional neutrality than in the non-contextualized ratings; and (d) words that evoke the highest control tend to be rated as evoking slightly more dominance when the suspenseful context is introduced. Therefore, in terms of valence, in a context of suspense subjects do not rate concepts as either extremely pleasant or extremely unpleasant. This contrasts with the results in experiments without context, in which subjects use a wider range in ratings. On the other hand, arousal ratings in the context of suspense present a clear displacement toward a lower segment of the rating range, slightly more expanded, and resulting in a lower density for high rating values. Finally, words with a medium-high dominance score have a higher rating for this dimension in Spanish S-ANEW than in Redondo et al. (2007) ANEW. This suggests that concepts evoking either a neutral or a positive sense of control do so even more in a context of suspense.
Furthermore, an analysis of the most and the least pleasant words shows semantic differences between the contextualized and non-contextualized word sets. In a suspenseful scene, several of the most pleasant words involve positive outcomes and the means to achieve them. With suspense defined as a situation in which a victim is under a forthcoming harmful outcome event due to a threat, it is reasonable to assume that concepts related to escaping the threat are the most desirable (e.g., peace, liberty, victory, freedom). This trend is not present when the context is not specified, with the top-scoring pleasant words involving interpersonal relations. By contrast, in a context-less scenario, the lowest rated words are related to potential large-scale tragedies. This trend is not observed in a suspenseful context.
A plausible reason for these divergences is that the specific background allows subjects to contextualize and manage the different concepts more accurately, reducing the uncertainty. In addition, an analysis of the elements seems to show a tendency to assign a greater arousal to concepts that are frequently associated with suspenseful development or threatening outcome situations. However, the number of ANEW terms that are potentially related to suspense is low (specifically from the middle-high terms sorted by decreasing valence). These tendencies have also been found in previous studies (Lazarus, 1966; Comisky and Bryant, 1982; Guidry, 2005; Madrigal et al., 2011; Lehne and Koelsch, 2015).
Words present in both datasets are highlighted. +, positive; •, neutral; −, negative word in the other dataset.
The potential causes and effects of these distributions will be discussed further.

Associations Between Affective Dimensions
In this section, comparisons between the different emotional dimensions are conducted through the corresponding regression analyses, taking the affective valence as the independent factor in line with the works of Redondo et al. (2007), Ferré et al. (2012), Soares et al. (2012), Monnier and Syssau (2014), and Hinojosa et al. (2016). Additionally, the words with the highest and lowest scores are studied, similarly to the previous valence dimension case. Figure 3 shows the ratings for the 1,034 words in the two-dimensional affective spaces corresponding to valence and arousal.

Valence vs. Arousal Dimensions
In both datasets, a significant U-shaped quadratic correlation between valence and arousal was found. The Spanish S-ANEW correlation is given by the formula $a = 0.285v^2 - 3.322v + 13.441$ [$R^2 = 0.361$, F(2, 1031) = 293.4, p < 0.0001] 5, and represents 36.12% of the variance. This is similar to the results obtained by other authors such as Moltó et al. (1999), and compares with the 27.14% found in Redondo et al. (2007) ANEW, given by the formula $a = 0.137v^2 - 1.349v + 8.227$ [$R^2 = 0.271$, F(2, 1031) = 193.4, p < 0.0001]. The ratings distribution between valence and arousal showed that the terms rated as either strongly pleasant or strongly unpleasant are also consistently rated as more arousing. Also, in Spanish S-ANEW there is a predominance of unpleasant words that are rated as more arousing than pleasant words. Even though this asymmetry had not been clearly identified in the work of Redondo et al. (2007), similar distributions were found in other previous studies (Ferré et al., 2012; Montefinese et al., 2014; Soares et al., 2012; Guasch et al., 2016). Figure 3 (left) shows that most of the words with valence greater than 5 are located in the lower right of the chart, presenting medium to low values along the arousal dimension. An analysis of the arousal scores along the ranges of valence (negative, neutral, and positive) supports the observed asymmetry in Spanish S-ANEW [$\chi^2 = 330.02$, F(2, 1031) = 260.60, p < 0.0001]. Nevertheless, a post-hoc analysis only revealed significant arousal differences for the negative words, but not for neutral and positive words.
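The two fitted curves reported above can be evaluated directly to see the shape they imply. The sketch below only uses the coefficients given in the text; the function names and the choice of evaluation points (valence 2 and 8) are illustrative.

```python
def arousal_s_anew(v):
    """Fitted arousal for Spanish S-ANEW, coefficients from the text."""
    return 0.285 * v ** 2 - 3.322 * v + 13.441


def arousal_anew(v):
    """Fitted arousal for Redondo et al. (2007) ANEW, coefficients from the text."""
    return 0.137 * v ** 2 - 1.349 * v + 8.227


def vertex(a, b, c):
    """Valence at which a*v^2 + b*v + c is minimal, and the arousal there."""
    v = -b / (2 * a)
    return v, a * v ** 2 + b * v + c


v_min_s, a_min_s = vertex(0.285, -3.322, 13.441)
v_min_r, a_min_r = vertex(0.137, -1.349, 8.227)
print(round(v_min_s, 2), round(v_min_r, 2))

# Predicted arousal for an unpleasant (v = 2) and a pleasant (v = 8) word.
print(round(arousal_s_anew(2), 3), round(arousal_s_anew(8), 3))
print(round(arousal_anew(2), 3), round(arousal_anew(8), 3))
```

Under these coefficients the Spanish S-ANEW curve reaches its minimum at a higher valence than the Redondo et al. (2007) curve, so it predicts more arousal for an unpleasant word than for an equally extreme pleasant one, which is consistent with the asymmetry described in the text.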
In order to find any tendencies in the highest and lowest arousal-rated words, two samples with the thirty highest rated terms per classification of both negative and positive valence were gathered from the Spanish S-ANEW and Redondo et al. (2007) ANEW datasets. Due to the shape of the curve, it was necessary to check how the highest arousal-rated words differ between the extremes of the valence. Additionally, another sample of the thirty lowest rated terms was analyzed. Table 4 shows the set of negative words (left) and positive words (right) evoking the highest arousal for both the Spanish S-ANEW and Redondo et al. (2007) ANEW datasets. Regarding the negative words, the ten words shared by the two datasets (33.33%) are highlighted. Four of these terms were in the first positions of both lists, almost coinciding in order (with the exception of drown), and all of them were outliers in the distribution of the Spanish S-ANEW arousal ratings. According to the range of scores (see Table 2 and Figure 2), Spanish S-ANEW arousal ratings were slightly higher than Redondo et al. (2007) ANEW arousal ratings, unlike valence ratings, which behaved conversely. In addition, an analysis of the words revealed that most of the Spanish S-ANEW terms referred to concepts either related to tragic physical effects (40.00%) (death, rape, drown, suffocate, mutilate, paralysis, crushed, mangle, dead, torture, scorching, cancer), related to potential threats (33.33%) (bomb, massacre, outrage, killer, terrorist, assassin, danger, fire, sour, beast), or, in a lower proportion, related to emotional states (10.00%) (horror, panic, fear). In comparison, the Redondo et al. (2007) ANEW list contained appreciably fewer concepts related to tragic physical effects (20.00%) (rape, suffocate, drown, torture, abuse), and a similar number of terms related to potential threats (33.33%) (bomb, slaughter, accident, danger, shark, terrorist, robber, tragedy, war, avalanche), but was more focused on emotional states (30.00%) (panic, anxious, alert, rage, enraged, rabies, nervous, stress, despairing). Other words such as physical locations (morgue, ambulance) or environment features (dark) were also present in one or both lists, although in a smaller proportion. The classification of valence for most of the words was shared between the two datasets, with the exception of fire (neutral for Redondo et al., 2007 ANEW), anxious and ambulance (both neutral for Spanish S-ANEW).
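The percentages quoted for the semantic groups are simply group sizes divided by the thirty-word sample. A minimal sketch of that arithmetic for the Spanish S-ANEW negative words is shown below; the word lists are copied from the text, while the dictionary layout and the function name are assumptions made for illustration.

```python
SAMPLE_SIZE = 30  # thirty highest-arousal negative words per dataset

# Semantic groups of the Spanish S-ANEW negative words, as listed in the text.
s_anew_groups = {
    "tragic physical effects": [
        "death", "rape", "drown", "suffocate", "mutilate", "paralysis",
        "crushed", "mangle", "dead", "torture", "scorching", "cancer",
    ],
    "potential threats": [
        "bomb", "massacre", "outrage", "killer", "terrorist",
        "assassin", "danger", "fire", "sour", "beast",
    ],
    "emotional states": ["horror", "panic", "fear"],
}


def group_shares(groups, sample_size):
    """Return the percentage of the sample covered by each semantic group."""
    return {
        label: round(100 * len(words) / sample_size, 2)
        for label, words in groups.items()
    }


shares = group_shares(s_anew_groups, SAMPLE_SIZE)
for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```

The same function can be applied to any of the other word groups reported in this section, which makes it easy to check a quoted percentage against the words actually listed.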
At the other extreme of the range of values for valence, Table 4 (right) compares the set of positive words that scored the highest for arousal. In contrast to the list of negative words, the list of the highest arousal Spanish S-ANEW positive words contained terms related to chances or means to control a situation (26.67%) (option, chance, mind, army, hope, wit, vigorous, intellect), semantically related to the achievement of an objective or succeeding in general (30.00%) (victory, achievement, champion, triumph, win, success, bliss, triumphant, profit), and, more specifically, related to outcomes that imply escape (16.67%) (rescue, alive, liberty, freedom, free). In the case of Redondo et al. (2007) ANEW ratings, terms related to the achievement of an objective or escaping were also present to a lesser extent (20.00%) (win, victory, chance, triumph, rescue, achievement), and most of the remaining words were related to love and sexual themes (46.66%) (orgasm, passion, aroused, kiss, sex, intercourse, pleasure⁶, excitement, erotic, love, valentine, couple, romantic, sexy) and generic pleasant activities (30.00%) (fun, merry, party, adventure, elated, applause, travel, laughter, happy). In this regard, almost half of the words related to sexual issues and appearing in the Redondo et al. (2007) ANEW list were classified as neutral in the Spanish S-ANEW dataset.
Finally, Table 5 compares the sets of words considered to evoke the least arousal for the Spanish S-ANEW and Redondo et al. (2007) ANEW datasets, respectively. In this case, no words were shared between the resulting lists. As illustrated in Figure 3 (left), the lowest arousal Spanish S-ANEW words were mostly located in the neutral part of the valence range ([3.63, 6.65]), and corresponded to Redondo et al. (2007) ANEW neutral and positive valence words. Several of these Spanish S-ANEW terms were related to food (36.67%) (sugar, pizza, jelly, milk, chocolate, hamburger, mushroom, muffin, butter, ketchup, salad). On the other hand, the Redondo et al. (2007) ANEW list presented a wider range for valence ([2.16, 8.11]) and a lower mean arousal. In the Redondo et al. (2007) ANEW list, the meaning of most of the terms is related to non-stress and well-being.
⁶ Due to word relations and in order to simplify, we decided to include pleasure as relevant to sexual activities, although it could be considered as relative to general well-being. This criterion may be applied to other terms from the lists, due to the inherent difficulty of disambiguating the connotations intended by the subjects. In any case, we consider that this does not represent a significant difference in the general analysis.

Valence vs. Dominance Dimensions
Figure 4 shows the ratings for the 1,034 words in the two-dimensional affective space corresponding to valence and dominance. Dominance (d) had a positive correlation with valence (v) in Spanish S-ANEW [r = 0.697, F(1, 1032) = 977.8, p < 0.0001], following the formula d = 0.719v + 1.715 and presenting a higher variation than the one found in Redondo et al. (2007) ANEW [r = 0.827, F(1, 1032) = 2241, p < 0.0001]. The relationship was also linear, implying that higher dominance ratings were assigned to words associated with pleasant concepts. These results were consistent with the findings of authors who analyzed both dimensions (Montefinese et al., 2014; Warriner et al., 2013; Moors et al., 2013). Once again, the study of this distribution was extended through the analysis of the highest and lowest rated words. Table 6 shows the set of words with the lowest and highest ratings for dominance, thirty for each dataset. All the lowest rated words were in the range of negative valence. They were also rated as unpleasant in the other dataset. As with the lowest valence-rated words, the lowest rated terms were mostly related to individual or large-scale tragedies (either cause or effect), without specific semantic groups that clearly differentiated Spanish S-ANEW from Redondo et al. (2007) ANEW. Eight words (26.67%) were shared across both sets.
Regarding the highest rated dominance and as shown in Figure 4, most of the Spanish S-ANEW rated words lay within the range of neutral valence, while most of the Redondo et al. (2007) ANEW rated words were within the range of positive valence. Beyond that, no significant semantic commonalities were found.
As expected, in Spanish S-ANEW, arousal and dominance have a similar correlation to the one found between valence and arousal [R² = 0.441, F(2, 1031) = 408.2, p < 0.0001]. This behavior is in line with the findings from other studies (Warriner et al., 2013; Montefinese et al., 2014). The curve follows the formula a = 0.119d² − 1.918d + 11.054.
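The two Spanish S-ANEW relations involving dominance (the linear fit against valence and the quadratic fit against arousal) can be written as small functions to inspect what they predict across the rating scale. This is only a sketch: the coefficients are the ones reported in the text, whereas the function names and the 1 to 9 scale used for the loop are assumptions.

```python
def dominance_from_valence(v):
    """Linear fit reported for Spanish S-ANEW: d = 0.719v + 1.715."""
    return 0.719 * v + 1.715


def arousal_from_dominance(d):
    """Quadratic fit reported for Spanish S-ANEW: a = 0.119d^2 - 1.918d + 11.054."""
    return 0.119 * d**2 - 1.918 * d + 11.054


# Assumption: ratings lie on a 1-9 scale, so odd scale points are a convenient grid.
for rating in range(1, 10, 2):
    print(f"valence {rating} -> predicted dominance {dominance_from_valence(rating):.2f}; "
          f"dominance {rating} -> predicted arousal {arousal_from_dominance(rating):.2f}")
```

Reading the output side by side shows the direction of each relation: predicted dominance grows with valence, while predicted arousal is highest for the words rated lowest in dominance.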

Interpretation of Results
The study of the inter-dependencies between valence and arousal shows that subjects rate either strongly pleasant or strongly unpleasant words as more arousing, regardless of the context. However, when a suspenseful context is introduced, unpleasant words are rated with a higher arousal than pleasant words are. It could be said that suspense diminishes arousal for the most pleasant concepts. This might be related to the smaller number of words ranked with a positive valence in a suspenseful scene. In a suspenseful context, we found roughly twice as many negative valence words as positive valence words. Also, Spanish S-ANEW has half the words with positive valence that Redondo et al. (2007) ANEW has. Since suspenseful scenes are usually conceived as unpleasant, it makes sense to assume that fewer of the involved concepts are perceived as positive.
Moreover, in a context of suspense, subjects assign the highest arousal to negative concepts related to tragic physical effects and potential threats. Likewise, positive concepts related to controlling a situation, and again to achieving an objective or escaping a threat, were rated with the highest arousal as well. By contrast, when no context is present, the list included fewer words related to tragic physical effects, while more negative concepts involving emotional states, as well as positive concepts linked to interpersonal relations, were rated with the highest values for arousal.
Regarding the dominance dimension, a high linear correlation with valence implies that subjects usually feel in control when facing pleasant concepts and vice versa, although to a lesser degree in a context of suspense. Thus, in this context the impact of valence is significant but not so determinant as to evoke a sense of control. In any case, and even considering that the average dominance is lower with the suspense backdrop (as explained in the previous sections), the similarities between ratings suggest that the words elicit proportionally similar emotional reactions to the feeling of dominance regardless of the context. Furthermore, there are no distinct semantic groups for either the highest or the lowest dominance-rated words. Regarding the lowest rated concepts, both the suspense-contextualized and non-contextualized sets share around 25% of words. All of them were considered unpleasant and most of them were related to individual or large-scale tragedies.
In conclusion, while the two datasets do not present differences for dominance, the ratings and the semantics differ substantially for arousal. On the one hand, in a context of suspense, pleasant words evoke significantly less arousal than unpleasant words. On the other hand, words associated with certain semantic themes, like tragic physical effects, potential threats, control, or escaping, tend to be rated with the highest arousal mainly or only in the context of suspense.

Gender Differences
For each emotional dimension and term, subjects' responses were analyzed in terms of gender-related variations. Table 7 presents means, standard deviations, range values, correlation indexes, and differences of means of the Spanish S-ANEW ratings for both females and males in the three affective dimensions, globally and per valence classification.
Although scores were highly similar between males and females, when comparing the ratings according to their valence classification, females presented greater average ratings than males for positive words and lower than males for negative words. Also, the range of scores was higher for female subjects in the three dimensions. These results are in line with Monnier and Syssau (2014) and Soares et al. (2012), who found that ratings for affective stimuli were lower for males than for females. Additionally, strong correlations were found between the ratings of men and women across all dimensions (R ≥ 0.91, p < 0.0001), which is broadly supported by most studies (Bradley and Lang, 1999; Redondo et al., 2007; Monnier and Syssau, 2014; Montefinese et al., 2014; Soares et al., 2012; Lang et al., 1997).
In order to study other possible dependencies for different combinations of gender, dimension and valence, a multivariate analysis of variance (MANOVA) was conducted. First, a global, marginal effect of gender was found [F(1, 6202) = 3.196, p = 0.073]. Second, a significant interaction between gender × dimension × valence classification was also observed [F(4, 6199) = 16.452, p < 0.0001]. Post-hoc tests indicated that valence ratings (p < 0.01 for all the valence classifications) and dominance in the range of negative valence (p < 0.01) differed significantly depending on the gender of the subject. This result does not coincide with the findings of Redondo et al. (2007), in which sex did not reach statistical significance, although female subjects tended to rate negative words more negatively and vice versa. However, results were mostly consistent with other works, such as the original research of Bradley and Lang (1999), who obtained statistically significant differences for the dominance dimension, or Soares et al. (2012), who found that females tend to assign higher ratings to pleasant words.
Regarding the association between dimensions, Figure 5 shows the distribution of the Spanish S-ANEW scores in the affective space defined, respectively, by valence and arousal, and by valence and dominance, filtering male and female ratings. These dependencies showed significant differences for arousal, fitting female subjects' scores (R = 0.640, p < 0.0001) better than male subjects' scores (R = 0.534, p < 0.0001). By contrast, similar correlation values for dominance, both for females (R = 0.690, p < 0.0001) and males (R = 0.677, p < 0.0001), were found.

Interpretation of Results
Results suggest that the response to emotional stimuli is lower for males than for females in valence and positive dominance ratings. Likewise, considering the U-shaped curve that relates arousal and valence, women assign a higher arousal value to the words with either low or high valence, while men seem less affected by the words in the upper and lower limits of the valence range. These observations do not coincide with the findings of Redondo et al. (2007) or Monnier and Syssau (2014), for whom there are no significant differences related to gender. However, a number of ANEW studies also support gender-related variations in one or more dimensions (Bradley and Lang, 1999; Soares et al., 2012; Stevenson et al., 2007). Thus, there is no agreement in the literature regarding the impact of gender on affective evaluations (Soares et al., 2012; Montefinese et al., 2014; Redondo et al., 2007), so at this point it is not possible to affirm that the context is a relevant factor for gender-related variations in the results.

Comparison With Scores for Suspense
Once analyzed, Spanish S-ANEW ratings were contrasted with the suspense scores gathered in Delatorre et al. (2017). This corpus was composed of twenty-five words. Each of them was introduced in two different suspenseful scenes: a short text passage and an interactive 3D environment. Table 8 shows the words and the scores.
In order to determine the relationship between suspense and the affective dimensions, a multiple regression analysis was conducted with the suspense mean ratings as the dependent factor, while both Redondo et al. (2007) ANEW and Spanish S-ANEW ratings were used as independent factors. The resulting models were compared to determine the best fit.
We performed separate analyses for each dataset by combining the three dimensional variables [valence (v), arousal (a), and dominance (d)] to predict the suspense ratings, in a similar way to existing analyses found in the literature (Montefinese et al., 2014; Stevenson et al., 2007; Riegel et al., 2016; Jacobs et al., 2015). The best-fit model for the Spanish S-ANEW dimensions [R = 0.844, F(3, 47) = 39.44, p < 0.0001, RMSE = 0.860] included the three variables. However, the valence ratings did not reach significance (t = −1.35, p = 0.187), and excluding them did not imply a substantially worse fit (p = 0.184). Thus, only arousal and dominance were ultimately included. The new model presented a similarly good fit [R = 0.841, F(2, 47) = 57.95, p < 0.0001, RMSE = 0.868], explaining 70.8% of the variance. The obtained formula for this model was suspense = 0.533a − 0.622d + 4.636. Moreover, other polynomial and linear estimations did not result in significant improvements.
The model was strongly correlated to the reported suspense, for both the textual story (r = 0.878, p < 0.0001) and the interactive 3D environment (r = 0.834, p < 0.0001). Figure 6 illustrates this.
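To make the reduced Spanish S-ANEW model concrete, the sketch below applies the reported formula (0.533a − 0.622d + 4.636) to two sets of ratings. The coefficients come from the text; the function name and the example ratings are hypothetical and do not correspond to any word in the corpus.

```python
def predict_suspense(arousal, dominance):
    """Suspense predicted by the reported Spanish S-ANEW model (arousal and dominance only)."""
    return 0.533 * arousal - 0.622 * dominance + 4.636


# Hypothetical ratings chosen only to show the direction of each effect.
examples = {
    "high arousal, low dominance": (8.0, 2.0),
    "low arousal, high dominance": (3.0, 7.0),
}

for label, (arousal, dominance) in examples.items():
    print(f"{label}: predicted suspense = {predict_suspense(arousal, dominance):.3f}")
```

Because the arousal coefficient is positive and the dominance coefficient is negative, the model assigns more suspense to concepts that are arousing and over which subjects feel little control.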
Regarding Redondo et al. (2007) ANEW ratings, a linear regression that included only dominance was the best-fit model [R = 0.606, F(1, 47) = 27.66, p < 0.0001, RMSE = 1.283], accounting for 36.2% of the variance. The obtained formula was suspense = −1.008d + 8.395. The model was moderately correlated with the suspense ratings for the text story (r = 0.534, p < 0.0005), and moderately to strongly correlated for the interactive 3D environment (r = 0.685, p < 0.0005). These values were clearly lower in comparison to the model computed for Spanish S-ANEW affective ratings.
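The gap between the two models can be illustrated by squaring the reported multiple correlations, which approximates the share of variance each model explains. The sketch below also encodes the dominance-only formula. The R values and coefficients are taken from the text; the function names are assumptions, and the squared values may differ slightly from the percentages quoted in the text, which may reflect rounding or an adjusted statistic (the source does not say).

```python
def predict_suspense_redondo(dominance):
    """Dominance-only model reported for the Redondo et al. (2007) ANEW ratings."""
    return -1.008 * dominance + 8.395


def variance_explained(r):
    """Percentage of variance implied by a multiple correlation R (that is, 100 * R^2)."""
    return 100 * r**2


reported_r = {
    "Spanish S-ANEW (arousal + dominance)": 0.841,
    "Redondo et al. (2007) ANEW (dominance only)": 0.606,
}

for model, r in reported_r.items():
    print(f"{model}: R = {r:.3f}, R^2 = {variance_explained(r):.1f}%")

# A mid-scale dominance rating, used purely as an illustrative input.
print(f"Predicted suspense at dominance 5: {predict_suspense_redondo(5):.3f}")
```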

Interpretation of Results
As a negative factor, valence contributes to a slightly better fit, meaning that unpleasant words have an impact on suspense responses. Indeed, some authors specialized in suspense have included concepts such as "danger," "hostile," "deplorable," or "harmful" in their definitions (Perron, 2004; Zillmann and Tannenbaum, 1980; Zillmann, 1996; de Wied et al., 1992). Likewise, words that evoke similar ideas were rated the lowest in terms of valence (see Table 3). Despite this, when analyzing the statistical impact of valence in the model, it seems not to improve the fit significantly. This observation is in line with other modern definitions that avoid explicitly giving a negative connotation to the causes that trigger suspense, but only to the emotional responses derived from suspense (fear, frustration, anxiety, concern, apprehension) (Abbott, 2008; Alwitt, 2002; Vorderer and Knobloch, 2000; Smuts, 2008; Caplin and Leahy, 2001; Guidry, 2005; Knobloch et al., 2004). Considering the general conception of suspense as a feeling of anticipation (Wang and Cheong, 2006; Burget, 2014; Wirth and Schramm, 2005; Lehne, 2014), all those definitions seem to agree that the outcome must just bring "significant consequences (either good or bad)" (Brewer and Lichtenstein, 1982), being anything that potentially causes positive or negative future changes (the promise of a kiss, a pay rise, a loose bolt in the airplane's wheel) (Howard, 2006; Smuts, 2008; Burget, 2014). Therefore, although it is not possible to rule out entirely the impact of the negative valence (and, in fact, the number of words rated as positive substantially decreases in a suspenseful context, as shown in Table 2), its degree of influence is called into question by both the literature on the topic and the statistical result, which is barely determinant, at least in contrast to the other affective dimensions. Arousal, on the other hand, fits the model better. 
The most unpleasant and the most pleasant words get the highest ratings of arousal, which presents a reduced dependency on the polarity of the valence. Moreover, the highest arousal ratings in the context of suspense comprise a substantial proportion of specific concepts related to potential threats and tragic physical effects, as well as (in contrast to Redondo et al., 2007 ANEW scores) controlling a situation, achieving an objective, and escaping a threat, concepts that may also be related to being in control. Along with this, large-scale tragedies have been rated mostly with the lowest dominance. Despite the fact that these terms can be found in a diversity of contexts, they may be often observed working together specifically in generating suspense (Allen and Ishii-Gonzales, 2004;Gerrig and Bernardo, 1994;Frome and Smuts, 2004;Truffaut and Scott, 1998). These observations provide evidence in favor of arousal and dominance having a strong relation to the emotional response, terms and chain of events which generally characterize a suspenseful scene (Hsu et al., 2014).
On the other hand, the regression model for Redondo et al. (2007) ANEW scores presents a worse fit. Besides, its accuracy seems to depend on the narrative medium, as it does not fit the suspense of the text story as well as that of the interactive 3D environment. However, the general behavior of the individual curves of both arousal and dominance ratings is not different enough from the ones found in Spanish S-ANEW to explain these disparities. The most plausible assumption is that, even if the ratings are distributed in a similar way, the words included in each score range are highly influenced by the existence of the context. Indeed, the specific context of suspense has made it possible to obtain a set of affective ratings and a subsequent model that strongly fits the reported suspense.

Relation With Other Psycholinguistic Indices
The values for the affective dimensions of Spanish S-ANEW were compared to objective (number of letters, number of syllables, grammatical class, frequency, orthographic neighbors) and subjective psycholinguistic indices (familiarity, concreteness, imageability). Similarly, the reported suspense was compared with these variables to study potential relations, in both the textual story and the interactive 3D environment.
The results revealed very weak or no significant correlations for most of the candidate relations, with the exception of valence. These valence relations included a weak positive correlation with frequency (r = 0.154, p < 0.0001), and a weak to moderate positive correlation with familiarity (r = 0.201, p < 0.0001). Also, familiarity presented weak correlations with arousal (r = −0.165, p < 0.0001) and dominance (r = 0.183, p < 0.0001).
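A weak correlation can still be highly significant when the sample is large. The sketch below, which uses only the standard library, converts each reported r into the share of variance it implies and into the usual t statistic for testing a Pearson correlation against zero. The sample size of 1,034 is an assumption (it presumes that every word in the list has a value for each index, which the text does not state), and the function name is illustrative.

```python
import math

N_WORDS = 1034  # assumption: every ANEW word has a value for each psycholinguistic index


def t_statistic(r, n):
    """t value for testing a Pearson correlation r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r**2))


# Correlations reported in the text.
reported = {
    ("valence", "frequency"): 0.154,
    ("valence", "familiarity"): 0.201,
    ("arousal", "familiarity"): -0.165,
    ("dominance", "familiarity"): 0.183,
}

for (dimension, index), r in reported.items():
    shared = 100 * r**2
    print(f"{dimension} vs. {index}: r = {r:+.3f}, "
          f"shared variance = {shared:.1f}%, t = {t_statistic(r, N_WORDS):.2f}")
```

Under this assumption, each relation shares only a few percent of variance even though its t value is large, which is consistent with describing the correlations as weak yet significant.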

Interpretation of Results
Considering the results of the familiarity subjective index, subjects seem to show a significantly lower emotional bias when facing words that they either know or use more often. The more familiar the word is to them, the higher the valence, the lower the arousal, and the higher the dominance reported. This hypothesis is partially in line with the conclusions of Warriner et al. (2013) and Montefinese et al. (2014), although they found that this happens mainly for dominance. They conclude that the response is a "fear of the unknown," which seems to be consistent with a state of anticipation and suspense.
Nevertheless, other correlations with subjective psycholinguistic indices found in relevant literature are not present enough in our results to include them as part of the affective responses to suspense. Since the current analysis was conducted by using the ratings gathered by Redondo et al. (2007), where no context was introduced, a new assessment of psycholinguistic indices in a specific context would be necessary.

DISCUSSION
Although these results represent an improvement over the previous models that predict suspense, some issues need to be discussed.
The proposed methodology has a number of limitations. First, in order to avoid restricting the context to a single kind of predetermined scene, no specific description was provided to the participants. This prevents the subjects from being forced to adapt the concepts to a previously manipulated scenario. It also reduces the risk of the participants not being able to come up with a coherent relation between the scene and some of the concepts, or considering some of the proposed scenes as not suspenseful. Therefore, it was decided to let the participants recreate and operate their own suspenseful scenes in a potentially natural way. This approach was inspired by similar procedures found in the literature, in which not even the term "suspense" is defined. These strategies assume the validity of the subjective criteria of the participants (Brewer and Lichtenstein, 1980; Comisky and Bryant, 1982; Gerrig and Bernardo, 1994; Hoeken and van Vliet, 2000; Alwitt, 2002; Cheong and Young, 2008; Abuhamdeh et al., 2015; Liang, 2015). As in the mentioned studies, this lack of definition implies an ambiguity in the method. In our case, the scene in one participant's mind can differ from the scene in another's. This can lead to different ratings for the same term, produced not only by the personal interpretation, but also by the details of a different scene. This situation also occurs in the original ANEW studies (Bradley and Lang, 1999; Redondo et al., 2007; Soares et al., 2012; Moors et al., 2013; Warriner et al., 2013; Monnier and Syssau, 2014; Montefinese et al., 2014) where, even when no context is specified for each term, the represented concept will be placed in a specific context in the reader's mind once it is read (Citron et al., 2014a; Stanovich, 2017). This activation process seems to take place because the stimulus activates the memory, and therefore inter-subject divergences can happen even when the context is not previously set (Neumann, 1984; Stanovich, 2017). 
On this basis and due to the memory activation, we may assume a similar automated recall process when known and consistent contexts are required (Stanovich and West, 1979;Fazio, 2001;Rayner et al., 2012).
In any case, the similarity between the results among the previously mentioned suspense or ANEW experiments seems to indicate that this methodological ambiguity does not invalidate the quantification of the emotional effect of the terms. At the very least, it does not seem to be a discussed issue in the reviewed literature. Certainly, it is not possible to guarantee that the participants have interpreted or imagined the concepts in a context of suspense, but despite this, we argue that the obtained model presents a better fit of the previous ratings of suspense than the original Redondo et al. (2007) ANEW data. This assumption, the different ratings of words with distinct semantic attribution, and the consistency in terms of reliability suggest that the contextualization has effectively happened, and that it can be quantified as described in the paper.
The second relevant aspect worthy of discussion concerns the procedure for gathering the results through a questionnaire. The methodology utilized to obtain the Spanish S-ANEW ratings is based on the one used in Moors et al. (2013), which differs from the methodology of Redondo et al. (2007). Since the latter dataset is the one used to contrast the Spanish S-ANEW scores, it might be argued that a similar method of assessment could ensure a more rigorous comparative analysis. Nevertheless, a number of experiments on gathering affective ratings (which are also compared with one another) obtained assessments through strategies different from the classroom-based collective sessions used by Redondo et al. (2007), e.g., remote web surveys or spreadsheets (Montefinese et al., 2014; Hinojosa et al., 2016; Soares et al., 2012; Guasch et al., 2016; Moors et al., 2013), asking each participant for the evaluation of 56 (Montefinese et al., 2014) to 4,300 words (Moors et al., 2013). Ultimately, email or web-based methods are used by several authors due to the clear advantages in terms of time and resources (Buchanan and Smith, 1999; Reips, 2000; Risso, 2002). In this regard, relevant literature points out very small differences between in-person and remote experiments, and in the psychometric properties, providing substantial similarities across all the results (see also Krantz et al., 1997; Smith and Leigh, 1997; Pasveer and Ellard, 1998; Stanton, 1998; Buchanan and Smith, 1999; Buchanan, 2000; Risso, 2002, among others). This suggests that the methodological differences are not significant enough to compromise the comparative analysis, nor to challenge the validity of the regression model. However, and as an issue worth mentioning, the lack of supervision makes it impossible to reliably control the number of participants' sessions and behaviors, or the time intervals, which can impact the task performance (Gasquet et al., 2001). 
Even though our collected data point toward a consistency in the results, it is acknowledged that there is a limitation in the experiments using remote questionnaires.
The comparative study carried out with the first and last thirty terms according to the rating and semantics of the words also deserves discussion. A number of semantic groups of terms for each dataset were identified by the authors. The objective was to illustrate the impact of the context in the ratings when considering the semantics. Due to the great variability of potential classifications, it is acknowledged that subjective interpretation can play an important role in the configuration of these semantic groups. Because no study has previously set any standard quantity, the specific value of thirty words (around 2.5% at each end) may be considered unjustified and potentially extensible. However, the observed words seem to be sufficient to defend that the context influences the ratings, which is illustrated by the positional order of the analyzed word when sorted by score.
This tendency indicates that, when contextualized and grouped by range (positive, neutral and negative), the scores seem to be higher for some specific semantically-related terms. This can be observed by focusing on the highest and lowest ranked terms on each affective dimension. For instance, we have observed that, in suspenseful contexts, subjects tend to assign higher scores to success-related terms in contrast to those implying interpersonal relations, since the latter have higher scores in uncontextualized scenarios. This tendency has also been described in the original ANEW (Bradley and Lang, 1999). Similar differences regarding context influence can be found in Tables 4-6.
In any case, the semantic study and the corresponding interpretation is intended to be discussed as one of the possible explanations for the model's improvement. Even when an increased number and a reinterpretation of the semantics could modify the observed groupings, the current range seems to evidence that some semantics are prioritized over others depending on the context. This is shown quantitatively in the referred tables. This ordinal and quantitative differentiation of the whole corpus from the compared ANEW is significant and large, as evidenced in the improvement (70% compared to the 36% of variance accounting) of the regression model obtained from the contextualized ratings when applied to the direct suspense ratings.
Finally, the set of suspense ratings in Delatorre et al. (2017) used to test the Spanish S-ANEW regression model covers only a minor part of the original 1,034 words. The results may be considered robust enough: the same 25 concepts in two different narrative discourses have been contrasted with the model, explaining around 70% of the variance. However, and despite this evidence, it is not possible to rule out that the model obtained might not appropriately predict suspense for other subsets of the original ANEW list, or for any other narrative discourse. An experiment to gather an extensive set of scores will be addressed as part of future work. Additionally, it should include new ratings for concreteness, imageability, context availability, and familiarity: it is conceivable that subjects face potential inconsistencies between the imagined scene and the rated word, and even that the onset of the word qualifies and alters the original background⁷. Other terms such as theory, poetry, phase, or history may be considered challenging to integrate literally within the scene, so it is possible that subjects create a complex, more unique situation that associates these concepts. Gathering and analyzing subjective psycholinguistic indices should partially shed light on this bias.

CONCLUSIONS AND FUTURE DIRECTIONS
The objective of this work is to provide quantifiable evidence for the impact of context on the ratings of affective norms, as opposed to the existing non-contextualized corpora. In order to do this, 206 Spanish subjects used the SAM model to evaluate the valence, arousal and dominance of the 1,034 words proposed in Redondo et al. (2007) ANEW, in a fictional context of suspense. Both datasets were contrasted, revealing similar behavior in the curves of ratings for the three affective dimensions. However, on average, Spanish S-ANEW words tended to be less pleasant, evoked significantly more arousal, and presented a slightly lower dominance. An analysis of the highest and lowest rated words for the three dimensions reveals the existence of different semantic groups, such as the highest rated positive terms being related to escaping a threat for Spanish S-ANEW, in contrast with the highest rated positive terms being related to interpersonal relationships for Redondo et al. (2007) ANEW. This indicates that, when contextualized, groups categorized by range (positive, neutral and negative) tend to receive higher scores for some specific semantically-related terms. This can be detected by observing the highest and lowest affective ranking terms.
These results provide a quantification of the influence of a specific context on affective terms. Specifically, a general observation suggests that valence and dominance scores tend to be neutral to low for concepts that are expected to be present in a suspenseful scene. This difference may be explained by the effects of predictability on the readers (Illouz, 2009; Schacht and Sommer, 2009; Scott et al., 2012; Smith and Levy, 2013), which will be considered for study in future contributions, in line with the mentioned literature (Lazarus, 1966; Boekaerts, 2001; Guidry, 2005; Carnaghi et al., 2008; Madrigal et al., 2011; Lehne and Koelsch, 2015).
Additionally and in order to validate the Spanish S-ANEW ratings, a multiple regression analysis was conducted against a dataset of suspense obtained from Delatorre et al. (2017), in which twenty-five elements introduced in both a short text and an interactive 3D environment were rated in terms of suspense. Once calculated, the model for Spanish S-ANEW obtained a strong correlation to suspense ratings (r ≈ 0.85, p < 0.0001 in both narrative scenarios), while Redondo et al. (2007) ANEW regression model presented weaker correlations, as well as a high dependency on the media (r = 0.534, p < 0.0005 for the textual story; r = 0.685, p < 0.0005 for the interactive 3D environment). These findings support the validity of the Spanish S-ANEW corpus.
In conclusion, the result of this work provides an affective norms dataset in a specific context of suspense. We consider this an innovative proposal, as no other similar research that covers the affective impact of such an extensive set of words introduced in a situational context has been found in the literature. The mid-term objective is to deliver a formal corpus that can provide a ground definition for suspenseful terms, as part of the development of a computational model of suspense to predict the effect of texts on the audience (Delatorre et al., 2016a). Along this line, the long-term aim is to improve the model to predict the emotional effect of complex concepts when they are introduced in different contexts, not restricted to suspense. Additionally, we believe this may be the precursor of similar future works, to contrast Redondo et al. (2007) ANEW with different contexts, to expand the set of words, or to analyze the effects of suspense in different narrative media.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of national and international ethics guidelines, Código Deontológico del Psicólogo and American Psychological Association. This study did not present any invasive procedure, and it did not carry any risk to the participants' mental or physical health, thus not requiring ethics approval according to the Spanish law BOE 14/2007. All subjects participated voluntarily and gave written informed consent in accordance with the Declaration of Helsinki. They were free to leave the experiment at any time.

AUTHOR CONTRIBUTIONS
PD, AS, CL, and AT contributed to the conception of the study, designed the experimental method, and wrote and approved the version to be published. PD investigated and sorted the related work and interpreted the findings. PD and AS conducted the experiment and collected the data.