Visual Complexity and Affect: Ratings Reflect More Than Meets the Eye

Madan, Christopher R.; Bayer, Janine; Gamer, Matthias; Lonsdorf, Tina B.; Sommer, Tobias

doi:10.3389/fpsyg.2017.02368

ORIGINAL RESEARCH article

Front. Psychol., 18 January 2018

Sec. Emotion Science

Volume 8 - 2017 | https://doi.org/10.3389/fpsyg.2017.02368

Christopher R. Madan¹,²*

Janine Bayer¹

Matthias Gamer¹,³

Tina B. Lonsdorf¹

Tobias Sommer¹*

¹Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
²School of Psychology, University of Nottingham, Nottingham, United Kingdom
³Department of Psychology, University of Würzburg, Würzburg, Germany

Pictorial stimuli can vary on many dimensions, several aspects of which are captured by the term ‘visual complexity.’ Visual complexity can be described as, “a picture of a few objects, colors, or structures would be less complex than a very colorful picture of many objects that is composed of several components.” Prior studies have reported a relationship between affect and visual complexity, where complex pictures are rated as more pleasant and arousing. However, a relationship in the opposite direction, an effect of affect on visual complexity, is also possible; emotional arousal and valence are known to influence selective attention and visual processing. In a series of experiments, we found that ratings of visual complexity correlated with affective ratings, and independently also with computational measures of visual complexity. These computational measures did not correlate with affect, suggesting that complexity ratings are separately related to distinct factors. We investigated the relationship between affect and ratings of visual complexity, finding an ‘arousal-complexity bias’ to be a robust phenomenon. Moreover, we found this bias could be attenuated when explicitly indicated but did not correlate with inter-individual difference measures of affective processing, and was largely unrelated to cognitive and eyetracking measures. Taken together, the arousal-complexity bias seems to be caused by a relationship between arousal and visual processing as it has been described for the greater vividness of arousing pictures. The described arousal-complexity bias is also of relevance from an experimental perspective because visual complexity is often considered a variable to control for when using pictorial stimuli.

Introduction

Berlyne (1958) described visual complexity as being influenced by a variety of factors, including the number of comprising elements, as well as their heterogeneity (e.g., a single shape repeated vs. multiple distinct shapes), their regularity (e.g., simple polygons vs. more abstract shapes) and the regularity of the arrangement of elements (e.g., symmetry, distribution characteristics) (see Figure 1 of Berlyne, 1958). More recently, Pieters et al. (2010) provided a similar list of complexity factors, along with example advertisements that highlight these differences (Oliva et al., 2004; see Figure 2 of Pieters et al., 2010). Research into visual complexity has recently become a multidisciplinary topic, involving researchers in fields ranging from marketing (Pieters et al., 2010; Braun et al., 2013) to computer science (Itti et al., 1998), and from esthetics (Nadal et al., 2010) to human–computer interaction (Tuch et al., 2009) as well as psychology (Donderi, 2006; Cassarino and Setti, 2015, 2016). This approach to examining visual complexity in pictures that are clearly and consciously viewed, and judged based on the number of objects and their aggregate structure, is the focus of the current work.

Many studies have observed a relationship between visual complexity and affect (e.g., pleasantness). This relationship has been observed dating back to the early 1970s (e.g., Kaplan et al., 1972; Aitken, 1974; Aitken and Hutt, 1974) and this idea has re-emerged more recently (e.g., Stamps, 2002; Marin and Leder, 2013, 2016; Schlochtermeier et al., 2013; Machado et al., 2015; Marin et al., 2016). Importantly, these studies suggest that more complex pictures are perceived as more pleasant than less complex pictures, a hypothesis supported by earlier work where pleasantness and physiological arousal have been found to be higher for more complex abstract shapes (e.g., Berlyne et al., 1963, 1968; Vitz, 1964; Day, 1967). At particularly high levels of complexity, however, pleasantness decreases, following an inverted-U shaped function (“Wundt curve,” Berlyne, 1970). Importantly, in these studies, the directionality of this relationship between complexity and affect is always discussed as complexity influencing affect. However, there might also be an effect in the opposite direction, i.e., affect influencing perceived visual complexity.

This hypothesis seems plausible because emotionally arousing stimuli attract bottom-up attention, are processed with priority as well as a higher signal-to-noise ratio and are perceived more vividly (Pourtois et al., 2013; Markovic et al., 2014; Mather et al., 2016). The relationship between arousal and perception is reflected in greater activity in visual regions as well as in different eye movement patterns (Pessoa and Adolphs, 2010; Bradley et al., 2011; Ni et al., 2011). This difference in visual processing activity has primarily been attributed to autonomic activity and motivational salience. In addition, emotionally arousing stimuli possess certain cognitive characteristics that might influence experienced complexity. First, emotionally arousing stimuli are more distinctive relative to prior experiences (Schmidt, 1991). Second, they are semantically related, i.e., often belong to the same scripts such as disease, poverty, or crime (Talmi and McGarry, 2012). During processing of such stimuli, the associated script might be activated more easily and influence experienced complexity. Taken together, from a basic science perspective there is good evidence to hypothesize that emotionally arousing stimuli are perceived as more complex (Marin and Leder, 2013).

From an experimental perspective, visual complexity is often considered a variable to control for when using pictorial stimuli to investigate affective (e.g., Ochsner, 2000; Talmi et al., 2007; Sakaki et al., 2012) or memory processes (e.g., Snodgrass and Vanderwart, 1980; Berman et al., 1989; Isola et al., 2014; Nguyen and McDaniel, 2015). As we were particularly interested in potential top–down effects of affect within studies of affect or memory, we presented pictures for several seconds, as opposed to other studies where pictures may only be presented briefly (e.g., 50–200 ms) along with visual masks. Importantly, visual complexity is often measured through ratings provided by participants in initial norming studies. However, while many studies have matched stimuli using visual complexity ratings, these studies did not consider that ratings of visual complexity may themselves be correlated with affective ratings (i.e., arousal and valence), and thus controlling for ratings of visual complexity might bias the affective quality of the stimulus material.

Computational Measures of Visual Complexity

To measure visual complexity, computational approaches can also be used. One of the simplest and most prevalent methods used to computationally measure visual complexity is to simply use the picture’s file size after using JPEG compression. The general idea behind this approach is that more complex pictures, given the same picture dimensions, can be compressed to a lesser degree than less complex pictures and thus more complex pictures have larger file sizes. Using this rationale, a number of studies and review articles have suggested the use of JPEG file size as a computational measure of visual complexity (e.g., Machado and Cardoso, 1998; Székely and Bates, 2000; Donderi, 2006; Forsythe et al., 2008, 2011; Martinovic et al., 2008; Forsythe, 2009; Tuch et al., 2009; Pieters et al., 2010; Stickel et al., 2010; Purchase et al., 2012; Marin and Leder, 2013; Schlochtermeier et al., 2013; Simola et al., 2013; Machado et al., 2015). While the JPEG file size does correlate with visual complexity, for scientific research it seems more appropriate to use a method designed to be similar to the computational processes that occur in early visual cortices. Nonetheless, here we will also evaluate the efficacy of JPEG file size as a computational measure of visual complexity.
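The compression rationale can be illustrated with a minimal sketch. It assumes a grayscale picture stored as rows of integers (0 to 255) and swaps in lossless zlib compression from the Python standard library for JPEG, so the byte counts are not comparable to JPEG file sizes; the helper name `compressed_size` and both example pictures are hypothetical.

```python
import random
import zlib


def compressed_size(pixels):
    """Return the zlib-compressed size in bytes of a grayscale picture.

    `pixels` is a list of rows, each a list of ints in 0..255.
    NOTE: zlib is a stand-in for JPEG here (an assumption for illustration).
    """
    raw = bytes(value for row in pixels for value in row)
    return len(zlib.compress(raw, 9))


width, height = 64, 64
rng = random.Random(0)  # fixed seed so the example is reproducible

# A uniform gray picture: highly redundant, so it compresses well.
flat = [[128] * width for _ in range(height)]
# A picture of independent random pixel values: very little redundancy.
noisy = [[rng.randrange(256) for _ in range(width)] for _ in range(height)]

print(compressed_size(flat), compressed_size(noisy))
```

With the picture dimensions held constant, the redundant picture yields the smaller compressed size, which is the property the file-size approach relies on.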

One such computational process is edge detection, which is the identification of boundaries within a picture. Though several edge detection algorithms have been developed (for reviews, see Nadernejad et al., 2008; Juneja and Sandhu, 2009; Maini and Aggarwal, 2009), Canny’s (1986) algorithm has been found to generally be better able to detect edges (Nadernejad et al., 2008; Juneja and Sandhu, 2009; Maini and Aggarwal, 2009; Machado et al., 2015) and has been used in behavioral research (e.g., Rosenholtz et al., 2007; Forsythe et al., 2008; Ptak et al., 2009; Sakaki et al., 2012; Machado et al., 2015). See Figures 1C,D for examples of edge detection applied to naturalistic pictures¹. Edge detection can be summarized as ‘edge density,’ where an edge detection map is averaged to calculate a single value corresponding to the proportion of the map that was identified as an edge.
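The averaging step that turns an edge map into an edge density can be sketched as follows. This is a rough illustration only: the edge detector is a simple neighbor-difference threshold standing in for Canny's algorithm, and the function names, the threshold, and the 4 x 4 example picture are all assumptions.

```python
def edge_map(pixels, threshold=50):
    """Mark a pixel as an edge when the luminance jump to its right or
    lower neighbor exceeds `threshold` (a crude stand-in for Canny)."""
    height, width = len(pixels), len(pixels[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(pixels[y][x + 1] - pixels[y][x]) if x + 1 < width else 0
            dy = abs(pixels[y + 1][x] - pixels[y][x]) if y + 1 < height else 0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def edge_density(edges):
    """Proportion of the edge map that was identified as an edge."""
    total = sum(len(row) for row in edges)
    return sum(sum(row) for row in edges) / total


# 4x4 picture: dark left half, bright right half, so one vertical boundary.
picture = [[0, 0, 255, 255] for _ in range(4)]
density = edge_density(edge_map(picture))
print(density)  # one edge pixel per row: 4 of 16 pixels -> 0.25
```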

FIGURE 1. Example pictures shown to participants as exemplars of (A) low visual complexity and (B) high visual complexity. (C,D) Illustrate the edge density of the two pictures. (E,F) Illustrate the feature congestion of the two pictures.

Based on prior work developing computational approaches to measure visual complexity (Rosenholtz et al., 2007), we additionally used two other computational measures: feature congestion and subband entropy. Feature congestion quantifies how ‘cluttered’ a picture is and incorporates color, luminance contrast, and orientation. See Figures 1E,F for examples of Rosenholtz et al.’s (2007) feature congestion algorithm applied to naturalistic pictures. Subband entropy quantifies the ‘organization’ within the picture, through Shannon’s entropy in spatial repetitions of hue, luminance, and size (i.e., spatial frequency). Rosenholtz et al. (2007) found that all three measures (edge density, feature congestion, and subband entropy) correlated with response time in a visual search task, demonstrating that these three computational measures relate to behavior.
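The entropy step underlying subband entropy can be sketched with Shannon's formula applied to a list of discrete values. This is only the entropy computation; the subband decomposition used by Rosenholtz et al. (2007) is not shown, and the function name and toy inputs are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A perfectly repetitive signal carries no uncertainty.
print(shannon_entropy([7, 7, 7, 7]))
# Four equally likely values require two bits.
print(shannon_entropy([0, 1, 2, 3]))
```

Lower entropy corresponds to more repetition, which is the sense in which the measure captures how organized a picture is.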

It is worth considering, however, that the computational measures of visual complexity described here are limited to low-level visual features. As such, they would provide similar complexity values for a picture of dozens of leaves as for a picture of dozens of unique toys—whereas a viewer may know that the toys represent characters from different cartoon shows and are associated with more varied semantic information. Ratings of visual complexity are based on both low-level features in addition to high-level features such as object information. Nonetheless, higher-level visual features are difficult to systematically characterize (i.e., using computational algorithms and without subjective biases) and the current work focused on the relationships between computational measures of visual complexity and affective measures with ratings of complexity.

In sum, visually complex stimuli are perceived as more (positive) emotionally arousing. Further, it is well established that emotionally arousing stimuli attract selective attention, alter sensory processing and are reported as having higher vividness, which might translate into higher experienced visual complexity (Marin and Leder, 2013). Such a greater visual complexity of emotionally arousing stimuli might also be supported by differences in cognitive processing, i.e., their higher distinctiveness and semantic relatedness. Based on these considerations we aimed to characterize the relationship between emotional arousal and perceived visual complexity in the current study.

The hypothesized relationship between emotional arousal and experienced visual complexity is also relevant from an experimental perspective because ratings are widely used to generate equally complex picture sets that differ only with respect to arousal and valence. Therefore, we investigated how the affective properties of scenic stimuli (arousal and valence) and computational measures of visual complexity relate to experienced visual complexity (i.e., participant ratings). In doing so, the four most often proposed computational measures of visual complexity were compared as a secondary outcome. Admittedly, here we did not manipulate the pictures themselves and investigated the correlational relationship between affect and perceived visual complexity, rather than attempting to causally influence this relationship.

In a series of experiments, we examined the contributions of affective processes and computational measures of visual complexity to visual complexity ratings in naturalistic pictures. After establishing this effect across different subsets of stimuli, rating procedures, and presentation times (Experiments 1–2) we further explored how ratings of visual complexity related to measures of cognition, eye-tracking, emotion-related traits and deliberate control (Experiments 3–5).

Experiment 1

In Experiment 1, we first tested for relationships between affective and visual complexity ratings, as well as for relationships with the computational measures of visual complexity, across a large set of 720 pictures. In addition, the four computational measures of visual complexity (edge density, feature congestion, subband entropy, and JPEG file size) were formally compared with respect to their shared variance with the ratings of visual complexity.

Methods

Participants

As prior studies have indicated that there are likely sex differences in affective processing (Cahill et al., 2001; Canli et al., 2002; Sergerie et al., 2008; Schneider et al., 2011; Brown and Macefield, 2014), we only recruited female participants in all experiments. In addition, we restricted the sample to female volunteers to improve inter-rater consistency, particularly since some positively valenced, arousing stimuli were erotic in nature. A total of 35 female volunteers (ages 18–40) with normal or corrected-to-normal vision participated. In all experiments, volunteers were recruited through an advertisement on the homepage of the University of Hamburg, gave informed written consent, and received monetary reimbursement (10€ per hour) for their participation. No volunteer participated in more than one experiment. The research was approved by the local ethics board (Board of Physicians, Hamburg, Germany).

Materials

A total of 720 pictures were used in the experiment: 239 pictures were selected from the International Affective Picture System (IAPS; Lang et al., 2008) database, and were supplemented by an additional 481 pictures found on the Internet that were thematically similar to pictures in the IAPS (and were adjusted to have the same picture dimensions as the IAPS pictures).

Pictures were chosen such that the picture set was approximately one-third each of positive, negative, and neutral pictures. Importantly, pictures were chosen such that pictures were distributed across six categories with different numbers of primary objects in the foreground (objects, animals, faces, one-person scenes, two-person scenes, multi-person scenes). Pictures in each topic category were evenly distributed across the three valences.

Procedure

Participants were told that they would be shown emotional pictures and asked to rate these pictures on three scales: valence, arousal, and visual complexity. Participants were provided with instructions describing each measure. For the valence and arousal ratings, instructions were identical to those used by Lang et al. (2008) and participants rated the pictures using the 9-point Self-Assessment Manikin (SAM; Bradley and Lang, 1994). For the visual complexity rating, participants were instructed that: “A picture of a few objects, colors, or structures would be less complex than a very colorful picture of many objects that is composed of several components.” To further orient participants to this type of rating, participants were provided two example pictures: one low-complexity picture (Figure 1A) and one high-complexity picture (Figure 1B). A 9-point Likert scale, shown in Figure 2, was used for complexity ratings. For ratings of complexity and arousal, the left-most options corresponded to higher ratings of complexity and arousal, respectively. For valence, the left-most options corresponded to higher ratings of pleasantness, whereas lower ratings corresponded to unpleasantness.

FIGURE 2. Examples of the experimental methods. (A) Example pictures used in the experiments. (B) Scale used for the valence, arousal, and visual complexity ratings. The valence and arousal scales are adapted from the self-assessment manikin (SAM) developed by Bradley and Lang (1994).

On each trial, participants were first shown a picture for 2000 ms, followed by the rating screen, which persisted until all three ratings were given using the computer mouse. The order of ratings was constant across all trials and participants: valence, arousal, visual complexity.

Over two consecutive days, participants rated all 720 pictures for valence, arousal, and visual complexity (360 pictures per day; 1 h per day). An additional 5 ‘buffer’ pictures were presented as the first trials on the first day, to allow participants to become accustomed to the task.

Data Analysis

Effects were considered significant based on an alpha level of 0.05. Ratings of visual complexity, valence and arousal were computed as averages across participants to obtain normative ratings for each picture, as some of the experiments only involved a subset of the rating scales.

To examine the relative relationships of the examined measures with ratings of visual complexity, we conducted a hierarchical regression. In this regression, we first evaluated regression models that included only individual measures. Next, we evaluated models that had either affective ratings or computational measures. Finally, we evaluated a ‘full’ model that contained both sets of measures. The list of models considered and their respective model fitness measures are reported in Table 1. In subsequent experiments, we used subsets of the 720 pictures, with either 360 or 144 pictures; to demonstrate the robustness of the observed findings, model fitness indices are reported for these subsets as well. All subsets had an equal number of images from each category. Mean ratings/scores for each measure, for each category, are reported in Table 2.

TABLE 1. Hierarchical regression analysis of rated visual complexity with affective ratings and computational measures of visual complexity, across all 720 images used in Experiment 1 and for only the subsets used in Experiments 2–4 (360 and 144 image subsets).

TABLE 2. Mean (SD) values for each of the rating and computational measures, for each picture category, from the full set of 720 pictures.

For each regression model, we report both R², with ratings of visual complexity as the dependent measure, and ΔBIC. This second fitness index is the Bayesian Information Criterion (BIC), which includes a penalty based on the number of free parameters. Smaller BIC values correspond to better model fits. By convention, two models are considered equivalent if ΔBIC < 2 (Burnham and Anderson, 2004). As BIC values are based on the relevant dependent variable, ΔBIC values are reported relative to the best-performing model (i.e., ΔBIC = 0 for the best model).
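The ΔBIC comparison can be sketched as follows, assuming ordinary least squares with Gaussian errors, for which BIC equals n·ln(RSS/n) + k·ln(n) up to a constant. Because all models share the same dependent variable, RSS can be replaced by (1 − R²) times a constant. The function names are hypothetical, the parameter counts (predictors plus intercept) are an assumption, and the output is illustrative rather than a reproduction of Table 1.

```python
import math


def bic_from_r2(r2, n, k):
    """BIC up to an additive constant shared by all models fitted to the
    same dependent variable: n * ln(1 - R^2) + k * ln(n).

    Assumes ordinary least squares with Gaussian errors; `k` counts the
    free parameters (here: predictors plus intercept, an assumption).
    """
    return n * math.log(1.0 - r2) + k * math.log(n)


def delta_bic(models, n):
    """Map model name -> BIC difference from the best (lowest-BIC) model."""
    bics = {name: bic_from_r2(r2, n, k) for name, (r2, k) in models.items()}
    best = min(bics.values())
    return {name: value - best for name, value in bics.items()}


# (R^2, number of free parameters) for three models over n = 720 pictures.
models = {
    "arousal only": (0.294, 2),
    "arousal + valence": (0.298, 3),
    "affect + computational": (0.524, 6),
}
deltas = delta_bic(models, n=720)
for name, value in deltas.items():
    print(f"{name}: dBIC = {value:.1f}")
```

The best-performing model receives ΔBIC = 0 by construction, and every other model is reported as its distance from that reference.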

Results and Discussion

Individual Regression Models

Arousal and valence

As shown in Figure 3 and Table 1, arousal was more related to ratings of visual complexity than valence [arousal: R² = 0.294; valence: R² = 0.06].

FIGURE 3. Scatter plots based on the ratings obtained in Experiment 1. Relationship between visual complexity and affect ratings: (A) arousal and (B) valence. Relationship between visual complexity ratings and computational measures: (C) edge density, (D) feature congestion, (E) subband entropy, and (F) JPEG file size. Each dot represents an individual picture (720 in total); lines represent linear regressions.

Computational measures of visual complexity

Visual complexity was computationally measured using edge density, feature congestion, and subband entropy. The three measures of visual complexity were significantly correlated with each other [edge density ↔ feature congestion: r(718) = 0.78, p < 0.001; edge density ↔ subband entropy: r(718) = 0.60, p < 0.001; feature congestion ↔ subband entropy: r(718) = 0.65, p < 0.001]. Here we additionally included JPEG file size to test if these three more formal measures of visual complexity are able to account for the variance explained by the JPEG file size (Figure 3F).
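A correlation reported as r(718) is a Pearson coefficient with n − 2 degrees of freedom, here 720 pictures minus 2. A minimal sketch of that computation is given below; the function name and the five example scores are hypothetical and are not values from the study.

```python
import math


def pearson_r(x, y):
    """Return (r, degrees of freedom) for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y), n - 2


# Hypothetical per-picture scores on two computational measures.
edge_scores = [0.10, 0.20, 0.30, 0.40, 0.50]
congestion_scores = [2.0, 2.5, 2.9, 4.1, 4.4]
r, df = pearson_r(edge_scores, congestion_scores)
print(f"r({df}) = {r:.2f}")
```

For a regression with a single predictor, R² is the square of this coefficient.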

As shown in Table 1, these computational measures were able to explain significant portions of variability in visual complexity ratings, particularly feature congestion [R² = 0.199]. However, all of these R² values were still lower than that for ratings of arousal.

As an additional test for the utility in operationalizing visual complexity as JPEG file size, we conducted correlations between it and the other three computational measures of visual complexity. JPEG file size correlated highly with all three measures [edge density: r(718) = 0.78, p < 0.001; feature congestion: r(718) = 0.82, p < 0.001; subband entropy: r(718) = 0.64, p < 0.001].

Multiple Regression Models

To better characterize the relationships of these six measures (arousal, valence, edge density, feature congestion, subband entropy, JPEG file size) with ratings of visual complexity, we fitted a series of multiple regression models within the hierarchical regression framework.

In the first model, we included only affective ratings (arousal, valence) and found that together they accounted for a sizeable portion of the variance in rated visual complexity [R² = 0.298]. In the second model, we included only the computational measures of visual complexity (edge density, feature congestion, subband entropy, JPEG file size) and found that together they yielded adjusted R² = 0.235. Excluding JPEG file size had a minimal effect on the amount of variance explained (decrease in adjusted R² from 0.235 to 0.233). Given this lack of additional variance explained, and the correlations reported above, JPEG file size was excluded from further analyses.
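Adjusted R² penalizes the raw R² for the number of predictors, which is why dropping a predictor that adds little can leave it nearly unchanged. The standard formula is sketched below; the function name and the input values are hypothetical and are not taken from Table 1.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with `p` predictors (intercept excluded)
    fitted to `n` observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical raw R^2 of 0.30 for a four-predictor model over 720 pictures.
print(round(adjusted_r2(0.30, n=720, p=4), 3))
```

With several hundred observations and only a handful of predictors, the adjustment is small, so adjusted and raw R² stay close.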

In the last model we included five measures: two affective ratings (arousal, valence) and three computational measures of visual complexity (edge density, feature congestion, subband entropy). Here we found that the combined model explained roughly half of the variance in visual complexity ratings [R² = 0.524]. Given this incremental approach, it is clear that the affective ratings and computational visual complexity measures each explain unique portions of variance in visual complexity ratings. Nonetheless, we further tested for associations between affective ratings and computational visual complexity measures, and all were found to be non-significant [all r’s < 0.1; p’s > 0.05].

Taken together, the results of Experiment 1 demonstrated that despite the ratings indicating the contrary, emotional pictures were not more complex when evaluated using computational measures. This implies that complex pictures may not be higher in positive-valenced arousal as suggested in earlier studies (e.g., Berlyne et al., 1963, 1968; Vitz, 1964; Day, 1967; Kaplan et al., 1972; Aitken, 1974; Aitken and Hutt, 1974; Nadal et al., 2010; Forsythe et al., 2011). One potential explanation could be the nature of the employed visual stimuli, e.g., paintings or abstract pictures vs. natural scenes.

Importantly, Experiment 1 shows clearly that the affective factors, arousal and valence, relate to visual complexity ratings independent of the computational measures, where the effect of arousal is much more pronounced, i.e., explains substantially more unique variance. Mechanisms regarding how emotional arousal might enhance not only perception and related experienced vividness (Todd et al., 2012), but also perceived complexity, will be discussed in the general discussion. Because arousal was more strongly related to this bias in ratings of visual complexity than valence, we hereafter refer to this effect as the ‘arousal-complexity bias.’

Experiment 2

In Experiment 1, participants were presented with the pictures for a short duration (2000 ms). During this relatively brief period, participants needed to sample the information necessary to evaluate the pictures for arousal, valence, and visual complexity. Under time pressure, searching for potentially relevant picture characteristics is a demanding, goal-directed task under top-down control. It is known that emotionally arousing stimuli preferentially recruit attentional resources, such as in dual-task conditions, resulting in even greater memory advantages relative to neutral stimuli (Kensinger and Corkin, 2004; MacKay et al., 2004; Mather and Sutherland, 2011; Kang et al., 2014; Madan et al., 2017). Therefore, it may be possible that the greater experienced visual complexity for arousing stimuli in Experiment 1 was partly driven by the preferential recruitment of attentional resources. To test this hypothesis, we presented the pictures for 5 s in Experiment 2 to potentially attenuate any such effect.

Additionally, given that participants in Experiment 1 made their ratings in a fixed order, with the affective ratings always preceding the visual complexity rating, it is possible that we unintentionally induced an effect of affect on ratings of visual complexity. In Experiment 2A, we changed the rating procedure such that the ratings were made sequentially, rather than presenting all three rating scales simultaneously (as in Experiment 1). In Experiment 2B, participants were only asked to make visual complexity ratings, removing potential confounding effects of being asked to attend to the emotional features of the picture before the complexity rating as well as lowering the demands on information sampling during processing of the pictures.

To evaluate the influence of these potentially confounding factors, we correlated the ratings obtained in each of these experiments, for each picture, with those obtained in Experiment 1. We also report the mean absolute difference between ratings to evaluate the absolute agreement between the experimental procedures.
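The mean absolute difference complements the correlation because two sets of ratings can be perfectly correlated while still differing by a constant offset. A minimal sketch, with a hypothetical function name and made-up ratings for three pictures:

```python
def mean_absolute_difference(ratings_a, ratings_b):
    """Average of |a - b| over per-picture ratings from two procedures."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = zip(ratings_a, ratings_b)
    return sum(abs(a - b) for a, b in pairs) / len(ratings_a)


# Hypothetical mean ratings for the same three pictures in two experiments.
first = [2.0, 4.0, 6.0]
second = [3.0, 5.0, 7.0]  # same ordering of pictures, shifted up by one point
mad = mean_absolute_difference(first, second)
print(mad)  # every pair differs by exactly 1.0, so the mean is 1.0
```

Here the two rating sets would correlate perfectly, yet the mean absolute difference shows that they disagree by one scale point on average.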

Since presentation time was increased, we decreased the picture set to prevent the experiment from becoming too long. This was done by randomly selecting 360 pictures from the full set of 720 pictures used in Experiment 1. To ensure that this picture set was representative, we re-calculated the correlations from Experiment 1 using only this picture subset. As shown in Table 1, correlations for this subset were comparable to the full stimulus set.
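One way to draw a random subset while keeping an equal number of pictures per category is stratified sampling, sketched below. The text does not describe how the balance across categories was enforced, so this procedure is an assumption, as are the function name, the picture identifiers, the seed, and the premise of 120 pictures per category in the full set.

```python
import random
from collections import defaultdict


def stratified_subset(categories, per_category, seed=0):
    """Randomly pick `per_category` picture IDs from every category.

    `categories` maps picture ID -> category label.
    """
    by_category = defaultdict(list)
    for picture_id, label in categories.items():
        by_category[label].append(picture_id)
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    subset = []
    for label in sorted(by_category):
        subset.extend(rng.sample(sorted(by_category[label]), per_category))
    return subset


labels = ["objects", "animals", "faces",
          "one-person", "two-person", "multi-person"]
# Hypothetical full set: 720 picture IDs, 120 per category (an assumption).
categories = {f"pic{i:03d}": labels[i % 6] for i in range(720)}
subset = stratified_subset(categories, per_category=60)
print(len(subset))  # 6 categories x 60 pictures = 360
```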