Cross-modal associations between materic painting and classical Spanish music

The study analyses the existence of cross-modal associations in the general population between a series of paintings and a series of clips of classical (guitar) music. Because of the complexity of the stimuli, the study differs from previous analyses of the association between visual and auditory stimuli, which predominantly examined single tones and colors by means of psychophysical methods and forced-choice responses. More recently, the relation between music and shape has been analyzed in terms of music visualization, or in relation to the role played by emotion in the association, and free-response paradigms have also been accepted. In our study, in order to investigate what attributes may be responsible for the association between visual and auditory stimuli, the clip/painting association was tested in two experiments: the first used the semantic differential on a unidimensional rating scale of adjectives; the second employed a specific methodology based on subjective perceptual judgments in first-person account. Because of the complexity of the stimuli, it was decided to have the maximum possible uniformity of style, composition and musical color. The results show that multisensory features expressed by adjectives such as “quick,” “agitated,” and “strong,” and their antonyms “slow,” “calm,” and “weak” characterized both the visual and auditory stimuli, and that they may have had a role in the associations. The results also suggest that the main perceptual features responsible for the clip/painting associations were hue, lightness, timbre, and musical tempo. Contrary to what was expected, musical mode (major, usually related to feelings of happiness, or minor, usually related to feelings of sadness) and spatial orientation (vertical and horizontal) did not play a significant role in the association.
The consistency of the associations was shown when evaluated on the whole sample, and after considering the different backgrounds and expertise of the subjects. No substantial difference was found between expert and non-expert subjects. The methods used in the experiment (semantic differential and subjective judgements in first person account) corroborated the interpretation of the results as associations due to patterns of qualitative similarity present in stimuli of different sensory modalities and experienced as such by the subjects. The main result of the study consists in showing the existence of cross-modal associations between highly complex stimuli; furthermore, the second experiment employed a specific methodology based on subjective perceptual judgments.


Introduction
In recent years the field of perception studies has seen an increasing amount of research showing the tendency for a sensory feature, or attribute, in one modality to be matched with a sensory feature in another modality (Simner et al., 2005, 2011; Ward et al., 2006a,b; Cohen Kadosh et al., 2009; for a review see Spence, 2011). The phenomenon had already been pointed out by Köhler, who showed the tendency of the general population systematically to associate visual and auditory attributes (the so-called "takete-maluma" phenomenon) (Köhler, 1929; Gallace et al., 2011; Nielsen and Rendall, 2011). Initially prompted by interest in the field of synesthesia (Wicker, 1968; Melara and O'Brien, 1987; Cytowic, 1995; Baron-Cohen and Harrison, 1997; Ward and Simner, 2003), studies then considered similar phenomena occurring in the general population (Martino and Marks, 2001; Maurer and Mondloch, 2005; Maurer, 2008, 2011; Parise and Spence, 2009). In regard to the nature of such associations, Spence has distinguished among structural correspondences (due to neural correlates, hence potentially universal), statistical correspondences (due to learning, hence potentially influenced by different environments), and semantic correspondences (due to language influence, hence potentially different among cultures) (Spence, 2011). Recently, a growing number of researchers have sought to explain synesthetic and cross-modal associations in semantic more than sensory terms by re-evaluating the possible role of cognitive factors in the associations. In particular, the question has been raised in regard to inducers that take the form of concepts, such as the days of the week or the months, usually associated with colors.
In short, since inducers have a conceptual nature, it has been asked whether a full account of synesthesia should not go beyond the standard sensory-sensory approach (Dixon et al., 2006; Simner and Ward, 2006; Ward et al., 2007; Eagleman, 2012; Nikolić, 2012, 2014; Mroczko-Wasowicz and Werning, 2012; Simner, 2012; Ward, 2013; Mroczko-Wasowicz and Nikolić, 2014).

Bottom-Up and Top-Down Explanations
The opposition between the sensory interpretation (bottom-up, i.e., sense-driven) and the conceptual interpretation (top-down, i.e., concept-driven) of synesthetic associations has arisen within the classical framework of cognitive science, which counterposes the two levels of information processing. Consequently, the former interpretation has sought to explain the associations in terms of direct synaptic connections between neurons (representing the inducer and the concurrent); the latter is based on high-level processes due to language, culture, abstract symbolization, learning, etc. (Ward et al., 2007). The second interpretation, however, would seem better suited to explaining cases of color sequence synesthesia (Tomson et al., 2013) and spatial sequence synesthesia (Eagleman, 2010), where the names of time units and ordinal categories are involved. This second interpretation, which considers cases of synesthesia occurring independently of external inducers, has taken the name of ideasthesia (Meier, 2013; Jürgens and Nikolić, 2014; for a thorough discussion of the topic see Mroczko-Wasowicz and Nikolić, 2014).
There is also another interpretation. It rests on a not necessarily linguistic or symbolic conception of semantics. This third approach, of Gestalt derivation (Albertazzi, 2013), explains associations in terms of patterns of qualitative similarity present in different sensory modalities and perceived as such: for example, hot and cold, sad and happy, and pleasant and unpleasant, are connotative properties of both sounds and colors. This therefore concerns, not semantic information projected top-down into other domains, but qualities intrinsic to perceived phenomena. This position obviously does not preclude investigation of correlations at neuronal level or of the presence of cognitive dimensions due to learning, language, symbolization, etc. This interpretation has been adopted in studies on the associations between color and shape in the general population (Dadam et al., 2012; Albertazzi et al., 2013, 2014, 2015). Whatever viewpoint is adopted in interpreting the phenomenon, there is growing interest in cross-modal associations both within synesthesia (Simner, 2012; Ward, 2013) and in the general population.
In the field of cross-modal associations occurring in the general population, a perceptual attribute which proves to play an important role is color. In fact, it has been shown that color is associated with olfaction (Gilbert et al., 1996; Kemp and Gilbert, 1997; Demattè et al., 2006; Hanson-Vaux et al., 2013; Levitan et al., 2014), touch (Ludwig and Simner, 2013), and acoustics (Ward et al., 2006a; Moos et al., 2013, 2014).

Cross-Modal Associations between Visual and Auditory Stimuli
As regards studies on associations between visual and auditory stimuli, early work predominantly considered single tones, by means of psychophysical methods and forced-choice responses (Walker, 1987). More recently, the relation between visual and more complex auditory stimuli (i.e., music clips) has also been tested (Tan and Kelly, 2004; Küssner, 2013b), and free-response paradigms have also been accepted (Reybrouck et al., 2009). In particular, the task in Tan and Kelly (2004) was to create marks or drawings that visually represented five short orchestral compositions, and to write essays explaining their graphic representations; while in Küssner (2013b) the task was to visualize sound and music by creating representations with an electronic graphics tablet (in two different experimental conditions, i.e., drawing during and after the sounds). In their experiment, Tan and Kelly tested musically trained and untrained subjects, and found a difference between them: i.e., the musically trained participants provided abstract representations, such as lines of symbols, while the untrained participants produced pictorial representations, such as images or pictures telling a story. A second difference consisted in the fact that trained participants focused more on musical characteristics (such as theme, mode, changes in pitch, etc.), while untrained participants focused more on emotions.
It has also been found that music in the major mode more closely matches lighter colors than does music in the minor mode (Bresin, 2005), while faster music in the major mode more closely matches more saturated and lighter colors than does slower music in the minor mode. In particular, Bresin's study explicitly addressed the role of expressivity in music, which was verified by testing the association made by expert subjects between colors and performances of classical music. The results of this study showed that participants used different color profiles to classify the same piece of music, but these differences depended mainly on the performance and on the instrument. The study by Palmer et al. (2013) instead tested participants of different cultures (United States and Mexico), and found that in both cultures, faster music in the major mode produced color choices that were more saturated, lighter, and yellower, whereas slower, minor music produced the opposite pattern. Similarly, other studies have been conducted in order to explain the association between visual and auditory stimuli on the basis of their shared emotional content, such as the association between happy music and happy colors (Whiteford et al., 2013; Langlois et al., 2014).
As is well known, cross-modal and cross-dimensional associations have always played a role in aesthetics. Recently, the association between color and shape has been experimentally tested by studies relating to Kandinsky's hypothesis of a systematic association between geometrical shapes and colors (Droste, 1990; Lupton and Miller, 1991; Jacobsen, 2002; Kharkhurin, 2012; Albertazzi et al., 2013, 2015; Makin and Wuerger, 2013; Chen et al., 2015). Besides the above-mentioned shape/color association in Kandinsky, the analogy between scales of colors and scales of notes is a major component of the harmony theories developed by Klee (1956), Kandinsky (1926/1994), and Itten (Itten et al., 1970; Gage, 1999) with particular regard to Schönberg (Schönberg and Kandinsky, 1980; Bidaine, 2004). In our study, and following previous studies of ours dealing with aesthetics (Albertazzi et al., 2015), we tested the association in the general population between some artistic works in painting and some pieces of classical music, in order to evaluate whether systematic cross-modal associations occur among stimuli of high complexity. Specifically, we tested whether images with varying perceptual characteristics and contents led to consistent associations with the music clips, and what attributes might be responsible for the phenomenon. The choice of the images and of the clips was based on the hypothesis that their artistic modes of expression (the coloratura of the flamenco and the materic style of the paintings), and a series of connotative properties holding for colors and tones (like weak and strong, calm and agitated) play a role in the associations. More precisely, the specific coloratura (Tonkolorit) of flamenco music is characterized by very brief sharp notes and a minor scale. As to materic (or material) painting, this is painting realized with a great quantity of pictorial material, and characterized by a thick pictorial surface that tends toward the three-dimensional.
The study that may show an affinity with ours was conducted in the 1930s by Cowles (1935), who also made use of complex stimuli (8 pieces of classical music, although composed by different musicians, and 8 paintings by various well-known artists), and with expert and non-expert participants. There are differences with our study, however, both in the number and in the kind of stimuli: in Cowles (1935) the pictures were mainly landscapes or scenes with simple content, without uniformity of style; the auditory stimuli were taken from works by different composers, differing in character, although there were no more than slight variations in volume, tempo, or tone quality. Finally, the aim of one of the two experiments conducted in Cowles' study, differently from ours, was to verify whether similar affective moods were found between the musical selections and the pictures.

The Study
The purpose of our research was to test whether the general population exhibits cross-modal associations between complex stimuli of two different modalities (vision and sound). Specifically, the aim of our research was to test whether significant associations existed between a series of paintings and a series of clips of classical (guitar) music, and whether these associations were consistent when evaluated on the same subjects. The research also sought to evaluate whether the findings were confirmed on different subjects with different backgrounds and expertise. Our expectation was that, if found, these associations would be consistent from one subject to another, suggesting a predisposition to perceive specific cross-modal natural associations between complex visual and auditory stimuli.
The selection of the paintings and the music clips was discussed with the painter (Matteo Boato: http://www.matteoboato.net/), who is also a musician and who provided a description of the individual art works and the characteristics of the music clips selected. The choice of Boato's works was made (apart from personal preference) on the basis of their characteristics of high chromaticity and saturation. Our hypothesis was that corresponding to these visual characteristics are similar patterns in the acoustic modality as to vibrato, coloratura, and quick tempo: for example, we expected that a quick tempo would correspond to a very chromatic and saturated red or yellow. Specifically, the hypothesis was that the association, if found, would be due to multisensorial and connotative features present in both the visual stimuli and the auditory stimuli, such as warmth/coldness, brightness/darkness, sadness/happiness, softness/hardness, etc. The prediction was therefore that the subjects would make systematic associations between the paintings and the music, and that the associations would be due to the presence of similar features in the paintings and the music, as also evidenced by the semantic differential. Because of the complexity of the stimuli, we tried to keep the maximum amount of uniformity possible. The purpose of using works by the same painter as stimuli was to maintain the same style (materic painting) and composition (expressionist) notwithstanding the diversity of content and colors (achromatic and mainly chromatic paintings depicting landscapes and figures were tested). The purpose of using clips from the same repertoire was to maintain the coloratura of classical Spanish flamenco music. The clips were instead chosen for their specific musical features, such as having a strong, hard, agitated sound and a quick tempo (presto) (for example, Asturias by Albéniz). The recorded music clips were performed by Boato himself.
Finally, we did not test individual preferences because it was not an objective of our experiment.

Methods
Participants
Sixty-three participants volunteered for two experiments: 38 women and 25 men (mean age: 22.6 years; standard deviation: 3.5; median: 22 years). All participants were recruited by e-mail from students in the Department of Cognition and Education Sciences, University of Trento, Italy. The address list of the students was provided by the student office. We first sent an e-mail inviting the students to take part in the experiment, mentioning that we were looking for people with a background in music, people with a background in art, and non-expert people. We did not ask for professionals, however. When we contacted the students who agreed to take part, we decided to accept people who had a public or private artistic education in music for at least 4 years, people who had a private or public education in art, and non-expert people. The questionnaire reported this information. The subjects were also asked about a possible conscious synesthesia (Palmer and Schloss, 2010; Albertazzi et al., 2013, 2015; Palmer et al., 2013). The only exclusion criterion was self-reported defective color vision or acoustic impairment. After the experiment, the subjects were asked whether they had previously known the paintings and the pieces of music that they evaluated. For all the subjects the stimuli were totally new.
The first experiment was performed using the semantic differential on a unipolar rating scale of adjectives. It was decided to use a unipolar scale instead of the classic bipolar one of the Osgood semantic differential (Osgood, 1956) because the bipolar scale is not always one-dimensional. Sixty-one subjects participated in the experiment (two subjects who did not complete the experiment were excluded from the analysis). The second experiment evaluated the association between visual and auditory stimuli and was completed by all the 63 subjects. We tested non-experts (31), music experts (20), and art experts (12), the purpose being to investigate a possible influence of expertise on the associations. Individuals with training in private or public schools were considered expert participants in the present study. All the subjects signed an informed consent form. The experiments reported here complied with the ethical guidelines of the University of Trento.

Procedure
The experiment was performed in a laboratory with constant and controlled lighting conditions (230-250 lux in the room, correlated color temperature 3400 K, halogen lamp). The visual stimuli appeared on a Quato Display 242ex (Intelli Prof 242 excellence) 24-inch screen (51.8 × 32.4 cm visible area); the auditory stimuli were administered through Sennheiser HD580 Precision headphones. Calibration was automatic (48-bit USB hardware calibration with 3 × 16-bit 3D look-up table and luminance inside the monitor, dedicated luminance stability circuit, built-in UDACT display analysis); the measurement device was a 4-channel Silver Haze Pro colorimeter. The resolution used was 1920 × 1200 pixels (the native and the maximum possible for the Quato Display 242 monitor).
Participants were seated at a desk. The distance from the center of the screen to the eye was about 65 cm. Chin supports were not used, but during each session the postures of the participants were checked and corrected if their chests approached the screen or their backs were hunched.
The clips (performed by the same player) were taken from the following musical works: (1)
The assessments of the adjectives were arranged on a continuous scale between 0 and 1024. We selected for the experiment mainly adjectives that could be applied to both music and paintings. The experiment was preceded by a pilot test with the same characteristics as the experiment itself but a much longer list of adjectives. The original list of adjectives included 49 items evaluated by 35 subjects. After a correlational study, the list of adjectives was shortened to include 22 items. The final list of adjectives (presented in Italian) was the following: slow, quick, agitated, calm, happy, sad, warm, cold, heavy, light, continuous, rhythmic, strong, weak, dark, bright, hard, soft, impression of horizontality, impression of verticality, adagio, presto (the two last items were left in the adverbial form as they are in Italian). As to the chromatic dimensions, neither hue nor saturation was considered (all the paintings were uniformly drawn with very saturated hues), but rather the dimensions of warmth (warm/cold) and brightness (light/dark) (relying on the contrast between the fragments of colors and the painted background used by the painter). The choice of dimensions was due to their perceptual salience and to the fact that they are the most meaningful dimensions in cross-modal associations where color is involved.
The asymmetric choice of having the subjects listen to a music clip and asking them to associate three paintings with it, and not vice versa, was dictated by the complexity of the task, which was of considerable duration (about an hour and a half, with a pause). We also hypothesized that asking the subjects to look at the paintings and associate three music clips from the classical guitar repertoire with them would have been an excessively burdensome task.
In fact, it would have required listening to 15 clips sequentially (although in random order) for each painting. Instead, as shown in Figure 1, the 15 paintings were seen all together.

Experiment 1
The experiment was performed using the semantic differential on a unidimensional rating scale of adjectives. First the individual images were presented on the screen in random order, and then each music clip was played, again in random order. Participants were told that they would first see a set of images (each was displayed on the screen for 10 s) and then hear a series of music clips (each lasting 60 s). For each stimulus the subject had to evaluate, on a continuous scale, his/her degree of agreement with a series of adjectives. Participants were given the following written instructions for the task: You will be presented with images on the screen or music clips through your headphones accompanied by a series of adjectives in succession. You should evaluate these adjectives with reference to the image or music presented. Evaluation of the adjective will be made on a continuous scale. You should prefer accuracy to promptness of response.
The purpose of the experiment was to check whether complex images and music clips with varying perceptual characteristics led to consistent choices of adjectives. Images were shown one by one (in random order) on the left half of the screen, while on the right half of the screen participants saw one after the other the adjectives presented randomly (Figure 2). The same occurred with the music clips, which could be heard by clicking on a button positioned on the left side of the screen.

Experiment 2
The purpose of the second experiment was to check whether images with varying perceptual characteristics and contents led to consistent associations with music clips taken from the repertoire of classical (guitar) music. Each subject saw a series of images of paintings in preview on the screen. The subject clicked on a specific image, which thus appeared in full screen mode, and likewise with the other images, in no particular order. The subject viewed the images while simultaneously listening to a music clip. The subject had to choose the image(s) that s/he most naturally associated with that music. S/he could list up to three images associated with the clip, arranging them in order of appropriateness from 1 to 3 in three different boxes at the bottom of the screen (Figure 1). The subject could go back to re-view images already seen, and s/he could also listen repeatedly to the music clip. Once the association had been decided, the images selected were moved down into one of the three boxes, depending on the degree of association, in order from 1 to 3. Once the choice had been confirmed, it could not be changed, and the task continued with re-presentations of all the images and further music clips until the latter were exhausted.
Participants were given the following written instructions for the task: You will see a series of images of paintings in preview on the screen. Click on one of them, which will appear in full screen mode, and then do likewise with the other images. At the same time, you will hear a music clip. Select which image(s) you most naturally associate with the music. You can go back to re-view images already seen, and also to hear the music clip again. You can list up to three images associated with the music, placing them in order of appropriateness from 1 to 3. Once you have confirmed your choice, it cannot be changed, and the task will continue with further music clips until there are none left. You should prefer accuracy to promptness of response.
FIGURE 1 | Example of a painting selected in association with a given music clip (the arrow points to where to click to hear the music clip again).
Frontiers in Psychology | www.frontiersin.org

Statistical Methods
Associations between quantitative variables were evaluated by means of the non-parametric "rho" correlation coefficient. The chi-square test for a contingency table was employed to evaluate the associations between the paintings and the music clips. A residual analysis was performed to identify which painting/clip combinations were significant. Analyses were performed with R 3.0.0 software (R Core Team, 2013).
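As an illustration of the chi-square test and residual analysis described above, the following is a minimal sketch in Python rather than R (the original analyses were run in R 3.0.0). It assumes that the "standardized residual" is the adjusted residual, i.e., the difference between observed and expected counts divided by its estimated standard error under independence; the function name and the toy counts are hypothetical and are not the data of the study.

```python
import math

def chi_square_with_residuals(table):
    """Return (chi2, dof, standardized residuals) for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows = music clips,
    columns = paintings. Assumes every row and column total is non-zero.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_tot)

    chi2 = 0.0
    stdres = []
    for i in range(n_rows):
        res_row = []
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / total
            diff = table[i][j] - expected
            chi2 += diff ** 2 / expected
            # Adjusted residual: approximately N(0, 1) under independence.
            variance = expected * (1 - row_tot[i] / total) * (1 - col_tot[j] / total)
            res_row.append(diff / math.sqrt(variance))
        stdres.append(res_row)

    dof = (n_rows - 1) * (n_cols - 1)
    return chi2, dof, stdres

# Hypothetical toy counts (3 clips x 3 paintings), for illustration only.
toy_counts = [[20, 5, 5],
              [5, 20, 5],
              [5, 5, 20]]
chi2, dof, stdres = chi_square_with_residuals(toy_counts)
print(round(chi2, 2), dof)
print([[round(r, 2) for r in row] for row in stdres])
```

In this toy table the diagonal cells have residuals above 1.96 (the clip "attracts" the painting) and the off-diagonal cells have residuals below -1.96 (the clip "repels" the painting).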

Experiment 1
Table 1 reports the mean rating values for each word-painting pair given by the 61 participants. Means range between 186 and 842. This latter value was obtained when considering painting number 3 ("Mantova, 2009") and the adjective "bright"; therefore this painting was considered the most luminous. The minimum value was obtained when considering painting number 14 ("Leopard") and the adjective "weak"; therefore this painting was considered the least weak of the 15 paintings. Table 2 reports the mean rating values for each word-clip pair given by the 61 participants. Means range between 133 and 864. This latter value was obtained when considering clip no. 14 (Villa Lobos, Study n. 6) and the adjective "agitated"; therefore this clip was considered the most agitated. The minimum value was obtained when considering clip no. 8 (Fernando Sor, Variations on a theme by Mozart-I var) and the adjective "dark"; therefore this clip was considered the least dark of the 15 clips.
To evaluate the degree of association between the semantic rating (i.e., considering the mean ratings of the 22 words) of one selected painting and one selected clip, non-parametric rho correlation coefficients were calculated. The results are shown in Table 3 (the rows contain the 15 music clips, the columns the 15 paintings).
These correlations ranged between -0.69 and 0.90. This latter value was obtained when considering the mean ratings of the 22 words given to painting no. 11 ("Land-Hora et Labora") (see Table 1) and to clip no. 6 (Gaspar Sanz, Canarios) (see Table 2). The highest negative correlation was found between painting no. 14 ("Leopard") and clip no. 3 (Francisco Tárrega, Recuerdos de la Alhambra). Table 4 shows the results of Experiment 2.
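The painting/clip correlations of Table 3 can be sketched as follows, assuming that "rho" denotes Spearman's rank correlation computed on the mean ratings of the same adjectives for one painting and one clip. The helper names and the five example ratings are hypothetical; the study used 22 adjectives on a 0-1024 scale.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical mean ratings of five adjectives for one painting and one clip.
painting_means = [200, 800, 750, 300, 500]
clip_means = [150, 820, 700, 350, 480]
print(spearman_rho(painting_means, clip_means))
```

Because the correlation is computed on ranks, it captures whether the adjectives are ordered similarly for the painting and the clip, regardless of the absolute rating values.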
Since the chi-square test did not indicate which clip was associated (positively or negatively) with which painting, a residual analysis was performed. A standardized form of the residual was employed, which behaves like a normal deviate and therefore makes it possible to determine whether a residual is large enough to indicate a departure from random choice. Under random choice, there is only about a 5% chance that any particular standardized residual exceeds 1.96 in absolute value. Across the 225 cells inspected, about 11 residuals (i.e., 5% of 225) could therefore have been this large solely because of random variation. As can be seen in Table 5, however, there were 40 residuals greater than 1.96 in absolute value.
Overall, there were 22 residuals greater than 1.96, and 18 residuals lower than -1.96. A positive residual means that the selected clip "attracted" the corresponding painting; a negative residual means that the selected clip "repelled" the corresponding painting.
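The counting argument above can be sketched in a few lines. The function name is hypothetical, and the grid below is not Table 5: it is an artificial 15 × 15 matrix filled so as to reproduce only the reported counts (22 residuals above 1.96 and 18 below -1.96).

```python
def summarize_residuals(stdres, threshold=1.96, alpha=0.05):
    """Count 'attracting' and 'repelling' cells and the number expected by chance."""
    cells = [r for row in stdres for r in row]
    attracted = sum(1 for r in cells if r > threshold)
    repelled = sum(1 for r in cells if r < -threshold)
    expected_by_chance = alpha * len(cells)
    return attracted, repelled, expected_by_chance

# Artificial 15 x 15 grid of residuals (placeholder values, not the real Table 5).
grid = [[0.0] * 15 for _ in range(15)]
for k in range(22):
    grid[k // 15][k % 15] = 2.5
for k in range(22, 40):
    grid[k // 15][k % 15] = -2.5

attracted, repelled, expected = summarize_residuals(grid)
print(attracted, repelled, round(expected, 2))
```

With 225 cells, about 11 large residuals are expected from random variation alone, against the 40 observed.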

A Comparison between the Results of Experiment 1 and Experiment 2
To evaluate if and to what extent the "direct" associations found in Experiment 2 were in agreement with the correlations in terms of semantic differential (Experiment 1), we counted how many times the sign of the "significant" residuals (i.e., residuals greater than 1.96 in absolute value) shown in Table 5 for the 40 clip/painting combinations was the same as the corresponding correlation shown in Table 3. For 21 combinations, both the residuals and the correlations were positive, showing that, when a particular painting was attracted by a given clip, the 22 words had similar ratings. On the other hand, for 12 combinations both the residuals and the correlations were negative, showing that, when a particular painting was repelled by a given clip, the 22 words had opposite ratings. In the remaining seven combinations, the sign of the residual and the sign of the correlation disagreed. If the clip/painting associations shown in Table 5 randomly agreed with the correlations shown in Table 3, a total of 20 combinations would have the same sign and 20 combinations would have different signs. An exact binomial test yielded a significant result (p < 0.001), in contrast with the hypothesis that the associations found were essentially random. Furthermore, the correlation between the values reported in Table 3 and all the standardized residuals shown in Table 5 was significantly different from zero (rho = 0.338; p < 0.001). Therefore, at least in part, the painting/clip association could be explained by similar perceptual characteristics.
TABLE 2 (partial) | Mean rating values for each word-clip pair; columns are clips 1-15.
Word       1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
Adagio     444  312  595  182  333  206  350  221  269  603  409  597  274  163  392
Presto     489  620  392  655  580  682  583  696  664  410  559  349  592  696  549
Agitated   598  661  346  805  645  685  500  634  725  370  500  397  646  864  578
Calm       453  366  600  137  327  209  415  238  254  597  423  570  308  162
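The sign-agreement check can be reproduced with a short exact binomial computation. This is a sketch that assumes a two-sided test of the null hypothesis that agreement and disagreement are equally likely (p = 0.5), applied to the 21 + 12 = 33 agreeing combinations out of 40; the function name is hypothetical, and the original analysis was run in R.

```python
from math import comb

def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value for H0: p = 0.5.

    The null distribution is symmetric, so the two-sided p-value is
    twice the upper-tail probability of the more extreme count.
    """
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)

agreeing = 21 + 12  # same sign for residual and correlation
p_value = binomial_two_sided_p(agreeing, 40)
print(p_value)  # roughly 4e-05, i.e. well below 0.001
```

Under these assumptions the result is consistent with the reported p < 0.001.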
Quite similar results were found when the analyses described above were performed considering all three paintings selected for a given clip. The final correlation coefficient was 0.365 (0.338 was found when only the first painting chosen was considered).
The subjects who participated in the experiment were classified into three groups: music experts, painting experts, and non-experts. When all the analyses were repeated selecting only the subjects of the same group, similar results were found. When music experts were selected, the correlations were 0.293 (considering only the first painting chosen) and 0.341 (all three paintings chosen). When painting experts were selected, the correlations were, respectively, 0.210 and 0.319. Painting/clip association in terms of similar perceptual characteristics was confirmed also within the three groups.

Discussion
The study tested whether the general population exhibits crossmodal associations between complex stimuli of two different modalities, and specifically between a series of paintings and a series of clips of classical (guitar) music. The test was conducted with subjects who were both expert and non-expert in visual and musical arts.
The study tested the association in two experiments. One was conducted using the semantic differential on a unidimensional rating scale of adjectives; the other was based on subjective judgments on the association between visual and auditory stimuli. The hypothesis was that the association, if found, would be linked to the presence of characteristics of the paintings and the music clips, perceived as such by the subjects, and evidenced also when evaluated by means of the semantic differential. Due to the experimental nature of the study, the link between the two experiments cannot be consistently found for each clip/image couple; in some cases such a link may not be consistent. Overall, the results show the existence of an association between paintings and music clips among experts in music, experts in painting, and subjects with no artistic training, within each group and overall.
These results were consistent when considering both the first painting chosen and all three paintings selected for a given clip. There were five clip/image couples for which a very strong attraction was found: clip no. 3 (Francisco Tárrega, Recuerdos de la Alhambra) was strongly associated with image no. 5 ("Full moon"); clip no. 4 (Isaac Albéniz, Leyenda) was strongly associated with image no. 14 ("Leopard"); clip no. 14 (Villa Lobos, Study n. 6) was strongly associated with image no. 14 ("Leopard"); clip no. 1 (Villa Lobos, Prelude n. 4) was strongly associated with image no. 10 ("Sea II"); clip no. 7 (Fernando Sor, Variations on a theme by Mozart-Theme) was strongly associated with image no. 15 ("Matilada and Beatrice").
As an example, Figure 3 shows the scatterplot of the ratings given to the 22 adjectives for clip 3 (on the vertical axis) and image 5 (on the horizontal axis).
Most of the adjectives show a linear pattern, with low values for "happy," "agitated," "quick," "presto," and "strong" and high values for "sad," "horizontal," "slow," "calm," and "continuous." As a second example, Figure 4 shows the scatterplot of the ratings given to the 22 adjectives for clip 14 (on the vertical axis) and image 14 (on the horizontal axis).
Also in this case the adjectives show a linear pattern, with low values for "calm," "slow," "adagio," and "weak," and high values for "agitated," "quick," "strong," and "presto." The study considered stimuli of great complexity. Consequently, we chose to have the maximum possible uniformity of style, composition and musical color. The results show that the associations were made on specific characteristics that the subjects perceived as similar between the paintings and the music clips.
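The linear pattern described for the scatterplots can be summarized numerically as a correlation between the adjective profile of a clip and that of a painting. The following is only a minimal sketch of that idea, not the analysis procedure used in the study: the function name `pearson`, the assumed 1 to 7 rating scale, the reduced set of six adjectives, and all rating values are hypothetical and invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical mean ratings on an assumed 1-7 scale.
# These numbers are invented and are NOT the study's data.
adjectives = ["calm", "slow", "weak", "agitated", "quick", "strong"]
clip_ratings = [1.5, 2.0, 2.5, 6.0, 6.5, 5.5]      # an "agitated" clip
similar_image = [2.0, 2.5, 2.0, 5.5, 6.0, 6.5]     # image rated similarly
opposite_image = [6.0, 5.5, 6.0, 2.5, 2.0, 1.5]    # image rated as "calm"

r_similar = pearson(clip_ratings, similar_image)
r_opposite = pearson(clip_ratings, opposite_image)

print(f"clip vs. similar image:  r = {r_similar:.3f}")
print(f"clip vs. opposite image: r = {r_opposite:.3f}")
```

In this sketch, a clip and an image whose adjective ratings rise and fall together yield a coefficient close to +1, while an image with the reversed profile yields a coefficient close to -1, which corresponds to the positive and negative associations discussed in the text.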
In particular, the associations between paintings and music clips proved to be consistent (even within the triads of images selected in the associations). Specifically, a strong positive association was found between clip no. 4 (Isaac Albéniz, Asturias) and image no. 14 ("Leopard"), and between clip no. 14 (Villa Lobos, Study n. 6) and image no. 14 ("Leopard"). The attributes that seemed to play the most significant role in the associations obtained were "calm," "agitated," "slow," "quick," "strong," "presto," and "adagio." Consequently, relevant features in the association with paintings seem to be the timbre and the musical tempo, as shown by the positive associations between clip no. 3 (Francisco Tárrega, Recuerdos de la Alhambra) and painting no. 5 ("Full Moon"), and between clip no. 14 (Villa Lobos, Study n. 6) and painting no. 14 ("Leopard"); and by the negative association between clip no. 3 and painting no. 14.
Contrary to what was expected (Bresin, 2005), the results instead show that the musical mode usually related to feelings of happiness (major mode), or to feelings of sadness (minor mode), and the spatial orientation (vertical and horizontal) as expressed by the attributes tested with the semantic differential, did not play a significant role in the association. Finally, no substantial difference was apparent between expert and non-expert subjects. Because all the images had highly saturated colors, we did not test for a potential association between these color dimensions and major and minor modes, or slow and fast music (see Bresin, 2005; Palmer et al., 2013). The purpose of our study was not to analyse the production of visual representations of a sequence of sounds in simple drawings, as in Küssner (2013b; see also Küssner, 2013a), but the association of highly complex paintings and pieces of classical music. The two studies are therefore only partially comparable, because our goal was not the visualization of music. What we asked the subjects to do was associate highly complex Gestalten in the visual and acoustic fields (not single parameters such as pitch and loudness) while listening to the clips. In other words, the task was much more complex and closer to the natural global perception of stimuli in the environment (in this case, of an artistic kind). Also different from Küssner (2013b) was the expertise of the participants: we tested experts in music, non-experts in music, and art experts, but we did not test experts in dance, because our goal was not to test the motor action aspects of the associations (see also Maes et al., 2014).
As to the study conducted by Cowles (1935), there were differences in the number and the kind of stimuli, in the aims and in the methodology: in Cowles' test the pictures, as mentioned, were mainly landscapes or scenes with simple content, while there was greater uniformity in our stimuli as to the paintings (which were by the same artist, and in the same style, materic and expressionist) and the music (all our clips were taken from Spanish classical guitar music). The contents of the paintings, instead, were different. In our experiment, besides the cross-modal association between auditory and visual stimuli, we also made use of the semantic differential method. But similarly to Cowles, our results showed no difference between experts and non-experts. The methods used in the experiment (i.e., semantic differential and subjective judgements) corroborated the interpretation of the results as associations due to patterns of qualitative similarity present in stimuli of different sensory modalities and experienced as such by the subjects (Albertazzi et al., 2014). Also in this respect the methodology that we used differed from the standard ones: we did not rely on psychophysical methods, reaction times (as in Marks, 2004; Spence, 2011), or forced choice responses (Walker, 1987), and we did not make use of computational technologies. Our aim was to remain as close as possible to the natural perception of auditory and visual items. As noted above, the tested adjectives very frequently exhibited a linear pattern in the association between the paintings and the music clips: for example, having low values for "happy," "agitated," "quick," "presto," and "strong" and high values for "sad," "horizontal," "slow," "calm," and "continuous."
On the basis of these findings, and the fact that we did not find any difference between expert and non-expert subjects, the tested semantic connotations of the stimuli might be considered as affordances playing the role of general semantic information clues, which makes sense in the framework of an ecology of meaning. It has been recently shown, for example, that subjects in the general population group natural shapes on the basis of certain visual qualitative characteristics: specifically, non-spiculed, non-holed, and flat shapes are experienced and classified as harmonic and static, while rounded shapes are classified as harmonic and dynamic, and elongated shapes as somewhat disharmonious and somewhat static (Albertazzi et al., 2014). Because of the complex nature of the stimuli, and on the basis of our results, one can conclude that there are aesthetic, sometimes ideaesthetic dimensions in perceptual awareness. These dimensions act as Gestalten or templates that allow an immediate understanding of the complex objects we usually encounter in the environment. Furthermore, these Gestalten exhibit common patterns in the different modalities, as we have found in our study.
Finally, in our study we did not specifically test the emotional response, as in Cowles (1935), Di Dio and Gallese (2009), Juslin and Sloboda (2001), Krumhansl and Lerdahl (2011), Langlois et al. (2014), Madison (2011), and Zaidel (2010), because it was not our primary interest. However, some of the adjectives tested with the semantic differential test, such as "calm" and "agitated," "happy" and "sad," proved to have an important role.
In light of the overall results, one cannot exclude the presence of potential top-down influences (however unconscious), although our study did not aim to investigate these aspects. In this regard, what we did in our experiments was to invite the subjects to be as careful as possible to avoid the influence of past experience.
On the basis of our results, it is likely that choosing a smaller set of adjectives restricted to a few characteristics, and limiting the range of associations and the length of the experiment, might yield further consistent information about the cross-modal associations obtained. Presenting adjectives in pairs, like calm/agitated or weak/strong, might also contribute to shortening the duration of the test. However, such a choice would have led to an overestimate of the correlation, which in our study is also possibly overestimated because the adjectives were not entirely independent. It is also likely that choosing a more uniform theme for the paintings (only landscapes, for example) would make the test shorter. A further development of the design might consist in testing the associations between the paintings and a series of music clips from a different musical repertoire, reducing the uniformity of patterns. Nevertheless, it seems worthwhile to continue testing cross-modal associations in complex stimuli, because these are what is usually experienced in perceiving. Finally, it would be advisable to repeat the experiment with subjects from other cultures, such as oriental ones, in order to test for the presence of possible pictorial and musical biases in the associations found.
In conclusion, our study shows (i) the existence of crossmodal associations between complex visual and auditory stimuli, (ii) the existence of associations between visual and auditory stimuli when evaluated employing the semantic differential, and (iii) that these associations were at least partially consistent with each other. These findings corroborate the interpretation that the associations are partially due to patterns of qualitative similarity present in stimuli of different sensory modalities.