Institute of Neuroradiology, University of Zurich, Zurich, Switzerland
To the naïve observer, cubist paintings contain geometrical forms in which familiar objects are hardly recognizable, even in the presence of a meaningful title. We used fMRI to test whether a short training session about Cubism would facilitate object recognition in paintings by Picasso, Braque and Gris. Subjects, who had no formal art education, were presented with titled or untitled cubist paintings and scrambled images, and performed object recognition tasks. Relative to the control group, trained subjects recognized more objects in the paintings, their response latencies were significantly shorter, and they showed enhanced activation in the parahippocampal cortex, with a parametric increase in the amplitude of the fMRI signal as a function of the number of recognized objects. Moreover, trained subjects were slower to report not recognizing any familiar objects in the paintings and these longer response latencies were correlated with activation in a fronto-parietal network. These findings suggest that trained subjects adopted a visual search strategy and used contextual associations to perform the tasks. Our study supports the proactive brain framework, according to which the brain uses associations to generate predictions.
Object recognition is a highly developed visual skill in primates. Behavioral and electrophysiological studies in humans and monkeys have suggested that object recognition is a rapid process that can be achieved within a few hundred milliseconds (Rousselet et al., 2002). Moreover, it has been shown that identification of objects within natural scenes is facilitated when the context is meaningful (Biederman, 1972; Bar, 2004). The process of parsing the world into meaningful objects is mediated by activation in ventral occipitotemporal cortex, the so-called “what” pathway (Ungerleider and Mishkin, 1982; Goodale and Milner, 1992; Haxby et al., 1994). Recent functional brain imaging studies in humans have shown that objects elicit neural responses in a distributed cortical network that encompasses a wide expanse of extrastriate cortex (Ishai et al., 1999, 2000a; Haxby et al., 2001), where various object categories such as faces, animals, houses, tools, and body parts elicit distinct patterns of activation (Kanwisher et al., 1997; Aguirre et al., 1998; Epstein and Kanwisher, 1998; Ishai et al., 2000a; Downing et al., 2001; Yago and Ishai, 2006). Furthermore, ambiguous figures (Kleinschmidt et al., 1998), illusory contours (Stanley and Rubin, 2003), binocular rivalry (Tong et al., 1998), and visual imagery (Ishai et al., 2000b, 2002; Mechelli et al., 2004) evoke activation in these object-responsive regions, suggesting that the visual system imposes top-down interpretations on ambiguous bottom-up retinal input.
Art compositions comprise a special class of visual stimuli with which one can investigate the mechanisms of various cognitive processes (e.g., Chatterjee, 2004). Specifically, abstract and indeterminate paintings, which resist identification, can be used to investigate the neural correlates of object recognition (Ishai et al., 2007). Indeterminate artworks (Pepperell, 2006) present the viewer with an apparently meaningful yet persistently meaningless scene, or a “potential image” (Gamboni, 2002), namely a complex multiplicity of possible images, none of which ever finally resolves. Meanwhile, traditional abstract compositions, which do not suggest natural objects, use purely visual forms of line, color and shape to evoke emotional and aesthetic responses, and tend not to produce a representational dilemma in the viewer.
Recently we have shown that, compared with representational paintings that explicitly depict objects, subjects are slower to recognize familiar objects in indeterminate and abstract art works. Moreover, representational paintings are more likely to be remembered than indeterminate compositions in a delayed memory task, suggesting that meaningful content is critical for incidental memory (Ishai et al., 2007). Using fMRI, we have shown that representational paintings, which depict scenes cluttered with familiar objects, evoke stronger activation than indeterminate and abstract paintings in higher-tier visual areas and in the temporoparietal junction, whereas scrambled paintings evoke imagery-related activation in the precuneus and prefrontal cortex. Our findings suggest that recognition of familiar content in art works is mediated by object recognition, memory recall and mental imagery, cognitive processes that evoke widespread activation (Fairhall and Ishai, 2008).
It has been previously suggested that relevant contextual knowledge is a prerequisite for comprehending prose passages (e.g., Bransford and Johnson, 1972). It is currently unknown, however, to what extent prior knowledge about indeterminate works of art affects their perception. In cubist artworks, objects are broken up, analyzed, and re-assembled to produce abstracted forms, which often depict the same objects from different viewing points. To the naïve observer, these paintings appear to contain geometrical forms in which familiar objects are hardly recognizable, even in the presence of a meaningful title. Cubist paintings are therefore unique “stimuli” with which one can study the effect of top-down knowledge on bottom-up processing. The aim of the current study was to test the extent to which a short training session about Cubism would facilitate object recognition in paintings by Picasso, Braque and Gris. We assumed that providing naïve subjects with some information about Cubism would aid task performance and hypothesized that subjects who received training would recognize familiar objects faster than control subjects, and would exhibit stronger activation in object-responsive and attention-related regions.
Subjects
Twenty-four healthy, right-handed subjects (13 males, 11 females, mean age 24 years) with normal vision participated in the study. All subjects gave informed written consent for the procedure in accordance with protocols approved by the University Hospital of Zurich. The subjects, students from the University of Zurich, had no formal art education and reported visiting art museums once a year or less. Post-scan questionnaires revealed that all subjects were unfamiliar with the paintings and had not seen them prior to the experiment.
Stimuli and Tasks
Stimuli were displayed using Presentation (www.neurobs.com, version 12.2) and were projected with a magnetically shielded LCD video projector onto a translucent screen placed at the feet of the subject. Stimuli consisted of 42 color and 42 monochrome Cubist paintings by Picasso, Braque and Gris. In half the trials, scrambled images, which were created by phase scrambling luminance and color information from these paintings, were used for visual baseline. We used an event-related design: in each trial, a meaningful title (e.g., “Vase with flowers”) or the word “Untitled” was presented for 1.5 s, followed by a painting or a scrambled image, which was presented for 3.5 s. While the picture was on the screen, subjects had to answer the question “Do you recognize any familiar objects?” by pressing one of two buttons (Yes/No). A screen then appeared for 3 s with the question “How many objects did you recognize?” and subjects had to press one of four buttons to indicate “0”, “1”, “2” or “3 or more” objects. A blank screen (inter-stimulus-interval) was then presented for 8 s; thus, the duration of each trial was 16 s. Trial types (painting/scrambled; title/untitled) were randomized, and for each subject 7 time series of 16 trials each were collected. Thirty minutes before scanning, half the subjects (six males, six females) received a short training session, during which they were presented with information about Cubism, viewed examples of Cubist paintings, and practiced recognizing familiar objects in these paintings.
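The trial timing described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' stimulus script: the phase names and helper functions are hypothetical, the TR of 2 s is the value reported under Data Acquisition, and the volumes-per-run figure assumes that a run contains nothing but the 16 trials (no dummy volumes or padding, which the text does not specify).

```python
# Trial timeline as described in the text (all durations in seconds).
# Phase names are illustrative labels, not identifiers from the study.
TRIAL_PHASES = [
    ("title", 1.5),           # meaningful title or the word "Untitled"
    ("picture", 3.5),         # painting or scrambled image, Yes/No response
    ("count_question", 3.0),  # "How many objects did you recognize?"
    ("blank", 8.0),           # inter-stimulus interval
]
TRIALS_PER_RUN = 16
RUNS_PER_SUBJECT = 7
TR = 2.0  # repetition time in seconds, as reported under Data Acquisition


def trial_duration():
    """Total duration of one trial in seconds."""
    return sum(duration for _, duration in TRIAL_PHASES)


def phase_onsets(trial_index):
    """Onset of each phase (seconds from run start) for a 0-based trial index."""
    onsets = {}
    t = trial_index * trial_duration()
    for name, duration in TRIAL_PHASES:
        onsets[name] = t
        t += duration
    return onsets


run_duration = TRIALS_PER_RUN * trial_duration()
# Assumes the run consists only of the trials themselves.
volumes_per_run = int(run_duration / TR)
total_trials = TRIALS_PER_RUN * RUNS_PER_SUBJECT

print(trial_duration(), run_duration, volumes_per_run, total_trials)
print(phase_onsets(1))
```

Under these assumptions each trial spans exactly eight volumes, so every picture onset falls at the same offset relative to the volume grid.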
Data Acquisition
Data were collected using a 3T Philips Intera whole body MR scanner (Philips Medical Systems, Best, The Netherlands). Changes in blood-oxygenation level-dependent MRI signal were measured using a sensitivity-encoded gradient-echo echo-planar sequence (SENSE, Pruessmann et al., 1999) with 33 axial slices, TR = 2 s, TE = 35 ms, flip angle = 80°, field of view = 220 mm, acquisition matrix = 128 × 128, reconstructed voxel size = 1.72 × 1.72 × 4 mm, and SENSE acceleration factor R = 2.
High-resolution spoiled gradient recalled echo structural images were collected in the same session for all the subjects (160 sagittal slices, TR = 8.21 ms, TE = 3.8 ms, field of view = 240 mm, acquisition matrix = 256 × 256, reconstructed voxel size = 1 × 0.9 × 0.9 mm). These high-resolution structural images provided detailed anatomical information for the region-of-interest (ROI) analysis and for 3D normalization to the Talairach and Tournoux atlas (1998).
Data Analysis
For each subject, responses and reaction times were computed for stimulus type (painting/scrambled), title (meaningful title/untitled), object recognition (Yes/No) and number of objects (0, 1, 2, 3 or more) tasks. ANOVA was used to compare the various conditions.
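Comparisons between trained and control subjects are reported in the Results as t statistics with 22 degrees of freedom, which is what a pooled-variance two-sample t-test on two groups of 12 would give. The text does not state which test variant was used, so the sketch below is only one plausible reading; the function name and the example values are hypothetical and are not the study data.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, n - 1).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical per-subject proportions of Yes responses (made-up numbers).
trained = [0.82, 0.75, 0.90, 0.78, 0.85, 0.70, 0.88, 0.80, 0.76, 0.84, 0.79, 0.73]
control = [0.60, 0.72, 0.55, 0.66, 0.48, 0.70, 0.58, 0.64, 0.52, 0.68, 0.61, 0.57]

t_value, dof = pooled_t(trained, control)
print(f"t({dof}) = {t_value:.2f}")
```

With 12 subjects per group the degrees of freedom come out as 12 + 12 - 2 = 22, matching the between-group values reported in the Results.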
Functional MRI data were analyzed in BrainVoyager QX Version 1.10 (Brain Innovation, Maastricht, The Netherlands). All volumes were realigned to the first volume, corrected for motion artefacts and spatially smoothed using a 5-mm full-width-at-half-maximum Gaussian filter. Stimulus events were modeled using a delta function, which was convolved with a canonical hemodynamic response function to yield a regressor for each condition. The main effects of interest (paintings vs. scrambled images; Yes vs. No objects; number of recognized objects; and titled vs. untitled paintings) were analyzed using the General Linear Model (Friston et al., 1995). Based on the main effect (paintings vs. scrambled images, p < 0.001, uncorrected) a set of ROIs was defined, which included the dorsal occipital cortex (DOC), fusiform gyrus (FG), parahippocampal cortex (PHC), intraparietal sulcus (IPS), inferior frontal gyrus (IFG), putamen and the anterior cingulate cortex (ACC). Note that the specification of ROIs was orthogonal to the subsequent tests that were addressed at the second level analysis. For each subject and in each ROI, the mean parameter estimates were calculated separately for each experimental condition (title, training, objects and number of objects) and were used for between-subjects random-effects analyses.
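The regressor construction described above (a delta function at each stimulus event, convolved with a canonical hemodynamic response function) can be sketched as follows. This is an illustration only, not the BrainVoyager implementation: the two-gamma shape parameters (peak 6, undershoot 16, ratio 1/6) are commonly used defaults assumed here rather than values from the text, the function names are hypothetical, and onsets are rounded to the nearest volume for simplicity.

```python
import math


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Two-gamma HRF at time t (seconds). Shape parameters are assumptions."""
    if t <= 0:
        return 0.0
    response = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    dip = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return response - ratio * dip


def make_regressor(onsets, n_volumes, tr=2.0, hrf_length=32.0):
    """Convolve a stick (delta) function at the given onsets with the HRF."""
    sticks = [0.0] * n_volumes
    for onset in onsets:
        index = int(round(onset / tr))  # simplification: snap to nearest volume
        if 0 <= index < n_volumes:
            sticks[index] += 1.0
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_length / tr))]
    regressor = [0.0] * n_volumes
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_volumes:
                regressor[i + k] += stick * h
    return regressor


# One event at the start of a short run sampled every 2 s.
single_event = make_regressor([0.0], n_volumes=20)
print([round(v, 4) for v in single_event])
```

One such regressor per condition would form the columns of the design matrix that the General Linear Model is fitted to.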
Finally, we tested whether reaction times were correlated with brain activation by including the response latencies as a covariate in the GLM analysis. The reaction times of each subject and each trial were normalized by z-transformation, and the standard hemodynamic response function (HRF) was then multiplied with the new z-values for each trial, thus creating a latency-correlated design-matrix.
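The latency-correlated regressor can be sketched in the same spirit: reaction times are z-transformed within a subject, and each trial's copy of the response kernel is scaled by its z-value. The sketch below rests on stated assumptions: the population standard deviation is used for the z-transform (the text does not say which estimator was used), the function names are hypothetical, and the short kernel is a toy placeholder standing in for the standard HRF.

```python
from statistics import mean, pstdev


def zscore(values):
    """z-transform a list of values (population SD; an assumption)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("z-transform is undefined for constant values")
    return [(v - m) / s for v in values]


def latency_regressor(onset_volumes, reaction_times, kernel, n_volumes):
    """Sum of kernel copies, one per trial, each scaled by the trial's z-scored RT."""
    weights = zscore(reaction_times)
    regressor = [0.0] * n_volumes
    for onset, weight in zip(onset_volumes, weights):
        for k, h in enumerate(kernel):
            if 0 <= onset + k < n_volumes:
                regressor[onset + k] += weight * h
    return regressor


# Toy kernel standing in for the HRF; reaction times in ms are made up.
toy_kernel = [0.0, 0.5, 1.0, 0.5]
rts = [700, 800, 900]
reg = latency_regressor([0, 8, 16], rts, toy_kernel, n_volumes=24)
print([round(v, 3) for v in reg])
```

Trials slower than the subject's mean contribute a positive response and faster trials a negative one, so a voxel that loads on this regressor is one whose signal scales with response latency.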
Behavioral Data
The behavioral data collected while subjects performed the tasks in the scanner are shown in Figure 1. During the first task (“did you recognize any familiar objects?”) trained subjects had a significantly higher proportion of Yes responses than control subjects [t(22) = 2.35, p < 0.05]. In terms of response latencies, it took control subjects the same time to respond “Yes, I recognized familiar objects” and “No, I did not recognize familiar objects”. In contrast, trained subjects took significantly longer to report “No, I did not recognize familiar objects”, both relative to their own Yes responses [t(22) = 3.85, p < 0.001], and to the Yes responses made by the control subjects [t(22) = 3.25, p < 0.01].
Figure 1. Behavioral data. (A) Mean responses and reaction times recorded during the first task (“did you recognize any familiar objects?”). (B) Mean responses and reaction times recorded during the second task (“how many familiar objects did you recognize?”). In this and subsequent graphs, error bars indicate standard error of the mean (SEM).
During the second task (“how many objects did you recognize?”), trained subjects showed both significantly lower proportion of “0” responses [t(22) = 2.37, p < 0.05], and significantly higher proportion of “2” responses than control subjects [t(22) = 3.02, p < 0.01]. Interestingly, it took both control and trained subjects longer to report recognizing 2 and 3 familiar objects than 0 objects [control subjects: t(22) = 3.56, p < 0.01, t(21) = 2.77, p < 0.05; trained subjects: t(21) = 2.53, p < 0.05, t(21) = 2.15, p < 0.05].
We then compared the response to titled and untitled paintings. During the object recognition task, trained subjects had a significantly higher proportion of Yes responses (0.8 ± 0.04, mean ± SE) than control subjects (0.6 ± 0.06) for paintings that were preceded by meaningful titles [t(22) = 2.77, p < 0.05]. During the number of objects task, trained subjects reported not recognizing any objects (“0”) significantly less than control subjects [t(22) = 2.69, p < 0.05] for paintings that were preceded by meaningful titles, and a 2-way ANOVA revealed a significant interaction between the two groups and the reported number of objects [F(3,95) = 6.06, p < 0.001]. Moreover, trained subjects reported recognizing two objects in titled paintings significantly more than control subjects [t(22) = 3.64, p < 0.01].
Finally, we tested whether there were any differences between control and trained subjects in terms of their responses to the scrambled images. During the object recognition task, control subjects did not recognize familiar objects in 91% ± 4% of the scrambled paintings, whereas trained subjects did not recognize familiar objects in 96% ± 2%. Moreover, response latencies were virtually identical (793 ± 42 and 780 ± 47 ms for control and trained subjects, respectively). In terms of the number of recognized objects, control subjects recognized one object in 7% ± 3% of the scrambled paintings and their mean reaction time was 1089 ± 113 ms, whereas trained subjects recognized one object in 4% ± 2% and their mean response latency was 1101 ± 98 ms. The differences were not statistically significant.
Imaging Data
The main effect, namely responses evoked by all paintings as compared with the scrambled paintings baseline, revealed activation within a distributed cortical network that included multiple, bilateral regions (Figure 2). Significant activation was found in DOC, FG, IPS, PHC, IFG, and ACC (see Table 1 for mean Talairach coordinates and cluster size). Comparing color with monochrome paintings revealed activation in extrastriate cortex (mean Talairach coordinates: 30, −70, −13; −26, −70, −13), consistent with previous findings of activation in human V4 (e.g., McKeefry and Zeki, 1997).
Figure 2. Activation evoked by Cubist paintings as compared with scrambled images. Group statistical maps, illustrating significant activation in DOC, FG, IPS, putamen and ACC are shown for control (A) and trained (B) subjects.
Table 1. Regions activated during presentation of cubist paintings. N indicates number of subjects who showed activation in a region. Coordinates are in the normalized space of the Talairach and Tournoux brain atlas. Numbers in parentheses indicate standard error of the mean (SEM).
We then conducted an ROI analysis to test for differences between Yes and No responses during the object recognition task, number of recognized objects, and titled as compared with untitled paintings. We found that within the IPS, recognizing familiar objects evoked stronger activation than not recognizing any objects. In both hemispheres, the difference between Yes and No responses was statistically significant in both control [mean parameter estimates ± SE were 1.61 ± 0.09 and 1.26 ± 0.12, respectively, t(22) = 4.92, p < 0.0001] and trained subjects [1.79 ± 0.06 and 1.42 ± 0.07, respectively, t(34) = 7.52, p < 0.0001].
Within the FG and PHC, Yes responses for untitled paintings evoked stronger responses in trained (1.91 ± 0.06 and 1.68 ± 0.07, respectively) than control subjects (1.70 ± 0.04 and 1.45 ± 0.05, respectively) and the differences between the groups were statistically significant [t(40) = 2.98, p < 0.01 for FG; t(39) = 2.47, p < 0.05 for PHC].
We also found an effect of title on the number of recognized objects. Thus, within the PHC, trained subjects showed higher activation than control subjects for “3 or more objects” responses [1.95 ± 0.06 and 1.71 ± 0.08, respectively, t(39) = 2.28, p < 0.05]. Furthermore, within the FG, trained subjects showed higher activation than control subjects for recognizing 3 or more objects [2.11 ± 0.05 and 1.85 ± 0.07, respectively, t(40) = 2.83, p < 0.01].
Significant differences between trained and control subjects were found in the parahippocampal cortex, in terms of the evoked response associated with the number of recognized objects (Figure 3). Trained subjects showed an increase in the amplitude of the fMRI signal as a function of the number of objects they recognized. Thus, recognizing 3 or more objects evoked higher activation than not recognizing any objects [t(40) = 6.03, p < 0.0001]. The difference between trained and control subjects in terms of activation in the FG and PHC was statistically significant for 3 or more objects [t(40) = 2.83, p < 0.01 and t(39) = 2.28, p < 0.05, respectively]. Finally, the interaction within the FG between group and number of objects was significant [F(3,335) = 4.16, p < 0.01].
Figure 3. Activation in parahippocampal cortex. (A) Group statistical maps, illustrating significant activation in PHC for control (left) and trained (right) subjects. (B) Mean BOLD responses evoked when subjects reported recognizing 0, 1, 2, or 3 or more familiar objects in the paintings. Data were averaged across 12 subjects in each group.
Finally, we tested whether the reaction times were correlated with brain activation. Interestingly, we found that in trained, but not in control subjects, the longer response latencies associated with “No, I did not recognize any familiar objects” were correlated with activation in a network of brain regions (Figure 4), which included the medial temporal gyrus (mean Talairach coordinates: 41, −62, 21); IPS (37, −41, 46); medial frontal gyrus (37, 0, 49); inferior frontal gyrus (41, 29, 30) and insula (38, 2, 2).
Figure 4. Correlations between response latencies and brain activation. In trained subjects, the slower reaction times recorded during their “No, I did not recognize any familiar objects in the painting” responses, were correlated with activation in a network of regions that included the medial temporal gyrus (MTG); intraparietal sulcus (IPS); medial frontal gyrus (MFG); inferior frontal gyrus (IFG) and insula (INS).
In this study we tested whether a short training session would facilitate object recognition in cubist paintings. We found that training resulted in significant behavioral and neural changes. Trained subjects were faster and recognized significantly more familiar objects in the paintings, and exhibited enhanced activation in the parahippocampal cortex. Furthermore, trained subjects were significantly slower to report not recognizing any familiar objects in the paintings and these longer response latencies were correlated with activation in a fronto-parietal network that mediates spatial attention (e.g., Kastner and Ungerleider, 2000).
In contrast to perceptual learning, which requires repeated sessions, or the long-term acquisition of expertise (Poldrack, 2002; Bukach et al., 2006), our subjects underwent a short training session, 30 min before their brains were scanned. During this training session, subjects were presented with examples of cubist paintings and learned how to recognize familiar objects depicted in these paintings. The behavioral and neural changes observed in our trained subjects are therefore likely due to a strategy they adopted during training. Based on their responses during the object recognition and number of objects tasks, and given the observed patterns of brain activation, it is reasonable to assume that trained subjects used contextual associations and a visual search strategy in order to perform the tasks.
The extent to which titles do or should influence the perception of meaning and the aesthetic impression of art compositions is contentious. In art theoretical terms, critics of a formalist persuasion claim that titles are merely “identification tags” that should not affect the viewer’s reading of the work. Others, however, claim titles function as guides to interpretation and provide important contextual cues to engage the attention of the viewer (Fisher, 1984). Empirical evidence suggests that titles influence both the understanding and the appreciation of art paintings (e.g., Leder et al., 2006). In a compelling example of the top-down effects of titles on art perception, viewers’ description of the content of paintings varied according to the title (e.g., “Agony” vs. “Carnival”) they were presented with (Franklin et al., 1993). In our experiment, cubist paintings were preceded by their meaningful title or by the word “Untitled”. We found that meaningful titles facilitated object recognition, but only in trained subjects. Thus, relative to control subjects, trained subjects reported recognizing more familiar objects in paintings with meaningful titles. Moreover, recognition of two objects in titled paintings resulted in enhanced activation in the IPS. These findings suggest that meaningful titles can provide the top-down solution for ambiguous visual input, but only when prior knowledge or experience exists. Our findings are consistent with previous findings which showed that presenting a topic before a prose passage facilitates its subsequent comprehension and recall (Bransford and Johnson, 1972), indicating that relevant contextual information is required for understanding. Recent studies, in which eye-movement recordings were compared, have shown that artists view pictures differently from laymen: artists spent more time scanning structural and abstract features, whereas artistically untrained subjects viewed human features and objects (Vogt and Magnussen, 2007). Taken together, these observations suggest that recognition of familiar content in art works is a skill acquired through training.
The most surprising and intriguing finding in our study is the enhanced activation in the parahippocampal cortex of trained subjects. The PHC, a region implicated in the representation and processing of spatial navigation information (Epstein and Kanwisher, 1998), episodic memory (e.g., Gabrieli et al., 1997) and remote spatial memories (Spiers and Maguire, 2007), is a major node in the cortical network for contextual associations (Bar et al., 2008). Associations are formed over time, when repeated patterns and statistical regularities are extracted from the environment and stored in memory. It has been recently suggested that the role of associations is to generate predictions about the immediate future in order to guide behavior (Bar, 2007). It is highly likely that due to the short training session, our subjects used contextual associations in order to perform the tasks. For example, a meaningful title such as “Woman Reading” likely activated an existing “script” of a living room, a familiar scene which was previously encountered and stored in memory (see Bar, 2009). Subjects were therefore able to anticipate a woman sitting, a chair or a sofa, hands holding a book, etc. Thus, prior experience and stored representations facilitated the comprehension of visual scenes represented in indeterminate cubist paintings.
On a more speculative note, our findings could also provide empirical evidence for Bayesian analysis, which was proposed as a model for object perception (Kersten et al., 2004) and evoked cortical responses (Friston, 2003, 2005). According to the Bayesian perspective, the short training session enabled our subjects to successfully match the indeterminate visual input with their top-down predictions. It is reasonable to assume that trained subjects were more likely than control subjects to suppress errors and establish a consensus between the actual bottom-up input and the top-down prediction. Thus, minimizing prediction error resulted in faster recognition of more familiar objects in cubist paintings.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank Robert Pepperell for providing us with the stimuli and for reading the manuscript and Karl Friston for his helpful comments. This study was supported by the Swiss National Science Foundation grant 3200B0-105278 and by the Swiss National Center for Competence in Research: Neural Plasticity and Repair.