Old Proverbs in New Skins – An fMRI Study on Defamiliarization

We investigated how processing fluency and defamiliarization (the art of rendering familiar notions unfamiliar) contribute to the affective and esthetic processing of reading in an event-related functional magnetic-resonance-imaging experiment. We compared the neural correlates of processing (a) familiar German proverbs, (b) unfamiliar proverbs, (c) defamiliarized variations with altered content relative to the original proverb (proverb-variants), (d) defamiliarized versions with unexpected wording but the same content as the original proverb (proverb-substitutions), and (e) non-rhetorical sentences. Here, we demonstrate that defamiliarization is an effective way of guiding attention, but that the degree of affective involvement depends on the type of defamiliarization: enhanced activation in affect-related regions (orbito-frontal cortex, medPFC) was found only if defamiliarization altered the content of the original proverb. Defamiliarization on the level of wording was associated with attention processes and error monitoring. Although proverb-variants evoked activation in affect-related regions, familiar proverbs received the highest beauty ratings.


INTRODUCTION
The emerging field of neuroesthetics is marked by a lack of neuroimaging studies regarding the esthetic perception of literature (Schrott and Jacobs, 2011). In this paper, we will use the term "literature" in a broad sense including any text that has the potential to elicit an esthetic response either triggered by certain features of the text itself, such as rhetorical figures, or by external framing, for instance by the label "novel" on the front cover (cf. Bleich, 1978; Schrott and Jacobs, 2011). Likewise, we use the term "esthetics" in the broad sense of neuroesthetics "to encompass the perception, production, and response to art, as well as interactions with objects and scenes that evoke an intense feeling, often of pleasure" (Chatterjee, 2011, p. 53). This paper aims to investigate how feelings of pleasure, so familiar to everyone who enjoys a good book or poem, can arise from a task such as reading, which is primarily based on a number of cognitive skills. The key concept of defamiliarization refers to the use of artistic techniques in order to turn something familiar into something that appears unfamiliar or strange. Specifically, we investigated to what extent defamiliarization (Miall and Kuiken, 1994) contributes to the affective and esthetic perception of literature by making it harder to process. In the following, we will introduce three theoretical frameworks of how cognitive load potentially relates to affective responses and preference judgments.

HEDONIC FLUENCY HYPOTHESIS
A number of theories predict a higher preference for familiar and conventional stimuli over novel and unfamiliar ones: describing the mere-exposure effect, Zajonc (1968) claimed that mere repeated exposure to a stimulus is sufficient for people to develop a positive attitude toward it. In a seminal study, Zajonc (1968) demonstrated a positive correlation between word frequency and affective connotation. Such a positive correlation between exposure frequency and preference has been replicated several times for a number of different stimulus categories and can be regarded as a robust effect, although it is influenced by certain variables such as the number of repetitions, initial familiarity etc. (Bornstein, 1989). The preference-for-prototypes effect (Martindale and Moore, 1988; Martindale et al., 1990; Winkielman et al., 2006) shows that more typical members of a category are frequently preferred over less typical members. Both effects are probably driven by the force of perceptual fluency, the ease with which a sensory input can be processed. The hedonic fluency hypothesis states that simply because a stimulus with a high familiarity/typicality/expectedness/exposure is processed faster than a novel or unknown stimulus, it is accompanied by a positive affective evaluation (Reber et al., 1998, 2004; Winkielman and Cacioppo, 2001). In literature, processing fluency could, for instance, be modulated through the use of stylistic devices that regulate the cognitive processing demand (e.g., formulaic expressions, repetition figures). Fluency and familiarity evaluation are automatic processes that can influence the esthetic judgment prior to conscious processing (Kunst-Wilson and Zajonc, 1980; Leder et al., 2004; Kuchinke et al., 2009). Applying the hedonic fluency hypothesis to the process of reading, the prediction follows that easy-to-read text is preferred over more difficult-to-read text.
Indeed, subtle semantic coherence of words can increase the experience of hedonic fluency (Topolinski et al., 2009), and comparable effects have been shown for rhetorical devices, especially repetition figures such as rhyme (McGlone and Tofighbakhsh, 2000), alliteration (Lea et al., 2008), and parallelistic syntactic structures (Sturt et al., 2010). The main prediction following the hedonic fluency hypothesis would be that text with low cognitive processing demand is preferred over more difficult-to-read text.

FOREGROUNDING HYPOTHESIS
Defamiliarization, achieved through "the novelty of an unusual linguistic variation" (Miall and Kuiken, 1994, p. 391), is a very influential concept for twentieth century art, impacting for instance film, theater, visual arts, and literature. Already the Russian Formalists and Czech Structuralists (Mukarovský, 1964; Shklovsky, 1998) claimed that "the technique of art is to make objects 'unfamiliar,' to make forms difficult, to increase the difficulty, and length of perception because the process of perception is an esthetic end in itself and must be prolonged" (Shklovsky, 1998, p. 18). Following their ideas, the concept of foregrounding is used "to indicate the (psycholinguistic) processes by which – during the reading act – something may be given special prominence" (Van Peer and Hakemulder, 2006). The theory of foregrounding is based on principles of cognitive psychology and the empirical study of literature, suggesting that certain linguistic devices on the phonetic, syntactic, or semantic level can be used to defamiliarize the reading experience and thereby slow down the automatic reading process even in skilled readers. Empirical evidence comes from behavioral studies (van Peer, 1986; Miall and Kuiken, 1994; Hanauer, 1998; Hakemulder, 2004), in which participants gave stronger affect and "strikingness" ratings for those literary texts that took longer to read due to a high density of "foregrounding" features (i.e., stylistic elements). According to foregrounding theory (Miall and Kuiken, 1994), reading times increase with higher density of foregrounding devices because stylistic variation increases text complexity. As complexity is a diffuse concept, predictability could serve as a simpler, quantifiable, moderating variable.
There is sound empirical evidence that the predictability of words in a sentence context affects eye movements, reading times, and brain potentials (McDonald and Shillcock, 2003; Rayner et al., 2004; Frisson et al., 2005; Dambacher et al., 2006, 2009; Dambacher and Kliegl, 2007). Unexpected words slow down the speed of reading and cause characteristic event-related brain potentials (Kutas and Hillyard, 1980). Apart from such cognitive effects, discrepancy can also raise physiological arousal (MacDowell and Mandler, 1989), trigger appraisal processes (Scherer, 2001), and be accompanied by interest or surprise (Silvia, 2008). When readers are confronted with novel, unexpected elements in a text, they usually react emotionally with curiosity and dishabituation (Oatley, 1994). High defamiliarization by means of stylistic variation lowers the predictability of single words in a text and promotes the esthetic perception of poetry and literature (Miall and Kuiken, 1994; Hanauer, 1998). In summary, foregrounding theory predicts that a text which is highly defamiliarized through the use of stylistic elements will be preferred over text which is easier to read, because it is processed in a more affective manner.

OPTIMAL INNOVATION HYPOTHESIS
An alternative to the theories described above is provided by the optimal innovation hypothesis (Giora et al., 2004). A phrase that elicits a salient response by carrying familiar elements while at the same time eliciting a non-salient, novel response (e.g., weapons of mass distraction), should be more pleasurable than an all too easily processed, conventional phrase (e.g., weapons of mass destruction), which only elicits a salient response. It should equally be more pleasurable than a similar-sounding, novel phrase that only elicits a non-salient response (e.g., weapons of glass deduction). For a response to be "novel" in the sense of Giora and colleagues, it has to bring forward a "discretely different conceptual meaning than the one activated by the familiar original from which it stems" (Giora et al., 2004, p. 117). As difficult as it is to define what makes an "optimally innovative stimulus" (which is further complicated by interindividual differences), twists of conventional expressions, so-called "proverb-variants" (e.g., Absence makes the heart go wander), are a common rhetorical device in journalism, advertisement, song lyrics, or catch-phrases, to raise attention and often to create ironic effects (Mieder, 2008). The ability to creatively transform figurative language and to create novel metaphors and figurative expressions is one of the final abilities acquired in the process of language learning, as it requires the ability to reflect on language (Levorato and Cacciari, 2002), and is used as a measure of verbal creativity (Bechtereva et al., 2007; Fink et al., 2007). The fluency hypothesis and the theory of foregrounding/defamiliarization postulate a linear relationship between cognitive processing effort and pleasure.
The optimal innovation hypothesis, on the contrary, states a non-linear relationship by predicting that the most pleasing text will offer neither a maximum ease of processing fluency nor a maximum degree of defamiliarization, but will provide an optimal combination of both dimensions.

PRESENT EXPERIMENT
The present experiment investigated how processing fluency and defamiliarization modulate the affective and esthetic perception of proverbs by manipulating proverb familiarity and introducing two different types of defamiliarized proverb-variants. Explicit reader responses and changes in the regional cerebral blood flow, measured by functional magnetic-resonance-imaging (fMRI), were analyzed. While not intuitively associated with the term "literature" by many people, proverbs turned out to be adequate stimuli for this purpose of interdisciplinary research. Short enough to fulfill multiple experimental requirements, they are at the same time complex psycholinguistic stimuli that people encounter in real life, and that are applied for clinical diagnosis (Gibbs and Beitel, 1995; for a comprehensive review see Thoma and Daum, 2006). The five experimental conditions (exemplified in Table 1 and described in greater detail in the Materials and Methods) were (a) familiar proverbs (e.g., Rome was not built in a day) and (b) unfamiliar proverbs (e.g., Not every cloud rains). Similar to McGlone et al. (1994), two versions of manipulated proverbs, based on the template of a familiar proverb, were created: (c) proverb-substitutions: defamiliarized versions of the familiar proverbs in which one word was substituted by a close synonym, thereby violating the form but not changing the content (e.g., Rome was not erected in a day). This version was considered relatively low in innovation. (d) proverb-variants: defamiliarized versions of the familiar proverbs in which the central concept of the familiar proverb was changed by substitution of a single word (e.g., Rome was not destroyed in a day). This version was considered relatively high in innovation. Finally, (e) literal, non-rhetorical sentences served as a baseline condition (e.g., Salt makes better taste).

EXPLICIT ESTHETIC JUDGMENTS
The three theoretical frameworks described so far make different predictions regarding explicit esthetic judgments (i.e., ratings of "beauty" given by participants after the fMRI experiment). Given that familiar proverbs are the easiest condition to process, according to the hedonic fluency hypothesis one can predict a linear relationship between stimulus complexity (which affects processing fluency) and beauty ratings in a way that familiar proverbs will be favored, followed by the simple literal sentence condition (e) and the defamiliarized conditions (c, d), while the most difficult-to-process, unfamiliar proverbs should score the worst. Galak and Nelson (2011) demonstrated that when reading for enjoyment, readers prefer text that they can read fluently and a positive linear relationship between processing fluency and liking has been described before (Reber et al., 1998). Foregrounding theory makes just the opposite prediction: highly foregrounded unfamiliar proverbs would be preferred to proverb-variants (d) and proverb-substitutions (c), followed by familiar proverbs. The literal condition (e), lacking any foregrounding elements, should receive the lowest esthetic appreciation. Third, the optimal innovation hypothesis predicts a non-linear relationship between cognitive fluency and pleasure and thus qualitative differences between the two defamiliarized versions: Proverb-variants should be favored over proverb-substitutions, because only the former fulfill the criteria of being "optimally innovative" (Giora et al., 2004). The latter, being also defamiliarized versions of the original proverbs but maintaining their central concepts, meet the criterion of defamiliarization, but are not "optimally innovative." Recognizing the original proverb elicits a salient response, while at the same time the unexpected word triggers a non-salient response. Following from the optimal innovation hypothesis, proverb-variants (d) should be favored over all other conditions.

PREDICTED NEURAL CORRELATES
While the three theoretical frameworks described above make predictions about explicit reader response, these theories clearly are based on behavioral experiments and are mute regarding the brain. However, a positive explicit esthetic judgment could potentially be reflected in activation of affect- and reward-related brain regions. Assuming that familiarity and defamiliarization modulate explicit reader response, we hypothesized that these two dimensions would also modulate the involvement of the reward system (Kringelbach et al., 2008), and regions formerly found for affective processing of single words. A recent review by Citron (2012) highlights the influence of emotion variables on written word processing. Based on previous neuroimaging studies, we expected to find activity related to familiarity and defamiliarization in brain regions such as the left amygdala (Landis, 2006), extrastriate visual regions (Herbert et al., 2009), the striatal region (Hamann and Mao, 2002), the left orbito-frontal cortex (OFC), the bilateral inferior frontal gyrus (IFG), and the superior frontal and middle temporal gyrus (SFG/MTG; Kuchinke et al., 2005; Jacobs, 2011).
To investigate effects of familiarity independently from rhetorical foregrounding, we contrasted the neural correlates of reading familiar proverbs against reading unfamiliar proverbs. A stronger engagement of affect-related regions during the reading of familiar proverbs would support the hedonic fluency hypothesis, while a similar activation during the reading of unfamiliar proverbs would support the foregrounding theory.
To investigate effects of defamiliarization, we contrasted the neural correlates of reading the two types of defamiliarized proverbs ("optimally innovative" proverb-variants and proverb-substitutions) both with familiar proverbs and with each other. If both defamiliarized conditions activated affect-related regions to a similar degree, this would be in line with the foregrounding theory. Differences in the intensity of activation between proverb-variants and proverb-substitutions would support the optimal innovation hypothesis, whereas stronger affective involvement for familiar proverbs would be in line with the hedonic fluency hypothesis.
Additionally, we assumed that foregrounding and defamiliarization would be correlated with increased attention demands. We expected activation of the bilateral frontoparietal attention network, covering both inferior frontal gyri and the inferior parietal lobes (Corbetta and Shulman, 2002). Familiar proverbs and non-rhetorical sentences were expected to serve as background conditions that were relatively easy to process. Thus, for the familiar proverbs and non-rhetorical sentences we predicted relatively stronger activity within the default mode network (Buckner et al., 2008), which is usually involved in self-referential thinking during resting state.

PARTICIPANTS
Twenty-six healthy participants underwent the fMRI study (mean age 25 years, range 20-45; 13 female, 13 male). Informed consent was obtained from all participants and the experiment was approved by the local ethics committee (Charité, Berlin). All participants were native German speakers, right-handed as determined by the Edinburgh handedness inventory (Oldfield, 1971), had normal or corrected-to-normal vision, and had neither obvious reading deficits (assessed with the SLS, Salzburger Lesescreening; unpublished version for adults) nor a history of neuropsychiatric disorders or psychoactive medication.

STIMULI
In total, five different categories of stimuli were shown, each comprising 40 items. To investigate effects of familiarity, (a) familiar proverbs, frequently used in German, and (b) unfamiliar German proverbs were collected. Familiar proverbs are conventionalized and therefore rather easy to process. They are "prefabricated: that is, stored and retrieved as a whole from memory at the time of use" (Wray, 2002, p. 9), and are therefore read faster than novel phrases (Cacciari and Tabossi, 1988; Tabossi et al., 2009). To investigate effects of defamiliarization, two types of defamiliarized proverbs were created: (c) proverb-variants, in which the content of a familiar proverb was twisted by replacing a single word. Proverb-variants can thus be considered as defamiliarization of common proverbs mainly on the semantic level. In contrast, (d) proverb-substitutions were created by replacing a single word of a familiar proverb with a synonym, thereby preserving the original meaning as far as possible, but violating the conventional form. Proverb-substitutions can be considered as a type of defamiliarization affecting mainly the level of style and wording. To have a high-level baseline, (e) 40 non-rhetorical sentences, which lacked proverb-characteristic stylistic features and had a valid literal interpretation, served as a control condition (Table 1; see Appendix for a complete list of stimuli). Non-rhetorical sentences were carefully chosen to match the familiar proverbs and unfamiliar proverbs regarding topics (simple statements about folk psychology and world-knowledge) but importantly, they did not correspond to specific proverbs and they lacked proverb-characteristic rhetorical features. All other conditions were matched in number and type of rhetorical features (i.e., phonological similarities such as rhyme/alliteration, meter, parallelism, and ellipses).
All conditions were matched for important lexical parameters such as number of words, number of syllables, and mean word frequency taken from the Wortschatz Lexikon of the University of Leipzig. Google counts dating from July 2009 were used as an approximation of the whole item's frequency and are reported in Table 2 with all other lexical parameters. The frequency measures indicate that in real life the familiar proverbs occur far more frequently than all other conditions (all ps < 0.001) and that the non-rhetorical control items occur the least frequently (all ps < 0.05).

Pretests
Familiar and unfamiliar proverbs were selected according to dichotomous familiarity judgments (known/unknown) that 14 participants had given for each item of a pool of 800 German proverbs and aphorisms collected from proverb dictionaries and online databases. Unfamiliar proverbs had been judged as "unknown" and familiar proverbs had been judged as "known" by all 14 participants. Twenty-five participants (12 female, 13 male) not involved in the main fMRI experiment or any other pretest rated the stimulus set regarding arousal and valence using the SAM instrument (Bradley and Lang, 1994). No significant differences in arousal or valence ratings were found. Another sample of 29 (19 female, 10 male) participants not involved in the main fMRI experiment or any other pretest rated the stimulus set on inventiveness using a seven-point Likert scale. In line with classical rhetorical theory of wit and wordplay (Cicero, De oratore, II.216-290), all rhetorical conditions (familiar proverbs, unfamiliar proverbs, proverb-substitutions and proverb-variants) were rated as significantly more inventive than the non-rhetorical sentences. Among the rhetorical conditions, proverb-substitutions, in which the original proverb's wording was violated, but not the content level, were rated as significantly less inventive than the others. See Table 3 for a list of Means and SDs of all pretest ratings.

TASK DESIGN
After task practice and instruction outside of the scanner, participants underwent a scanning session consisting of five imaging runs with 40 trials each. Each trial started with a fixation cross at the position of the first letter, followed by a sentence presented in one line of white letters on a black background (Arial; 18 pt; left aligned; 4 s), and a blank screen after which a verbal category cue (everyday life, health and well-being, love and relationship, or work and success) was presented on the screen (2 s). The blank screen and fixation cross were jittered (2-12 s; mean 5 s, respectively). Participants were instructed to read silently, and to indicate whether or not an item fitted into the provided semantic category by pressing a button within the response window of 2 s, during which the category cue was shown. Four independent raters had rated the fit of each item into each category before, so that across participants, each item was paired equally often with a fitting and a non-fitting category (defined as categories on which all four raters agreed). Within participants, the number of fitting and non-fitting category cues was also balanced. Each run comprised eight items per condition. The order of items in each condition was counterbalanced across participants/runs using a Latin Square design to rule out sequence-effects. Stimuli were pseudo-randomly mixed within each run. Timing and order of stimulus presentation were optimized for estimation efficiency using RSFgen (AFNI). Presentation (Neurobs, Inc., Albany, CA, USA) was used for stimulus delivery and timing on a Dell computer under Windows XP. Visual stimuli were presented using MR-compatible LCD goggles (Resonance Technology Inc., Northridge, CA, USA), and the computer was synchronized with the onset of each functional run to ensure the accuracy of event timing. Participants responded with their right hand via an MR-compatible button box.
Following the functional scan, participants gave explicit esthetic judgments outside of the MR scanner by rating each item on overall "beauty" (allowing for consideration of stylistic quality, pleasantness, but also approval of the social or moral value implied in a proverb) on a seven-point Likert scale ranging from 1 (not beautiful at all) to 7 (very beautiful). Afterward, the participants were asked if they had encountered the item in exactly this wording ever before, prior to the experiment. For this task, they made use of a seven-point confidence scale ranging from −3 (definitely unfamiliar) to +3 (definitely familiar). Values around zero indicate that they were unsure whether they had encountered the expression before. The familiarity rating served as a measure to control if the participants had really known the "familiar" items before and if the "unfamiliar proverbs," proverb-substitutions and proverb-variants had been novel to them. For an illustration of the task and procedure see Figure 1.
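The Latin Square counterbalancing described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 40 items of one condition are split into five blocks of eight and that the block shown in each run is rotated across participants; the source does not spell out the exact scheme, and the names `latin_square`, `block_schedule`, and `items_for_run` are hypothetical.

```python
def latin_square(n):
    """Cyclic n x n Latin square: row r is the sequence 0..n-1 rotated by r."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def block_schedule(participant_index, n_runs=5):
    """Order in which a participant sees the item blocks across the runs.

    Assumption: participants cycle through the rows of the square.
    """
    square = latin_square(n_runs)
    return square[participant_index % n_runs]


# Hypothetical item labels for one condition: 40 items in 5 blocks of 8.
items = [f"item_{i:02d}" for i in range(40)]
blocks = [items[b * 8:(b + 1) * 8] for b in range(5)]


def items_for_run(participant_index, run_index):
    """Items of this condition shown to a participant in a given run."""
    return blocks[block_schedule(participant_index)[run_index]]
```

In such a square every block appears exactly once in every run position, so no block is systematically tied to early or late runs.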

fMRI ACQUISITION
Imaging was performed using a 3T Siemens (Erlangen, Germany) Tim Trio MRI scanner fitted with a 12-channel head coil at the Dahlem Institute for Neuroimaging of Emotion (DINE).

DATA ANALYSIS
Behavioral data from the semantic categorization task and from the post-scan ratings were analyzed using repeated measures analysis of variance (ANOVA) in PASW 18 (IBM SPSS Statistics). The only interest in the semantic categorization data was to check if participants performed above chance level, which was interpreted as indicating that participants had achieved at least some access to the semantic meaning of the items. The main purpose of the semantic categorization task was to keep participants' attention high throughout the experiment. Interpreting any "accuracies" or reaction time data seemed unreliable. Proverbs can have manifold interpretations and the item difficulty in the semantic categorization task could not be controlled, so even a "wrong" answer would have been no hard proof for misunderstanding (because the participant might have had a different, but equally valid interpretation in mind). Therefore, "accuracies" and reaction times were not modeled in the analysis of the functional data. This was, however, unproblematic, because the regressors that modeled the conditions covered the 4 s period during stimulus presentation prior to the semantic categorization task. BrainVoyager QX 2.0 (Brain Innovation, Maastricht, The Netherlands) was used to analyze the recorded MRI data (Goebel et al., 2006). The functional data were slice-scan time corrected (cubic-spline interpolation) to correct for the sequentially executed interleaved slice acquisition and motion corrected. Intra-session image alignment to correct for motion across runs was performed using the first image of the first functional run as the reference image. Following linear trend removal, data were filtered temporally in 3D with a high pass Fourier filter of two cycles in time course to remove low frequency drifts. Preprocessed data were spatially smoothed using an 8 mm full-width-half-maximum Gaussian kernel to reduce noise.
For spatial normalization the individual T1 images were transformed into Talairach space (Talairach and Tournoux, 1988) and all statistical analyses were performed in Talairach space. Anatomical regions were identified by manual inspection using the Talairach atlas and the Talairach Daemon. The statistical analyses were carried out using a voxel-wise General Linear Model (GLM) at the single-participant-level first, based on design matrices, which included the estimated 3D motion parameters obtained during preprocessing as well as predictors for all task conditions and the button-response window. Separate regressors per condition were modeled using a boxcar function with the length of the duration of the stimulus presentation (4 s per trial), which was convolved with a theoretical two-gamma hemodynamic response function (Friston et al., 1998), and the model was independently fitted to the signal of each voxel. Fixation periods were not modeled and the response period during the semantic categorization task was modeled as a regressor of no interest. These estimates of the trial responses relative to baseline were subsequently combined to provide an estimate of the condition effects, which could then be used to contrast the experimental conditions. The reported group analyses were conducted following a random effects model. The analysis covered the whole brain, excluding head and skull tissue. Unless stated otherwise, the correction level for reported activations was p < 0.05 [based on an initial voxel-level threshold of p (uncorrected) < 0.005]. To control for Type I error, the uncorrected maps were entered into a Monte Carlo simulation to determine the cluster size correction level. Clusters below the correction level are neither reported nor visualized. To determine familiarity effects, familiar and unfamiliar proverbs were directly contrasted against each other, as well as contrasted separately against non-rhetorical sentences.
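To make the construction of a condition regressor concrete, the following minimal sketch builds a 4 s boxcar and convolves it with a two-gamma response function. The shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used defaults and are assumptions here, not values reported in this study; the function names are hypothetical and the code does not reproduce the BrainVoyager implementation.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def two_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Two-gamma response: positive peak minus a scaled late undershoot.

    The default parameters are assumed, not taken from the paper.
    """
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def boxcar(onsets, duration, n_samples, dt):
    """1 while a stimulus is on screen, 0 otherwise (times in seconds)."""
    signal = [0.0] * n_samples
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_samples)):
            signal[i] = 1.0
    return signal


def convolve(signal, kernel, dt):
    """Discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k * dt
    return out


dt = 0.1                                                 # sampling step in s
kernel = [two_gamma_hrf(i * dt) for i in range(320)]     # 32 s of response
stimulus = boxcar(onsets=[0.0], duration=4.0, n_samples=300, dt=dt)
regressor = convolve(stimulus, kernel, dt)               # predicted time course
```

With these assumed parameters the response to a single brief event peaks about 5 s after onset, and the predicted response to a 4 s sentence presentation peaks a few seconds later.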
To discover brain regions sensitive to both types of defamiliarization, a conjunction analysis [(proverb-substitutions > familiar proverbs) and (proverb-variants > familiar proverbs)] was carried out. To specify which brain regions were sensitive to the type of defamiliarization, proverb-variants were directly contrasted with proverb-substitutions.
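One common way to formalize such a conjunction is the minimum-statistic approach: a voxel counts as jointly active only if both contrasts exceed the threshold at that voxel. The sketch below illustrates that logic on toy values; the source does not state which conjunction formulation was applied, so this is an assumption, and `conjunction_map` is a hypothetical name.

```python
def conjunction_map(t_map_a, t_map_b, threshold):
    """Minimum-statistic conjunction of two voxel-wise t-maps.

    Returns the smaller of the two t-values where both contrasts exceed
    the threshold, and 0.0 elsewhere. Maps are flat lists of equal length.
    """
    if len(t_map_a) != len(t_map_b):
        raise ValueError("t-maps must cover the same voxels")
    result = []
    for a, b in zip(t_map_a, t_map_b):
        smallest = min(a, b)
        result.append(smallest if smallest > threshold else 0.0)
    return result


# Toy t-values for four voxels (illustrative numbers, not study data).
substitutions_vs_familiar = [4.1, 1.2, 3.5, 0.3]
variants_vs_familiar = [3.8, 4.0, 2.0, 0.1]
joint = conjunction_map(substitutions_vs_familiar, variants_vs_familiar, 3.0)
```

In this toy example only the first voxel survives, because it is the only one where both contrasts pass the threshold.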

BEHAVIORAL RESULTS
Task performance during the distracter task was analyzed to check participants' involvement during the fMRI experiment. The mean accuracy of the semantic categorization task during the experiment was significantly above the 50% chance level for all conditions (see Table 4), rendering it highly likely that participants were engaged in interpreting the semantic meaning of the items in all conditions. The mean response times in the semantic categorization task were comparable between sentence types, F(4,22) = 0.93, p > 0.05, indicating equal task difficulty across conditions. Post-scan familiarity ratings were analyzed to validate stimulus categorization into "familiar" and "unfamiliar" conditions. Significantly higher familiarity ratings were assigned to "familiar" proverbs than to all other conditions (all p-values < 0.001), thereby validating our categorization. Familiarity ratings of two participants were lost due to a programming error. To test the three theories described in the introduction, the ratings on the beauty scale were analyzed. Significant differences in beauty ratings between types of sentence were found [F(2.8,71) = 21.46, p < 0.001, ηp² = 0.46]. Post hoc comparisons revealed that in line with the hedonic fluency hypothesis, familiar proverbs were judged as significantly more beautiful than all other conditions (all p-values < 0.001). However, the finding that proverb-substitutions were significantly less beautiful than unfamiliar proverbs (p < 0.01) and familiar proverbs (p < 0.001), but comparable to proverb-variants and non-rhetorical sentences was not predicted by any of the theories; nor was the finding that unfamiliar proverbs, proverb-variants, and non-rhetorical sentences received equal beauty ratings. These results will be discussed later. Table 4 lists the Means and SDs of all behavioral measures.
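The reported effect size for the beauty ratings can be checked from the F value and its degrees of freedom using the standard relation between F and partial eta squared. The sketch below also derives the correction factor implied by the fractional degrees of freedom; reading them as the result of a sphericity correction (for example Greenhouse-Geisser) is an inference, since the source does not name the correction. The function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def implied_epsilon(df_effect_reported, n_conditions):
    """Ratio of the reported effect df to the uncorrected df (k - 1).

    Assumption: the fractional df stem from a sphericity correction.
    """
    return df_effect_reported / (n_conditions - 1)


# Values reported in the text for the beauty ratings.
effect_size = partial_eta_squared(21.46, 2.8, 71)
epsilon = implied_epsilon(2.8, n_conditions=5)
```

Rounded to two decimals, `effect_size` reproduces the reported value of 0.46, and the implied correction factor is about 0.7.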

Effect of familiarity
To investigate how familiarity affects the reading process, we contrasted familiar proverbs against unfamiliar proverbs. The hedonic fluency hypothesis would have predicted a stronger contribution of affect-related regions on reading familiar proverbs, while foregrounding theory would have predicted an activation of affect-related regions for unfamiliar proverbs.
In the direct contrast "familiar-unfamiliar" shown in Figure 2C, familiar proverbs elicited a bilateral activation pattern similar to the dorsomedial part of the default network (Buckner et al., 2008). In the reversed contrast, unfamiliar proverbs showed relatively stronger activity in areas related to sentence reading, comprising nearly the whole left superior and middle temporal lobe, as well as the right MTG/STG (BA 21, 22, 38), bilateral occipital cortex (BA 18/19), and cerebellum. Additionally, the bilateral posterior rostral part of the medial prefrontal cortex (prMFC, BA 9) and bilateral parts of the motor and premotor cortex (BA 4/6) were activated. A list of all coordinates and statistical values is provided in Table A1 in the Appendix. The activation pattern for familiar proverbs was similar to the default mode network, indicating that familiar proverbs were easier to process than unfamiliar proverbs. No differences in activation were found in reward-related regions (OFC, nucleus accumbens, ventral striatum, ACC). The activation pattern for unfamiliar proverbs included several areas associated with affective processing, such as the temporal poles and the medial prefrontal cortex; however, these regions are also sensitive to cognitive load.
When familiar and unfamiliar proverbs were separately contrasted against non-rhetorical, literal sentences, familiar proverbs recruited the anterior part of the left parahippocampal gyrus (PHG), whereas unfamiliar proverbs activated a broad bilateral frontotemporal network in the semantic system (Binder et al., 2009). These findings are illustrated in Figures 2A,B. None of the classical reading-related frontotemporal areas emerged from the contrast "familiar/non-rhetorical," suggesting that the two conditions relied on them in equal measure. In line with recent findings on figurative language processing, we found RH involvement only for novel, but not for familiar proverbs (Faust and Mashal, 2007; Schmidt et al., 2007; Pobric et al., 2008; Yang et al., 2009; Cardillo et al., 2012). Affect-related brain regions were predominantly observed as related to unfamiliar proverbs, being in line with foregrounding theory and the concept of defamiliarization rather than with the hedonic fluency hypothesis.

Effect of defamiliarization
To investigate general effects of defamiliarization across types of defamiliarization, a conjunction analysis [(proverb-substitutions > familiar proverbs) and (proverb-variants > familiar proverbs)] was calculated. It revealed that the bilateral IFG (LH: BA44; RH: BA9) and bilateral inferior occipital gyrus (IOG; BA18/19) were activated significantly more strongly for both versions of defamiliarized proverbs (proverb-variants and proverb-substitutions) than for familiar proverbs (Figure 3A). The lack of activation in affect-related areas hints toward a more cognitive effect of defamiliarization itself. However, both foregrounding theory and the optimal innovation hypothesis would imply a stronger affective involvement for innovative proverb-variants than for proverb-substitutions. To uncover regions sensitive to the type of defamiliarization, proverb-variants were directly contrasted with proverb-substitutions (Table A2 in the Appendix). Proverb-variants activated areas related to affective evaluation, such as the bilateral temporal poles (BA 38), medial prefrontal area (medial OFC, vmPFC, dmPFC), and posterior cingulate region (PCC/cuneus), as well as the parahippocampal gyri. Furthermore, regions probably associated with visual attention, such as the bilateral occipital cortex, and regions relevant for semantic integration and sentence processing (bilateral IFG, BA 47 and left MTG/STG) were activated more strongly by proverb-variants than by proverb-substitutions. Proverb-substitutions, however, recruited areas mostly related to cognitive evaluation and error detection, such as a cluster in the dorsal ACC, right dlPFC (BA 10), left anterior frontal cortex, right IPL (BA40), and left fusiform gyrus.
In summary, while proverb-substitutions were associated with activation of the frontoparietal attention network, proverb-variants engaged the affect-related medial prefrontal and medial temporal regions (Figure 3B), which can be explained by foregrounding theory and the optimal innovation hypothesis, but not by hedonic fluency.
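The logic of the conjunction analysis used for the general effect of defamiliarization can be illustrated with a small sketch. The text does not specify how the conjunction was computed, so the code below assumes a minimum-statistic approach: a voxel counts as jointly active only if its statistic exceeds the threshold in both contrasts. The function name, the toy t-values, and the threshold are all hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def conjunction_mask(
    t_contrast_a: Sequence[float],
    t_contrast_b: Sequence[float],
    threshold: float,
) -> List[bool]:
    """Minimum-statistic conjunction over two voxel-wise t-maps (flattened).

    A voxel is flagged only if the smaller of its two t-values exceeds the
    threshold, i.e., if it is suprathreshold in both contrasts.
    """
    if len(t_contrast_a) != len(t_contrast_b):
        raise ValueError("both maps must contain the same number of voxels")
    return [min(a, b) > threshold for a, b in zip(t_contrast_a, t_contrast_b)]


# Hypothetical t-values for four voxels (illustration only, not study data)
substitutions_vs_familiar = [4.0, 1.0, 3.5, 0.2]
variants_vs_familiar = [3.2, 5.0, 0.5, 0.1]

print(conjunction_mask(substitutions_vs_familiar, variants_vs_familiar, 3.0))
# [True, False, False, False]
```

Only the first voxel survives because it is the only one that exceeds the threshold in both contrasts; a voxel that responds strongly to just one type of defamiliarization is excluded.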

DISCUSSION
This study analyzed explicit reader response and neural correlates related to the reading of proverbs, in order to investigate the contribution of familiarity and defamiliarization to the affective and esthetic perception of literature, exemplified by proverbs. Familiarity and the degree of defamiliarization were manipulated. We conclude that familiarity and defamiliarization are two distinct components that can influence the affective and esthetic processing of literature. After discussing the neural correlates of familiarity and defamiliarization separately, we will turn to a general discussion of their implications for an esthetic perception of literature.

THE FAMILIARITY EFFECT
The rating data indicate that familiarity can affect the esthetic perception of sentences: Familiar proverbs received significantly higher beauty ratings than all other conditions. However, the hedonic fluency hypothesis would have predicted a linear correlation between beauty ratings and complexity, which was not found. Instead, relatively simple literal sentences were not rated significantly differently from cognitively challenging unfamiliar proverbs; the latter even ranked second in beauty ratings. These non-linear results suggest that other parameters besides fluency influenced the explicit esthetic evaluation. The fMRI data show that in addition to the increased demands of cognitive processing, more affective processing (in the amygdala, temporal poles, and medPFC) is triggered by the unfamiliar proverbs. Although affective processing per se does not predict a positive or negative esthetic judgment, it probably still affects the evaluation process; in the current experiment, it resulted in a relatively high evaluation of beauty, even though the conditions were matched in terms of valence and arousal. In summary, although familiar proverbs were the condition with the highest processing fluency, the result that they were singled out by beauty ratings is probably not only based on processing fluency. While their rhetorical features may account for their cultural success and familiarity in the first place, their success in the beauty ratings may be based on the successful recognition of familiar items in the context of novel and defamiliarized ones.

GENERAL EFFECT OF DEFAMILIARIZATION
In the present experiment, two conditions (proverb-substitutions and proverb-variants) represented defamiliarized versions of familiar German proverbs. Only the proverb-variants fulfilled the criterion of being "optimally innovative" as defined by Giora et al. (2004). Common to both conditions, the bilateral IFG and left IOG responded more strongly to the defamiliarized version than to the original proverb. The LIFG is one of the most frequently found regions in neuroimaging studies on semantics and language processing in general. We had expected to find the more semantic ventral part, the pars orbitalis and triangularis, to be associated with defamiliarization. Surprisingly, we found activation of the more syntactic dorsal part of the pars opercularis (BA 44) instead (Dapretto and Bookheimer, 1999; Friederici et al., 2000; Newman et al., 2010). This encourages the interpretation that both proverb-variants and proverb-substitutions share an enhanced demand for syntactic processing relative to the formulaic structure of the familiar proverb. If defamiliarization destroys the expected structure of the familiar proverb, further syntactic processing becomes necessary. However, semantic processing is not ruled out, as both the ventral and dorsal LIFG have been shown to be related to semantic memory in a recent lesion study (Yang et al., 2010). In the present experiment, we did find stronger activation of the more semantic ventral part of the LIFG by proverb-variants than by proverb-substitutions. We attribute the activation of the RIFG and the enhanced activation of the visual areas to attention shifts toward the unexpected word, as the RIFG has recently been found to be involved in the detection of relevant, unexpected stimuli (Corbetta and Shulman, 2002; Hampshire et al., 2010).
Interestingly, neighboring areas in the prefrontal cortex and the occipital gyri have also been found in experiments on art perception, where they have been associated with viewing pictures in an esthetic rather than a pragmatic mode (Cupchik et al., 2009). This observation is in line with the claim of foregrounding theory that defamiliarization leads to esthetic perception.
In summary, our results indicate that the technique of defamiliarization effectively draws attention to stimuli that would not have been further considered in their conventional form. We interpret this internal attention shift as a sign of participants entering an esthetic mode of perception. However, contrary to what a strict interpretation of foregrounding theory would predict, defamiliarization as such does not elicit spontaneous affective evaluation, nor are defamiliarized items generally judged as especially beautiful.

OPTIMAL INNOVATION
The key prediction of the optimal innovation hypothesis is that optimally innovative proverb-variants should be processed with stronger affective involvement and therefore be preferred over all other conditions. In the present experiment, only the proverb-variants fulfilled all criteria of "optimal innovativeness" (Giora et al., 2004), whereas the proverb-substitutions had the same semantic content as the corresponding familiar proverbs. The participants noticed this difference and rated the inventiveness of the proverb-substitutions significantly lower. The rating data thus do not confirm the optimal innovation hypothesis, because proverb-variants were considered neither more beautiful nor more inventive than familiar and unfamiliar proverbs. However, the fMRI data revealed processing differences between the two types of defamiliarized proverbs. Consistent with the optimal innovation hypothesis, the proverb-variants activated brain regions associated with self-related emotional memories, comprising the bilateral anterior temporal lobes, dmPFC, medial OFC, cuneus, and bilateral PHC including the right amygdala. We interpret the relatively greater involvement of these regions as enhanced affective processing elicited by the proverb-variants. Reading proverb-variants also required a greater semantic integration effort because proverb-variants evoke two contrasting responses (that of the familiar proverb and the novel word) that have to be related. This integration effort is related to the stronger ventral IFG activation for proverb-variants than for proverb-substitutions. The medial OFC, which is frequently associated with monitoring, learning, and memory of reward value (see Kringelbach and Rolls, 2004, for a review), could reflect the rewarding aspect of successful semantic integration.
Comparable frontomedian activation was found for self-referential processing (Zysset et al., 2002), explicit esthetic judgments (Jacobsen et al., 2006), and idiom comprehension (Lauro et al., 2008). Lauro and colleagues showed that activation in the frontomedian area increased the functional connection between bilateral frontotemporal areas during idiomatic processing, thus assigning the frontomedian area a key role in the selection between alternative sentence meanings. Proverb-variants most certainly triggered a similar comparison between the content of the familiar proverb that echoes in the background and the content of the proverb-variant which cognitively comes to the foreground (Jacobs, 2011).
One might wonder why, in spite of more intense affective processing, the resulting judgment is not one of high "beauty" (a term widely associated with something good and pleasurable). An explanation can be found in the specific characteristics of proverbs: proverbs are used to convey moral and social values, which are questioned by the alternative content of proverb-variants. The affect-related areas that we found have also been attributed to moral emotions (Heekeren et al., 2003). While proverbs express traditionally valued and accepted cultural norms and beliefs, proverb-variants often oppose, or at least question, the traditional value. Proverb-substitutions, which do not question the content of the corresponding familiar proverbs, did not recruit this moral emotion network; instead, they activated the right IPL, left fusiform gyrus, and the ACC, which are associated with attention shifting, error detection, and conflict management. In short, although both types of defamiliarization seem to enhance attention, only proverb-variants that included a conceptual mismatch were correlated with affective/moral evaluation. Proverb-substitutions that only provided formal defamiliarization received very low beauty ratings. Functional data suggest that this less innovative condition may have been processed as containing errors. Importantly, enhanced affective evaluation does not necessarily result in a positive esthetic judgment.

GENERAL DISCUSSION
In the present fMRI experiment, we tested three hypotheses of how cognitive processing is linked to esthetic perception. Concerning the esthetic perception of literature, our data yield mixed results. We agree with Giora that neither high processing fluency nor novelty is a sufficient precondition to elicit pleasure. However, the predictions of the optimal innovation hypothesis were not met, as proverb-variants did not stand out from the other categories in terms of beauty. This might be due to "non-optimal" stimuli, but for the optimal innovation hypothesis to make valid predictions, one would need a more elaborate concept of how to "optimally" combine familiar and novel elements, which will be difficult to define. Many more dimensions apart from familiarity, such as aptness or imageability, might also have to be considered when estimating preference for figurative expressions, as these dimensions have been shown to be important in metaphor processing (Gerrig and Healy, 1983; Marschark et al., 1983; Katz et al., 1988; McGlone, 2007). Our rating data support the hedonic fluency theory, while the functional data are consistent with the foregrounding theory. Beauty ratings singled out familiar proverbs, thus supporting the hedonic fluency theory and preference for prototype models (Martindale and Moore, 1988; Martindale et al., 1990). The behavioral results demonstrate the hedonic value of familiarity, represented on a neural level by the activation of the left PHC by familiar proverbs rather than by baseline sentences. The resulting feeling of familiarity might be considered a safety signal and carry hedonic value. However, the hedonic fluency model alone cannot account for the whole behavioral pattern, e.g., for the fact that unfamiliar proverbs (the condition with the lowest familiarity and a high processing effort) ranked second in "beauty."
According to foregrounding theory, unfamiliar proverbs confront the reader with a condensed content that is enhanced by stylistic devices. Items high in foregrounding and defamiliarization were expected to set the reader into a mode of esthetic perception. If used in a text, they stand out against the background of fluent and literal language and offer the closing of a new meaning gestalt. During the process of reading, the readers might switch from an automatic reading of "background" information to a slower, cognitively more demanding reading mode whenever they encounter "foregrounded" passages (Iser, 1974; Jacobs, 2011). Functional neuroimaging data are consistent with this aspect of foregrounding theory. Relative to unfamiliar proverbs, familiar proverbs and non-rhetorical baseline sentences engaged the default mode network, which is associated with mentalizing, imagination, and self-referential thinking, probably due to the lower cognitive demand. These relatively easily processed conditions may have served as a "background" for the more difficult-to-process, unfamiliar proverbs. Proverb-variants recruited a network of self-referential moral evaluation, suggesting that a certain amount of conceptual defamiliarization (violating world-knowledge or moral standards) might effectively trigger affective evaluation. However, unlike previous behavioral studies (van Peer, 1986; Miall and Kuiken, 1994; Hakemulder, 2004), we did not observe a positive relation between foregrounding and explicit esthetic judgment. The reason for this might be exactly that proverbs characteristically imply a social or moral value. Proverb-variants naturally question these traditional values, and account for some interindividual variation in the esthetic value of such a critical statement.
Nevertheless, we propose that in literature and poetry, passages that question world-knowledge or moral values have a high foregrounding potential as they might automatically trigger affective evaluations that feed into the esthetic judgment.
However, the finding that the most fluent, prototypical stimuli were preferred over less fluent conditions should not be generalized too rashly. As the results of Menenti et al. (2009) suggest, the perception of a sentence or a phrase is strongly influenced by the context in which it appears: whether a fluent text is appreciated for its readability or rejected because of its low quality can be strongly genre-dependent (Galak and Nelson, 2011). More generally, the semantic context of an artwork is known to modulate esthetic judgments (Kirk et al., 2009), especially if it is embedded in a narrative or a pictorial composition. Hence, future studies should try to shed light on the interaction of text quality, content, and context. Furthermore, expert and non-expert readers, whose ways of reaching esthetic judgments might differ (Hekkert and van Wieringen, 1996), represent just one example of interindividual differences that would be worth addressing in further studies. Our findings emphasize that in the case of complex linguistic structures such as sentences, figurative language, and ultimately text, no single factor is likely to explain all of the dimensions of an esthetic judgment. To specify the role of figurative language and stylistic devices in the esthetic perception of literature, an elaborated foregrounding theory (ideally fed by classic rhetorical theory) allowing for more specific predictions about the esthetic and affective potential of rhetorical language may be necessary.