Abstract and Concrete Sentences, Embodiment, and Languages

One of the main challenges of embodied theories is accounting for the meanings of abstract words. The most common explanation is that abstract words, like concrete ones, are grounded in perception and action systems. According to other explanations, abstract words, unlike concrete ones, activate situations and introspection; alternatively, they are represented through metaphoric mapping. However, the evidence provided so far pertains to specific domains. To account for abstract words in their variety, we argue that it is necessary to take into account not only that language is grounded in the sensorimotor system, but also that language represents a linguistic–social experience. To study abstractness as a continuum, we combined a concrete (C) verb with both a concrete and an abstract (A) noun, and an abstract verb with the same nouns (grasp vs. describe a flower vs. a concept). To disambiguate between the semantic meaning and the grammatical class of the words, we focused on two syntactically different languages: German and Italian. Compatible combinations (CC, AA) were processed faster than mixed ones (CA, AC). This is in line with the idea that abstract and concrete words are processed preferentially in parallel systems, abstract words in the language system and concrete words more in the motor system, so that processing costs are lowest when processing remains within one system. This parallel processing most probably takes place within different anatomically predefined routes. With mixed combinations, participants were faster when the concrete word preceded the abstract one (CA), regardless of the grammatical class and the spoken language. This is probably due to the peculiar mode of acquisition of abstract words, which are acquired more linguistically than perceptually. The results confirm embodied theories that assign a crucial role to both perception–action and linguistic experience for abstract words.


INTRODUCTION
The distinction between "abstract" and "concrete" concepts and words is far from uncontroversial. People disagree when trying to categorize a specific noun as "abstract," and even more so when classifying a specific verb as such. Evidence suggests that the "abstract-concrete dimension" reflects a continuum rather than a dichotomy. Indeed, Nelson and Schreiber (1992) and Wiemer-Hastings et al. (2001) asked people to judge the concreteness of large sets of words; they found a bimodal distribution (according to features such as tangibility or visibility), not a dichotomy. Things are even more complicated when words are embedded in contexts. Most of us would agree that the noun "apple" and the verb "to grasp" are concrete, but judging verb-noun pairs such as "to grasp the meaning" or "to think about an apple" (e.g., Aziz-Zadeh et al., 2006) is far from simple. In addition, the meaning of a sentence is often influenced by a specific language and culture, and this linguistic and cultural influence has been shown to be particularly strong for abstract compared to concrete words (Boroditsky, 2003).
The question of how abstract concepts and words are represented was the focus of many investigations from the 1960s to the 1990s. The two most influential views were the context availability theory (CAT, Schwanenflugel, 1991) and the dual coding theory (DCT, e.g., Paivio, 1986). CAT ascribes the processing difference between concrete and abstract words to the fact that concrete words have stronger semantic relations with the context represented by other words. According to DCT, instead, abstract words are represented only in a linguistic system, while concrete words are represented in both an imagery system and a linguistic system.
As to the neural substrates of language comprehension, the integration of lesion analyses, white matter tractography, and resting-state functional magnetic resonance imaging (e.g., Dronkers et al., 2004; Turken and Dronkers, 2011) has recently called traditional models into question: not only the left posterior temporal cortex but an extensive network in the left hemisphere seems to be critical for language processing (e.g., the left posterior middle temporal gyrus, MTG; the anterior part of Brodmann's area 22; the posterior superior temporal sulcus). Investigations of the structural and functional connectivity of the key regions (using diffusion tensor imaging) have revealed a bilateral temporo-parieto-frontal network supported by long-distance white matter pathways. This network seems to interact with other brain regions outside the traditionally recognized language areas (Turken and Dronkers, 2011). Pertaining to the aim of the present work, recent years have seen a renewed interest in the way concrete and abstract words are represented, as the growing body of brain imaging studies reveals (e.g., Desai et al., 2010; Ghio and Tettamanti, 2010). Many of these studies supported Paivio's original proposal, showing, for example, that the processing of abstract words is more left-lateralized than the processing of concrete ones (for a review see Binder et al., 2005).
Along the same lines, on the theoretical side it has recently been proposed that language comprehension is both embodied and symbolic (e.g., Louwerse and Jeuniaux, 2008; Dove, 2010). In keeping with Paivio, Dove (2009) argues in favor of "representational pluralism," claiming that perceptual simulations play an important role in highly imageable concepts while amodal linguistic representations play a crucial role in abstract concepts. One reason for the renewed interest in abstract words is that understanding how we represent them is a testbed for the increasingly popular (e.g., Chatterjee, 2010) embodied theories of language comprehension, according to which language is grounded in the perception, action, and emotional systems (for reviews, see Barsalou, 2008; Fischer and Zwaan, 2008; Gallese, 2008). Whereas it is now widely recognized that the evidence supporting embodied theories is compelling for concrete or highly imageable words, the issue is much debated for abstract words and sentences (Pezzulo and Castelfranchi, 2007; Louwerse and Jeuniaux, 2008; Dove, 2010). Within the embodied framework, abstract words have been explained as the result of transferring image-schemas derived from sensorimotor experience to abstract domains: for example, the image-schema derived from "container" would be used to understand the notion of "category" (Lakoff, 1987; Gibbs and Steen, 1999; Boot and Pecher, 2011), and the action of giving a concrete object (a pizza) would be used to understand the action of giving some news (Glenberg et al., 2008). Alternatively, it has been proposed that abstract words evoke different kinds of properties, i.e., that they activate situations and introspective relationships more frequently than concrete words do (Barsalou, 1999; Barsalou and Wiemer-Hastings, 2005; for a review see Pecher et al., 2011).
More crucial to our work are some recent proposals which, starting from an embodied perspective and avoiding the assumption of amodal symbols detached from perceptual and motor experience, share with Paivio the idea that multiple types of representation underlie knowledge (for a review see the special topic on Embodied and Grounded Cognition, Borghi and Pecher, 2011). These proposals differ from Paivio's view in hypothesizing that abstract words, and not only concrete ones, are embodied and grounded. According to the language and situated simulation (LASS) theory, linguistic forms and situated simulations interact continuously, and different mixtures of the two systems underlie a wide variety of tasks. The linguistic system (comprising the left-hemisphere language areas, especially the left inferior frontal gyrus, i.e., Broca's area) is involved mainly in superficial linguistic processing, whereas deeper conceptual processing necessarily requires the simulation system, made up of the bilateral posterior areas associated with mental imagery and episodic memory.
The words as social tools (WAT) proposal (Borghi and Cimatti, 2009) differs from the LASS theory because, according to WAT, the linguistic system is not limited to superficial processing: words are conceived of not as mere signals of something but also as tools that allow us to operate in the world. In addition, WAT extends LASS by formulating more detailed predictions on the representation of abstract and concrete words. According to WAT, abstract word meanings rely more than concrete word meanings on the everyday experience of being exposed to language in social contexts, and the difference between abstract and concrete words basically rests on their different mode of acquisition (MoA; Wauters et al., 2003), which can be perceptual, linguistic, or mixed. MoA ratings, which correlate with, but are not fully explained by, age of acquisition, concreteness, and imageability, gradually change over school grades: in the first grades acquisition is mainly perceptual, later it is mainly linguistic. It follows that abstract words are typically acquired later, partly because it is more difficult to explain a word meaning linguistically than to point at its referent while labeling it. Owing to their complexity, abstract words typically require long-lasting social interaction to be acquired, often involving complex linguistic explanations and repetitions. In contrast, the process by which young children learn concrete words appears effortless and often occurs within a single episode of hearing the word spoken in context (e.g., Carey, 1978; see also Pulvermüller, in press). Consequently, even if both sensorimotor and linguistic experience are crucial for representing the meanings of both concrete and abstract words, we rely more on language to understand the meaning of abstract words, whereas we rely more on non-linguistic sensorimotor experience to grasp the meaning of concrete words (Borghi and Cimatti, 2009).
Given that abstract words do not have a specific object or entity as referent, many of them might be acquired linguistically, i.e., by listening to other people explaining their content to us, rather than perceptually. This might also be due to their different degrees of complexity: learning to use a word such as "lipstick" is simpler than learning to use a word like "justice," and the linguistic label might be more crucial for holding together experiences as diverse as those related to the notion of "justice."  used novel categories to mimic the acquisition of concrete and abstract concepts; they found that linguistic explanations are more important for the acquisition of abstract than of concrete words, and showed with a property verification task that concrete words evoke more manual information, while abstract words elicit more verbal information. WAT also hypothesizes that the MoA determines the representation of the word in our brain: when words refer to categories learned through sensorimotor experience (e.g., "bottle"), they have a much higher level of grounding in the perception and action systems than words learned mainly through the mediation of other words (e.g., "democracy"; ; see also Prinz, 2002). Consistently, concrete words should evoke more manual information, activating motor areas early (Jirak et al., 2010; Pulvermüller, in press), whereas abstract words should elicit more verbal-linguistic information, activating early the motor areas related to the mouth, as suggested by data from a transcranial magnetic stimulation study  and on word acquisition modality .
Note that claiming that concrete and abstract words are acquired through different modalities requires neither postulating a difference in format between the two kinds of words nor any transduction from sensorimotor experience into amodal symbols. It simply means that abstract word meanings should rely more than concrete word meanings on the embodied experience of being exposed to language. However, we do not intend to imply that abstract words rely only on the simple embodied experience of speaking and listening; this would not suffice to call their representation embodied. In contrast with non-embodied approaches to abstract words, in our view a word like "philosophy" would activate perceptual and motor experiences together with linguistic experience. As demonstrated by , with abstract terms the advantage of linguistic over manual information was present only when linguistic information did not conflict with perceptual information.
The major difference between Paivio's approach and multiple representation theories such as WAT is that, according to the former, abstract words rely only on the verbal system, whereas according to WAT both concrete and abstract words are grounded in perception and action systems, even if the linguistic system plays a major role in the representation of abstract words.
The best way to disambiguate these hypotheses is to select a paradigm that makes it possible to contrast abstract and concrete words combined in sentences. So far, most evidence has come from brain imaging rather than behavioral studies. It concerns single words rather than words embedded in contexts, and tasks requiring deep semantic processing are typically not used [an exception is a recent fMRI study by Desai et al. (2010), in which a sentence evaluation task was used]. In contrast, our study focuses on how word meaning changes depending on the context in which the word is embedded. For this reason we compare not only fully abstract and fully concrete sentences, but also sentences that result from mixing abstract and concrete nouns and verbs in a well-balanced design. We believe this may represent an important step toward a systematic investigation of abstraction. One advantage of this design is that it makes it possible to study abstractness as a continuum: we can verify how different combinations affect comprehension and study how the meaning of single words changes depending on the context. In addition, focusing on sentences rather than on single words makes it possible to investigate linguistic processing in a more ecological way, and allows us to detect possible influences of the different spoken languages.
In the present study, we asked participants to judge the sensibility of sentences, i.e., whether they made sense. We chose this task because it is established that it implies deep semantic processing of the sentences (see also Turken and Dronkers, 2011). Consistent with previous literature, we defined as "concrete" only nouns referring to manipulable objects and only verbs referring to manual actions (e.g., "a flower"/"to grasp"). We defined as "abstract" only nouns that do not refer to an object but rather to an entity that can be neither grasped nor touched, and only verbs that refer to an action that cannot be performed with any part of the body, that is, an action that does not explicitly require any movement or any activation of the motor system (e.g., "a concept"/"to describe"). In addition, to investigate the effects of the specific language spoken, we examined the different combinations of nouns (abstract and concrete) and verbs (abstract and concrete) in two syntactically different languages, German and Italian: in German the noun precedes the verb, whereas in Italian it is the opposite.
There are several possible views:
1. No difference view: abstract and concrete concepts have the same core representations. According to amodal theories, their representations in the brain would most probably be in the language domain; according to the strictly modal view, both concrete and abstract concepts would be represented in the perception and action systems.
2. Non-embodied multiple representation view: concrete and abstract words have distinct representations, the former in the sensorimotor system and the latter in the language system. This view, proposed by Paivio (1986), is shared by multiple representation views that do not take an embodied approach to abstract words, i.e., views arguing that concrete and abstract words differ in format (e.g., Binder et al., 2005; Dove, 2010).
3. Embodied multiple representation view: abstract and concrete concepts are represented both in the language domain and in the perception and action systems. However, they are not represented in the same way in the two systems; rather, their distribution differs. Linguistic information should be more relevant for abstract words, and perception and action information for concrete ones. This is the view consistent with multiple representation theories adopting an embodied perspective, such as WAT and LASS.
In contrast with strictly amodal and strictly modal views (no difference views), both embodied and non-embodied multiple representation views predict processing costs for mixed combinations, similar to the costs observed when switching from one perceptual modality to another (Pecher et al., 2003). In addition, according to the WAT proposal, performance on mixed combinations should be modulated differently by the syntactic structures of the two languages chosen. As age of acquisition clearly affects performance in semantic tasks (Lewis, 1999; Brysbaert et al., 2000) and is correlated with the mode of acquisition, WAT predicts that in mixed conditions RTs should be slower when the abstract word precedes the concrete one, because the former is acquired later and relies more on linguistic information than the latter (Bloom, 2000; Colombo and Burani, 2002; Mestres-Missé et al., 2009).

EXPERIMENTAL METHOD

PARTICIPANTS
Thirty-eight students from the University of Hamburg (group I) and 38 students from the University of Bologna (group II) took part in the study. All were native German speakers (group I) or native Italian speakers (group II), right-handed according to the Edinburgh Handedness Questionnaire (Oldfield, 1971), and had normal or corrected-to-normal vision. All gave their informed consent to the experimental procedure. Their ages ranged from 18 to 32 years (German group: M = 26.26, SD = 3.64; Italian group: M = 24.61, SD = 3.58). The study was approved by the local ethics committees.

MATERIALS
Materials consisted of word pairs (sentences) composed of a transitive verb and a noun. To study the abstract-concrete dimension as a continuum, we crossed two kinds of Verbs (Concrete vs. Abstract) with two kinds of Nouns (Concrete vs. Abstract). We defined Concrete Nouns as nouns referring to graspable objects, Concrete Verbs as verbs referring to hand actions, Abstract Nouns as nouns that do not refer to manipulable objects, and Abstract Verbs as verbs that do not refer to motor actions. We created 192 sentences (48 quadruples) in German and 192 sentences (48 quadruples) in Italian. Each quadruple was constructed by pairing a Concrete Verb (e.g., to grasp) with both a Concrete Noun (e.g., a flower) and an Abstract Noun (e.g., a concept), and by pairing an Abstract Verb (e.g., to describe) with the same concrete and abstract nouns (e.g., to squeeze/find a sponge/friendship; to lift/receive a table/criticism; to caress/wait for a dog/idea; to bend/respect the menu/will; to paint/admire the frame/sunset; to write/look for the document/end; to carve out/wait for a newspaper/moment). We used sentences with a very simple grammatical structure (a verb plus a noun) because it was not possible to develop more complex sentences with a similar grammatical structure that fulfilled the criteria of the quadruples. The meanings of most sentences matched across the two languages; a few differed slightly, as some pairs did not allow a literal translation. Owing to the different syntax of German and Italian, the German sentences were composed of a noun followed by a verb, and the Italian ones of a verb followed by a noun. We chose to compare these two languages because this difference in syntactic structure allowed us to speculate on the different effects of a verb preceded by a noun (German sample) vs. a noun preceded by a verb (Italian sample).
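The crossing of verbs and nouns within each quadruple, and the language-dependent word order, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' stimulus-generation procedure: `Quadruple` and `expand` are hypothetical names, the condition codes list the verb type first and the noun type second, and English glosses stand in for the German and Italian items (articles and inflection are ignored).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Quadruple:
    """Hypothetical item set: one concrete and one abstract verb, one concrete and one abstract noun."""
    concrete_verb: str
    abstract_verb: str
    concrete_noun: str
    abstract_noun: str


def expand(quad: Quadruple, language: str) -> dict:
    """Cross the 2 verbs with the 2 nouns and order the words as in each language.

    Keys are condition codes: verb type first, noun type second
    (e.g. "CA" = concrete verb + abstract noun). German shows the noun
    before the verb; Italian shows the verb before the noun.
    """
    verbs = {"C": quad.concrete_verb, "A": quad.abstract_verb}
    nouns = {"C": quad.concrete_noun, "A": quad.abstract_noun}
    sentences = {}
    for verb_type, noun_type in product("CA", repeat=2):
        verb, noun = verbs[verb_type], nouns[noun_type]
        if language == "German":
            sentences[verb_type + noun_type] = f"{noun} {verb}"
        elif language == "Italian":
            sentences[verb_type + noun_type] = f"{verb} {noun}"
        else:
            raise ValueError(f"unsupported language: {language!r}")
    return sentences


# Usage with English glosses of the example quadruple from the text.
quad = Quadruple("grasp", "describe", "a flower", "a concept")
print(expand(quad, "Italian"))  # verb-noun order
print(expand(quad, "German"))   # noun-verb order
```

Each quadruple yields four sentences, so 48 quadruples give the 48 × 4 = 192 sentences per language reported above.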
To select 30 critical quadruples from the 48, we asked 20 German students and 20 Italian students to judge how familiar each sentence sounded and how likely they would be to use it. They provided ratings on a continuous scale (Not familiar to Very familiar; Not probable to Very probable) by making a cross on a line. We selected the quadruples with the highest scores for both familiarity and probability of use and, from these, we finally chose the quadruples with the lowest SDs. We thus obtained 120 verb-noun pairs (30 quadruples × 4), balanced for familiarity and probability of use.
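The two-step selection (first keep the best-rated quadruples, then prefer those with the least variable ratings) might be sketched as follows. The text does not report how familiarity and probability ratings were combined or how many quadruples survived the first step, so pooling both ratings into one list and the `n_shortlist` cutoff are assumptions; `select_quadruples` is a hypothetical helper.

```python
from statistics import mean, stdev


def select_quadruples(ratings, n_final=30, n_shortlist=40):
    """Pick quadruples with high mean ratings and low rating variability.

    ratings: dict mapping a quadruple id to a list of scores (assumed here to
    pool familiarity and probability-of-use ratings for its four sentences).
    n_shortlist is a hypothetical cutoff for the first step.
    """
    # Step 1: rank by mean rating and keep the best-rated shortlist.
    ranked = sorted(ratings, key=lambda q: mean(ratings[q]), reverse=True)
    shortlist = ranked[:n_shortlist]
    # Step 2: among those, prefer the quadruples whose ratings vary least.
    shortlist.sort(key=lambda q: stdev(ratings[q]))
    return shortlist[:n_final]


# Toy example: q4 is dropped for its low mean, q2 for its high variability.
toy = {
    "q1": [90, 90, 90],
    "q2": [95, 60, 100],
    "q3": [80, 80, 81],
    "q4": [20, 20, 20],
}
print(select_quadruples(toy, n_final=2, n_shortlist=3))
```

With 48 candidate quadruples and `n_final=30`, the function returns the 30 quadruples that become the 120 critical pairs.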
Given the peculiarity of our linguistic materials, to further test whether the 120 selected verb-noun pairs differed in frequency of use, we checked the frequency of each pair on the search engine Google, using quotation marks (Page et al., 1998; Griffiths et al., 2007; Sha, 2010). The frequencies were submitted to a 2 (kind of Noun: Concrete vs. Abstract) × 2 (kind of Verb: Concrete vs. Abstract) × 2 (Language: German vs. Italian) ANOVA. Crucially, we did not find any significant effect. This additional control on written frequency allowed us to rule out that processing differences reflected different degrees of association between the words composing the German and Italian quadruples.
In addition to the 30 critical quadruples, we created 30 filler quadruples using the same criteria: we combined a concrete verb with both a concrete noun and an abstract noun, and an abstract verb with the same concrete and abstract nouns, but the resulting sentences were nonsensical (e.g., "to switch off the shoe"). Each quadruple was presented only once.

PROCEDURE
German and Italian participants were randomly assigned to one of two groups. Participants in both groups were tested individually in a quiet library room. They sat on a comfortable chair in front of a computer screen and were instructed to look at a fixation cross that remained on the screen for 1000 ms. A sentence then appeared on the screen for 2600 ms. The German sentences were composed of a definite or indefinite article plus a noun plus a verb (example of the concrete noun–concrete verb combination: "einen Kuchen anschneiden," to cut a cake), while the Italian sentences were composed of a verb plus a definite or indefinite article plus a noun (example of the concrete verb–concrete noun combination: "stringere una spugna," to squeeze a sponge).
The timer started when the sentence appeared on the screen. For each verb-noun pair, participants were instructed to press one key if the combination made sense and another key if it did not.
Participants in the first group (both German and Italian) were asked to respond "yes" with their left hand and "no" with their right hand; participants in the other group (both German and Italian) were required to do the opposite. All participants were informed that their response times (RTs) would be recorded and were asked to respond as quickly as possible while maintaining accuracy. Stimuli were presented in random order. The 240 experimental trials (30 critical and 30 filler quadruples, i.e., 60 × 4 sentences) were preceded by 8 training trials to familiarize participants with the procedure.
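The trial structure described above (1000 ms fixation, 2600 ms sentence, random order, 8 practice trials, counterbalanced response hands) can be sketched as follows. This is a hypothetical outline rather than the experiment script: `Trial`, `key_mapping`, `build_session`, and `correct_key` are illustrative names, and the practice items are placeholders because the text does not describe them.

```python
import random
from dataclasses import dataclass

FIXATION_MS = 1000  # fixation cross duration, from the procedure
SENTENCE_MS = 2600  # sentence display duration, from the procedure


@dataclass(frozen=True)
class Trial:
    sentence: str
    sensible: bool  # critical items call for "yes", nonsensical fillers for "no"


def key_mapping(group: int) -> dict:
    """Group 1 answers "yes" with the left hand; group 2 does the opposite."""
    if group == 1:
        return {"yes": "left", "no": "right"}
    return {"yes": "right", "no": "left"}


def build_session(critical, fillers, practice, seed=None):
    """Practice trials first, then all experimental trials in a fresh random order."""
    trials = [Trial(s, True) for s in critical] + [Trial(s, False) for s in fillers]
    random.Random(seed).shuffle(trials)  # one random order per participant
    return list(practice) + trials


def correct_key(trial: Trial, group: int) -> str:
    """Hand that counts as a correct response for this trial and group."""
    return key_mapping(group)["yes" if trial.sensible else "no"]
```

With 120 critical and 120 filler sentences plus 8 practice trials, a session built this way holds 248 trials, 240 of them experimental.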

STATISTICAL ANALYSIS
Only the sensible sentences were included in the analyses. Participants were accurate: no participant made errors on more than 15% of responses. To screen for outliers, we removed, for each participant, RTs more than 2 SDs above or below that participant's mean. Removed outliers accounted for 3.6% of response trials. The remaining RTs and errors were submitted to a 2 (kind of Noun: Concrete vs. Abstract) × 2 (kind of Verb: Concrete vs. Abstract) × 2 (Mapping: yes-right/no-left vs. yes-left/no-right) × 2 [Language: German: noun (first), verb (second) vs. Italian: noun (second), verb (first)] mixed-factor ANOVA, with Mapping and Language as between-participants variables.
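The per-participant outlier criterion can be sketched as below. It follows the text in computing the mean and SD over each participant's own RTs. Whether trimming was done within each condition, and whether error trials were excluded first, is not specified, so these helpers (`trim_outliers`, `trim_all`) are illustrative only.

```python
from statistics import mean, stdev


def trim_outliers(rts, k=2.0):
    """Keep RTs within k SDs of this participant's own mean (k = 2 in the study)."""
    m, s = mean(rts), stdev(rts)
    return [rt for rt in rts if abs(rt - m) <= k * s]


def trim_all(rts_by_participant, k=2.0):
    """Trim every participant separately; also return the % of trials removed."""
    kept = {p: trim_outliers(rts, k) for p, rts in rts_by_participant.items()}
    n_total = sum(len(rts) for rts in rts_by_participant.values())
    n_kept = sum(len(rts) for rts in kept.values())
    return kept, 100.0 * (n_total - n_kept) / n_total


kept, pct_removed = trim_all({"s01": [1000] * 9 + [5000]})
print(kept["s01"], pct_removed)
```

In the toy participant, nine RTs of 1000 ms and one of 5000 ms give a mean of 1400 ms and an SD of about 1265 ms, so only the 5000 ms trial falls outside the ±2 SD band and 10% of trials are removed.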

Frontiers in Psychology | Cognition
We conducted the analyses with participants as a random factor. As the error analysis revealed no speed-accuracy trade-off, we discuss only the RT analysis.

ASSESSMENT OF GERMAN AND ITALIAN PAIRS
Materials were controlled on a variety of dimensions. Thirty students from the University of Hamburg and 30 students from the University of Bologna were asked to rate how easily each pair evoked mental images (imageability: Low imagery rate to High imagery rate) on a continuous scale (scores ranging from 0 to 100); how literally they would take each pair (literality: Literal to Not literal); and whether and to what extent each pair elicited movement information (quantity of motion: Not much movement to Much movement). Finally, 10 German students and 10 Italian students were asked to rate at approximately what age they had learned to use each pair (age of acquisition ratings). For each rating, we calculated the mean and SD of the scores for each condition.
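Computing the per-condition means and SDs of a rating could look like the sketch below. The input layout as (language, condition, score) tuples and the condition labels (e.g., "CV-CN" for Concrete Verb–Concrete Noun) are assumptions made for illustration, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(ratings):
    """Mean and SD of one rating (e.g. imageability) for each language x condition cell.

    ratings: iterable of (language, condition, score) tuples, where condition is
    a label such as "CV-CN" (concrete verb + concrete noun).
    """
    cells = defaultdict(list)
    for language, condition, score in ratings:
        cells[(language, condition)].append(score)
    return {cell: (mean(scores), stdev(scores)) for cell, scores in cells.items()}


toy = [
    ("German", "CV-CN", 80), ("German", "CV-CN", 90),
    ("Italian", "AV-AN", 40), ("Italian", "AV-AN", 60),
]
for cell, (m, sd) in sorted(summarize(toy).items()):
    print(cell, round(m, 1), round(sd, 1))
```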

Imageability
German and Italian participants showed the same pattern: both groups judged the Concrete Verb–Concrete Noun pairs, i.e., the pairs containing two concrete words, as the easiest to imagine. Moreover, for both groups the noun contributed more than the verb to the imageability of the sentence.

Literality-metaphoricity
German participants rated the Abstract Verb–Concrete Noun pairs as the ones they would take most literally, whereas Italian participants gave the most literal ratings to the Concrete Verb–Concrete Noun pairs. Both groups judged the Concrete Verb–Abstract Noun combination as the most metaphorical. It is worth noting that while the meaning of the concrete noun remains the same across the quadruples, the meaning of the concrete verb, as well as its concreteness/abstractness, changes across the quadruples depending on the context: for example, the meaning of the verb "to grasp" is not the same in "grasping an apple" and in "grasping a concept" (Parisi, personal communication).

Quantity of motion
German participants rated the Concrete Verb–Concrete Noun pairs as the ones that elicited the most movement information.

FIGURE 2 | Both groups judged the combination Concrete Verb plus Abstract Noun as the most metaphorical one. Note: while the concrete noun meaning remains the same across the quadruples, the concrete verb meaning, as well as its concreteness/abstractness, changes across the quadruples depending on the context.

FIGURE 3 | Both groups agreed in judging the Abstract Verb plus Concrete Noun combination as the one that elicits the least movement. The main difference concerns the Concrete Verb plus Abstract Noun vs. Concrete Verb plus Concrete Noun combinations: the former suggested the largest amount of movement for Italian participants; the latter evoked the largest quantity of motion in German participants.
Both groups agreed in judging the Abstract Verb–Concrete Noun combination as the one that elicits the least movement. The main difference concerns the Concrete Verb–Abstract Noun vs. Concrete Verb–Concrete Noun combinations: the former suggested the largest amount of movement for Italian participants, whereas the latter evoked the largest quantity of motion for German participants.

Age of acquisition
A number of studies (Gilhooly and Gilhooly, 1980; Zevin and Seidenberg, 2002) have demonstrated the validity of age of acquisition ratings by showing that the age rated by adults is the major independent predictor of objective age of acquisition indices. In our study, German participants rated the Concrete Verb–Concrete Noun pairs as the ones they learned first. The results suggest that differences in the age of acquisition of the sentences are explained by the noun: as shown in the literature on single-word age of acquisition, concrete nouns are learned before abstract ones. Consistently, we found that sentences containing a concrete noun, even in combination with an abstract verb, are acquired earlier than sentences containing an abstract noun.

RESULTS
Neither Mapping nor Language produced a main effect. Crucially, we found a significant interaction between the kind of Noun and the kind of Verb: German and Italian participants responded faster to both kinds of congruent pairs, that is, both to pairs composed of an Abstract Verb plus an Abstract Noun (M = 1172.56 ms) and to pairs composed of a Concrete Verb plus a Concrete Noun (M = 1168.83 ms). Conversely, they were slower with the mixed pairs, that is, with pairs composed of an Abstract Verb plus a Concrete Noun (M = 1211.95 ms) and pairs composed of a Concrete Verb plus an Abstract Noun (M = 1206.81 ms), F(1, 72) = 48.83, MSe = 2328.79, p < 0.0001. Interestingly, Abstract Verb–Abstract Noun pairs did not require a longer processing time than Concrete Verb–Concrete Noun pairs.
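The sketch below is a quick arithmetic check of the reported cell means. It computes the average cost of mixed over congruent pairs and the error degrees of freedom implied by the design. It uses only numbers reported in this section and in the Participants and Statistical Analysis sections. The variable names are illustrative, and the error df assumes the four between-participants groups formed by Language × Mapping.

```python
# Mean RTs (ms) per condition, keyed by (verb type, noun type), as reported above.
cell_means = {
    ("A", "A"): 1172.56,
    ("C", "C"): 1168.83,
    ("A", "C"): 1211.95,
    ("C", "A"): 1206.81,
}

congruent = (cell_means[("A", "A")] + cell_means[("C", "C")]) / 2
mixed = (cell_means[("A", "C")] + cell_means[("C", "A")]) / 2
mixing_cost = mixed - congruent  # average slowdown for mixed pairs

# Between-participants groups: 2 languages x 2 response mappings.
n_participants = 38 + 38
n_groups = 2 * 2
df_error = n_participants - n_groups  # denominator df of the F tests

print(f"congruent {congruent:.2f} ms, mixed {mixed:.2f} ms, cost {mixing_cost:.2f} ms")
print(f"F(1, {df_error})")
```

The mixed pairs are about 38.7 ms slower than the congruent ones on average. With 76 participants in 4 between-participants groups, 72 error degrees of freedom remain, which matches the reported F(1, 72).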
We also found a significant three-way interaction between Language, kind of Noun, and kind of Verb, F(1, 72) = 5.07, MSe = 2328.79, p < 0.03, see Figure 5. Newman-Keuls post hoc analyses showed that German participants [noun (first), verb (second)] were 13.25 ms faster with Abstract Verb plus Concrete Noun pairs than with Concrete Verb plus Abstract Noun pairs. Conversely, Italian participants [noun (second), verb (first)] were 23.51 ms faster with Concrete Verb plus Abstract Noun pairs than with Abstract Verb plus Concrete Noun pairs. This difference reached significance only for Italian participants, p < 0.04. Because the syntactic construction of pairs containing a transitive verb plus an object noun differs between German and Italian, German participants, unlike Italians, were presented with the noun before the verb. Results with mixed pairs indicate that participants were faster when the first word was concrete rather than abstract. In other words, they were faster when the first word referred to an object on which we can perform an action involving the hands (German pairs), or to an action performed with the hands (Italian pairs). This suggests that the degree of abstractness of the word plays a more important role than its grammatical class.
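The reasoning that links the opposite German and Italian patterns to the type of the first word can be made explicit with a small mapping from language and condition to the word shown first. The function name `first_word_type` and the data layout are illustrative. The millisecond values are the post hoc differences reported above.

```python
def first_word_type(verb_type: str, noun_type: str, language: str) -> str:
    """Return the type ("C" or "A") of the word shown first in a verb-noun pair."""
    if language == "German":   # noun precedes verb
        return noun_type
    if language == "Italian":  # verb precedes noun
        return verb_type
    raise ValueError(f"unsupported language: {language!r}")


# Faster mixed pair per language, as (verb type, noun type), with its advantage in ms.
faster_mixed = {
    "German": (("A", "C"), 13.25),   # abstract verb + concrete noun (not significant)
    "Italian": (("C", "A"), 23.51),  # concrete verb + abstract noun (p < 0.04)
}

for language, ((verb_type, noun_type), advantage_ms) in faster_mixed.items():
    first = first_word_type(verb_type, noun_type, language)
    kind = "concrete" if first == "C" else "abstract"
    print(f"{language}: faster mixed pair starts with a {kind} word ({advantage_ms} ms)")
```

In both languages the faster mixed pair is the one whose first word is concrete, even though that word is a noun in German and a verb in Italian.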
Moreover, the interaction between Language and kind of Verb approached significance, F(1, 72) = 3.68, MSe = 3490.70, p < 0.06. German participants [noun (first), verb (second)] were 8.57 ms faster with pairs containing Abstract Verbs than with pairs containing Concrete Verbs. Conversely, Italian participants [noun (second), verb (first)] were 17.42 ms slower with pairs containing Abstract Verbs than with pairs containing Concrete Verbs. Integrating these results with the previous ones allows us to speculate that a word's concreteness vs. abstractness strongly determines the time needed to process the sentence (three-way interaction), but also that the verb has a stronger effect than the noun.

DISCUSSION
Our study yielded three main new results. First, both the abstract verb–abstract noun combinations and the concrete verb–concrete noun combinations were processed faster than the mixed combinations. This in itself is new, particularly considering that the sentence evaluation task we used is known to require access to deep semantic representations. Our results on mixed pairs are not predicted by the no difference view (view 1). Instead, they are predicted by views 2 and 3, and are consistent with the idea that concrete and abstract words activate parallel systems, one relying more purely on perception and action areas, the other more on sensorimotor linguistic areas. Indeed, switching between systems implies a cost in RTs, whereas remaining within the same system does not affect performance. This effect per se favors theories implying multiple types of representation over strictly modal and strictly amodal theories (this issue is addressed more extensively in the second section of the discussion).
The second major result is the three-way interaction between Language, kind of Verb, and kind of Noun. This interaction was mainly due to the fact that German and Italian results on mixed combinations were opposite: German participants (noun first, verb second) were faster with abstract verb plus concrete noun combinations than with concrete verb plus abstract noun combinations, whereas Italian participants (verb first, noun second) showed the mirror pattern. This result is easily accounted for if we consider that word presentation order differed across the two languages: German participants saw the noun first and then the verb, while Italians saw the same combination in the reverse order. Thus, participants were faster when the first word in the sentence was concrete, regardless of its grammatical class (verb vs. noun) and of the spoken language (German vs. Italian; for a similar result see Paivio, 1965, who, unlike us, used a learning and recall task and contrasted only abstract and concrete nouns rather than sentences).
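The word-order argument can be made explicit by recoding each mixed pair by the concreteness of the word participants read first. The sketch below encodes the presentation order described in the text (noun first in German, verb first in Italian) and the faster mixed pair reported for each language; note that the German difference was not significant. The names `first_word_type` and `REPORTED_FASTER_MIXED` are hypothetical.

```python
# Presentation order described in the text.
FIRST_WORD = {"German": "noun", "Italian": "verb"}


def first_word_type(language, verb, noun):
    """Return 'C' or 'A' for the word the participant reads first."""
    return noun if FIRST_WORD[language] == "noun" else verb


# Faster mixed pair per language, with the reported RT advantage in ms
# (significant only for Italian participants).
REPORTED_FASTER_MIXED = {
    "German": {"verb": "A", "noun": "C", "advantage_ms": 13.25},
    "Italian": {"verb": "C", "noun": "A", "advantage_ms": 23.51},
}

for language, pair in REPORTED_FASTER_MIXED.items():
    first = first_word_type(language, pair["verb"], pair["noun"])
    print(f"{language}: faster mixed pair = {pair['verb']}V + {pair['noun']}N, "
          f"first word is {first}, advantage {pair['advantage_ms']} ms")
```

Once pairs are indexed by the first word rather than by grammatical class, the two opposite language patterns collapse into a single one: in both languages the faster mixed pair begins with a concrete word.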
The third result is the marginally significant interaction between Language and kind of Verb. Integrating the last two findings, it seems that the abstractness vs. concreteness of the first word, which depends on the different sentence structures of the two languages, modulates sentence processing more strongly (Language × Noun × Verb interaction) than its grammatical class does. Nevertheless, there also seems to be an effect of linguistic category, as verbs influence participants' responses more strongly than nouns. Interestingly, this result could be in keeping with the idea that the grammatical structure of a language shapes, to some extent, its speakers' perception of the world (Boroditsky, 2003; Gentner, 2003; Mirolli and Parisi, 2009).
Let us now consider the RT results together, integrating them with the ratings obtained for the materials. We will discuss how each theory could account for them and the problems each theory faces. We will also provide a possible neuroanatomical explanation of the results.
1. No difference view: abstract and concrete concepts have the same core representations. According to both (a) amodal (e.g., Fodor, 1998) and (b) strictly modal (e.g., Barsalou, 1999) theories of concepts and words, concrete and abstract sentences are represented in the same format (amodal or modal, respectively). Therefore, both amodal and modal views predict no difference between the four conditions, unless differences are explained by association degree and familiarity (amodal theories) or by imageability (modal theories). (a) According to amodal theories, the results should be explained by the association rate between words: the advantage of congruent over mixed sentences should be due to a higher association rate for congruent pairs than for mixed ones. To check for this possibility, we calculated the average familiarity and probability-of-use scores in each condition for the 120 pairs selected for the behavioral experiment. Ratings showed that, for both German and Italian participants, the advantage of congruent combinations over mixed pairs is not explained by higher familiarity or higher probability of use of the former. (b) According to a strictly modal theory, the RT results should be explained by imageability ratings. An approach based more on metaphors (Lakoff, 1987) should account for the behavioral results through the literality ratings (which indirectly inform us about the degree of metaphoricity). Indeed, the advantage for Concrete Verb–Concrete Noun combinations can be explained by their high imageability, low metaphoricity, and early age of acquisition. However, neither the modal theory nor the metaphor-based approach is supported by our results on Abstract Verb–Abstract Noun pairs, which were neither imageable nor literal (as opposed to metaphorical) but elicited responses as fast as those for Concrete Verb–Concrete Noun pairs.
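The rating check in (a) amounts to comparing condition averages: if the congruent advantage were driven by familiarity or probability of use, congruent pairs should have higher mean ratings than mixed pairs. A minimal sketch of that comparison follows, assuming item-level ratings stored as dictionaries with hypothetical keys `verb` and `noun` (each 'C' or 'A') plus one key per rating scale; it does not reproduce our rating data.

```python
from collections import defaultdict
from statistics import mean


def condition_means(items, rating):
    """Mean of one rating scale in each Verb x Noun condition, e.g. 'CV-AN'."""
    groups = defaultdict(list)
    for item in items:
        groups[f"{item['verb']}V-{item['noun']}N"].append(item[rating])
    return {cond: mean(values) for cond, values in sorted(groups.items())}


def congruent_minus_mixed(items, rating):
    """Mean rating of congruent pairs (CC, AA) minus mean rating of mixed pairs.

    A clearly positive value would be what an association/familiarity account
    of the congruent RT advantage requires.
    """
    groups = defaultdict(list)
    for item in items:
        kind = "congruent" if item["verb"] == item["noun"] else "mixed"
        groups[kind].append(item[rating])
    return mean(groups["congruent"]) - mean(groups["mixed"])
```

The same two functions apply unchanged to the imageability, literality, age-of-acquisition, and quantity-of-motion scales, which is how each theory's predicted covariate can be set against the RT pattern.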
Finally, an approach proposing that words are grounded in perceptual and especially in motor systems (Glenberg, 1997) would predict a relationship between the behavioral data and the quantity of motion scores. This was not the case, however: the amount of movement evoked by the sentence did not explain the pattern of RT results. Therefore, we conclude that neither a strictly amodal nor a strictly modal theory adequately accounts for our results.

2. Non-embodied multiple representation view and 3. Embodied multiple representation view.
Theories based on multiple types of representation, in both their non-embodied and embodied versions, can explain the difference between congruent and mixed pairs more easily, albeit for different reasons: (I) different kinds of formats (still assuming a transduction process: Dove, 2009), or (II) a shift between different kinds of modalities, i.e., linguistic vs. sensorimotor coding (LASS, WAT).
The interpretation that best accommodates our results assumes that abstract words are processed predominantly in the language system, and concrete words to a larger extent in the sensorimotor system. If processing occurs in separate systems, then switching between concrete and abstract words would imply not only conceptual costs but also costs connected with switching between anatomical systems working in parallel. Within each system (concrete–concrete or abstract–abstract) the costs remain low. Some recent evidence is in line with our results. In a brain imaging study on abstract words, Rüschemeyer et al. (2007) found that the processing of verbs with motor meanings (e.g., "to grasp") differed from the processing of verbs with abstract meanings (e.g., "to think"). Motor verbs produced greater signal changes than abstract verbs in several regions within the posterior premotor, primary motor (M1), and somatosensory (S1) cortices, as well as in secondary somatosensory (S2) cortex. More crucially, our interpretation is also consistent with the results of a brain imaging study that used the same paradigm as the present work (Menz et al., 2011; see also Jirak et al., 2010). Using quadruples containing every possible combination of motor/non-motor verbs and graspable/non-graspable objects, that study showed that all motor areas were activated by language stimuli with both concrete and abstract content, but that processing concrete verb plus concrete noun combinations engaged more strongly the areas typically involved in planning complex, goal-directed actions (e.g., frontal operculum).
In the case of abstract verb plus abstract noun combinations, instead, there was a stronger engagement of the supramarginal gyrus (SMG), which is typically involved in motor planning (e.g., Tunik et al., 2008) but also in phonological and articulatory word processing (e.g., Celsis et al., 1999; Pattamadilok et al., 2010), as well as of the middle temporal gyrus (MTG), which is also recruited when performing tasks critical for communication and social interaction (Mellet et al., 1998; Binder et al., 2005; Sabsevitz et al., 2005).

Embodied multiple representation view.
The advantage of non-mixed combinations (AA and CC) over mixed ones (AC and CA) rules out the No difference views but can be accounted for by both the Non-embodied (2) and the Embodied (3) multiple representation views. To disentangle them, the most critical result is the advantage we found when the first word was concrete. A Non-embodied multiple representation view (2) has difficulty explaining this result: since the task used in the present study is a linguistic one, words that activate linguistic information, i.e., abstract words, should be easier to process when presented first than concrete ones.
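The inference in this section can be restated as a simple consistency check between the observed pattern and the qualitative expectations of each view. The encoding below is a paraphrase of the text, not a formal model: `congruent_advantage` records whether a view expects AA and CC to beat mixed pairs, and `faster_first_word` records which first word ('C', 'A', or "neither") a view expects to be processed faster in mixed pairs. All names are hypothetical.

```python
# Qualitative expectations of each view, paraphrased from the Discussion.
PREDICTIONS = {
    "1. No difference": {
        "congruent_advantage": False, "faster_first_word": "neither"},
    "2. Non-embodied multiple representation": {
        "congruent_advantage": True, "faster_first_word": "A"},
    "3. Embodied multiple representation": {
        "congruent_advantage": True, "faster_first_word": "C"},
}

# Pattern found in the RT data.
OBSERVED = {"congruent_advantage": True, "faster_first_word": "C"}


def compatible_views(observed, predictions):
    """Return the views whose expectations match every observed effect."""
    return [view for view, expected in predictions.items()
            if all(expected[key] == value for key, value in observed.items())]


print(compatible_views(OBSERVED, PREDICTIONS))
```

The congruent advantage alone excludes view 1; it is the concrete-first effect that separates view 2 from view 3.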

LASS AND WAT
Both LASS and WAT can explain the advantage of a concrete first word. However, an explanation based on LASS would be post hoc: the argument would be that, even though the task is a linguistic one, it requires deep semantic processing, and this might take more time for abstract than for concrete words. A more straightforward explanation of the longer RTs when the first word is abstract rather than concrete derives from the WAT proposal. WAT assumes that linguistic and sensorimotor processing have the same status (coherent with the advantage of the AA and CC pairs over the mixed pairs), and it treats the issue of concept representation as strictly related to acquisition, stressing the different function of the linguistic label for concrete vs. abstract word meanings. Thus, the advantage of concrete words presented first would be due to the fact that abstract words are learnt differently from concrete ones, often with the help of a verbal explanation (see . It follows that for the acquisition of abstract terms, the social experience of others explaining specific word meanings to us is particularly crucial. In support of this interpretation, the ratings of the linguistic materials showed essentially the same patterns for Imageability and Age of acquisition in both Germans and Italians: sentences containing a concrete noun (even in combination with an abstract verb) were the easiest to imagine and were acquired earlier than sentences containing abstract nouns. Conversely, German and Italian participants showed different patterns for Metaphoricity and Quantity of Motion ratings, indicating that they were differently influenced by their specific linguistic milieu.

In sum
The results of our behavioral study showed that participants were faster with congruent combinations and that, with mixed combinations, they were faster when the first word was concrete, independently of the spoken language and of the word's grammatical class. These results are in line with embodied views, such as LASS and WAT, according to which both linguistic experience and perception and action experience play a role in word representation. The WAT proposal explains the advantage of a concrete first word better than the LASS view does, ascribing it to the fact that abstract words require more time as a consequence of their peculiar mode of acquisition.
Our results have several implications for how concrete and abstract words are represented in the brain, as they suggest that linguistic information and perception and action information contribute differently to concrete and abstract meanings. Consistent with recent brain imaging studies (Rüschemeyer et al., 2007; Menz et al., 2011), we hypothesize that words with concrete motor content are processed to a greater extent in the perception and action systems than words with abstract content, which in turn are processed more in linguistic areas.