Embodied Conceptual Combination

Lynott, Dermot; Connell, Louise

doi:10.3389/fpsyg.2010.00212

HYPOTHESIS AND THEORY article

Front. Psychol., 25 November 2010

Sec. Cognition

Volume 1 - 2010 | https://doi.org/10.3389/fpsyg.2010.00212

This article is part of the Research Topic: Embodied and Grounded Cognition

Dermot Lynott¹*

Louise Connell²

¹ Manchester Business School, University of Manchester, Manchester, UK
² School of Psychological Sciences, University of Manchester, Manchester, UK

Conceptual combination research investigates the processes involved in creating new meaning from old referents. It is therefore essential that embodied theories of cognition are able to explain this constructive ability and predict the resultant behavior. However, by failing to take an embodied or grounded view of the conceptual system, existing theories of conceptual combination cannot account for the role of perceptual, motor, and affective information in conceptual combination. In the present paper, we propose the embodied conceptual combination (ECCo) model to address this oversight. In ECCo, conceptual combination is the result of the interaction of the linguistic and simulation systems, such that linguistic distributional information guides or facilitates the combination process, but the new concept is fundamentally a situated, simulated entity. So, for example, a cactus beetle is represented as a multimodal simulation that includes visual (e.g., the shiny appearance of a beetle) and haptic (e.g., the prickliness of the cactus) information, all situated in the broader location of a desert environment under a hot sun, and with (at least for some people) an element of creepy-crawly revulsion. The ECCo theory differentiates interpretations according to whether the constituent concepts are destructively, or non-destructively, combined in the situated simulation. We compare ECCo to other theories of conceptual combination, and discuss how it accounts for classic effects in the literature.

Introduction

Cognition is inherently constructive. Our cognitive functioning is not confined to retrieving familiar ideas and concepts, but rather is predicated upon the ability to understand new things and represent new concepts. Conceptual combination research investigates the processes involved in creating and understanding new meanings from old referents. For example, how do people interpret novel combinations such as cactus beetle, mouse potato, or fame advantage? Such combinations are used liberally (conversations, newspaper headlines, signage, novels, etc.) and people generally have little difficulty in constructing plausible interpretations, even where the surrounding context may be quite limited or uninformative.

Of course, central to understanding how people process these combinations is an understanding of what constitutes the representations of these concepts. Of existing theories of conceptual combination, many take an explicitly amodal view of the conceptual system (Wisniewski, 1997; Estes and Glucksberg, 2000). That is, concepts are represented in some abstracted format (e.g., feature lists, propositional representations) that does not relate directly to the modality-specific experiential basis of how these concepts were acquired. Other theories are agnostic as to the nature of the underlying representation (Gagné and Shoben, 1997; Costello and Keane, 2000), often using propositions as a descriptive or computational shortcut, but without making strong representational commitments. We suggest that any theory that fails to take an embodied or grounded view of the conceptual system cannot account for the role of perceptual, motor, affective, introspective, and social information in conceptual combination and cognition more generally (Barsalou et al., 2008). The flip side of this argument is that current embodied theories of cognition must also draw on theories of conceptual combination in order to explain the constructive, generative, and creative capacities of human cognition. In this paper, we aim to address these oversights in proposing a theoretical model of conceptual combination, embodied conceptual combination (ECCo), which draws on recent empirical and theoretical work in areas of language processing, mental representation, and links between our perceptual and conceptual systems.

In ECCo, we outline an embodied conceptual combination system based on a representation of knowledge that incorporates linguistic distributional information and situated simulation. Linguistic information guides or facilitates the simulation process, but the new concept created during conceptual combination is fundamentally a situated, simulated entity. The paper is divided into four main sections. In the rest of the introduction, we outline the structure of the conceptual system and review evidence for the roles of the linguistic and simulation systems during conceptual processing. In Section “ECCo: Embodied Conceptual Combination”, we outline the core tenets of the ECCo theory and explain how it accounts for classical conceptual combination effects. In Section “Comparison of ECCo with Previous Theories of Conceptual Combination”, we illustrate how ECCo relates to existing theories of conceptual combination and highlight key differences. Lastly, in Section “Concluding Remarks”, we summarize and briefly consider future directions for conceptual combination research.

The Conceptual System

Embodied theories of cognition hold that our perceptual, motor, and other experience plays a fundamental role in how we talk about, think about and interact with people, objects and the world around us. In essence, the same neural systems that are responsible for representing information during perception, action, and introspection are also responsible for representing (or simulating) the same information during conceptual thought (e.g., Barsalou, 1999, 2008; Glenberg and Robertson, 2000; Wilson, 2002; Gallese and Lakoff, 2005; Gibbs, 2006). A concept is an aggregated memory of aspects of experience that have repeatedly received attention in the past, and incorporates perceptual, motor, affective, introspective, social, linguistic, and other information. For instance, a concept of dog could potentially include a host of perceptual-motor information, possibly including visual information of the color and shape of a dog, tactile information regarding the feel of a dog’s coat, olfactory information of the smell of a dog, auditory information of a dog’s bark, motor information about patting a dog, social information about the status of dogs in human households, along with positive or negative affective valence depending on one’s experience with dogs in the past. Any time the word “dog” is encountered, a subset of these aspects will be retrieved to suit the task at hand. However, human language is full of statistical regularities. Words and phrases tend to occur repeatedly in similar contexts, just as their referents tend to occur repeatedly in similar situations, which allows people to build up substantial distributional knowledge of linguistic associations. In this way, lexical associates of “dog” are also activated (e.g., “bark,” “pet,” “cat,” etc.), which might in themselves suffice for a response and which in turn can activate their own simulation information. Importantly, the concept retrieved is situated and context-specific, with linguistic and simulation content changing dynamically with our experiences, current goals, and available resources. One cannot, in effect, retrieve the same concept twice.

Thus, both linguistic and simulation systems are central to human conceptualization (Clark, 2006; Barsalou et al., 2008; Louwerse and Jeuniaux, 2010); language bootstraps simulations to facilitate more complex conceptual processing than would otherwise be possible. The Language and Situated Simulation theory (LASS: Barsalou et al., 2008; see also the Symbol Interdependency Hypothesis, Louwerse and Jeuniaux, 2008), for instance, describes a general framework where both linguistic and simulation systems are simultaneously activated on encountering a word, with the linguistic system reaching peak activation slightly sooner than the simulation system. While it is a statistical trend that shallow, linguistic distributional responses are faster than responses that rely on deeper, situated simulation, the relative importance of each type of system will change according to the current context or specific task demands. In short, the concept to which a word refers is ultimately grounded in the simulation system, but a word does not need to be fully grounded every time it is processed (Louwerse and Connell, in press). It is important to note that distributional information in the linguistic system arises not only from associations between lexical items (e.g., between “dog” and “cat”), but also from associations between their referents in past experience (e.g., encountering cats and dogs in household pet situations). This constant interactivity between the linguistic and simulation systems means that they are, to some extent, partial reflections of each other. However, the linguistic system offers a fuzzy approximation that can provide an adequate heuristic in certain tasks, whereas the simulation system provides representational precision for more complex conceptual processing.
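
The timecourse claim can be made concrete with a toy sketch. The following Python fragment is only an illustration and is not part of LASS or the Symbol Interdependency Hypothesis: the curve shape, the function names (activation, dominant_system), and the two peak latencies are all assumptions chosen for convenience, not empirical estimates. It shows two overlapping activation waves in which the earlier-peaking system dominates fast responses and the later-peaking system dominates slow ones, while both remain active throughout.

```python
import math

# Hypothetical peak latencies in milliseconds. These are arbitrary
# illustrative values, not estimates taken from any study.
LINGUISTIC_PEAK_MS = 300.0
SIMULATION_PEAK_MS = 700.0


def activation(t_ms: float, peak_ms: float) -> float:
    """Toy unimodal curve: rises from 0, reaches 1.0 at peak_ms, then decays."""
    if t_ms <= 0:
        return 0.0
    x = t_ms / peak_ms
    return x * math.exp(1.0 - x)


def dominant_system(t_ms: float) -> str:
    """Name the system with the higher toy activation at time t_ms."""
    linguistic = activation(t_ms, LINGUISTIC_PEAK_MS)
    simulation = activation(t_ms, SIMULATION_PEAK_MS)
    return "linguistic" if linguistic >= simulation else "simulation"


if __name__ == "__main__":
    for t in (150, 300, 500, 700, 1000):
        print(
            t,
            round(activation(t, LINGUISTIC_PEAK_MS), 3),
            round(activation(t, SIMULATION_PEAK_MS), 3),
            dominant_system(t),
        )
```

Under these assumed values, both curves are non-zero at every sampled time point, which mirrors the idea that the difference between the systems is one of relative weight at a given moment rather than strict sequencing.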

Affordances are key to the simulation system, and refer to the ways in which a particular object enables interaction (or meshing) with other entities (Gibson, 1979; Glenberg, 1997). A sweater affords filling with leaves in a way that a chair does not, and a leaf-filled sweater affords use as a pillow in a way that a rock-filled sweater does not (Glenberg and Robertson, 2000). In this example, the affordances of leaves, sweaters, and pillows mesh successfully within the situation of a person improvising a pillow on a camping trip. When affordances mesh successfully, they form a coherent and stable simulation, which is what allows conceptual processing to be both productive and creative.

Evidence for Linguistic Distributional Information in Conceptual Processing

The linguistic system contains statistical distributional information that is powerful enough to support superficial strategies in a broad range of linguistic and conceptual tasks that might otherwise be assumed to require deeper processing (Glaser, 1992; Solomon and Barsalou, 2004). For example, Solomon and Barsalou (2004) showed that responses in property verification tasks, where participants judge whether a property is usually true of an object (e.g., lemon-yellow), are predominantly based on word associations (e.g., “lemon” and “yellow” are closely associated, therefore respond yes) rather than on conceptual access. Indeed, this shallow, linguistic shortcut is the norm unless special care is taken to include filler items that are closely associated but nonetheless false (e.g., monkey-banana), which forces people to simulate the entity in question in order to avoid associative errors. In terms of conceptual combination, knowledge of how words have previously combined affects how people understand and evaluate future word combinations. Lapata et al. (1999) showed that the co-occurrence frequencies of adjective-noun combinations (e.g., strong tea versus powerful tea) were highly correlated with human plausibility ratings of those combinations, while the frequency of the noun alone was not. Indeed, the influence of the linguistic system is not limited to language stimuli. When participants were presented with two images in vertical alignment (e.g., a lamp above a table) and asked to judge whether the items usually appeared in those relative positions in the real world, Louwerse and Jeuniaux (2010) found that word order was a significant predictor of response times. Even though lamps are usually found above tables (and seldom below), people’s ability to perform this ostensibly visuospatial memory task was affected by the fact that “table … lamp” is a more common linguistic construction than “lamp … table”.

Regarding the timecourse predictions of both LASS (Barsalou et al., 2008) and the Symbol Interdependency Hypothesis (SIH; Louwerse and Jeuniaux, 2008), recent evidence supports the notion of the linguistic system offering a fuzzy heuristic that operates faster than the more precise simulation system. Louwerse and Connell (in press) analyzed the corpus distributions of a large set of perceptual object properties and found that, while human ratings are distinct for five perceptual modalities (i.e., auditory, gustatory, haptic, olfactory, visual: Lynott and Connell, 2009), distributional statistics identified only three “linguistic modalities” (i.e., auditory, visuo-haptic, and olfacto-gustatory). Previous work had shown that switching between perceptual modalities in consecutive trials incurs a processing cost (e.g., Pecher et al., 2003). In a modality switching paradigm that asked people to verify modality-specific properties (e.g., haptic marble can be cool), Louwerse and Connell tested whether switching costs were better predicted by switches between three distributional linguistic modalities or five simulated perceptual modalities. Consistent with LASS and SIH predictions, fast responses showed an effect of linguistic switching, while slow responses showed an effect of simulated perceptual switching. In other words, not only do these findings demonstrate distinct roles for the linguistic and simulation systems, but also their relative impact in the timecourse of responses.
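
The contrast between the two ways of coding a switch can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis used in the cited study: the dictionary and function names are hypothetical, and the grouping of the five perceptual modalities into three linguistic ones is read off the category labels given above (visuo-haptic, olfacto-gustatory). The point is that some consecutive trials count as a switch under the five-way perceptual coding but not under the three-way linguistic coding.

```python
# Five perceptual modalities mapped onto the three "linguistic modalities"
# named in the text. The mapping is inferred from the category labels.
PERCEPTUAL_TO_LINGUISTIC = {
    "auditory": "auditory",
    "visual": "visuo-haptic",
    "haptic": "visuo-haptic",
    "olfactory": "olfacto-gustatory",
    "gustatory": "olfacto-gustatory",
}


def switch_types(previous: str, current: str) -> tuple[bool, bool]:
    """Return (perceptual_switch, linguistic_switch) for two consecutive trials."""
    perceptual_switch = previous != current
    linguistic_switch = (
        PERCEPTUAL_TO_LINGUISTIC[previous] != PERCEPTUAL_TO_LINGUISTIC[current]
    )
    return perceptual_switch, linguistic_switch


if __name__ == "__main__":
    trial_pairs = [
        ("visual", "visual"),
        ("visual", "haptic"),
        ("gustatory", "olfactory"),
        ("visual", "auditory"),
    ]
    for previous, current in trial_pairs:
        print(previous, "->", current, switch_types(previous, current))
```

Pairs such as visual followed by haptic are the diagnostic cases, because only the perceptual coding treats them as a switch.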

Evidence for Situated Simulated Information in Conceptual Processing

However, linguistic distributional information has limits. Previous experience with language will not suffice when trying to judge whether a description of a novel situation is sensible. People’s capacity to understand how a sweater filled with leaves can be used as a pillow is rooted in their ability to simulate the objects’ affordances and mesh them into a coherent situation (Glenberg and Robertson, 2000; see also “Affordances and Meshing in the Simulation System”). Just as objects, people, ideas, and emotions are encountered as part of broader, situated experience, the representations that people create during conceptual processing are situated simulations. When reading about everyday objects, people simulate perceptual properties such as shape (Zwaan et al., 2002), color (Connell, 2007; Connell and Lynott, 2009), and spatial location (Estes et al., 2008). For example, Estes et al. showed that people were slower to respond to an X at the top of the screen after having read cowboy hat (as opposed to cowboy boot) because the simulation of a cowboy hat was occupying their attention in its typical, high location. In a property-listing task, Wu and Barsalou (2009) found that people listed visual features of novel adjective-noun combinations that were occluded for the canonical noun: for example, roots and dirt were rarely listed for lawn, but were frequently listed for rolled-up lawn. Crucially, Wu and Barsalou showed that the pattern of property listing was not due to shallow processing in the linguistic system, but came from the visual simulation of the conceptual combination. Neuroimaging studies have also demonstrated modality-specific perceptual simulation during conceptual processing (see Barsalou, 2008, for review). González et al. (2006), for instance, found that passively reading scent-related words increased activation in the piriform cortex, an area normally activated during olfaction. Furthermore, Goldberg et al. (2006) showed that, when people verified properties related to color, sound, touch, and taste, regions of the cortex normally associated with perceiving visual, auditory, haptic, and gustatory information were activated.

Indeed, the emergence of several perceptual phenomena during conceptual processing strongly suggests that the conceptual system has co-opted the perceptual system for the purposes of representation. One such phenomenon is the tactile disadvantage: in perception, people are generally slower to detect tactile stimuli (e.g., finger vibration) than visual (e.g., light flash) or auditory (e.g., noise burst) stimuli, even when they are told which modality to expect (Spence et al., 2001; Turatto et al., 2004). Connell and Lynott (2010) replicated this effect in conceptual processing by using a modality detection task, where participants were asked to judge whether a particular word corresponded to a particular target modality. They showed that people were slower and less accurate in responding to touch-related words (e.g., warm, itchy) than words related to vision, sound, taste, or smell. In both perceptual and conceptual processing, the tactile disadvantage reflects people’s difficulty in focusing attention on the tactile modality.

Another key phenomenon is the modality switching effect. In perception, switching costs arise when attention must be reallocated from one modality-specific neural system to another in successive trials: processing an auditory beep following a visual light flash incurs a cost compared to when the trials are in the same modality (Spence et al., 2001; Turatto et al., 2004). Similar modality switching costs emerge when verifying properties in successive conceptual trials (e.g., auditory leaves can be rustling following visual apple can be shiny: Pecher et al., 2003; but see also Louwerse and Connell, in press), or when verifying a property following a perceptual stimulus (e.g., auditory leaves can be rustling following a visual light flash: van Dantzig et al., 2008). Moreover, modality switching costs are not just restricted to the retrieval of familiar conceptual information, but also emerge during conceptual combination when a new conceptual entity is created. Connell and Lynott (in press) asked participants to interpret adjective–noun combinations that had been normed to produce interpretations that related strongly to one perceptual modality (e.g., interpretations for jingling onion were predominantly auditory). They found that people were slower to interpret novel compounds when they followed familiar compounds in a different modality (e.g., auditory jingling onion following visual shiny penny). Importantly, switching costs in this interpretation paradigm were not subject to the linguistic shortcut that Louwerse and Connell (in press) observed in a property verification paradigm. Rather, situating the simulation of a novel conceptual combination in one perceptual modality incurs a switching cost if attention has already been grabbed by another modality.

Situated simulations are not just perceptual, but also extend to motor, affective, and other representations. Glenberg and Kaschak (2002) found that, when people read a sentence such as “You handed Courtney the notebook,” they were faster to make a movement away from their bodies (compared to towards their bodies), which is consistent with having simulated the situationally appropriate movement. Even the current physical situation of the body can influence conceptual processing. For example, participants who made their responses into a microphone were faster to process phrases about mouth-related actions (e.g., “to suck the sweet”) than hand-related actions (e.g., “to unwrap the sweet”), even though there was no response time difference for participants who responded with a foot pedal (Scorolli and Borghi, 2007). Similar effects extend into the simulation of affective situations. People’s speed in understanding sentences that describe sad or unpleasant situations (e.g., “You hold back your tears as you enter the funeral home”) is facilitated when the mouth forms a pouting expression by holding a pen between the lips (Havas et al., 2007), but inhibited when botulinum toxin has been used to immobilize the frown muscles (Havas et al., 2010). Similarly, adopting a congruent bodily posture facilitated people’s recall of various social situations such as opening a door for a visitor or applauding at a concert (Dijkstra et al., 2007). Taken collectively, the above evidence indicates that conceptual processing routinely requires perceptual, motor, affective, and social situated simulations.

ECCo: Embodied Conceptual Combination

Both linguistic and simulation information are central to conceptual representations, and are therefore also central to the processes of conceptual combination. When we refer to conceptual combination, we mean creating or understanding a new concept by actively combining two already-known concepts (e.g., mushroom chair as a chair shaped like a mushroom). Our goal in presenting an embodied theory of conceptual combination is to put forward a single framework that can accommodate all the above evidence regarding the roles of the linguistic and simulation systems alongside the plethora of findings that have accumulated over the years in the conceptual combination literature. ECCo is thus the first theory of conceptual combination to do so.

The mechanisms described in this paper encompass the processing of both lexicalized and novel compounds, since the end representation in both cases is still a situated simulation. However, actively constructing a meaning for a novel combination should be distinguished from simply retrieving already-known concepts labeled with a lexicalized phrase. For example, processing the compound office chair also relies on linguistic and simulation information, but it does not require an active conceptual combination process in order to be successfully simulated and understood because it has a strong, frequency-reinforced link between the phrasal unit and its simulation (i.e., it is easily retrievable). A novel compound, conversely, is missing this link and therefore requires other means to arrive at a simulation and interpretation (i.e., a combination process).

Combination Processes

All conceptual combinations are situated, meaning that they involve representing a broader setting as part of the simulation. When a compound is presented, both linguistic and simulation systems are rapidly engaged; activation begins to spread out from the words to other linguistic tokens, the neural mechanisms involved in direct experience begin to simulate perceptual, motor, affective, and other situated information, and the two systems continually feed into one another (i.e., words help to activate simulations, and simulations help to activate words). In ECCo, as in other embodied theories of language comprehension (Barsalou et al., 2008), peak activation of the linguistic system is usually reached before peak activation of the simulation system. It is important to note that this is a statistical trend only – fast, shallow responses tend to rely more on linguistic information, and slow, deep responses tend to rely more on simulation information (e.g., Louwerse and Connell, in press) – but this trend can still influence the conceptual combination process, depending on the task at hand.

Differential task demands

If a participant is simply asked whether a noun–noun compound is sensible (i.e., whether or not it makes sense: Gagné and Shoben, 1997; Estes, 2003a), then this is a relatively shallow judgment for which the linguistic system offers a quick and dirty shortcut. If a compound consists of two words that have no shared statistical, distributional history, then the linguistic system will offer an heuristic for rejecting the compound as non-sensical without any attempt at conceptual combination actually taking place. On the other hand, if a compound consists of two words that are very frequently juxtaposed, then the linguistic heuristic will lead to its acceptance as sensible. Of course, participants do not have to rely solely on this linguistic shortcut just because it exists – they may use the simulation system as a double-check on any compounds that seem linguistically sensible, or some individuals may even base every decision on whether the concepts can combine into a coherent simulation – but an easy shortcut is hard to refuse. Because the linguistic heuristic is faster and computationally cheaper than basing a judgment on the simulation system, and because there are no penalties within the sensibility judgment paradigm to prevent its use (e.g., Solomon and Barsalou, 2004), participants can safely exploit it.
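
The logic of this shortcut can be sketched as follows. The Python fragment is a deliberately simplified illustration, not an implementation of ECCo: it treats "distributional history" as nothing more than counts of adjacent word pairs in a toy corpus, and the function names, the acceptance threshold, and the three-way outcome (including the intermediate "defer to simulation" case) are all assumptions introduced here for illustration.

```python
from collections import Counter


def count_adjacent_pairs(sentences):
    """Count how often each ordered pair of adjacent words occurs."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def linguistic_shortcut(modifier, head, pair_counts, accept_at=3):
    """Toy sensibility heuristic based only on juxtaposition counts.

    accept_at is a hypothetical threshold for "very frequently juxtaposed".
    """
    n = pair_counts[(modifier, head)]  # Counter returns 0 for unseen pairs
    if n == 0:
        return "reject"  # no shared history: judged nonsensical, no combination attempted
    if n >= accept_at:
        return "accept"  # frequently juxtaposed: judged sensible
    return "defer to simulation"  # assumed fallback for intermediate cases


if __name__ == "__main__":
    toy_corpus = [
        "the office chair is new",
        "an office chair broke",
        "my office chair squeaks",
        "a garden chair is outside",
    ]
    counts = count_adjacent_pairs(toy_corpus)
    for modifier, head in [("office", "chair"), ("garden", "chair"), ("cactus", "chair")]:
        print(modifier, head, "->", linguistic_shortcut(modifier, head, counts))
```

A sketch of this kind rejects every unseen compound, sensible or not, which is one way of seeing why the shortcut is adequate only for shallow tasks that carry no penalty for associative errors.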

Interpretation tasks contrast with sensibility judgments in requiring deeper processing in the simulation system (Tagalakis and Keane, 2006; Lynott and Connell, 2010). If a participant is asked to give an interpretation for a noun–noun compound, then there must be an attempt to actually combine the concepts before making a response. If the concept affordances cannot be meshed in a situated simulation, then the compound will not be interpretable, but a successful situated simulation can be described in words as an interpretation for the compound. However, it is still sometimes possible for a noun–noun compound to be given a definitional interpretation predominantly on the basis of shallow, linguistic information. For example, a typological definition of cactus beetle as “a type of beetle” does not necessarily require any deep processing. Or if someone is told that a sun holiday is a holiday in the sun, she should be able to define a snow holiday as a holiday in the snow, or a desert holiday as a holiday in the desert, without necessarily requiring the simulation system. Because the linguistic and simulation systems operate in overlapping waves, with only a statistical tendency for the linguistic system to be faster, such rapid definitions do not mean that the simulation system is not engaged at all. Rather, the definition can be triggered and the participant can respond just from linguistic information, but, even while speaking or pressing the response button, the situated simulation is still taking shape.

Affordances and meshing in the simulation system

Each concept in the compound has a myriad of potential affordances based on past experience, and many more can be created on the fly if the situation requires (Glenberg and Robertson, 2000). We use the term affordances in a broader sense than just the perceptual-motor properties proposed by Gibson (1979) and the action-enabling view proposed by Glenberg (1997; Glenberg and Robertson, 2000). Similar to Estes et al. (in press), we view affordances as embodying much of what is often described as relational information, by referring to the ways in which a concept offers opportunities for meshing with other concepts. When the head and modifier concepts¹ are paired in a compound, they mutually constrain the number and type of affordances that can be meshed (see Maguire et al., 2010).

Meshing describes the process of integrating the complementary and potentially interactive aspects of two or more concepts, and “underlies our ability to understand novel combinations” (Glenberg, 1997, p. 6). Both concrete and abstract concepts can mesh affordances. Because relatively abstract concepts are heavily reliant on simulating perceptual, social, introspective (Barsalou and Wiemer-Hastings, 2005; Wiemer-Hastings and Xu, 2005), and affective information (Kousta et al., 2010), they still afford meshing with objects, agents, and other entities that can cause changes in mental states. An elephant complaint (see “Choice of Process”) allows the concepts to mesh in a situation where either the complaint itself is large and important, or where the elephant is the originator or subject of the complaint. Indeed, the affordances of relatively abstract concepts can be meshed in a variety of situations, including perceptual (e.g., value sandwich as a sandwich that is cheap or good value for money), social (e.g., fame advantage as the favorable position conferred by being well-known), affective (e.g., stress season as a particular time of year when people are extra-stressed), and so on. A successful mesh will result in a coherent and stable simulation, which is the goal of the conceptual combination process.

There are no hard and fast rules regarding whether a particular concept is suitable for meshing in a particular situation – it depends entirely on the other concept used. For example, tree snake is unlikely to be a snake that eats trees, and so one could argue that, in fact, tree can never be meshed with an eating situation because trees just aren’t eaten. However, this assumption would be inaccurate. A tree termite could easily be a termite that eats trees; suddenly, trees afford eating. In short, it is not the case that one can independently slot the head and modifier concepts into a particular role or frame. The affordances of the head or modifier concept are affected by the other concept in the combination.

Affordances can be meshed in one of two ways. Sometimes, the head and modifier concepts are meshed directly with each other even if this involves substantial destruction of one of the concepts. For example, cactus and beetle can destructively combine into a spiky beetle because cactus is reduced to its spikiness, and beetle affords having a variety of exoskeleton shapes for defense or camouflage (thus giving rise to a situated simulation where the beetle wards off predators with its sharp spikes, or uses its green and spiky casing to hide on the surface of a cactus, etc.). However, sometimes this meshing is non-destructive as it incorporates the head and modifier concepts in a situation that requires little adaptation. For example, cactus and beetle mesh easily with an eating situation because beetles must eat something and cacti are a plausible food for beetles (thus giving rise to a situated simulation where the beetle is sitting on a cactus and eating away, or is munching a piece of cactus flesh as pet food, etc.).
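
The structural difference between the two kinds of meshing can be sketched with a toy data structure. The Python fragment below is an assumption-laden illustration and not a model proposed in this paper: the classes Concept and Simulation, the flat feature sets, and both function names are hypothetical, and real affordances are far richer than string labels. It shows only that a destructive combination reduces the modifier to a single retained aspect, whereas a non-destructive combination keeps both concepts intact within a situation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    features: frozenset  # toy stand-in for a concept's many affordances


@dataclass(frozen=True)
class Simulation:
    process: str  # "destructive" or "non-destructive"
    modifier: Concept
    head: Concept
    situation: str


def combine_destructively(modifier, head, kept_feature, situation):
    """Reduce the modifier to one retained feature and mesh it with the head."""
    if kept_feature not in modifier.features:
        raise ValueError(f"{modifier.name} has no feature {kept_feature!r}")
    reduced = Concept(modifier.name, frozenset({kept_feature}))
    return Simulation("destructive", reduced, head, situation)


def combine_non_destructively(modifier, head, situation):
    """Keep both concepts intact and place them in a shared situation."""
    return Simulation("non-destructive", modifier, head, situation)


if __name__ == "__main__":
    # Illustrative feature labels only.
    cactus = Concept("cactus", frozenset({"spiky", "green", "edible flesh"}))
    beetle = Concept("beetle", frozenset({"shiny", "has exoskeleton", "eats plants"}))

    spiky_beetle = combine_destructively(cactus, beetle, "spiky", "warding off predators")
    cactus_eater = combine_non_destructively(cactus, beetle, "eating")

    print(spiky_beetle.process, sorted(spiky_beetle.modifier.features))
    print(cactus_eater.process, sorted(cactus_eater.modifier.features))
```

In the first result only the spikiness of cactus survives, while in the second the full cactus concept is still available to the simulation.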

Destructive and non-destructive processing

All else being equal, it is quicker to leave two concepts intact than to engage in situationally appropriate destruction. However, it’s rare that all else is equal between two possible interpretations, which means that non-destructive processing is often, but not necessarily, faster than destructive processing. The length of interpretation time depends on the associations from the linguistic representation (e.g., does this compound resemble any lexicalized compounds? what are the close associate words?), the interaction between the linguistic and simulation systems (e.g., has a similar simulation been created before for these kinds of concepts?), and the ease of creating the situated simulation itself (e.g., can the mutually constrained affordances mesh in a plausible situation?). Sometimes interpretation is easy, whether destructive or non-destructive, and sometimes effortful.

The main difference between destructive and non-destructive processes lies in how the affordances are constrained and meshed. The destructive process seeks to mesh the head and modifier concepts together even if it means substantially reducing one of them, while the non-destructive process seeks to mesh the head and modifier affordances in a situation that allows both concepts to remain relatively intact. Note that in both destructive and non-destructive processes, meshing the concept affordances is not solely the province of the simulation system. The linguistic system also helps to cue and create these affordances, and thus helps to determine which process is followed. Take the compound whale seal: immediately on encountering the word “whale,” closely associated linguistic tokens will be activated, including “fish,” “big,” “ocean,” and so on. Such tokens will, in their turn, begin to activate their relevant representations in the simulation system, such as “big” rapidly and automatically drawing attention to the visual and haptic modalities (Lynott and Connell, 2009; Connell and Lynott, 2010). Furthermore, whale seal is analogous to the (for some people) lexicalized phrasal token elephant seal, and so the simulation of this existing species of large seal will also begin to be formed. This rolling wave of linguistic and simulation activations will help to reduce whale to its bigness and to cue the affordance of seals coming in a variety of sizes, and so lead to the common interpretation of whale seal as a type of large seal. There are, of course, many other interpretations possible for whale seal (e.g., a seal that hunts whales, a seal with black-and-white orca-like markings), but they will all follow a similar course of affordance cuing and meshing.

Choice of process

Critically, destructive and non-destructive interpretations do not compete in parallel within the mind of any one individual. It is cognitively wasteful to pursue destructive and non-destructive processes concurrently, and, while possible, it is not the norm. Rather, even though both processes may be open to pursuit at the start of interpretation, one process is preferentially enabled by a number of interactive factors (depending on, e.g., frequency of encountering similar compounds, previous attentional focus on aspects of the concept, experience with a plausible non-destructive situation, available perceptual resources for representing the destructive form of a concept). An individual thus commits quickly to either a destructive or non-destructive interpretation, and attempts to create a coherent simulation using that process.

Take the concept elephant: usually, when people refer to elephants they mean the holistic animal. Additionally, people have plenty of experience of the word being used to refer to something large and ungainly, both in isolation (e.g., “he’s such an elephant”, meaning he is of large build and/or is clumsy in his movements) and in lexicalized compounds (e.g., elephant seal and elephant garlic both emphasize larger than normal size). Thus, experience has built up a link between the linguistic system’s “elephant” token and a simulation of largeness. Furthermore, because elephants tend to be larger than any surrounding creatures in most situations in which they are encountered, people have plenty of experience of their attention being drawn to the elephant’s large size, meaning that the simulation system is also likely to emphasize largeness in the visual and haptic components of the elephant simulation. Therefore, when one encounters the compound elephant complaint, one can either commit to keeping the elephant in its intact form or to using a reduced version.

For some people, the holistic form of elephant is highlighted (due to recency and priming effects as well as cumulative experience) and so they will attempt a non-destructive combination. Here, because elephant and complaint can mesh together in a situation where the elephant constitutes the reason for the complaint, the non-destructive interpretation of elephant complaint could be a complaint that people make about the behavior of an elephant at a zoo. For others, a reduced form of elephant is highlighted by its presentation in the compound and so they will attempt a destructive interpretation. Here, the elephant’s largeness can mesh with complaint in a situation where size is equated to seriousness, giving the destructive interpretation of elephant complaint as a large and important complaint. Of course, other destructive interpretations are possible if some other reduced form of elephant is highlighted for a particular person (e.g., a long-living complaint that is never resolved), because considerable individual differences exist in linguistic and simulation experience.

Destructive Interpretations

A destructive interpretation occurs when one, or both, constituent concepts are reduced during the interpretation process from their intact holistic forms to some situationally appropriate aspect(s) of the concepts. Sometimes one concept is reduced to a particularly salient or diagnostic aspect (e.g., the slowness of a snail, the black-and-white stripes of a zebra, the coldness of an icicle), but, since both concepts mutually constrain each other’s affordances, what appears salient or diagnostic for a concept in isolation may not apply to a concept in combination. For example, icicle fingers may be interpreted as freezing cold fingers: here, icicle is reduced to its coldness because it can mesh with the affordance of fingers to have a variety of temperatures. However, icicle fingers can also be interpreted as cold and stiff fingers, even though stiffness is not usually a salient or diagnostic aspect of icicle in isolation, because fingers also afford variations in flexibility according to temperature. We have experience of fingers being difficult to bend when they are particularly cold, and so reducing icicle to its coldness and stiffness allows the two concepts to mesh together. Indeed, such complementary affordances are part of situating the combination in our wider experience of cold and physical sensation.

Reversals

With destructive interpretations, the head concept usually remains intact while the modifier is destructively reduced, but this is only a general pattern rather than a golden rule. Nothing precludes the modifier staying intact while the head is reduced (or, indeed, both concepts being reduced: see llama camel in “Types of Interpretation”). For example, take butter police as referring to the dietary advisors who replace pats of butter in university canteens with low-saturate butter substitutes; here, police is reduced (to its function of enforcing regulations), while butter remains intact (because it is the thing being regulated). A stone duck (i.e., a statue), toy duck (i.e., a child’s plaything), or cloud duck (i.e., a distinctively shaped cloud) all reduce duck to its general shape, and in the toy’s case, maybe also its color: there is no longer an actual animal present in the simulation. Sometimes, reducing the head and leaving the modifier intact means that the focus of the interpretation is actually on the modifier concept. In the earlier example of icicle fingers, most of the interpretations kept the focus on the head (i.e., cold fingers are still fingers). However, other interpretations of icicle fingers could focus on icicle, such as “finger-shaped icicles forming outdoors in the cold.” Such cases can be described as reversals, because the same interpretation could be produced from switching the order of the head and modifier (e.g., finger icicles). Similarly, cloud duck and duck cloud are both interpretable as a duck-shaped cloud. In this way, ECCo does not distinguish in principle between non-reversal and reversal destructive interpretations: both result from reducing one concept to certain situationally relevant aspects and meshing with the other concept. Whether an individual chooses to reduce the head or modifier will depend on their past experience with similar words, concepts, combinations, and situations.

Figurative combinations

Some compounds that could be described as having figurative interpretations actually use existing meanings of polysemous words. For example, tiger executive could refer to an executive who is fierce or ruthless in business dealings, but this interpretation makes use of the fact that the word “tiger” already has a standard figurative meaning of fierce or ruthless (e.g., “used to refer to someone fierce, determined or ambitious”: New Oxford American Dictionary, 2009). In this case, the destructive interpretation process is assisted by the rapid retrieval and simulation of fierceness from the tiger modifier, which can mesh with the head concept as a trait of the executive in question. Similarly, taste explosion exploits the standardized use of explosion to refer to suddenness in sensory experience (e.g., “a sudden outburst of something such as noise, light, or violent emotion”: New Oxford American Dictionary, 2009), which can easily mesh with taste in the simulation of a sudden burst of flavor in the mouth. In other words, such combinations are destructive interpretations that are greatly assisted by previous experience of a concept’s usage in a reduced form.

However, many other combinations that could be described as figurative are more novel in their juxtapositions. Such combinations tend to be destructive, with one or both concepts being reduced in the situated simulation to an adapted form of some situationally appropriate aspect. For example, the compound dragon soup can be interpreted as a hot and spicy soup with chili, which involves reducing dragon to its hot, fire-breathing aspect. The synesthetic conversion of hot from the sense of high temperature to that of chili spiciness is facilitated by the fact that the word “hot” is polysemous, with a conventional meaning that refers to the gustatory heat of chilies (see also Lynott and Connell, 2009). Hot, spicy chili is thus not a directly reduced aspect of dragon, but is rather an aspect that has adapted from tactile heat to gustatory heat, assisted by the linguistic system that allows the simulation of spicy taste from the word “hot,” which then affords meshing with soup. In other words, many ostensibly figurative interpretations are destructive interpretations that are greatly assisted by previous experience of how a concept’s associates may be simulated in more than one form.

Non-Destructive Interpretations

A non-destructive interpretation occurs where the constituent concepts remain relatively intact in a shared situation. Both concepts, in their holistic forms, have mutually constrained affordances that mesh together in a situated simulation. An octopus apartment, for example, could be an apartment where an octopus lives: octopus affords having a place to live, and apartment affords providing a home, and so the two concepts mesh in a living arrangements situation. As with destructive interpretations, participants frequently specify details of how they have situated their simulation when they give interpretations. Many of our participants (Lynott and Connell, 2010) situated their simulations in ways that explain why an octopus might be living in an apartment, such as “an apartment for an octopus in an octopus sanctuary,” “an apartment that has a pet octopus in it,” or “an underwater apartment block for an octopus, like in Spongebob” (see “The Importance of Experience”). Although each of these participants situated their simulations slightly differently, they all succeeded in combining the concepts non-destructively as variants of the “place where an octopus lives” interpretation. Indeed, a conceptual combination can often be non-destructively interpreted in very different ways because the concepts have meshed in different situations. For example, a kidnapper killer could either be someone who kills kidnappers (because a killer must have a victim, and a kidnapper affords being the target of a killer for a variety of revenge or vigilante reasons), or a kidnapper who kills his or her victims (because both kidnapper and killer have victims, and afford merging the two crimes into the actions of one individual).

Reversals

In many non-destructive combinations, the order of the head and modifier concepts is not particularly important to the way in which the affordances mesh. The only difference is attentional focus. For example, an octopus apartment (as an apartment where an octopus lives) and an apartment octopus (as an octopus who lives in an apartment) essentially describe the same situation, with attention focused on different elements according to which concept is in the head position (although attentional focus can also be influenced by contextual and prosodic effects: e.g., Fernald and Mazzie, 1991). The level of detail is likely to differ according to attentional focus, so that a simulation of a murder town may situate extra details on the safety and desirability of the town itself, while a town murder may situate extra details on the nature or victim of the murder. ECCo therefore takes the same position with non-destructive interpretations as it does with destructive interpretations: it does not distinguish in principle between non-reversal and reversal interpretations because, in many cases, the simulation is essentially the same and only differs in attentional focus.

Representational potential

Sometimes, a non-destructive combination does not need to have both head and modifier concepts fully present in the simulation, but instead the situation allows one concept to exist in potentia. For example, coffee cup is a lexicalized compound, but it can be represented in more than one way: as a cup that contains coffee, or as a cup that can potentially contain coffee as its usual purpose. In the latter case, there is technically no coffee present. However, this absence does not mean that the interpretation is destructive. Rather, because the situated simulation involves a representational placeholder for an intact concept of coffee, it is a non-destructive interpretation in which the modifier concept exists in potential form. Similarly, cactus beetle (meaning a beetle who eats cacti) does not necessarily require a cactus to be present in the simulation, but the representational potential for an intact cactus concept means that the interpretation is non-destructive.

In terms of simulation content, the potential coffee or cactus in such interpretations is similar to the representation of a negated object. Kaup et al. (2006) found that, after reading a sentence such as “there was no eagle in the sky,” people were faster to respond to a picture of an eagle with outstretched wings than if they had just read “there was no eagle in the nest.” In other words, even though there was no eagle present in the described situation, the potential shape of the eagle was nevertheless simulated. Likewise, even though there may be no cactus present in the above interpretation of cactus beetle, the potential existence of cactus (as a source of food for the beetle) is still simulated.

The Importance of Experience

Any two randomly paired nouns can be processed as a conceptual combination, but past experience will make some compounds more likely than others to produce a plausible interpretation in most people. An interpretation will be plausible if it fits with prior experience and knowledge (Costello and Keane, 2000; Connell and Keane, 2004, 2006). It is important to note that plausibility does not depend solely on experience of the real, mundane world, but that experience of fictional worlds also counts. The consequence is that interpretations do not have to adhere to the conventional laws of gravity (e.g., elephant bubble as an elephant floating around in a bubble), genetic combination (e.g., canary pear as a cross between a canary and a pear), actuality (e.g., dragon soup as a soup made with dragon meat), or animacy (e.g., chair complaint as a chair complaining about being sat upon) in order to be plausible. In other words, prior experience counts whether vicarious and fictional or direct and physical. For example, more than one of our participants (Lynott and Connell, 2010) interpreted octopus apartment as an underwater apartment where an octopus lives, like in Spongebob. While there is no octopus character in the cartoon Spongebob Squarepants, there are, nonetheless, a number of other sea creatures, such as crab, squid, and starfish, who live in underwater houses and apartments. Thus, the cartoon world of Spongebob Squarepants provides a useful set of situational affordances into which octopus and apartment can plausibly fit.

If the simulation can mesh the head and modifier into a familiar situation (e.g., horse house as a stable), then this interpretation will be readily accepted. Even if the head or modifier concepts do not fit the situation exactly, past experience may still provide a useful basis for interpretation because compounds are often interpreted by analogy with a more familiar compound (Lynott et al., 2004; Tagalakis and Keane, 2006). A bullet car, for instance, is similar to the lexicalized compound bullet train, and people interpret it similarly (i.e., as a fast car) rather than some other plausible interpretation (i.e., a car for transporting bullets). Such use of past experience with related combinations is fundamental to ECCo’s account of the combination process. The fact that “car” and “train” are closely related linguistic tokens, and the fact that a simulation of car can be situationally adapted in many of the same ways as a simulation of train, means that familiarity with a bullet train makes a bullet car easier to interpret.

Conceptual Combination in Development

Children are capable of both destructive and non-destructive conceptual combination from quite early stages of linguistic and conceptual development. By the age of three, most children can process a variety of non-destructive conceptual combinations. For example, Clark et al. (1985) found that, when asked to point to the mouse hat, most 3-year olds could reliably point to the relevant picture (i.e., a mouse wearing a hat) as opposed to distractor pictures of a mouse, a hat, or a fish wearing a hat. Performance for these non-destructive combinations was at ceiling by the age of four. Nonetheless, children of this age group are also capable of destructive conceptual combination. When asked to point to the picture of a rabbit car, 3-year olds preferred to point to a destructive interpretation (i.e., a car with rabbit ears and a fluffy tail) rather than a non-destructive alternative (i.e., a car beside a rabbit) or pictures of either object alone (Nicoladis, 2003). Four-year olds showed the same pattern, but were even more likely to choose the destructive interpretation. While it could be argued that the available non-destructive interpretations were in some way inferior or unlikely (e.g., a sun bag as a bag beside a multi-rayed cartoon sun), they nonetheless represented a valid means of distinguishing the compound subcategory from the head category (e.g., the bag beside the sun as opposed to the bag by itself); a pragmatic reason for conceptual combination (Downing, 1977; Clark et al., 1985; Wisniewski, 1997). Indeed, when children were given only two options to choose from – the destructive and non-destructive interpretations – they showed equal preference for both pictures (Gottfried, 1997). In other words, while children were capable of both destructive and non-destructive conceptual combination, they were not always sure which was the “correct” strategy for interpreting the compound.

Many of the difficulties experienced by children in understanding novel compounds are consistent with a preference to simulate two intact concepts, with younger children in particular having problems with combinations that require extensive concept destruction or representation of a concept in potentia. Regarding destructive conceptual combination, Gottfried (1997) found that certain types of destruction were harder than others for children to process. Compounds like fish plate or butterfly mask, where the modifier concept has been reduced to multiple visual features such as shape, texture, and color, posed few problems for children. In a picture-pointing task, 3-year olds could successfully identify the destructive interpretation (e.g., a mask decorated to look like a butterfly) as opposed to distractors (e.g., a butterfly, a mask, or a mailbox), and 5-year olds’ performance was at adult level. In contrast, performance of both age groups was much worse for compounds that reduced the modifier concept to just a single visual feature, such as basic shape (e.g., mitten leaf as a leaf shaped like a mitten), or color/pattern (e.g., zebra shells as shells patterned with black-and-white stripes). Indeed, for these items, the 3-year-old children were close to chance in choosing the correct picture. These findings suggest that, by the age of three, children are willing and able to destructively interpret noun–noun compounds, but find it easier to do so when more of the modifier concept is left intact in the simulation.

Furthermore, there is some evidence that children aged three and younger may have difficulty with non-destructive interpretations where one concept exists in potentia (e.g., a baby bottle does not require that a baby is actually present in the simulation, but instead has the representational potential for the baby’s existence). Krott et al. (2010) used novel words and objects in an attempt to control the amount of information children had about the concepts being combined. Children were introduced to a pair of novel objects in two different configurations, either showing possession/attachment (e.g., two objects that have been glued together, described as “a donka that has a kig”) or function (e.g., one object is actively placed inside the other, described as “a donka that holds a kig”). When asked to point to the kig donka, children aged four and over behaved like adults in choosing each type of combination in approximately equal numbers. However, very young children (2- and 3-year olds) tended to prefer the simple combination where both objects were permanently joined to one another. In other words, the youngest children found it difficult to represent a kig donka as an object whose function is to hold a kig, because such a combination would require that the kig be represented in potentia (i.e., it is still a kig donka whether or not a kig is present). Thus, between the ages of two and four, children begin to lose their preference for representing two intact concepts and become capable of simulating the potential existence of one concept in a combination.

Accounting for Classical Effects in Conceptual Combination

As well as offering a theoretical model that is based on the importance of grounded simulations in conceptual representation, the ECCo theory is also consistent with empirical findings from decades of classical conceptual combination research.

Property specificity

People represent the same color term differently depending on the object it describes. For example, Halff et al. (1976) found that people represented the color red differently when paired with hair, wine, flag, brick, and blood. Using a similarity-rating paradigm, they found that people rated the similarity of red flag and red light to be greater than the similarity between red light and red wine. Similarly, Medin and Shoben (1988) found that, when asked to compare the color gray with both black and white, people considered gray to be more similar to white in the context of hair, but more similar to black in the context of clouds. In terms of perceptual simulation, these results are unsurprising and largely inevitable. Since all of these words and combinations refer to known or lexicalized concepts, no actual combination process is required to understand them. Rather, a simple retrieval process will suffice. Because the redness of wine and the redness of a light are initially perceived as being different hues, they will be perceptually simulated as different hues. The same could be said of other object properties such as size: a tall ladder and a tall man are perceived differently (i.e., a tall ladder would be considerably taller than a tall man) and so their perceptual simulation will reflect these differences.

Typicality gradients

For any given category it is possible to list members of that category in descending order of their typicality. So, people judge members of the category “spoon” to be typically small and metal. However, people also readily agree that large wooden spoons are typical members of the “spoon” category, equal to small metal spoons (Medin and Shoben, 1988). As with property specificity, this effect is not surprising when such retrieval is based on situated simulations of prior experience rather than the rearrangement or modification of correlated size and substance attributes within an amodal SPOON concept. From an embodied perspective, small metal spoons and large wooden spoons are used in very different situations with different accompanying objects (e.g., adding sugar to a teacup versus stirring ingredients in a mixing bowl), different grips (precision versus power), different motor actions (finger and wrist movement versus full arm and shoulder movement), and even different bodily postures (often sitting versus standing). Thus, in ECCo, because people simulate situationally appropriate information when they retrieve concepts, we should not be surprised that people are happy to accept both large wooden spoons and small metal spoons as representing typical spoon experiences.

Emergent properties

When people are asked to list features or properties of a combined concept they often list features that are not mentioned for the concepts in isolation. For example, pet birds are described as living in cages and able to talk, even though these features are not listed for pet or bird in general: such features have been described as emergent properties (Hampton, 1987). Since pet bird is a lexicalized compound, people will be able to form a situated simulation by retrieval (rather than by active conceptual combination), which is likely to contain situational information as to where the bird lives (in a cage or aviary) and what sounds it makes (learned words and phrases as well as squawks). Thus, these so-called emergent features do not materialize from the ether, but rather come from the situated nature of the simulation, based on each individual participant’s own experience of pet birds (see also Barsalou, 1999).

To take a more novel example, a helicopter blanket is often said to be waterproof, even though neither helicopters nor blankets are generally described as such (Wilkenfeld and Ward, 2001). Here, although the combination helicopter blanket may not be directly retrievable due to its novelty, the same story applies. Because people create a situated simulation for any combination they fully interpret, its situation will often contain information that may not necessarily be present in more usual experiences of a concept (e.g., a blanket on a bed). By situating a helicopter blanket outdoors as part of the process of meshing the concepts into a type of cover for a helicopter, many of Wilkenfeld and Ward’s participants included situationally appropriate details in their simulation that suggest why a helicopter might need to be covered, such as the cover being camouflaged, waterproof, fireproof, or durable. In ECCo, such emergent features are an inevitable consequence of a situated simulation.

Relation frequency

Interpretations that use a high-frequency relation of the modifier (mountain lake, mountain cabin, mountain stream are all located in mountains) are understood more quickly than ones that use a low-frequency relation (mountain magazine is about mountains; Gagné and Shoben, 1997). In ECCo, strong links between the modifier word in the linguistic system and particular situations in the simulation system (e.g., between “mountain” and a situation where entities or events have a mountain location) would lead to that situation being a strong candidate for meshing head and modifier affordances. However, recent evidence suggests that relation frequency applies to classes or groups of concepts rather than individual lexical items (e.g., the category geographical location rather than the specific modifier mountain), and that combination times are influenced by relational frequencies of the head interacting with those of the modifier (Maguire et al., 2010). In ECCo, such a finding is consistent with the idea that not only do people situate the entire combination during interpretation, but also that the affordances of the constituents of the combination constrain the possible interactions between the concepts. So, for example, a mountain rat is quickly and easily interpreted as a rat who lives in the mountains because mountains afford providing habitats for animals (similar to mountain goat, mountain dog), and rats afford having habitats in a variety of geographic locations (similar to desert rat, river rat), and so the affordances mesh in a habitat situation. In contrast, mountain carpet will take longer to interpret as a carpet with a pattern of mountains because there are fewer similar compounds to help mutually constrain the affordances: people have little experience of meshing mountains with fabric designs, or meshing carpets with geographical locations.
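As a rough illustration of what "relation frequency of the modifier" amounts to, the short Python sketch below tallies how often each relation occurs with a given modifier in a toy set of compounds. The compounds come from the examples above, but the relation labels, the data structure, and the function name relation_frequencies are assumptions made purely for illustration; this is not an implementation of CARIN or of ECCo, and it says nothing about processing time.

```python
from collections import Counter

# Toy data built from the examples in the text. The relation labels
# ("located", "about") are illustrative assumptions, not a coding
# scheme taken from the cited studies.
KNOWN_COMPOUNDS = [
    ("mountain", "lake", "located"),
    ("mountain", "cabin", "located"),
    ("mountain", "stream", "located"),
    ("mountain", "magazine", "about"),
]


def relation_frequencies(modifier, compounds):
    """Return the relative frequency of each relation seen with a modifier."""
    counts = Counter(rel for mod, _head, rel in compounds if mod == modifier)
    total = sum(counts.values())
    if total == 0:
        # No experience with this modifier: nothing to estimate from.
        return {}
    return {rel: n / total for rel, n in counts.items()}


freqs = relation_frequencies("mountain", KNOWN_COMPOUNDS)
# In this toy set, "located" covers 3 of the 4 mountain compounds
# and "about" covers the remaining 1 of 4.
```

On this toy count, a location reading is the dominant candidate for a new mountain compound, which is the sense in which a high-frequency relation gives a situation a head start. Capturing the interaction with the head's own relational history would need a second tally over head nouns, which the sketch deliberately leaves out.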

Context

A key issue in conceptual combination research has been whether the processes involved in interpreting novel combinations in and out of context are the same. Gerrig and Bortfeld (1999) showed that, out of context, the combination doll smile is more quickly interpreted than the combination baseball smile, but they are understood equally quickly in a supportive context. In ECCo, whether a combination is encountered in or out of context, it must be appropriately situated to be understood. The only difference a surrounding discourse context makes is to allow some or most of this situation to be already in place when a person encounters the novel combination. Obviously, prespecifying a complete situation can tightly constrain the possible interpretations that are situationally appropriate, and even suggest affordances for combinations that would ordinarily be difficult to situate. In Gerrig and Bortfeld’s work, the discourse contexts clearly established a meaning for the novel combinations in advance. Therefore, when people encountered the novel combination, much of the hard work of simulating a situation was already done, thereby minimizing the differences between the compounds that had appeared out of context.

Similarity

A compound is more likely to be interpreted in a destructive manner when its constituent concepts are similar (Wisniewski, 1997; Wilkenfeld and Ward, 2001). In such cases, it is more difficult to find a situation in which reasonably similar head and modifier concepts can mesh complementary affordances. For example, take the compound zebra clam, which combines two animals: it is difficult to generate a plausible situation that would allow both zebra and clam to be kept relatively intact because their affordances do not lend themselves to mesh in a single situation. Instead, it is easier to allow zebra to be destructively reduced to its color and pattern in a situation where clam remains intact, because clam affords having a variety of markings and textures on its shell. It is likely that such statistical regularities (i.e., that concepts from the same broad class, such as “living things,” tend toward destructive combination) are reflected in the linguistic system. Thus, in ECCo, encountering two similar tokens in a compound will lead to preferential activation of potential situations that involve destruction. Such situations may involve meshing one concept with the other on the basis of visual markings (e.g., zebra clam as a striped clam), size (elephant carrot as a huge carrot), thickness and texture (coat shirt as a thick, heavy shirt), motor function (hammer shoe as a shoe used to hammer in a nail), and many more.

In addition, the destructive combination process decreases perceived similarity between the constituent concepts. Estes (2003b) found that people believed concepts such as zebra and clam to be moderately similar when simply asked for their rating, but less similar when they had first interpreted zebra clam to be a striped clam. This finding is consistent with ECCo’s account of destructive interpretations; because there was relatively little of the original zebra concept left in the simulation, participants judged it to be quite dissimilar to clam. Furthermore, non-destructive interpretations show the opposite pattern by increasing perceived similarity between constituent concepts. Estes also found that people tended to judge concepts such as mountain and snake as more similar if they first interpreted the compound (i.e., as a snake that lives in mountainous areas). Since similarity between concepts is enhanced when they are incorporated in the same scenario (Wisniewski and Bassok, 1999), Estes’s participants rated the concepts as more similar because their simulation of the non-destructive interpretation left the constituent concepts intact.

Compositionality

Evidence is mixed regarding whether emergent properties of a compound (e.g., green for unripe banana) are represented faster (Springer and Murphy, 1992) or slower (Swinney et al., 2007) than properties that are true of the head concept but not the compound (e.g., yellow for unripe banana). However, these experiments predominantly used lexicalized compounds like boiled celery or peeled apple, which constitute concept retrieval rather than true combination. We know from other work in sentence processing that, when context implies an atypical representation of a concept (e.g., an unripe banana as opposed to a typically ripe one), both typical and atypical versions of the concept are simulated in parallel (Connell and Lynott, 2009). A similar mechanism could operate in the processing of lexicalized compounds, where unripe banana leads people to rapidly simulate a typical yellow banana alongside the specified green banana. Thus, pending new evidence of compositionality in the processing of novel compounds, ECCo remains equivocal on whether parallel simulations occur when two concepts are being meshed for the first time.

Comparison of ECCo with Previous Theories of Conceptual Combination

ECCo dispenses with many of the assumptions and dichotomies that are traditional in much conceptual combination research. In this section, we will concentrate on five current accounts of conceptual combination (see Table 1). Competition among relations in nominals (CARIN: Gagné and Shoben, 1997) posits that people interpret novel combinations using a set of thematic relations (e.g., made of, located, used by), where processing time depends on how often a particular relation has been previously used with the modifier concept. Dual process theory (Wisniewski, 1997) holds that two different processes compete in parallel to generate different types of interpretation for a compound: property-based interpretations are constructed by applying a property of the modifier to the head concept, while relation-based interpretations are constructed by binding the concepts to thematic roles in an augmented schema. The interactive property attribution model (IPA: Estes and Glucksberg, 2000) allows for both property and relational interpretations, but specifies that both types arise from the interaction of candidate modifier features and relevant head dimensions. Constraint theory (Costello and Keane, 2000) asserts that people use three pragmatic constraints – diagnosticity, informativeness, and plausibility – in order to narrow down the wide range of possible interpretations to an optimal few. Lastly, the retrieval-composition-analysis model (RCA: Prinz, 2002) argues for three stages of combination: attempted retrieval of a lexicalized meaning, compositional integration of concepts (with two parallel processes for property and relational interpretations, as in dual process theory), and analysis using background knowledge. While there are many potential issues for discussion, the rest of this section will focus on the key areas in which ECCo diverges from previous theories.

Table 1. Comparison of ECCo with existing theories of conceptual combination.

Theoretical Differences

ECCo differs from previous theories in many major ways. Table 1 summarizes the principal positions, with the most fundamental issues discussed below.

Nature of conceptual representation

ECCo describes how both the linguistic and simulation systems are central to conceptual representations. Of the existing theories of conceptual combination, some are agnostic as to the nature of the underlying representation (CARIN, constraint theory), while others take an explicitly amodal view of the conceptual system (dual process and IPA theories). Such views of the conceptual system lie in contrast to the embodied perspective that views conceptual representations as situated simulations. Both ECCo and the RCA model commit to the perceptual and motor basis of much of the conceptual system, although the RCA model describes concepts as frames or schemata that contain feature slots with particular values. In contrast, ECCo highlights the importance of affective and social information as well as sensorimotor information (particularly for more abstract concepts), and describes conceptual structure in terms of affordances that are meshed when situating a simulation (see “Affordances and Meshing in the Simulation System”). Furthermore, no theory but ECCo underscores the importance of distributional linguistic information in conceptual combination², and how it can predominate depending on task demands (see “Differential Task Demands”).

Types of interpretation

Embodied conceptual combination describes interpretations as destructive or non-destructive, depending on whether the constituent concepts are reduced or left intact when their affordances are meshed. In contrast, existing theories of conceptual combination tend to categorize interpretations as relation-based (e.g., cactus beetle as a beetle that eats cacti) and property-based (e.g., cactus beetle as a spiky beetle), although CARIN disagrees that property-based interpretations constitute a distinct type. Dual process, constraint and RCA theories also include hybrids (e.g., llama camel as a cross between a llama and a camel, or a creature that is half-llama and half-camel) and/or conjunctives (e.g., pet rhino is both a pet and a rhino). However, the need for this fragmentation of interpretation types is questionable, with many of these categories serving only descriptive roles as a legacy of previous research (e.g., Downing, 1977). ECCo’s destructive and non-destructive interpretations subsume these categories, although their overlap is not isomorphic: while property-based interpretations are principally destructive, relational and hybrid interpretations conflate destructive and non-destructive combinations.

Most, if not all, property-based interpretations are destructive combinations in ECCo. The IPA model tends to focus on the transfer of a single property, although dual process, RCA and constraint theories are clear that multiple properties may be transferred. In ECCo, a destructive interpretation involves the reduction of one or both concepts to situationally appropriate aspects that can be meshed with the other concept’s affordances, which means that there is no default number of “properties” that may comprise a concept’s reduced form. So, for example, a zebra clam may indeed reduce zebra to a visual black-and-white striped pattern, but icicle fingers reduces icicle to a haptic, motor and proprioceptive representation of coldness and stiffness (see “Destructive Interpretations”).

Many relation-based interpretations qualify as non-destructive interpretations in ECCo. For example, CARIN specifies head-causes-modifier (e.g., flu virus), modifier-causes-head (e.g., mall headache), head-uses-modifier (e.g., gas antiques) and so on. However, one relation in CARIN’s taxonomy is always destructive (e.g., head-resembles-modifier: zebra clam). Furthermore, the same relation can vary in whether the actual interpretation is destructive or non-destructive. For example, the head-has-modifier relation is destructive in song book (described as a book that “has” songs) because song has been reduced to a purely visual representation (i.e., the song in song book does not contain any auditory component, which is generally a core aspect of a song). In contrast, picture book (a book that “has” pictures) is non-destructive because the pictures in question are still intact entities in the pages of the book. Other inconsistent relations include head-made-of-modifier (e.g., destructive stone lion versus non-destructive stone wall) and head-is-modifier (e.g., destructive horse toy versus non-destructive servant girl). Because ECCo does not rely on a set of abstracted relations, focusing rather on situated simulations to derive meanings, interpretations can be more specific than is possible within a finite relational taxonomy.
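The cross-classification described above can be made concrete with a small lookup. The following Python sketch is only an illustration: it encodes the example compounds exactly as they are classified in the preceding paragraph, while the names `EXAMPLES` and `mixed_relations` are hypothetical and belong to neither CARIN nor ECCo. It lists the relations that have both destructive and non-destructive examples.

```python
from collections import defaultdict

# (CARIN-style relation, example compound, ECCo classification),
# taken only from the examples classified in the text above.
EXAMPLES = [
    ("head-resembles-modifier", "zebra clam", "destructive"),
    ("head-has-modifier", "song book", "destructive"),
    ("head-has-modifier", "picture book", "non-destructive"),
    ("head-made-of-modifier", "stone lion", "destructive"),
    ("head-made-of-modifier", "stone wall", "non-destructive"),
    ("head-is-modifier", "horse toy", "destructive"),
    ("head-is-modifier", "servant girl", "non-destructive"),
]


def mixed_relations(examples):
    """Return relations whose examples include both classifications."""
    kinds_by_relation = defaultdict(set)
    for relation, _compound, kind in examples:
        kinds_by_relation[relation].add(kind)
    return sorted(r for r, kinds in kinds_by_relation.items() if len(kinds) > 1)


print(mixed_relations(EXAMPLES))
```

In this small set, three of the four relations are returned, which is the sense in which a relational label alone does not determine whether an interpretation is destructive.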

Hybrid interpretations are also split between destructive and non-destructive interpretations in ECCo. For example, a llama camel may be destructively interpreted as a cross between a llama and a camel: here, the resulting creature is part-llama and part-camel, but neither llama nor camel is simulated in holistic form because their meshing involves representing an offspring that retains some aspects of both. On the other hand, singer songwriter and pet fish both have non-destructive interpretations because there is still an intact singer and songwriter (or pet and fish) in the simulation even though the concepts have been meshed into a single individual (see “Non-destructive Interpretations”).

Evidential Differences

Embodied conceptual combination explicitly addresses many empirical phenomena in the conceptual combination literature (see “Accounting for Classical Effects in Conceptual Combination”) that other theories have failed to address (see Table 1). For example, the IPA model does not specify any role for wider conceptual knowledge in the combination process, contrary to the other theories (which specify a limited role at some point during processing) and to ECCo (which regards wider conceptual information as an inevitable and important resource in situating the simulation). Similarly, neither the IPA model nor constraint theory has addressed how context affects conceptual combination, while other theories allow it to influence the availability of relations or properties, and ECCo regards wider context as playing a central role in how the simulation is situated (see “Context”). Emergent properties are not currently explained by either CARIN or the IPA model, while other theories account for their appearance via elaboration of the combined concept, and ECCo holds that they arise naturally from the situationally-appropriate details in the simulation (see “Emergent Properties”). Indeed, ECCo is the only theory that is consistent with children’s developmental trajectory, from first preferring to simulate two intact concepts to later becoming capable of simulating potential and much-reduced concepts in combination (see “Conceptual Combination in Development”).

ECCo and the RCA model are the only accounts of conceptual combination that can accommodate the role of perceptual information in the combination processes. For example, Connell and Lynott (in press) showed that people are slower to simulate a novel conceptual combination (e.g., visual shimmering tuna) if their attention has already been engaged by a different perceptual modality in a previous trial (e.g., auditory loud motorcycle), and that this modality switching cost is not due to linguistic associations between words. Similarly, effects of visual occlusion (Wu and Barsalou, 2009) and the orienting of spatial attention (Estes et al., 2008) are only compatible with the ECCo and RCA frameworks.

However, only ECCo is compatible with evidence that different types of interpretation emerge from early commitment to a particular process. The RCA model adopts dual process theory’s assumption that relation- and property-based processes compete in parallel in the mind of each individual, with the first process to be completed providing the interpretation. However, there is no positive evidence for this assumption, as much of the evidence cited in favor of parallel processes is consistent with ECCo’s early commitment account. For example, relation-based interpretations are usually, but not necessarily, faster than property-based interpretations (Gagné, 2000; Estes, 2003a; Tagalakis and Keane, 2006); however, this does not mean that relation-based processing tends to “win” a parallel race, but simply indicates that one process, from start to finish, is generally faster than the other (see “Destructive and Non-destructive Processing”). Also, the finding that property-based targets are slowed down by relational primes just as much as relation-based targets are slowed down by property primes (Estes, 2003a) does not mean that the processes compete with each other, but merely shows that the processes do not operate sequentially with property-based processing as a last resort.

Critical evidence for the early commitment account and against the parallel assumption comes from Lynott and Connell (2010), who showed that prosody affects the speed of property-based interpretations, but not relation-based interpretations. Dual emphasis (i.e., equal prosodic stress on both nouns in the compound) led to faster response times for property-based interpretations (e.g., octopus apartment as “an apartment with eight rooms”) than relation-based interpretations (e.g., octopus apartment as “an apartment where an octopus lives”). Crucially, the relative proportions of different interpretation types were unaffected by prosody, with relation-based interpretations remaining more frequent than property-based interpretations even under dual emphasis. Since the fastest interpretation type was not the most frequent, as would be expected in a parallel race between processes where the first-completed interpretation “wins,” this finding is not consistent with the RCA assumption of parallel competition. Rather, it is consistent with the ECCo account that any one individual rapidly commits to either a destructive or non-destructive interpretation: those who committed to destructive processing were facilitated by dual emphasis, whereas those who committed to non-destructive processing were not.
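The logic of this argument can be illustrated with a toy simulation. The Python sketch below is not the authors' model or analysis: the function name `simulate`, the Gaussian completion times, and every parameter value are assumptions chosen purely for illustration and are not fitted to the data of Lynott and Connell (2010). It contrasts a parallel race, in which the first process to finish supplies the interpretation, with early commitment, in which the interpretation type is chosen before processing and independently of its speed.

```python
import random


def simulate(model, n_trials=20000, property_mean=0.9, relation_mean=1.0,
             p_commit_relation=0.7, sd=0.15, seed=0):
    """Return (share of relation-based interpretations, mean RT per type).

    Times are in arbitrary units. All defaults are hypothetical; here the
    property-based process is assumed to be the faster one, as under
    dual emphasis.
    """
    rng = random.Random(seed)
    counts = {"property": 0, "relation": 0}
    rt_sums = {"property": 0.0, "relation": 0.0}
    for _ in range(n_trials):
        if model == "race":
            # Both processes run; whichever finishes first wins.
            t_prop = rng.gauss(property_mean, sd)
            t_rel = rng.gauss(relation_mean, sd)
            kind, rt = ("property", t_prop) if t_prop < t_rel else ("relation", t_rel)
        elif model == "commit":
            # The type is chosen first, regardless of how fast it will be.
            kind = "relation" if rng.random() < p_commit_relation else "property"
            mean = relation_mean if kind == "relation" else property_mean
            rt = rng.gauss(mean, sd)
        else:
            raise ValueError(f"unknown model: {model}")
        counts[kind] += 1
        rt_sums[kind] += rt
    share_relation = counts["relation"] / n_trials
    mean_rt = {k: rt_sums[k] / counts[k] for k in counts}
    return share_relation, mean_rt


for name in ("race", "commit"):
    share, rts = simulate(name)
    print(name, round(share, 2), {k: round(v, 2) for k, v in rts.items()})
```

Under these assumptions, the race model makes the faster (property-based) type the more frequent one, whereas the commitment model lets relation-based interpretations remain the majority even though property-based interpretations are faster, which is the qualitative pattern described above.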

Concluding Remarks

In ECCo, we have outlined a conceptual combination system based on the idea that people’s conceptual representations are built through complex interactions between linguistic, perceptual, motor, affective, introspective, and social experience. Both linguistic and simulation systems are critical to the combination process, but the new concept is fundamentally a situated, simulated entity. So, for example, a cactus beetle is represented as a multimodal simulation that includes visual (e.g., the shiny appearance of a beetle) and haptic (e.g., the prickliness of the cactus) information, all situated in the broader location of a desert environment under a hot sun, and with (at least for some people) an element of creepy-crawly revulsion. While ECCo builds on the contributions of existing theoretical and empirical work in conceptual combination, as well as drawing on the wider literature on language processing and embodied cognition, it marks a clear departure from previous work on conceptual combination in terms of representation and processing. Importantly, from these proposals, specific predictions can be derived to test the claims of the theory in observable behavior. For example, the depth-of-processing differences that arise according to task demands, and the specified early commitment to destructive or non-destructive interpretation, provide clear avenues for further investigation.

It is not possible in the initial presentation of a theory to address every issue, make every possible comparison, or describe every piece of supporting evidence, but we aim to provide a framework that will enhance our understanding of conceptual combination. Future work in this area will endeavor to explore some issues in greater depth, such as the mechanisms by which concepts mutually constrain each other’s affordances, the factors that enable children to develop their destructive combination skills, and the potential differences in brain localization between destructive and non-destructive interpretation processes.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The order of authorship is arbitrary. This work was supported by the UK Economic and Social Research Council (Grant number RES-000-22-3248).

Footnotes

¹ The constituent concepts in a combination (e.g., cactus beetle) have traditionally been referred to as the modifier concept (e.g., cactus) and the head concept (e.g., beetle); in English, the compound word order means that the modifier is assumed to come first and the head second. While we feel that this terminology is misleading – the “modifier” concept does not necessarily modify the head, and the “head” concept is not necessarily the primary focus of the combination – we have retained these terms for the sake of consistency with prior research.
² CARIN does incorporate a type of distributional information in the form of relation frequencies. However, the scope of this information is much narrower in CARIN than in ECCo as it does not consider other types of statistical linguistic information nor its interaction with the simulation system (see also “Relation Frequency”).

References

Barsalou, L. (1999). Perceptual symbol systems. Behav. Brain Sci. 22, 577–609.

Barsalou, L. W. (2008). Grounded cognition. Annu. Rev. Psychol. 59, 617–645.

Barsalou, L. W., Santos, A., Simmons, W. K., and Wilson, C. D. (2008). “Language and simulation in conceptual processing,” in Symbols, Embodiment, and Meaning, eds M. De Vega, A. M. Glenberg, and A. C. Graesser (Oxford: Oxford University Press), 245–283.

Barsalou, L. W., and Wiemer-Hastings, K. (2005). “Situating abstract concepts,” in Grounding Cognition: The Role of Perception and Action in Memory, Language, and Thinking, eds D. Pecher and R. A. Zwaan (Cambridge: Cambridge University Press), 129–163.

Clark, A. (2006). Language, embodiment, and the cognitive niche. Trends Cogn. Sci. 10, 370–374.

Clark, E. V., Gelman, S. A., and Lane, N. M. (1985). Compound nouns and category structure in young children. Child Dev. 56, 84–94.