Embodied Language Comprehension Requires an Enactivist Paradigm of Cognition

Van Elk, Michiel; Slors, Marc; Bekkering, Harold

doi:10.3389/fpsyg.2010.00234

PERSPECTIVE article

Front. Psychol., 27 December 2010

Sec. Cognition

Volume 1 - 2010 | https://doi.org/10.3389/fpsyg.2010.00234

This article is part of the Research Topic Embodied and Grounded Cognition


Michiel van Elk¹,²*

Marc Slors³

Harold Bekkering¹

¹ Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
² Laboratory of Cognitive Neuroscience, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
³ Department of Philosophy, Radboud University Nijmegen, Nijmegen, Netherlands

Two recurrent concerns in discussions of an embodied view of cognition are the “necessity question” (i.e., is activation in modality-specific brain areas necessary for language comprehension?) and the “simulation constraint” (i.e., how do we understand language for which we lack the relevant experiences?). In the present paper we argue that the criticisms encountered by the embodied approach hinge on a cognitivist interpretation of embodiment. We argue that the data relating sensorimotor activation to language comprehension are best interpreted as supporting a non-representationalist, enactivist model of language comprehension, according to which language comprehension can be described as procedural knowledge – knowledge how, not knowledge that – that enables us to interact with others in a shared physical world. The enactivist view implies that the activation of modality-specific brain areas during language processing reflects the employment of sensorimotor skills and that language comprehension is a context-bound phenomenon. Importantly, an enactivist view provides an embodied approach to language while avoiding the problems encountered by a cognitivist interpretation of embodiment.

Introduction

One of the most exciting discoveries in cognitive neuroscience over the last decades is the finding that our brain “resonates” to certain classes of stimuli. Observing the actions of others, for instance, activates brain areas comparable to those that are activated when one performs these actions oneself (Rizzolatti et al., 1996; Calvo-Merino et al., 2006; Grafton, 2009). But neural resonance is not limited to action perception and social interaction. In this paper we focus on neural resonance in language comprehension. Following the notion of communicative motor resonance during speech perception (the motor theory of speech perception; for an overview, see Galantucci et al., 2006), several studies have shown that reading verbs referring to concrete actions recruits effector-specific regions of primary motor and premotor cortex, comparable to the activation observed when moving the effector most strongly associated with these actions (Hauk and Pulvermüller, 2004; Pulvermüller et al., 2005; Aziz-Zadeh et al., 2006; Boulenger et al., 2009).

The theoretical significance of these findings is a matter of ongoing debate. Those who incline toward an embodied approach to cognition claim that resonance mechanisms support language comprehension by providing an internal representation of described actions or events (Barsalou, 1999; Glenberg, 2010). In parallel with the simulationist interpretation of neural resonance in social cognition (Gallese and Goldman, 1998; Rizzolatti et al., 2001; Goldman, 2006), such representations may consist in the re-enactment of these actions (Zwaan and Madden, 2005; Fischer and Zwaan, 2008; Zwaan, 2009). However, just as some have pointed out that resonance mechanisms are neither necessary nor sufficient for action understanding (Jacob and Jeannerod, 2005; Jacob, 2009) and that neural resonance does not amount to simulation (Gallagher, 2007; Zahavi, 2008), skeptics of an embodied approach to language comprehension argue that neural resonance is neither necessary nor sufficient for language comprehension (Bedny et al., 2008; Mahon and Caramazza, 2008; Postle et al., 2008; Toni et al., 2008; Kemmerer and Gonzalez-Castillo, 2010). In addition, it is unclear how an embodied approach to cognition can account for the understanding of concepts or actions that we have never experienced ourselves.

We argue that these problems for an embodied approach to language comprehension hinge on a cognitivist, representationalist understanding of embodied cognition, and that tackling them requires switching to an enactivist paradigm of cognition. Cognitivism is here defined as the theoretical approach that attempts to explain cognition in terms of the manipulation of discrete internal representations. Although on this account cognition may be used for the purpose of guiding actions, the cognitive process as such is described in terms that do not essentially involve the actions it may help to guide. Cognition understood as the manipulation of internal representations is supposed to mediate between perception and action: although it is enabled by perception and used for action, neither perception nor action is constitutive of it. By contrast, enactivism can be defined as the view that cognition emerges in the interaction between an organism and its environment, such that perception and action are co-constitutive of it. Cognition is manifested in the kind of appropriate, dynamic perception–action coupling that allows us to cope effectively with our physical and social environment. On the enactivist view it is misleading to think of such coupling as requiring discrete representations of one’s environment: dealing effectively with one’s environment does not presuppose awareness of its features; rather, it reflects such awareness. Enactivism thus implies that cognition is essentially tied to action and always context-bound.

In the present paper, drawing a parallel with the recent enactivist criticism of the simulation interpretation of the mirror neuron system, we argue that the data relating sensorimotor activation to language comprehension are best interpreted as supporting a non-representationalist, enactivist model of language comprehension. We start by outlining evidence from cognitive neuroscience in favor of an embodied approach to language comprehension and by elaborating on the problems such an approach faces. We then briefly turn to the parallel debate on the function of the mirror neuron system and highlight the recent enactivist move made in that debate. After contrasting the enactivist paradigm of cognition with the current cognitivist paradigm in cognitive neuroscience, we introduce a similar move in the present context and discuss how an enactivist approach to embodied language comprehension can deal with the objections to an embodied approach to language comprehension. We conclude with a discussion of the prospects and limitations of the enactive approach to language and of directions for future research on language and embodiment.

Embodiment and Language Comprehension

What exactly does it mean to say that “cognition is embodied”? In cognitive neuroscience, theories of embodied cognition often seem to imply that cognition is embodied because it recruits neural resources comparable to those used in perception and action. For instance, according to Barsalou’s (1999) theory of perceptual symbol systems, concepts have a perceptual basis and the recruitment of concepts involves the re-enactment of perceptual experiences in sensorimotor areas of the brain (see also Prinz, 2002). Similarly, in social cognition it has been argued that action understanding employs a process of motor simulation, involving brain structures comparable to those involved when one performs the observed action oneself (Gallese and Goldman, 1998; Rizzolatti et al., 2001; Goldman, 2006). Central to these embodied theories is the idea that cognition is grounded in relevant perceptual or motor simulations.

The notion that the processing of concepts is accompanied by activation in modality-specific brain areas is supported by a number of studies (for review, see Fischer and Zwaan, 2008; Borghi and Cimatti, 2010). Behavioral studies support the idea that semantic processing recruits modality-specific resources. In a property verification study, for instance, participants responded faster if the preceding word represented a property from the same rather than a different modality (e.g., gustatory, auditory, or visual; Pecher et al., 2003). In addition, the recruitment of modality-specific systems has been found to extend beyond the word level to the processing of semantics at the sentence level. For instance, after reading sentences describing an agent viewing an object in a specific context, participants were faster at identifying pictures that were congruent with the situation described in the sentence (e.g., faster verification of visually degraded pictures when the context refers to fog; Yaxley and Zwaan, 2007). Neuroimaging studies have provided further support for the idea that language processing is accompanied by the activation of modality-specific brain regions. For instance, reading action verbs or sentences describing actions consistently results in the activation of motor-related brain areas (Hauk and Pulvermüller, 2004; Pulvermüller et al., 2005; Aziz-Zadeh et al., 2006; Boulenger et al., 2009), whereas reading words referring to concrete semantic concepts (e.g., animals or fruits) has been associated with increased activation in visual areas (Martin et al., 1996; van Schie et al., 2003). In sum, these studies support the assumption that the processing of linguistic concepts recruits modality-specific brain areas, and on the basis of these findings it has been suggested that concepts are represented in brain areas comparable to those used for perception and action (Barsalou, 2008).

Two recurrent problems for the embodied approach to language are the “necessity question” and the “simulation constraint.” The “necessity question” asks whether activation in modality-specific brain areas during language processing is necessary for language comprehension or whether it should be considered an epiphenomenon (e.g., post-lexical simulation). In other words: if language is truly grounded in sensorimotor areas, we should expect language processing to break down if the activation of sensorimotor areas is disrupted. Unfortunately, the data from studies with patients showing category-specific deficits in association with damage to sensorimotor areas are still inconclusive (for review, see Mahon and Caramazza, 2009). Similarly, TMS studies that have attempted to disrupt processing in motor areas during the reading of action verbs have provided only mixed results: some studies showed early effects in the motor system while participants read action-related sentences (Oliveri et al., 2004; Buccino et al., 2005), whereas other studies only observed effects during the later stages of word processing or during explicit motor imagery (Tomasino et al., 2008; Papeo et al., 2009). In addition, although some studies have reported somatotopically specific effects in the motor system when participants read verbs referring to specific effectors (Hauk and Pulvermüller, 2004; Tettamanti et al., 2005; Aziz-Zadeh et al., 2006; Boulenger et al., 2009), other studies have pointed out that a strict overlap between the motor areas activated during action execution and those activated while reading action verbs has never been directly demonstrated (Postle et al., 2008; Kemmerer and Gonzalez-Castillo, 2010). Thus, although there is some evidence for the involvement of motor resonance in language comprehension, at present it is unclear how and when exactly activation in motor areas supports language comprehension.

The “simulation constraint” poses another, more principled problem for embodied approaches to language. If language comprehension involves the re-enactment of our own sensorimotor experiences, it remains unclear how we can understand language for which we lack the relevant simulations. For instance, how do we understand actions that are beyond our own motor repertoire, such as animal actions, or language that is unrelated to the concrete sensorimotor domain, such as abstract words like “love,” “war,” or “justice”? Although several attempts have been made to provide an embodied account of the representation of abstract concepts (Barsalou, 1999; Glenberg et al., 2008; Glenberg, 2010), most research supporting the embodied approach to language has focused selectively on the processing of language referring to concrete actions or objects (Hauk and Pulvermüller, 2004; Pulvermüller et al., 2005; Aziz-Zadeh et al., 2006; Boulenger et al., 2009).

Our aim in the remainder of this paper is to sketch a way of giving up on the simulation constraint, while retaining an embodied approach to language comprehension. In addition, we will speculate on the consequences for the necessity question.

Interlude: A Parallel with the Mirror Neuron Debate

In order to see how we can reject the simulation constraint while retaining an embodied approach to language comprehension, it is helpful to look at recent developments in an adjacent debate: the debate in social cognition on the function of the mirror neuron system. Mirror neuron activity has often been interpreted as representing simulations of perceived goal-directed actions for the purpose of grasping the intentions and emotions “behind” those actions (Gallese and Goldman, 1998; Gallese and Lakoff, 2005; Goldman, 2006; Gallese, 2007). This simulation interpretation of mirror neurons is controversial. One line of criticism is put forward by critics of embodied approaches to social cognition, who argue that mirror-neuron-based simulation is at best sufficient to recover motor intentions, whereas the attribution of higher-level intentions (so-called “prior intentions”; Searle, 1983) requires much more elaborate cognitive activity (Jacob and Jeannerod, 2005; Saxe, 2005, 2009; Jacob, 2008). The point is that one type of movement may be recruited to carry out various higher-level intentions. It has also been argued that mirror neuron activity is not necessary for the attribution of intentions. People attribute intentions, for instance, to the moving geometric shapes in the famous Heider and Simmel (1944) film, and it is difficult to imagine how body-specific motor simulations could underlie this intention attribution (see also Castelli et al., 2002).

These arguments are intended to downplay the role of neural resonance in social cognition and hence to oppose embodied approaches to social cognition. However, they can also be taken seriously without abandoning an embodied view. Recently, a number of philosophers have argued that mirror neurons may be part of larger neural processes underlying social perception, i.e., the direct pick-up of basic intentions and emotions in the conduct of other people (Gallagher, 2007; Gallagher and Zahavi, 2008; Zahavi, 2008; Hutto, 2009). According to these philosophers, mirror neurons need not be interpreted as coding for the re-enactment of the initiating stages of the other’s action. Rather, they should be interpreted as contributing to the processing of others’ perceived behavior for the direct purpose of social interaction. The idea here is to think of social perception as an enactive process involving sensorimotor skills, not as mere sensory input processing. This idea is borrowed from enactive theories of perception, according to which perception involves active engagement with the world rather than mere passive reception of information from the environment (cf. Hurley, 1998; Noë, 2004). The enactivist interpretation of neural resonance in social cognition fits well with the fact that many mirror neurons are broadly rather than strictly congruent with an observed action (Fogassi and Gallese, 2002; Csibra, 2005), and with the finding that mirror neurons fire during cooperative tasks in which one’s own movements need to be complementary rather than imitative relative to the actions of one’s partner (Newman-Norlund et al., 2007). Thus, according to an enactivist account, rather than reflecting a simulation process that maps observed actions onto one’s own motor system, mirror neuron activation should be conceived of as reflecting the employment of sensorimotor skills. 
More specifically, activation of mirror neurons should be considered an integral part of the process of perceiving and responding to others’ actions. In some cases this may require a covert response (e.g., perceiving others’ action goals); in other cases a more overt reaction may be required (e.g., catching a ball thrown by a teammate). What these cases have in common, and what is a central notion of the enactive paradigm, is that perceiving is an active process (Noë, 2004).

What is interesting about this recent enactivist move, in the context of our present discussion, is the fact that a simulationist interpretation of the function of mirror neurons is rejected (see, however, Slors, 2009) while their contribution to social cognition is still viewed from an embodied perspective. In order to see whether a similar move can be made with respect to resonance phenomena in language comprehension, we need to turn to the dominant cognitivism in current embodied approaches to language comprehension and the possible enactivist alternative.

Cognitivism Versus Enactivism

In philosophy, embodied cognition is usually conceived of as an alternative to cognitivism, where “cognitivism” stands for an approach to cognition in terms of the rule- or algorithm-based manipulation of discrete internal representations of the world (Brooks, 1991; Clark, 1991, 1997; Gallagher, 2005; Gibbs, 2005; Rowlands, 2006; Chemero, 2009). In cognitive neuroscience, however, embodied approaches to cognition are in an important sense still fully cognitivist. Their main quarrel with traditional approaches to cognition is not about whether cognition should be thought of in terms of representations, but about how we should think of these representations. Contrary to traditional cognitivism, the embodied approach argues that the vehicle used for representing concepts is sensorimotor in nature (cf. Barsalou, 1999; Zwaan and Madden, 2005; Fischer and Zwaan, 2008; Mahon and Caramazza, 2008; Glenberg, 2010). In cognitive neuroscience, the notion that concepts are embodied primarily means that there is a correspondence between the brain activations associated with processing the referent of a concept and those associated with processing the concept itself. For instance, seeing a car and thinking or reading about a car involve activation in comparable visual areas. Thus, the dispute between modal and amodal theories of language comprehension is basically a discussion about the representational vehicle of concepts (i.e., whether that vehicle is shared with the neural resources used for perception and action). Both modal and amodal theories of language thus share a cognitivist notion of cognition in terms of discrete internal representations of the world.

This widely applied representationalist notion of embodiment in cognitive neuroscience marks an important break from philosophical approaches to embodied cognition, which emphasize that cognition should be understood in terms of the dynamical interaction between an organism and its environment (Varela et al., 1991; Hurley, 1998; O’Regan and Noë, 2001; Noë, 2004; Gallagher, 2005; Thompson, 2007; Chemero, 2009). We refer to these diverse approaches as “enactivist.” A defining feature of the enactivist paradigm of cognition is that it challenges the representationalism of the traditional cognitivist paradigm by taking cognition to be based on “knowing how” instead of “knowing that.” That is, an organism’s knowledge of its environment is not taken to consist in the adequate representation or internal modeling of environmental features. Rather, knowledge consists in the way sensory information is linked to motor output. The structuring and restructuring of sensorimotor links in the recursive interaction of an organism with its environment, by means of which the organism adapts to it, implies or specifies knowledge of the world. Thus, in the enactivist paradigm, the fact that knowledge is essentially embodied and embedded implies that it is non-representational (see, however, Hutto, 2005). Knowledge – cognition – as the American naturalist Dewey (1896) pointed out, cannot be understood by breaking it into parts; it always exists at the level of the situated organism as a whole (Ryle, 1949; Dennett, 1969). With its roots in Gibsonian ecological psychology (Gibson, 1979), an important branch of enactivism focuses on a non-representationalist account of perception based on so-called “sensorimotor contingencies” (Hurley, 1998; O’Regan and Noë, 2001; Noë, 2004). There are interesting connections here with earlier developments in robotics. 
Brooks (1991), for instance, showed that robots without a central processor or an internal map of the environment can successfully move around due to independent “perception–action modules” that act directly on the incoming information. These approaches to cognition essentially highlight the direct coupling between perception and action, without invoking representations as an explanatory variable. Thus, the enactivist view rejects the notion of “shared representations” between language processing and sensorimotor processing.
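To make the idea of independent perception–action modules more concrete, the following is a minimal Python sketch in the spirit of Brooks’s (1991) layered control; it is not a reconstruction of his robots. The module names (`avoid`, `wander`), the 0.5 distance threshold, and the priority-based arbitration are hypothetical choices made for illustration. Each module maps the current sensor reading directly to a motor command, and no part of the controller stores a map or model of the environment.

```python
from typing import Callable, List, Optional

# A hypothetical perception-action module: it maps the current sensor
# reading (distance to the nearest obstacle, arbitrary units) directly
# to a motor command, or returns None if it has nothing to contribute.
Module = Callable[[float], Optional[str]]

OBSTACLE_THRESHOLD = 0.5  # illustrative value, not taken from Brooks (1991)


def avoid(distance: float) -> Optional[str]:
    """Turn away when an obstacle is close; otherwise stay silent."""
    return "turn" if distance < OBSTACLE_THRESHOLD else None


def wander(distance: float) -> Optional[str]:
    """Default behaviour: keep moving forward, whatever the reading."""
    return "forward"


def act(modules: List[Module], distance: float) -> str:
    """Return the command of the first module that responds.

    Modules earlier in the list take priority and suppress later ones.
    Nothing is remembered between calls: behaviour depends only on the
    current reading, so there is no internal map of the environment.
    """
    for module in modules:
        command = module(distance)
        if command is not None:
            return command
    return "stop"


if __name__ == "__main__":
    controller = [avoid, wander]
    for reading in [2.0, 0.3, 1.1]:
        print(reading, act(controller, reading))
```

With the priority list `[avoid, wander]`, a reading of 0.3 yields `turn` and a reading of 2.0 yields `forward`. The sketch only illustrates the general point that coherent-looking behavior can arise from direct sensor-to-motor couplings without a central world model.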

Another branch of enactivism focuses on the continuity between mind and life by arguing that living is itself a cognitive process. A living being creates and maintains its own domain of meaningfulness by generating and maintaining its own self-identity as an embodied organism (Thompson, 2007). Again, the embodiment of cognition is taken to imply a non-representationalist notion of cognition. The mind is seen not as a complex system of cognitive cogs and levers, but rather as a unified whole, an organism, whose cognitive feats can be described in terms of the non-linear dynamics of dynamical systems theory (Varela et al., 1991; for applications in cognitive neuroscience, see Thelen, 1994; Beer, 2000). Dynamical systems theory provides a model of cognition that consists of “a set of quantifiable variables changing continually, concurrently and interdependently over time in accordance with dynamical laws that can, in principle, be described by some set of equations” (Chemero, 2009). Initially, dynamical systems theory was applied to model relatively simple motor behaviors, such as walking (Thelen, 1994), finger wagging (Haken et al., 1985; Schöner and Kelso, 1988), or the social coupling of motor behavior (Schmidt et al., 1990; Richardson et al., 2007). In addition, dynamical approaches have been applied to higher-level cognition, such as the A-not-B error (Thelen et al., 2001), categorical perception (Beer, 2000), and mathematical problem solving (Stephen et al., 2009a,b). Central to dynamical models is the assumption that seemingly complex behavior can be accurately described with relatively simple mathematical models, such as coupled oscillators or dynamic fields.
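As an illustration of how a coupled-oscillator model can describe such motor behavior, the following sketch integrates the relative-phase equation usually associated with the Haken et al. (1985) model of bimanual finger coordination, dphi/dt = -a sin(phi) - 2b sin(2 phi), where phi is the relative phase between the two fingers and the ratio b/a is commonly taken to decrease as movement frequency increases. For positive a and b, the in-phase pattern (phi = 0) is always stable, whereas the anti-phase pattern (phi = pi) is stable only when b/a > 1/4. The parameter values, step size, and simple forward-Euler integration below are illustrative assumptions, not values fitted to data.

```python
import math


def hkb_rate(phi: float, a: float, b: float) -> float:
    """Rate of change of relative phase: d(phi)/dt = -a sin(phi) - 2b sin(2 phi)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def simulate(phi0: float, a: float, b: float,
             dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate the relative phase with the forward Euler method."""
    phi = phi0
    for _ in range(steps):
        phi += dt * hkb_rate(phi, a, b)
    return phi


def wrap(phi: float) -> float:
    """Map an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


if __name__ == "__main__":
    start = math.pi - 0.1  # begin close to anti-phase coordination
    for ratio in (1.0, 0.1):  # b/a: high (slow movement) vs low (fast movement)
        final = wrap(simulate(start, a=1.0, b=ratio))
        print(f"b/a = {ratio}: final relative phase = {final:.3f} rad")
```

Starting near anti-phase, the relative phase stays near pi when b/a = 1 but switches to the in-phase pattern near 0 when b/a = 0.1, a qualitative sketch of the phase transition that this kind of model was built to capture.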

Several authors have argued for an approach to language comprehension that fits the enactive paradigm of cognition, broadly conceived (Barwise and Perry, 1981, 1983; Clark, 2006; Beckner et al., 2009), or have applied dynamical systems modeling to language perception (Pollack, 1991; Port et al., 1995; Port, 2003) and production (Elman, 1990; Port, 2003). Surprisingly, these approaches are largely ignored in recent discussions of the embodiment of language in cognitive neuroscience. Conversely, recent findings in cognitive neuroscience showing the involvement of modality-specific brain areas during language processing have hardly been incorporated by enactivists or into dynamical models. This lack of cross-talk is probably related to the incommensurable paradigms in the respective fields of research. Embodied cognition in cognitive neuroscience uses the cognitivist paradigm and has thus been concerned primarily with explaining how meaning is represented in the brain (Barsalou, 1999; Fischer and Zwaan, 2008; Mahon and Caramazza, 2008; Zwaan, 2009). By contrast, approaches to language that fit the enactivist paradigm are typically anti-representationalist and focus primarily on those aspects of language that allow for a dynamical explanation, such as speech rhythms (Port, 2003), syntax (Elman, 1990, 1995), or the functioning of language at an inter-individual level (Clark, 2006; Beckner et al., 2009).

An Enactivist Approach to Language Comprehension

In this section we shall briefly sketch the contours of an enactivist conception of language comprehension. We will then argue that this conception fits the neuroscientific data on embodiment and language better than a cognitivist embodied cognition approach in terms of modal representations and motor simulations. Finally and most importantly, we will argue that the enactivist conception of language comprehension provides an embodied approach to language comprehension that avoids the necessity question and the simulation constraint.

An enactivist approach to language comprehension implies that language, ultimately, is used for action and social interaction. This means that linguistic utterances acquire their meaning in context and not merely as a function of syntax and semantics. When you are sitting in a restaurant and your partner asks you “Can you give me the salt?” you do not reply by saying “yes,” although that would be the correct answer if syntax and semantics were all that mattered. The speech act of your partner directs you to perform a certain action (Searle, 1969). Instead of asking for the salt, your partner could just as well have pointed toward it to make the same request. Or suppose you are sitting in the restaurant again and the waiter asks whether you would like anything for dessert. You respond by saying that you are fine and that you would like to pay the bill. In this case, your response to the waiter’s question follows a linguistic convention in a script-like fashion (Schank and Abelson, 1977). In both examples, language comprehension can be accurately described as procedural knowledge of how to respond to specific utterances in certain situations. On the enactivist account this notion of language comprehension is paradigmatic; it can be extended to cover many or even most instances of language comprehension. Learning to understand language is learning how to couple specific linguistic inputs to specific actions. These actions may be immediate, but they may also lie in the more distant future (e.g., in understanding the sentence “the election will be on May the 5th”). They may also be merely “virtual,” in the sense that understanding an utterance only involves being disposed to act in certain ways given certain circumstances. Of course, in many instances responding appropriately to an utterance means responding linguistically. But linguistic practice is not free-floating; it is a practice of embodied beings in a physical world. 
As Wittgenstein (1953) held, understanding the meaning of a word is knowing how it can be used. And this use always takes place within a social context involving the pragmatics of interacting embodied persons. In short, on an enactivist account, language comprehension can be described as procedural knowledge – knowledge how, not knowledge that – that enables us to interact with others in a shared physical world.

The enactivist view implies that language comprehension should be studied in relation to its potential for action. Thus, the brain activations associated with language processing do not mirror a representation-based inference process. Instead, the activation of modality-specific brain areas during language processing should be conceived of as reflecting the employment of sensorimotor skills. On this account, the motor activation that has been found in association with the processing of action verbs or words referring to manipulable objects likely supports action prediction or anticipation. For instance, in the sentence “Can you give me the salt?” the motor activation associated with processing the word “give” may prepare the listener for a subsequent grasping action (Zwaan and Kaschak, 2009), and the motor activation associated with processing words like “cup,” “scissors,” or “hammer” may reflect the retrieval of conceptual knowledge that enables subsequent (virtual) interaction with the object (cf. van Elk et al., 2009a; Rueschemeyer et al., 2010). Similarly, perceptual resonance during language processing may reflect a pattern-completion inference process used for prediction (see also Barsalou, 2009). For instance, the activation in visual areas that accompanies the processing of words referring to concrete concepts may support the categorical perception of behaviorally relevant categories (Ward, 2009), as in “look, there’s an eagle up in the sky” (Zwaan et al., 2002), or may facilitate the retrieval of relevant contextual information that allows one to make inferences and predictions about objects and situations (Barsalou et al., 2003; Barsalou, 2009).

An advantage of an enactive approach is that it accommodates the fact that language comprehension is a context-bound phenomenon that depends on the relation between the organism and the context in which it is acting. Cognitivist embodied approaches often make the implicit assumption that words have a core meaning that can be specified in terms of a specific representational vehicle. More specifically, cognitivist embodied approaches to language processing seem to imply that the sensorimotor representations associated with the processing of words are activated rapidly, automatically, and in a bottom-up fashion (Pulvermüller, 2005). The idea is that word reading results in the spreading of activation throughout a network of associated sensorimotor features, thereby constituting the meaning of the word. However, in one context the motor activation associated with processing, e.g., the word “pass” may specify a particular action tendency, as with the speech act “please pass me the salt,” whereas in another context a different motor activation will be involved, as in the utterance “pass me the ball” in a soccer game. In line with the idea that meaning is context-bound, recent studies indicate that the sensorimotor features co-activated during word processing indeed depend on the context in which the word is presented (Hoenig et al., 2008; van Dam et al., 2010). For instance, the word “tennis ball” primarily activates visual features when presented in a visual context, whereas motor features are more strongly activated when it is presented in an action context. Similarly, in another study we found that a word’s long-term semantic associations can be selectively overruled when the word is used in a different context (van Elk et al., 2009b). 
For instance, whereas the concept “cup” is strongly associated with the word “mouth,” this semantic association can be overruled if one intends to use the cup in an unusual fashion (e.g., bringing the cup toward the eye), underlining the flexibility and context-dependence of language use. Moreover, these findings argue against a cognitivist interpretation of embodiment, according to which sensorimotor activation during language processing reflects the activation of representations specifying the core meaning of concrete words. In this respect, the enactive approach to language differs in important ways from previous theories that have argued that language is primarily for action (Glenberg, 1997; see also Borghi and Cimatti, 2010) but that still maintained the notion of internal simulation processes underlying language understanding. As pointed out above, these approaches run into the simulation constraint and the necessity question, which the enactivist paradigm seeks to avoid by dispensing with the notion of internal simulations.

Another important advantage of an enactivist approach to embodied language comprehension over a cognitivist approach is that it accounts for a broad range of action-related effects during language processing that need not be restricted to simulation, re-enactment, or pre-enactment. It can thus accommodate findings that are harder to interpret in cognitivist terms. For instance, in a recent study we found stronger motor resonance for verbs describing animal actions than for verbs describing human actions (van Elk et al., 2010). If motor resonance were primarily related to the familiarity of the action, we would have expected stronger motor activation for human actions, as the way in which most animals move clearly differs from the way in which humans move. However, animals have only a very limited action repertoire (e.g., a duck can “swim,” “squeak,” or “fly”), whereas humans can perform many different actions. Accordingly, animal actions are easier to predict than human actions, and the stronger motor resonance for animal actions fits well with the idea that motor resonance is used for action prediction (van Elk et al., 2010). In another study, making a lexical decision about verbs and imagining the actions described by these verbs were found to be two neurally dissociable processes, involving activation in different regions of premotor cortex (Willems et al., 2010). This finding also argues against a strict simulationist interpretation of motor resonance, but it fits well with the enactivist view: deciding whether a string of letters is a word and imagining the action that a word describes are two different skills, which involve different regions of premotor cortex. These studies underline the importance of sensorimotor activation for language processing, yet they cannot be accounted for merely in terms of simulation or re-enactment.

In an enactivist account of language comprehension, the simulation constraint mentioned in Section “Embodiment and Language Comprehension” is absent. That is, on an enactivist account the idea that language comprehension is embodied is not exhausted by the idea that the processing of action words involves re-enactment or pre-enactment of the described action. Thus, the fact that many instances of language comprehension are hard to conceive of in terms of simulation – either because utterances involve actions that are beyond our own motor repertoire, or because they are unrelated to the concrete sensorimotor domain – need not be an impediment to an embodied account of them. Hence, switching from a cognitivist to an enactivist paradigm of cognition effectively deals with an important objection to an embodied approach to language comprehension.

In the case of the necessity question a solution can be conceived of along similar lines. The necessity question starts from the implicit assumption that a core meaning of words can be specified and the critical issue is whether this core meaning is instantiated in sensorimotor areas. However, as indicated above, according to an enactivist view, language comprehension consists in the context-bound employment of sensorimotor skills, rather than in the search for cognitivist representations. On an enactivist account, blocking activation of motor or premotor areas associated with the specific action mentioned in an utterance thus need not impede understanding when language comprehension is subserved by sensorimotor activation other than re-enactment or pre-enactment.

It is important to stress that the emphasis on context sensitivity is not intended to simply replace standard accounts of context-sensitive language understanding, such as Grice’s theory of conversational implicatures (Grice, 1989). Grice’s account of context-sensitive language use identifies principles and maxims that describe the various ways in which context is taken into account when uttering and understanding sentences. Cognitivist approaches to language understanding, specifically those of a non-embodied kind, typically take these principles and maxims to be implemented in our cognitive architecture. But nothing in Grice’s theory implies this. On our view, we should be very careful to distinguish levels of description here (cf. Dennett, 1969; Bennett and Hacker, 2003) and resist the tendency to explain personal-level cognitive phenomena in terms of isomorphic brain-level processes. On an enactive view of context-sensitive language understanding, Grice’s principles and maxims describing conversational implicatures model real-life linguistic interaction. That is, such interactions are not governed by these principles and maxims. Rather, they are informatively described by them, possibly in a slightly idealized way.

A related issue is whether a context needs to be represented in order to be effective. The question “what would you like for dinner?” has different implications for action, and is hence understood differently, when sitting in a restaurant and when walking through a supermarket with your partner. When a given context is taken into account in understanding a sentence, can it not be said that the hearer somehow represents this context? On an enactive view, taking a context into account means that the context is relevant to specific perception–action couplings. In an innocent but uninformative sense, this means that if one responds appropriately in a given situation, the context is represented accurately. But that does not mean, enactivists would stress, that such couplings are co-determined by a discrete representation of the context that is causally operative in bringing about one type of coupling rather than another. In fact, the situation itself is already part of the enactive process of perceiving and acting in the world, and there is thus no need to posit a representation of the context. Context sensitivity, then, need not imply the kind of representationalism that is characteristic of cognitivism.

Scope and Limitations of the Enactive Approach

An important question is to what extent the enactive paradigm can scale up to the requirements of a full-blown theory of high-level cognition. It has repeatedly been argued that the enactive model works relatively well when it comes to explaining lower-level sensorimotor processes, but that so-called “representation-hungry problems” (Clark, 1997) are more difficult to explain within an enactive framework (see, however, van Rooij et al., 2002; Chemero, 2009).

In the present context, it is especially relevant to consider if and how the enactive paradigm can account for the processing of abstract words. First of all, we would like to point out that our aim was primarily to show how the enactive approach circumvents the problems associated with a cognitivist interpretation of sensorimotor activation during the processing of concrete words. When it comes to the processing of abstract concepts we have to be more careful. As discussed before, several attempts have been made to provide an embodied account of abstract word meaning (Barsalou, 1999; Lakoff and Johnson, 1999; Glenberg et al., 2008; Borghi and Cimatti, 2009; Glenberg, 2010). It has been repeatedly pointed out, for instance, that many abstract concepts bear a direct relation to the concrete domain, such as words referring to divine concepts (Meier et al., 2007), words describing power relations (Schubert, 2005), and even words referring to numbers (Dehaene et al., 1993). Typically, the relation between the abstract and the concrete domain is conceived as abstract words being linked to concrete sensorimotor representations. For instance, it has been argued that numbers are spatially represented in the brain along a “mental number line” (Moyer and Landauer, 1967; Dehaene et al., 1993). Although the enactive view acknowledges that many abstract concepts are related to concrete sensorimotor experiences, it rejects the view that this relation can be identified at the brain level in the form of specific neural representations. As with the enactive approach to concrete words, the processing of abstract words should always be considered in relation to its potential for action. For instance, the observed relation between number words and space may be part of a common magnitude system that is used for both perception and action (Walsh, 2003).
In line with this suggestion, it has been found that number processing influences action planning, such that large numbers facilitate power grips and small numbers facilitate precision grips (Lindemann et al., 2007). Similarly, action compatibility effects associated with processing words of positive or negative valence may reflect approach and avoidance tendencies (see for instance van Dantzig et al., 2008). In sum, although the enactive paradigm proposed in the present paper is primarily intended as an alternative to a cognitivist interpretation of neural resonance during language processing, one could conceive of a similar approach that considers abstract concepts in relation to their potential for action (see for instance Borghi and Cimatti, 2009).

Another possible limitation of the enactive paradigm concerns the costs associated with abandoning sensorimotor simulations in language processing. It has been argued that perceptual symbols and sensorimotor simulations allow for the systematicity and productivity of thought (Barsalou et al., 2003). For instance, it has been argued that simulations allow one to make inferences beyond the information that is directly available. In addition, concepts can be combined into more complex concepts via a selective process of merging existing simulations (e.g., Prinz, 2002). Although an in-depth discussion of these concerns is beyond the scope of the present paper, we would like to point out that a cognitivist embodied account of systematicity and productivity runs into the same problems mentioned before. With respect to the simulation constraint, it remains unclear how we can make inferences about concepts for which we lack the relevant simulations. In addition, when it comes to conceptual combination, it remains unclear how we understand combined concepts whose sensorimotor properties cannot be inferred from their constituent concepts (e.g., a “wooden spoon” is typically conceived as big, whereas neither the concept “wood” nor the concept “spoon” implies this property).

Implications for Future Research

In the final section of this paper the implications of an extended approach to language for future research will be discussed. As argued before, the enactive view can accommodate research findings that are difficult to reconcile with a simulationist interpretation of embodiment. In addition, the enactive view provides an important break from previous attempts aimed at determining the necessity of neural resonance for language understanding. Rather than focusing on the nature of linguistic representations, research should consider under what conditions and in which contexts language processing is accompanied by activation in modality-specific brain areas. We would like to suggest possible directions for future research on the functional role of neural resonance in language processing.

First, according to the enactive view, language is primarily used for action, and motor activation in association with language processing should accordingly be considered in relation to its potential for action. In line with this suggestion, several studies have shown direct effects of language processing on motor performance (Boulenger et al., 2008; Nazir et al., 2008; Frak et al., 2010) and of action preparation on language processing (Lindemann et al., 2006; van Elk et al., 2008; van Elk et al., 2009b). Moreover, the enactive approach predicts that interactions between language and action are not restricted to relatively simple reaching and grasping movements but extend to naturalistic action settings as well. One intriguing possibility would be to investigate the functional role of effects of language on action in a communicative setting (e.g., when someone asks you to pass the salt across the table).

Second, the enactive view implies that the coupling between language and action is flexible and context-dependent. In contrast, embodied accounts of language processing have suggested that the coupling between language and action is obligatory and that the motor system is activated within the first few hundred milliseconds after word onset (e.g., Pulvermuller et al., 2005). According to an enactive view, rather than being automatic, the activation of motor-related areas should depend on the context in which a word is presented. Thus, the word “pass” may be associated with the movement of different effectors, depending on the context. Similarly, whereas in some instances a word like “apple” may prime a power grip (Glover and Dixon, 2002; Glover et al., 2004), in a different context it may prime a precision grip (e.g., after hearing a sentence like “when only the core was left, he threw away the apple”).

Third, as indicated in the previous section, motor activation in relation to language processing may support action prediction or anticipation. Thus, motor activation during language processing may prepare the listener for subsequent actions, as in the sentence “please pass me the salt.” Interestingly, studies on action observation suggest that violations of an expected action result in a stronger motor activation, likely reflecting the updating of a forward model (Koelewijn et al., 2008; Stapel et al., 2010). Similarly, if motor resonance in language processing is related to prediction we should expect a stronger motor activation if the actions described in a sentence do not match one’s expectations. In sum, these examples illustrate that the enactivist view on language generates testable predictions that should be addressed more broadly in future research.

Conclusion

We conclude that an embodied approach to language comprehension in cognitive neuroscience requires an enactivist rather than a cognitivist conception of embodied cognition. By including sensorimotor activations that cannot be subsumed under the heading of (p)re-enactment, an enactivist paradigm allows us to make sense of more of the cognitive neuroscientific data relating language comprehension to action effects or modality-specific neural processing than a cognitivist paradigm does. Also, the enactivist paradigm more easily allows for the context-dependence of language comprehension. Finally and most importantly, an enactivist conception allows us to answer two of the most serious objections to an embodied account of language comprehension: the necessity question and the simulation constraint. In sum, the multidisciplinary evidence relating language comprehension to sensorimotor activity argues for an enactivist conception of language. Language comprehension reflects the employment of sensorimotor skills and is a context-bound phenomenon that depends on the relation between the organism and the context in which the organism is acting.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This paper was written during fellowships supported by The International Human Frontier Science Program Organization (grant ST000078/2009 to Michiel van Elk) and the Marie Curie Intra European Fellowship within the Seventh European Community Framework Program (IEF grant 252713 to Michiel van Elk) and by the Netherlands Organization for Scientific Research (grant 453-05-001 awarded to Harold Bekkering).

References

Aziz-Zadeh, L., Wilson, S. M., Rizzolatti, G., and Iacoboni, M. (2006). Congruent embodied representations for visually presented actions and linguistic phrases describing actions. Curr. Biol. 16, 1818–1823.

Barsalou, L. W. (1999). Perceptual symbol systems. Behav. Brain Sci. 22, 577–609; discussion 610–660.

Barsalou, L. W. (2008). Grounded cognition. Annu. Rev. Psychol. 59, 617–645.

Barsalou, L. W. (2009). Simulation, situated conceptualization, and prediction. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 1281–1289.

Barsalou, L. W., Simmons, W. K., Barbey, A. K., and Wilson, C. D. (2003). Grounding conceptual knowledge in modality-specific systems. Trends Cogn. Sci. 7, 84–91.

Barwise, J., and Perry, J. (1981). Situations and attitudes. J. Philos. 77, 668–691.