Hypothesis and Theory ARTICLE
Sentence processing: linking language to motor chains
- 1 Institute of Sciences and Technologies of Cognition, National Research Council, Rome, Italy
- 2 School of Humanities and Informatics, University of Skövde, Skövde, Sweden
- 3 Department of Psychology, University of Bologna, Bologna, Italy
A growing body of evidence in cognitive science and neuroscience points towards the existence of a deep interconnection between cognition, perception and action. According to this embodied perspective, language is grounded in the sensorimotor system and language understanding is based on a mental simulation process (Jeannerod, 2007; Gallese, 2008; Barsalou, 2009). This means that the comprehension of action words and sentences recruits the same perception, action, and emotion mechanisms that are involved in interaction with objects. Among the neural underpinnings of this simulation process, an important role is played by a sensorimotor matching system known as the mirror neuron system (Rizzolatti and Craighero, 2004). Despite a growing number of studies, the precise dynamics underlying the relation between language and action are not yet well understood. In fact, experimental studies are not always consistent: some report that language processing interferes with action execution, while others find facilitation. In this work we present a detailed neural network model capable of reproducing experimentally observed influences of the processing of action-related sentences on the execution of motor sequences. The proposed model is based on three main points. The first is that the processing of action-related sentences causes the resonance of motor and mirror neurons encoding the corresponding actions. The second is that there exists a varying degree of crosstalk between neuronal populations depending on whether they encode the same motor act, the same effector or the same action-goal. The third is that the internal dynamics of neuronal populations, which result from the combination of multiple processes taking place at different time scales, can facilitate or interfere with successive activations of the same or of partially overlapping pools.
In recent years an increasing number of studies have adopted an embodied approach. According to embodied cognition theories (Barsalou, 2008), language is grounded in the sensorimotor system and language processing enhances previous sensorimotor experiences with the objects or situations that language refers to. Within the embodied approach, many studies focused on the role of motor simulation in language comprehension (e.g., Decety and Grèzes, 2006; Gallese, 2008). In particular, it has been highlighted that the comprehension of action verbs and action sentences involves the same sensorimotor and emotional brain circuits that are also activated during the actual interaction with the objects, situations and events the sentences refer to (for reviews, see Barsalou, 2008; Fischer and Zwaan, 2008; Toni et al., 2008). Moreover, studies show that the simulation formed during language comprehension is sensitive to the involved effector (e.g., mouth, hand, leg). Although there is thus increasing evidence for a relation between action and language, the precise nature of this relation is still poorly understood. At the same time, an attractive aspect of this area of research is that both behavioral and neuroscientific data are available. In a sense, these are ideal conditions for carrying out computational modeling work that furthers our understanding of observed behavior. It is therefore our intention to use such an approach to elucidate the relationship between the neural mechanisms underlying language and the motor system.
Here we first review the relevant behavioral and brain imaging studies and emphasize the differences in results. We then present a computational model of the underlying neural circuits (based on the Chain model, see Chersi et al., 2006, 2007) which is able to account for the different findings. The model has been chosen as it is strongly motivated by neurophysiological findings which are relevant for the behavioral data discussed below. The fact that a single model can reproduce all results (in particular controversial and apparently conflicting data on timing in sentence comprehension) is a strong indication that they are not intrinsically in conflict but have rather captured different aspects of a single system.
Review: Studies on Words and Effectors
Neurophysiological and Brain Imaging Results
A number of neurophysiological and brain imaging studies have demonstrated that during action words and sentence comprehension different areas of the brain are activated depending on the effector (arm/hand, mouth, leg/foot) involved. The first study showing this was performed by Pulvermüller et al. (2001) , who recorded neurophysiological data (specifically, they calculated event-related current source densities from EEG) pertaining to the processing of verbs referring to actions performed with the face, the arm/hand and the leg/foot. They found topographical differences in the brain activity patterns generated by the different verbs (e.g., to “lick”, “pick”, “kick”) in a lexical decision task, starting at 250 ms after word onset. Hauk et al. (2004) confirmed the result with functional magnetic resonance imaging (fMRI). They found that during a passive reading task, words referring to face, arm, or leg actions differentially activated areas along the motor strip that were contiguous or overlapped with areas where that particular effector is represented. Tettamanti et al. (2005) showed with fMRI that passive listening to sentences expressing actions performed with the mouth, the hand or the foot led to signal increase in regions of the premotor cortex that are related to the effector involved in that sentence.
Overall, these studies reveal that during the processing of words and sentences part of the brain is activated in a somatotopic way. Importantly, this early activation suggests that the activation of motor and premotor cortices is not simply a by-product. Rather, it appears to play an important functional role in action word comprehension even in tasks which require rather shallow processing (such as lexical decision or even passive listening tasks). The hypothesis that the motor system is activated in a direct and straightforward way is much more plausible and economical compared to the idea that information is first translated into an abstract format which then influences the motor system (Mahon and Caramazza, 2008).
Transcranial Magnetic Stimulation Results
Results like those reported above strongly suggest that the motor system activation is a fundamental part of the word and sentence comprehension process. However, it is still a matter of debate whether or not the activation of the motor system plays a causal role for sentence comprehension (for a position different from the one presented here, see Mahon and Caramazza, 2008). In addition, as we will show in this section, the actual effect the motor system activation can have on the comprehension process is not well understood. Results obtained in studies with transcranial magnetic stimulation (TMS) are controversial, as some report facilitation while others find interference during the processing and execution of combinations of actions, verbs and action sentences.
In a recent study, Buccino et al. (2005) found an interference effect when the effector stimulated through TMS and the stimulus were congruent. More specifically, they acoustically presented three kinds of sentences, referring to either hand actions (e.g., he/she sewed the skirt), foot actions (e.g., he/she kicked the door) or abstract content (e.g., he/she loved his land). Participants were simply required to listen to the sentences. A TMS pulse was delivered at the end of the second syllable of the verb and motor evoked potentials (MEPs) were recorded from hand and foot muscles. Results showed a decrease in amplitude of MEPs recorded from hand muscles while listening to hand-action-related sentences, and from foot muscles when listening to foot-related sentences.
In contrast to the previous results, Pulvermüller et al. (2005) used a lexical decision task in which participants had to respond with a lip movement to arm- and leg-related words (e.g., “pick” vs. “kick”), and to refrain from responding to pseudowords. TMS pulses were delivered 150 ms after stimulus onset. Arm area TMS led to faster lexical decision times with arm words, whereas leg area TMS led to faster RTs with leg words; no facilitation was found in control conditions. A similar facilitation effect was found by Oliveri et al. (2004), who applied TMS to the left motor cortex when participants produced action-related and non-action verbs (e.g., “pour” vs. “detest”) and nouns (e.g., “key” vs. “hill”). The motor cortex activation increased for action words (verbs and nouns) compared to non-action words during paired-pulse TMS at an interstimulus interval (ISI) of 10 ms; no difference was present at 1 ms ISI. Recently, Papeo et al. (2009) recorded TMS-induced MEPs from right hand muscles. They found an increase of M1 activity only at 500 ms, while no increase was present when they delivered single-pulse TMS at 170 and 350 ms after the appearance of action verbs.
In a behavioral experiment performed by Buccino et al. (2005) , participants were required to respond with either the hand or the foot if a presented verb was concrete and had to refrain from responding if the verb was abstract. Results showed that, if subjects responded with the same effector necessary for executing the action described by the sentence, response times were slower than if participants had to respond with the other effector. Sato et al. (2008) performed three experiments using a go-no go paradigm; participants had to answer with the right hand to verbs referring to hand actions (e.g., to applaud), foot actions (e.g., to walk) or abstract content (e.g., to love). Stimuli were presented both in the acoustic and visual modality. The authors manipulated both the task and the delivery of the go signal. More specifically, they used both a task implying shallower processing (a lexical decision task) and one implying deeper processing (a semantic decision task). In the semantic decision task, response times were slower with hand-related compared to foot-related verbs when the go signal was delivered early (at the isolation point). No effect was found with a late delivery of the go signal. In the lexical decision task no effect was found independently of the delivery of the go signal. This result suggests that the interference effect occurred only with deep semantic processing of sentences, and that it was confined to early delivery of the go-signal. In a kinematics study by Boulenger et al. (2006) participants were required to reach and grasp a cylindrical object. In the first experiment they had to start reaching when a fixation cross appeared, and continue moving when words appeared but stop for pseudowords. Words could either be verbs referring to hand, leg or mouth actions, or nouns representing non-manipulable objects. Results showed a modulation of kinematics parameters: processing action verbs interfered with concurrent early reaching movements.
Scorolli and Borghi (2007) extended the results of Buccino et al. (2005) using combinations of nouns and verbs referring to hand, mouth and foot actions. Participants were presented with pairs of nouns and verbs that could refer to either hand and mouth actions (e.g., to unwrap vs. to suck the sweet) or to hand and foot actions (e.g., to throw vs. to kick the ball). An equal number of non-sensible pairs was presented. The participants’ task consisted in deciding whether or not the combination made sense. Half of them were asked to respond by saying yes loudly into a microphone, whereas the other half responded by pressing a pedal. If the combination did not make sense, they were to refrain from responding. The authors found a facilitation in response to “mouth” and “foot” sentences compared to “hand” sentences in case of congruency between the effectors (mouth and foot) involved in the motor response and in the sentence. It should be noted that the task, although different from the ones used by Buccino et al. (2005) and Sato et al. (2008), required deep semantic processing as well. Importantly, however, the presentation modality of the stimuli differed: the stimuli were presented visually and the noun was presented when the verb was processed. Given that Sato et al. (2008) did not find any difference between stimulus modalities (visual vs. auditory), and that both tasks require deep semantic processing, we have reason to believe that the most influential difference between the two studies is related to different timing.
Borghi and Scorolli (2009) performed experiments where, instead of using a go-no go paradigm, participants used both hands to choose between two possible answers. When pairs of words were presented that referred to manual and mouth actions, participants responded faster with the dominant hand. The advantage of the dominant hand was limited to sensible sentences.
Finally, in a second experiment of the same study by Boulenger et al. (2006) reported above, participants had to start reaching when a string of letters appeared on the screen. It was found that action verbs assisted the reaching movement when processed before movement onset. Despite the interest of this study, the results obtained are only partially relevant for our model, as a rather different paradigm was used, and kinematics measures were recorded, while our model focuses on RTs (see below).
A Reason for the Discrepancy: Timing?
The discrepancies in TMS and behavioral results support the hypothesis that the precise task timings play a fundamental role in determining the type of interaction between language processing and action execution. For a similar interpretation see Boulenger et al. (2006), and, although not related to the role played by effectors during sentence comprehension, see Borreggine and Kaschak (2006) and De Vega et al. (2004).
All results support embodied theories as they demonstrate that there is a modulation of the motor system during sentence processing. However, the precise mechanisms underlying the conflicting data presented above are still poorly understood. In this respect, the detailed modeling of the possible processes could help to shed new light on these phenomena. The model we will describe in the following section addresses this issue and leads to novel predictions.
Materials and Methods
The Chain Model
Recent neurophysiological experiments (Fogassi et al., 2005 ; Bonini et al., 2009 ) have shown that in the parietal and premotor cortices, the great majority of motor and mirror neurons coding a specific motor act (e.g., reaching, grasping, etc) show markedly different activation patterns according to the final goal of the action sequence in which the act is embedded. More specifically, a neuron that is highly active during the grasping phase in a “grasping to eat” sequence may fire very little during a “grasping to place” sequence. These results have led to the hypothesis that motor and mirror neurons in the parietal and premotor cortices are organized in chains that encode short habitual action sequences (Chersi et al., 2006 , 2007 ). According to this view, for example, the action of taking a piece of food is encoded as the concatenation of neurons that represent the reaching, the grasping and the retrieving motor act (see Figure 1 ). The execution and the comprehension of motor sequences correspond to the propagation of activity within specific chains. This chained organization allows a smooth and automatic execution of action sequences, and can be used to mentally simulate action sequences by “running” chains decoupled from the overt motor output.
Figure 1. Schematic representation of the chain model derived from Chersi et al. (2007) . Colored ellipses represent pools of neurons that encode specific motor acts in the parietal cortex (IPL) or intentions in the prefrontal cortex (PFC). Lines indicate the connections between different pools. Sensory areas provide information about the ongoing action and pre/motor areas interpret abstract commands to generate motor output.
Due to the dual property of mirror neurons (i.e., the fact that they are active both during execution and observation of action sequences executed by others) mirror chains can be used to understand others’ actions and intentions by mapping the observed acts on one’s own motor repertoire.
Taken together, the reviewed results strongly support the notion that the processing of language stimuli, at least for sentences expressing a motor content, modulates the activity of the motor system and that this modulation specifically concerns those sectors of the motor system where the effector involved in the processed sentence is represented. Interestingly, depending on the temporal relation between language and motor tasks, processing action words can facilitate or interfere with overt motor behavior.
The model we propose to explain these observations is based on three main points. First, the processing of action-related sentences involves the chained activation of specific pools of mirror neurons that encode the motor acts referred to in the sentences (Chersi et al., 2006 ). This is the same mechanism as the one taking place during the recognition of actions done by other individuals.
Second, as shown by recent experiments (Fogassi et al., 2005 ; Bonini et al., 2009 ), part of the neurons representing a motor act (e.g., reaching) embedded in a sequence dedicated to a specific goal (e.g., grasping an object) respond also when the same act is embedded in another sequence (e.g., pressing a button).
The third point concerns the dynamics of neuronal pools. The detailed analysis of the experiments reported above has revealed that interference occurs between 160 and 500 ms after stimulus presentation, whereas facilitation becomes evident between 550 and 800 ms after sentence appearance (Boulenger et al., 2006 ). These time scales suggest that short term neural dynamics may be the cause underlying these phenomena. In vitro recordings have shown that neuronal responses result from the combination of several dynamic processes occurring at different time scales. In general it is possible to distinguish two main components that determine the neuronal response: (1) an early but brief buildup of ionic currents (typically potassium) that causes an adaptation of the firing rates; (2) a slow but long lasting accumulation of neurotransmitters (NMDA, GABA, AMPA) and other ions (e.g., calcium) that facilitate neuronal firing. More precisely, for high enough spike frequencies a calcium-dependent potassium current (see e.g., McCormick et al., 1985 ; Sah, 1996 ) builds up lasting up to a few hundred milliseconds and reducing the firing frequency of neurons. Simultaneously, due to incoming spikes the concentration of neurotransmitters increases rapidly and fades away slowly after the input has ceased (this is especially true for NMDA). Additionally, the accumulation of calcium (Powers et al., 1999 ) produces a spiking facilitation effect that can last up to more than a second. Taken together these effects produce a time window (up to half a second after stimulation) during which neurons decrease their firing rate and thus reach their maximum activity more slowly, and a facilitation time window (from half a second to about a second) during which pools react more rapidly.
The general mechanism proposed in our study is therefore the following. During the processing of an action-related sentence, pools of mirror neurons that encode the single phases (motor acts) of the expressed action are activated due to a motor resonance mechanism. Neuronal activity propagates along the chain and sequentially activates the motor neurons connected downstream. Although pools fire only for a short interval of time (around 200–300 ms) synaptic currents decay at a much slower rate due to their slower internal dynamics. The firing rate adaptation current is active shortly after the firing of the pool causing a momentary activity slowdown. When a response action has to be produced, the prefrontal cortex (PFC) activates the corresponding neuronal chain. The precise activation profile of each pool in the chain will depend on the degree of overlap it has with any previously activated pools of other chains and on how big the time interval between the activations is. More precisely, the larger the overlap, the stronger the influence. Furthermore, pools will respond faster or more slowly depending on whether their activation falls within the adaptation or the facilitation phase of previous pools.
In order to test our hypothesis, we simulated an experiment by virtually combining those by Buccino et al. (2005) and Scorolli and Borghi (2007) previously discussed. In our experiment, a hypothetical subject has to watch a screen where one of two short sentences can appear. The first sentence is “to grasp the apple”, while the second one is “to kick the ball”. The subject has to read the sentence and, when the “Go” signal is given, reach and press a button. The delay between the sentence presentation and the “Go” signal varied between 200 and 1200 ms.
We suppose that the first of the two sentences is represented as the concatenation of a “reaching with the hand” and a “grasping” motor act; the second as a “reaching with the foot” followed by a “hitting” motor act. The action that the subject has to perform consists in a “reaching with the hand” and “pressing” motor act. Each action is encoded by a neuronal chain composed of pools that represent the different motor acts. When the subject reads the displayed sentences, neurons that encode the described motor acts start to fire due to a mirror resonance process. More precisely, if the participant reads the first sentence, initially the “reaching (with the hand)”, then the “grasping” pools are activated. If the subject reads the second sentence, first the “reaching (with the foot)” then the “kicking” pool is activated. When the subject has to respond by pressing a button, the “reaching-pressing” chain is run, i.e., the “reaching” pool is activated first and this in turn activates the “pressing” pool.
One important characteristic of our model is that neuronal pools encoding the same motor act (involving the same effector) but being part of different chains share a small fraction of neurons and axonal projections. In our case, the common part between the action described in the first sentence and the subject’s motor response is the “reaching” motor act. Consequently, the pool encoding “reaching” in the “reaching-pressing” chain is partially activated when the sentence is read. If the “reaching-pressing” chain is then executed shortly afterwards, the previously activated sub-threshold dynamics affect the firing rate of the pool in either a positive or negative way.
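The overlap rule described above can be sketched as a small lookup: each sentence and the response are written as a chain of (motor act, effector) pairs, and only pairs that match on both act and effector are treated as sharing neurons. The labels and the helper name `shared_acts` are illustrative assumptions, not part of the model's implementation.

```python
# Each chain is an ordered list of (motor act, effector) pairs.
# Labels are illustrative; the text names the acts only informally.
SENTENCE_CHAINS = {
    "to grasp the apple": [("reach", "hand"), ("grasp", "hand")],
    "to kick the ball": [("reach", "foot"), ("kick", "foot")],
}

# Chain run by the subject when the "Go" signal is given.
RESPONSE_CHAIN = [("reach", "hand"), ("press", "hand")]


def shared_acts(sentence):
    """Return the motor acts of a sentence chain that also occur,
    with the same effector, in the response chain."""
    return [act for act in SENTENCE_CHAINS[sentence] if act in RESPONSE_CHAIN]


if __name__ == "__main__":
    for sentence in SENTENCE_CHAINS:
        print(sentence, "->", shared_acts(sentence))
```

Under this sketch only the hand sentence shares a pool ("reaching with the hand") with the response chain, which is the case in which crosstalk is expected.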
In our simulated experiment the elaboration of the sentence is assumed to last around 300 ms, with the peak-to-peak time interval between two pools being around 150 ms. We would like to emphasize that the motor content of each sentence is independent of the agent (here impersonal) and of the target objects (“the apple”, “the ball”, “the button”), none of which is explicitly encoded in the chain; they are rather considered as parameters of the action. Note that this is possible because mirror neurons do not explicitly encode the agent of an action nor the objects involved.
The neural network we used in our simulations was composed of six pools of neurons, each one coding a specific motor act. The pools were arranged in three chains of two pools each (see Figure 2 ).
Figure 2. Schematic representation of the chained organization of the network. Each large circle represents a pool of neurons (small spheres) encoding a specific motor act. Lines represent the connections between neurons. Lines on the left represent external inputs that start the chains.
The behavior of each neuronal pool is described by a firing rate model with time-dependent synaptic currents (Dayan and Abbott, 2001 ). This allows us to both compactly represent complex interactions between excitatory and inhibitory neurons within the pools and explicitly take into account the dynamics of ionic currents and neurotransmitters. The set of equations is the following:
where νi is the mean firing rate of the i-th pool and τν = 70 ms the corresponding time constant, g() is the I–f pools’ response function (see below), η is an additional term that simulates spontaneous activity, Isyn,i is the total synaptic current and τi = 260 ms the corresponding time constant, Ifra,i is the firing rate adaptation current, Whi is the connection strength from unit h to unit i, and Iext,i is the external input current arriving from areas that are active while reading the sentence or executing an action. This signal has been modeled as a bell shaped activity peak lasting 200 ms.
In the present implementation a fitting procedure has been used in order to determine the synaptic weights that produce the activation of pools encoding subsequent motor acts in each chain with the correct timing and amplitude (yielding Wi,i+1 ≈ 0.03). Furthermore, the connectivity (i.e., the overlap) between the first pool of the “reaching-grasping” chain and the first pool of the “reaching-pressing” chain (both pools encoding “reaching”) has been set to a value that produces an activation of 30% of the maximum firing rate when the other chain is activated (Whi ≈ 0.02). All other connections (including self connections) have been set to zero. Note that the “reaching with the hand” pools have no overlap with the “reaching with the foot” pool because the effectors involved are not the same.
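The connectivity just described can be written down as a 6 × 6 weight matrix. The sketch below uses the two quoted values (0.03 within a chain, 0.02 for the overlap); the pool names, the ordering of the pools, and the assumption that the overlap acts in both directions are illustrative choices not stated in the text.

```python
# Hypothetical pool labels; two pools per chain, three chains.
POOLS = [
    "reach_hand_grasp_chain", "grasp",   # "to grasp the apple"
    "reach_foot", "kick",                # "to kick the ball"
    "reach_hand_press_chain", "press",   # response chain
]

W_CHAIN = 0.03    # first pool -> second pool within a chain (from the text)
W_OVERLAP = 0.02  # crosstalk between the two hand "reaching" pools (from the text)


def build_weights():
    """Return W as a nested list, with W[h][i] the strength from pool h to pool i."""
    n = len(POOLS)
    index = {name: k for k, name in enumerate(POOLS)}
    w = [[0.0] * n for _ in range(n)]
    # Feed-forward link inside each chain.
    for first, second in [
        ("reach_hand_grasp_chain", "grasp"),
        ("reach_foot", "kick"),
        ("reach_hand_press_chain", "press"),
    ]:
        w[index[first]][index[second]] = W_CHAIN
    # Overlap between the two hand "reaching" pools (assumed symmetric).
    a = index["reach_hand_grasp_chain"]
    b = index["reach_hand_press_chain"]
    w[a][b] = W_OVERLAP
    w[b][a] = W_OVERLAP
    return w


W = build_weights()

if __name__ == "__main__":
    for name, row in zip(POOLS, W):
        print(f"{name:>24}: {row}")
```

All remaining entries, including the diagonal and every connection involving the foot chain other than its own feed-forward link, stay at zero.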
The firing rate adaptation has been modeled as a current that, when activated, hyperpolarizes the neurons of a pool, slowing down any spiking that may be occurring. We assume that this current is proportional (through α) to the firing rate and relaxes to zero at a rate of β. In our implementation α ≈ 0.09 nA and β ≈ 3 nA/s.
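As a hedged sketch, one system of pool equations consistent with the symbol definitions and parameter descriptions given above can be written as follows. The signs, the placement of η outside g(), and the reading of β as a relaxation rate are assumptions; this is not presented as the exact formulation used in the simulations.

```latex
\begin{aligned}
\tau_{\nu}\,\frac{d\nu_i}{dt} &= -\nu_i + g\!\left(I_{\mathrm{syn},i} + I_{\mathrm{fra},i}\right) + \eta \\
\tau_i\,\frac{dI_{\mathrm{syn},i}}{dt} &= -I_{\mathrm{syn},i} + \sum_{h} W_{hi}\,\nu_h + I_{\mathrm{ext},i} \\
\frac{dI_{\mathrm{fra},i}}{dt} &= -\alpha\,\nu_i - \beta\,I_{\mathrm{fra},i}
\end{aligned}
```

In this form the adaptation current is negative (hyperpolarizing), grows in magnitude in proportion to the firing rate through α, and decays back to zero at rate β once firing stops.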
In order to reproduce more faithfully the behavior of real neurons (in particular the fact that there is a minimum value for the injected current below which no firing takes place) the pools’ response function has been modeled in the following way:
where g0 determines the maximum firing rate, γ determines the steepness of the response and Ithr is the firing threshold. In this implementation we have chosen g0 = 150 Hz, γ = 1.5 cm²/nA, and Ithr = 0.25 nA/cm². All the parameters in this model have been chosen in order to reproduce biological data as closely as possible. Figure 3 shows the currents and the firing rate of a single pool in response to an external stimulus.
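A response function with the three properties listed above (no firing below Ithr, steepness set by γ, saturation at g0) can be sketched as a thresholded hyperbolic tangent, g(I) = g0 tanh(γ (I − Ithr)) for I > Ithr and 0 otherwise. This functional form is an assumption chosen to match the description; only the three parameter values are taken from the text, and `pool_response` is a hypothetical name.

```python
import math

G0 = 150.0     # maximum firing rate (Hz), from the text
GAMMA = 1.5    # steepness (cm^2/nA), from the text
I_THR = 0.25   # firing threshold (nA/cm^2), from the text


def pool_response(current):
    """Assumed I-f curve: zero below threshold, saturating at G0 above it."""
    if current <= I_THR:
        return 0.0
    return G0 * math.tanh(GAMMA * (current - I_THR))


if __name__ == "__main__":
    for current in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(f"I = {current:4.2f} -> g(I) = {pool_response(current):6.1f} Hz")
```

Any other monotonic, saturating function with a hard threshold would serve the same illustrative purpose.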
Figure 3. Time course of each variable of a pool after stimulation (gray peak). The green curve represents the response of the pool (scale on the left), the blue curve is the synaptic current, the yellow curve is the firing rate adaptation current, and the dashed line is the resultant total current (current scale on the right).
The parameters α and β determine the shape of the firing rate adaptation current curve Ifra. Increasing α, for instance, increases the influence of the firing rate on the growth of Ifra, which in turn decreases the firing rate (Figure 4 A). Decreasing β instead causes a slower deactivation of Ifra thus shifting the inhibitory phase of Itot further in time (Figure 4 B).
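The qualitative effect of α can be explored with a minimal forward-Euler sketch of a single isolated pool. It assumes dynamics of the form τν dν/dt = −ν + g(Isyn + Ifra), τi dIsyn/dt = −Isyn + Iext and dIfra/dt = −αν − βIfra, a thresholded tanh for g, β read as a rate constant in 1/s, no spontaneous activity, and an arbitrary input amplitude. None of these choices is confirmed by the text, the function names are hypothetical, and the sketch is not expected to reproduce the published curves.

```python
import math

TAU_NU = 0.070    # firing-rate time constant (s), from the text
TAU_SYN = 0.260   # synaptic-current time constant (s), from the text
G0, GAMMA, I_THR = 150.0, 1.5, 0.25   # response-function parameters, from the text


def response(current):
    """Assumed thresholded tanh I-f curve."""
    if current <= I_THR:
        return 0.0
    return G0 * math.tanh(GAMMA * (current - I_THR))


def bell_input(t, center=0.1, width=0.035, amplitude=2.0):
    """Bell-shaped external input lasting roughly 200 ms (amplitude is arbitrary)."""
    return amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)


def simulate_pool(alpha, beta=3.0, duration=1.5, dt=0.001):
    """Integrate one pool with forward Euler; return (rates, adaptation currents)."""
    nu = i_syn = i_fra = 0.0
    rates, adaptation = [], []
    for step in range(int(round(duration / dt))):
        t = step * dt
        d_nu = (-nu + response(i_syn + i_fra)) / TAU_NU
        d_syn = (-i_syn + bell_input(t)) / TAU_SYN
        d_fra = -alpha * nu - beta * i_fra   # negative = hyperpolarizing
        nu += dt * d_nu
        i_syn += dt * d_syn
        i_fra += dt * d_fra
        rates.append(nu)
        adaptation.append(i_fra)
    return rates, adaptation


if __name__ == "__main__":
    for alpha in (0.0, 0.09, 0.13):
        rates, adaptation = simulate_pool(alpha)
        print(f"alpha = {alpha:4.2f}: peak rate = {max(rates):6.2f} Hz, "
              f"strongest adaptation current = {min(adaptation):6.3f}")
```

In this sketch a larger α yields a stronger adaptation current and a lower peak firing rate; varying `beta` in the same way shows how slowly the adaptation current relaxes.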
Figure 4. Effect of the parameters α and β on the shape of the firing rate adaptation current. (A) A decrease of β (from 3.0 to 1.9 nA/s) produces a slower deactivation of Ifra, thus shifting the inhibitory phase further in time. (B) An increase of α (from 0.09 to 0.13 nA) produces a more intense Ifra and thus a stronger inhibition in case of re-activation of the same pool within 700 ms.
The results of the simulated experiment are reported in Figure 5 . Figure 5 A shows the activity profile of the “reaching” pool (green curve) of the “reaching-grasping” chain activated by the presentation of the sentence “to grasp the apple” (gray curve). In our implementation this input (Iext) is simulated as a bell shaped activation of the duration of 200 ms. Note that both in the experiments and in the model each sentence is considered as a whole. The detailed modeling of single words comprehension is beyond the scope of this paper. The pool reaches its maximum activity 254 ms after stimulus onset. Figure 5 B shows the response of the “reaching” pool of the “reaching-pressing” chain. The first bump is due to the crosstalk between the first chain and the second chain. The Go signal (gray curve) is given 350 ms after the stimulus presentation. The activity peak is reached 276 ms after the Go signal.
Figure 5. (A,C) Time course of the activation of the “reaching” pool in the “reaching-grasping” chain after the presentation of the hand-related sentence. (B,D) Time course of the activation of the “reaching” pool in the “reaching-pressing” chain after the appearance of the “Go” signal.
Figure 5 C shows the response of the same pool to the presentation of the same sentence, and Figure 5 D shows the response of the “reaching” pool of the “reaching-pressing” chain when the Go signal is given 650 ms after the sentence presentation. In order to remove the reaction time component due solely to the physical execution of the action (executed only virtually in our case), we calculate the “facilitation factor” (ΔtD − ΔtC) and the “interference factor” (ΔtB − ΔtA) as the decrease or increase of the reaction time of the specific task compared to the control task.
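The two factors are differences between peak latencies, which can be sketched as follows. The traces below are synthetic bell curves with arbitrarily chosen peak times, used only to exercise the arithmetic; they are not simulation output, and the helper names are hypothetical.

```python
import math


def peak_latency(times_ms, rates, onset_ms):
    """Time from the triggering signal (onset_ms) to the maximum of the trace."""
    k = max(range(len(rates)), key=lambda j: rates[j])
    return times_ms[k] - onset_ms


def timing_factor(task_latency_ms, control_latency_ms):
    """Positive values indicate interference, negative values facilitation."""
    return task_latency_ms - control_latency_ms


def synthetic_trace(times_ms, peak_ms, width_ms=60.0):
    """Bell-shaped stand-in for a pool's firing-rate trace (illustrative only)."""
    return [math.exp(-0.5 * ((t - peak_ms) / width_ms) ** 2) for t in times_ms]


times = list(range(0, 1500))  # 1 ms resolution

# Control: stimulus at 0 ms, peak 200 ms later.
control = peak_latency(times, synthetic_trace(times, 200), onset_ms=0)
# Slowed response: Go at 350 ms, peak 230 ms later.
slowed = peak_latency(times, synthetic_trace(times, 580), onset_ms=350)
# Speeded response: Go at 650 ms, peak 180 ms later.
speeded = peak_latency(times, synthetic_trace(times, 830), onset_ms=650)

if __name__ == "__main__":
    print("interference factor:", timing_factor(slowed, control), "ms")
    print("facilitation factor:", timing_factor(speeded, control), "ms")
```

Measuring each latency from its own triggering signal is what removes the component shared by the task and the control condition.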
In our simulations, we obtain a facilitation factor of −25 ms, and an interference factor of 20 ms, which is comparable to the results found by Buccino et al. (2005) .
Figure 6 shows the time course of the activation of the “reaching-pressing” chain after the presentation of the two sentences in the late Go signal condition. Figure 6 A represents the sequential activation of the “reaching” (green curve) and the “grasping” pool (red curve) after the presentation of the sentence “to grasp the apple” and a late Go signal. Figure 6 B represents the activation of the same pools after the presentation of the sentence “to kick the ball” and a late Go signal. As can be seen, reading a sentence that contains a motor act also present in the response motor sequence produces an overall decrease in the reaction time of 25 ms.
Figure 6. Activation profile of the “reaching-pressing” chain after the presentation of the two sentences in the simulated task. The dashed vertical lines indicate the sentence presentation onset (left), the Go signal presentation (middle), and moment of maximal activity (right). In case of early “Go” presentation, the “reaching-pressing” chain responds slower when the hand sentence was presented (A) compared to when the foot sentence was presented (B). In case of late “Go” presentation, the effect is the opposite [(C) vs. (D)].
Figure 7 shows the modulation of the reaction times of the simulated “reaching” pool as a function of the time interval between the presentation of the hand-related sentence and the “Go” signal. The reaction time data show that there is a first phase in which interference dominates (up to 500 ms), followed by a phase in which facilitation dominates. This effect eventually fades to zero. In our model, for time intervals below 200 ms, input signals overlap and the pools’ responses merge, which does not allow a clear interpretation of the results.
Figure 7. Modulation of the reaction times of the simulated “reaching” pool as a function of the time interval between the presentation of the “hand” sentence and the “Go” signal. Reported below are the timings of the experiments of Buccino et al. (2005) , Sato et al. (2008) , Boulenger et al. (2006) (Exp. 2) and Scorolli and Borghi (2007) .
Discussion and Conclusion
As reviewed in the first part of the paper, both interference and facilitation are widely observed in TMS and behavioral experiments on language comprehension and motor system activation. The underlying mechanisms, however, are a topic of ongoing debate. It is interesting to note that one can find similar facilitation and interference effects also in the action observation literature (e.g., Brass et al., 2001). In the present work, however, we focused on the controversial results related to language processing.
Recently, Sato et al. (2008) postulated that the cause of these divergent results may be the nature and depth of the motor system's involvement, which is determined by the differing difficulty of the individual tasks. Boulenger et al. (2006) hypothesized that facilitation could result from side effects or after-effects of linguistic processes, while interference could arise, for instance, from competition for common resources.
In this work we proposed a simple neural mechanism that is capable of explaining both the facilitatory and the inhibitory interactions between language and action. Our model is based on a chain-structured organization of the parietal and premotor cortex (Fogassi et al., 2005; Chersi et al., 2007) in which action sequences are encoded as concatenations of neuronal pools representing specific motor acts. Interactions between sensory and motor modalities have been modeled in the present work as crosstalk between neuronal pools in motor and mirror chains, and we have shown that the neural dynamics governing the activation of the pools can reproduce well, at a qualitative level, the timings observed in behavioral experiments.
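The chain organization and the graded crosstalk can be pictured with a small data-structure sketch. It is only an illustration under stated assumptions: the `Pool` class, the three weights, their ordering (same motor act > same effector > same goal), and the way each sentence is decomposed into pools are all hypothetical and are not taken from the model's actual parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A neuronal pool labelled by motor act, effector, and action goal."""
    act: str
    effector: str
    goal: str


# Hypothetical coupling strengths; the ordering is an assumption.
W_SAME_ACT = 1.0
W_SAME_EFFECTOR = 0.5
W_SAME_GOAL = 0.25


def crosstalk(a: Pool, b: Pool) -> float:
    """Return the assumed coupling between two pools (strongest match wins)."""
    if a.act == b.act:
        return W_SAME_ACT
    if a.effector == b.effector:
        return W_SAME_EFFECTOR
    if a.goal == b.goal:
        return W_SAME_GOAL
    return 0.0


def total_crosstalk(sentence_chain, response_chain) -> float:
    """Sum the pairwise coupling between a sentence chain and a motor chain."""
    return sum(crosstalk(s, r) for s in sentence_chain for r in response_chain)


# Chains are ordered tuples of pools; the labels below are illustrative.
RESPONSE_CHAIN = (Pool("reach", "hand", "press"), Pool("press", "hand", "press"))
HAND_SENTENCE = (Pool("reach", "hand", "grasp"), Pool("grasp", "hand", "grasp"))
FOOT_SENTENCE = (Pool("kick", "foot", "kick"),)

if __name__ == "__main__":
    print("hand sentence:", total_crosstalk(HAND_SENTENCE, RESPONSE_CHAIN))
    print("foot sentence:", total_crosstalk(FOOT_SENTENCE, RESPONSE_CHAIN))
```

In this sketch the hand-related sentence couples to the “reaching-pressing” chain while the foot-related sentence does not, which is the precondition for any timing modulation of the response chain.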
Taken together, these results allow us to draw the following conclusions. First, the fact that our simple model can reproduce different experimental results by exploiting only “low level” properties of neurons supports the idea that these interaction effects might be principally due to neurodynamical factors within the mirror neuron circuit rather than to high-level cognitive processes. Second, this unifying theory suggests that seemingly conflicting behavioral experiments may have observed different time windows of the same mechanism rather than different mechanisms. This has important theoretical implications because, as previously discussed, it is currently debated in the literature whether the activation of motor and premotor cortices is essential for language understanding or just a by-product of the process. The early activation of the motor system is typically considered a strong point in support of the first thesis. Showing that interference and facilitation are actually two manifestations of the same process greatly strengthens the embodied view according to which the recruitment of the motor system is fundamental for sentence comprehension.
Finally, on the basis of our model we can formulate a variety of predictions that could guide future experimental research.
(1) It should be possible to produce precise interference and facilitation profiles by carefully designing experiments.
(2) If language processing produces a modulation of action execution timings due to the overlap of neural representations, it is reasonable to expect that action execution has the same effect on language processing because overlaps are most probably bidirectional.
(3) Since timing variations are supposed to be caused by the re-activation of neuronal pools, it should be possible to obtain a similar or even greater interaction effect if the tasks were “language following language” or “action following action”.
(4) The modeling results support the idea that the interaction effects between language and action might be due principally to neurodynamical processes taking place within the mirror neuron circuit rather than to high-level cognitive processes. This leads us to think that the same might be a general principle valid for other sensorimotor interactions as well.
(5) If several perceptual modalities exploit the same motor representations, it should be possible to observe interactions between these modalities mediated by the common motor substrate.
(6) If the neurodynamical and the embodiment hypotheses are true, then we expect to find a mixture of interference and facilitation patterns also in tasks that involve, for example, object affordances (AIP-F5 circuit) and interpersonal interactions (PG-F4 circuit).
(7) Using more sophisticated experimental and/or data analysis techniques, such as signal correlation studies, it should be possible to discover weak or very late interactions.
Notwithstanding these interesting results, we are well aware that the mechanisms coming into play during the processing of stimuli and decision making are much more complex than depicted here, so our proposal should be considered a first attempt to model such a complex system. We believe that this computational modeling work may also prove useful in building a biologically inspired robotic model for use in human–humanoid interaction, which is the longer-term goal of this work. From this perspective it is important that embodiment is taken into account at an appropriate level of abstraction that allows computational models of human biological mechanisms to be transferred to a robotic context. Furthermore, from a scientific perspective, it is clear that additional targeted experimental and modeling work is necessary to better understand the mechanisms underlying the relationship between sentence comprehension and motor system activation. As a first step, however, we believe it was important to show in this paper that interference and facilitation may well be two sides of the same coin.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work was supported by the FP7 project ROSSI, “Emergence of communication in Robots through Sensorimotor and Social Interaction”, Grant agreement no. 216125, and by the project IM-CLeVer, “Intrinsically Motivated Cumulative Learning Versatile Robots”, Grant agreement no. 231722. Special thanks go to Gianluca Baldassarre for his help and support.
Bonini, L., Rozzi, S., Ugolotti, F., Maranesi, M., Ferrari, P. F., and Fogassi, L. (2009). Ventral premotor and inferior parietal cortices make distinct contribution to action organization and intention understanding. Cereb. Cortex. doi: 10.1093/cercor/bhp200.
Boulenger, V., Roy, A. C., Paulignan, Y., Deprez, V., Jeannerod, M., and Nazir, T. A. (2006). Cross-talk between language processes and overt motor behavior in the first 200 msec of processing. J. Cogn. Neurosci. 18, 1607–1615.
Buccino, G., Riggio, L., Melli, G., Binkofski, F., Gallese, V., and Rizzolatti, G. (2005). Listening to action related sentences modulates the activity of the motor system: a combined TMS and behavioral study. Cogn. Brain Res. 24, 355–363.
Chersi, F., Mukovskiy, A., Fogassi, L., Ferrari, P. F., and Erlhagen, W. (2006). A model of intention understanding based on learned chains of motor acts in the parietal lobe. Comput. Neurosci. 69, 48.
De Vega, M., Robertson, D. A., Glenberg, A. M., Kaschak, M. P., and Rinck, M. (2004). On doing two things at once: temporal constraints on actions in language comprehension. Mem. Cognit. 32, 1033–1043.
Oliveri, M., Finocchiaro, C., Shapiro, K., Gangitano, M., Caramazza, A., and Pascual-Leone, A. (2004). All talk and no action: a transcranial magnetic stimulation study of motor cortex activation during action word production. J. Cogn. Neurosci. 16, 374–381.
Papeo, L., Vallesi, A., Isaja, A., and Rumiati, R. I. (2009). Effects of TMS on different stages of motor and non-motor verb-processing in the primary motor cortex. PLoS ONE 4, e4508. doi: 10.1371/journal.pone.0004508.
Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., and Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. J. Cogn. Neurosci. 17, 273–281.
Keywords: sentence comprehension, embodied cognition, motor system, neural network, action chains
Citation: Chersi F, Thill S, Ziemke T and Borghi AM (2010) Sentence processing: linking language to motor chains. Front. Neurorobot. 4:4. doi: 10.3389/fnbot.2010.00004
Received: 15 December 2009;
Paper pending published: 29 January 2010;
Accepted: 27 April 2010; Published online: 28 May 2010
Edited by: Angelo Cangelosi, University of Plymouth, UK
Reviewed by: Wolfram Erlhagen, University of Minho, Portugal
Michel Hoen, Institut National de la Santé et de la Recherche Médicale, France
Vadim Tikhanoff, Italian Institute of Technology, Italy
Copyright: © 2010 Chersi, Thill, Ziemke and Borghi. This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
*Correspondence: Fabian Chersi, Istituto di Scienze e Tecnologie della Cognizione, National Research Council, Via S. Martino della Battaglia 44, 00185 Roma, Italy. e-mail: firstname.lastname@example.org