Edited by: Leonid Perlovsky, Harvard University and Air Force Research Laboratory, USA
Reviewed by: Nashaat Z. Gerges, Medical College of Wisconsin, USA; Yueqiang Xue, The University of Tennessee Health Science Center, USA
*Correspondence: Benjamin Straube, Department of Psychiatry and Psychotherapy, Philipps-University Marburg, Rudolf-Bultmann-Str. 8, 35039 Marburg, Germany e-mail:
This article was submitted to the journal Frontiers in Behavioral Neuroscience.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Abstractness and modality of interpersonal communication have a considerable impact on comprehension. They are relevant for determining thoughts and constituting internal models of the environment. Whereas concrete object-related information can be represented in mind irrespective of language, abstract concepts require a representation in speech. Consequently, modality-independent processing of abstract information can be expected. Here we investigated the neural correlates of abstractness (abstract vs. concrete) and modality (speech vs. gestures), to identify an abstractness-specific supramodal neural network. During fMRI data acquisition 20 participants were presented with videos of an actor either speaking sentences with an abstract-social [AS] or concrete-object-related content [CS], or performing meaningful abstract-social emblematic [AG] or concrete-object-related tool-use gestures [CG]. Gestures were accompanied by a foreign language to increase the comparability between conditions and to frame the communication context of the gesture videos. Participants performed a content judgment task referring to the person vs. object-relatedness of the utterances. The behavioral data suggest a comparable comprehension of contents communicated by speech or gesture. Furthermore, we found common neural processing for abstract information independent of modality (AS > CS ∩ AG > CG) in a left hemispheric network including the left inferior frontal gyrus (IFG), temporal pole, and medial frontal cortex. Modality specific activations were found in bilateral occipital, parietal, and temporal as well as right inferior frontal brain regions for gesture (G > S) and in left anterior temporal regions and the left angular gyrus for the processing of speech semantics (S > G). These data support the idea that abstract concepts are represented in a supramodal manner. 
Consequently, gestures referring to abstract concepts are processed in a predominantly left hemispheric language related neural network.
Human communication is distinctly characterized by the ability to convey abstract concepts such as feelings, evaluations, cultural symbols, or theoretical assumptions. This can be differentiated from references to our physical environment, which consists of concrete objects and their relationships to each other. In addition to our language capacity, humans also employ gestures as a flexible tool to communicate both concrete and abstract information (Kita et al.,
Recently, a hierarchical model of language and thought has been suggested (Perlovsky and Ilin,
The impact of abstractness on speech processing (e.g., Rapp et al.,
In contrast to abstract information processing, it has been suggested that concrete information is processed in different brain regions sensitive to the specific information type: e.g., spatial information in the parietal lobe (Ungerleider and Haxby,
In addition to our speech capacity, gesturing is a flexible communicative tool which humans use to communicate both concrete and abstract information via the visual modality. Previous studies on object- or person-related gesture processing have either presented pantomimes of tool or object use, hands grasping for tools or objects (e.g., Decety et al.,
In sum, the left IFG represents a sensitive region for abstract information processing in speech or gesture, whereas the brain areas activated by concrete information depend on communication modality and semantic content. However, whether the same neural structures are relevant for the processing of gestures and sentences with an abstract content or gestures and sentences with a concrete content remains unknown.
Common neural networks for the processing of speech and gesture information have been suggested (Willems and Hagoort,
In a similar vein, Xu et al. (
Although tentative proposals regarding a supramodal neural network for speech and gesture semantics have been made (Xu et al.,
As hypothesized above, concrete object-related information might be represented in mind with and/or without speech, whereas abstract information could rely on a representation in speech. Consequently, common mechanisms for processing speech and gesture semantics can be expected specifically when abstract (in contrast to concrete) information is communicated. Therefore, the current study focused on the neural correlates of abstractness and modality in a communication context. With a factorial manipulation of content (abstract vs. concrete) and communication modality (speech vs. gestures) we aimed to shed light on supramodal neural network properties relevant for the processing of abstract in contrast to concrete information. We tested the following alternative hypotheses: first, if only abstract concepts (activated through speech or gesture in natural communication situations) are processed in a supramodal manner, then we predict consistent neural signatures only for abstract in contrast to concrete contents across communication modalities. However, if concrete concepts activated through speech or gestures are also represented in a supramodal network, we predict overlapping neural responses for concrete in contrast to abstract contents across modalities.
To manipulate abstractness and communication modality we used video clips of an actor either speaking sentences with an abstract-social [AS] or concrete-object-related content [CS], or performing meaningful abstract-social (emblematic) [AG] or concrete-object-related (tool-use) gestures [CG]. Gestures were accompanied by a foreign language (Russian) to increase the comparability between conditions and naturalness of the gesture videos where spoken language frames the communication context. We used emblematic and tool-related gestures to guarantee high comprehensibility of the gestures. During the experiments participants performed a content judgment task referring to the person vs. object-relatedness of the speech and gesture communications to ensure their attention to the semantic information and the adequate comprehension of the corresponding meaning. We hypothesized modality independent activations exclusively for the processing of abstract information (AS > CS ∩ AG > CG) in language-related regions encompassing the left inferior frontal gyrus, the left middle, and superior temporal gyrus (MTG/STG) as well as regions related to social/emotional processing such as the temporal pole, the medial frontal, and anterior cingulate cortex (ACC). In addition, modality specific activations were expected in bilateral occipital, parietal, and temporal brain regions for gesture (G > S) and in left temporal, temporo-parietal, and inferior frontal regions for the processing of speech semantics (S > G).
Twenty healthy subjects (7 females) participated in the study. The mean age of the subjects was 25.4 years (
Video clips were selected from a large pool of different videos. Some of them have been used in previous fMRI studies, focusing on different aspects of speech and gesture processing (Green et al.,
We decided to present gestures accompanied by a foreign language to increase the comparability between conditions and the naturalness of the gesture videos, where spoken language frames the communication context. All sentences had a similar grammatical structure (subject–predicate–object) and were translated into Russian for the gesture conditions. Words that sounded similar in the two languages were avoided. Examples of the German sentences are: “The blacksmith
The same male bilingual actor (German and Russian) performed all the utterances and gestures in a natural spontaneous way. Intonation, prosody and movement characteristics in the corresponding variations of one item were closely matched. At the beginning and at the end of each clip the actor stood with arms hanging comfortably. Each clip had a duration of 5 s including 500 ms before and after the experimental manipulation, where the actor neither spoke nor moved. In the present study the semantic aspects of the stimulus material refer to differences in abstractness of the communicated information (abstract vs. concrete content).
For stimulus validation, 20 participants not taking part in the fMRI study rated each video on a scale from 1 to 7 concerning understandability, imageability and naturalness (1 = very low to 7 = very high). In order to assess
Set | Condition | n | Speech duration (s): M | SD | Gesture duration (s): M | SD | Understandability: M | SD | Imageability: M | SD | Naturalness: M | SD
1 | AG | 26 | 2.163 | 0.391 | 2.313 | 0.440 | 3.625 | 0.578 | 4.498 | 0.587 | 4.565 | 0.379
| CG | 26 | 2.303 | 0.434 | 3.033 | 0.364 | 3.537 | 0.808 | 4.785 | 0.695 | 4.340 | 0.540
| AS | 26 | 2.400 | 0.308 | – | – | 6.527 | 0.179 | 3.481 | 0.321 | 4.077 | 0.258
| CS | 26 | 2.332 | 0.290 | – | – | 6.650 | 0.209 | 2.967 | 0.308 | 3.181 | 0.293
| Total | 104 | 2.299 | 0.366 | 2.673 | 0.540 | 5.085 | 1.595 | 3.933 | 0.894 | 4.041 | 0.649
2 | AG | 26 | 2.144 | 0.296 | 2.219 | 0.336 | 3.392 | 0.766 | 4.381 | 0.698 | 4.479 | 0.501
| CG | 26 | 2.160 | 0.391 | 2.989 | 0.415 | 3.327 | 0.660 | 4.598 | 0.621 | 4.181 | 0.444
| AS | 26 | 2.332 | 0.281 | – | – | 6.490 | 0.154 | 3.454 | 0.372 | 3.935 | 0.237
| CS | 26 | 2.274 | 0.229 | – | – | 6.652 | 0.155 | 3.083 | 0.207 | 3.181 | 0.279
| Total | 104 | 2.228 | 0.311 | 2.604 | 0.539 | 4.965 | 1.693 | 3.879 | 0.810 | 3.944 | 0.612
Total | AG | 52 | 2.153 | 0.343 | 2.266 | 0.390 | 3.509 | 0.682 | 4.439 | 0.641 | 4.522 | 0.442
| CG | 52 | 2.231 | 0.415 | 3.011 | 0.387 | 3.432 | 0.738 | 4.691 | 0.659 | 4.261 | 0.496
| AS | 52 | 2.366 | 0.294 | – | – | 6.509 | 0.166 | 3.467 | 0.344 | 4.006 | 0.256
| CS | 52 | 2.303 | 0.260 | – | – | 6.651 | 0.182 | 3.025 | 0.266 | 3.181 | 0.283
| Total | 208 | 2.263 | 0.340 | 2.639 | 0.538 | 5.025 | 1.642 | 3.906 | 0.851 | 3.992 | 0.631
The ratings on understandability for the videos of the four conditions used in this study clearly show a main effect of modality, with the speech varieties scoring higher than the gesture varieties [
Imageability ratings indicated that the conditions also differed in their capacity to evoke mental images. A significant main effect of modality showed that videos consisting of Russian sentences with gestures were rated as more imageable than videos consisting only of German sentences [4.57 vs. 3.25, respectively;
Naturalness ratings showed a main effect for modality as well. Videos including Russian sentences with gestures were evaluated as more natural than videos including German speech [4.39 vs. 3.59, respectively;
The sentences had an average speech duration of 2263 ms (
Events for the fMRI statistical analysis were defined in accordance with the bimodal German conditions [compare for example Green et al. (
During fMRI data acquisition participants were presented with videos of an actor either speaking sentences (S) or performing meaningful gestures (G) with an abstract-social (A) or concrete-object-related (C) content. Gestures were accompanied by an unknown foreign language (Russian). Participants performed a content judgment task referring to the person vs. object-relatedness of the utterances.
All MRI data were acquired on a 3T scanner (Siemens MRT Trio series). Functional images were acquired using a T2-weighted echo planar image sequence (
An experimental session comprised 182 trials (26 for each condition) and consisted of two 14-min blocks. Each block contained 91 trials with a matched number of items from each condition (13). The stimuli were presented in an event-related design in pseudo-randomized order and counterbalanced across subjects. As described above (see stimulus material), across subjects each item was presented in all corresponding conditions, but a single participant saw only complementary derivatives of one item, i.e., the same sentence or gesture information was seen only once per participant. Each clip was followed by a gray background with a variable duration of 2154–5846 ms (jitter average: 4000 ms).
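For illustration, the trial structure of one block can be sketched as follows. The per-condition count of 13 and the inter-trial-interval bounds come from the text; the function name and seed are hypothetical, and only the four analyzed conditions (not all seven modeled ones) are shown.

```python
import random

def build_block(conditions=("AS", "CS", "AG", "CG"), per_cond=13, seed=0):
    """Pseudo-randomized block sketch: 13 trials per shown condition, each
    5-s clip followed by an inter-trial interval drawn uniformly from
    2154-5846 ms (expected mean: (2154 + 5846) / 2 = 4000 ms)."""
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(per_cond)]
    rng.shuffle(trials)
    return [(cond, rng.uniform(2154, 5846)) for cond in trials]

block = build_block()
```

In the actual experiment, the pseudo-randomization was additionally counterbalanced across subjects, which this minimal sketch omits.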
Before scanning, each participant received at least six practice trials outside the scanner to ensure comprehensive understanding of the experimental task. Prior to the start of the experiment, the volume of the videos was individually adjusted so that the clips were clearly audible. During scanning, participants were instructed to watch the videos and to indicate via left-hand key presses whether the content of the sentence or the gesture referred to objects (index finger) or to interpersonal social information, e.g., feelings or requests (middle finger). This task enabled us to focus participants' attention on the semantic content of speech and gesture and to investigate comprehension in a rather implicit manner. Performance rates and reaction times were recorded.
MR images were analyzed using Statistical Parametric Mapping (SPM8) standard routines and templates (
Statistical whole-brain analysis was performed in a two-level, mixed-effects procedure. At the first level, single-subject BOLD responses were modeled by a design matrix comprising the onsets of each event within the videos (see stimulus material) of all seven experimental conditions. As an additional factor, each video phase was modeled as a mini-block of 5 s duration. To control for condition-specific differences in speech and gesture duration, these stimulus characteristics were entered as parameters of no interest at the single-trial level. The hemodynamic response was modeled by the canonical hemodynamic response function (HRF). Parameter estimate (β-) images for the HRF were calculated for each condition and each subject. Parameter estimates for the four relevant conditions were entered into a within-subject flexible factorial ANOVA.
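The construction of such first-level regressors (a canonical double-gamma HRF convolved with 5-s mini-block boxcars) can be sketched as follows. This is a minimal illustration, not the study's actual SPM8 design matrix: the gamma shape parameters follow the common SPM-style convention, and all onsets and scan counts are made up.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, length_s=32.0):
    """SPM-style double-gamma HRF sampled at the repetition time (TR):
    a response peaking around 5-6 s minus a smaller, later undershoot."""
    t = np.arange(0.0, length_s, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / np.abs(h).max()

def condition_regressor(onsets_s, n_scans, tr, dur_s=5.0):
    """Boxcar for 5-s mini-blocks at the given onsets, convolved with the HRF."""
    box = np.zeros(n_scans)
    for onset in onsets_s:
        i0, i1 = int(round(onset / tr)), int(round((onset + dur_s) / tr))
        box[i0:min(i1, n_scans)] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]

# Hypothetical condition with three events in a 50-scan run (TR = 2 s)
reg = condition_regressor(onsets_s=[10.0, 40.0, 70.0], n_scans=50, tr=2.0)
```

One such regressor would be built per condition; duration covariates of no interest would enter the design matrix as additional columns.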
A Monte Carlo simulation of the brain volume was employed to establish an appropriate voxel contiguity threshold (Slotnick and Schacter,
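The logic of such a Monte Carlo cluster-extent estimate (in the spirit of Slotnick and Schacter's approach) can be sketched as follows. The toy volume size, smoothness, voxel-level threshold, and number of simulations are illustrative and far smaller than what a real whole-brain analysis would use.

```python
import numpy as np
from scipy import ndimage, stats

def cluster_extent_threshold(shape=(24, 24, 24), n_sims=200, voxel_p=0.01,
                             fwhm_vox=2.0, alpha=0.05, seed=0):
    """Simulate smoothed Gaussian-noise volumes, threshold each at the
    voxel-level p, record the largest suprathreshold cluster per volume,
    and return the cluster extent exceeded by chance in < alpha of runs."""
    rng = np.random.default_rng(seed)
    z_thr = stats.norm.isf(voxel_p)
    max_sizes = []
    for _ in range(n_sims):
        noise = ndimage.gaussian_filter(rng.standard_normal(shape),
                                        sigma=fwhm_vox / 2.3548)  # FWHM -> sigma
        noise /= noise.std()                 # re-standardize after smoothing
        labels, n_clusters = ndimage.label(noise > z_thr)
        if n_clusters:
            sizes = ndimage.sum_labels(np.ones(shape), labels,
                                       range(1, n_clusters + 1))
            max_sizes.append(max(sizes))
        else:
            max_sizes.append(0)
    return int(np.percentile(max_sizes, 100 * (1 - alpha)))

k_thr = cluster_extent_threshold()
```

Clusters smaller than `k_thr` voxels would then be discarded from the thresholded statistical maps.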
The neural processing of abstract information was isolated by computing the difference contrast of abstract-social vs. concrete-object-related sentences [AS > CS] and gestures [AG > CG], whereas the opposite contrasts were applied to reveal brain regions sensitive for the processing of concrete information communicated by speech [CS > AS] and gesture [CG > AG].
In order to find regions that are commonly activated by both processes, contrasts were entered into a conjunction analysis (abstract: [AS > CS ∩ AG > CG]; concrete: [CS > AS ∩ CG > AG]), testing for independently significant effects compared at the same threshold (conjunction null, see Nichols et al.,
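The conjunction-null criterion reduces to a voxel-wise minimum statistic: a voxel survives only if both contrasts independently exceed the common threshold. A toy sketch (the array values are made up):

```python
import numpy as np

def conjunction_null(t_a, t_b, t_crit):
    """Minimum-statistic conjunction: a voxel survives only if BOTH
    contrasts exceed the critical value, i.e. min(tA, tB) > t_crit."""
    return np.minimum(t_a, t_b) > t_crit

t_abstract_speech = np.array([4.0, 1.0, 3.5])   # e.g., AS > CS
t_abstract_gesture = np.array([3.2, 5.0, 3.6])  # e.g., AG > CG
mask = conjunction_null(t_abstract_speech, t_abstract_gesture, t_crit=3.0)
# mask -> [True, False, True]: the middle voxel fails the speech contrast
```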
The same approach was applied to demonstrate the effect of modality by calculating the following conjunction analyses: for gesture [AG > AS ∩ CG > CS] and for speech semantics [AS > AG ∩ CS > CG].
Finally, interaction analyses were performed ([AS vs. AG] vs. [CS vs. CG]) to explore modality-specific effects with regard to the processing of abstract vs. concrete information. A masking procedure was used to ensure that all interactions were based on significant differences in the first contrast (e.g., [CG > CS] > [AG > AS] inclusively masked by [CG > CS]).
Subjects were instructed to indicate via button press whether the actor in the video described a socially related action or an object-related action. Correct responses and their reaction times were each analyzed with a two-way within-subjects ANOVA with the repeated-measures factors modality (gesture vs. speech) and abstractness (abstract vs. concrete).
Correct responses showed a significant main effect for modality with videos depicting gesture with Russian speech receiving slightly lower scores than videos depicting German speech only [21.8 vs. 22.95 out of 26, respectively;
For each participant the median reaction time for each condition was computed from all correct responses of that condition. A significant interaction effect of modality and abstractness [
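Computing per-condition median reaction times from correct trials only, as described, might look like the following in pandas. The trial log, column names, and RT values are all hypothetical.

```python
import pandas as pd

# Hypothetical single-subject trial log (names and values are illustrative)
trials = pd.DataFrame({
    "subject":  [1, 1, 1, 1, 1, 1],
    "modality": ["gesture", "gesture", "speech", "speech", "gesture", "speech"],
    "abstract": [True, False, True, False, True, False],
    "correct":  [True, True, True, False, True, True],
    "rt_ms":    [1510, 1620, 1480, 1700, 1550, 1590],
})

# Median RT per subject and condition, restricted to correct responses;
# the incorrect speech/concrete trial (1700 ms) is excluded.
median_rt = (trials[trials["correct"]]
             .groupby(["subject", "modality", "abstract"])["rt_ms"]
             .median())
```

These per-subject medians would then feed the 2 × 2 repeated-measures ANOVA on modality and abstractness.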
For the effect of gesture in contrast to speech semantics independent of the abstractness [AG > AS ∩ CG > CS] we found activation in bilateral occipital, parietal, and right frontal brain regions (see Table
Contrast | Region | Cluster size (voxels) | MNI x | y | z | t-value |
AS > AG ∩ CS > CG | Middle temporal gyrus L | 673 | −52 | −12 | −20 | 5.61 |
Middle temporal pole L | ||||||
Angular gyrus L | 166 | −54 | −68 | 34 | 4.81 | |
Precuneus L | 69 | −4 | −56 | 34 | 3.78 | |
AG > AS ∩ CG > CS | Middle occipital gyrus L | 6691 | −48 | −74 | 4 | 19.91 |
Inferior temporal gyrus L | ||||||
Middle temporal gyrus R | 9536 | 50 | −62 | 0 | 19.62 | |
Fusiform gyrus R | ||||||
Superior occipital gyrus R | ||||||
IFG, pars opercularis R | 1313 | 44 | 10 | 28 | 6.72 | |
Middle frontal gyrus R | ||||||
Precentral gyrus R | ||||||
Supramarginal gyrus L | 202 | −62 | −36 | 32 | 4.56 | |
Superior parietal lobe L | 299 | −38 | −54 | 60 | 4.22 | |
Inferior parietal lobe L |
The exploration of general activation for each condition in contrast to low-level baseline (gray background) indicates that other regions are commonly activated in all conditions (Figures
Analyses targeting within-modality processing of abstractness in language semantics [AS > CS] showed activation in a mainly left-lateralized network encompassing an extended fronto-temporal cluster (IFG, precentral gyrus; middle, inferior, and superior temporal gyri) as well as medial frontal regions and the right anterior middle temporal gyrus (Table
Contrast | Region | Cluster size (voxels) | MNI x | y | z | t-value |
AS > CS | Middle temporal gyrus L | 3150 | −52 | −34 | −6 | 5.98 |
IFG, pars orbitalis L | ||||||
Medial superior frontal gyrus L | 1441 | −8 | 56 | 34 | 5.72 | |
Middle temporal pole R | 289 | 48 | 12 | −34 | 5.16 | |
Middle temporal gyrus R | ||||||
Angular gyrus L | 458 | −42 | −58 | 24 | 4.41 | |
Precentral gyrus L | 248 | −38 | 0 | 62 | 4.36 | |
Precuneus L | 195 | −8 | −50 | 34 | 4.22 | |
AG > CG | Superior temporal pole L | 910 | −36 | 18 | −24 | 5.43 |
IFG, pars triangularis L | ||||||
IFG, pars orbitalis L | ||||||
Medial superior frontal gyrus L | 3215 | −4 | 30 | 54 | 5.10 | |
Angular gyrus L | 682 | −60 | −60 | 30 | 4.64 | |
Caudate nucleus R | 786 | 12 | 2 | 8 | 4.54 | |
Thalamus L | ||||||
Middle temporal gyrus L | 209 | −48 | −16 | −18 | 4.04 | |
AS > CS ∩ AG > CG | Medial superior frontal gyrus L | 1015 | −8 | 56 | 30 | 4.97 |
Superior temporal pole L | 779 | −36 | 18 | −22 | 4.93 | |
IFG, pars triangularis L | ||||||
IFG, pars orbitalis L | ||||||
Middle temporal gyrus L | 161 | −48 | −14 | −20 | 3.99 | |
Angular gyrus L | 253 | −54 | −56 | 26 | 3.95 | |
CS > AS | Cerebellum L | 580 | −32 | −36 | −28 | 5.95 |
Inferior temporal gyrus L | ||||||
Fusiform gyrus L | ||||||
CG > AG | Middle occipital gyrus L | 1046 | −44 | −76 | 8 | 5.86 |
Middle temporal gyrus R | 285 | 50 | −62 | 2 | 4.73 | |
CS > AS ∩ CG > AG | n.s. |
Processing of abstract information independent of input modality as disclosed by the conjunction of [AS > CS ∩ AG > CG] was related to a left-sided frontal cluster including the temporal pole, the IFG (pars triangularis and orbitalis), the middle temporal and angular as well as the medial superior frontal gyrus (Table
No significant activation could be identified in the interaction analyses at the selected significance threshold. However, by applying a different cluster-size/voxel-level threshold combination to correct for multiple comparisons (
We hypothesized that the processing of abstract semantic information of spoken language and symbolic emblematic gestures is based on a common neural network. Our study design tailored the comparison to the level of abstract semantics, controlling for the processing of general semantic meaning of speech and gesture by using highly meaningful concrete object-related information as control condition. The results demonstrate that the pathways engaged in the processing of semantics contained in both abstract spoken language and abstract-social gestures comprise the temporal pole, the IFG (pars triangularis and orbitalis), the middle temporal, angular, and superior frontal gyri. Thus, in line with our hypothesis, we found modality-independent activation in a left hemispheric fronto-temporal network for the processing of abstract information. The strongly left-lateralized activation pattern supports the theory that abstract semantics is represented in language independent of communication modality (at least at the neural level, in language-related brain regions).
The results of the speech [CS > CG ∩ AS > AG] and gesture contrasts [CG > CS ∩ AG > AS] clearly demonstrate that communication modality affects neural processing independently of the communication content (abstract/concrete). In line with other studies that contrasted the processing of a native against an unknown foreign language (Perani et al.,
In line with studies on action observation (e.g., Decety et al.,
The processing of abstract spoken language semantics (AS > CS) and abstract semantic information conveyed through abstract-social in contrast to concrete-object-related gestures (AG > CG) activated an overlapping network of brain regions. These include a cluster in the left inferior frontal cortex (BA 44, 45) which expanded into the temporal pole, the left inferior, and middle temporal gyrus as well as a cluster in the left medial superior frontal gyrus. Those findings support the model of a supramodal semantic network for the processing of abstract information. By contrast, for concrete vs. abstract information we obtained no overlapping activation.
These results extend studies from both the gesture and the language domain (see above) in showing a common neural representation of specific speech and gesture semantics. Furthermore, the findings go beyond previous reports about common activation for symbolic gestures and speech semantics (Xu et al.,
The left-lateralization of our findings is congruent with the majority of fMRI studies on language (see Bookheimer,
With regard to the inferior frontal activations, functional imaging studies have underlined the importance of this region in the processing of language semantics. The junction of the precentral gyrus and the pars opercularis of the left IFG has been implicated in controlled semantic retrieval (Thompson-Schill et al.,
Since semantic memory represents the basis of semantic processing, an amodal semantic memory (Patterson et al.,
Our data also partially coincide with Binder and Desai's (
As for the processing of concrete semantics, our results are somewhat surprising, because we did not find an overlap between gestural and verbal-auditory input. This result is at odds with the predictions of both strict embodiment theories (Barsalou,
Our results are also in line with a recent mathematically-motivated language-cognition model proposed by Perlovsky and Ilin (
Language is not only a communication device, but also a fundamental part of cognition and learning concepts, especially with respect to abstract concepts (Perlovsky and Ilin,
In fact, we could demonstrate the activation of a supramodal network for abstract speech and abstract gesture semantics. The identified left-lateralized fronto-temporal network not only maps sound patterns onto their corresponding abstract meanings in the auditory domain, but also combines gestures and their abstract meanings in the gestural-visual domain. This modality-independent network most likely receives input from modality-specific areas in the superior temporal (speech) and occipito-temporal brain regions (gestures), where the main characteristics of the spoken and gestured signals are decoded. The inferior frontal regions are responsible for the process of selection and integration, relying on more general world knowledge distributed throughout the brain (Xu et al.,
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This research project is supported by a grant from the “Von Behring-Röntgen-Stiftung” (project no. 59-0002) and by the “Deutsche Forschungsgemeinschaft” (project no. DFG: Ki 588/6-1). Yifei He and Helge Gebhardt are supported by the “Von Behring-Röntgen-Stiftung” (project no. 59-0002). Arne Nagels and Miriam Steines are supported by the DFG (project no. Ki 588/6-1). Benjamin Straube is supported by the BMBF (project no. 01GV0615).