The role of semantic abstractness and perceptual category in processing speech accompanied by gestures

Space and shape are distinct perceptual categories. In language, perceptual information can also be used to describe abstract semantic concepts like a “rising income” (space) or a “square personality” (shape). Despite being inherently concrete, co-speech gestures depicting space and shape can accompany concrete or abstract utterances. Here, we investigated the way that abstractness influences the neural processing of the perceptual categories of space and shape in gestures. Thus, we tested the hypothesis that the neural processing of perceptual categories is highly dependent on language context. In a two-factorial design, we investigated the neural basis for the processing of gestures containing shape (SH) and spatial information (SP) when accompanying concrete (c) or abstract (a) verbal utterances. During fMRI data acquisition participants were presented with short video clips of the four conditions (cSP, aSP, cSH, aSH) while performing an independent control task. Abstract (a) as opposed to concrete (c) utterances activated temporal lobes bilaterally and the left inferior frontal gyrus (IFG) for both shape-related (SH) and space-related (SP) utterances. An interaction of perceptual category and semantic abstractness in a more anterior part of the left IFG and inferior part of the posterior temporal lobe (pTL) indicates that abstractness strongly influenced the neural processing of space and shape information. Despite the concrete visual input of co-speech gestures in all conditions, space and shape information is processed differently depending on the semantic abstractness of its linguistic context.


INTRODUCTION
In face-to-face communication people often use gestures to complement the content of their verbal message. People produce different kinds of gestures (McNeill, 1992), such as iconic gestures illustrating shape (e.g., "The ball is round") or deictic gestures referring to spatial information in our physical environment (e.g., "The cat is sitting on the roof"; pointing gesture). Shape gestures resemble the information they convey, as when someone draws a circle in the air to indicate a round shape ("The table in the kitchen is round," circle gesture). Space and shape gestures typically refer to concrete entities in the world. However, they can also make abstract references depending on the nature of the verbal message (McNeill, 1992; McNeill et al., 1993a,b). For instance, shape-related gestures can illustrate a deep connection between twins when the speaker touches the fingertips of both hands ("The twins had a spiritual bond between them"). Similarly, space-related gestures can refer to abstract relationships or locations such as lifting the hand when saying that the discussion occurred at a very "high level." In direct face-to-face communication people use gestures (Ozyurek and Kelly, 2007), regardless of whether the utterances are concrete or abstract. In line with theories suggesting gestures may represent the phylogenetic origin of human speech (Corballis, 2003, 2009, 2010; Gentilucci and Corballis, 2006; Bernardis et al., 2008), gestures might represent the basis of spatial or action representations in human language [for example, see Tettamanti and Moro (2011)]. Such spatial elements transferred into speech and gestures could be an expression of how our language is rooted in embodied experiences (Gibbs, 1996; Lakoff, 1987).
Following this idea, perceptual elements and the sensory-motor system might both contribute to the processing and comprehension of figurative abstract language (particularly in the context of metaphors such as "grasp an idea"), as suggested by the embodiment theory (Gallese and Lakoff, 2005; Arbib, 2008; Fischer and Zwaan, 2008; D'Ausilio, 2009; Pulvermüller and Fadiga, 2010). Thus, investigating the neural substrates underlying the processing of perceptual categories such as shape or space in the context of concrete vs. abstract language semantics would allow this hypothesis to be tested.
Recent fMRI investigations have focused on the processing of speech and gesture for different gesture types (beat gestures: Hubbard et al., 2009; iconic gestures; and metaphoric gestures: Straube et al., 2009, 2011a). In general, left hemispheric posterior temporal (Holle et al., 2008, 2010; Green et al., 2009) and inferior frontal brain regions (Kircher et al., 2009; Straube et al., 2009, 2011a) are commonly found for the semantic processing of speech and gesture. The left posterior temporal lobe (pTL) seems to be involved during the apprehension of co-verbal gestures, whereas the left inferior frontal gyrus (IFG) seems to be additionally recruited when processing gestures in an abstract sentence context (Straube et al., 2011a, 2013) or when gestures accompany incongruent concrete speech (e.g., "The fisherman has caught a huge fish," while the actor is angling his arms; Green et al., 2009; Willems et al., 2009). However, these studies do not examine the neural effects of processing concrete or abstract utterances with different perceptual categories, such as gestures referring to shape (e.g., "The ball is round") or space (e.g., "The shed is next to the building").
In a previous study, we compared brain activation in response to object-related (non-social) and person-related (social) co-verbal gestures (Straube et al., 2010). Person-related as opposed to object-related gestures activated anterior brain regions including the medial and bilateral frontal cortex as well as the temporal lobes. These data indicate that, depending on speech and gesture content (person-related vs. object-related), different brain regions are activated during comprehension. However, in the aforementioned study the content of the verbal utterances was confounded by differences in the level of abstractness, since person-related gestures are not only social, but also more abstract and symbolic than object-related gestures (e.g., "The actor did a good job in the play"). Therefore, the specific influence of person-related and object-related content independent of abstractness was not disentangled.
Besides this evidence for a posterior to anterior gradient of processing for concrete to abstract speech-gesture information, it is generally assumed that specific regions of the brain are specialized for the processing of specific kinds of content (Patterson et al., 2007). Information about the shapes of objects is processed in lateral occipital and inferior temporal brain areas (e.g., Kourtzi and Kanwisher, 2000; Grill-Spector et al., 2001; Kourtzi and Kanwisher, 2001; Kourtzi et al., 2003; Panis et al., 2008; Karnath et al., 2009), whereas the parietal lobe is involved in the processing of spatial information (Rizzolatti et al., 1997, 2006; Koshino et al., 2000, 2005; Rizzolatti and Matelli, 2003; Chica et al., 2011; Gillebert et al., 2011). Although gestures can be distinguished by perceptual category [e.g., deictic gestures convey spatial information and iconic gestures predominantly convey shape information (McNeill, 1992)], there is insufficient knowledge about the neural processing of these different perceptual categories in abstract and concrete sentence contexts.
Here we investigate the way in which perceptual category and semantic abstractness of co-verbal gestures interact. Our experiment aims at the question whether different perceptual categories are processed in the same or in distinct brain regions, irrespective of their linguistic abstractness. To approach this research question, we applied a naturalistic approach comparing shape-related and space-related gestures in the context of concrete and abstract sentences.
On a cognitive level (concrete physical) gesture content has to be aligned with the content of speech, regardless of whether the message is concrete or abstract. We hypothesize that the effort to integrate abstract speech with concrete gestures will likely result in enhanced neural responses in the left inferior frontal cortex and in bilateral temporal brain regions as compared to the concrete conditions, independent of perceptual category. With regard to shape-related and space-related gestural information we expected differential activation within the inferior temporal and parietal lobe, respectively. For the interaction of perceptual (space and shape) and semantic category (concreteness and abstractness) two alternative results were hypothesized: (1) If the same neural processes are engaged when processing shape and space information regardless of the abstractness of the message, we will find no significant activation in interaction analyses. In this case, conjunction analyses (e.g., aSP > aSH ∩ cSP > cSH) will result in common activation patterns in the parietal cortex for space and inferior temporal cortex for shape. (2) If abstractness influences the processing of shape-related and space-related gesture information, interaction analyses will show differential activations between conditions. Here, we expected an interaction since language content may differentially influence the interpretation of perceptual categories and consequently the neural processing predominantly in the left IFG and pTL. Enhanced neural responses in classical "language regions" would strengthen the assumption that perceptual categories are differentially processed if embedded into an abstract vs. concrete language context.

PARTICIPANTS
Seventeen male right handed (Oldfield, 1971) healthy volunteers, all native speakers of German (mean age = 23.8 ± 2.7 years, range: 20-30 years, mean years of school education = 12.65 ± 0.86, range: 10-13 years), without impairments of vision or hearing, participated in the study. None of the participants had any serious medical, neurological or psychiatric illness, past or present. All participants gave written informed consent and were paid 20 Euro for participation. The study was approved by the local ethics committee. Because of technical problems one fMRI-data set was excluded from the analyses.

STIMULUS CONSTRUCTION
A set of 388 short video clips depicting an actor was initially created, consisting of 231 concrete and 157 abstract sentences, each accompanied by co-verbal gestures.
Iconic gestures refer to the concrete content of sentences, whereas metaphoric gestures illustrate abstract information in sentences. For example in the sentences "To get down to business" (drop of the hand) or "The politician builds a bridge to the next topic" (depicting an arch with the hand), abstract information is illustrated using metaphoric gestures. By contrast, the same gestures can be iconic (drop of the right hand or depicting an arch with the right hand) with the sentences "The man goes down the hill" or "There is a bridge over the river" when they illustrate concrete physical features of the world. Thus, concrete utterances are those containing referents that are perceptible to the senses ("The man ascends to the top of the mountain"). Abstract sentences, on the other hand, contain referents that are not directly perceptible ("The man ascends to the top of the company"), where the spatial or shape terms in the utterance are being used figuratively. For the distinction between concrete and abstract concepts see Holmes and Rundle (1985).
Here we were interested in the neural processing of the following types of sentences accompanied by gestures: (1) utterances with concrete content and space-related perceptual information (cSP; "deictic gesture"); (2) utterances with concrete content and shape-related perceptual information (cSH; "iconic gesture"); (3) utterances with an abstract content and space-related perceptual information (aSP; "abstract deictic gestures"); and (4) utterances with an abstract content and shape-related perceptual information (aSH; "metaphoric gestures").
All sentences accompanying gestures had a length of 5-10 words, with an average duration of 2.37 s (SD = 0.35) and a similar grammatical form (subject-predicate-object). The speech and gestures were performed by the same male actor in a natural, spontaneous way. This procedure was continuously supervised by two of the authors (Benjamin Straube, Tilo Kircher) and timed digitally. All video clips had the same length of 5 s with at least 0.5 s before and after the sentence onset and offset, respectively, where the actor did not speak or move.

STIMULUS SELECTION: RATING / MATERIAL SELECTION/MATCHING
For stimulus validation, 17 raters not participating in the fMRI study evaluated each video on a scale ranging from 1 to 7 (1 = very low to 7 = very high) according to three content dimensions (space, shape and action information) and familiarity. Other general parameters like "understandability" and "naturalness" were previously validated and controlled for (for detailed information see Kircher et al., 2009; Straube et al., 2011a,b).
Material was selected to address our manipulations of interest (cf. above):
1. cSP = Concrete content and SPace-related information
2. cSH = Concrete content and SHape-related information
3. aSP = Abstract content and SPace-related information
4. aSH = Abstract content and SHape-related information
For each condition 30 sentences were selected to differentiate both factors. Therefore, co-verbal gestures conveying space-related perceptual information (cSP, aSP) were selected to have similar spatial rating scores independent of the level of abstractness of the utterance (c vs. a). Abstract co-verbal gestures (aSP, aSH) were selected to be similarly abstract independent of the perceptual category of information (space or shape; see Table 1).
To confirm that our stimuli met our design criteria, we calculated analyses of variance for the factors perceptual category (space-related, shape-related) and semantic category (concrete, abstract) as represented in the 2 × 2 experimental design.
As intended, we found for the rating of spatial information a significant main effect for perceptual category [SP > SH; F (1, 116) = 72.532, p < 0.001], but no significant effects for the main effect of semantic category [a vs. c; F (1, 116) = 0.149, p = 0.603] or the interaction of perceptual and semantic category [F (1, 116) …]. For the rating of shape information we obtained again a significant main effect for perceptual category [SH > SP; F (1, 120) = 98.466, p < 0.001], but no significant effects for the main effect of abstractness [a vs. c; F (1, 120) = 0.001, p = 0.988] or the interaction of perceptual category and abstractness [F (1, 120) …]. For the rating of abstractness we obtained a significant main effect for abstractness [a > c; F (1, 116) …]. For means and confidence intervals see Table 1. Together, these analyses confirm that the stimulus selection was successful and that the stimulus characteristics of each condition met our design criteria.
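The 2 × 2 item analysis described above can be illustrated with a minimal sketch of a balanced two-way between-items ANOVA. This is not the authors' SPSS procedure; the function name `two_way_anova`, the dictionary layout and the toy ratings are assumptions made for illustration only. With 30 items per cell, the error term has 4 × (30 − 1) = 116 degrees of freedom.

```python
from statistics import mean

def two_way_anova(cells):
    """F values for a balanced 2 x 2 between-items design.

    cells maps (perceptual_level, semantic_level) -> list of item ratings;
    every cell must contain the same number of items.
    """
    a_levels = sorted({key[0] for key in cells})
    b_levels = sorted({key[1] for key in cells})
    n = len(next(iter(cells.values())))
    assert all(len(values) == n for values in cells.values()), "design must be balanced"

    grand = mean(x for values in cells.values() for x in values)
    a_means = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_means = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_means = {key: mean(values) for key, values in cells.items()}

    # Sums of squares for both main effects, the interaction and the error term.
    ss_a = n * len(b_levels) * sum((a_means[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_means[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_means[(a, b)] - a_means[a] - b_means[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((x - cell_means[key]) ** 2 for key, values in cells.items() for x in values)

    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    # Each effect has 1 degree of freedom in a 2 x 2 design.
    return {
        "F_perceptual": ss_a / ms_err,
        "F_semantic": ss_b / ms_err,
        "F_interaction": ss_ab / ms_err,
        "df_error": df_err,
    }

# Toy spatial ratings (hypothetical values, three items per cell).
toy = {
    ("SP", "c"): [5, 6, 7], ("SP", "a"): [5, 6, 7],
    ("SH", "c"): [1, 2, 3], ("SH", "a"): [1, 2, 3],
}
result = two_way_anova(toy)
```

In the toy data only the perceptual factor separates the cell means, which is the pattern the spatial ratings were selected to show.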
In the event-related fMRI study design focusing on the co-occurrence of speech and gesture, differences in speech or gesture duration should not have a crucial impact on our results. However, we included differences in speech and gesture duration for each event as a covariate of no interest in our single-subject design matrix.  Apart from the aforementioned factors, further differences in movement characteristics were found between the conditions. For all four conditions predominantly right (cSP = 19; cSH = 13; aSP = 16; aSH = 11) or bimanual movements were performed (cSP = 11; cSH = 17; aSP = 14; aSH = 19). To ensure that none of the patterns of neural activation were produced by differences in hand movements (right hand vs. both hands) and speech length, a separate control analysis was run accounting for the aforementioned dimensions. A set of 11 exactly paired video clips for each condition was used for the additional analysis.
To account for differences in the size of movements between conditions, we coded each video clip with regard to the extent of the hand movement. We divided the video screen into small rectangles that corresponded to the gesture space described by McNeill (1992, 2005) and counted the number of rectangles in which gesture movements occurred (see Straube et al., 2011a). For each video the number of rectangles was also included as a covariate of no interest in the single-subject model.
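The rectangle count used as a movement-extent covariate can be sketched as follows. The text does not say how the coding was carried out in practice, so this sketch assumes that hand positions are available as pixel coordinates inside the frame; the function name `count_gesture_cells`, the frame size and the grid dimensions are hypothetical.

```python
def count_gesture_cells(hand_positions, frame_width, frame_height, n_cols, n_rows):
    """Count the distinct grid rectangles that contain at least one hand position.

    hand_positions: iterable of (x, y) pixel coordinates inside the frame
    (assumed input format). The grid is a regular n_cols x n_rows partition
    of the frame, a simplification of the gesture-space grid.
    """
    visited = set()
    for x, y in hand_positions:
        # Map the pixel coordinate to a column/row index; clamp the far edge.
        col = min(int(x / frame_width * n_cols), n_cols - 1)
        row = min(int(y / frame_height * n_rows), n_rows - 1)
        visited.add((row, col))
    return len(visited)

# Hypothetical example: a 640 x 480 frame split into 4 x 3 rectangles.
extent = count_gesture_cells(
    [(10, 10), (15, 12), (300, 200), (639, 479)],
    frame_width=640, frame_height=480, n_cols=4, n_rows=3,
)
```

A larger count indicates a gesture that travelled through more of the gesture space, which is the quantity entered per video as a covariate.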

EXPERIMENTAL DESIGN AND PROCEDURE
During the fMRI scanning procedure, videos were presented via MR-compatible video goggles (VisuaStim © , Resonance Technology, Inc.) and non-magnetic headphones (audio presenting systems for stereophonic stimuli: Commander; Resonance Technology, Inc.), which additionally dampened scanner noise.
Thirty items of each of the four conditions were presented in an event-related design, in a pseudo-randomized order and counterbalanced across subjects. Each video was followed by a baseline condition (gray background with a fixation cross) with a variable duration of 3750-6750 ms (average: 5000 ms; see Figure 1).
During scanning participants were instructed to watch the videos and to indicate via left hand key presses at the beginning of each video whether the spot displayed on the actor's sweater was light or dark colored. This task was chosen to focus participants' attention on the middle of the screen and enabled us to investigate implicit speech and gesture processing without possible instruction-related attention biases. Performance rates and reaction times were recorded. Prior to scanning, each participant received at least 10 practice trials outside the scanner, which were different from the stimuli used in the main experiment. During the preparation scans additional clips were presented to adjust the volume of the headphone. Each participant performed two runs with 60 video clips and a total duration of 10.5 min each.

FIGURE 1 | Examples of the different speech and gesture video-clips.
The stimulus material consisted of video clips of an actor performing either space-related (top) or shape-related (bottom) gestures to corresponding sentences with a concrete (left) or abstract content (right). One screen shot of an example video is shown for each condition (cSP, concrete space-related; cSH, concrete shape-related; aSP, abstract space-related; aSH, abstract shape-related). In order to exemplify the stimulus material German sentences are translated into English, and written in speech bubbles for illustration (unlike in the actual stimuli).

Slices were positioned to achieve whole brain coverage. During each functional run 315 volumes were acquired.

DATA ANALYSIS
MR images were analyzed using Statistical Parametric Mapping (SPM2; www.fil.ion.ucl.ac.uk) implemented in MATLAB 6.5 (Mathworks Inc., Sherborn, MA). The first five volumes of every functional run were discarded from the analysis to minimize T1 saturation effects. To correct for different acquisition times, the signal measured in each slice was shifted relative to the acquisition time of the middle slice using a slice interpolation in time. All images of one session were realigned to the first image of a run to correct for head movement and normalized into standard stereotaxic anatomical MNI space by using the transformation matrix calculated from the first EPI scan of each subject and the EPI template. Afterwards, the normalized data with a resliced voxel size of 3.5 × 3.5 × 3.5 mm were smoothed with a 6 mm FWHM isotropic Gaussian kernel to accommodate intersubject variation in brain anatomy. Proportional scaling with high-pass filtering was used to eliminate confounding effects of differences in global activity within and between subjects. The expected hemodynamic response at the defined "points of integration" for each event type was modeled by two response functions, a canonical hemodynamic response function (HRF; Friston et al., 1998) and its temporal derivative. The temporal derivative was included in the model to account for the residual variance resulting from small temporal differences in the onset of the hemodynamic response, which is not explained by the canonical HRF alone. The functions were convolved with the event sequence, with a fixed event duration of 1 s, for the onsets corresponding to the integration points of gesture stroke and sentence keyword to create the stimulus conditions in a general linear model (Kircher et al., 2009; Straube et al., 2010, 2011b). The fixed event duration of 1 s was chosen to get a broader range of data around the assumed time point of integration.
This methodological approach was also applied successfully in previous studies of co-verbal gesture processing (Straube et al., 2010, 2011b).
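The construction of the two regressors per event type described above (a 1 s event convolved with a canonical HRF and with its temporal derivative) can be sketched in plain Python. This is an illustration rather than the SPM implementation: the double-gamma parameters (shapes 6 and 16, undershoot ratio 1/6), the finite-difference derivative, the repetition time `tr` and the onset used in the example are all assumptions, since the acquisition parameters are not given in this section.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def canonical_hrf(t):
    """Double-gamma response: early peak minus a smaller late undershoot (assumed parameters)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def build_regressors(onsets, n_scans, tr, duration=1.0, dt=0.1, kernel_seconds=32.0):
    """Return (hrf_regressor, derivative_regressor), one value per scan."""
    n_fine = int(round(n_scans * tr / dt))
    # Boxcar on a fine time grid: 1 during each event, 0 elsewhere.
    boxcar = [0.0] * n_fine
    for onset in onsets:
        start = max(int(round(onset / dt)), 0)
        stop = min(int(round((onset + duration) / dt)), n_fine)
        for i in range(start, stop):
            boxcar[i] = 1.0

    n_kernel = int(round(kernel_seconds / dt))
    hrf = [canonical_hrf(i * dt) for i in range(n_kernel)]
    # Temporal derivative approximated by a forward finite difference.
    dhrf = [(canonical_hrf(i * dt + dt) - canonical_hrf(i * dt)) / dt for i in range(n_kernel)]

    def convolve(signal, kernel):
        out = [0.0] * len(signal)
        for i, s in enumerate(signal):
            if s == 0.0:
                continue
            for j, k in enumerate(kernel):
                if i + j < len(out):
                    out[i + j] += s * k * dt
        return out

    step = int(round(tr / dt))
    # Downsample the fine-grid predictions to one sample per scan.
    main = convolve(boxcar, hrf)[::step][:n_scans]
    deriv = convolve(boxcar, dhrf)[::step][:n_scans]
    return main, deriv

# Hypothetical example: one integration point at 10 s, 20 scans, TR = 2 s.
main, deriv = build_regressors([10.0], n_scans=20, tr=2.0)
```

The derivative regressor is positive before the predicted peak and negative after it, which is what lets the model absorb small shifts in response onset.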
A group analysis was performed by entering contrast images into a flexible factorial analysis as implemented in SPM5 in which subjects are treated as random variables. A Monte Carlo simulation of the brain volume of the current study was conducted to establish an appropriate voxel contiguity threshold (Slotnick et al., 2003). Assuming an individual voxel type I error of p < 0.005, a cluster extent of 8 contiguous re-sampled voxels was necessary to correct for multiple voxel comparisons at p < 0.05. Thus, voxels with a significance level of p < 0.005 uncorrected, belonging to clusters with at least eight voxels, are reported (Straube et al., 2010). Activation peaks of some of the activation clusters also survive family-wise error (FWE) correction; the corresponding corrected p-values for each activation peak are included in the tables. The reported voxel coordinates of activation peaks are located in MNI space. Statistical analyses of data other than fMRI were performed using SPSS version 14.0 for Windows (SPSS Inc., Chicago, IL, USA). Greenhouse-Geisser correction was applied whenever necessary.
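The logic of a Monte Carlo cluster-extent threshold can be sketched as follows. This is a deliberately simplified illustration and not the simulation tool cited above: it ignores spatial smoothness and the brain mask, uses face-connected clusters, and the volume dimensions, iteration count and function names are hypothetical. Because smoothing is omitted, the extents it produces are not expected to match the eight-voxel threshold reported in the text.

```python
import random

# Face-connected neighbours (6-neighbourhood) in a 3D grid.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def largest_cluster(active):
    """Size of the largest connected cluster in a set of (x, y, z) voxels."""
    remaining = set(active)
    best = 0
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            x, y, z = stack.pop()
            size += 1
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

def cluster_extent_threshold(dims, voxel_p, alpha, n_iter, seed=0):
    """Smallest extent k such that a cluster of >= k voxels occurs by chance
    in fewer than alpha of the simulated noise volumes."""
    rng = random.Random(seed)
    nx, ny, nz = dims
    maxima = []
    for _ in range(n_iter):
        # Each voxel is independently 'active' with probability voxel_p.
        active = {
            (x, y, z)
            for x in range(nx) for y in range(ny) for z in range(nz)
            if rng.random() < voxel_p
        }
        maxima.append(largest_cluster(active))
    k = 1
    while sum(m >= k for m in maxima) / n_iter >= alpha:
        k += 1
    return k

# Hypothetical volume of 20 x 20 x 20 voxels, voxel-wise p = 0.005, corrected p = 0.05.
k = cluster_extent_threshold((20, 20, 20), voxel_p=0.005, alpha=0.05, n_iter=200)
```

The returned value plays the role of the voxel contiguity threshold: clusters at least that large are unlikely to arise from noise alone under the simulated assumptions.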

CONTRASTS OF INTEREST
To test our hypothesis on the neural processing of different perceptual categories in concrete vs. abstract sentence contexts (cf. Introduction section), baseline contrasts (main effects of condition), conjunction analysis and interaction analysis were run.
At first, baseline contrasts were calculated in order to detect general activations with regard to the four main conditions (aSP, cSP, aSH, cSH) as compared to baseline (fixation cross).
In a next step, main effects (SH vs. SP and a vs. c) as well as the interaction were calculated (t-contrasts) to show brain regions involved in the processing of different factors (directed general effects).
To test the hypothesis that perceptual category is processed in the same neural structures regardless of the language context we performed conjunction analyses of difference contrasts (aSP > aSH ∩ cSP > cSH and aSH > aSP ∩ cSH > cSP). To test for general effects of abstractness independent of both space-related as well as shape-related contents the same approach was used (aSP > cSP ∩ aSH > cSH and cSH > aSH ∩ cSP > aSP).
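A conjunction of two difference contrasts, such as aSH > aSP ∩ cSH > cSP, can be illustrated with a minimum-statistic sketch: a voxel counts as jointly active only if both contrasts exceed the threshold there. The text does not state which conjunction procedure was used, so the minimum-statistic rule, the function name `conjunction` and the toy t-values and threshold below are assumptions.

```python
def conjunction(t_map_1, t_map_2, t_threshold):
    """Indices of voxels where BOTH contrasts exceed the threshold.

    t_map_1, t_map_2: equally long sequences of voxel-wise t-values for the
    two difference contrasts. Testing min(t1, t2) against the threshold is
    equivalent to requiring each contrast to be significant on its own.
    """
    if len(t_map_1) != len(t_map_2):
        raise ValueError("t-maps must cover the same voxels")
    return [
        index
        for index, (t1, t2) in enumerate(zip(t_map_1, t_map_2))
        if min(t1, t2) > t_threshold
    ]

# Toy t-values for four voxels (hypothetical numbers).
abstract_shape_vs_space = [3.1, 0.5, 4.0, 2.9]
concrete_shape_vs_space = [2.8, 3.5, 3.0, -1.0]
common = conjunction(abstract_shape_vs_space, concrete_shape_vs_space, t_threshold=2.6)
```

Voxels that are strong in only one of the two contrasts are excluded, which is why a conjunction isolates effects that hold regardless of the other factor.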
Finally, we performed two interaction analyses to test the hypothesis that abstractness significantly changes the processing of the perceptual categories space and shape: (1) (aSP > cSP) > (aSH > cSH), masked for (aSP > cSP) and aSP; (2) (aSH > cSH) > (aSP > cSP), masked for (aSH > cSH) and aSH. The masking procedure was applied to avoid interpreting deactivation in the concrete conditions and, for analysis (1), to restrict the effects to increased activity for aSP relative to the low-level baseline and to its concrete counterpart (cSP); the same logic applies to aSH and cSH in analysis (2). Based on our hypothesis, this methodological approach enables us to find specific neural responses for semantic category (concrete/abstract) in space-related (1) and shape-related (2) perceptual contexts.
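The interaction contrast (aSP > cSP) > (aSH > cSH) expands to the weights +1, −1, −1, +1 over the conditions aSP, cSP, aSH and cSH. The sketch below applies these weights to per-condition response estimates for a single voxel and mimics the masking step with simple sign checks. In the actual analysis masking is done with thresholded contrast maps, so the sign checks, the function names and the toy estimates are assumptions for illustration.

```python
# Contrast weights per condition; the interaction is (aSP - cSP) - (aSH - cSH).
ABSTRACTNESS = {"aSP": 1, "cSP": -1, "aSH": 1, "cSH": -1}          # a > c
PERCEPTUAL = {"aSP": -1, "cSP": -1, "aSH": 1, "cSH": 1}            # SH > SP
INTERACTION_SPACE = {"aSP": 1, "cSP": -1, "aSH": -1, "cSH": 1}     # analysis (1)

def contrast_value(estimates, weights):
    """Weighted sum of per-condition response estimates for one voxel."""
    return sum(weights[condition] * estimates[condition] for condition in weights)

def masked_space_interaction(estimates):
    """Interaction value for analysis (1), or None if the voxel fails the masks.

    Simplified masks: aSP must exceed cSP and aSP must exceed the baseline
    (coded here as 0), so that the interaction is not driven by deactivation.
    """
    if estimates["aSP"] <= estimates["cSP"] or estimates["aSP"] <= 0:
        return None
    return contrast_value(estimates, INTERACTION_SPACE)

# Toy estimates relative to baseline for one voxel (hypothetical numbers).
voxel = {"aSP": 2.0, "cSP": 0.5, "aSH": 1.0, "cSH": 0.8}
effect = masked_space_interaction(voxel)
```

Swapping the roles of SP and SH in the weights and masks gives the corresponding sketch for analysis (2).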

Baseline contrasts (aSP, cSP, aSH, cSH)
To explore the general processing mechanisms for each condition and the high comparability between conditions baseline contrasts were calculated (Figure 2, Table 3). We found comparable activation patterns as in previous studies on speech and gesture stimuli (Straube et al., 2011a).

Main effects for perceptual category
To identify the general effect of speech-gesture information, the main effect for the factor perceptual category [space-related (SP) vs. shape-related (SH)] was calculated.
The processing of shape-related vs. space-related information (SH > SP) resulted in enhanced neural responses in bilateral occipital-parietal (BAs 18/37) and middle (BA 11) as well as inferior frontal (BA 45) gyri and left parietal (BA 40) brain regions (Table 4).

Main effects for abstractness
Abstract vs. concrete speech-gesture information (a > c) revealed a widespread pattern of activation. A large cluster of activation was found in the left IFG extending to the temporal lobe, including the temporal pole and the middle temporal gyrus. Activations were also found in the right superior temporal gyrus, in the left precuneus and right cuneus as well as in the left precentral and superior medial gyri (BAs 6/9). Enhanced neural responses were also found in the middle cingulate, the left superior frontal and superior medial cortex as well as in the left angular gyrus (BA 39/40) (see Table 5, Figure 3). For the reverse contrast (c > a) we found activations in the left and right parahippocampal and fusiform gyri (BA 36/37), in the left inferior frontal (BA 46) and in the temporo-occipital region (BA 37) as well as in the left superior occipital gyrus (BA 19) (see Table 5). Smaller clusters of activation were found in the right cerebellum, the middle frontal (BA 11) and in the precentral gyrus (BA 4).

FIGURE 2 | Activation pattern in contrast to baseline (whole-brain, p < 0.005, cluster extent threshold = 8 voxels; MC corrected p < 0.05).

Interaction of perceptual categories and abstractness
For the interaction of perceptual category and abstractness, (aSP > cSP) > (aSH > cSH), we found activation in superior medial frontal, left inferior frontal (BA 45/44), middle temporal and superior parietal brain regions (see Table 6). For the contrast in the opposite direction, (aSH > cSH) > (aSP > cSP), we found a more distributed, predominantly right hemispheric activation pattern including the occipital lobe, the middle frontal gyrus, the inferior parietal lobe, the precuneus, the IFG (BA 44/45), the middle occipital gyrus and the bilateral fusiform gyri (see Table 6).

Brain areas sensitive for shape-related and space-related perceptual contents independent of abstractness.
A conjunction analysis for shape-related form descriptive perceptual contents irrespective of the level of abstractness (aSH > aSP ∩ cSH > cSP) revealed enhanced neural responses in the left middle occipital gyrus (BA 37; see supplementary material Table 7).
No region was found to be significantly activated for space vs. shape-related processing on concrete and abstract level (aSP > aSH ∩ cSP > cSH) (see supplementary material Table 8).

Brain areas sensitive for abstractness independent of perceptual category (shape/space).
Common activations for abstract as opposed to concrete co-verbal gestures, irrespective of descriptive or spatial information (aSH > cSH ∩ aSP > cSP), resulted in a large cluster of activation encompassing the left temporal pole and the middle temporal gyrus. Another cluster of activation was found in the right superior temporal gyrus and in the left IFG, including the pars orbitalis as well as the pars triangularis (BA 44; see supplementary material Table 9).
The imaging results for concreteness independent of the shape-related or space-related perceptual content (cSH > aSH ∩ cSP > aSP) revealed enhanced BOLD responses in the left parahippocampal gyrus (BA 35; see supplementary material Table 10).
Taken together, significant main effects and interactions of brain activation with regard to the manipulated factors [type of communicated perceptual information (SP, SH) and abstractness (c, a)] revealed different patterns of activation. The specific contrasts indicated that subregions of the left IFG and the left  The same analysis, including only right-handed gesture stimuli of equal length (speech duration) revealed the same pattern of activation encompassing the left IFG as well as the left middle temporal gyrus, indicating that this effect is not based on irrelevant differences in stimulus material.

DISCUSSION
Space and shape are distinct perceptual categories. Words referring to space and shape also describe abstract concepts like "rising income" (space) or a "square personality" (shape). Gestures are an important part of human communication that underpin verbal utterances and can convey shape or space information even when accompanying abstract sentences. Recent studies have investigated the neural processing of speech and gesture (Willems et al., 2009; Dick et al., 2009, 2012; Green et al., 2009; Hubbard et al., 2009; Kelly et al., 2010; Kircher et al., 2009; Skipper et al., 2009; Straube et al., 2009; Holle et al., 2010). Although investigating the perceptual categories used in speech and gesture could clarify how abstractness affects the neural networks that process such perceptual information, this effect is not yet known. Thus, the purpose of the current fMRI study was to investigate the neural processing of shape-related vs. space-related co-speech gesture information when presented with abstract or concrete utterances, addressing the question of whether similar or distinct neural networks are involved.
In line with previous findings (Straube et al., 2011a) we found enhanced cortical activations for abstract (a) as opposed to concrete (c) utterances in the bilateral temporal lobes and in the left IFG for both space-related and shape-related sentences (aSP > cSP and aSH > cSH). The interaction of perceptual category and abstractness in a more anterior part of the left IFG and inferior part of the pTL indicates that abstractness strongly influenced the neural processing of space and shape information. Only the effect of shape- vs. space-related information revealed activation in a single cluster of the left inferior occipital gyrus independent of abstractness (cSH > cSP ∩ aSH > aSP). By contrast, the interaction resulted in enhanced BOLD responses in a more anterior part of the left IFG and inferior part of the pTL. Thus, we demonstrate the interaction of perceptual category and abstractness on the neural processing of speech accompanied by gestures. These data suggest a functional division of the pTL and left IFG being sensitive to the processing of both the level of abstractness and the type of categorical information. These imaging results further offer neural support for the traditional categorization of co-verbal gestures with regard to their content and abstractness (McNeill, 1992, 2005). The imaging results for the abstract co-verbal gesture condition revealed BOLD enhancements in the left inferior frontal and the bilateral temporal regions. This finding is consistent with previous evidence of involvement of the left IFG and bilateral temporal lobes in the integration of gestures with abstract sentences (Straube et al., 2009, 2011a). With regard to the underlying neuro-cognitive processes, we assume that the concrete visual gesture information (e.g., illustrating an arch of a bridge) is being interpreted in the context of the abstract sentence meaning ("the politician builds a bridge to the next topic").
Thus, correspondence of gesture and sentence meaning must be identified and figurative components of speech and gesture must be translated from their literal/concrete meanings. To build this relation between speech and gesture information on the level of abstractness, additional online unification processes within the IFG seem to be relevant (Straube et al., 2011a). Such processes might be similar to those responsible for making inferences (e.g., Bunge et al., 2009), relational reasoning (e.g., Wendelken et al., 2008), the building of analogies (e.g., Luo et al., 2003; Bunge et al., 2005; Green et al., 2006; Watson and Chatterjee, 2012), and unification (Straube et al., 2011a). Those processes may also be involved in the comprehension of novel metaphoric or ambiguous communications and consistently activate the left IFG (Rapp et al., 2004, 2007; Stringaris et al., 2007; Chen et al., 2008; Cardillo et al., 2012). Consequently, enhanced neural responses in the frontotemporal network may be evoked by the higher cognitive demand in an abstract metaphoric context which may have resulted in the recruitment of the left inferior frontal and middle temporal region (Straube et al., 2011a).
Concrete speech accompanied by gestures revealed a pattern of enhanced BOLD responses in parahippocampal regions bilaterally as well as in the left superior occipital gyrus. Concrete co-verbal utterances such as "the workman builds a bridge over the river" evoke a comparatively transparent connection to a familiar everyday event. Accordingly, an experience-based understanding of a scene may have resulted in the recruitment of the parahippocampal regions, whereas the direct imagery of concrete objects or actions may have resulted in enhanced neural responses in the left superior occipital region, facilitating the understanding of the concrete co-verbal content.

Frontiers in Behavioral Neuroscience
www.frontiersin.org December 2013 | Volume 7 | Article 181 | 9
The shape-related sentences accompanied by shape-related gestures revealed activations in the left middle occipital region. Similar to the activations found for the concrete condition (c > a), imagery of an experience-based perceptual representation resulted in the activation of the left occipital area. However, we did not observe common activation for the processing of spatial information in a concrete and an abstract sentence context. Together, these data do not support a universal neural processing of space and shape in a multimodal communication context. By contrast, we found an interaction of perceptual category and abstractness, as spatial information on an abstract level (aSP) specifically (in contrast to all other conditions) activated a particular part of the left IFG and the left superior temporal region. This finding was robust and independent of both hand movement and speech duration. Thus, BOLD enhancements in these regions suggest that it is predominantly spatial information that is processed differently in an abstract vs. concrete sentence context. Additional semantic information is retrieved from the left superior temporal region. The higher cognitive load, together with the resulting enhanced effort with regard to information-specific abstract and spatial lexical retrieval, may account for the recruitment of the fronto-temporal network. However, specific activation of the left IFG could also represent competition between meanings of spatial terms in the aSP condition, including at a minimum the concrete/literal and the abstract/metaphoric interpretations (Chatterjee, 2008; Chen et al., 2008).
For the processing of shape-related information we found common activation within the inferior temporal gyrus and the occipital lobe for concrete and abstract utterances, suggesting a common perceptual representation activated during comprehension of shape information. This perceptual representation probably compensated for the need for additional resources of the IFG and pTL, which were activated for space-related information in an abstract sentence context. Thus, this finding suggests that a concrete representation of shape is also activated in an abstract sentence context. This might have further facilitated the processing of the abstract representation of shape. For the processing of space-related information we found no common activation for concrete and abstract utterances, indicating different neural processing mechanisms for the two types of communication. The transformation of space-related gesture information in an abstract sentence context probably required higher order semantic processing mechanisms (Straube et al., 2011a), which probably inhibited the actual perceptual spatial representation of these gestures.
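The contrasts referred to throughout this discussion (the simple effects of abstractness, the conjunction across levels of abstractness, and the interaction) can be written as weight vectors over the four conditions. The following is a minimal sketch only. The condition order, the helper names, the sign convention of the interaction, and the minimum-statistic reading of "∩" are assumptions made for illustration; they are not taken from the analysis pipeline of this study, and the numeric values in the usage note are toy numbers, not data.

```python
# Assumed condition order for this sketch (not specified in the text).
CONDITIONS = ("cSP", "aSP", "cSH", "aSH")


def contrast(weights_by_condition):
    """Return a weight vector in CONDITIONS order; unlisted conditions get 0."""
    return [weights_by_condition.get(name, 0) for name in CONDITIONS]


# Simple effects of abstractness within each perceptual category.
ABSTRACT_SPACE = contrast({"aSP": 1, "cSP": -1})  # aSP > cSP
ABSTRACT_SHAPE = contrast({"aSH": 1, "cSH": -1})  # aSH > cSH

# Simple effects of shape vs. space at each level of abstractness.
SHAPE_CONCRETE = contrast({"cSH": 1, "cSP": -1})  # cSH > cSP
SHAPE_ABSTRACT = contrast({"aSH": 1, "aSP": -1})  # aSH > aSP

# Interaction as a difference of simple effects:
# (aSP - cSP) - (aSH - cSH). The direction is an assumption of this sketch.
INTERACTION = [s - h for s, h in zip(ABSTRACT_SPACE, ABSTRACT_SHAPE)]


def apply_contrast(weights, betas):
    """Weighted sum of per-condition estimates (betas in CONDITIONS order)."""
    return sum(w * b for w, b in zip(weights, betas))


def conjunction(stat_a, stat_b, threshold):
    """Minimum-statistic conjunction: both statistics must exceed threshold."""
    return min(stat_a, stat_b) > threshold
```

As a usage example, `apply_contrast(INTERACTION, betas)` is positive for a hypothetical voxel whose response is elevated only in the aSP condition, whereas `conjunction(t_concrete, t_abstract, threshold)` is true only when the shape > space effect exceeds the threshold at both levels of abstractness. This mirrors the logic of a contrast such as cSH > cSP ∩ aSH > aSP.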
A limitation of this study is that the specific effects of gesture as well as integration processes cannot be disentangled. Distinguishing between speech and gesture was not the purpose of the current study. One problem for the interpretation of the main effect of abstractness, irrespective of perceptual category, is that the activation patterns found for abstract speech accompanied by gestures in the left IFG and bilateral temporal lobes might be produced by differences in abstractness between the sentences, as demonstrated by several studies of metaphoric speech processing (Rapp et al., 2004, 2007; Eviatar and Just, 2006; Mashal et al., 2007, 2009; Nagels et al., 2013; Stringaris et al., 2007; Chen et al., 2008). However, in a previous study we observed increased activation in the left IFG for metaphoric co-verbal gestures in contrast to control sentences with the identical abstract semantic content. Furthermore, there is evidence that activation of the left IFG is specifically related to the processing of novel and therefore unconventional metaphoric sentences (Rapp et al., 2004, 2007; Cardillo et al., 2012), in which abstract information must be interpreted online in terms of its non-literal meaning. By contrast, the abstract sentences used in the current study were conventional and part of everyday communication, e.g., "The talk was on a high level." This is supported by our rating results, which revealed no differences between the conditions with regard to familiarity. Although we cannot exclude that differences between conditions might be explained by differences in difficulty due to our language manipulation (concrete vs. abstract), the lack of commonalities (e.g., aSP > aSH ∩ cSP > cSH) cannot be explained by these potential differences.
The robustness of the imaging results in the aforementioned regions is further supported by separate control analyses of a carefully matched subset of stimuli, paired for hand movements and speech length.
A further limitation is that the distinction between space- and shape-related information in the current experiment is artificial and that the two do not represent independent factors. Shape gestures include some spatial information. However, despite this intrinsic connection between space and shape, our data demonstrate that these perceptual categories can be distinguished by independent raters and produce distinct interacting activation patterns with regard to abstractness. Therefore, our data support the validity of this separation, which has traditionally been applied in terms of deictic or abstract deictic gestures (which refer to space) in contrast to iconic and metaphoric gestures (which rather refer to form or shape; e.g., McNeill, 1992).
With this study we demonstrate the interaction of perceptual category and abstractness in the neural processing of speech-gesture utterances. Besides abstractness, the type of information was relevant to the neural processing of speech accompanied by gestures. This finding illustrates the relevance of the interaction between language and cognition, which characterizes the complexity of natural interpersonal communication. Future studies should therefore consider the importance of perceptual type and abstractness for the interpretation of their imaging results. Our data suggest a functional subdivision of the pTL and left IFG with regard to the processing of space- and shape-related information in an abstract sentence context. Such differences support the theoretically based traditional categorization of co-verbal gestures with regard to information type and abstractness (McNeill, 1992). Most likely, the investigation of other types of co-verbal gestures will demonstrate further important differences in the processing of specific co-verbal gesture types, which will shed light on the fine-grained differences in the processing mechanisms that underlie the comprehension of multimodal natural communication.
is supported by the BMBF (project no. 01GV0615). We thank Katharina Augustin, Bettina Freese and Simone Schröder for the preparation and evaluation of the stimulus material.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnbeh.2013.00181/abstract

Table 7 | Brain areas sensitive for shape-related contents (independent of abstractness).
Significance level (t-value), size of the respective activation cluster (No. voxels; number of voxels > 8) at p < 0.005 MC corrected for multiple comparisons. Coordinates are listed in MNI space. BA is the Brodmann area nearest to the coordinate and should be considered approximate. (cSP, concrete spatial; cSH, concrete shape; aSP, abstract spatial; aSH, abstract shape).

Table 9 | Brain areas sensitive for abstractness (independent of content).
Significance level (t-value), size of the respective activation cluster (No. voxels; number of voxels > 8) at p < 0.005 MC corrected for multiple comparisons. Coordinates are listed in MNI space. BA is the Brodmann area nearest to the coordinate and should be considered approximate.

Table 10 | Brain areas sensitive for concreteness (independent of content).
Significance level (t-value), size of the respective activation cluster (No. voxels; number of voxels > 8) at p < 0.005 MC corrected for multiple comparisons. Coordinates are listed in MNI space. BA is the Brodmann area nearest to the coordinate and should be considered approximate.