The hemispheric lateralization of speech processing depends on what “speech” is: a hierarchical perspective
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA
A recurring question in neuroimaging studies of spoken language is whether speech is processed largely bilaterally, or whether the left hemisphere plays a more dominant role (cf. Hickok and Poeppel, 2007; Rauschecker and Scott, 2009). Although questions regarding underlying mechanisms are certainly of interest, the discussion unfortunately gets sidetracked due to the imprecise use of the word “speech”: by being more explicit about the type of cognitive and linguistic processing to which we are referring, it may be possible to reconcile many of the disagreements present in the literature.
Levels of Processing During Connected Speech Comprehension
A relatively uncontroversial starting point is to acknowledge that understanding a spoken sentence requires a listener to analyze a complex acoustic signal along a number of levels, listed schematically in Figure 1. Phonemes must be distinguished, words identified, and grammatical structure taken into account so that meaning can be extracted. These processes operate in an interactive parallel fashion, and as such are difficult to fully disentangle. Such interdependence also means that as researchers we often use “speech” as a term of convenience to mean:
- Amplitude-modulated noise or spectral transitions, as might be similar to aspects of spoken language;
- Phonemes (“b”), syllables (“ba”), or pseudowords (“bab”);
- Words (“bag”);
- Phrases (“the bag”);
- Sentences (“The bag of carrots fell to the floor”) or narratives.
Figure 1. The cortical regions involved in processing spoken language depend in a graded fashion on the level of acoustic and linguistic processing required. Processing related to amplitude modulated noise is bilateral (e.g., Giraud et al., 2000), shown at top. However, as the requirements for linguistic analysis and integration increase, neural processing shows a concomitant increase in its reliance on left hemisphere regions for words [see meta-analysis in Davis and Gaskell (2009)] and sentences [see meta-analysis in Adank (2012)].
Naturally, because different types of spoken language require different cognitive mechanisms (spanning sublexical, lexical, and supralexical units), using an unqualified term such as “speech” can lead to confusion about the processes being discussed. Although this point might seem obvious, a quick review of the speech literature demonstrates that many authors¹ have at one time or another assumed their definition of “speech” was obvious enough that they need not give it, leaving readers to form their own opinions.
Below I will briefly review literature in relation to the neural bases for two types of spoken language processing: unconnected speech (isolated phonemes and single words) and connected speech (sentences or narratives). The goal is to illustrate that, within the context of a hierarchical neuroanatomical framework, there are aspects of “speech” processing that are both bilateral and lateralized.
Unconnected Speech is Processed Largely Bilaterally in Temporal Cortex
The first cortical way station for acoustic input to the brain is primary auditory cortex: not surprisingly, acoustic stimuli activate this region robustly in both hemispheres, whether they consist of pure tones (Belin et al., 1999; Binder et al., 2000) or amplitude-modulated noise (Giraud et al., 2000; Hart et al., 2003; Overath et al., 2012). Although there is speculation regarding hemispheric differences in specialization for these low level signals (Poeppel, 2003; Giraud et al., 2007; Obleser et al., 2008; McGettigan and Scott, 2012), for the current discussion, it is sufficient to note that both left and right auditory cortices respond robustly to most auditory stimuli, and that proposed differences in hemispheric preference relate to a modulation of this overall effect².
Beyond low-level acoustic stimulation, phonemic processing requires both an appropriate amount of spectral detail and the relationship to a pre-existing acoustic category (i.e., the phoneme). The processing of isolated syllables results in activity along the superior temporal sulcus and middle temporal gyrus, typically on the left but not the right (Liebenthal et al., 2005; Heinrich et al., 2008; Agnew et al., 2011; DeWitt and Rauschecker, 2012). Although this may suggest a left hemisphere specialization for phonemes, listening to words (which, of course, include phonemes) reliably shows strong activity in bilateral middle and superior temporal gyrus (Price et al., 1992; Binder et al., 2000, 2008). In addition, stroke patients with damage to left temporal cortex are generally able to perform reasonably well on word-to-picture matching tasks (Gainotti et al., 1982); the same is true of patients undergoing a Wada procedure (Hickok et al., 2008). Together these findings suggest that the right hemisphere is able to support at least some degree of phonemic and lexical processing.
That being said, there are also regions that show increased activity for words in the left hemisphere but not the right, particularly when pseudowords are used as a baseline (Davis and Gaskell, 2009). Both pseudowords and real words rely on stored representations of speech sounds (they share phonemes), but real words also involve consolidated lexical and/or conceptual information (Gagnepain et al., 2012). Left-hemisphere activations likely reflect the contribution of lexical and semantic memory processes that are accessed in an obligatory manner during spoken word recognition. Within the framework outlined in Figure 1, spoken words thus lie between very low level auditory processing (which is essentially bilateral) and the processing of sentences and narratives (which, as I will discuss below, is more strongly left lateralized).
Processing of phonemes and single words therefore appears to be mediated in large part by both left and right temporal cortex, although some indications of lateralization may be apparent.
Connected Speech Relies on a Left-Lateralized Frontotemporal Network
In addition to recognizing single words, comprehending connected speech—such as meaningful sentences—depends on integrative processes that help determine the syntactic and semantic relationship between words. These processes rely not only on phonemic and lexical information, but also on prosodic and rhythmic cues conveyed over the course of several seconds. In other words, a sentence is not simply a string of phoneme-containing items, but conveys a larger meaning through its organization (Vandenberghe et al., 2002; Humphries et al., 2006; Lerner et al., 2011; Peelle and Davis, 2012). In addition to providing content in and of itself, the syntactic, semantic, and rhythmic structure present in connected speech also supports listeners' predictions of upcoming acoustic information.
An early and influential PET study of connected speech by Scott et al. showed increased activity in the lateral aspect of left anterior temporal cortex for spoken sentences relative to unintelligible spectrally-rotated versions of these sentences (Scott et al., 2000). Subsequent studies, due in part to the use of a greater number of participants, have typically found intelligibility effects bilaterally, often along much of the length of superior temporal cortex (Crinion et al., 2003; Friederici et al., 2010; Wild et al., 2012a). In addition, a large and growing number of neuroimaging experiments show left inferior frontal involvement for intelligible sentences, either compared to an unintelligible control condition (Rodd et al., 2005, 2010; Awad et al., 2007; Obleser et al., 2007; Okada et al., 2010; Peelle et al., 2010a; McGettigan et al., 2012; Wild et al., 2012b) or parametrically correlating with intelligibility level (Davis and Johnsrude, 2003; Obleser and Kotz, 2010; Davis et al., 2011). Regions of left inferior frontal cortex are also involved in processing syntactically complex speech (Peelle et al., 2010b; Tyler et al., 2010; Obleser et al., 2011) and in resolving semantic ambiguity (Rodd et al., 2005, 2010, 2012; Snijders et al., 2010). In most of these studies activity in right inferior frontal cortex is not significant, or is noticeably smaller in extent than activity in the left hemisphere. These functional imaging studies are consistent with patient work demonstrating that participants with damage to left inferior frontal cortex have difficulty with sentence processing (e.g., Grossman et al., 2005; Peelle et al., 2007; Papoutsi et al., 2011; Tyler et al., 2011).
Processing connected speech thus relies more heavily on left hemisphere language regions, most obviously in inferior frontal cortex. The evidence outlined above suggests this is largely due to the increased linguistic demands associated with sentence processing compared to single words.
The Importance of Statistical Comparisons for Inferences Regarding Laterality
In many of the above papers (and in my interpretation of them) laterality was not statistically assessed, but inferred based on the presence or absence of an activation cluster in a particular brain region. That is, one sees a cluster of activation in left inferior frontal gyrus but not the right, and concludes that the task has a “left lateralized” pattern of neural activity. However, simply observing a response in one region, but not another, does not mean that these regions significantly differ in their activity (the “imager's fallacy”; Henson, 2005). This is a well-known statistical principle, but one that can remain difficult to follow in the face of compelling graphical depictions of data (Nieuwenhuis et al., 2011).
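To make the imager's fallacy concrete, the following minimal sketch uses invented per-participant response values (they are hypothetical, not taken from any study cited here) for a left and a right region. It applies an exact sign-flip permutation test, chosen here only because it needs nothing beyond the Python standard library; the function name `sign_flip_p` is likewise an illustrative assumption. With these made-up numbers the left region differs reliably from zero and the right region does not, yet the direct left-minus-right comparison is not significant.

```python
from itertools import product


def sign_flip_p(values):
    """Exact two-sided sign-flip permutation p-value for mean(values) != 0."""
    observed = abs(sum(values))
    n_extreme = 0
    n_total = 0
    # Enumerate every assignment of +1/-1 signs to the observations.
    for signs in product((1, -1), repeat=len(values)):
        flipped = abs(sum(s * v for s, v in zip(signs, values)))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if flipped >= observed - 1e-9:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Hypothetical response estimates for eight participants (arbitrary units).
left = [1.0, 0.8, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0]      # consistent, low variance
right = [2.5, -0.8, 2.0, -1.0, 1.8, -0.6, 0.3, 1.4]  # similar scale, high variance

# The direct hemispheric comparison operates on paired differences.
diff = [l - r for l, r in zip(left, right)]

p_left = sign_flip_p(left)    # left region vs. zero
p_right = sign_flip_p(right)  # right region vs. zero
p_diff = sign_flip_p(diff)    # left vs. right, compared directly

print(f"left vs. 0:     p = {p_left:.4f}")
print(f"right vs. 0:    p = {p_right:.4f}")
print(f"left vs. right: p = {p_diff:.4f}")
```

A thresholded map of these data would show a left cluster and no right cluster, but only the third test speaks to lateralization, and in this toy example it does not support it.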
Thus, for true claims of differential hemispheric contributions to speech processing, the left and right hemisphere responses need to be directly compared. Unfortunately, for functional imaging studies hemispheric comparisons are not as straightforward as they seem, in part because our left and right hemispheres are not mirror images of each other. There are, however, a number of reasonable ways to approach this challenge, including:
1. Extracting data from regions of interest (ROIs), including independently defined functional regions (Kriegeskorte et al., 2009) or probabilistic cytoarchitecture (Eickhoff et al., 2005), and averaging over voxels to compare left and right hemisphere responses. Sometimes these ROIs end up being large, which does not always support the specific hypotheses being tested, and not all regions may be available. However, this approach is relatively straightforward to implement and interpret.
2. Using a custom symmetric brain template for spatial normalization (Bozic et al., 2010). This may result in less veridical spatial registration, but enables voxel-by-voxel statistical tests of laterality by flipping images around the Y axis, avoiding the problem of ROI selection (and averaging).
3. Comparing left vs. right hemisphere responses using a multivariate classification approach (McGettigan et al., 2012). Multivariate approaches are robust to large ROIs, as their performance is typically driven by a smaller (more informative) subset of all voxels studied. Multivariate approaches may be somewhat more challenging to implement, however, and (depending on the size of the ROI used) may limit spatial specificity.
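The image-flipping idea in the second approach can be sketched as follows. This is only an illustration under stated assumptions: the volume is a nested list indexed as `volume[x][y][z]`, the x dimension runs from one hemisphere to the other, and the data have already been normalized to a symmetric template so that index `i` mirrors index `n - 1 - i`. Which end of the x dimension is "left" depends on the image orientation convention, which is not specified here. The function names are hypothetical.

```python
def flip_left_right(volume):
    """Mirror a volume stored as volume[x][y][z] across the midsagittal plane.

    Assumes the first index is the left-right (x) dimension and that the
    template is symmetric, so reversing x swaps homologous voxels.
    """
    return volume[::-1]


def hemisphere_difference(volume):
    """Voxel-by-voxel difference between a volume and its mirrored copy."""
    flipped = flip_left_right(volume)
    return [
        [
            [v - f for v, f in zip(row, flipped_row)]
            for row, flipped_row in zip(plane, flipped_plane)
        ]
        for plane, flipped_plane in zip(volume, flipped)
    ]


# Toy 4 x 1 x 1 "volume": x = 0, 1 lie in one hemisphere; x = 2, 3 in the other.
toy_volume = [[[3.0]], [[2.0]], [[0.5]], [[1.0]]]
difference = hemisphere_difference(toy_volume)

# Each voxel now holds its own value minus that of its mirror-image voxel,
# so the map is antisymmetric about the midline.
print([plane[0][0] for plane in difference])
```

In practice one such difference image would be computed per participant, and the resulting images would then be submitted to an ordinary voxelwise group test against zero.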
In the absence of these or similar statistical comparisons, any statements about lateralization of processing need to be made (and taken) cautiously.
I have not intended to make any novel claims about the neural organization of speech processing, merely to clarify what has already been shown: phonological and lexical information is processed largely bilaterally in temporal cortex, whereas connected speech relies on a left-hemisphere pathway that includes left inferior frontal gyrus. Importantly, the distinction between unconnected and connected speech is not dichotomous, but follows a gradient of laterality depending on the cognitive processes required: lateralization emerges largely as a result of increased linguistic processing.
So, is speech processed primarily bilaterally, or along a left-dominant pathway? It depends on what sort of “speech” we are talking about, and being more specific in our characterizations will do much to advance the discussion. Of more interest will be future studies that continue to identify the constellation of cognitive processes supported by these neuroanatomical networks.
I am grateful to Matt Davis for helpful comments on this manuscript. This work was supported by NIH grants AG038490 and AG041958.
1. Including me.
2. In fact, the term “lateralization” is also used to variously mean (a) one hemisphere performing a task and the other not being involved, or (b) both hemispheres being engaged in a task, but with one hemisphere doing more of the work or being slightly more efficient, potentially compounding the confusion.
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S., Springer, J. A., Kaufman, J. N., et al. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10, 512–528.
Davis, M. H., Ford, M. A., Kherif, F., and Johnsrude, I. S. (2011). Does semantic context benefit speech understanding through “top-down” processes? Evidence from time-resolved sparse fMRI. J. Cogn. Neurosci. 23, 3914–3932.
Eickhoff, S., Stephan, K., Mohlberg, H., Grefkes, C., Fink, G., Amunts, K., et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 25, 1325–1335.
Giraud, A.-L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S. J., and Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134.
Giraud, A.-L., Lorenzi, C., Ashburner, J., Wable, J., Johnsrude, I., Frackowiak, R., et al. (2000). Representation of the temporal envelope of sounds in the human brain. J. Neurophysiol. 84, 1588–1598.
Heinrich, A., Carlyon, R. P., Davis, M. H., and Johnsrude, I. S. (2008). Illusory vowels resulting from perceptual continuity: a functional magnetic resonance imaging study. J. Cogn. Neurosci. 20, 1737–1752.
Hickok, G., Okada, K., Barr, W., Pa, J., Rogalsky, C., Donnelly, K., et al. (2008). Bilateral capacity for speech sound processing in auditory comprehension: evidence from Wada procedures. Brain Lang. 107, 179–184.
McGettigan, C., Evans, S., Agnew, Z., Shah, P., and Scott, S. K. (2012). An application of univariate and multivariate approaches in fMRI to quantifying the hemispheric lateralization of acoustic and linguistic processes. J. Cogn. Neurosci. 24, 636–652.
Obleser, J., Wise, R. J. S., Dresner, M. A., and Scott, S. K. (2007). Functional integration across brain regions improves speech perception under adverse listening conditions. J. Neurosci. 27, 2283–2289.
Okada, K., Rong, F., Venezia, J., Matchin, W., Hsieh, I.-H., Saberi, K., et al. (2010). Hierarchical organization of human auditory cortex: evidence from acoustic invariance in the response to intelligible speech. Cereb. Cortex 20, 2486–2495.
Overath, T., Zhang, Y., Sanes, D. H., and Poeppel, D. (2012). Sensitivity to temporal modulation rate and spectral bandwidth in the human auditory system: fMRI evidence. J. Neurophysiol. 107, 2042–2056.
Papoutsi, M., Stamatakis, E. A., Griffiths, J., Marslen-Wilson, W. D., and Tyler, L. K. (2011). Is left fronto-temporal connectivity essential for syntax? Effective connectivity, tractography and performance in left-hemisphere damaged patients. Neuroimage 58, 656–664.
Peelle, J. E., Cooke, A., Moore, P., Vesely, L., and Grossman, M. (2007). Syntactic and thematic components of sentence processing in progressive nonfluent aphasia and nonaphasic frontotemporal dementia. J. Neurolinguistics 20, 482–494.
Peelle, J. E., Eason, R. J., Schmitter, S., Schwarzbauer, C., and Davis, M. H. (2010a). Evaluating an acoustically quiet EPI sequence for use in fMRI studies of speech and auditory processing. Neuroimage 52, 1410–1419.
Peelle, J. E., Troiani, V., Wingfield, A., and Grossman, M. (2010b). Neural processing during older adults' comprehension of spoken sentences: age differences in resource allocation and connectivity. Cereb. Cortex 20, 773–782.
Price, C. J., Wise, R., Ramsay, S., Friston, K., Howard, D., Patterson, K., et al. (1992). Regional response differences within the human auditory cortex when listening to words. Neurosci. Lett. 146, 179–182.
Rodd, J. M., Longe, O. A., Randall, B., and Tyler, L. K. (2010). The functional organisation of the fronto-temporal language system: evidence from syntactic and semantic ambiguity. Neuropsychologia 48, 1324–1335.
Tyler, L. K., Marslen-Wilson, W. D., Randall, B., Wright, P., Devereux, B. J., Zhuang, J., et al. (2011). Left inferior frontal cortex and syntax: function, structure and behaviour in patients with left hemisphere damage. Brain 134, 415–431.
Tyler, L. K., Shafto, M. A., Randall, B., Wright, P., Marslen-Wilson, W. D., and Stamatakis, E. A. (2010). Preserving syntactic processing across the adult life span: the modulation of the frontotemporal language system in the context of age-related atrophy. Cereb. Cortex 20, 352–364.
Citation: Peelle JE (2012) The hemispheric lateralization of speech processing depends on what “speech” is: a hierarchical perspective. Front. Hum. Neurosci. 6:309. doi: 10.3389/fnhum.2012.00309
Received: 09 October 2012; Accepted: 25 October 2012;
Published online: 16 November 2012.
Edited by: Russell A. Poldrack, University of Texas, USA
Reviewed by: Russell A. Poldrack, University of Texas, USA
Copyright © 2012 Peelle. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.