Mini Review ARTICLE
Neural bases of accented speech perception
- 1Division of Psychology and Language Sciences, Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK
- 2School of Psychological Sciences, University of Manchester, Manchester, UK
The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regard to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioral aspects of accent processing.
Processing Accent Variation at Pre- and Post-Lexical Levels
Models outlining the neural organization of speech perception (Hickok and Poeppel, 2007; Rauschecker and Scott, 2009) propose that the locus of processing intelligible speech is the temporal lobe within the ventral stream of speech processing. Rauschecker & Scott suggest that intelligibility processing has its center of gravity in left anterior STS (Superior Temporal Sulcus), while Hickok & Poeppel propose that processing intelligible speech is bilaterally organized and located both anteriorly and posteriorly to Heschl's Gyrus. However, both models are based on intelligible speech perception and do not make explicit predictions about the cortical substrates that subserve speech perception under challenging listening conditions (cf. Adank, 2012a, for a discussion on processing of intelligible speech).
A handful of fMRI studies address how the brain processes accent variation. Listening to difficult foreign phonemic contrasts (e.g., /l/-/r/ contrasts for Japanese listeners) has been associated with increased activation in auditory processing/speech production areas, including left Inferior Frontal Gyrus (IFG), left insula, bilateral ventral Premotor Cortex, right Pre- and Post-Central Gyrus, left anterior Superior Temporal Sulcus and Gyrus (STS/STG), left Planum Temporale (PT), left superior temporal parietal area (Stp), left Supramarginal Gyrus (SMG), and cerebellum bilaterally (Callan et al., 2004, 2014). It is noteworthy that the neural bases associated with listening to foreign languages overlap with those reported for unfamiliar accent processing, including bilateral STG/STS/MTG, and left IFG (Perani et al., 1996; Perani and Abutalebi, 2005; Hesling et al., 2012).
For sentence processing (Table 1, Figure 1), listening to an unfamiliar accent involves a network of frontal (left IFG, both Operculi/Insulas, Superior Frontal Gyrus), temporal (left Middle Temporal Gyrus [MTG], right STG), and medial regions (Supplementary Motor Area [SMA]) (Adank, 2012b; Adank et al., 2012b, 2013; Yi et al., 2014). It is unclear how the accent processing network maps onto the networks in Rauschecker and Scott (2009) and Hickok and Poeppel (2007). The coordinates for accent processing in the left temporal lobe are located anteriorly and posteriorly to Hickok and Poeppel's proposed STG area for spectrotemporal analysis, while the coordinates in left IFG are located inside Hickok and Poeppel's left inferior frontal area assigned to the dorsal stream's articulatory network. In contrast, the temporal coordinates in Table 1 fit well with Rauschecker & Scott's antero-ventral and postero-dorsal areas placed anteriorly and posteriorly to left primary auditory cortex, respectively, and the left IFG coordinates fall within their antero-ventral left inferior frontal area.
Table 1. Reported brain regions in studies investigating processing of accented, time-compressed, or noise-vocoded speech, plus speech with added background noise vs. undistorted words or sentences.
Figure 1. Clusters (logical) resulting from an Activation Likelihood Estimation (ALE) analysis conducted using GingerALE 2.3.3 (www.brainmap.org), q < 0.0001, cluster extent of 100 mm³, for the four accent studies (red), and the seven other distortions studies (pooled noise, time-compressed, and noise-vocoded studies) (green).
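The core idea behind an ALE map such as the one in Figure 1 can be illustrated with a minimal sketch. It is a simplified toy version and not the GingerALE implementation: the kernel width (`fwhm_mm`), the peak probability (`peak`), and all coordinates are hypothetical values chosen for illustration, and no significance thresholding or cluster-extent filtering is performed. Each study's foci are blurred with a Gaussian kernel into a per-study modeled activation value, and the values are then combined across studies as a probabilistic union.

```python
import math

def gaussian_weight(voxel, focus, fwhm_mm=10.0):
    """Unnormalised Gaussian weight in (0, 1] for a voxel given one reported focus."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    d2 = sum((v - f) ** 2 for v, f in zip(voxel, focus))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def modeled_activation(voxel, foci, peak=0.5, fwhm_mm=10.0):
    """Per-study modeled activation: the largest single-focus contribution at this voxel.

    `peak` is an arbitrary illustrative scaling (assumption) that keeps values below 1.
    """
    return max(peak * gaussian_weight(voxel, f, fwhm_mm) for f in foci)

def ale_score(voxel, studies, peak=0.5, fwhm_mm=10.0):
    """Union of per-study modeled activations: 1 - prod_s (1 - MA_s)."""
    prod = 1.0
    for foci in studies:
        prod *= 1.0 - modeled_activation(voxel, foci, peak, fwhm_mm)
    return 1.0 - prod

# Hypothetical foci in mm; these are NOT the coordinates reported in Table 1.
studies = [
    [(-52, 10, 20), (-60, -30, 4)],
    [(-50, 12, 18)],
    [(-54, 8, 22), (40, 20, 0)],
]

near = ale_score((-52, 10, 20), studies)  # voxel where foci from all three studies converge
far = ale_score((0, -90, 0), studies)     # voxel distant from every focus
print(near > far)
```

Voxels where foci from several studies converge receive higher scores than voxels far from any focus. A full analysis would additionally compare these scores against a null distribution before thresholding.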
Accented Speech vs. Other Challenging Listening Conditions
As is the case with other types of distorted speech, understanding accented speech is associated with increased listening effort (Van Engen and Peelle, 2014). However, accent variation, i.e., phonetic realizations that differ from the listener's native realization of speech sounds, is conceptually different from variation in the acoustic signal resulting from an extrinsic source such as noise. In contrast to such speech-intrinsic variation, noise compromises the auditory system's representation of speech from ear to brain. Accented speech also differs from distortions such as noise-vocoded or time-compressed speech in that the variation does not affect the integrity of the acoustic signal: only specific phonemic and suprasegmental characteristics vary.
Processing speech in noise involves areas also activated for speech in an unfamiliar accent (Table 1): left insula (Adank et al., 2012a), left MTG (Peelle et al., 2010), left Pars Opercularis (POp), bilateral Pars Triangularis (PTr). Comprehension of time-compressed sentences activates left MTG (Poldrack et al., 2001; Adank and Devlin, 2010), right STG (Peelle et al., 2004; Adank and Devlin, 2010), SMA and left Insula (Adank and Devlin, 2010), while noise-vocoded speech activates left Insula (Erb et al., 2013), and left MTG/STG (Zekveld et al., 2014). However, it is clear from Figure 1 that processing accented speech also activates areas outside the network activated for processing speech in noise, time-compressed speech, and noise-vocoded speech.
Another problem in identifying networks governing accent processing is that perceiving variation in an unfamiliar accent (i.e., an accent that differs from one's own and to which the listener has had little or no exposure) is confounded with cognitive load. Note that such confounds also exist for other distortions of the speech signal, such as background noise. Listeners process speech in an unfamiliar accent more slowly and less efficiently (Floccia et al., 2006). It is thus unclear to what extent the network supporting accented speech perception is shared with the network associated with processing increased task/cognitive load. Notably, an increase in task difficulty/working memory load relates to increases in BOLD-activation in left insula (Wild et al., 2012), and in left MTG, SMA, left PTr, and right STG (Wild et al., 2012), and could therefore explain activations in these regions related to processing accented speech. Directly comparing the neural processing of familiar and unfamiliar accents may help distinguish between the two networks.
Accounts of Accented and Distorted Speech Processing
The current debate regarding how listeners understand others in challenging listening conditions focuses on the location and nature of neural substrates recruited for effective speech comprehension. The three accounts discussed below offer specific predictions regarding the neural networks involved in processing accented speech.
First, auditory-only accounts (Obleser and Eisner, 2009) hold that speech perception includes a prelexical abstraction process in which variation in the acoustic signal is “stripped away” to allow the perception system access to abstract linguistic representations. The abstraction process is placed at locations predominantly in the temporal (STS and STG) lobes. This account predicts that processing of accented speech takes place predominantly in the ventral stream, with minimal involvement of the dorsal stream.
Second, motor recruitment accounts suggest that auditory areas in the ventral stream and speech production areas in the dorsal stream are required to process unfamiliar speech signals (Wilson and Knoblich, 2005; Pickering and Garrod, 2013). These accounts assume that listening to speech results in the automatic activation of articulatory motor plans required for producing speech (Watkins et al., 2003). These motor plans provide forward models with information of articulatory mechanics, to be used when the incoming signal is ambiguous/unclear. Accented speech contains variation that can lead to ambiguities, and these accounts thus predict that perception of accented speech involves active involvement of speech production processes.
Third, executive recruitment accounts propose that activation of (pre-)motor areas during perception of distorted speech signals is not related to actual articulatory processing, but reflects the recruitment of general cognitive processes, such as increased attention, or decision processes (Rodríguez-Fornells et al., 2009; Venezia et al., 2012). Indeed, behavioral data suggest recruitment of executive functions for processing accented speech (Adank and Janse, 2010; Janse and Adank, 2012; Banks et al., 2015). This account also predicts activation of frontal regions including left frontal operculum, anterior insula, and precentral gyrus, as these regions have also been associated with executive functions such as working memory (Moisala et al., 2015).
The results in Table 1 contrast with predictions made by the auditory-only account (Obleser and Eisner, 2009), as areas associated with processing accent variation in Table 1 refer to a more widespread network than predicted. Instead, the network in Table 1 converges with the latter two accounts, as activation is located across ventral and prefrontal areas in the dorsal stream. We propose that these three accounts are synthesized into a single mixed account for processing of accented speech that brings together neural substrates associated with increased involvement of auditory and phonological processing (e.g., bilateral posterior STG), (pre-)motor recruitment for sensorimotor mapping (e.g., SMA), and substrates associated with increased reliance on cognitive control processes (e.g., IFG, insula, and frontal operculum).
The neural mechanisms responsible for processing accent variation in speech are not clearly outlined, but constitute a topic of active investigation in the field of speech perception. However, to advance our understanding in this area, future studies should meet several aims to overcome previous design limitations.
First, experiments should be designed so that contributions from processing accented speech and effortful processing can be teased apart (Venezia et al., 2012). Second, studies should aim to distinguish between brain activity related to processing accent variation and other distortions, such as background noise. Adank et al. (2012a) contrasted sentences in a familiar accent embedded in background noise with sentences in an unfamiliar accent, to disentangle areas associated with processing accent-related variation from those associated with processing speech in background noise: left posterior temporal areas in STG (extending to PT) and right STG (extending into insula) were more activated for accented speech than for speech in noise, while bilateral FO/insula were more activated for speech in noise than for accented speech, indicating that the neural architecture for processing accented speech and speech in background noise is not generic. Third, different accents vary in how much they deviate from the listener's own accent. Greater deviation between accents is associated with greater processing cost, but the neural response associated with variations in distance between accents has not been explored using fMRI. A recent study using Transcranial Magnetic Stimulation (TMS) showed a causal role for lip and tongue motor cortex in processing the perceived distance between speaker and listener (Bartoli et al., 2013). Another study used EEG to show that regional and foreign accents might be processed differently: processing sentences in an unfamiliar foreign accent reduces the size of the N400 compared to unfamiliar native accents (Goslin et al., 2012). It may be fruitful to use a wider variety of neuroscience techniques, including (combinations of) fMRI, EEG, MEG, and TMS, to investigate how the brain successfully accomplishes accented speech perception.
Fourth, as processing effort, or cognitive load, is inevitably confounded with processing unfamiliar variation in accented speech, experiments should be designed to separate neural substrates associated with processing accent variation from those associated with increased cognitive load. One possibility would be to examine task difficulty and accent processing in a fully crossed factorial design to single out areas that show increased BOLD-activation for accented speech and for task difficulty. Finally, the contribution of production resources to processing accented speech should be examined, to explicitly test predictions from motor and executive recruitment accounts (e.g., Du et al., 2014).
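The logic of such a fully crossed design can be sketched in a few lines. This is a minimal sketch under stated assumptions, not an analysis pipeline: the condition labels, the helper `contrast_value`, and the cell means in `region` are all hypothetical, and a real analysis would estimate these effects per voxel or region within a general linear model. Crossing accent (familiar vs. unfamiliar) with task difficulty (easy vs. hard) yields four cells, from which the two main effects and their interaction are obtained as weighted sums of the cell means.

```python
from itertools import product

ACCENTS = ("familiar", "unfamiliar")
DIFFICULTY = ("easy", "hard")
CONDITIONS = list(product(ACCENTS, DIFFICULTY))  # fully crossed: 2 x 2 = 4 cells

# Contrast weights (+1 / -1) for each cell of the design.
CONTRASTS = {
    "accent": {(a, d): (1 if a == "unfamiliar" else -1) for a, d in CONDITIONS},
    "difficulty": {(a, d): (1 if d == "hard" else -1) for a, d in CONDITIONS},
}
# The interaction weights are the product of the two main-effect weights.
CONTRASTS["interaction"] = {
    c: CONTRASTS["accent"][c] * CONTRASTS["difficulty"][c] for c in CONDITIONS
}

def contrast_value(cell_means, weights):
    """Weighted sum of cell means, halved so main effects read as mean differences."""
    return sum(weights[c] * cell_means[c] for c in CONDITIONS) / 2.0

# Hypothetical mean responses (arbitrary units) for one region; invented for illustration.
region = {
    ("familiar", "easy"): 1.0,
    ("familiar", "hard"): 1.5,
    ("unfamiliar", "easy"): 2.0,
    ("unfamiliar", "hard"): 2.5,
}

effects = {name: contrast_value(region, w) for name, w in CONTRASTS.items()}
print(effects)
```

In this invented example the region responds to both factors additively, so the interaction term is zero. A region driven only by cognitive load would instead show a difficulty effect with no accent effect, which is the dissociation the factorial design is meant to expose.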
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This work was supported by the Leverhulme Trust under award number RPG-2013-254.
Adank, P. (2012b). The neural bases of difficult speech comprehension and speech production and their overlap: two Activation Likelihood Estimation (ALE) meta-analyses. Brain Lang. 122, 42–54. doi: 10.1016/j.bandl.2012.04.014
Adank, P., Davis, M., and Hagoort, P. (2012a). Neural dissociation in processing noise and accent in spoken language comprehension. Neuropsychologia 50, 77–84. doi: 10.1016/j.neuropsychologia.2011.10.024
Adank, P., Evans, B. G., Stuart-Smith, J., and Scott, S. K. (2009). Comprehension of familiar and unfamiliar native accents under adverse listening conditions. J. Exp. Psychol. Hum. Percept. Perform. 35, 520–529. doi: 10.1037/a0013552
Adank, P., Rueschemeyer, S. A., and Bekkering, H. (2013). The role of accent imitation in sensorimotor integration during processing of intelligible speech. Front. Hum. Neurosci. 7:634. doi: 10.3389/fnhum.2013.00634
Bartoli, E., D'Ausilio, A., Berry, J., Badino, L., Bever, T., and Fadiga, L. (2013). Listener–speaker perceived distance predicts the degree of motor contribution to speech perception. Cereb. Cortex 25, 281–288. doi: 10.1093/cercor/bht257
Callan, D. E., Callan, A. M., and Jones, J. A. (2014). Speech motor brain regions are differentially recruited during perception of native and foreign-accented phonemes for first and second language listeners. Front. Neurosci. 8:275. doi: 10.3389/fnins.2014.00275
Callan, D. E., Jones, J. A., Callan, A. M., and Akahane-Yamada, R. (2004). Phonetic perceptual identification by native- and second-language speakers differentially activates brain regions involved with acoustic phonetic processing and those involved with articulatory – auditory/orosensory internal models. Neuroimage 22, 1182–1194. doi: 10.1016/j.neuroimage.2004.03.006
Cristia, A., Seidl, A., Vaughn, C., Schmale, R., Bradlow, A. R., and Floccia, C. (2012). Linguistic processing of accented speech across the lifespan. Front. Psychol. 3:479. doi: 10.3389/fpsyg.2012.00479
Du, Y., Buchsbaum, B., Grady, C. L., and Alain, C. (2014). Noise differentially impacts phoneme representations in the auditory and speech motor systems. Proc. Natl. Acad. Sci. U.S.A. 111, 7126–7131. doi: 10.1073/pnas.1318738111
Eickhoff, S. B., Heim, S., Zilles, K., and Amunts, K. (2006). Testing anatomically specified hypotheses in functional imaging using cytoarchitectonic maps. Neuroimage 32, 570–582. doi: 10.1016/j.neuroimage.2006.04.204
Eickhoff, S. B., Paus, T., Caspers, S., Grosbras, M. H., Evans, A., Zilles, K., et al. (2007). Assignment of functional activations to probabilistic cytoarchitectonic areas revisited. Neuroimage 36, 511–521. doi: 10.1016/j.neuroimage.2007.03.060
Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., et al. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 25, 1325–1335. doi: 10.1016/j.neuroimage.2004.12.034
Erb, J., Henry, M. J., Eisner, F., and Obleser, J. (2013). The brain dynamics of rapid perceptual adaptation to adverse listening conditions. J. Neurosci. 33, 10688–10697. doi: 10.1523/JNEUROSCI.4596-12.2013
Floccia, C., Goslin, J., Girard, F., and Konopczynski, G. (2006). Does a regional accent perturb speech processing? J. Exp. Psychol. Hum. Percept. Perform. 32, 1276–1293. doi: 10.1037/0096-1523.32.5.1276
Hesling, I., Dilharreguy, B., Bordessoules, M., and Allard, M. (2012). The neural processing of second language comprehension modulated by the degree of proficiency: a listening connected speech FMRI study. Open Neuroimag. J. 6, 1–11. doi: 10.2174/1874440001206010044
Moisala, M., Salmela, V., Salo, E., Carlson, S., Vuontela, V., Salonen, O., et al. (2015). Brain activity during divided and selective attention to auditory and visual sentence comprehension tasks. Front. Hum. Neurosci. 9:86. doi: 10.3389/fnhum.2015.00086
Peelle, J. E., Eason, R. J., Schmitter, S., Schwarzbauer, C., and Davis, M. H. (2010). Evaluating an acoustically quiet EPI sequence for use in fMRI studies of speech and auditory processing. Neuroimage 52, 1410–1419. doi: 10.1016/j.neuroimage.2010.05.015
Peelle, J. E., McMillan, C., Moore, P., Grossman, M., and Wingfield, A. (2004). Dissociable patterns of brain activity during comprehension of rapid and syntactically complex speech: evidence from fMRI. Brain Lang. 91, 315–325. doi: 10.1016/j.bandl.2004.05.007
Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S. F., Dupoux, E., et al. (1996). Brain processing of native and foreign languages. Neuroreport 7, 2439–2444. doi: 10.1097/00001756-199611040-00007
Poldrack, R. A., Temple, E., Protopapas, A., Nagarajan, S., Tallal, P., Merzenich, M., et al. (2001). Relations between the neural bases of dynamic auditory processing and phonological processing: evidence from fMRI. J. Cogn. Neurosci. 13, 687–697. doi: 10.1162/089892901750363235
Rodríguez-Fornells, A., Cunillera, T., Mestres-Missé, A., and de Diego-Balaguer, R. (2009). Neurophysiological mechanisms involved in language learning in adults. Philos. Trans. R. Soc. B Biol. Sci. 364, 3711–3735. doi: 10.1098/rstb.2009.0130
Watkins, K. E., Strafella, A. P., and Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia 41, 989–994. doi: 10.1016/S0028-3932(02)00316-0
Wild, C. J., Yusuf, A., Wilson, D. E., Peelle, J. E., Davis, M. H., and Johnsrude, I. S. (2012). Effortful listening: the processing of degraded speech depends critically on attention. J. Neurosci. 32, 14010–14021. doi: 10.1523/JNEUROSCI.1528-12.2012
Yi, H., Smiljanic, R., and Chandrasekaran, B. (2014). The neural processing of foreign-accented speech and its relationship to listener bias. Front. Hum. Neurosci. 8:768. doi: 10.3389/fnhum.2014.00768
Zekveld, A. A., Heslenfeld, D. J., Johnsrude, I. S., Versfeld, N. J., and Kramer, S. E. (2014). The eye as a window to the listening brain: neural correlates of pupil size as a measure of cognitive listening load. Neuroimage 101, 76–86. doi: 10.1016/j.neuroimage.2014.06.069
Keywords: cognitive neuroscience, speech perception, accented speech, fMRI, speech in noise, noise-vocoded speech, time-compressed speech
Citation: Adank P, Nuttall HE, Banks B and Kennedy-Higgins D (2015) Neural bases of accented speech perception. Front. Hum. Neurosci. 9:558. doi: 10.3389/fnhum.2015.00558
Received: 13 April 2015; Accepted: 22 September 2015;
Published: 06 October 2015.
Edited by: Guadalupe Dávila, University of Málaga, Spain
Reviewed by: Antoni Rodriguez-Fornells, University of Barcelona, Spain
Kristin Van Engen, Washington University in St. Louis, USA
Copyright © 2015 Adank, Nuttall, Banks and Kennedy-Higgins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Patti Adank, Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield St., London WC1N 1PF, UK, firstname.lastname@example.org