Mini Review ARTICLE
The development of sensorimotor influences in the audiovisual speech domain: some critical questions
- 1Laboratoire Ethologie, Cognition, Développement, Université Paris Ouest Nanterre La Défense, Nanterre, France
- 2CNRS, Laboratoire Psychologie de la Perception, UMR 8242, Paris, France
- 3Université Paris Descartes, Paris Sorbonne Cité, Paris, France
Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here, we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on this question is still in its preliminary stages, there is nevertheless increasing evidence that sensorimotor processes (defined here as any motor or proprioceptive process related to orofacial movements) affect developmental audiovisual speech processing. We suggest three areas on which to focus in future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.
A unique property of speech, compared to other auditory signals, is that it is multisensory. Speech involves not only auditory but also visual, motor, and proprioceptive information, since we produce speech by moving our articulators (i.e., the jaw, tongue, lips, etc.). Accordingly, many speech researchers postulated that articulatory gestures, rather than acoustic cues, were the primary objects of speech perception (Liberman et al., 1967; Liberman and Mattingly, 1985; Fowler, 1986, 1996; Best, 1995; Galantucci et al., 2006), and in recent years, vigorous debates about these ideas have continued (Scott et al., 2009; Pulvermüller and Fadiga, 2010; Schwartz et al., 2010; Hickok, 2014). Currently, proposals suggesting that articulatory input has an important role in auditory-only speech processing (Yuen et al., 2010; Möttönen et al., 2013, 2014) have been viewed by some as highly controversial (Lotto et al., 2009; McGettigan et al., 2010; Chevillet et al., 2013).
Somewhat less controversial is the discussion of speech production in the context of multisensory speech processing (Ojanen et al., 2005; Skipper et al., 2007a; Okada and Hickok, 2009; Treille et al., 2014). Just as visual influences on auditory speech processing have long been reported (e.g., Sumby and Pollack, 1954; see Navarra et al., 2012 for review), recent reports have also shown similar effects from articulatory information. For example, subjects’ own silent articulations (Sams et al., 2005; Sato et al., 2013; Scott et al., 2013) influence auditory perception in ways similar to seeing visual speech (although see Mochida et al., 2013). Moreover, receiving haptic or tactile input related to another person’s articulatory movements can also influence auditory speech processing (Fowler and Dekle, 1991; Gick et al., 2008; Gick and Derrick, 2009; Ito et al., 2009; Treille et al., 2014). Neuroimaging studies converge with these behavioral findings: For example, when visual-only or audiovisual speech is presented to subjects, activation is seen in primary auditory areas of the brain, such as the superior temporal sulcus (STS), and in areas traditionally associated with speech production, such as Broca’s area (Calvert et al., 1997; Calvert and Campbell, 2003; Ojanen et al., 2005; Pekkola et al., 2005). TMS studies have now shown that the perception of visual and audiovisual speech is linked to primary motor cortex (Sundara et al., 2001; Sato et al., 2010), and from this accumulated evidence, there is an emerging consensus that visual speech processing is closely linked to internal models of the vocal tract (Santi et al., 2003; van Wassenhove et al., 2005; Skipper et al., 2007a, b; Okada and Hickok, 2009; Dick et al., 2010; Swaminathan et al., 2013).
Here, we present a discussion of how developmental work may contribute to this broader literature. Infancy and childhood are particularly interesting because speech perception and speech production capabilities are largely asymmetric during this period (for reviews, see Oller, 1980; Stark, 1980; Werker and Yeung, 2005; Gervain and Mehler, 2010; Stoel-Gammon, 2011; Werker et al., 2012). Nevertheless, infants sometimes show neurophysiological activation that belies their apparent deficits in production. For example, areas corresponding to Broca’s area are activated in response to auditory speech even in 6-month-olds (Imada et al., 2006), and while this area is also activated in a variety of adult tasks (including ones not strictly about production, see Friederici, 2012), these infant data could potentially be interpreted as reflecting rudimentary perception-production loops.
In light of infants’ limitations in the speech production domain, we use sensorimotor as a general term that broadly encompasses motor and proprioceptive information related to both speech-like and non-speech orofacial gestures. We focus on three issues that we see as being particularly pressing for future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.
The Relation Between Audiovisual Speech Perception and Sensorimotor Processes at Birth
Infants receive filtered auditory input in the womb but necessarily do not experience audiovisual speech until birth. However, as soon as it can be measured, at least some basic aspects of audiovisual perception are already present. For example, newborns map abstract sensory and magnitude information across vision and audition (Meltzoff and Borton, 1979; Streri, 1993; de Hevia et al., 2014), and it also appears that newborns are particularly sensitive to audiovisual temporal synchrony (Slater et al., 1999). The precise origin of these interactions between vision and audition remains under debate (e.g., Bahrick et al., 2004; Maurer and Mondloch, 2004; Streri, 2012), but it is clear that intersensory correspondences are powerful in that they can influence attention and learning, as shown in classic studies with precocial birds (e.g., Lickliter et al., 2002). In human newborns, temporal synchrony between audition and vision plays an important role in matching monkey faces and voices (Lewkowicz et al., 2010), and newborns can also match human faces and voices under some circumstances (Aldridge et al., 1999), but further research showing the mechanisms driving this matching is needed. Here we define some critical issues with regard to the role of sensorimotor processes in audiovisual processing of speech and speech-like stimuli at birth.
It is well established that newborns imitate faces at birth, suggesting early integration of vision and proprioception (e.g., Meltzoff and Moore, 1977, 1989), although it is important to note that this has been questioned on both empirical (Anisfeld, 1996) and interpretational grounds (Jones, 2007). Still, studies using speech stimuli converge with these results. For example, newborns produce more mouth openings when listening to /a/ versus /m/ sounds, and they produce more mouth closings when listening to /m/ versus /a/ sounds (Chen et al., 2004). However, future work will need to move beyond simple correspondences between sight, sound, and movement, and ask instead how such information interacts. For example, facial imitation at birth is more robust in the presence of congruent audiovisual speech: Infants produce more mouth openings when presented with a face saying /a/ than with the face alone, or with that face dubbed with an /i/ audio track (Coulon et al., 2013). A speculative interpretation is that congruent audiovisual speech constitutes more robust input to an internal model of the vocal tract, increasing the production of relevant mouth shapes.
Another question concerns specificity: can imitation also be elicited from auditory or visual models that are not identifiably human, and if so, what constraints on this system exist? For example, previous work has suggested preferential processing of speech stimuli over white noise (Colombo and Bundy, 1981) and synthetic analogs of speech (Vouloumanos and Werker, 2004, 2007). However, in a striking set of studies, a preference for human over monkey vocalizations was not found at birth, but was found at 3 months of age (Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010). Together, these data suggest evolutionary constraints on auditory preferences, and in turn, raise questions about the imitation studies above. Will infants produce more facial gestures in response to human versus non-human (or non-mammalian) auditory, visual, and audiovisual models? What attentional and/or evolutionary factors might drive such effects?
A final set of questions for future research concerns the functioning of sensorimotor and perceptual systems in a more precise manner, and in more naturalistic situations. For example, recent work suggests that newborns are highly sensitive to both rigid (i.e., whole-head) and non-rigid (i.e., facial feature) movements of a talking face (Guellaï et al., 2011). Do newborns privilege one type of movement over the other when imitating (see also Meltzoff and Moore, 1989)? Previous work has also shown that newborns are more sensitive to talking faces with direct versus averted gaze (Guellaï and Streri, 2011), suggesting that foundational aspects of social communication may exist at birth. However, it remains unclear how facial imitation may change with social gaze.
Pathways Through Which Sensorimotor Influences Interact with Audiovisual Speech Processing in Infancy
After the neonatal period, older infants continue to perceive audiovisual speech robustly. This has commonly been shown using a cross-modal matching procedure, where 2- to 4-month-olds are presented with side-by-side faces articulating two visual vowels ([i] and [a]), accompanied by a single speech sound (either /i/ or /a/) in synchrony with both faces. Infants look longer at the matching face, showing an ability to associate vowels with the corresponding articulation (Kuhl and Meltzoff, 1982, 1984; MacKain et al., 1983; Patterson and Werker, 1999, 2002, 2003; Yeung and Werker, 2013). The effects of congruent versus incongruent audiovisual speech are also evident in a variety of other behavioral paradigms (Rosenblum et al., 1997; Burnham and Dodd, 2004; Desjardins and Werker, 2004; Pons et al., 2009; Tomalski et al., 2012; Kubicek et al., 2014; Pons and Lewkowicz, 2014), as well as in electrophysiological recordings (Kushnerenko et al., 2008; Bristow et al., 2009). A few recent papers have also begun to test audiovisual matching with fluent streams of speech (instead of just vowels or consonants; Lewkowicz and Pons, 2013; Kubicek et al., 2014), suggesting that audiovisual matching abilities in infancy can be very broad.
What about the mechanisms driving audiovisual speech perception? As mentioned above, infants at birth detect subtle differences in temporal synchrony between auditory and visual channels (Lewkowicz et al., 2010), and this is true of older infants as well (Lewkowicz, 1996, 2010). It could be that intersensory redundancy facilitates the detection of amodal properties related to vowel identity. Indeed, previous research has already shown that intersensory redundancy can enhance the detection of other kinds of amodal properties from faces (e.g., emotional affect; Flom and Bahrick, 2007), but at the cost of processing unimodal features (e.g., face identity; Bahrick et al., 2013). Together, this work suggests that synchrony detection may enhance amodal aspects of audiovisual speech (e.g., Bahrick et al., 2004).
An alternative proposal suggests that audiovisual speech information is mapped using sensorimotor information, perhaps via an internal model of the vocal tract (Kuhl and Meltzoff, 1984, 1988; Kent and Vorperian, 2007; Yeung and Werker, 2013). Several lines of evidence are suggestive of this sensorimotor mechanism: First, audiovisual matching with non-speech stimuli is often less robust than with speech (Kuhl and Meltzoff, 1984; Kuhl et al., 1991), particularly at later points in development (Lewkowicz and Ghazanfar, 2006), which suggests that audiovisual perception becomes more speech specific with age. Second, just as in newborns (Coulon et al., 2013), older infants also produce more congruent mouth shapes when hearing audiovisually congruent vowels compared to incongruent vowels (Legerstee, 1990; Kuhl and Meltzoff, 1996; Patterson and Werker, 1999). A recent report further shows that infants making /i/-like lip movements while chewing on a teething ring, or /u/-like lip movements while sucking on a pacifier, could no longer achieve audiovisual speech matching if the heard vowel was similar to the achieved lip shape (Yeung and Werker, 2013). This suggests that direct activation of the motor system can indeed affect audiovisual speech perception, and it is strongly suggestive of sensorimotor influences.
Together, this work raises two critical areas of future research. First, these dueling approaches must be reconciled: Are auditory and visual speech bound together by temporal synchrony cues, or is there some internal model of the vocal tract that accomplishes this mapping? A third alternative is that two separate modes of audiovisual processing will be identified. For example, recent work has suggested that synchrony detection in 5-month-old infants uses a fast and automatic pathway, which could be similar to the kind of adult audiovisual pathways that activate the STS and its associated networks (Hyde et al., 2011). More work is needed to see whether a slower, higher level pathway can also be distinguished, and whether this pathway also taps sensorimotor information.
A second question concerns the definition of orofacial movements in infancy. Some work suggests that early vocalizations can already be considered speech-like: Cooing and babbling are influenced by the phonological properties of the native language (de Boysson-Bardies et al., 1989; Ruzza et al., 2006; Whalen et al., 2007), and are argued to be continuous with the first productions of words (de Boysson-Bardies and Vihman, 1991; Vihman, 1991; McCune and Vihman, 2001). Infant vocalizations also change in response to socially contingent responses from mothers, whether manipulated in the laboratory (Goldstein and Schwade, 2008), or measured during free play (Gros-Louis et al., 2014). Other work has even suggested that babbling capacities act as an attentional filter on auditory speech perception, modulating preferences to listen to words that either share or do not share commonalities between what is produced in babbling and in one’s early words (DePaolis et al., 2011, 2013; Majorano et al., 2014). At the same time, other research argues instead that universal constraints on the motor system (not specific to speech) play an equally important role in structuring how babbling is produced (MacNeilage and Davis, 1993; Lee et al., 2010). Moreover, coordinative movements differ when infants speak, babble, suck, or chew, suggesting that the physical mechanisms underlying babbling are not continuous with later speech motor control (Steeve, 2010).
In conjunction with the results from Yeung and Werker (2013), which demonstrate an effect of non-speech movements, the above debate shows how difficult it is to define what counts as an articulatory (i.e., speech-like) gesture, which in turn makes it hard to speculate about how an internal model of the vocal tract might be structured in early development (although see Ménard et al., 2007; Howard and Messum, 2011). Future research postulating a sensorimotor pathway in infancy will need to bear this literature in mind. One intriguing possibility is that distinctions between “speech-like” and “non-speech-like” may not be important at all (at least in early development): For example, infants have more difficulty matching auditory whistles to visual faces that are whistling than auditory trills to visual faces that are trilling. One speculative reason for this asymmetry could be that infants produce bilabial trills, but do not yet produce whistles (Mugitani et al., 2008).
Developmental Change as Speech Production Becomes More Varied and Sophisticated
Of course, the development of perceptual and motor systems continues well beyond infancy. For example, previous reports show that children (up to the age of 10) increasingly weight visual speech information more heavily in cases of sensory conflict or ambiguity (McGurk and MacDonald, 1976; Massaro, 1984; Massaro et al., 1986; Wightman et al., 2006; van Linden and Vroomen, 2008; Barutchu et al., 2010; Ross et al., 2011). It seems likely that multiple mechanisms drive this developmental change: For example, Sekiyama and Burnham (2008) find cross-cultural differences, which are likely unrelated to differences in motor ability. Nevertheless, there is also some correlational evidence supporting a sensorimotor pathway: children who have greater trouble articulating consonants show less sensitivity to visual speech information (Desjardins et al., 1997), as is also the case for children with broader language deficits (Bergeson et al., 2005; Dodd et al., 2008).
Other studies provide further evidence for multiple pathways emerging in childhood that are reminiscent of adult models (e.g., Skipper et al., 2007b; Okada and Hickok, 2009; Hickok et al., 2011). For example, while visual speech is more heavily weighted throughout childhood, non-speech audiovisual processing is relatively stable (Tremblay et al., 2007; although see Hillock et al., 2011). Such divergent trajectories suggest that two kinds of audiovisual binding mechanisms may be dissociated. Neurophysiological evidence for that dissociation comes from a study revealing two separable electrophysiological measures: amplitude versus latency of the commonly evoked N1/P2 complex to audiovisual speech (Knowland et al., 2014). Critically, only amplitude changes in development, while latency remains stable. Additional evidence comes from functional imaging studies, which suggest two networks related to audiovisual binding of speech stimuli: One network is centered around primary auditory areas, while a second network involves inferior frontal areas (Dick et al., 2010; Nath et al., 2011). Developmental change in audiovisual speech processing correlates with changes in connectivity between these networks (Dick et al., 2010).
Together, these findings are highly suggestive of at least two distinct pathways in the brain that support audiovisual speech processing. A preliminary conjecture is that multiple pathways might be distinguished based on their developmental characteristics (stable, or increasing), their selectivity (to speech, or to many kinds of signals), and their mechanisms (depending on intersensory redundancy, or depending on an internal articulatory model). Critical lines of future research will need to investigate these hypotheses.
Speech perception is one of the most deeply explored aspects of language development. However, as this review highlights, several aspects of this phenomenon remain mysterious: in particular, the relation between speech perception and production. Here, we examine possible sensorimotor influences in multisensory speech processing, highlighting three areas for future research that will bridge between debates in the adult literature and emerging work in development.
First, we suggest that future research must examine the link between imitation and audiovisual speech perception at birth, and explore interactions among vision, audition, and the motor system. Second, we highlight two potential pathways involved in audiovisual speech perception in older infants, one of which may depend on sensorimotor information. Third, we illustrate the need to elucidate the behavioral and neural characteristics of these pathways in children, as speech production becomes more sophisticated.
Conflict of Interest Statement
The Guest Associate Editor Maya Gratier declares that, despite being affiliated to the same institution as author Bahia Guellaï, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors thank Rana Esseily and Maya Gratier for inviting them to contribute to this special issue.
Bahrick, L. E., Lickliter, R., and Castellanos, I. (2013). The development of face perception in infancy: intersensory interference and unimodal visual facilitation. Dev. Psychol. 49, 1919–1930. doi: 10.1037/a0031238
Bahrick, L. E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of selective attention, perception, and cognition in infancy. Curr. Dir. Psychol. Sci. 13, 99–102. doi: 10.1111/j.0963-7214.2004.00283.x
Barutchu, A., Danaher, J., Crewther, S. G., Innes-Brown, H., Shivdasani, M. N., and Paolini, A. G. (2010). Audiovisual integration in noise by children and adults. J. Exp. Child Psychol. 105, 38–50. doi: 10.1016/j.jecp.2009.08.005
Bergeson, T. R., Pisoni, D. B., and Davis, R. A. O. (2005). Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants. Ear. Hear. 26, 149–164. doi: 10.1097/00003446-200504000-00004
Best, C. T. (1995). “A direct realist view of cross-language speech perception,” in Speech Perception and Linguistic Experience: Issues in Cross-language Speech Research, ed. W. Strange (Timonium, MD: York Press), 171–204.
Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., et al. (2009). Hearing faces: how the infant brain matches the face it sees with the speech it hears. J. Cogn. Neurosci. 21, 905–921. doi: 10.1162/jocn.2009.21076
Burnham, D., and Dodd, B. (2004). Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect. Dev. Psychobiol. 45, 204–220. doi: 10.1002/dev.20032
Chevillet, M. A., Jiang, X., Rauschecker, J. P., and Riesenhuber, M. (2013). Automatic phoneme category selectivity in the dorsal auditory stream. J. Neurosci. 33, 5208–5215. doi: 10.1523/JNEUROSCI.1870-12.2013
de Hevia, M. D., Izard, V., Coubart, A., Spelke, E. S., and Streri, A. (2014). Representations of space, time, and number in neonates. Proc. Natl. Acad. Sci. U.S.A. 111, 4809–4813. doi: 10.1073/pnas.1323628111
DePaolis, R. A., Vihman, M. M., and Keren-Portnoy, T. (2011). Do production patterns influence the processing of speech in prelinguistic infants? Infant Behav. Dev. 34, 590–601. doi: 10.1016/j.infbeh.2011.06.005
Desjardins, R. N., Rogers, J., and Werker, J. F. (1997). An exploration of why preschoolers perform differently than do adults in audiovisual speech perception tasks. J. Exp. Child Psychol. 66, 85–110. doi: 10.1006/jecp.1997.2379
Dodd, B., McIntosh, B., Erdener, D., and Burnham, D. (2008). Perception of the auditory-visual illusion in speech perception by children with phonological disorders. Clin. Linguist. Phon. 22, 69–82. doi: 10.1080/02699200701660100
Flom, R., and Bahrick, L. E. (2007). The development of infant discrimination of affect in multimodal and unimodal stimulation: the role of intersensory redundancy. Dev. Psychol. 43, 238–252. doi: 10.1037/0012-1649.43.1.238
Fowler, C. A., and Dekle, D. J. (1991). Listening with eye and hand: cross-modal contributions to speech perception. J. Exp. Psychol. Hum. Percept. Perform. 17, 816–823. doi: 10.1037/0096-1523.17.3.816
Gick, B., Jóhannsdóttir, K. M., Gibraiel, D., and Mühlbauer, J. (2008). Tactile enhancement of auditory and visual speech perception in untrained perceivers. J. Acoust. Soc. Am. 123, EL72–EL76. doi: 10.1121/1.2884349
Hillock, A. R., Powers, M. R., and Wallace, M. T. (2011). Binding of sights and sounds: age-related changes in multisensory temporal processing. Neuropsychologia 49, 461–467. doi: 10.1016/j.neuropsychologia.2010.11.041
Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., and Kuhl, P. K. (2006). Infant speech perception activates Broca’s area: a developmental magnetoencephalography study. Neuroreport 17, 957–962. doi: 10.1097/01.wnr.0000223387.51704.89
Kent, R. D., and Vorperian, H. K. (2007). “In the mouths of babes: anatomic, motor, and sensory foundations of speech development in children,” in Language Disorders from a Developmental Perspective, ed. R. Paul (Mahwah, NJ: Lawrence Erlbaum), 55–81.
Knowland, V. C. P., Mercure, E., Karmiloff-Smith, A., Dick, F., and Thomas, M. S. C. (2014). Audio-visual speech perception: a developmental ERP investigation. Dev. Sci. 17, 110–124. doi: 10.1111/desc.12098
Kubicek, C., Hillairet de Boisferon, A., Dupierrix, E., Pascalis, O., Loevenbruck, H., Gervain, J., et al. (2014). Cross-modal matching of audio-visual German and French fluent speech in infancy. PLoS ONE 9:e89275. doi: 10.1371/journal.pone.0089275
Kuhl, P. K., and Meltzoff, A. N. (1988). “Speech as an intermodal object of perception,” in Perceptual Development in Infancy. The Minnesota Symposia on Child Psychology, Vol. 20, ed. A. Yonas (Hillsdale, NJ: Lawrence Erlbaum), 235–266.
Kuhl, P. K., Williams, K. A., and Meltzoff, A. N. (1991). Cross-modal speech perception in adults and infants using nonspeech auditory stimuli. J. Exp. Psychol. Hum. Percept. Perform. 17, 829–840. doi: 10.1037/0096-1523.17.3.829
Kushnerenko, E., Teinonen, T., Volein, A., and Csibra, G. (2008). Electrophysiological evidence of illusory audiovisual speech percept in human infants. Proc. Natl. Acad. Sci. U.S.A. 105, 11442–11445. doi: 10.1073/pnas.0804275105
Lee, S., Davis, B., and MacNeilage, P. (2010). Universal production patterns and ambient language influences in babbling: a cross-linguistic study of Korean- and English-learning infants. J. Child Lang. 37, 26. doi: 10.1017/S0305000909009532
Lickliter, R., Bahrick, L. E., and Honeycutt, H. (2002). Intersensory redundancy facilitates prenatal perceptual learning in bobwhite quail (Colinus virginianus) embryos. Dev. Psychol. 38, 15–23. doi: 10.1037//0012-1649.38.1.15
MacNeilage, P. F., and Davis, B. L. (1993). “Motor explanations of babbling and early speech patterns,” in Developmental Neurocognition: Speech and Face Processing in the First Year of Life, Vol. 69, eds B. de Boysson-Bardies, S. de Schonen, P. W. Jusczyk, P. F. MacNeilage, and J. Morton (Norwell, MA: Kluwer), 341–352.
Majorano, M., Vihman, M. M., and DePaolis, R. A. (2014). The relationship between infants’ production experience and their processing of speech. Lang. Learn. Dev. 10, 179–204. doi: 10.1080/15475441.2013.829740
Massaro, D. W., Thompson, L. A., Barron, B., and Laren, E. (1986). Developmental changes in visual and auditory contributions to speech perception. J. Exp. Child Psychol. 41, 93–113. doi: 10.1016/0022-0965(86)90053-6
Maurer, D., and Mondloch, C. J. (2004). “Neonatal synesthesia: a reevaluation,” in Synesthesia: Perspectives from Cognitive Neuroscience, eds L. C. Robertson and N. Sagiv (Oxford: Oxford University Press), 193–213.
McGettigan, C., Agnew, Z. K., and Scott, S. K. (2010). Are articulatory commands automatically and involuntarily activated during speech perception? Proc. Natl. Acad. Sci. U.S.A. 107:E42. doi: 10.1073/pnas.1000186107
Meltzoff, A. N., and Moore, M. K. (1989). Imitation in newborn infants: exploring the range of gestures imitated and the underlying mechanisms. Dev. Psychol. 25, 954–962. doi: 10.1037/0012-1649.25.6.954
Ménard, L., Schwartz, J.-L., Boë, L.-J., and Aubin, J. (2007). Articulatory–acoustic relationships during vocal tract growth for French vowels: analysis of real data and simulations with an articulatory model. J. Phon. 35, 1–19. doi: 10.1016/j.wocn.2006.01.003
Mochida, T., Kimura, T., Hiroya, S., Kitagawa, N., Gomi, H., and Kondo, T. (2013). Speech misperception: speaking and seeing interfere differently with hearing. PLoS ONE 8:e68619. doi: 10.1371/journal.pone.0068619
Nath, A. R., Fava, E. E., and Beauchamp, M. S. (2011). Neural correlates of interindividual differences in children’s audiovisual speech perception. J. Neurosci. 31, 13963–13971. doi: 10.1523/JNEUROSCI.2605-11.2011
Navarra, J., Yeung, H. H., Werker, J. F., and Soto-Faraco, S. (2012). “Multisensory interactions in speech perception,” in The New Handbook of Multisensory Processing, ed. B. E. Stein (Cambridge, MA: MIT Press), 435–452.
Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I. P., Joensuu, R., Autti, T., et al. (2005). Processing of audiovisual speech in Broca’s area. Neuroimage 25, 333–338. doi: 10.1016/j.neuroimage.2004.12.001
Okada, K., and Hickok, G. (2009). Two cortical mechanisms support the integration of visual and auditory speech: a hypothesis and preliminary data. Neurosci. Lett. 452, 219–223. doi: 10.1016/j.neulet.2009.01.060
Oller, D. K. (1980). “The emergence of the sounds of speech in infancy,” in Child Phonology, Vol. 1, Production, eds G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (New York: Academic Press), 92–112.
Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I. P., Möttönen, R., Tarkiainen, A., et al. (2005). Primary auditory cortex activation by visual speech: an fMRI study at 3 T. Neuroreport 16, 125. doi: 10.1097/00001756-200502080-00010
Pons, F., Lewkowicz, D. J., Soto-Faraco, S., and Sebastián-Gallés, N. (2009). Narrowing of intersensory speech perception in infancy. Proc. Natl. Acad. Sci. U.S.A. 106, 10598–10602. doi: 10.1073/pnas.0904134106
Ross, L. A., Molholm, S., Blanco, D., Gomez-Ramirez, M., Saint-Amour, D., and Foxe, J. J. (2011). The development of multisensory speech perception continues into the late childhood years. Eur. J. Neurosci. 33, 2329–2337. doi: 10.1111/j.1460-9568.2011.07685.x
Santi, A., Servos, P., Vatikiotis-Bateson, E., Kuratate, T., and Munhall, K. (2003). Perceiving biological motion: dissociating visible speech from walking. J. Cogn. Neurosci. 15, 800–809. doi: 10.1162/089892903322370726
Sato, M., Buccino, G., Gentilucci, M., and Cattaneo, L. (2010). On the tip of the tongue: modulation of the primary motor cortex during audiovisual speech perception. Speech Commun. 52, 533–541. doi: 10.1016/j.specom.2009.12.004
Sato, M., Troille, E., Ménard, L., Cathiard, M.-A., and Gracco, V. (2013). Silent articulation modulates auditory and audiovisual speech perception. Exp. Brain Res. 227, 275–288. doi: 10.1007/s00221-013-3510-8
Schwartz, J.-L., Basirat, A., Ménard, L., and Sato, M. (2010). The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception. J. Neurolinguistics 25, 336–354. doi: 10.1016/j.jneuroling.2009.12.004
Scott, S. K., McGettigan, C., and Eisner, F. (2009). A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nat. Rev. Neurosci. 10, 295–302. doi: 10.1038/nrn2603
Skipper, J. I., Goldin-Meadow, S., Nusbaum, H. C., and Small, S. L. (2007a). Speech-associated gestures, Broca’s area, and the human mirror system. Brain Lang. 101, 260–277. doi: 10.1016/j.bandl.2007.02.008
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., and Small, S. L. (2007b). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cereb. Cortex 17, 2387–2399. doi: 10.1093/cercor/bhl147
Slater, A., Brown, E., Hayes, R., and Quinn, P. C. (1999). Intermodal perception at birth: intersensory redundancy guides newborn infants’ learning of arbitrary auditory – visual pairings. Dev. Sci. 2, 333. doi: 10.1111/1467-7687.00079
Stark, R. E. (1980). “Stages of speech development in the first year of life,” in Child Phonology, Vol. 1, Production, eds G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (New York: Academic Press), 73–92.
Streri, A. (2012). “Crossmodal interactions in the human newborn: new answers to Molyneux’s question,” in Multisensory Development, eds A. J. Bremner, D. J. Lewkowicz, and C. Spence (Oxford: Oxford University Press), 88–112.
Sundara, M., Kumar Namasivayam, A., and Chen, R. (2001). Observation–execution matching system for speech: a magnetic stimulation study. Neuroreport 12, 1341–1344. doi: 10.1097/00001756-200105250-00010
Swaminathan, S., MacSweeney, M., Boyles, R., Waters, D., Watkins, K. E., and Möttönen, R. (2013). Motor excitability during visual perception of known and unknown spoken languages. Brain Lang. 126, 1–7. doi: 10.1016/j.bandl.2013.03.002
Tomalski, P., Ribeiro, H., Ballieux, H., Axelsson, E. L., Murphy, E., Moore, D. G., et al. (2012). Exploring early developmental changes in face scanning patterns during the perception of audiovisual mismatch of speech cues. Eur. J. Dev. Psychol. 1, 1–14. doi: 10.1080/17405629.2012.728076
Treille, A., Cordeboeuf, C., Vilain, C., and Sato, M. (2014). Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions. Neuropsychologia 57, 71–77. doi: 10.1016/j.neuropsychologia.2014.02.004
Tremblay, C., Champoux, F., Voss, P., Bacon, B. A., Lepore, F., and Théoret, H. (2007). Speech and non-speech audio-visual illusions: a developmental study. PLoS ONE 2:e742. doi: 10.1371/journal.pone.0000742
Vihman, M. M. (1991). “Ontogeny of phonetic gestures: speech production,” in Modularity and the Motor Theory of Speech Perception, eds I. G. Mattingly and M. Studdert-Kennedy (Hillsdale, NJ: Lawrence Erlbaum), 69–84.
Keywords: speech perception, speech production, sensorimotor systems, infants, children
Citation: Guellaï B, Streri A and Yeung HH (2014) The development of sensorimotor influences in the audiovisual speech domain: some critical questions. Front. Psychol. 5:812. doi: 10.3389/fpsyg.2014.00812
Received: 27 April 2014; Accepted: 09 July 2014;
Published online: 06 August 2014.
Edited by: Maya Gratier, Université Paris Ouest Nanterre La Défense, France
Copyright © 2014 Guellaï, Streri and Yeung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bahia Guellaï, Laboratoire Ethologie, Cognition, Développement, Université Paris Ouest Nanterre La Défense, 200, Avenue de la République, 92000 Nanterre, France e-mail: firstname.lastname@example.org; H. Henny Yeung, Laboratoire Psychologie de la Perception, 45 rue des Saints-Pères, 75006 Paris, France e-mail: email@example.com