Abstract
Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here, we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on this question is still in its preliminary stages, there is nevertheless increasing evidence that sensorimotor processes (defined here as any motor or proprioceptive process related to orofacial movements) affect developmental audiovisual speech processing. We suggest three areas on which to focus in future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.
INTRODUCTION
A unique property of speech, compared to other auditory signals, is that it is multisensory. Speech involves not only auditory but also visual, motor, and proprioceptive information, since we produce speech by moving our articulators (i.e., the jaw, tongue, lips, etc.). Accordingly, many speech researchers postulated that articulatory gestures, rather than acoustic cues, were the primary objects of speech perception (Liberman et al., 1967; Liberman and Mattingly, 1985; Fowler, 1986, 1996; Best, 1995; Galantucci et al., 2006), and in recent years, vigorous debates about these ideas have continued (Scott et al., 2009; Pulvermüller and Fadiga, 2010; Schwartz et al., 2010; Hickok, 2014). Currently, proposals suggesting that articulatory input has an important role in auditory-only speech processing (Yuen et al., 2010; Möttönen et al., 2013, 2014) have been viewed by some as highly controversial (Lotto et al., 2009; McGettigan et al., 2010; Chevillet et al., 2013).
Somewhat less controversial is the discussion of speech production in the context of multisensory speech processing (Ojanen et al., 2005; Skipper et al., 2007a; Okada and Hickok, 2009; Treille et al., 2014). Just as visual influences on auditory speech processing have long been reported (e.g., Sumby and Pollack, 1954; see Navarra et al., 2012 for review), recent reports have also shown similar effects from articulatory information. For example, subjects’ own silent articulations (Sams et al., 2005; Sato et al., 2013; Scott et al., 2013) influence auditory perception in ways similar to seeing visual speech (although see Mochida et al., 2013). Moreover, receiving haptic or tactile input related to another person’s articulatory movements can also influence auditory speech processing (Fowler and Dekle, 1991; Gick et al., 2008; Gick and Derrick, 2009; Ito et al., 2009; Treille et al., 2014). Neuroimaging studies converge with these behavioral findings: For example, when visual-only or audiovisual speech is presented to subjects, activation is seen in primary auditory areas of the brain, such as the superior temporal sulcus (STS), and in areas traditionally associated with speech production, such as Broca’s area (Calvert et al., 1997; Calvert and Campbell, 2003; Ojanen et al., 2005; Pekkola et al., 2005). TMS studies have now shown that the perception of visual and audiovisual speech is linked to primary motor cortex (Sundara et al., 2001; Sato et al., 2010), and from this accumulated evidence, there is emerging consensus that visual speech processing is closely linked to internal models of the vocal tract (Santi et al., 2003; van Wassenhove et al., 2005; Skipper et al., 2007a, b; Okada and Hickok, 2009; Dick et al., 2010; Swaminathan et al., 2013).
Here, we discuss how developmental work may contribute to this broader literature. Infancy and childhood are particularly interesting because speech perception and speech production capabilities are largely asymmetric during this period (for reviews, see Oller, 1980; Stark, 1980; Werker and Yeung, 2005; Gervain and Mehler, 2010; Stoel-Gammon, 2011; Werker et al., 2012). Nevertheless, infants sometimes show neurophysiological activation that belies their apparent deficits in production. For example, areas corresponding to Broca’s area are activated in response to auditory speech even in 6-month-olds (Imada et al., 2006), and while this area is also activated in a variety of adult tasks (including ones not strictly about production, see Friederici, 2012), these infant data could potentially be interpreted as reflecting rudimentary perception-production loops.
In light of infants’ limitations in the speech production domain, we use sensorimotor as a general term that broadly encompasses motor and proprioceptive information related to both speech-like and non-speech orofacial gestures. We focus on three issues that we see as being particularly pressing for future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.
THE RELATION BETWEEN AUDIOVISUAL SPEECH PERCEPTION AND SENSORIMOTOR PROCESSES AT BIRTH
Infants receive filtered auditory input in the womb but necessarily do not experience audiovisual speech until birth. However, as soon as it can be measured, at least some basic aspects of audiovisual perception are already present. For example, newborns map abstract sensory and magnitude information across vision and audition (Meltzoff and Borton, 1979; Streri, 1993; de Hevia et al., 2014), and it also appears that newborns are particularly sensitive to audiovisual temporal synchrony (Slater et al., 1999). The precise origin of these interactions between vision and audition remains under debate (e.g., Bahrick et al., 2004; Maurer and Mondloch, 2004; Streri, 2012), but it is clear that intersensory correspondences are powerful in that they can influence attention and learning, as shown in classic studies with precocial birds (e.g., Lickliter et al., 2002). In human newborns, temporal synchrony between audition and vision plays an important role in matching monkey faces and voices (Lewkowicz et al., 2010), and newborns can also match human faces and voices under some circumstances (Aldridge et al., 1999), but further research showing the mechanisms driving this matching is needed. Here we define some critical issues with regard to the role of sensorimotor processes in audiovisual processing of speech and speech-like stimuli at birth.
It is well established that newborns imitate faces at birth, suggesting early integration of vision and proprioception (e.g., Meltzoff and Moore, 1977, 1989), although it is important to note that this has been questioned on both empirical (Anisfeld, 1996) and interpretational grounds (Jones, 2007). Still, studies using speech stimuli converge with these results. For example, newborns produce more mouth openings when listening to /a/ versus /m/ sounds, and they produce more mouth closing when listening to /m/ versus /a/ sounds (Chen et al., 2004). However, future work will need to move beyond simple correspondences between sight, sound, and movement, and ask instead how such information interacts. For example, facial imitation at birth is more robust in the presence of congruent audiovisual speech: Infants produce more mouth-opening when presented with a face saying /a/, than with the face alone, or that face dubbed with an /i/ audio track (Coulon et al., 2013). A speculative interpretation is that congruent audiovisual speech constitutes more robust input to an internal model of the vocal tract, increasing the production of relevant mouth shapes.
Another question concerns specificity: can imitation also be elicited from auditory or visual models that are not identifiably human, and if so, what constraints on this system exist? For example, previous work has suggested preferential processing of speech stimuli over white noise (Colombo and Bundy, 1981) and synthetic analogs of speech (Vouloumanos and Werker, 2004, 2007). However, in a striking set of studies, a preference for human over monkey vocalizations was not found at birth, but was found at 3 months of age (Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010). Together, these data suggest evolutionary constraints on auditory preferences, and in turn, raise questions about the imitation studies above. Will infants produce more facial gestures in response to human versus non-human (or non-mammalian) auditory, visual, and audiovisual models? What attentional and/or evolutionary factors might drive such effects?
Finally, future research must also examine the functioning of sensorimotor and perceptual systems in a more precise manner, and in more naturalistic situations. For example, recent work suggests that newborns are highly sensitive to both rigid (i.e., whole-head) and non-rigid movements (i.e., facial features) of a talking face (Guellaï et al., 2011). Do newborns privilege one type of feature over the other when imitating (see also Meltzoff and Moore, 1989)? Previous work has also shown that newborns are more sensitive to talking faces with direct versus averted gaze (Guellaï and Streri, 2011), suggesting that foundational aspects of social communication may exist at birth. However, it remains unclear how facial imitation may change with social gaze.
PATHWAYS THROUGH WHICH SENSORIMOTOR INFLUENCES INTERACT WITH AUDIOVISUAL SPEECH PROCESSING IN INFANCY
After the neonatal period, older infants continue to perceive audiovisual speech robustly. This has commonly been shown using a cross-modal matching procedure, where 2–4 month-olds are presented with side-by-side faces articulating the two visual vowels ([i] and [a]), accompanied by a single speech sound (either /i/ or /a/) in synchrony with both faces. Infants look longer at the matching face, showing an ability to associate vowels with the corresponding articulation (Kuhl and Meltzoff, 1982, 1984; MacKain et al., 1983; Patterson and Werker, 1999, 2002, 2003; Yeung and Werker, 2013). The effects of congruent versus incongruent audiovisual speech are also evident in a variety of other behavioral paradigms (Rosenblum et al., 1997; Burnham and Dodd, 2004; Desjardins and Werker, 2004; Pons et al., 2009; Tomalski et al., 2012; Kubicek et al., 2014; Pons and Lewkowicz, 2014), as well as in electrophysiological recordings (Kushnerenko et al., 2008; Bristow et al., 2009). A few recent papers have also begun to test audiovisual matching with fluent streams of speech (instead of just vowels or consonants; Lewkowicz and Pons, 2013; Kubicek et al., 2014), suggesting that audiovisual matching abilities in infancy can be very broad.
What about the mechanisms driving audiovisual speech perception? As mentioned above, infants at birth detect subtle differences in temporal synchrony between auditory and visual channels (Lewkowicz et al., 2010), and this is true of older infants as well (Lewkowicz, 1996, 2010). It could be that intersensory redundancy facilitates the detection of amodal properties related to vowel identity. Indeed, previous research has already shown that intersensory redundancy can enhance the detection of other kinds of amodal properties from faces (e.g., emotional affect; Flom and Bahrick, 2007), but at the cost of processing unimodal features (e.g., face identity; Bahrick et al., 2013). Together, this work suggests that synchrony detection may enhance amodal aspects of audiovisual speech (e.g., Bahrick et al., 2004).
An alternative proposal suggests that audiovisual speech information is mapped using sensorimotor information, perhaps via an internal model of the vocal tract (Kuhl and Meltzoff, 1984, 1988; Kent and Vorperian, 2007; Yeung and Werker, 2013). Several lines of evidence are suggestive of this sensorimotor mechanism: first, audiovisual matching with non-speech stimuli is often less robust than with speech (Kuhl and Meltzoff, 1984; Kuhl et al., 1991), particularly at later points in development (Lewkowicz and Ghazanfar, 2006), which suggests that audiovisual perception becomes more speech specific with age. Second, just as in newborns (Coulon et al., 2013), older infants also produce more congruent mouth shapes when hearing audiovisually congruent vowels compared to incongruent vowels (Legerstee, 1990; Kuhl and Meltzoff, 1996; Patterson and Werker, 1999). A recent report further shows that infants making /i/-like lip movements while chewing on a teething ring, or /u/-like lip movements while sucking on a pacifier, could no longer achieve audiovisual speech matching if the heard vowel was similar to the achieved lip shape (Yeung and Werker, 2013). This suggests that direct activation of the motor system can indeed affect audiovisual speech perception, and it is strongly suggestive of sensorimotor influences.
Together, this work raises two critical areas of future research. First, these dueling approaches must be reconciled: Are auditory and visual speech bound together by temporal synchrony cues, or is there some internal model of the vocal tract that accomplishes this mapping? A third alternative is that two separate modes of audiovisual processing will be identified. For example, recent work has suggested that synchrony detection in 5-month-old infants uses a fast and automatic pathway which could be similar to the kind of adult audiovisual pathways that activate the STS and its associated networks (Hyde et al., 2011). More work is needed to see whether a slower, higher level pathway can also be distinguished, and if this pathway also taps sensorimotor information.
A second question concerns the definition of orofacial movements in infancy. Some work suggests that early vocalizations can already be considered speech-like: Cooing and babbling are influenced by the phonological properties of the native language (De Boysson-Bardies et al., 1989; Ruzza et al., 2006; Whalen et al., 2007), and are argued to be continuous with the first productions of words (de Boysson-Bardies and Vihman, 1991; Vihman, 1991; McCune and Vihman, 2001). Infant vocalizations also change in response to socially contingent responses from mothers, whether manipulated in the laboratory (Goldstein and Schwade, 2008), or measured during free play (Gros-Louis et al., 2014). Other work has even suggested that babbling capacities act as an attentional filter on auditory speech perception, modulating preferences to listen to words that either share or do not share commonalities between what is produced in babbling and in one’s early words (DePaolis et al., 2011, 2013; Majorano et al., 2014). At the same time, other research argues instead that universal constraints on the motor system (not specific to speech) play an equally important role in structuring how babbling is produced (MacNeilage and Davis, 1993; Lee et al., 2010). Moreover, coordinative movements differ when infants speak, babble, suck, or chew, suggesting that the physical mechanisms underlying babbling are not continuous with later speech motor control (Steeve, 2010).
In conjunction with the results from Yeung and Werker (2013), which demonstrate an effect of non-speech movements, the above debate shows how difficult it is to define what counts as an articulatory (i.e., speech-like) gesture, which in turn makes it hard to speculate about how an internal model of the vocal tract might be structured in early development (although see Ménard et al., 2007; Howard and Messum, 2011). Future research postulating a sensorimotor pathway in infancy will need to bear this literature in mind. One intriguing possibility is that distinctions between “speech-like” and “non-speech-like” may not be important at all (at least in early development): For example, infants have more difficulty matching auditory whistles to visual faces that are whistling than auditory trills to visual faces that are trilling. One speculative reason for this asymmetry could be that infants produce bilabial trills, but do not yet produce whistles (Mugitani et al., 2008).
DEVELOPMENTAL CHANGE AS SPEECH PRODUCTION BECOMES MORE VARIED AND SOPHISTICATED
Of course, the development of perceptual and motor systems continues well beyond infancy. For example, previous reports show that children (up to the age of 10) increasingly weight visual speech information more heavily in cases of sensory conflict or ambiguity (McGurk and MacDonald, 1976; Massaro, 1984; Massaro et al., 1986; Wightman et al., 2006; van Linden and Vroomen, 2008; Barutchu et al., 2010; Ross et al., 2011). It seems likely that multiple mechanisms drive this developmental change: For example, Sekiyama and Burnham (2008) find cross-cultural differences, which are likely unrelated to differences in motor ability. Nevertheless, there is also some correlational evidence supporting a sensorimotor pathway: children who have greater trouble articulating consonants show less sensitivity to visual speech information (Desjardins et al., 1997), as is also the case for children with broader language deficits (Bergeson et al., 2005; Dodd et al., 2008).
Other studies provide further evidence for multiple pathways emerging in childhood that are reminiscent of adult models (e.g., Skipper et al., 2007b; Okada and Hickok, 2009; Hickok et al., 2011). For example, while visual speech is more heavily weighted throughout childhood, non-speech audiovisual processing is relatively stable (Tremblay et al., 2007; although see Hillock et al., 2011). Such divergent trajectories suggest that two kinds of audiovisual binding mechanisms may be dissociated. Neurophysiological evidence for that dissociation comes from a study revealing two separable electrophysiological measures: amplitude versus latency of the commonly evoked N1/P2 complex to audiovisual speech (Knowland et al., 2014). Critically, only amplitude changes in development, while latency remains stable. Additional evidence comes from functional imaging studies, which suggest two networks related to audiovisual binding of speech stimuli: One network is centered around primary auditory areas, while a second network involves inferior frontal areas (Dick et al., 2010; Nath et al., 2011). Developmental change in audiovisual speech processing correlates with changes in connectivity between these networks (Dick et al., 2010).
Together, these findings are highly suggestive of at least two distinct pathways in the brain that support audiovisual speech processing. A preliminary conjecture is that multiple pathways might be distinguished based on their developmental characteristics (stable, or increasing), their selectivity (to speech, or to many kinds of signals), and their mechanisms (depending on intersensory redundancy, or depending on an internal articulatory model). Critical lines of future research will need to investigate these hypotheses.
CONCLUSION
Speech perception is one of the most deeply explored aspects of language development. However, as this review highlights, several aspects of this phenomenon remain mysterious: in particular, the relation between speech perception and production. Here, we examine possible sensorimotor influences in multisensory speech processing, highlighting three areas for future research that will bridge between debates in the adult literature and emerging work in development.
First, we suggest that future research must examine the link between imitation and audiovisual speech perception at birth, and explore interactions among vision, audition, and the motor system. Second, we highlight two potential pathways involved in audiovisual speech perception in older infants, one of which may depend on sensorimotor information. Third, we illustrate the need to elucidate the behavioral and neural characteristics of these pathways in children, as speech production becomes more sophisticated.
Statements
Acknowledgments
The authors thank Rana Esseily and Maya Gratier for inviting them to contribute to this special issue.
Conflict of interest
The Guest Associate Editor Maya Gratier declares that, despite being affiliated to the same institution as author Bahia Guellaï, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Aldridge, M. A., Braga, E. S., Walton, G. E., and Bower, T. G. R. (1999). The intermodal representation of speech in newborns. Dev. Sci. 2, 42–46. doi: 10.1111/1467-7687.00052
Anisfeld, M. (1996). Only tongue protrusion modeling is matched by neonates. Dev. Rev. 16, 149–161. doi: 10.1006/drev.1996.0006
Bahrick, L. E., Lickliter, R., and Castellanos, I. (2013). The development of face perception in infancy: intersensory interference and unimodal visual facilitation. Dev. Psychol. 49, 1919–1930. doi: 10.1037/a0031238
Bahrick, L. E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of selective attention, perception, and cognition in infancy. Curr. Dir. Psychol. Sci. 13, 99–102. doi: 10.1111/j.0963-7214.2004.00283.x
Barutchu, A., Danaher, J., Crewther, S. G., Innes-Brown, H., Shivdasani, M. N., and Paolini, A. G. (2010). Audiovisual integration in noise by children and adults. J. Exp. Child Psychol. 105, 38–50. doi: 10.1016/j.jecp.2009.08.005
Bergeson, T. R., Pisoni, D. B., and Davis, R. A. O. (2005). Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants. Ear Hear. 26, 149–164. doi: 10.1097/00003446-200504000-00004
Best, C. T. (1995). “A direct realist view of cross-language speech perception,” in Speech Perception and Linguistic Experience: Issues in Cross-language Speech Research, ed. W. Strange (Timonium, MD: York Press), 171–204.
Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., et al. (2009). Hearing faces: how the infant brain matches the face it sees with the speech it hears. J. Cogn. Neurosci. 21, 905–921. doi: 10.1162/jocn.2009.21076
Burnham, D., and Dodd, B. (2004). Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect. Dev. Psychobiol. 45, 204–220. doi: 10.1002/dev.20032
Calvert, G. A., Bullmore, E. T., Brammer, M. J., and Campbell, R. (1997). Activation of auditory cortex during silent lipreading. Science 276, 593–596. doi: 10.1126/science.276.5312.593
Calvert, G. A., and Campbell, R. (2003). Reading speech from still and moving faces: the neural substrates of visible speech. J. Cogn. Neurosci. 15, 57–70. doi: 10.1162/089892903321107828
Chen, X., Striano, T., and Rakoczy, H. (2004). Auditory-oral matching behavior in newborns. Dev. Sci. 7, 42–47. doi: 10.1111/j.1467-7687.2004.00321.x
Chevillet, M. A., Jiang, X., Rauschecker, J. P., and Riesenhuber, M. (2013). Automatic phoneme category selectivity in the dorsal auditory stream. J. Neurosci. 33, 5208–5215. doi: 10.1523/JNEUROSCI.1870-12.2013
Colombo, J., and Bundy, R. S. (1981). A method for the measurement of infant auditory selectivity. Infant Behav. Dev. 4, 219–223. doi: 10.1016/S0163-6383(81)80025-2
Coulon, M., Hemimou, C., and Streri, A. (2013). Effects of seeing and hearing vowels on neonatal facial imitation. Infancy 18, 782–796. doi: 10.1111/infa.12001
De Boysson-Bardies, B., Hallé, P. A., Sagart, L., and Durand, C. (1989). A crosslinguistic investigation of vowel formants in babbling. J. Child Lang. 16, 1–17. doi: 10.1017/S0305000900013404
de Boysson-Bardies, B., and Vihman, M. M. (1991). Adaptation to language: evidence from babbling and first words in four languages. Language 67, 297–319. doi: 10.1353/lan.1991.0045
de Hevia, M. D., Izard, V., Coubart, A., Spelke, E. S., and Streri, A. (2014). Representations of space, time, and number in neonates. Proc. Natl. Acad. Sci. U.S.A. 111, 4809–4813. doi: 10.1073/pnas.1323628111
DePaolis, R. A., Vihman, M. M., and Keren-Portnoy, T. (2011). Do production patterns influence the processing of speech in prelinguistic infants? Infant Behav. Dev. 34, 590–601. doi: 10.1016/j.infbeh.2011.06.005
DePaolis, R. A., Vihman, M. M., and Nakai, S. (2013). The influence of babbling patterns on the processing of speech. Infant Behav. Dev. 36, 642–649. doi: 10.1016/j.infbeh.2013.06.007
Desjardins, R. N., Rogers, J., and Werker, J. F. (1997). An exploration of why preschoolers perform differently than do adults in audiovisual speech perception tasks. J. Exp. Child Psychol. 66, 85–110. doi: 10.1006/jecp.1997.2379
Desjardins, R. N., and Werker, J. F. (2004). Is the integration of heard and seen speech mandatory for infants? Dev. Psychobiol. 45, 187–203. doi: 10.1002/dev.20033
Dick, A. S., Solodkin, A., and Small, S. L. (2010). Neural development of networks for audiovisual speech comprehension. Brain Lang. 114, 101–114. doi: 10.1016/j.bandl.2009.08.005
Dodd, B., McIntosh, B., Erdener, D., and Burnham, D. (2008). Perception of the auditory-visual illusion in speech perception by children with phonological disorders. Clin. Linguist. Phon. 22, 69–82. doi: 10.1080/02699200701660100
Flom, R., and Bahrick, L. E. (2007). The development of infant discrimination of affect in multimodal and unimodal stimulation: the role of intersensory redundancy. Dev. Psychol. 43, 238–252. doi: 10.1037/0012-1649.43.1.238
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. J. Phon. 14, 3–28.
Fowler, C. A. (1996). Listeners do hear sounds, not tongues. J. Acoust. Soc. Am. 99, 1730–1741. doi: 10.1121/1.415237
Fowler, C. A., and Dekle, D. J. (1991). Listening with eye and hand: cross-modal contributions to speech perception. J. Exp. Psychol. Hum. Percept. Perform. 17, 816–823. doi: 10.1037/0096-1523.17.3.816
Friederici, A. D. (2012). The cortical language circuit: from auditory perception to sentence comprehension. Trends Cogn. Sci. 16, 262–268. doi: 10.1016/j.tics.2012.04.001
Galantucci, B., Fowler, C. A., and Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychon. Bull. Rev. 13, 361–377. doi: 10.3758/BF03193857
Gervain, J., and Mehler, J. (2010). Speech perception and language acquisition in the first year of life. Annu. Rev. Psychol. 61, 191–218. doi: 10.1146/annurev.psych.093008.100408
Gick, B., and Derrick, D. (2009). Aero-tactile integration in speech perception. Nature 462, 502–504. doi: 10.1038/nature08572
Gick, B., Jóhannsdóttir, K. M., Gibraiel, D., and Mühlbauer, J. (2008). Tactile enhancement of auditory and visual speech perception in untrained perceivers. J. Acoust. Soc. Am. 123, EL72–EL76. doi: 10.1121/1.2884349
Goldstein, M. H., and Schwade, J. A. (2008). Social feedback to infants’ babbling facilitates rapid phonological learning. Psychol. Sci. 19, 515–523. doi: 10.1111/j.1467-9280.2008.02117.x
Gros-Louis, J., West, M. J., and King, A. P. (2014). Maternal responsiveness and the development of directed vocalizing in social interactions. Infancy 19, 385–408. doi: 10.1111/infa.12054
Guellaï, B., Coulon, M., and Streri, A. (2011). The role of motion and speech in face recognition at birth. Vis. Cogn. 19, 1212–1233. doi: 10.1080/13506285.2011.620578
Guellaï, B., and Streri, A. (2011). Cues for early social skills: direct gaze modulates newborns’ recognition of talking faces. PLoS ONE 6:e18610. doi: 10.1371/journal.pone.0018610
Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Lang. Cogn. Neurosci. 29, 2–20. doi: 10.1080/01690965.2013.834370
Hickok, G., Houde, J., and Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 69, 407–422. doi: 10.1016/j.neuron.2011.01.019
Hillock, A. R., Powers, M. R., and Wallace, M. T. (2011). Binding of sights and sounds: age-related changes in multisensory temporal processing. Neuropsychologia 49, 461–467. doi: 10.1016/j.neuropsychologia.2010.11.041
Howard, I. S., and Messum, P. (2011). Modeling the development of pronunciation in infant speech acquisition. Motor Control 15, 85–117.
Hyde, D. C., Jones, B. L., Flom, R., and Porter, C. L. (2011). Neural signatures of face-voice synchrony in 5-month-old human infants. Dev. Psychobiol. 53, 359–370. doi: 10.1002/dev.20525
Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., and Kuhl, P. K. (2006). Infant speech perception activates Broca’s area: a developmental magnetoencephalography study. Neuroreport 17, 957–962. doi: 10.1097/01.wnr.0000223387.51704.89
Ito, T., Tiede, M., and Ostry, D. J. (2009). Somatosensory function in speech perception. Proc. Natl. Acad. Sci. U.S.A. 106, 1245–1248. doi: 10.1073/pnas.0810063106
Jones, S. S. (2007). Imitation in infancy: the development of mimicry. Psychol. Sci. 18, 593–599. doi: 10.1111/j.1467-9280.2007.01945.x
Kent, R. D., and Vorperian, H. K. (2007). “In the mouths of babes: anatomic, motor, and sensory foundations of speech development in children,” in Language Disorders from a Developmental Perspective, ed. R. Paul (Mahwah, NJ: Lawrence Erlbaum), 55–81.
Knowland, V. C. P., Mercure, E., Karmiloff-Smith, A., Dick, F., and Thomas, M. S. C. (2014). Audio-visual speech perception: a developmental ERP investigation. Dev. Sci. 17, 110–124. doi: 10.1111/desc.12098
Kubicek, C., Hillairet de Boisferon, A., Dupierrix, E., Pascalis, O., Loevenbruck, H., Gervain, J., et al. (2014). Cross-modal matching of audio-visual German and French fluent speech in infancy. PLoS ONE 9:e89275. doi: 10.1371/journal.pone.0089275
Kuhl, P. K., and Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science 218, 1138–1141. doi: 10.1126/science.7146899
Kuhl, P. K., and Meltzoff, A. N. (1984). The intermodal representation of speech in infants. Infant Behav. Dev. 7, 361–381. doi: 10.1016/S0163-6383(84)80050-8
Kuhl, P. K., and Meltzoff, A. N. (1988). “Speech as an intermodal object of perception,” in Perceptual Development in Infancy. The Minnesota Symposia on Child Psychology, Vol. 20, ed. A. Yonas (Hillsdale, NJ: Lawrence Erlbaum), 235–266.
Kuhl, P. K., and Meltzoff, A. N. (1996). Infant vocalizations in response to speech: vocal imitation and developmental change. J. Acoust. Soc. Am. 100, 2425–2438. doi: 10.1121/1.417951
Kuhl, P. K., Williams, K. A., and Meltzoff, A. N. (1991). Cross-modal speech perception in adults and infants using nonspeech auditory stimuli. J. Exp. Psychol. Hum. Percept. Perform. 17, 829–840. doi: 10.1037/0096-1523.17.3.829
Kushnerenko, E., Teinonen, T., Volein, A., and Csibra, G. (2008). Electrophysiological evidence of illusory audiovisual speech percept in human infants. Proc. Natl. Acad. Sci. U.S.A. 105, 11442–11445. doi: 10.1073/pnas.0804275105
Lee, S., Davis, B., and MacNeilage, P. (2010). Universal production patterns and ambient language influences in babbling: a cross-linguistic study of Korean- and English-learning infants. J. Child Lang. 37, 26. doi: 10.1017/S0305000909009532
Legerstee, M. (1990). Infant use of multimodal information to imitate speech sounds. Infant Behav. Dev. 13, 343–354. doi: 10.1016/0163-6383(90)90039-B
Lewkowicz, D. J. (1996). Perception of auditory–visual temporal synchrony in human infants. J. Exp. Psychol. Hum. Percept. Perform. 22, 1094–1106. doi: 10.1037/0096-1523.22.5.1094
Lewkowicz, D. J. (2010). Infant perception of audio-visual speech synchrony. Dev. Psychol. 46, 66–77. doi: 10.1037/a0015579
Lewkowicz, D. J., and Ghazanfar, A. A. (2006). The decline of cross-species intersensory perception in human infants. Proc. Natl. Acad. Sci. U.S.A. 103, 6771–6774. doi: 10.1073/pnas.0602027103
Lewkowicz, D. J., Leo, I., and Simion, F. (2010). Intersensory perception at birth: newborns match nonhuman primate faces and voices. Infancy 15, 46–60. doi: 10.1111/j.1532-7078.2009.00005.x
Lewkowicz, D. J., and Pons, F. (2013). Recognition of amodal language identity emerges in infancy. Int. J. Behav. Dev. 37, 90–94. doi: 10.1177/0165025412467582
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychol. Rev. 74, 431–461. doi: 10.1037/h0020279
Liberman, A. M., and Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition 21, 1–36. doi: 10.1016/0010-0277(85)90021-6
Lickliter, R., Bahrick, L. E., and Honeycutt, H. (2002). Intersensory redundancy facilitates prenatal perceptual learning in bobwhite quail (Colinus virginianus) embryos. Dev. Psychol. 38, 15–23. doi: 10.1037//0012-1649.38.1.15
Lotto, A. J., Hickok, G., and Holt, L. L. (2009). Reflections on mirror neurons and speech perception. Trends Cogn. Sci. 13, 110–114. doi: 10.1016/j.tics.2008.11.008
MacKain, K., Studdert-Kennedy, M., Spieker, S., and Stern, D. (1983). Infant intermodal speech perception is a left-hemisphere function. Science 219, 1347–1349. doi: 10.1126/science.6828865
MacNeilage, P. F., and Davis, B. L. (1993). “Motor explanations of babbling and early speech patterns,” in Developmental Neurocognition: Speech and Face Processing in the First Year of Life, Vol. 69, eds B. de Boysson-Bardies, S. de Schonen, P. W. Jusczyk, P. F. MacNeilage, and J. Morton (Norwell, MA: Kluwer), 341–352.
Majorano, M., Vihman, M. M., and DePaolis, R. A. (2014). The relationship between infants’ production experience and their processing of speech. Lang. Learn. Dev. 10, 179–204. doi: 10.1080/15475441.2013.829740
Massaro, D. W. (1984). Children’s perception of visual and auditory speech. Child Dev. 55, 1777–1788. doi: 10.2307/1129925
Massaro, D. W., Thompson, L. A., Barron, B., and Laren, E. (1986). Developmental changes in visual and auditory contributions to speech perception. J. Exp. Child Psychol. 41, 93–113. doi: 10.1016/0022-0965(86)90053-6
71
MaurerD.MondlochC. J. (2004). “Neonatal synesthesia: a reevaluation,” inSynesthesia: Perspectives from Cognitive NeuroscienceedsRobertsonL. C.SagivN. (Oxford:Oxford University Press) 193–213.
72
McCuneL.VihmanM. M. (2001). Early phonetic and lexical development: a productivity approach.J. Speech Lang. Hear Res.44670–684. 10.1044/1092-4388(2001/054)
73
McGettiganC.AgnewZ. K.ScottS. K. (2010). Are articulatory commands automatically and involuntarily activated during speech perception?Proc. Natl. Acad. Sci. U.S.A.107:E42. 10.1073/pnas.1000186107
74
McGurkH.MacDonaldJ. (1976). Hearing lips and seeing voices.Nature264746–748. 10.1038/264746a0
75
MeltzoffA. N.BortonR. W. (1979). Intermodal matching by human neonates.Nature282403–404. 10.1038/282403a0
76
MeltzoffA. N.MooreM. K. (1977). Imitation of facial and manual gestures by human neonates.Science19875–78. 10.1126/science.198.4312.75
77
MeltzoffA. N.MooreM. K. (1989). Imitation in newborn infants: exploring the range of gestures imitated and the underlying mechanisms.Dev. Psychol.25954–962. 10.1037/0012-1649.25.6.954
78
MénardL.SchwartzJ.-L.BoëL.-J.AubinJ. (2007). Articulatory–acoustic relationships during vocal tract growth for French vowels: analysis of real data and simulations with an articulatory model.J. Phon.351–19. 10.1016/j.wocn.2006.01.003
79
MochidaT.KimuraT.HiroyaS.KitagawaN.GomiH.KondoT. (2013). Speech misperception: speaking and seeing interfere differently with hearing.PLoS ONE8:e68619. 10.1371/journal.pone.0068619
80
MöttönenR.DuttonR.WatkinsK. E. (2013). Auditory-motor processing of speech sounds.Cereb. Cortex231190–1197. 10.1093/cercor/bhs110
81
MöttönenR.van de VenG. M.WatkinsK. E. (2014). Attention fine-tunes auditory-motor processing of speech sounds.J. Neurosci.344064–4069. 10.1523/JNEUROSCI.2214-13.2014
82
MugitaniR.KobayashiT.HirakiK. (2008). Audiovisual matching of lips and non-canonical sounds in 8-month-old infants.Infant Behav. Dev.31307–310. 10.1016/j.infbeh.2007.12.002
83
NathA. R.FavaE. E.BeauchampM. S. (2011). Neural correlates of interindividual differences in children’s audiovisual speech perception.J. Neurosci.3113963–13971. 10.1523/JNEUROSCI.2605-11.2011
84
NavarraJ.YeungH. H.WerkerJ. F.Soto-FaracoS. (2012). “Multisensory interactions in speech perception,” inThe New Handbook of Multisensory Processinged.SteinB. E. (Cambridge, MA:MIT Press)435–452.
85
OjanenV.MöttönenR.PekkolaJ.JääskeläinenI. P.JoensuuR.AuttiT.et al (2005). Processing of audiovisual speech in Broca’s area.Neuroimage,25333–338. 10.1016/j.neuroimage.2004.12.001
86
OkadaK.HickokG. (2009). Two cortical mechanisms support the integration of visual and auditory speech: a hypothesis and preliminary data.Neurosci. Lett.452219–223. 10.1016/j.neulet.2009.01.060
87
OllerD. K. (1980). “The emergence of the sounds of speech in infancy,” inChild Phonology, Vol. 1, ProductionedsYeni-KomishanG. H.KavanaghJ. F.FergusonC. A. (New York:Academic Press) 92–112.
88
PattersonM. L.WerkerJ. F. (1999). Matching phonetic information in lips and voice is robust in 4.5-month-old infants.Infant Behav. Dev.22237–247. 10.1016/S0163-6383(99)00003-X
89
PattersonM. L.WerkerJ. F. (2002). Infants’ ability to match dynamic phonetic and gender information in the face and voice.J. Exp. Child Psychol.8193–115. 10.1006/jecp.2001.2644
90
PattersonM. L.WerkerJ. F. (2003). Two-month-old infants match phonetic information in lips and voice.Dev. Sci.6191–196. 10.1111/1467-7687.00271
91
PekkolaJ.OjanenV.AuttiT.JääskeläinenI. P.MöttönenR.TarkiainenA.et al (2005). Primary auditory cortex activation by visual speech: an fMRI study at 3 T.Neuroreport16125. 10.1097/00001756-200502080-00010
92
PonsF.LewkowiczD. J. (2014). Infant perception of audio-visual speech synchrony in familiar and unfamiliar fluent speech.Acta Psychol.149142–147. 10.1016/j.actpsy.2013.12.013
93
PonsF.LewkowiczD. J.Soto-FaracoS.Sebastián-GallésN. (2009). Narrowing of intersensory speech perception in infancy.Proc. Natl. Acad. Sci. U.S.A.10610598–10602. 10.1073/pnas.0904134106
94
PulvermüllerF.FadigaL. (2010). Active perception: sensorimotor circuits as a cortical basis for language.Nat. Rev. Neurosci.11351–360. 10.1038/nrn2811
95
RosenblumL. D.SchmucklerM. A.JohnsonJ. A. (1997). The McGurk effect in infants.Percept. Psychophys.59347–357. 10.3758/BF03211902
96
RossL. A.MolholmS.BlancoD.Gomez-RamirezM.Saint-AmourD.FoxeJ. J. (2011). The development of multisensory speech perception continues into the late childhood years.Eur. J. Neurosci.332329–2337. 10.1111/j.1460-9568.2011.07685.x
97
RuzzaB.RoccaF.BoeroD. L.LentiC. (2006). Investigating the musical qualities of early infant sounds.Ann. N. Y. Acad. Sci.999527–529. 10.1196/annals.1284.066
98
SamsM.MöttönenR.SihvonenT. (2005). Seeing and hearing others and oneself talk.Cogn. Brain Res.23429–435. 10.1016/j.cogbrainres.2004.11.006
99
SantiA.ServosP.Vatikiotis-BatesonE.KuratateT.MunhallK. (2003). Perceiving biological motion: dissociating visible speech from walking.J. Cogn. Neurosci.15800–809. 10.1162/089892903322370726
100
SatoM.BuccinoG.GentilucciM.CattaneoL. (2010). On the tip of the tongue: modulation of the primary motor cortex during audiovisual speech perception.Speech Commun.52533–541. 10.1016/j.specom.2009.12.004
101
SatoM.TroilleE.MénardL.CathiardM.-A.GraccoV. (2013). Silent articulation modulates auditory and audiovisual speech perception.Exp. Brain Res.227275–288. 10.1007/s00221-013-3510-8
102
SchwartzJ.-L.BasiratA.MénardL.SatoM. (2010). The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception.J. Neurolinguistics25336–354. 10.1016/j.jneuroling.2009.12.004
103
ScottM.YeungH. H.GickB. W.WerkerJ. F. (2013). Inner speech captures the perception of external speech.J. Acoust. Soc. Am.133EL286–EL292. 10.1121/1.4794932
104
ScottS. K.McGettiganC.EisnerF. (2009). A little more conversation, a little less action – candidate roles for the motor cortex in speech perception.Nat. Rev. Neurosci.10295–302. 10.1038/nrn2603
105
SekiyamaK.BurnhamD. (2008). Impact of language on development of auditory-visual speech perception.Dev. Sci.11306–320. 10.1111/j.1467-7687.2008.00677.x
106
ShultzS.VouloumanosA. (2010). Three-month-olds prefer speech to other naturally occurring signals.Lang. Learn. Dev.6241–257. 10.1080/15475440903507830
107
SkipperJ. I.Goldin-MeadowS.NusbaumH. C.SmallS. L. (2007a). Speech-associated gestures, Broca’s area, and the human mirror system.Brain Lang.101260–277. 10.1016/j.bandl.2007.02.008
108
SkipperJ. I.van WassenhoveV.NusbaumH. C.SmallS. L. (2007b). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception.Cereb. Cortex172387–2399. 10.1093/cercor/bhl147
109
SlaterA.BrownE.HayesR.QuinnP. C. (1999). Intermodal perception at birth: intersensory redundancy guides newborn infants’ learning of arbitrary auditory – visual pairings.Dev. Sci.2333. 10.1111/1467-7687.00079
110
StarkR. E. (1980). “Stages of speech development in the first year of life,” inChild Phonology, Vol. 1, ProductionedsYeni-KomishanG. H.KavanaghJ. F.FergusonC. A. (New York:Academic Press) 73–92.
111
SteeveR. W. (2010). Babbling and chewing: jaw kinematics from 8 to 22 months.J. Phon.38445–458. 10.1016/j.wocn.2010.05.001
112
Stoel-GammonC. (2011). Relationships between lexical and phonological development in young children.J. Child Lang.381–34. 10.1017/S0305000910000425
113
StreriA. (1993). Seeing, Reaching, Touching: The Relations between Vision and Touch in InfancyPownallT.KingerleeS.(trans.) (Cambridge, MA:MIT Press).
114
StreriA. (2012). “Crossmodal interactions in the human newborn: news answers to Molyneux’s question,” inMultisensory DevelopmentedsBremnerA. J.LewkowiczD. J.SpenceC. (Oxford:Oxford University Press) 88–112.
115
SumbyW. H.PollackI. (1954). Visual contribution to speech intelligibility in noise.J. Acoust. Soc. Am.26212–215. 10.1121/1.1907309
116
SundaraM.Kumar NamasivayamA.ChenR. (2001). Observation–execution matching system for speech: a magnetic stimulation study.Neuroreport121341–1344. 10.1097/00001756-200105250-00010
117
SwaminathanS.MacSweeneyM.BoylesR.WatersD.WatkinsK. E.MöttönenR. (2013). Motor excitability during visual perception of known and unknown spoken languages.Brain Lang.1261–7. 10.1016/j.bandl.2013.03.002
118
TomalskiP.RibeiroH.BallieuxH.AxelssonE. L.MurphyE.MooreD. G.et al (2012). Exploring early developmental changes in face scanning patterns during the perception of audiovisual mismatch of speech cues.Eur. J. Dev. Psychol.11–14. 10.1080/17405629.2012.728076
119
TreilleA.CordeboeufC.VilainC.SatoM. (2014). Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions.Neuropsychologia5771–77. 10.1016/j.neuropsychologia.2014.02.004
120
TremblayC.ChampouxF.VossP.BaconB. A.LeporeF.ThéoretH. (2007). Speech and non-speech audio-visual illusions: a developmental study.PLoS ONE2:e742. 10.1371/journal.pone.0000742
121
van LindenS.VroomenJ. (2008). Audiovisual speech recalibration in children.J. Child Lang.35809–822. 10.1017/S0305000908008817
122
van WassenhoveV.GrantK. W.PoeppelD. (2005). Visual speech speeds up the neural processing of auditory speech.Proc. Natl. Acad. Sci. U.S.A.1021181–1186. 10.1073/pnas.0408949102
123
VihmanM. M. (1991). “Ontogeny of phonetic gestures: speech production,” inModularity and the Motor Theory of Speech PerceptionedsMattinglyI. G.Studdert-KennedyM. (Hillsdale, NJ:Lawrence Erlbaum) 69–84.
124
VouloumanosA.HauserM. D.WerkerJ. F.MartinA. (2010). The tuning of human neonates’ preference for speech.Child Dev.81517–527. 10.1111/j.1467-8624.2009.01412.x
125
VouloumanosA.WerkerJ. F. (2004). Tuned to the signal: the privileged status of speech for young infants.Dev. Sci.7270–276. 10.1111/j.1467-7687.2004.00345.x
126
VouloumanosA.WerkerJ. F. (2007). Listening to language at birth: evidence for a bias for speech in neonates.Dev. Sci.10159–164. 10.1111/j.1467-7687.2007.00549.x
127
WerkerJ. F.YeungH. H. (2005). Infant speech perception bootstraps word learning.Trends Cogn. Sci.9519–527. 10.1016/j.tics.2005.09.003
128
WerkerJ. F.YeungH. H.YoshidaK. A. (2012). How do infants become experts at native-speech perception?Curr. Dir. Psychol. Sci.21221–226. 10.1177/0963721412449459
129
WhalenD. H.LevittA. G.GoldsteinL. M. (2007). VOT in the babbling of French- and English-learning infants.J. Phon.35341–352. 10.1016/j.wocn.2006.10.001
130
WightmanF.KistlerD.BrungartD. (2006). Informational masking of speech in children: auditory-visual integration.J. Acoust. Soc. Am.1193940–3949. 10.1121/1.2195121
131
YeungH. H.WerkerJ. F. (2013). Lip movements affect infants’ audiovisual speech perception.Psychol. Sci.24603–612. 10.1177/0956797612458802
132
YuenI.DavisM. H.BrysbaertM.RastleK. (2010). Activation of articulatory information in speech perception.Proc. Natl. Acad. Sci. U.S.A.107592–597. 10.1073/pnas.0904774107
Summary
Keywords
speech perception, speech production, sensorimotor systems, infants, children
Citation
Guellaï B, Streri A and Yeung HH (2014) The development of sensorimotor influences in the audiovisual speech domain: some critical questions. Front. Psychol. 5:812. doi: 10.3389/fpsyg.2014.00812
Received
27 May 2014
Accepted
09 July 2014
Published
06 August 2014
Volume
5 - 2014
Edited by
Maya Gratier, Université Paris Ouest Nanterre La Défense, France
Reviewed by
Caroline Floccia, University of Plymouth, UK; Robin Panneton, Virginia Tech, USA
Copyright
© 2014 Guellaï, Streri and Yeung.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bahia Guellaï, Laboratoire Ethologie, Cognition, Développement, Université Paris Ouest Nanterre La Défense, 200, Avenue de la République, 92000 Nanterre, France, e-mail: bahia.guellai@gmail.com; H. Henny Yeung, Laboratoire Psychologie de la Perception, 45 rue des Saints-Pères, 75006 Paris, France, e-mail: henny.yeung@parisdescartes.fr
This article was submitted to Developmental Psychology, a section of the journal Frontiers in Psychology.