Mini Review ARTICLE

Front. Psychol., 06 August 2014 | https://doi.org/10.3389/fpsyg.2014.00812

The development of sensorimotor influences in the audiovisual speech domain: some critical questions

  • 1Laboratoire Ethologie, Cognition, Développement, Université Paris Ouest Nanterre La Défense, Nanterre, France
  • 2CNRS, Laboratoire Psychologie de la Perception, UMR 8242, Paris, France
  • 3Université Paris Descartes, Paris Sorbonne Cité, Paris, France

Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here, we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on this question is still in its preliminary stages, there is nevertheless increasing evidence that sensorimotor processes (defined here as any motor or proprioceptive process related to orofacial movements) affect developmental audiovisual speech processing. We suggest three areas on which to focus in future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.

Introduction

A unique property of speech—compared to other auditory signals—is that it is multisensory. Speech involves not only auditory, but also visual, motor, as well as proprioceptive information, since we produce speech by moving our articulators (i.e., the jaw, tongue, lips, etc.). Accordingly, many speech researchers postulated that articulatory gestures, rather than acoustic cues, were the primary objects of speech perception (Liberman et al., 1967; Liberman and Mattingly, 1985; Fowler, 1986, 1996; Best, 1995; Galantucci et al., 2006), and in recent years, vigorous debates about these ideas have continued (Scott et al., 2009; Pulvermüller and Fadiga, 2010; Schwartz et al., 2010; Hickok, 2014). Currently, proposals suggesting that articulatory input has an important role in auditory-only speech processing (Yuen et al., 2010; Möttönen et al., 2013, 2014) have been viewed by some as highly controversial (Lotto et al., 2009; McGettigan et al., 2010; Chevillet et al., 2013).

Somewhat less controversial is the discussion of speech production in the context of multisensory speech processing (Ojanen et al., 2005; Skipper et al., 2007a; Okada and Hickok, 2009; Treille et al., 2014). Just as visual influences on auditory speech processing have long been reported (e.g., Sumby and Pollack, 1954; see Navarra et al., 2012 for review), recent reports have also shown similar effects from articulatory information. For example, subjects’ own silent articulations (Sams et al., 2005; Sato et al., 2013; Scott et al., 2013) influence auditory perception in ways similar to seeing visual speech (although see Mochida et al., 2013). Moreover, receiving haptic or tactile input related to another person’s articulatory movements can also influence auditory speech processing (Fowler and Dekle, 1991; Gick et al., 2008; Gick and Derrick, 2009; Ito et al., 2009; Treille et al., 2014). Neuroimaging studies converge with these behavioral findings: For example, when visual-only or audiovisual speech is presented to subjects, activation is seen in primary auditory areas of the brain, such as the superior temporal sulcus (STS), and in areas traditionally associated with speech production, such as Broca’s area (Calvert et al., 1997; Calvert and Campbell, 2003; Ojanen et al., 2005; Pekkola et al., 2005). TMS studies have now shown that the perception of visual and audiovisual speech is linked to primary motor cortex (Sundara et al., 2001; Sato et al., 2010), and from this accumulated evidence, there is emerging consensus that visual speech processing is closely linked to internal models of the vocal tract (Santi et al., 2003; van Wassenhove et al., 2005; Skipper et al., 2007a, b; Okada and Hickok, 2009; Dick et al., 2010; Swaminathan et al., 2013).

Here, we present a discussion of how developmental work may contribute to this broader literature. Infancy and childhood are particularly interesting because speech perception versus speech production capabilities are largely asymmetric during this period (see for reviews Oller, 1980; Stark, 1980; Werker and Yeung, 2005; Gervain and Mehler, 2010; Stoel-Gammon, 2011; Werker et al., 2012). Nevertheless, infants sometimes show neurophysiological activation that belies their apparent deficits in production. For example, areas corresponding to Broca’s area are activated in response to auditory speech even in 6 month olds (Imada et al., 2006), and while this area is also activated in a variety of adult tasks (including ones not strictly about production, see Friederici, 2012), these infant data could potentially be interpreted as reflecting rudimentary perception-production loops.

In light of infants’ limitations in the speech production domain, we use sensorimotor as a general term that broadly encompasses motor and proprioceptive information related to both speech-like and non-speech orofacial gestures. We focus on three issues that we see as being particularly pressing for future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.

The Relation Between Audiovisual Speech Perception and Sensorimotor Processes at Birth

Infants receive filtered auditory input in the womb but necessarily do not experience audiovisual speech until birth. However, as soon as it can be measured, at least some basic aspects of audiovisual perception are already present. For example, newborns map abstract sensory and magnitude information across vision and audition (Meltzoff and Borton, 1979; Streri, 1993; de Hevia et al., 2014), and it also appears that newborns are particularly sensitive to audiovisual temporal synchrony (Slater et al., 1999). The precise origin of these interactions between vision and audition remains under debate (e.g., Bahrick et al., 2004; Maurer and Mondloch, 2004; Streri, 2012), but it is clear that intersensory correspondences are powerful in that they can influence attention and learning, as shown in classic studies with precocial birds (e.g., Lickliter et al., 2002). In human newborns, temporal synchrony between audition and vision plays an important role in matching monkey faces and voices (Lewkowicz et al., 2010), and newborns can also match human faces and voices under some circumstances (Aldridge et al., 1999), but further research showing the mechanisms driving this matching is needed. Here we define some critical issues with regard to the role of sensorimotor processes in audiovisual processing of speech- and speech-like stimuli at birth.

It is well established that newborns imitate faces at birth, suggesting early integration of vision and proprioception (e.g., Meltzoff and Moore, 1977, 1989), although it is important to note that this has been questioned on both empirical (Anisfeld, 1996) and interpretational grounds (Jones, 2007). Still, studies using speech stimuli converge with these results. For example, newborns produce more mouth openings when listening to /a/ versus /m/ sounds, and they produce more mouth closing when listening to /m/ versus /a/ sounds (Chen et al., 2004). However, future work will need to move beyond simple correspondences between sight, sound, and movement, and ask instead how such information interacts. For example, facial imitation at birth is more robust in the presence of congruent audiovisual speech: Infants produce more mouth-opening when presented with a face saying /a/, than with the face alone, or that face dubbed with an /i/ audio track (Coulon et al., 2013). A speculative interpretation is that congruent audiovisual speech constitutes more robust input to an internal model of the vocal tract, increasing the production of relevant mouth shapes.

Another question concerns specificity: can imitation also be elicited from auditory or visual models that are not identifiably human, and if so, what constraints on this system exist? For example, previous work has suggested preferential processing of speech stimuli over white noise (Colombo and Bundy, 1981) and synthetic analogs of speech (Vouloumanos and Werker, 2004, 2007). However, in a striking set of studies, a preference for human over monkey vocalizations was not found at birth, but was found at 3 months of age (Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010). Together, these data suggest evolutionary constraints on auditory preferences, and in turn, raise questions about the imitation studies above. Will infants produce more facial gestures in response to human versus non-human (or non-mammalian) auditory, visual, and audiovisual models? What attentional and/or evolutionary factors might drive such effects?

Finally, future research must examine the functioning of sensorimotor and perceptual systems in a more precise manner, and in more naturalistic situations. For example, recent work suggests that newborns are highly sensitive to both rigid (i.e., whole-head) and non-rigid movements (i.e., facial features) of a talking face (Guellaï et al., 2011). Do newborns privilege one type of feature over the other when imitating (see also Meltzoff and Moore, 1989)? Previous work has also shown that newborns are more sensitive to talking faces with direct versus averted gaze (Guellaï and Streri, 2011), suggesting that foundational aspects of social communication may exist at birth. However, it remains unclear how facial imitation may change with social gaze.

Pathways Through Which Sensorimotor Influences Interact with Audiovisual Speech Processing in Infancy

After the neonatal period, older infants continue to perceive audiovisual speech robustly. This has commonly been shown using a cross-modal matching procedure, where 2–4 month-olds are presented with side-by-side faces articulating the two visual vowels ([i] and [a]), accompanied by a single speech sound (either /i/ or /a/) in synchrony with both faces. Infants look longer at the matching face, showing an ability to associate vowels with the corresponding articulation (Kuhl and Meltzoff, 1982, 1984; MacKain et al., 1983; Patterson and Werker, 1999, 2002, 2003; Yeung and Werker, 2013). The effects of congruent versus incongruent audiovisual speech are also evident in a variety of other behavioral paradigms (Rosenblum et al., 1997; Burnham and Dodd, 2004; Desjardins and Werker, 2004; Pons et al., 2009; Tomalski et al., 2012; Kubicek et al., 2014; Pons and Lewkowicz, 2014), as well as in electrophysiological recordings (Kushnerenko et al., 2008; Bristow et al., 2009). A few recent papers have also begun to test audiovisual matching with fluent streams of speech (instead of just vowels or consonants; Lewkowicz and Pons, 2013; Kubicek et al., 2014), suggesting that audiovisual matching abilities in infancy can be very broad.

What about the mechanisms driving audiovisual speech perception? As mentioned above, infants at birth detect subtle differences in temporal synchrony between auditory and visual channels (Lewkowicz et al., 2010), and this is true of older infants as well (Lewkowicz, 1996, 2010). It could be that intersensory redundancy facilitates the detection of amodal properties related to vowel identity. Indeed, previous research has already shown that intersensory redundancy can enhance the detection of other kinds of amodal properties from faces (e.g., emotional affect; Flom and Bahrick, 2007), but at the cost of processing unimodal features (e.g., face identity; Bahrick et al., 2013). Together, this work suggests that synchrony detection may enhance amodal aspects of audiovisual speech (e.g., Bahrick et al., 2004).

An alternative proposal suggests that audiovisual speech information is mapped using sensorimotor information, perhaps via an internal model of the vocal tract (Kuhl and Meltzoff, 1984, 1988; Kent and Vorperian, 2007; Yeung and Werker, 2013). Several lines of evidence are suggestive of this sensorimotor mechanism: first, audiovisual matching with non-speech stimuli is often less robust than with speech (Kuhl and Meltzoff, 1984; Kuhl et al., 1991), particularly at later points in development (Lewkowicz and Ghazanfar, 2006), which suggests that audiovisual perception becomes more speech specific with age. Second, just as in newborns (Coulon et al., 2013), older infants also produce more congruent mouth shapes when hearing audiovisually congruent vowels compared to incongruent vowels (Legerstee, 1990; Kuhl and Meltzoff, 1996; Patterson and Werker, 1999). A recent report further shows that infants making /i/-like lip movements while chewing on a teething ring, or /u/-like lip movements while sucking on a pacifier, could no longer achieve audiovisual speech matching if the heard vowel was similar to the achieved lip shape (Yeung and Werker, 2013). This suggests that direct activation of the motor system can indeed affect audiovisual speech perception, and it is strongly suggestive of sensorimotor influences.

Together, this work raises two critical areas of future research. First, these dueling approaches must be reconciled: Are auditory and visual speech bound together by temporal synchrony cues, or is there some internal model of the vocal tract that accomplishes this mapping? A third alternative is that two separate modes of audiovisual processing will be identified. For example, recent work has suggested that synchrony detection in 5 month-old infants uses a fast and automatic pathway which could be similar to the kind of adult audiovisual pathways that activate the STS and its associated networks (Hyde et al., 2011). More work is needed to see whether a slower, higher level pathway can also be distinguished, and if this pathway also taps sensorimotor information.

A second question concerns the definition of orofacial movements in infancy. Some work suggests that early vocalizations can already be considered speech-like: Cooing and babbling are influenced by the phonological properties of the native language (De Boysson-Bardies et al., 1989; Ruzza et al., 2006; Whalen et al., 2007), and are argued to be continuous with the first productions of words (de Boysson-Bardies and Vihman, 1991; Vihman, 1991; McCune and Vihman, 2001). Infant vocalizations also change in response to socially contingent responses from mothers, whether manipulated in the laboratory (Goldstein and Schwade, 2008), or measured during free play (Gros-Louis et al., 2014). Other work has even suggested that babbling capacities act as an attentional filter on auditory speech perception, modulating preferences to listen to words that either share or do not share commonalities between what is produced in babbling and in one’s early words (DePaolis et al., 2011, 2013; Majorano et al., 2014). At the same time, other research argues instead that universal constraints on the motor system (not specific to speech) play an equally important role in structuring how babbling is produced (MacNeilage and Davis, 1993; Lee et al., 2010). Moreover, coordinative movements differ when infants speak, babble, suck, or chew, suggesting that the physical mechanisms underlying babbling are not continuous with later speech motor control (Steeve, 2010).

In conjunction with the results from Yeung and Werker (2013), which demonstrate an effect of non-speech movements, the above debate shows how difficult it is to define what counts as an articulatory (i.e., speech-like) gesture, which in turn makes it hard to speculate about how an internal model of the vocal tract might be structured in early development (although see Ménard et al., 2007; Howard and Messum, 2011). Future research postulating a sensorimotor pathway in infancy will need to bear this literature in mind. One intriguing possibility is that distinctions between “speech-like” or “non-speech-like” may not be important at all (at least in early development): For example, infants have more difficulties matching auditory whistles to visual faces that are whistling than auditory trills to visual faces that are trilling. One speculative reason for this asymmetry could be that infants produce bilabial trills, but do not yet produce whistles (Mugitani et al., 2008).

Developmental Change as Speech Production Becomes More Varied and Sophisticated

Of course, the development of perceptual and motor systems continues well beyond infancy. For example, previous reports show that children (up to the age of 10) increasingly weight visual speech information more heavily in cases of sensory conflict or ambiguity (McGurk and MacDonald, 1976; Massaro, 1984; Massaro et al., 1986; Wightman et al., 2006; van Linden and Vroomen, 2008; Barutchu et al., 2010; Ross et al., 2011). It seems likely that multiple mechanisms drive this developmental change: For example, Sekiyama and Burnham (2008) find cross-cultural differences, which are likely unrelated to differences in motor ability. Nevertheless, there is also some correlational evidence supporting a sensorimotor pathway: children who have greater trouble articulating consonants show less sensitivity to visual speech information (Desjardins et al., 1997), as is also the case for children with broader language deficits (Bergeson et al., 2005; Dodd et al., 2008).

Other studies provide further evidence for multiple pathways emerging in childhood that are reminiscent of adult models (e.g., Skipper et al., 2007b; Okada and Hickok, 2009; Hickok et al., 2011). For example, while visual speech is more heavily weighted throughout childhood, non-speech audiovisual processing is relatively stable (Tremblay et al., 2007; although see Hillock et al., 2011). Such divergent trajectories suggest that two kinds of audiovisual binding mechanisms may be dissociated. Neurophysiological evidence for that dissociation comes from a study revealing two separable electrophysiological measures: amplitude versus latency of the commonly evoked N1/P2 complex to audiovisual speech (Knowland et al., 2014). Critically, only amplitude changes in development, while latency remains stable. Additional evidence comes from functional imaging studies, which suggest two networks related to audiovisual binding of speech stimuli: One network is centered around primary auditory areas, while a second network involves inferior frontal areas (Dick et al., 2010; Nath et al., 2011). Developmental change in audiovisual speech processing correlates with changes in connectivity between these networks (Dick et al., 2010).

Together, these findings are highly suggestive of at least two distinct pathways in the brain that support audiovisual speech processing. A preliminary conjecture is that multiple pathways might be distinguished based on their developmental characteristics (stable, or increasing), their selectivity (to speech, or to many kinds of signals), and their mechanisms (depending on intersensory redundancy, or depending on an internal articulatory model). Critical lines of future research will need to investigate these hypotheses.

Conclusion

Speech perception is one of the most deeply explored aspects of language development. However, as this review highlights, several aspects of this phenomenon remain mysterious: in particular, the relation between speech perception and production. Here, we examine possible sensorimotor influences in multisensory speech processing, highlighting three areas for future research that will bridge between debates in the adult literature and emerging work in development.

First, we suggest that future research must examine the link between imitation and audiovisual speech perception at birth, and explore interactions among vision, audition, and the motor system. Second, we highlight two potential pathways involved in audiovisual speech perception in older infants, one of which may depend on sensorimotor information. Third, we illustrate the need to elucidate the behavioral and neural characteristics of these pathways in children, as speech production becomes more sophisticated.

Conflict of Interest Statement

The Guest Associate Editor Maya Gratier declares that, despite being affiliated to the same institution as author Bahia Guellaï, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors thank Rana Esseily and Maya Gratier for inviting them to contribute to this special issue.

References

Aldridge, M. A., Braga, E. S., Walton, G. E., and Bower, T. G. R. (1999). The intermodal representation of speech in newborns. Dev. Sci. 2, 42–46. doi: 10.1111/1467-7687.00052

Anisfeld, M. (1996). Only tongue protrusion modeling is matched by neonates. Dev. Rev. 16, 149–161. doi: 10.1006/drev.1996.0006

Bahrick, L. E., Lickliter, R., and Castellanos, I. (2013). The development of face perception in infancy: intersensory interference and unimodal visual facilitation. Dev. Psychol. 49, 1919–1930. doi: 10.1037/a0031238

Bahrick, L. E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of selective attention, perception, and cognition in infancy. Curr. Dir. Psychol. Sci. 13, 99–102. doi: 10.1111/j.0963-7214.2004.00283.x

Barutchu, A., Danaher, J., Crewther, S. G., Innes-Brown, H., Shivdasani, M. N., and Paolini, A. G. (2010). Audiovisual integration in noise by children and adults. J. Exp. Child Psychol. 105, 38–50. doi: 10.1016/j.jecp.2009.08.005

Bergeson, T. R., Pisoni, D. B., and Davis, R. A. O. (2005). Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants. Ear. Hear. 26, 149–164. doi: 10.1097/00003446-200504000-00004

Best, C. T. (1995). “A direct realist view of cross-language speech perception,” in Speech Perception and Linguistic Experience: Issues in Cross-language Speech Research, ed. W. Strange (Timonium, MD: York Press), 171–204.

Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., et al. (2009). Hearing faces: how the infant brain matches the face it sees with the speech it hears. J. Cogn. Neurosci. 21, 905–921. doi: 10.1162/jocn.2009.21076

Burnham, D., and Dodd, B. (2004). Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect. Dev. Psychobiol. 45, 204–220. doi: 10.1002/dev.20032

Calvert, G. A., Bullmore, E. T., Brammer, M. J., and Campbell, R. (1997). Activation of auditory cortex during silent lipreading. Science 276, 593–596. doi: 10.1126/science.276.5312.593

Calvert, G. A., and Campbell, R. (2003). Reading speech from still and moving faces: the neural substrates of visible speech. J. Cogn. Neurosci. 15, 57–70. doi: 10.1162/089892903321107828

Chen, X., Striano, T., and Rakoczy, H. (2004). Auditory-oral matching behavior in newborns. Dev. Sci. 7, 42–47. doi: 10.1111/j.1467-7687.2004.00321.x

Chevillet, M. A., Jiang, X., Rauschecker, J. P., and Riesenhuber, M. (2013). Automatic phoneme category selectivity in the dorsal auditory stream. J. Neurosci. 33, 5208–5215. doi: 10.1523/JNEUROSCI.1870-12.2013

Colombo, J., and Bundy, R. S. (1981). A method for the measurement of infant auditory selectivity. Infant Behav. Dev. 4, 219–223. doi: 10.1016/S0163-6383(81)80025-2

Coulon, M., Hemimou, C., and Streri, A. (2013). Effects of seeing and hearing vowels on neonatal facial imitation. Infancy 18, 782–796. doi: 10.1111/infa.12001

De Boysson-Bardies, B., Hallé, P. A., Sagart, L., and Durand, C. (1989). A crosslinguistic investigation of vowel formants in babbling. J. Child Lang. 16, 1–17. doi: 10.1017/S0305000900013404

de Boysson-Bardies, B., and Vihman, M. M. (1991). Adaptation to language: evidence from babbling and first words in four languages. Language 67, 297–319. doi: 10.1353/lan.1991.0045

de Hevia, M. D., Izard, V., Coubart, A., Spelke, E. S., and Streri, A. (2014). Representations of space, time, and number in neonates. Proc. Natl. Acad. Sci. U.S.A. 111, 4809–4813. doi: 10.1073/pnas.1323628111

DePaolis, R. A., Vihman, M. M., and Keren-Portnoy, T. (2011). Do production patterns influence the processing of speech in prelinguistic infants? Infant Behav. Dev. 34, 590–601. doi: 10.1016/j.infbeh.2011.06.005

DePaolis, R. A., Vihman, M. M., and Nakai, S. (2013). The influence of babbling patterns on the processing of speech. Infant Behav. Dev. 36, 642–649. doi: 10.1016/j.infbeh.2013.06.007

Desjardins, R. N., Rogers, J., and Werker, J. F. (1997). An exploration of why preschoolers perform differently than do adults in audiovisual speech perception tasks. J. Exp. Child Psychol. 66, 85–110. doi: 10.1006/jecp.1997.2379

Desjardins, R. N., and Werker, J. F. (2004). Is the integration of heard and seen speech mandatory for infants? Dev. Psychobiol. 45, 187–203. doi: 10.1002/dev.20033

Dick, A. S., Solodkin, A., and Small, S. L. (2010). Neural development of networks for audiovisual speech comprehension. Brain Lang. 114, 101–114. doi: 10.1016/j.bandl.2009.08.005

Dodd, B., McIntosh, B., Erdener, D., and Burnham, D. (2008). Perception of the auditory-visual illusion in speech perception by children with phonological disorders. Clin. Linguist. Phon. 22, 69–82. doi: 10.1080/02699200701660100

Flom, R., and Bahrick, L. E. (2007). The development of infant discrimination of affect in multimodal and unimodal stimulation: the role of intersensory redundancy. Dev. Psychol. 43, 238–252. doi: 10.1037/0012-1649.43.1.238

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. J. Phon. 14, 3–28.

Fowler, C. A. (1996). Listeners do hear sounds, not tongues. J. Acoust. Soc. Am. 99, 1730–1741. doi: 10.1121/1.415237

Fowler, C. A., and Dekle, D. J. (1991). Listening with eye and hand: cross-modal contributions to speech perception. J. Exp. Psychol. Hum. Percept. Perform. 17, 816–823. doi: 10.1037/0096-1523.17.3.816

Friederici, A. D. (2012). The cortical language circuit: from auditory perception to sentence comprehension. Trends Cogn. Sci. 16, 262–268. doi: 10.1016/j.tics.2012.04.001

Galantucci, B., Fowler, C. A., and Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychon. Bull. Rev. 13, 361–377. doi: 10.3758/BF03193857

Gervain, J., and Mehler, J. (2010). Speech perception and language acquisition in the first year of life. Annu. Rev. Psychol. 61, 191–218. doi: 10.1146/annurev.psych.093008.100408

Gick, B., and Derrick, D. (2009). Aero-tactile integration in speech perception. Nature 462, 502–504. doi: 10.1038/nature08572

Gick, B., Jóhannsdóttir, K. M., Gibraiel, D., and Mühlbauer, J. (2008). Tactile enhancement of auditory and visual speech perception in untrained perceivers. J. Acoust. Soc. Am. 123, EL72–EL76. doi: 10.1121/1.2884349

Goldstein, M. H., and Schwade, J. A. (2008). Social feedback to infants’ babbling facilitates rapid phonological learning. Psychol. Sci. 19, 515–523. doi: 10.1111/j.1467-9280.2008.02117.x

Gros-Louis, J., West, M. J., and King, A. P. (2014). Maternal responsiveness and the development of directed vocalizing in social interactions. Infancy 19, 385–408. doi: 10.1111/infa.12054

Guellaï, B., Coulon, M., and Streri, A. (2011). The role of motion and speech in face recognition at birth. Vis. Cogn. 19, 1212–1233. doi: 10.1080/13506285.2011.620578

Guellaï, B., and Streri, A. (2011). Cues for early social skills: direct gaze modulates newborns’ recognition of talking faces. PLoS ONE 6:e18610. doi: 10.1371/journal.pone.0018610

Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Lang. Cogn. Neurosci. 29, 2–20. doi: 10.1080/01690965.2013.834370

Hickok, G., Houde, J., and Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 69, 407–422. doi: 10.1016/j.neuron.2011.01.019

Hillock, A. R., Powers, M. R., and Wallace, M. T. (2011). Binding of sights and sounds: age-related changes in multisensory temporal processing. Neuropsychologia 49, 461–467. doi: 10.1016/j.neuropsychologia.2010.11.041

Howard, I. S., and Messum, P. (2011). Modeling the development of pronunciation in infant speech acquisition. Motor Control 15, 85–117.

Hyde, D. C., Jones, B. L., Flom, R., and Porter, C. L. (2011). Neural signatures of face-voice synchrony in 5-month-old human infants. Dev. Psychobiol. 53, 359–370. doi: 10.1002/dev.20525

Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., and Kuhl, P. K. (2006). Infant speech perception activates Broca’s area: a developmental magnetoencephalography study. Neuroreport 17, 957–962. doi: 10.1097/01.wnr.0000223387.51704.89

Ito, T., Tiede, M., and Ostry, D. J. (2009). Somatosensory function in speech perception. Proc. Natl. Acad. Sci. U.S.A. 106, 1245–1248. doi: 10.1073/pnas.0810063106

Jones, S. S. (2007). Imitation in infancy: the development of mimicry. Psychol. Sci. 18, 593–599. doi: 10.1111/j.1467-9280.2007.01945.x

Kent, R. D., and Vorperian, H. K. (2007). “In the mouths of babes: anatomic, motor, and sensory foundations of speech development in children,” in Language Disorders from a Developmental Perspective, ed. R. Paul (Mahwah, NJ: Lawrence Erlbaum), 55–81.

Knowland, V. C. P., Mercure, E., Karmiloff-Smith, A., Dick, F., and Thomas, M. S. C. (2014). Audio-visual speech perception: a developmental ERP investigation. Dev. Sci. 17, 110–124. doi: 10.1111/desc.12098

Kubicek, C., Hillairet de Boisferon, A., Dupierrix, E., Pascalis, O., Loevenbruck, H., Gervain, J., et al. (2014). Cross-modal matching of audio-visual German and French fluent speech in infancy. PLoS ONE 9:e89275. doi: 10.1371/journal.pone.0089275

Kuhl, P. K., and Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science 218, 1138–1141. doi: 10.1126/science.7146899

Kuhl, P. K., and Meltzoff, A. N. (1984). The intermodal representation of speech in infants. Infant Behav. Dev. 7, 361–381. doi: 10.1016/S0163-6383(84)80050-8

Kuhl, P. K., and Meltzoff, A. N. (1988). “Speech as an intermodal object of perception,” in Perceptual Development in Infancy. The Minnesota Symposia on Child Psychology, Vol. 20, ed. A. Yonas (Hillsdale, NJ: Lawrence Erlbaum), 235–266.

Kuhl, P. K., and Meltzoff, A. N. (1996). Infant vocalizations in response to speech: vocal imitation and developmental change. J. Acoust. Soc. Am. 100, 2425–2438. doi: 10.1121/1.417951

Kuhl, P. K., Williams, K. A., and Meltzoff, A. N. (1991). Cross-modal speech perception in adults and infants using nonspeech auditory stimuli. J. Exp. Psychol. Hum. Percept. Perform. 17, 829–840. doi: 10.1037/0096-1523.17.3.829

Kushnerenko, E., Teinonen, T., Volein, A., and Csibra, G. (2008). Electrophysiological evidence of illusory audiovisual speech percept in human infants. Proc. Natl. Acad. Sci. U.S.A. 105, 11442–11445. doi: 10.1073/pnas.0804275105

Lee, S., Davis, B., and MacNeilage, P. (2010). Universal production patterns and ambient language influences in babbling: a cross-linguistic study of Korean- and English-learning infants. J. Child Lang. 37, 26. doi: 10.1017/S0305000909009532

Legerstee, M. (1990). Infant use of multimodal information to imitate speech sounds. Infant Behav. Dev. 13, 343–354. doi: 10.1016/0163-6383(90)90039-B

Lewkowicz, D. J. (1996). Perception of auditory–visual temporal synchrony in human infants. J. Exp. Psychol. Hum. Percept. Perform. 22, 1094–1106. doi: 10.1037/0096-1523.22.5.1094

Lewkowicz, D. J. (2010). Infant perception of audio-visual speech synchrony. Dev. Psychol. 46, 66–77. doi: 10.1037/a0015579

Lewkowicz, D. J., and Ghazanfar, A. A. (2006). The decline of cross-species intersensory perception in human infants. Proc. Natl. Acad. Sci. U.S.A. 103, 6771–6774. doi: 10.1073/pnas.0602027103

Lewkowicz, D. J., Leo, I., and Simion, F. (2010). Intersensory perception at birth: newborns match nonhuman primate faces and voices. Infancy 15, 46–60. doi: 10.1111/j.1532-7078.2009.00005.x

Lewkowicz, D. J., and Pons, F. (2013). Recognition of amodal language identity emerges in infancy. Int. J. Behav. Dev. 37, 90–94. doi: 10.1177/0165025412467582

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychol. Rev. 74, 431–461. doi: 10.1037/h0020279

Liberman, A. M., and Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition 21, 1–36. doi: 10.1016/0010-0277(85)90021-6

Lickliter, R., Bahrick, L. E., and Honeycutt, H. (2002). Intersensory redundancy facilitates prenatal perceptual learning in bobwhite quail (Colinus virginianus) embryos. Dev. Psychol. 38, 15–23. doi: 10.1037//0012-1649.38.1.15

Lotto, A. J., Hickok, G., and Holt, L. L. (2009). Reflections on mirror neurons and speech perception. Trends Cogn. Sci. 13, 110–114. doi: 10.1016/j.tics.2008.11.008

MacKain, K., Studdert-Kennedy, M., Spieker, S., and Stern, D. (1983). Infant intermodal speech perception is a left-hemisphere function. Science 219, 1347–1349. doi: 10.1126/science.6828865

MacNeilage, P. F., and Davis, B. L. (1993). “Motor explanations of babbling and early speech patterns,” in Developmental Neurocognition: Speech and Face Processing in the First Year of Life, Vol. 69, eds B. de Boysson-Bardies, S. de Schonen, P. W. Jusczyk, P. F. MacNeilage, and J. Morton (Norwell, MA: Kluwer), 341–352.

Majorano, M., Vihman, M. M., and DePaolis, R. A. (2014). The relationship between infants’ production experience and their processing of speech. Lang. Learn. Dev. 10, 179–204. doi: 10.1080/15475441.2013.829740

Massaro, D. W. (1984). Children’s perception of visual and auditory speech. Child Dev. 55, 1777–1788. doi: 10.2307/1129925

Massaro, D. W., Thompson, L. A., Barron, B., and Laren, E. (1986). Developmental changes in visual and auditory contributions to speech perception. J. Exp. Child Psychol. 41, 93–113. doi: 10.1016/0022-0965(86)90053-6

Maurer, D., and Mondloch, C. J. (2004). “Neonatal synesthesia: a reevaluation,” in Synesthesia: Perspectives from Cognitive Neuroscience, eds L. C. Robertson and N. Sagiv (Oxford: Oxford University Press), 193–213.

McCune, L., and Vihman, M. M. (2001). Early phonetic and lexical development: a productivity approach. J. Speech Lang. Hear Res. 44, 670–684. doi: 10.1044/1092-4388(2001/054)

McGettigan, C., Agnew, Z. K., and Scott, S. K. (2010). Are articulatory commands automatically and involuntarily activated during speech perception? Proc. Natl. Acad. Sci. U.S.A. 107:E42. doi: 10.1073/pnas.1000186107

McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature 264, 746–748. doi: 10.1038/264746a0

Meltzoff, A. N., and Borton, R. W. (1979). Intermodal matching by human neonates. Nature 282, 403–404. doi: 10.1038/282403a0

Meltzoff, A. N., and Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science 198, 75–78. doi: 10.1126/science.198.4312.75

Meltzoff, A. N., and Moore, M. K. (1989). Imitation in newborn infants: exploring the range of gestures imitated and the underlying mechanisms. Dev. Psychol. 25, 954–962. doi: 10.1037/0012-1649.25.6.954

Ménard, L., Schwartz, J.-L., Boë, L.-J., and Aubin, J. (2007). Articulatory–acoustic relationships during vocal tract growth for French vowels: analysis of real data and simulations with an articulatory model. J. Phon. 35, 1–19. doi: 10.1016/j.wocn.2006.01.003

Mochida, T., Kimura, T., Hiroya, S., Kitagawa, N., Gomi, H., and Kondo, T. (2013). Speech misperception: speaking and seeing interfere differently with hearing. PLoS ONE 8:e68619. doi: 10.1371/journal.pone.0068619

Möttönen, R., Dutton, R., and Watkins, K. E. (2013). Auditory-motor processing of speech sounds. Cereb. Cortex 23, 1190–1197. doi: 10.1093/cercor/bhs110

Möttönen, R., van de Ven, G. M., and Watkins, K. E. (2014). Attention fine-tunes auditory-motor processing of speech sounds. J. Neurosci. 34, 4064–4069. doi: 10.1523/JNEUROSCI.2214-13.2014

Mugitani, R., Kobayashi, T., and Hiraki, K. (2008). Audiovisual matching of lips and non-canonical sounds in 8-month-old infants. Infant Behav. Dev. 31, 307–310. doi: 10.1016/j.infbeh.2007.12.002

Nath, A. R., Fava, E. E., and Beauchamp, M. S. (2011). Neural correlates of interindividual differences in children’s audiovisual speech perception. J. Neurosci. 31, 13963–13971. doi: 10.1523/JNEUROSCI.2605-11.2011

Navarra, J., Yeung, H. H., Werker, J. F., and Soto-Faraco, S. (2012). “Multisensory interactions in speech perception,” in The New Handbook of Multisensory Processing, ed. B. E. Stein (Cambridge, MA: MIT Press), 435–452.

Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I. P., Joensuu, R., Autti, T., et al. (2005). Processing of audiovisual speech in Broca’s area. Neuroimage 25, 333–338. doi: 10.1016/j.neuroimage.2004.12.001

Okada, K., and Hickok, G. (2009). Two cortical mechanisms support the integration of visual and auditory speech: a hypothesis and preliminary data. Neurosci. Lett. 452, 219–223. doi: 10.1016/j.neulet.2009.01.060

Oller, D. K. (1980). “The emergence of the sounds of speech in infancy,” in Child Phonology, Vol. 1, Production, eds G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (New York: Academic Press), 92–112.

Patterson, M. L., and Werker, J. F. (1999). Matching phonetic information in lips and voice is robust in 4.5-month-old infants. Infant Behav. Dev. 22, 237–247. doi: 10.1016/S0163-6383(99)00003-X

Patterson, M. L., and Werker, J. F. (2002). Infants’ ability to match dynamic phonetic and gender information in the face and voice. J. Exp. Child Psychol. 81, 93–115. doi: 10.1006/jecp.2001.2644

Patterson, M. L., and Werker, J. F. (2003). Two-month-old infants match phonetic information in lips and voice. Dev. Sci. 6, 191–196. doi: 10.1111/1467-7687.00271

Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I. P., Möttönen, R., Tarkiainen, A., et al. (2005). Primary auditory cortex activation by visual speech: an fMRI study at 3 T. Neuroreport 16, 125. doi: 10.1097/00001756-200502080-00010

Pons, F., and Lewkowicz, D. J. (2014). Infant perception of audio-visual speech synchrony in familiar and unfamiliar fluent speech. Acta Psychol. 149, 142–147. doi: 10.1016/j.actpsy.2013.12.013

Pons, F., Lewkowicz, D. J., Soto-Faraco, S., and Sebastián-Gallés, N. (2009). Narrowing of intersensory speech perception in infancy. Proc. Natl. Acad. Sci. U.S.A. 106, 10598–10602. doi: 10.1073/pnas.0904134106

Pulvermüller, F., and Fadiga, L. (2010). Active perception: sensorimotor circuits as a cortical basis for language. Nat. Rev. Neurosci. 11, 351–360. doi: 10.1038/nrn2811

Rosenblum, L. D., Schmuckler, M. A., and Johnson, J. A. (1997). The McGurk effect in infants. Percept. Psychophys. 59, 347–357. doi: 10.3758/BF03211902

Ross, L. A., Molholm, S., Blanco, D., Gomez-Ramirez, M., Saint-Amour, D., and Foxe, J. J. (2011). The development of multisensory speech perception continues into the late childhood years. Eur. J. Neurosci. 33, 2329–2337. doi: 10.1111/j.1460-9568.2011.07685.x

Ruzza, B., Rocca, F., Boero, D. L., and Lenti, C. (2006). Investigating the musical qualities of early infant sounds. Ann. N. Y. Acad. Sci. 999, 527–529. doi: 10.1196/annals.1284.066

Sams, M., Möttönen, R., and Sihvonen, T. (2005). Seeing and hearing others and oneself talk. Cogn. Brain Res. 23, 429–435. doi: 10.1016/j.cogbrainres.2004.11.006

Santi, A., Servos, P., Vatikiotis-Bateson, E., Kuratate, T., and Munhall, K. (2003). Perceiving biological motion: dissociating visible speech from walking. J. Cogn. Neurosci. 15, 800–809. doi: 10.1162/089892903322370726

Sato, M., Buccino, G., Gentilucci, M., and Cattaneo, L. (2010). On the tip of the tongue: modulation of the primary motor cortex during audiovisual speech perception. Speech Commun. 52, 533–541. doi: 10.1016/j.specom.2009.12.004

Sato, M., Troille, E., Ménard, L., Cathiard, M.-A., and Gracco, V. (2013). Silent articulation modulates auditory and audiovisual speech perception. Exp. Brain Res. 227, 275–288. doi: 10.1007/s00221-013-3510-8

Schwartz, J.-L., Basirat, A., Ménard, L., and Sato, M. (2010). The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception. J. Neurolinguistics 25, 336–354. doi: 10.1016/j.jneuroling.2009.12.004

Scott, M., Yeung, H. H., Gick, B. W., and Werker, J. F. (2013). Inner speech captures the perception of external speech. J. Acoust. Soc. Am. 133, EL286–EL292. doi: 10.1121/1.4794932

Scott, S. K., McGettigan, C., and Eisner, F. (2009). A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nat. Rev. Neurosci. 10, 295–302. doi: 10.1038/nrn2603

Sekiyama, K., and Burnham, D. (2008). Impact of language on development of auditory-visual speech perception. Dev. Sci. 11, 306–320. doi: 10.1111/j.1467-7687.2008.00677.x

Shultz, S., and Vouloumanos, A. (2010). Three-month-olds prefer speech to other naturally occurring signals. Lang. Learn. Dev. 6, 241–257. doi: 10.1080/15475440903507830

Skipper, J. I., Goldin-Meadow, S., Nusbaum, H. C., and Small, S. L. (2007a). Speech-associated gestures, Broca’s area, and the human mirror system. Brain Lang. 101, 260–277. doi: 10.1016/j.bandl.2007.02.008

Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., and Small, S. L. (2007b). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cereb. Cortex 17, 2387–2399. doi: 10.1093/cercor/bhl147

Slater, A., Brown, E., Hayes, R., and Quinn, P. C. (1999). Intermodal perception at birth: intersensory redundancy guides newborn infants’ learning of arbitrary auditory–visual pairings. Dev. Sci. 2, 333. doi: 10.1111/1467-7687.00079

Stark, R. E. (1980). “Stages of speech development in the first year of life,” in Child Phonology, Vol. 1, Production, eds G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (New York: Academic Press), 73–92.

Steeve, R. W. (2010). Babbling and chewing: jaw kinematics from 8 to 22 months. J. Phon. 38, 445–458. doi: 10.1016/j.wocn.2010.05.001

Stoel-Gammon, C. (2011). Relationships between lexical and phonological development in young children. J. Child Lang. 38, 1–34. doi: 10.1017/S0305000910000425

Streri, A. (1993). Seeing, Reaching, Touching: The Relations between Vision and Touch in Infancy, T. Pownall and S. Kingerlee (trans.) (Cambridge, MA: MIT Press).

Streri, A. (2012). “Crossmodal interactions in the human newborn: new answers to Molyneux’s question,” in Multisensory Development, eds A. J. Bremner, D. J. Lewkowicz, and C. Spence (Oxford: Oxford University Press), 88–112.

Sumby, W. H., and Pollack, I. (1954). Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am. 26, 212–215. doi: 10.1121/1.1907309

Sundara, M., Kumar Namasivayam, A., and Chen, R. (2001). Observation–execution matching system for speech: a magnetic stimulation study. Neuroreport 12, 1341–1344. doi: 10.1097/00001756-200105250-00010

Swaminathan, S., MacSweeney, M., Boyles, R., Waters, D., Watkins, K. E., and Möttönen, R. (2013). Motor excitability during visual perception of known and unknown spoken languages. Brain Lang. 126, 1–7. doi: 10.1016/j.bandl.2013.03.002

Tomalski, P., Ribeiro, H., Ballieux, H., Axelsson, E. L., Murphy, E., Moore, D. G., et al. (2012). Exploring early developmental changes in face scanning patterns during the perception of audiovisual mismatch of speech cues. Eur. J. Dev. Psychol. 1, 1–14. doi: 10.1080/17405629.2012.728076

Treille, A., Cordeboeuf, C., Vilain, C., and Sato, M. (2014). Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions. Neuropsychologia 57, 71–77. doi: 10.1016/j.neuropsychologia.2014.02.004

Tremblay, C., Champoux, F., Voss, P., Bacon, B. A., Lepore, F., and Théoret, H. (2007). Speech and non-speech audio-visual illusions: a developmental study. PLoS ONE 2:e742. doi: 10.1371/journal.pone.0000742

van Linden, S., and Vroomen, J. (2008). Audiovisual speech recalibration in children. J. Child Lang. 35, 809–822. doi: 10.1017/S0305000908008817

van Wassenhove, V., Grant, K. W., and Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proc. Natl. Acad. Sci. U.S.A. 102, 1181–1186. doi: 10.1073/pnas.0408949102

Vihman, M. M. (1991). “Ontogeny of phonetic gestures: speech production,” in Modularity and the Motor Theory of Speech Perception, eds I. G. Mattingly and M. Studdert-Kennedy (Hillsdale, NJ: Lawrence Erlbaum), 69–84.

Vouloumanos, A., Hauser, M. D., Werker, J. F., and Martin, A. (2010). The tuning of human neonates’ preference for speech. Child Dev. 81, 517–527. doi: 10.1111/j.1467-8624.2009.01412.x

Vouloumanos, A., and Werker, J. F. (2004). Tuned to the signal: the privileged status of speech for young infants. Dev. Sci. 7, 270–276. doi: 10.1111/j.1467-7687.2004.00345.x

Vouloumanos, A., and Werker, J. F. (2007). Listening to language at birth: evidence for a bias for speech in neonates. Dev. Sci. 10, 159–164. doi: 10.1111/j.1467-7687.2007.00549.x

Werker, J. F., and Yeung, H. H. (2005). Infant speech perception bootstraps word learning. Trends Cogn. Sci. 9, 519–527. doi: 10.1016/j.tics.2005.09.003

Werker, J. F., Yeung, H. H., and Yoshida, K. A. (2012). How do infants become experts at native-speech perception? Curr. Dir. Psychol. Sci. 21, 221–226. doi: 10.1177/0963721412449459

Whalen, D. H., Levitt, A. G., and Goldstein, L. M. (2007). VOT in the babbling of French- and English-learning infants. J. Phon. 35, 341–352. doi: 10.1016/j.wocn.2006.10.001

Wightman, F., Kistler, D., and Brungart, D. (2006). Informational masking of speech in children: auditory-visual integration. J. Acoust. Soc. Am. 119, 3940–3949. doi: 10.1121/1.2195121

Yeung, H. H., and Werker, J. F. (2013). Lip movements affect infants’ audiovisual speech perception. Psychol. Sci. 24, 603–612. doi: 10.1177/0956797612458802

Yuen, I., Davis, M. H., Brysbaert, M., and Rastle, K. (2010). Activation of articulatory information in speech perception. Proc. Natl. Acad. Sci. U.S.A. 107, 592–597. doi: 10.1073/pnas.0904774107

Keywords: speech perception, speech production, sensorimotor systems, infants, children

Citation: Guellaï B, Streri A and Yeung HH (2014) The development of sensorimotor influences in the audiovisual speech domain: some critical questions. Front. Psychol. 5:812. doi: 10.3389/fpsyg.2014.00812

Received: 27 April 2014; Accepted: 09 July 2014; Published online: 06 August 2014.

Edited by:

Maya Gratier, Université Paris Ouest Nanterre La Défense, France

Reviewed by:

Caroline Floccia, University of Plymouth, UK
Robin Panneton, Virginia Tech, USA

Copyright © 2014 Guellaï, Streri and Yeung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bahia Guellaï, Laboratoire Ethologie, Cognition, Développement, Université Paris Ouest Nanterre La Défense, 200, Avenue de la République, 92000 Nanterre, France e-mail: bahia.guellai@gmail.com; H. Henny Yeung, Laboratoire Psychologie de la Perception, 45 rue des Saints-Pères, 75006 Paris, France e-mail: henny.yeung@parisdescartes.fr