Original Research ARTICLE
Functional hemispheric specialization in processing phonemic and prosodic auditory changes in neonates
- 1 Department of Pediatrics, School of Medicine, Keio University, Tokyo, Japan
- 2 Global COE program, Centre for Advanced Research on Logic and Sensibility, Keio University, Tokyo, Japan
- 3 Keio Advanced Research Center, Keio University, Tokyo, Japan
- 4 Academic Frontiers Project, Meijo University, Nagoya, Japan
- 5 Graduate School of Human Relations, Keio University, Tokyo, Japan
This study focuses on the early cerebral base of speech perception by examining functional lateralization in neonates for processing segmental and suprasegmental features of speech. For this purpose, auditory evoked responses of full-term neonates to phonemic and prosodic contrasts were measured in their temporal area and part of the frontal and parietal areas using near-infrared spectroscopy (NIRS). Stimuli used here were phonemic contrast /itta/ and /itte/ and prosodic contrast of declarative and interrogative forms /itta/ and /itta?/. The results showed clear hemodynamic responses to both phonemic and prosodic changes in the temporal areas and part of the parietal and frontal regions. In particular, significantly higher hemoglobin (Hb) changes were observed for the prosodic change in the right temporal area than for that in the left one, whereas Hb responses to the vowel change were similarly elicited in bilateral temporal areas. However, Hb responses to the vowel contrast were asymmetrical in the parietal area (around the supramarginal gyrus), with stronger activation in the left. These results suggest a specialized function of the right hemisphere in prosody processing, which is already present in neonates. The parietal activities during phonemic processing were discussed in relation to verbal-auditory short-term memory. On the basis of this study and previous studies on older infants, the developmental process of functional lateralization from birth to 2 years of age for vowel and prosody was summarized.
Speech consists of two dominant components, i.e., segments and suprasegments, which correspond respectively to phonemic and prosodic levels of structure. Although language comprehension involves various processes, the perceptual analysis of segmental and suprasegmental information constitutes a crucial first step in the overall process of successful encoding of lexical, syntactic and pragmatic levels. Indeed, it is well known that learning specific features that are associated respectively with these two components is an important initial step for language acquisition in the first year of life. Functional cerebral lateralization in processing these two kinds of information has been demonstrated in neuroimaging literature on adult speech perception: human adults tend to show a left hemispheric dominance for processing phonemes and a right hemispheric dominance for processing prosodic information (e.g., Zatorre et al., 1992; Furuya and Mori, 2003). However, brain development of this specialized system in infants remains poorly understood, in spite of the fact that developmental studies offer the potential for uncovering critical clues to understanding the cerebral basis of linguistic skill acquisition. Accordingly, the present study is designed to investigate brain lateralization in neonates with the aim of determining the degree of hemispheric specialization of segments and suprasegments in early infancy.
Phonetic cues, which are characterized by formant patterns, determine the phonological status of various linguistic segments (e.g., vowel or consonant type). In contrast, prosodic cues, which are realized by pitch contour, intensity, and duration, determine suprasegmental linguistic information. Prosodic cues contribute to accentuation and intonation and also convey para/non-linguistic information, such as emotional state and talker identity. They can affect single segments as well as whole syllables/sentences. Furthermore, many phonological theories hypothesize that segments and prosodies are separately represented in different levels (e.g., Goldsmith, 1990). In general, a differential pattern of acquisition processes is observed for segments and suprasegments, or phoneme and prosody. With respect to segments, infants are born with the ability to discriminate among a wide range of phonological features of segments. They become sensitive to native phonetic (formant) patterns only after 6 months of age (Kuhl, 2004). In other words, the perceptual ability to differentiate segments is universal initially, but with maturation and exposure to the maternal language, perceptual sensitivity narrows gradually to exhibit language specificity, which appears at around 6 months of age for vowels and 12 months of age for consonants (Kuhl, 2004). In many cases, such language-specific learning starts earlier for the sentence/phrase-level prosody than for segments (Mehler et al., 1988; Nazzi et al., 2000). Furthermore, the prosodic organization of speech facilitates language acquisition in infants, and the acoustic saliency of prosody, even at the syllable-level (e.g., stressed syllable), may help draw the attention of infants to speech and its structures (Jusczyk et al., 1999). 
Among such suprasegmentals, acquisition of lexical tones in Mandarin, which is a syllable-level prosody, develops in a manner relatively similar to that of segmental categories (Mattock and Burnham, 2006; Mattock et al., 2008), whereas another syllable-level prosody, that is, Japanese pitch accent, shows quite different developmental patterns (Mugitani, 2009). The present study attempts to compare the differences in brain responses to segments and suprasegmentals, which have differential acoustic and linguistic natures, as reviewed here. Both types of stimuli used in the study are well controlled such that each segmental and suprasegmental difference is realized within a final syllable with the least acoustical manipulation. Consequently, we use intonation contour, which is syllable-level prosody, as suprasegmental stimuli.
In adults, speech-processing involves functional hemispheric specialization. Specifically, hemispheric specialization with adult speech is influenced by at least two factors: acoustic and linguistic sound properties. The evidence for the importance of acoustic properties of speech signal derives from a growing body of research indicating that different auditory features activate the two hemispheres. In particular, when acoustic information (e.g., spectral frequency changes) is modulated over time, rapid modulations appear to predominantly activate the left hemisphere, whereas slow and/or spectral modulations show cortical activity lateralized to the right hemisphere (Zatorre and Belin, 2001; Poeppel, 2003; Poeppel et al., 2008; note that the revised model of Poeppel hypothesizes bilateral engagement for fast stimuli). Here, we refer to this notion as signal-driven hemispheric activation or signal-driven hypothesis. Various types of acoustic definitions and classifications can explain such signal-driven asymmetry (Minagawa-Kawai et al., 2011a for a review). However, among them, two dominant trends seem to be a dichotomy of temporal versus spectral changes (Zatorre and Belin, 2001; Schönwiesner et al., 2005; Jamison et al., 2006) and stimuli of short versus long time scale (high versus low frequencies), which are processed by a different size of analysis window (Poeppel, 2003; Poeppel et al., 2008). The difference between the two is that the former emphasizes spectral richness to evoke rightward dominance, whereas the latter focuses on time scale of sounds. There are other variations, and because of such different definitions of critical acoustic properties, the definition of the dichotomy has also varied, such that what is deemed “fast/temporal” or “slow/spectral” often depends upon the experimental variables within a given study. 
Dichotic listening studies in adults have also confirmed hemispheric asymmetry and the dependence of this asymmetry upon acoustic properties of the speech signal (see Shtyrov et al., 2000 for a review). Although these studies lack a common set of definitions for relevant acoustic properties, they nonetheless illustrate that signal-driven laterality is a crucial issue in investigating the cerebral basis of speech.
In light of the signal-driven hypothesis, hemispheric specialization derives from the acoustic features of segments as well as suprasegments. In phonemes, these features involve the richness of temporal variations, whereas in prosody, they are associated with frequency modulations such as F0 or richness of spectral features. More precisely, phonemes may be further divided according to their physical properties, such as consonants having rapid and dynamic spectral features and vowels having rather steady-state spectral features. Hence, in the realm of segments, it has been claimed that consonants, which have more rapidly changing acoustic energies than vowels, tend to show leftward dominance in contrast to vowels, with less rapidly changing acoustic energies likely to exhibit bilateral cortical engagements (Shankweiler and Studdert-Kennedy, 1967; Weiss and House, 1973). In contrast to these segmental properties, prosodic changes can be described as slowly changing stimuli or spectrally rich stimuli as in tonal changes. These tend to be localized in the right hemisphere. Our segmental versus suprasegmental stimuli can be generally interpreted in the context of the signal-driven hypothesis. According to Poeppel (2003) and Poeppel et al. (2008), variations of formant transitions are preferentially processed on the left side, and pitch contours requiring higher spectral resolution are on the right side. However, the phonetic stimuli used here involve a vowel contrast that does not exhibit prominent rapid acoustic changes (i.e., relative to changes present in consonants). Consequently, these stimuli are characterized by steady-state formant frequencies. This acoustic property should induce bilateral activity in the temporal areas according to the signal-driven hypothesis. 
Furthermore, even though our prosodic stimulus is short, unlike general sentential prosody, it has richer spectral features than the phonemic contrast, and such spectral richness tends to induce right dominance (Zatorre and Belin, 2001; Schönwiesner et al., 2005). More specifically, in contrast to the phonemic change with only F1 and F2 differences, the prosodic contrast involves more complex spectral changes as a result of manipulation of fundamental frequency affecting all the harmonic structures. This complex spectral change is likely to induce rightward dominance.
Although it is generally agreed that in adults these acoustic-physical factors drive laterality, probably at the lower processing level, i.e., perceptual level, higher level factors (i.e., cognitive) related to linguistic knowledge also play a crucial role in explaining cerebral specialization. For instance, in adults leftward lateralization depends upon whether a particular stimulus is perceived as a linguistic element (Dehaene-Lambertz et al., 2005; Mottonen et al., 2006); similarly, if a vowel contrast is phonemically distinctive in a listener’s native language, this also enhances chances that it will be lateralized leftward (Näätänen et al., 1997; Dehaene-Lambertz and Gliga, 2004). Even at the phonetic level, native phonological contrasts associated with vowels, consonants, phonotactics, and accents (but not non-native ones) are generally processed in left temporal regions (Jacquemot et al., 2003; Sato et al., 2007). Even though pitch accents or lexical tones have a slowly changing signal, they are processed predominantly in the left hemisphere by native listeners (Gandour et al., 2000). These findings suggest that language learning is also a critical consideration for the left-dominant brain network. In short, evidence collected from adult listeners is best explained by a combination of acoustic features, linguistic, and learning factors (Minagawa-Kawai et al., 2011a). Consequently, several speech-processing models hypothesize that, to a large extent, lateralization of sounds depends upon the level of processing involved (Poeppel, 2003; Friederici and Alter, 2004; Zatorre and Gandour, 2008).
What exactly is the developmental process that leads to the functional hemispheric specialization in speech? In recent years multi-channel near-infrared spectroscopy (NIRS) has enabled examination of this issue because this methodology allows reliable localization of the focus of neural activity. In fact, recent NIRS studies have provided evidence regarding the cerebral response of infants to phonological contrasts. Minagawa-Kawai et al. (2007) compared the neural sensitivities of different age groups (five groups from 3- to 28-month-olds) to changes in phonemic category of long and short vowels and found that Japanese infants show a left-dominant temporal response to an across-category phonemic change only after 13 months of age. Similarly, NIRS analyses show that 10-month-old infants exhibit a left-lateralized cerebral response to a difference in lexical pitch accents (Sato et al., 2010). Because younger age groups in these studies did not show a left-dominant response, Sato et al. (2010) hypothesized that exposure of infants to first language (L1) modified the cortex of older infants through the construction of an L1-specific brain network that is located predominantly on the left side. In addition, electroencephalography (EEG) studies have shown emergence of a language-specific brain response after L1 exposure (Cheour et al., 1998), and recent NIRS studies revealed for the first time a developmental change in cerebral lateralization by showing the specific brain regions involved (Minagawa-Kawai et al., 2007; Sato et al., 2010).
Of special relevance to the present study is the research of Sato et al. (2003). These researchers assessed cerebral lateralization for both prosodic and phonemic contrasts using different age groups, ranging in age from 7 months to 5 years. Infants older than 11–12 months showed a significant lateralization that resembled that of adults in that the phonemic changes evoked a left-dominant response whereas prosodic contrasts evoked a right-dominant response. By contrast, for younger children (7–8; 9–10 months), hemispheric laterality indices for phonemic and prosodic conditions did not differ significantly (Sato et al., 2003). Although these results appear to indicate that brain regions required for decoding phonemic and prosodic information become more specific with maturation, detailed inspection of the laterality index (LI) in this study revealed tendencies in younger age groups toward right-dominant lateralization for the prosodic condition and a bilateral response for the phonemic condition. Figure 1 shows these data. Note that the LI for younger age groups in the prosodic condition trends downward, below zero, indicating right hemispheric dominance. Sato et al. (2003) statistically concentrated upon the overall LI difference between the two stimulus conditions. However, on inspection we found that for the youngest group the laterality index in the prosodic condition was significantly below zero (i.e., zero indicating null hemispheric bias). This result suggests that the prosodic sensitivity of infants is already functionally specialized hemispherically by 7 to 8 months of age. Furthermore, recent evidence based upon neonates’ responses to presentations of frequency modulated non-speech sequences demonstrated a rightward dominance with spectral patterns having relatively slow (prosodic-like) modulations (Telkemeyer et al., 2009). These results suggest predominant right-hemisphere engagement in processing prosody from the beginning of life. 
To date, however, no study has investigated the inborn cerebral basis for processing prosody in real speech stimuli.
Figure 1. Box-whisker plot of laterality index (LI) for phonemic and prosodic conditions in different age groups. The LI is calculated with the formula (L − R)/(L + R), where L and R are the maximal total Hb changes in the left and right auditory channels, respectively. LI is above zero for left dominance and below zero for right dominance. * = p < 0.05. Boxes, the quartiles; bars in the box, the medians; hinges, the ranges. This is adapted from Sato et al. (2003) with permission.
The present study is designed to examine this issue by contrasting two distinctive linguistic features (i.e., phonemic and prosodic contrasts) using real speech materials. To this end, this research employs speech materials used in previous studies (Furuya and Mori, 2003; Sato et al., 2003) in which different age groups including infants, children, and adults were examined. This paradigm enables an assessment of laterality for segments and suprasegments in newborn infants who have not been significantly exposed to language. Furthermore, comparisons of data from this study with those of Sato et al. (2003) will provide a broader perspective on developmental changes in the functional laterality in human infants as a function of age. This study also allows an indirect examination of the neonates’ cortical basis for processing auditory stimuli containing fast and slow/spectrally rich acoustic changes similar to those which activate adults’ brains asymmetrically. However, as stated before, the phonemic stimulus used here is a vowel contrast characterized by steady-state formant frequencies, which is expected to induce bilateral activity in the temporal areas according to the signal-driven hypothesis.
Materials and Methods
Twenty Japanese neonates were tested with NIRS; three infants did not complete the protocol due to fussiness and excess movement, and their data were excluded from further analyses. The final data set included data from 17 infants (average 4.8 days old, range 3–8 days; 10 females). Among them, three infants failed to complete the phonemic condition and two other infants failed to complete the prosodic condition; therefore, the data sets for the two conditions comprise different sets of participants (N = 14 for the phonemic condition and N = 15 for the prosodic condition). All neonates were full-term infants (average gestation: 271 days) with an average birth weight of 2754 g (range: 1928–3298 g) and with no history of medical problems. All were from monolingual Japanese families. Consent forms were obtained from parents before the infants’ participation. This study was approved by the ethics committees of both the Faculty of Letters, Keio University (No. 09049), and Keio University hospital (No. 2009-189).
Stimuli and Conditions
Stimuli consisted of speech contexts, supplied by real words, which exhibited phonemic and prosodic differences. Three different stimulus patterns reflected respectively different forms of the Japanese verb /iku/ (go); these were: an affirmative form /itta/ (has/have gone; any subject is possible), an imperative form /itte/ (go away), and an interrogative form /itta?/ (has/have gone?; Imaizumi et al., 1998). All stimuli were synthesized using ASL (Kay Elemetrics Corp., USA), an analysis-by-synthesis system based upon a speech signal produced by a male adult. Spectrograms of the stimuli are shown in Figure 2. Infant-directed speech was not used in the recording. The three stimuli have identical first syllables, and differ only in their final syllables. The duration of the first syllable /i/ is 80 ms, followed by a 200-ms silent interval for the geminate consonant /tt/ and a final vowel with a length of 80 ms. The phonemic contrast, consisting of pair members /itta/ versus /itte/, is based upon differences in the final vowel due to manipulation of formants 1 and 2 and their transitions; however, both final syllables have identical fundamental frequencies. Members of the prosodic contrasting pair /itta/ versus /itta?/ differ in pitch contours due to the manipulation of the fundamental frequency (F0); specifically, the interrogative form has a rising pitch on the final syllable, whereas the affirmative form has a slightly falling pitch on the last syllable (Figure 2).
Figure 2. Sound spectrograms for the three test words; a statement (S) /itta/, a question (Q) /itta?/ and a demand (D) /itte/ used in baseline and experimental conditions. All words were synthesized by changing the vocal pitch contour (F0) and the formant frequencies (F1 and F2). These words consisted of a common initial /i/ vowel with a length of 80 ms, followed by a silent interval for /t/, and a final syllable. This is adapted from Imaizumi et al. (1998) with permission.
Two main experimental conditions were: phonemic contrast and prosodic contrast. Infants were tested in both conditions where possible, so the participant sets for the two conditions overlapped but were not identical. Both conditions used an identical baseline block of trials. In the phonemic condition, the stimulus /itta/ was repeated at 1-s intervals (trials) for a total of 15 s in the baseline block without any temporal variations; this block of trials was followed by another 15 s of presentations (trials) in the target block. In the phonemic target block, /itte/ and /itta/ were presented in a pseudo-random order at 1-s intervals. In the prosodic condition, the same baseline block was initially presented, but it was followed by a target block comprising a series of presentations of /itta/ and /itta?/ randomized as in the phonemic condition. The two blocks (baseline and target blocks) were alternated at least seven times in each condition. Presentation order of the two conditions was counterbalanced. Thus, as indicated above, the baseline for evaluating responses in experimental conditions was not silence but repetitions of the /itta/ stimulus lasting 15 s. This use of non-silent baseline stimuli allowed us to extract those brain response components specific to differences in /a/ versus /e/ or to different pitch contours in each condition.
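The block design described above can be illustrated with a minimal Python sketch. It assumes seven baseline/target cycles (the stated minimum), a 1-s stimulus onset interval, and an approximately balanced shuffle of the two target-block stimuli; the text specifies only a "pseudo-random" order, so the balancing rule, the seed, and the function name `build_schedule` are illustrative assumptions rather than the procedure actually used.

```python
import random

def build_schedule(target_pair, n_cycles=7, block_s=15, soa_s=1.0, seed=0):
    """Return a list of (onset_s, block_label, stimulus) tuples.

    Hypothetical helper: n_cycles, seed, and the balancing of the two
    target stimuli are assumptions, not details given in the article.
    """
    rng = random.Random(seed)
    per_block = int(block_s / soa_s)  # 15 presentations per 15-s block
    schedule = []
    t = 0.0
    for _ in range(n_cycles):
        # Baseline block: the standard /itta/ repeated at every interval.
        for _ in range(per_block):
            schedule.append((t, "baseline", "itta"))
            t += soa_s
        # Target block: both members of the contrast in shuffled order.
        stimuli = [target_pair[i % 2] for i in range(per_block)]
        rng.shuffle(stimuli)
        for stim in stimuli:
            schedule.append((t, "target", stim))
            t += soa_s
    return schedule

phonemic = build_schedule(("itta", "itte"))   # vowel contrast
prosodic = build_schedule(("itta", "itta?"))  # pitch-contour contrast
```

Because the baseline is itself a stream of /itta/ tokens, any response difference between target and baseline blocks in such a schedule is tied to the added /itte/ or /itta?/ tokens rather than to the presence of sound as such.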
Near-infrared spectroscopy experiments were performed in a testing room at Keio University hospital. Evoked auditory responses in the bilateral temporal areas as well as part of the frontal and parietal regions were recorded using NIRS (ETG 4000, Hitachi Medical Corporation, Tokyo, Japan). This device emits 695 and 830 nm near-infrared lasers modulated at different frequencies and detects them with lock-in amplifiers to measure changes in the concentration and oxygenation of hemoglobin (Hb; Yamashita et al., 1996). The recording channels resided in the optical path of the brain between the nearest pairs of incident and detection probes, which were separated by 2 cm on the scalp surface. A silicon pad with five incident and four detection probes, arranged in a 3 × 3 square lattice, was placed laterally on each side of the head. The total number of recording channels on each side was 12. The pad was attached to the head such that the center detector probe in the bottom horizontal probe-line corresponded to the T3 or T4 position in the international 10/20 system. The bottom horizontal line of the probes was roughly aligned with the T3–Fp1–Fp2–T4 line. Stimuli were presented to neonates with amplitudes of approximately 67 dB via two speakers positioned 20–25 cm above the infants’ heads. To prevent NIRS artifacts due to systemic vascular changes such as heart rate change and/or background sound changes, the stimulus sound levels were set relatively low. During the stimulation, the newborns were sleeping.
Our analysis method consisted of two parts which involved, respectively, multiple channel analyses and analysis of a cortical region of interest (ROI). Because previous NIRS studies on phoneme perception focused only upon the temporal area, this investigation used NIRS to widen the focus to include other brain regions which might be involved in early phonetic processing. First, we analyzed each channel separately to gauge localized activation levels. Channels showing strong activations were then compared with contra-lateral channel counterparts to assess laterality. Next, the ROI of the temporal region, determined according to the previous NIRS studies, was tested to assess the laterality effect.
Concentrations of oxygenated and deoxygenated Hb were calculated from the absorption of 695 and 830 nm laser beams sampled at 10 Hz, and smoothed with a 5-s moving average. Blocks of trials affected by movement artifacts were automatically removed by detecting rapid changes in the oxy-Hb value, defined as signal variations of more than 0.7 mmol·mm between successive samples (rejection rate = 34.6%). The time-continuous Hb signals for each channel were separated into analysis blocks, each consisting of a 5-s baseline period followed by 15 s of the target block and 10 s of the baseline block. To eliminate long-term signal trends due to systemic vascular factors, a first-degree baseline fit was estimated for each channel using the first 4 and last 4 s of the analysis block. The time courses of Hb concentration changes in the analysis blocks were averaged over more than five blocks for each of the stimulus conditions. To objectively set the time window for the analysis, we first calculated peak latency for all the sound conditions by averaging the Hb time course for all channels and participants. This latency was 11.1 s from onset. Based on this value, a 5-s time window centered on the 11.1-s point was determined for the target block (Watanabe et al., 2010; Minagawa-Kawai et al., 2011c). The 5 s prior to stimulus onset was used as the time window for the baseline block. The average concentration of oxy- and deoxy-Hb in each time window was calculated for all channels and for each subject. The significance of differences between Hb changes within the baseline and those within target blocks was determined using a t-test for each channel under the two experimental conditions. Error rates were adjusted to accommodate multiple comparisons using a false discovery rate (FDR) for determination of statistical significance. 
Instead of the conventional family-wise error correction procedure, a method of correction for multiple comparisons that has been shown suitable for NIRS studies (Benjamini and Hochberg, 1995; Singh and Dan, 2006) was applied to control for Type I and II errors. We set the value of q specifying the maximum FDR to 0.05, so that there were no more than 5% false positives on average in the number of significant channels.
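The FDR control mentioned above can be sketched with the standard Benjamini-Hochberg step-up rule. The following is a minimal Python illustration, assuming one p-value per channel (24 channels in the present probe layout) and q = 0.05; the function name and the example p-values are hypothetical and are not taken from the study's data or analysis software.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests survive FDR control at level q.

    Standard step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= (k / m) * q, and declare ranks 1..k significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant

# Made-up per-channel p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, q=0.05)
```

Unlike a Bonferroni-type family-wise correction, the threshold here grows with the rank of the p-value, which is why the procedure retains more power when several channels are active at once.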
Next, to assess laterality effects, we followed the same criteria as in previous studies. This entailed first defining a ROI of a vicinity of auditory area as CH6, 8, 9, and 11 on the left and CH19, 21, 22, and 24 on the right hemisphere. The averaged oxy-Hb values were calculated for each condition and hemisphere and then compared between hemispheres. Finally, we examined the laterality effect by employing an analysis procedure similar to that used in previous NIRS studies (Furuya and Mori, 2003; Sato et al., 2003, 2007; Minagawa-Kawai et al., 2005, 2007, 2009). This allows a direct comparison of results across different studies. For each participant, we selected one channel that showed the maximum oxy-Hb responses within a vicinity of auditory areas. This method has effectively revealed functional laterality of auditory processing between left-handers and right-handers (Furuya and Mori, 2003). The LI was calculated using the formula (L − R)/(L + R), where L and R are peak values on left and right sides respectively.
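The channel selection and LI computation just described can be sketched in Python as follows. The ROI channel numbers and the formula (L − R)/(L + R) come from the text; the function names and the per-channel oxy-Hb values are hypothetical, and the sketch assumes that each side's peak is the largest value among its ROI channels.

```python
LEFT_ROI = (6, 8, 9, 11)      # channels near the left auditory area
RIGHT_ROI = (19, 21, 22, 24)  # channels near the right auditory area

def peak_in_roi(oxy_hb_by_channel, roi):
    """Largest oxy-Hb response among the ROI channels of one hemisphere."""
    return max(oxy_hb_by_channel[ch] for ch in roi)

def laterality_index(left_peak, right_peak):
    """LI = (L - R) / (L + R); above zero is left-dominant, below zero right-dominant."""
    total = left_peak + right_peak
    if total == 0:
        raise ValueError("L + R is zero; the index is undefined")
    return (left_peak - right_peak) / total

# Made-up oxy-Hb changes for one participant, for illustration only.
hb = {6: 0.25, 8: 0.5, 9: 0.125, 11: 0.25, 19: 0.5, 21: 1.5, 22: 1.0, 24: 0.75}
li = laterality_index(peak_in_roi(hb, LEFT_ROI), peak_in_roi(hb, RIGHT_ROI))
# li is -0.5 here, i.e., right-dominant for these made-up values
```

Note that the index stays within −1 to 1 only when both peak values are non-negative; how negative peaks were handled is not stated in the text, so this sketch does not address that case.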
For spatial estimation of channel location in the brain, we employed the virtual registration method (Tsuzuki et al., 2007) to map NIRS data onto the MNI standard brain space. Although this method is basically applicable to adult brains, we adapted it for evaluation of infants’ brain activity by adjusting for differences in head size and the emitter–detector separation length (inter-probe separation) between adults and neonates. First, we calculated the average head size of neonates including the circumference (average, 33.8 cm; SD, 0.73), nasion–inion length (average, 20.8 cm; SD, 1.49), and length between preauricular points (average, 22.2 cm; SD, 1.22). The head size ratio of adults to neonates was found to be similar to the ratio of the 30-mm inter-probe separation used for adults to the 20-mm separation used for infants, with an error range of 2–3 mm. Because the error range of adults’ virtual registration for the same channel placement with an inter-probe separation of 30 mm was 4–8 mm, this registration can be applied to our participants. Considering the differences in detailed anatomy between infants and adults, such as relative brain position in terms of the 10–20 system, we did not use the detailed anatomical labeling obtained from virtual registration. Instead, we used the approximate anatomical labeling.
Both phonemic and prosodic contrasts activated the neonates’ brain in substantially broad areas involving the superior temporal gyrus, inferior frontal gyrus, and inferior parietal regions. However, the two experimental conditions elicited different time courses of Hb changes and revealed different activation foci. This is shown in Figures 3, 4 and Table 1. Figure 3 shows that Hb changes in the phonemic condition had a peak latency of 10.2 s with an initial dip, whereas changes in the prosodic condition showed a peak at 12.1 s without an initial dip. There was no statistically significant difference between these peak times (t = 0.69, p = 0.24). Phonemic changes activated the inferior frontal, inferior parietal, and temporal areas, with less activity in the parietal or superior regions on the right. In contrast, the prosodic changes evoked responses chiefly around temporal areas. Among these areas, activation foci whose p-value is below 0.01 (corrected) are CH6, CH22 (vicinity of auditory areas on the left and right) and CH5 (inferior parietal area) for the phonemic condition and CH24 [vicinity of auditory areas on the right, superior temporal sulcus (STS)/mid temporal] for the prosodic condition.
Figure 3. Hemodynamic responses to phonemic and prosodic sound changes. Grand averaged time course of Hb collapsing across all the channels each for the phonemic and prosodic condition. Dashed line indicates the onset of target block and vertical line is 5 s of analysis window.
Figure 4. Activation amplitude indicated by p-values for phonemic (A) and prosodic (B) conditions. Channel location of 12 channels for each hemisphere was estimated based on the virtual spatial registration (Tsuzuki et al., 2007). p-Values were corrected for multiple comparisons. Channels which did not reach 0.05 after correction are indicated with a gray circle with white rays. Area shown with dashed line indicates vicinity of auditory area defined as ROI in this study. Channel numbers are indicated for both hemispheres (C).
To examine laterality differences, averaged oxy-Hb values in the ROI of the auditory area as well as the non-auditory channel (CH5) registering strong activity (p < 0.01, corrected) were compared with counterpart regions, i.e., contra-lateral ROI and channel. As CH6, 22, and 24 were included in the ROI, we did not test them individually. Results of a paired t-test showed significantly stronger activation in left CH5 than in its right counterpart (t = 2.29, p < 0.05, corrected) for the phonemic condition. Although the ROI activations in the phonemic condition showed slightly rightward dominance, the hemispheric difference was not significant, probably due to larger variance than that of the prosodic condition (t = 0.84, p > 0.05). In the prosodic condition, significantly stronger activation was found in the right ROI than in the left one (t = 1.88, p < 0.05, corrected; Figure 5). To compare the neonates’ results with those from previous studies using similar methods, we applied the same analytic techniques used in those studies to assess the laterality of auditory areas. Laterality indices (calculated for each participant) are plotted in Figure 6 for each of the two experimental conditions. Consistent with the results obtained by the ROI analysis, only the prosodic condition showed a significant asymmetry effect. The LI scores for the prosodic condition were significantly lower than zero (t = 3.07, p < 0.01), indicating rightward dominance. A t-test also showed a significant difference between LI scores for phonemic and prosodic conditions (t = 2.24, p = 0.016).
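The test of LI scores against zero reported above corresponds to a one-sample t-test. A minimal Python sketch of the statistic is given below, using only the standard library; the function name and the LI values are hypothetical illustrations and do not reproduce the study's data or the reported t-values.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """Return (t, degrees_of_freedom) for H0: mean(values) == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with n - 1 in the denominator
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Made-up laterality indices for a few participants, for illustration only.
li_scores = [-0.2, -0.4, -0.6]
t_stat, df = one_sample_t(li_scores)
```

A negative t here corresponds to a mean LI below zero, i.e., rightward dominance; the p-value would then be read from a t distribution with n − 1 degrees of freedom, a step this sketch leaves out because it is not available in the Python standard library.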
Figure 5. Averaged oxy-Hb changes in the ROI (A,B) and the inferior parietal channel (C) for the different conditions. Error bars indicate SE. * = p < 0.05 (corrected).
Figure 6. Laterality indices for phonemic and prosodic conditions for each participant. Laterality index is above zero for left dominance and below zero for right dominance. Boxes, the quartiles; bars in the box, the medians; hinges, the ranges. * = p < 0.05 (zero-test).
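The laterality analysis can be sketched as follows. This section does not state the laterality index formula, so the sketch assumes the commonly used form LI = (L − R)/(L + R), which agrees with the sign convention of Figure 6 (positive for left dominance, negative for right dominance). The function names and the example values are hypothetical and are not data from this study.

```python
import math
import statistics


def laterality_index(left, right):
    """Assumed form LI = (L - R) / (L + R); > 0 left-dominant, < 0 right-dominant."""
    total = left + right
    if total == 0:
        raise ValueError("L + R is zero; the index is undefined")
    return (left - right) / total


def one_sample_t(values, mu=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all values are identical; t is undefined")
    t = (statistics.mean(values) - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical (left, right) activation amplitudes per participant.
pairs = [(1.0, 3.0), (3.0, 5.0), (1.0, 7.0)]
li_scores = [laterality_index(l, r) for l, r in pairs]

t_value, df = one_sample_t(li_scores)
# A negative t indicates LI scores below zero on average (rightward dominance).
print(li_scores, f"t({df}) = {t_value:.2f}")
```

One caution about this assumed form: when the left and right values have opposite signs, as can happen with Hb changes, the denominator can approach zero and the index becomes unstable, so some care in choosing the input measure is needed.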
To explore the early neural bases underlying segmental and suprasegmental processing, the present study measured hemodynamic responses to phonemic and prosodic contrasts in neonates. Results showed a large and significant activation in response to the prosodic change that was located in the right temporal region. This suggests a functional specialization for suprasegmental properties in neonates. By contrast, the phonemic (vowel) contrast showed symmetrical Hb changes in auditory areas; however, it is noteworthy that this contrast also elicited a strong leftward response in the inferior parietal region. Here we discuss these results in light of the developmental hemispheric specialization of the temporal area for phonemic and prosodic processing by comparing them with results from previous infant studies.
As indicated in the introduction, previous NIRS studies using identical stimulus contrasts reported an absence of functional specialization of the auditory area for two different phonetic contrasts in 7- to 8- and 9- to 10-month-olds (e.g., Sato et al., 2003). But this research also presented evidence of a tendency toward right-hemispheric dominance for prosodic contrasts. The present study used neonates as participants and produced a clearer outcome. Neonates’ NIRS responses revealed significant right dominance around the auditory area in response to the prosodic change, suggesting that a specialized function of the right hemisphere for prosody processing is present at birth in human infants. The focus of this activation ranged over four channels in the right auditory region and appeared to involve the STS and middle temporal gyrus.
What kind of cognitive function is reflected in the brain activity in this area of the right hemisphere? This will depend upon a listener’s age. It is difficult to attribute neonates’ activation in response to a prosodic manipulation to a high level of acquired language skills (e.g., distinguishing an implied affirmation from a question). Clearly, newborns lack such skills. Rather, it is more reasonable to assume that this activity reflects lower-level cognitive processing, one that involves differentiating the acoustic contours of those spectral components that change with a prosodic manipulation. There is further evidence to support this interpretation. For instance, neuroimaging data from adults showed a cerebral laterality that reflected differential responding to fast versus slow band-noise stimuli (Boemio et al., 2005) and to temporally versus spectrally modulated stimuli (Zatorre and Belin, 2001). Spectrally rich stimuli elicit activations in the anterior superior temporal gyrus as well as the right STS, and these activations increase with the richness of spectral variations (Zatorre and Belin, 2001). Although our prosodic stimulus is not long, it has pitch modulations with richer spectral changes than the other contrast of /itta/ and /itte/, which has only two spectral differences. In the present study, it is assumed that contrasts between stimuli ending in a rising contour and those with an unchanging pitch contour are chiefly processed around the right STS in neonates.
Other evidence speaks more directly to developmental issues. Telkemeyer et al. (2009) presented neonates with a subset of stimuli from Boemio et al. (2005) and showed a significant response near the right temporo-parietal area to “slow” stimuli, although the effect was not strong. Homae et al. (2006) presented 3-month-old infants with sentential speech prosody and reported dominant activations of the right temporo-parietal region. Although these activated regions are not identical to the active regions found in our investigation, there is nonetheless a rightward superiority in processing prosodic information in neonates or young infants that is in agreement with our findings. Furthermore, these data, together with those gathered in the present study, suggest the operation of a neuronal network involving the temporo-parietal region and STS/MTG which is partially active from birth.
Other evidence appears to conflict with these findings. Minagawa-Kawai et al. (2011b) presented stimuli used by Zatorre and Belin (2001) to neonates with the aim of examining signal-driven mechanisms in early infancy. In that study, they contrasted temporal and spectral variations by manipulating the speed of tone alternation or the spectral richness. They did not find clear hemispheric specialization associated with signal properties (temporal versus spectral), although the intensity of signal change (relative entropy) did correlate with the amplitude of Hb changes. These conflicting results on lateralization in young infants’ brains may be associated with the difference between speech and non-speech stimuli, because it seems that speech elicited clearer lateralization (Peña et al., 2003; Gervain et al., 2008) than non-speech did (Minagawa-Kawai et al., 2011b; Telkemeyer et al., 2009) in neonates, suggesting a specific role of human vocalization. Further research is required to explain these discrepancies. They underscore the need for greater attention to clarifying the acoustic definition of the signal-driven system with regard to the critical details of acoustic features that may determine these conflicting outcomes. Thus, the influence of the signal-driven system on cerebral responses during speech processing remains a tentative conclusion.
With respect to the phonemic vowel contrast of /itta/ versus /itte/, a cross-sectional study (Sato et al., 2003) showed that their youngest groups (7–8, 9–10 months of age) showed activations equally in bilateral temporal areas. It was only when infants approached 11 months of age that they showed a lateralization of the vowel difference in the form of leftward dominance. Our results provide additional evidence that the auditory region functions as an innate starting point for the development of auditory processing. Taken together, both sets of findings permit the inference that bilateral engagement for processing the vowel contrast continues from birth to 7 months of age. Although we lack data for 2- to 6-month-olds, previous results for different vowel types showing a bilateral temporal response in 3- to 4- and 6- to 7-month-olds (Minagawa-Kawai et al., 2007) support the idea of a continuous developmental trajectory through these intervening age levels.
Several interpretations may account for the emergence of symmetrical auditory responses to the vowel changes. First, as in the prosodic condition, a signal-driven mechanism can explain these bilateral activations. As indicated earlier, a hypothetical signal-driven mechanism may determine bilateral responses in the temporal cortex rather than a leftward one in reaction to vowel changes. In general, vowels have been reported to be less lateralized than consonants (e.g., Haggard and Parkinson, 1971), and this may be attributable to the fact that vowels contain spectral components that change more slowly than those of consonants, which exhibit quite rapid dynamic changes. Some neuroimaging studies also support this idea by showing greater involvement of the left planum temporale in processing CV than when a tone or a vowel is presented in isolation (Jancke et al., 2002). Admittedly, laterality is not entirely based upon signal factors, but at least the present study indicates their primary impact on neonates. In this sense, the present findings with both phonetic and prosodic contrasts can be explained by a signal-driven mechanism. In fact, a model incorporating this idea has been proposed by Minagawa-Kawai et al. (2011a). It describes developmental hemispheric specialization associated with language acquisition. Basically, this model assumes that lateralization for language emerges out of the interaction between pre-existing left-right biases in generic auditory processing (signal-driven mechanism) and a left-hemisphere predominance of particular learning mechanisms.
A second approach to this issue focuses upon the immaturity of the nervous system in neonates. Functional cerebral lateralization is typically assumed to reflect a mature neural network ranging over both hemispheres, but in resting states it has been shown that neonates have less connectivity across hemispheres than do 3-month-olds (Homae et al., 2010). However, this interpretation does not specifically take into account the right hemispheric dominance for prosodic contrasts. In any case, what appears clear is that bilateral activities for vowel processing eventually become functionally lateralized to the left auditory area as infants learn the vowel categories of their native language. Thus, as an infant’s brain matures physiologically, it does so in conjunction with a reorganization of synaptic connections.
To this point our discussion of lateralization has been confined to the vicinity of auditory areas. However, another rather unexpected finding emerged in this study: dominant activations were observed in the left parietal region during vowel discrimination. These activations seem to be in the supramarginal gyrus (SMG) according to the probabilistic spatial estimation (Tsuzuki et al., 2007). The neuroimaging literature often refers to the SMG in relation to speech perception. An MRI study of lesions in aphasic adult patients by Caplan et al. (1995) indicates that the left SMG is the principal site of phonemic processing; thus, patients with lesions in this area typically fail to discriminate and identify phonemes. Further, Zatorre et al. (1992) showed that discrimination of consonant types in CVC syllables activated the left SMG. It seems that the left SMG is also involved in tasks requiring verbal or auditory short-term memory (Paulesu et al., 1993). Although the neonates did not engage in any particular task in the present study, auditory short-term memory is a likely candidate for explaining SMG activations. That is, the cognitive process of discrimination during the target block may underlie the activity of the left SMG even in sleeping neonates. Specifically, in contrast to the baseline trial block in which the infants received the same word repetitively, in phonemic target trials, infants had to discriminate between two temporally separated words (/itta/ and /itte/) that differ in vowels. It seems likely that this would place demands on short-term memory. Thus, activity observed in the left SMG may reflect a type of memory process that is required for phoneme detection/discrimination but not for prosodic discrimination. If this interpretation is correct, these data provide indirect evidence that a neuronal substrate implicated in short-term memory may also be functional in newborns.
But a caveat is warranted regarding whether or not the left SMG activation is language-specific/phoneme-specific. Future studies using non-speech analogs of the present stimuli should clarify this issue.
The involvement of various cerebral mechanisms and their role in laterality during phonetic processing in infants has been examined by EEG studies as well as dichotic listening studies. But evidence has been limited with regard to phonetic processing in neonates. What evidence exists shows that newborns exhibit discriminative reactions to vowel differences (Cheour-Luhtanen et al., 1995; Dehaene-Lambertz and Pena, 2001) and that their auditory areas are sensitive to categorical voicing differences (Simos and Molfese, 1997). Very young infants also tend to show a delayed latency of mismatch negativity to phonemic differences as compared with that of adults, suggesting an immature processing system in infants (Dehaene-Lambertz and Gliga, 2004). However, laterality differences in infants, based upon early EEG studies, have provided rather diverse results showing left dominance (Dehaene-Lambertz and Baillet, 1998), right dominance (Molfese and Molfese, 1988; Novak et al., 1989), and bilateral activation (Simos and Molfese, 1997). The diversity of such outcomes is probably due to the limited spatial resolution of EEG. But a dichotic listening test using the sucking procedure for infants has also revealed a complicated picture regarding laterality (Bertoncini et al., 1989). Furthermore, many of these EEG studies did not precisely reveal the activation focus or brain region involved. However, a few studies employed high-density ERP and/or sophisticated dipole modeling, and these should provide better spatial resolution. For instance, Dehaene-Lambertz et al. (2004) tested 3-week-old infants with a sylvian infarct in the left hemisphere and found a discriminative response to vowel differences that implied a right-hemisphere contribution to vowel perception at this age. A recent study on 2-month-old infants detected ERP source locations in the left hemisphere for vowel processing (Bristow et al., 2009).
These locations involved the inferior frontal gyrus, the superior temporal gyrus, and the temporal sulcus. Activation in the superior temporal gyrus is consistent with our results but not with those of other studies. We found strong activations in the parietal regions but not in the inferior frontal gyrus. Although the diversity of these findings may be due to variations in testing instruments, stimulus presentation, and infants’ age, the co-registration of ERP and NIRS may provide further detailed evidence with respect to the time course and brain regions of phonemic processing in young infants.
The NIRS methodology offers more reliable cortical localization than EEG techniques, but studies using the former methodology have not investigated vowel processing in neonates. Nevertheless, some of these studies address aspects of neonates’ speech perception that are relevant to the discussion of the NIRS data presented here. Neonates showed left-dominant Hb responses from the temporal area while listening to forward (normal) speech, in contrast to a bilateral response to backward speech (Peña et al., 2003). Such asymmetrical activations are also observed in response to repetition sequences of syllables against random controls (Gervain et al., 2008), suggesting neonates’ ability to detect a certain type of language structure. These results imply that it is not only signal properties that modulate laterality in neonates, because the acoustic features of target and control stimuli are similar in these studies. Thus, as with adults, laterality in neonates may also be driven by cognitive activity elicited by the specific type of stimuli and/or the presentation method. Finally, the present study found activation foci in the temporal area and SMG, but prefrontal measurement with another probe pad might reveal other activation foci reflecting novelty detection. Similar activity has been observed in the prefrontal region in 2- to 3-month-olds (Nakano et al., 2009). As discussed here, infant NIRS studies have enabled us to discuss localized brain function in relation to language development. Another possibly relevant parameter of NIRS that is implicated in this study is the latency or response shape of the Hb time course. Although there was no statistically significant difference in latency between the conditions, the prosodic condition, with its different response shape, elicited a rather slower Hb response than the phonemic condition. This could derive from a difference in processing speed, and the intonation contour may require higher spectral resolution.
In summary, by presenting segmental versus suprasegmental (phoneme versus prosody) contrasts to newborn infants, the present study revealed a functional lateralization to the right temporal area for prosody processing and bilateral engagement of the auditory areas for the vowel contrast. Overall, these results were explained by the signal properties of the acoustic stimuli, which differentially activated distinct regions in the temporal cortex. This is the first evidence showing that neonates exhibit localized cerebral responses to vowel (phonemic) and prosodic contrasts. We further showed a left-dominant activation in neonates around the inferior parietal region, suggesting an early neuronal basis for auditory-verbal short-term memory. This study suggests that a brain mechanism for a certain form of signal-driven system in the speech stimulus context is present at birth and that it possibly operates in coordination with a domain-driven system. This raises several important issues that merit further exploration in the development of infants’ neurocognitive system, including the differential impact of speech and non-speech on the lateralization of neonates’ brains.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors thank K. Kosaki, Y. Matsuzaki, M. Miwa, E. Okishio, and all the staff of the neonatal unit of Keio University Hospital for help with the study, T. Imaizumi for kindly providing the sound stimuli, K. Maekawa for his generous comments, and S. Ishii and A. Matsuzaki for help with conducting the experiment. This work was supported by a Grant-in-Aid for Scientific Research (A) (KAKENHI, project No. 21682002), the Academic Frontier Project supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), and the Global COE (Center of Excellence) program of Keio University.
Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., and Mangin, J. F. (2009). Hearing faces: how the infant brain matches the face it sees with the speech it hears. J. Cogn. Neurosci. 21, 905–921.
Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., and Näätänen, R. (1998). Development of language-specific phoneme representations in the infant brain. Nat. Neurosci. 1, 351–353.
Cheour-Luhtanen, M., Alho, K., Kujala, T., Sainio, K., Reinikainen, K., Renlund, M., Aaltonen, O., Eerola, O., and Näätänen, R. (1995). Mismatch negativity indicates vowel discrimination in newborns. Hear. Res. 82, 53–58.
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., and Dupoux, E. (2003). Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J. Neurosci. 23, 9541–9546.
Minagawa-Kawai, Y., Cristià, A., Vendelin, I., Cabrol, D., and Dupoux, E. (2011b). Assessing signal-driven mechanisms in neonates: brain responses to temporally and spectrally different sounds. Front. Psychology 2:135. doi: 10.3389/fpsyg.2011.00135
Minagawa-Kawai, Y., Naoi, N., Kikuchi, N., Yamamoto, J., Nakamura, K., and Kojima, S. (2009). Cerebral laterality for phonemic and prosodic cue decoding in children with autism. Neuroreport 20, 1219–1224.
Minagawa-Kawai, Y., van der Lely, H., Ramus, F., Sato, Y., Mazuka, R., and Dupoux, E. (2011c). Optical brain imaging reveals general auditory and language-specific processing in early infant development. Cereb. Cortex 21, 254–261.
Molfese, D. L., and Molfese, V. J. (1988). Right-hemisphere responses from preschool children to temporal cues to speech and nonspeech materials: electrophysiological correlates. Brain Lang. 33, 245–259.
Mottonen, R., Calvert, G. A., Jaaskelainen, I. P., Matthews, P. M., Thesen, T., Tuomainen, J., and Sams, M. (2006). Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus. Neuroimage 30, 563–569.
Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J., and Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385, 432–434.
Novak, G. P., Kurtzberg, D., Kreuzer, J. A., and Vaughan, H. G. Jr. (1989). Cortical responses to speech sounds and their formants in normal infants: maturational sequence and spatiotemporal analysis. Electroencephalogr. Clin. Neurophysiol. 73, 295–305.
Peña, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., and Mehler, J. (2003). Sounds and silence: an optical topography study of language recognition at birth. Proc. Natl. Acad. Sci. U.S.A. 100, 11702–11705.
Sato, Y., Mori, K., Furuya, I., Hayashi, R., Minagawa-Kawai, Y., and Koizumi, T. (2003). Developmental changes in cerebral lateralization during speech processing measured by near infrared spectroscopy. Jpn. J. Logoped. Phoniatr. 44, 165–171.
Schönwiesner, M., Rubsamen, R., and von Cramon, D. Y. (2005). Hemispheric asymmetry for spectral and temporal processing in the human antero-lateral auditory belt cortex. Eur. J. Neurosci. 22, 1521–1528.
Shtyrov, Y., Kujala, T., Palva, S., Ilmoniemi, R. J., and Näätänen, R. (2000). Discrimination of speech and of complex nonspeech sounds of different temporal structure in the left and right cerebral hemispheres. Neuroimage 12, 657–663.
Telkemeyer, S., Rossi, S., Koch, S. P., Nierhaus, T., Steinbrink, J., Poeppel, D., Obrig, H., and Wartenburger, I. (2009). Sensitivity of newborn auditory cortex to the temporal structure of sounds. J. Neurosci. 29, 14726–14733.
Keywords: phoneme, prosody, functional lateralization, neonates, NIRS, auditory area
Citation: Arimitsu T, Uchida-Ota M, Yagihashi T, Kojima S, Watanabe S, Hokuto I, Ikeda K, Takahashi T and Minagawa-Kawai Y (2011) Functional hemispheric specialization in processing phonemic and prosodic auditory changes in neonates. Front. Psychology 2:202. doi: 10.3389/fpsyg.2011.00202
Received: 22 December 2010; Accepted: 09 August 2011;
Published online: 15 September 2011.
Edited by: Judit Gervain, CNRS – Universite Paris Descartes, France
Reviewed by: Janet F. Werker, University of British Columbia, Canada
Ho Henny Yeung, Centre National de la Recherche Scientifique, France
Copyright: © 2011 Arimitsu, Uchida-Ota, Yagihashi, Kojima, Watanabe, Hokuto, Ikeda, Takahashi and Minagawa-Kawai. This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.
*Correspondence: Yasuyo Minagawa-Kawai, Graduate School of Human Relations, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan. e-mail: firstname.lastname@example.org