Five-Year-Olds' Acoustic Realization of Mandarin Tone Sandhi and Lexical Tones in Context Are Not Yet Fully Adult-Like

Large numbers of children around the world are learning tone languages, but few studies have examined the acoustic properties of children's early tone productions. Even more scarce are acquisition studies on tone sandhi, a tone change phenomenon which alters the surface realization of lexical tones. Two studies using perceptual coding report the emergence of lexical tone and tone sandhi at around 2 years (Li and Thompson, 1977; Hua and Dodd, 2000). However, the only acoustic study available shows that 3-year-olds are not yet adult-like in their lexical tone productions (Wong, 2012). This raises questions about when children's productions become acoustically adult-like and how their tone productions differ from those of adults. These questions were addressed in the current study, which compared Mandarin-speaking pre-schoolers' (3–5-year-olds) tone productions to those of adults. A picture naming task was used with disyllabic real words familiar to pre-schoolers. Overall, children produced appropriate tone contours for all tones, i.e., level for tone 1, rising for tones 2, 3 and full sandhi, falling for tone 4 and half sandhi. However, children's productions were not adult-like for tones 3, 4, and the sandhi forms, in terms of coordinating pitch range, slope and curvature, with little evidence of development across ages. These results suggest a protracted process in achieving adult-like acoustic realization of both lexical and sandhi tones.


INTRODUCTION
Despite recent interest in tone languages, little is known about the acquisition of lexical tones compared to segments (i.e., vowels and consonants). This is in spite of the pervasiveness of tone languages: it is estimated that more than half the world's languages are tonal (Yip, 2002). Especially lacking is knowledge about children's early productions of lexical tone, and if and how these differ from adult forms. This is also the case for phonological processes that involve lexical tone change (tone sandhi). For example, Mandarin has tone sandhi processes whereby the surface tone changes depending on tonal context, i.e., from tone 3 to a rising or falling tone. The acquisition of such phonological processes has not attracted much attention in the field of language acquisition. Given the large population of children learning tone languages, understanding how lexical tone and tone sandhi processes are acquired is crucial for providing a comprehensive account of language acquisition above the level of the segment. In this paper, we examine the early production of both lexical and sandhi tones in terms of their acoustic realizations to determine if pre-schoolers' productions are adult-like.
Languages that have lexical tone manipulate pitch height and pitch contours to change the meanings of words. Whereas in English rising and falling pitch contours on words are typically associated with prosodic information such as intonation and focus, in lexical tone languages these can change the meaning of the word. A well-studied lexical tone language is Mandarin, which has the largest population of speakers around the world. Mandarin has a four-tone system with one level and three contour tones: tone 1 has a high level contour ("mā": mother), tone 2 a rising contour ("má": hemp), tone 3 a dipping contour ("mǎ": horse), and tone 4 a falling contour ("mà": reprimand). See Figure 1 for pitch contours across time on the four lexical tones. While all four lexical tones appear in the productions of Mandarin-speaking children by the 1-word stage of development, confusion between tones 2 and 3 (rising and dipping tones) continues into the 2/3-word stage of development, finally disappearing as longer sentences are produced (Li and Thompson, 1977). Only one study has reported on the acoustic characteristics of lexical tone produced by Mandarin-speaking 3-year-olds in North America (Wong, 2012). Using monosyllabic words, the study showed that 3-year-olds did not yet have adult-like productions in terms of pitch range and slope, especially for tone 3, indicating that young children face challenges in producing complex tonal contours. This is mirrored in perception studies, with 3-year-olds showing difficulty with tone identification, especially for tone 3, which has the most complex tone contour (Wong et al., 2005). Another study, using perceptual coding of Mandarin-speaking Taiwanese 4- and 5-year-olds' tone productions, showed that older pre-schoolers still had a substantial number of atypical productions, with no changes over development (Wong, 2013). Together these studies suggest that Mandarin-speaking preschoolers are still learning to produce tone in an adult-like manner.
FIGURE 1 | Mean f0 at 10 time points for the four lexical tones produced by 16 female native speakers of Mandarin from Beijing (shaded area is 95% confidence interval).
These studies point to a protracted acquisition period for Mandarin lexical tones, especially when compared to other tone languages with larger tone inventories. For example, Cantonese is a language with a six-tone system, but children are reported to have acquired all tones by the age of 2 (So and Dodd, 1995). This includes tones with very similar pitch contours, e.g., three level tones (high, mid, and low) and two rising tones (high and mid). The same early acquisition of the full tone inventory is also observed in Thai, a language with 5 tones including three level tones (high, mid, and low), a rising tone and a falling tone, where all tones were present by the 2-word stage (Tuaycharon, 1977). Thus, a larger tone inventory and similarities between tonal contours do not appear to delay tone acquisition. One obvious reason might be that the perceptual coding (rather than acoustic analysis) used in these studies overestimated children's abilities. Indeed, during early stages of language acquisition, children can produce acoustic contrasts that may not be detected by the listener (Scobbie et al., 2000). However, another possibility is the role of Mandarin tone sandhi in acquisition. Tone change processes such as tone sandhi, where children must learn to associate multiple surface forms with their underlying forms, might contribute to a protracted acquisition process. However, little is known about the acoustic realizations of children's early productions of tone sandhi, whether they are adult-like, and whether tone sandhi is acquired together with lexical tones.
There are two contexts for the tone sandhi process in Mandarin. The full sandhi context occurs when two tone 3 syllables occur in succession (tones 3-3), and the first becomes a rising tone. The half sandhi context occurs when tone 3 is followed by any other tone (tones 1, 2, or 4), and tone 3 is then realized with a falling pitch. See Figure 2 for pitch contours plotted over time for the full and half sandhi tones. Therefore, except in the utterance-final position, tone 3 is always realized as full or half sandhi in connected speech. Previous studies have reported that tone sandhi emerges by the 2/3-word stage of development (around 2 years), when children begin to combine words (Li and Thompson, 1977; Hua and Dodd, 2000). However, it is unclear how sandhi forms are acoustically realized in children's productions. To date, we know of only one study which has reported on the acoustic characteristics of tone sandhi productions by pre-schoolers (Xu Rattanasone et al., 2016). That study reported that 3-year-olds' productions of Mandarin tone sandhi on known words had tonal contours that are consistent with the sandhi forms. However, no adult control group was used, and so it remains unclear to what extent 3-year-olds' productions are acoustically adult-like. Given previous reports on the protracted acquisition of tone sandhi, it is unlikely that 3-year-olds' productions would be adult-like at this early age. Indeed, in Bantu languages such as Sesotho, where lexical and grammatical tone interact, tone sandhi processes begin to emerge only by 3 years or later, as children learn more about the grammar of the language (Demuth, 1993).
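The two sandhi contexts described above can be summarized as a small lookup. The following is only an illustrative sketch: the function name and the returned labels are hypothetical, and the neutral tone and longer tone 3 sequences are not considered because the text does not discuss them.

```python
def tone3_surface_form(next_tone):
    """Return the expected surface contour of a tone 3 syllable.

    next_tone: tone number (1-4) of the following syllable, or None
    when the tone 3 syllable is utterance-final.
    The labels are illustrative, not the authors' coding scheme.
    """
    if next_tone is None:
        # Utterance-final position: no sandhi applies.
        return "dipping (lexical tone 3)"
    if next_tone == 3:
        # Tones 3-3: the first syllable becomes a rising tone.
        return "rising (full sandhi)"
    if next_tone in (1, 2, 4):
        # Tone 3 before any other tone is realized with falling pitch.
        return "falling (half sandhi)"
    raise ValueError(f"unknown tone: {next_tone!r}")
```

For example, `tone3_surface_form(3)` gives the full sandhi label, while `tone3_surface_form(1)` gives the half sandhi label.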
Currently, it is unclear why some studies have reported Mandarin tone acquisition to be a protracted process. This could be related to the presence of a tone sandhi process or difficulty in producing the adult-like forms of both lexical and sandhi tones. One possibility is that children are producing global tonal contours that are consistent with lexical and sandhi tones (level, rising and falling), but are not yet able to make finer acoustic adjustments in an adult-like manner (e.g., pitch range). Indeed, previous studies of 3-year-olds have shown that children are producing global tonal contours that are consistent with lexical and sandhi tone targets (Xu Rattanasone et al., 2016), but these are not yet adult-like in terms of pitch range, slope and curvature (Wong, 2012). A recent study reporting on adult ratings of child productions showed that, compared to adult productions, children's productions were rated as being less accurate, especially in complex phonetic contexts (disyllables) (Wong and Strange, 2017). These complex phonetic contexts include the tone sandhi context, but it is unclear from that study whether children were producing sandhi forms. Therefore, it remains unclear when children might reach adult-like productions on the acoustic characteristics of pitch range, slope and curvature for both lexical and sandhi tones in context, such as in disyllables.
In this study, we addressed the question of whether pre-schoolers' lexical and sandhi tone productions are acoustically adult-like by comparing 3-, 4-, and 5-year-olds' productions to adult forms. All participants were monolinguals raised in Beijing. First, we report on lexical tone productions. Based on previous research, we expected that all children might produce global tonal contours that are consistent with the four lexical tones (level, rising, dipping and falling). However, we also predicted that children might not reach adult-like levels on acoustic measures such as pitch range, slope and curvature. We also expected that there might be a developmental effect whereby older 5-year-olds' productions would be more acoustically adult-like than the productions of younger children. Secondly, we report on tone sandhi productions from the same groups of children compared to adults. Based on one previous study (without an adult control group), we expected children's global tonal contours for tone sandhi productions to be consistent with full and half sandhi forms (rising vs. falling). No study has yet reported on children's productions of pitch range, slope and curvature for tone sandhi, but based on lexical tone, we predicted that children would not be adult-like on these measures. However, 5-year-olds' productions might be more adult-like than those of younger 3-year-olds.
FIGURE 2 | Mean f0 at 10 time points for full- and half-sandhi produced by 16 female native speakers of Mandarin from Beijing (shaded area is 95% confidence interval).
All children were recruited in Beijing from the preschool associated with the Beijing Language and Culture University. The study was conducted in accordance with the ethics protocol approved by Macquarie University's Human Ethics Panel. All child participants received stickers for their participation and the preschool received book donations for all children to use at the center.
A total of 16 adult female controls, with a mean age of 24 years (range: 19-35 years), were recruited. All adults were native speakers of Beijing Mandarin and were undertaking graduate or postgraduate training in Sydney. Written consent was provided prior to participation in the study, and they were paid $20 for their travel and time.
A within-subjects design was used. All participants were asked to name all lexical tone and tone sandhi items during testing.

Stimuli
The stimuli included a total of 28 disyllabic words familiar to pre-schoolers (Table 1). To elicit the lexical tones, 12 disyllabic words with tones 1, 2, and 4 on the first syllable and tones 1-4 on the second syllable were chosen. It was not possible to find enough familiar words for pre-schoolers all beginning with tone 1 to avoid tone co-articulation effects; therefore, an equal number of words beginning with tones 2 and 4 (rising and falling contours) were also included to elicit a range of tonal contexts. It was also not possible to avoid some words ending in nasal /n/ and /ŋ/ codas, which can have the effect of lowering the pitch of the syllable.
To elicit full sandhi, four disyllabic tone 3-3 words were chosen. For half sandhi, 12 disyllabic words were chosen, with tone 3 as the first syllable and tones 1, 2, and 4 as the second syllable. This resulted in a total of 16 sandhi items. An additional two practice items in the full sandhi form (a puppy and a pony) were used at the beginning of each session to help train children to perform the task. These training items were not analyzed.
Most syllables had a CV structure, and where possible contained a stop or fricative/affricate onset to facilitate acoustic coding. However, a few contained a lateral or nasal onset, and some contained a nasal coda. Two versions of the test were created, each with a different randomization for the presentation order of words. See Appendix for Table A1 on durations of each tone by syllable.

Procedure
Testing was conducted in a quiet area in the preschool. Each child was greeted by the native Mandarin-speaking experimenter, the first author. The task was explained as a picture naming game where children named the pictures on an iPad and received stickers for playing the game. Two practice trials were given, and for children who could not provide an answer after three attempts, the experimenter provided the answer, e.g., "puppy." The child was then asked to repeat the label before moving on to the next item. The children were encouraged to provide answers independently during the practice trials.
All children could perform the elicitation task during the test trials. However, there were two items which most children could not name, i.e., the tones 3-2 word (gloss: rainbow) and the tones 2-1 word (gloss: lobster). For these items, the experimenter named the items, but the imitations from the children were not analyzed.
The same procedure was used for testing all children as well as the adult controls.

Data Analysis
The productions were acoustically coded in Praat (Boersma and Weenink, 2012) by a trained coder who is a native speaker of Mandarin. Two additional native speakers listened to all tokens. No mis-productions of consonant or vowel segments were identified by any of the three listeners, so all productions were included in the final analyses. A total of 60,090 tokens were analyzed, 17,860 from the 3-year-olds, 14,920 from the 4-year-olds, 17,260 from the 5-year-olds, and 11,050 from the adults. The tones were extracted from the vocalic portion of the target syllable (and nasal if present); this was the second syllable for lexical tone words and the first for tone sandhi words. The vocalic portion was identified from the onset to the cessation of higher formants. In cases where the second syllable had a nasal onset, anti-resonance and simplification of the waveform were used to identify the onset of the second syllable. F0 points were tracked within the annotated interval using the autocorrelation algorithm in Praat, and these f0 points were checked and manually revised to correct any "doubling" or "halving" errors in pitch tracking. The revised pitch track was then interpolated and smoothed with a bandwidth of 20 Hz. F0 was then extracted in 10 equal steps for each syllable. The raw f0 values were transformed into semitones, with a reference frequency of 100 Hz, for analysis.
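The last two steps above (sampling f0 at 10 equal steps and converting to semitones re 100 Hz) can be sketched as follows. This is a minimal illustration rather than the authors' Praat script: the function names are hypothetical, the semitone conversion uses the standard formula 12 × log2(f0 / reference), and sampling from onset to offset inclusive with linear interpolation is an assumption, since the text does not specify where the 10 points fall.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an f0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def sample_equal_steps(times, values, n_points=10):
    """Linearly interpolate a pitch track at n_points equally spaced times.

    times must be strictly increasing and contain at least two points.
    Assumption: the sampled points span onset to offset inclusive.
    """
    start, end = times[0], times[-1]
    step = (end - start) / (n_points - 1)
    sampled = []
    j = 0
    for i in range(n_points):
        t = start + i * step
        # Move to the track segment that contains time t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        weight = (t - times[j]) / (times[j + 1] - times[j])
        sampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return sampled


# Hypothetical rising pitch track (seconds, Hz) for one syllable.
track_times = [0.00, 0.05, 0.10, 0.15, 0.20]
track_hz = [210.0, 205.0, 215.0, 240.0, 270.0]
contour_st = [hz_to_semitones(f) for f in sample_equal_steps(track_times, track_hz)]
```

Expressing f0 in semitones puts pitch movements on a logarithmic scale, so that the same musical interval corresponds to the same numeric difference regardless of a speaker's overall pitch level.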

RESULTS
To examine whether children's lexical tone and tone sandhi productions were adult-like, second order polynomial models were fitted for each tone separately (6 models in total: 4 for lexical tones and 2 for sandhi forms). Alpha was set at 0.008 after a Bonferroni adjustment for multiple comparisons (0.05/6). In all models, children's productions were compared to those of the adult controls. The first order linear trends compared the steepness of the slopes, with larger estimates indicating steeper slopes and larger differences between f0 onset and offset, i.e., greater pitch range. The second order quadratic trends compared the areas under the curve, with larger estimates indicating larger areas, i.e., more curved contours.
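To make the linear and quadratic terms concrete, the sketch below computes the Bonferroni-adjusted alpha and estimates linear and quadratic trends for a single 10-point contour using simple orthogonal polynomial contrasts. It is an illustration under stated assumptions: the function names are hypothetical, the contrasts are not normalized in the way R's `poly()` would normalize them, and it is a per-contour least-squares fit rather than the mixed-effects models reported in the paper.

```python
def bonferroni_alpha(alpha=0.05, n_models=6):
    """Per-model alpha after Bonferroni adjustment."""
    return alpha / n_models


def trend_estimates(contour):
    """Return (linear, quadratic) least-squares estimates for one contour.

    Time is coded as equally spaced steps 0..n-1. The linear contrast is
    centered time; the quadratic contrast is centered time squared minus
    its mean, which makes it orthogonal to the constant and linear terms.
    """
    n = len(contour)
    mean_t = (n - 1) / 2
    linear = [t - mean_t for t in range(n)]
    mean_sq = sum(x * x for x in linear) / n
    quadratic = [x * x - mean_sq for x in linear]
    b_lin = sum(c * y for c, y in zip(linear, contour)) / sum(c * c for c in linear)
    b_quad = sum(c * y for c, y in zip(quadratic, contour)) / sum(c * c for c in quadratic)
    return b_lin, b_quad


# Hypothetical contours in semitones: a steady rise and a steady fall.
rising = [0.5 * t for t in range(10)]
falling = [-0.8 * t for t in range(10)]
print(round(bonferroni_alpha(), 3))
print(trend_estimates(rising), trend_estimates(falling))
```

In this coding, a rising contour yields a positive linear estimate and a falling contour a negative one, while a straight line yields a quadratic estimate of zero (up to floating-point error). A larger absolute linear estimate corresponds to a steeper slope and hence a larger onset-to-offset pitch range.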
Since children have higher pitch than adults, data for each age group were centered around the group means to ensure that only differences in f0 contour were analyzed and not the absolute f0 differences between children and adults. All analyses were conducted in R (R Core Team, 2013) using the lmer() function of the lme4 package (Bates et al., 2015), with Satterthwaite adjustments to the denominator degrees of freedom as provided by lmerTest. The model included f0 measured over 10 time points (every 10%) as the dependent variable, with Age group (3-, 4-, 5-year-olds, and Adults) as the fixed factor. Each speaker was entered as a random variable, with a random intercept estimated separately for each age group. The models for lexical tones (1-4) are reported first, followed by tone sandhi (full- and half-sandhi). See Tables 2, 3 for fixed effects model estimates of lexical tone and tone sandhi, as well as the R code for estimating the maximal model.
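The group-mean centering step can be illustrated with a short sketch. The function name and the data layout (pairs of age-group label and f0 in semitones) are assumptions made for illustration. The text does not state whether centering was done over all tokens in a group or separately per tone, so this version simply centers on one mean per group.

```python
from collections import defaultdict


def center_by_group(records):
    """Subtract each group's mean f0 from that group's values.

    records: list of (group_label, f0_semitones) pairs.
    Returns a list of (group_label, centered_f0) in the same order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value
        counts[group] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    return [(group, value - means[group]) for group, value in records]


# Hypothetical data: the same 4-semitone rise at two different pitch levels.
data = [("adult", 10.0), ("adult", 14.0), ("3yo", 20.0), ("3yo", 24.0)]
centered = center_by_group(data)
```

After centering, both hypothetical groups have values of -2 and +2 semitones, so a model fitted to the centered data compares contour shape rather than overall pitch height.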

Lexical Tones
We predicted that children would produce global tonal contours consistent with level, rising, dipping and falling tones but would not be adult-like in producing pitch range, slope and curvature. We also predicted that older 5-year-olds' productions might be more adult-like than those of younger 3-year-olds. The results for lexical tone are shown in Table 2 and Figure 3. After Bonferroni adjustments for multiple models (6 in total: 4 lexical tones and 2 sandhi tones), alpha was set at 0.008. For tone 1, there was a significant linear trend and a significant interaction of the linear trend with age. The significant linear trend in the absence of a significant quadratic trend suggests that tone 1 productions from children and adults had a level contour, consistent with the contour expected for tone 1. The linear interactions for 3- and 4-year-olds, with significant negative estimates relative to adults, suggest that these children's tone 1 productions had a flatter slope than those of adults.
The results for lexical tone 2 showed significant linear and quadratic trends, suggesting that both children and adults produced a curved rising f0 contour. There were significant interactions with age for both the linear and quadratic trends. The positive effect on the linear term for 3-year-olds suggests that they produced f0 contours with steeper slopes than adults and, therefore, a larger f0 range. The significant negative effect on the quadratic term for 5-year-olds suggests that they produced a flatter f0 curve than adults. No other significant interactions were found.
The results for lexical tone 3 showed significant linear and quadratic trends, suggesting that both children and adults produced a curved falling f0 contour. Since tone 3 has a negative-going contour, the positive effect on the linear term for all three child ages suggests that children had a flatter f0 slope compared to adults, and produced tone 3 with a smaller f0 range. There were no significant interactions with the quadratic trend, which suggests that the curvature of the f0 contours in the child and adult productions did not differ.
The results for tone 4 showed significant main effects for both linear and quadratic trends, and interactions with age for all three age groups. The linear and quadratic trends suggest that both children and adults produced a curved falling f0 contour. The positive effects of all child groups on the linear and quadratic terms suggest that children produced flatter f0 curves and slopes compared to adults, with reduced f0 range.
Overall, the results on lexical tones suggest that children were adult-like for producing global tonal contours consistent with a level contour for tone 1, rising for tone 2, dipping for tone 3 and falling for tone 4. They were also adult-like on f0 range, slope and curvature for tone 1, and mostly adult-like for tone 2. However, all children produced tone 3 with flatter slope and reduced f0 range compared to adults. Children's production of tone 4 differed the most from that of adults, with children producing both reduced f0 range and flatter f0 curves. The results did not show any consistent developmental changes across age, suggesting that older 5-year-olds were not more adult-like in their productions than younger 3-year-olds.

Tone Sandhi
We predicted that children might produce the correct global tonal contours that are consistent with full and half sandhi (rising and falling) but would not be adult-like in producing pitch range, slope and curvature. However, children's productions might be more adult-like for older 5-year-olds than younger 3-year-olds. The results for the sandhi forms are shown in Table 3 and Figure 4.
For full sandhi, there was a significant main effect of linear and quadratic trends and an interaction with age for the quadratic trend. The linear and quadratic trends suggest that both children and adults produced curved rising f0 contours. The negative effects of all child groups on the quadratic term suggest that children produced full sandhi with flatter f0 contours than adults.
FIGURE 3 | Mean f0 at 10 time points for the four lexical tones by three child ages and for adults (shaded area is 95% confidence interval).
Frontiers in Psychology | www.frontiersin.org
The results for half sandhi showed significant main effects of linear and quadratic trends and a significant interaction with age for the linear trend. The linear and quadratic trends suggest that both children and adults produced curved falling f0 contours. The positive effects of all child groups on the linear term suggest that children produced half sandhi with flatter f0 slopes and reduced f0 range compared to adults. These results suggest that children are not yet adult-like in their tone sandhi productions for f0 range, slope and contour, even for the oldest age group.

DISCUSSION
The aim of this study was to examine the acoustic realizations of lexical and sandhi tones in the productions of pre-schoolers (3-, 4-, and 5-year-olds) to determine if and when they become adult-like. First, all global contours on the children's lexical tone productions were consistent with the productions by adults: a level contour for tone 1, a curved rising contour for tone 2, a curved downward dipping contour for tone 3, and a falling contour for tone 4.
However, in terms of pitch range, slope and curvature, the acoustic analysis of lexical tones suggests that children were not achieving adult-like productions across all tones. While child and adult productions of tones 1 and 2 were the least different, tones 3 and 4 showed much more difference between child and adult productions. For tone 1, 3- and 4-year-olds produced pitch contours with a smaller pitch range and flatter pitch contour compared to adults, but children were adult-like by 5 years. Tone 2 also showed few differences between child and adult productions, with 3-year-olds producing a larger pitch range and slope compared to adults, and 5-year-olds producing a contour that was less curved compared to adults. No other group differences were found. Therefore, for tone 2, despite its curved contour, most pre-schoolers produced it in an adult-like manner consistent with a rising tone.
Children's productions of tones 3 and 4 differed the most from adult productions in terms of pitch range, slope and curvature. For tone 3, children across all three age groups had a reduced pitch range and slope compared to adults. However, children did not show any challenges in producing curved contours for the complex tone 3; in fact, the degree of curvature did not differ from adult productions. For tone 4, all children's productions were reduced in pitch range and slope, as well as having a flatter contour with less curvature compared to adult productions. These results suggest that for the two lexical tones with a falling contour, tones 3 and 4, pre-schoolers are still struggling to coordinate pitch range, slope and curvature, even at the age of 5.
The second aim of this study was to examine the acoustic realizations of tone sandhi in children's productions. The analyses for both full and half sandhi suggest that children produced global tonal contours that are consistent with full and half sandhi tones (rising and falling). However, children are not yet adult-like on pitch range, slope and curvature. Compared to adults, children produced full sandhi contours with a flatter rising curve and half sandhi contours with a smaller falling pitch range and slope, again showing challenges in producing adult-like forms.
The results from both lexical tone and tone sandhi suggest that children are still fine-tuning their control and coordination of pitch range and slope with curvature, especially for tones 3, 4, and the sandhi forms. This provides support for Wong (2012, 2013) and suggests that reaching adult-like tone realization on specific acoustic measures is a protracted process. However, our study also found that even 3-year-olds could produce the overall tonal contours consistent with level, rising, dipping and falling tones, important for maintaining tone category distinctions. This may help explain why studies using perceptual coding have reported earlier acquisition of lexical tones compared to studies using acoustic measures; the former may have captured children's ability to produce global tonal contours that are consistent with the different tone categories (Hua and Dodd, 2000), whereas the latter identified the implementation of specified acoustic measures (pitch range, slope and curvature) that are not yet adult-like (Wong, 2012). Together with our study, these results suggest the gradual acquisition of tone realization, with children producing global contours first, and later fine-tuning pitch range, slope and curvature. Studies with older children, and on tonal coarticulation in a range of tone contexts, will be needed to determine when this fine-tuning reaches adult-like acoustic values.
FIGURE 4 | Mean f0 at 10 time points for full- and half-sandhi by three child ages and for adults (shaded area is 95% confidence interval).
Similarly, for tone sandhi, children are producing rising and falling contours consistent with the two tone sandhi forms, but still fine-tuning pitch range, slope and curvature. However, our study used only real words and avoided low frequency words which pre-schoolers might not know. It is therefore possible that the sandhi forms examined here were lexicalized as tone 2 for full sandhi and a phonetic variant of tone 3 for half sandhi without children fully understanding how and where sandhi processes apply. Therefore, future studies are needed to examine children's ability to apply tone sandhi processes to novel words, examining their ability to generalize their knowledge about these phonological processes to word learning.
Our study did not find any developmental effects for either lexical tones or tone sandhi forms. Therefore, some caution must be taken when interpreting the results on differences observed across the age groups. For example, the results showed that for tone 2, 3-year-olds produced a more steeply rising contour and 5-year-olds produced a less curved contour, but there were no overall developmental effects. This must be interpreted together with the general result showing that children's productions are not adult-like for any of the contour tones (i.e., tones 3 and 4, and full and half sandhi). The differences across age groups might therefore be part of children's general early difficulty in coordinating pitch range, slope and curvature to achieve adult-like productions, with the exception of the level tone 1, where children had achieved adult-like production by 5 years. However, the question of developmental changes in tone productions would be better answered in future longitudinal studies that track children as they develop mastery over tone production.
Finally, the lack of developmental changes for tone sandhi might be related to the use of known words in this study. It is possible that children might show developmental effects in their ability to apply tone sandhi processes when learning new words using novel items. Our results also raise questions about if and how non-adult-like productions may affect children's tone comprehension abilities. It is possible that children are less sensitive to changes in pitch range, slope and curvature but can track overall tonal contours. It is also possible that other acoustic cues are being favored by children, i.e., duration and turning point for the contour tones. Addressing these questions in future research will provide a comprehensive understanding of tone acquisition and the link between production and perception.

CONCLUSION
Mandarin-speaking children produced global tone contours for lexical tone and tone sandhi that were adult-like and consistent with the level (tone 1), rising (tone 2 and full sandhi), dipping (tone 3), and falling (tone 4 and half sandhi) tone categories, showing that 3-5-year-olds have good knowledge about lexicalized forms of lexical tone and tone sandhi. However, pre-schoolers are still fine-tuning their control over coordinating pitch range, slope and curvature, especially for contour tones 2, 3, and 4, and the sandhi forms. Achieving adult-like acoustic realizations of lexical tone and tone sandhi is a protracted process, probably fully attained after the age of 5.

AUTHOR CONTRIBUTIONS
NX, as project leader, developed the research question, designed the experiments, collected data, performed data analysis, and wrote drafts of this paper in collaboration with the coauthors. PT assisted in coding data, data analysis and interpretation, and contributed to drafts of the manuscript. IY contributed to the design and implementation of the stimuli, assisted in training coders for acoustic coding of the data, and contributed to drafts of the manuscript. LG assisted in recruitment and data collection, and contributed to drafts of the manuscript. KD contributed to shaping the research question, stimuli and research design, and to drafts of the manuscript.

FUNDING
Macquarie University Research and Development Grant #1547620. Australian Research Council (ARC) Laureate Fellowship grant #130100014 (Demuth). ARC Centre of Excellence for Cognition and its Disorders grant #CE110001021.