Edited by: Marcela Pena, Catholic University of Chile, Chile
Reviewed by: Arnaud Rey, Centre national de la recherche scientifique (CNRS), France; Mariapaola D'Imperio, Aix-Marseille University, France
*Correspondence: Yuna Jhang
This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Functional flexibility, as manifest in the use of any word or sentence to express different affective valences on different occasions, is required in linguistic communication and can be said to be an infrastructural property of language. Early infant vocalizations (protophones), believed to be precursors to speech, occur in the first month and are functionally different from non-speech-like signals (e.g., cries and laughs). Oller et al. (
Before the emergence of speech, infants explore their vocal apparatus communicatively in prelinguistic vocalizations. When they produce sounds, they do not target consonants, vowels, words, or phrases. Instead they begin by producing more primitive sounds, protophones, precursors to speech. In using protophones infants build infrastructural capabilities that eventually lead to the emergence of speech. One of the critical infrastructural properties, called functional flexibility, differentiates protophones from cry, laugh, and vegetative sounds. The protophones include squeals, growls, and vowel-like sounds (hereafter “vocants”). The decoupling of sound and function/affect seen in protophones is analogous to (and forms a foundation for) the symbolic relations between words and meanings or between words and illocutionary forces. This decoupling contrasts sharply with the rigid association of cry with negativity and laugh with positivity. Previous research by Oller et al. (
Any utterance in human language can be used to serve a variety of illocutionary functions such as acknowledgment, acceptance, joy, refusal, or seeking attention. For example, the interjection “oh” can be produced with different illocutionary forces: (a) in surprise when someone suddenly realizes something, (b) when someone hears something and expresses disappointment, or (c) when someone hears something wonderful. These three uses might be accompanied by neutral, negative, and positive facial affect respectively. Functional flexibility can be measured as the degree to which a sound can be produced with differing communicative functions on different occasions of use. Cries, laughs, or hiccoughs cannot be said to be words in part because they do not have the degree of flexibility in serving illocutionary forces that words do. When someone hiccoughs during a meal, we know the hiccough is simply a bodily reaction to some digestive or respiratory condition, perhaps because the person ate too fast or drank too much or had some medical condition that caused involuntary contraction of the diaphragm. When an infant cries or laughs, we know the cry cannot be a positive signal and the laugh cannot be a negative one. The degree to which a cry or a laugh is associated with different communicative functions is very limited because in both cases the sound and the accompanying facial affect naturally constrain the perceivers' interpretation to either a positive state or negative state. In contrast, sounds like “oh” or any words in human languages have greater flexibility in their relations with communicative functions: any word can be produced in a positive, negative, or neutral state. Any language or precursor to language is required to have this property of functional flexibility, allowing speakers to say any word with a variety of communicative functions, independent of circumstances.
The idea of communicative functions as we use it here comes from Austin's (
In the present study we treat infant affect as the primary determiner of communicative function because in the neonatal period caregiver interpretations of affect play a major role in the perlocutionary effect of infant communications. Affect constrains the range of illocutionary forces that can be attributed to infant vocal communications to certain valence classes (positive, neutral, or negative) and similarly constrains the likely perlocutionary responses: positive affect may yield encouragement or praise, while negative affect may result in attempts to change the situation for the baby or may also result in scolding (Toda and Fogel,
Infants' ability to produce vocal types where there is no necessary coupling between form (e.g., vocal types) and function (i.e., illocutionary forces), as is manifested by spontaneous production of protophones with differing affect types, is foundational to the emergence and development of the speech capacity, and is required for word learning in later life. The association between a word, a phrase, or a sentence and its communicative function is always flexible, as exemplified earlier.
Protophones, infant vocalizations that are neither vegetative, fixed signals, nor effortful grunts, are deemed precursors to speech for at least three reasons: (1) they can be produced spontaneously and endogenously (without any external stimuli), (2) they bear the property of functional flexibility, just as language does (for more information on protophones and their infrastructural properties, see Oller,
Protophones occur very often in the first months. Nathani et al. (
A key question is why protophones occur so frequently from early in life, given that they seem so distant from speech. It has been speculated that these sounds provide a platform for development of speech and serve the immediate function of indicating state and well-being to parents (Locke,
If infants were very limited in their ability to produce protophones spontaneously and endogenously at birth, we presume they would also be unable to show functional flexibility in vocalization. Even newborns, within the first week of life, produce protophones spontaneously, according to longitudinal observations (Stark et al.,
By 3 months, the protophones differentiate into at least three types (Buder et al.,
It is important to emphasize that while acoustic differentiations of some sort must underlie the auditory identification of protophones and other infant sounds, our analyses of functional flexibility rely on auditory judgments of vocal type rather than acoustic ones. The reason is fundamental: The development of the human infant's vocal communication must be guided by caregiver responses and elicitations. These parental actions are dependent upon auditory judgments of protophone types (which parents often imitate or elicit) as well as judgments of cry and laughter, and on visual judgments of infant facial affect. To the extent that parental reactions to these sounds and expressions of affect drive the development of the functionally flexible infant communication system, it is precisely the caregiver perceptual judgments (simulated by laboratory listeners) that are the natural target of our coding. Acoustic analysis thus plays no role in the categorization by the coders in the present study. The role of acoustic analysis is only in helping to elucidate possible bases for the intuitive judgments about infant vocal types made by human listeners. Buder et al. (
Seeking roots of language in prelinguistic communication, scientists have sought to identify affective or communicative content in the production of infant vocalizations (Oller,
Quantifying functional flexibility of infant protophones, Oller et al. (
Oller et al. (
The present study aims to extend Oller et al. (
Second, we examined whether functional flexibility can be further demonstrated in other protophone types (i.e., raspberries, other consonants alone, and ingressive sounds) in addition to the three primary ones that are defined by phonatory properties (squeals, growls, and vocants). The prior paper did not include analysis of the “other protophones.” However, given the common occurrence of other protophone types in some infants, we considered it important to include them in this and future analyses.
Here are the present hypotheses:
(1) Functional flexibility will be discernible for protophones in the first 3 months: We predict significant odds ratios, showing protophones to be accompanied by less negativity than cry and greater neutrality than cry.
(2) Functional flexibility will be discernible in “other protophones,” just as in the phonatory protophone types (squeals, vocants, and growls): We predict functional flexibility in the other protophones will also be indicated by significant odds ratio differences.
A written consent form and a simple questionnaire were completed by the infants' parents before any recordings for the longitudinal research project on infant vocal development from which the recordings for the present study were drawn. Inclusion criteria required subjects to have no language, hearing, or developmental disorders. All procedures were approved by The University of Memphis Institutional Review Board for the Protection of Human Subjects.
From this longitudinal research, we selected all six American mother-infant dyads who had at least 1 recording day (approximately 1 h of data) for 0 and 1 months and at least 2 recording days at 2 months as indicated in Table
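Infant | Recording days at 0 months | Recording days at 1 month | Recording days at 2 months |
Infant 1 | 2 | 1 | 2 |
Infant 2 | 1 | 1 | 2 |
Infant 3 | 1 | 1 | 2 |
Infant 4 | 1 | 1 | 2 |
Infant 5 | 1 | 1 | 2 |
Infant 6 | 1 | 1 | 2 |
Total | 7 | 6 | 12 |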
Infant | Age (months) | Segment duration (min) | Number of utterances |
Infant 1 | 0 | 21 | 96 |
Infant 1 | 1 | 20 | 97 |
Infant 1 | 2 | 20 | 108 |
Infant 2 | 0 | 16 | 143 |
Infant 2 | 1 | 21 | 103 |
Infant 2 | 2 | 26 | 128 |
Infant 3 | 0 | 20 | 117 |
Infant 3 | 1 | 20 | 145 |
Infant 3 | 2 | 20 | 138 |
Infant 4 | 0 | 20 | 157 |
Infant 4 | 1 | 20 | 101 |
Infant 4 | 2 | 20 | 136 |
Infant 5 | 0 | 20 | 148 |
Infant 5 | 1 | 20 | 114 |
Infant 5 | 2 | 20 | 156 |
Infant 6 | 0 | 21 | 111 |
Infant 6 | 1 | 20 | 107 |
Infant 6 | 2 | 26 | 163 |
Mean duration and # of utterances per recording segment | | 20.61 | 131.56 |
SD | | 2.23 | 38.72 |
The recordings had been made for the longitudinal study in a laboratory designed to resemble a child's playroom, with eight cameras positioned in the four corners of the room, one mounted high and one low in each corner. From an adjacent control room an experimenter chose for recording two of the eight possible video channels and switched as needed to obtain a view of the infant's face along with another view of the interaction between the infant and the parent and/or experimenters throughout the recording. Both infants and parents wore wireless microphones with signals digitized at 44 kHz.
The infant, of course, was always present in the hour-long recordings, which at different points during the hour included parent-infant interaction, an interview of the parent with the experimenter, and periods of silence from the adults, allowing the infant to vocalize or bid for interaction in any other way. The segments selected for the present work were always from either the interaction or interview circumstances, during which infants were often very vocally active.
The present research was intended to examine functional flexibility of infant vocalization in the first year. Consequently coding for the primary data collection was conducted in a way similar to that of Oller et al. (
To locate 20-min periods from the recording days during which there was considerable vocal activity to code, we began by having one group of coders work in real-time to locate periods of high volubility (number of infant vocalizations). All the recording material was thus coded in real-time by this first group of coders in order to enable the 20-min segments to be selected efficiently. After selection of the eighteen 20-min segments, a separate group of observers coded in repeat observation to provide the primary data for the study.
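The selection of a high-volubility segment can be illustrated with a short sketch. It assumes that each real-time code carries an utterance onset time in seconds, that candidate windows begin at an utterance onset, and that the threshold is the reported requirement of at least 96 infant utterances per 20-min segment. The function name and the data layout are hypothetical; the study reports that the first author located the segments from the collated real-time codes.

```python
from bisect import bisect_left


def first_dense_window(onsets_s, window_s=20 * 60, min_count=96):
    """Return (start, end) in seconds of the earliest onset-anchored window
    containing at least min_count utterance onsets, or None if none exists."""
    onsets = sorted(onsets_s)
    for i, start in enumerate(onsets):
        # Index of the first onset at or beyond the end of this window.
        j = bisect_left(onsets, start + window_s)
        if j - i >= min_count:
            return (start, start + window_s)
    return None


# Hypothetical onsets: three sparse utterances, then a dense burst.
example_onsets = [0, 10, 20, 2000, 2001, 2002, 2003]
print(first_dense_window(example_onsets, window_s=60, min_count=4))
```

Anchoring windows at utterance onsets loses nothing: any qualifying window can be shifted forward to start at its first utterance without dropping an utterance.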
All the coding was conducted in the same software environment used in Oller et al. (
As training for real-time coding, four graduate students in Communication Sciences and Disorders at the University of Memphis were presented with a lecture by the second author on vocal type coding for the above listed categories. During the lecture we presented real examples of previously coded infant utterances, all of which either met a consensus standard for one of the vocal categories or illustrated ambiguities of possible coding judgments based on prior listening experience in the laboratory. The infant vocal types in question are graded rather than discrete, and consequently even though there are many utterances that pertain to a single category unambiguously, there are other utterances that are judged to possess features of more than one of the categories. The coders were trained to focus on the most perceptually salient features of the utterances and to base their judgments on those most salient characteristics. Examples of the three primary protophone types produced by a 0-month-old infant from the present study are provided along with acoustic displays in the Supplementary Material.
The coders also passed the training modules of our on-line infant vocalizations training system (IVICT, Infant Vocalization Interactive Coding Training, at babyvoc.org) that has been developed to facilitate both laboratory training and training of parents in categorizing infant vocalizations with a common terminology (cry, squeal, raspberry…). The training on infant vocal types can be fairly brief because the categories correspond to naturally recognizable types, commonly reported by parents of infants in the first year of life when presented with an open-ended question such as “what kinds of sounds does your infant produce?” Thus, the primary point of training is simply to ensure that all the coders use the same terms to refer to the categories and that they make their judgments intuitively.
In a final stage, coder agreement was assessed based on coding of recordings drawn from each of the infants included in the present study, with examples drawn from all the vocal types to be coded. Background on the coding scheme, along with extensive details regarding training requirements, acoustic characteristics of the primary protophone types, reliability of acoustic identification for those protophone types, the tendency of infants to produce the individual protophone types repetitively, and coder agreement on the protophone types are provided in the Supporting Information Appendix to Oller et al. (
The real-time task was to code infant vocal types as either vocant, squeal, growl, other, cry, or laugh in real time for all the segments of the recordings of six infants at three ages (Table
The real-time coding was conducted independently by each of the first group of four coders, with both video and audio playing during every coding session. The first author collated the results and located the first 20-min segment at each age that met the requirement of at least 96 infant utterances according to the real-time coders. Table
Before repeat-observation coding could begin, the first author located infant utterances within each recording segment that had been selected based on volubility from real-time coding (Table
A second group of four graduate students in Communication Sciences and Disorders at the University of Memphis received training similar to that of the four real-time coders for all the same vocal types and then were assigned as repeat-observation coders. Their task included vocal type coding, conducted with audio only (video was closed), and facial affect coding, conducted with video only (audio was muted). For facial affect, the observers were instructed to code each utterance (actually just the period of time during which the utterance occurred, since audio was muted) as positive (smiling), negative (frowning or grimacing), neutral (neither smiling nor frowning), or “can't see” in cases where the infant's face was not visible in either of the two camera views. Eight percent of the utterances were dropped from the final analysis due to a report of “can't see” by at least one coder.
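The exclusion rule for facial affect can be sketched as a simple filter. The data layout (a mapping from a hypothetical utterance identifier to the list of labels given by the coders) is an assumption made for illustration and is not the format of the laboratory's coding software.

```python
CANT_SEE = "can't see"


def drop_unseen(affect_codes):
    """Keep only utterances for which no coder reported "can't see".

    affect_codes maps an utterance id to the list of per-coder affect labels.
    """
    return {
        utterance_id: labels
        for utterance_id, labels in affect_codes.items()
        if CANT_SEE not in labels
    }


# Hypothetical codes from three coders for three utterances.
codes = {
    "u1": ["neutral", "neutral", "positive"],
    "u2": ["negative", CANT_SEE, "negative"],
    "u3": ["neutral", "neutral", "neutral"],
}
kept = drop_unseen(codes)
print(sorted(kept))
```

Under this rule a single report of “can't see” removes the utterance, even when the remaining coders agree on an affect label.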
Each one of the repeat-observation coders received a different counter-balanced order of all 18 sessions and coded them independently, once for facial affect (video only) and again separately for vocal type (in audio only). The primary author also coded in repeat observation, so there were 5 coders altogether, each coding 18 sessions twice (once for vocal type and once for facial affect), for a total of 180 coding sessions.
We used kappa statistics (Cohen,
The mean kappa agreement between the coders and the first author was 0.49 for all the vocal types coded (i.e., moderate agreement for vocant, squeal, growl, “other protophones,” and cry). However, the primary focus of the functional flexibility study is the binary contrast of protophones vs. cry. The mean kappa agreement for protophones vs. cry was 0.68 (substantial agreement). Kappa agreement was 0.65 for facial affect (substantial agreement for positive, neutral, negative). Mean intercoder correlation assessed at the session level was 0.93 for vocal type coding (numbers of utterances judged to be protophones for all five coders) and 0.90 for facial affect (numbers of utterances judged to be neutral for all five coders) (
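For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen's kappa for two coders who labeled the same utterances. It is the textbook formula, (observed agreement minus chance agreement) divided by (1 minus chance agreement), and is not the authors' analysis script; the labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each coder's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary coding (protophone vs. cry) of four utterances.
coder_1 = ["protophone", "protophone", "cry", "cry"]
coder_2 = ["protophone", "protophone", "cry", "protophone"]
print(cohen_kappa(coder_1, coder_2))
```

Collapsing the five vocal types to the binary protophone vs. cry contrast before computing kappa removes disagreements among protophone types, which is consistent with the higher agreement reported for the binary contrast.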
In order to ensure unimpaired hearing and seeing of the stimuli during repeat-observation coding, the utterances were played in such a way that the boundaries were “stretched” to include the 50 ms before and the 50 ms after each utterance. This precaution eliminated rise-time anomalies for utterances and ensured that the visible periods would include all video frames pertaining to the utterances. In both facial affect and vocal type coding the observers were allowed three listening or viewing opportunities for each utterance.
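The boundary stretching can be sketched as follows. The function name is hypothetical, and clamping the stretched boundaries to the limits of the recording is an assumption added for safety; the text specifies only the 50 ms extension on each side.

```python
def stretch_boundaries(onset_ms, offset_ms, pad_ms=50, recording_end_ms=None):
    """Extend an utterance's playback window by pad_ms on each side.

    The start is clamped at 0 and, if recording_end_ms is given, the end is
    clamped at the end of the recording (both clamps are assumptions).
    """
    start = max(0, onset_ms - pad_ms)
    end = offset_ms + pad_ms
    if recording_end_ms is not None:
        end = min(end, recording_end_ms)
    return start, end


print(stretch_boundaries(1000, 1400))
```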
The 20-min segments contained a mixture of two circumstances: parent-infant vocal interaction (mother talking with baby) and interview (mother talking with experimenter). Because functional flexibility of infant vocalization has been shown to occur in similar degrees for both these circumstances (see Oller et al.,
As in Oller et al. (
The total number of utterances produced by the six infants in the 20-min segments was 772 for 0 months, 667 for 1 month, and 829 for 2 months (including both protophones and cries). The 0-month data composed around 34% of the dataset, the 1-month data 29%, and the 2-month data 37%. Individual infants contributed 13–18% of the utterances to the final dataset (
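The reported shares can be checked with a few lines of arithmetic. The sketch below uses the age totals given in the text and the per-segment utterance counts listed for each infant in the table of segment durations and utterance counts; the variable and function names are illustrative only.

```python
# Utterance totals per age (months), as reported in the text.
utterances_by_age = {0: 772, 1: 667, 2: 829}

# Per-infant totals, summing the three segment counts listed for each infant.
utterances_by_infant = {
    "Infant 1": 96 + 97 + 108,
    "Infant 2": 143 + 103 + 128,
    "Infant 3": 117 + 145 + 138,
    "Infant 4": 157 + 101 + 136,
    "Infant 5": 148 + 114 + 156,
    "Infant 6": 111 + 107 + 163,
}


def percent_shares(counts):
    """Return each entry's share of the total, rounded to whole percent."""
    whole = sum(counts.values())
    return {key: round(100 * n / whole) for key, n in counts.items()}


total = sum(utterances_by_age.values())
age_shares = percent_shares(utterances_by_age)
infant_shares = percent_shares(utterances_by_infant)
print(total, age_shares, min(infant_shares.values()), max(infant_shares.values()))
```

Both breakdowns sum to the same total, and the rounded shares agree with the percentages stated above.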
Similar to Oller et al. (
Neutral affect during vocalization may be thought of as an indicator of voluntary control, and indeed a great deal of mature speech is produced with neutral facial affect. But again the 0–2 month olds of the present study showed less neutrality of protophones than the older infants of Oller et al. (
We examined the first hypothesis by considering the three predictions of Oller et al. (
Prediction (cells: odds ratio [lower, upper interval bound]) | Phonatory protophones (three left-hand columns) | | | Other protophones (right-hand column) |
0 months | | | | |
(A) Prot > cry in neutrality | 24.6 [14.8, 276.6] | 23.4 [9.4, 71.3] | 22.6 [8.8, 71.3] | 79.6 [21.5, 372.9] |
(B) Prot < cry in negativity | 27.5 [7.8, 108.1] | 25.5 [10.2, 77.4] | 24.7 [9.5, 77] | 127.9 [29.1, 562.4] |
(C) Prot > cry in positivity | 18.6 [0.73, 472.2] | 5.6 [0.32, 96.9] | 6 [0.31, 118.2] | 36.1 [1.7, 781.2] |
1 month | | | | |
(A) Prot > cry in neutrality | 15.5 [4.6, 54.2] | 25.7 [9.7, 75.3] | 28.9 [10.1, 91.5] | 25.4 [6.8, 102.8] |
(B) Prot < cry in negativity | 21.6 [7.6, 76.6] | 29 [12.5, 79.8] | 35 [13.5, 105.1] | 26.4 [7.8, 102.5] |
(C) Prot > cry in positivity | 15.4 [1.7, 137.8] | 7.8 [1.1, 57.9] | 8.9 [1.1, 72.8] | 4.4 [0.27, 74.5] |
2 months | | | | |
(A) Prot > cry in neutrality | 25 [6.3, 80.6] | 23.5 [8.7, 53.4] | 40 [13.8, 96.8] | 40.8 [11.0, 121.5] |
(B) Prot < cry in negativity | 38.8 [13.9, 104.1] | 38 [20.9, 178.8] | 61.2 [20.9, 178.8] | 60.2 [16.5, 220.9] |
(C) Prot > cry in positivity | 45.9 [2.12, 994] | 30.1 [1.8, 491.7] | 26.0 [1.5, 461.0] | 21.4 [0.85, 541.5] |
The data in Table
In the left hand columns of Table
For data on the third prediction for hypothesis one (protophones > cry in positivity), the results were less consistent, showing statistical support for the prediction at 1 and 2 months, but not at the youngest age, 0 months. However, it should be noted that assessment of the positivity prediction was hampered by very small sample size, owing to the fact that infants as young as these scarcely ever smile. In the data at 0 months, there were only eight protophones altogether coded as having positive facial affect, and consequently it may not be sensible to evaluate the positivity prediction at this age.
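The effect of small cell counts on an odds ratio can be illustrated with a minimal sketch. It uses the textbook odds ratio for a 2 x 2 table with a Wald interval on the log scale, and it adds 0.5 to every cell when any cell is zero (the Haldane-Anscombe correction). These choices are assumptions for illustration, not the estimation procedure of the present study, and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2 x 2 table.

    a: protophones with the target affect, b: protophones without it,
    c: cries with the target affect,       d: cries without it.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the ratio and its error are defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: few positive protophones and no positive cries.
print(odds_ratio_ci(8, 700, 0, 100))
```

With so few positive events the interval is very wide and spans 1, so a sparse positivity cell can leave a prediction statistically unsupported even when the point estimate is above 1.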
To assess the second hypothesis, we used the same odds ratio approach with reference to the right hand column of Table
The patterns of ORs in Table
The purpose of the present study was to examine the emergence of functional flexibility in infant protophones across the first 3 months. We found that starting in the first month, all the protophone types demonstrated strong functional flexibility by showing significantly more neutral facial affect than cry and significantly less negative facial affect. The odds ratios used to illustrate the points showed highly significant conformity to predictions A (protophones more neutral in facial affect than cries) and B (protophones less negative in facial affect than cries). We anticipated functional flexibility to be discernible in the first 3 months in squeals, growls, and vocants as well as in the “other protophones,” and this finding supports both hypotheses that we evaluated. We also found that infant protophones were functionally flexible across all 3 months, being differentiated from cry at all the ages.
To interpret the results, we might consider the idea of protophones as a platform for development of speech, also serving to help indicate infants' affective/emotional state and condition of well-being (Oller and Griebel,
Although the 0–2 month old infants of the present study showed higher rates of negativity and lower rates of neutrality than the 3–11 month old infants of Oller et al. (
The results on positive facial affect were mixed and can be viewed as somewhat predictable based on facts about infant development. We anticipated finding protophones to be more affectively positive than cries across age, but the odds ratio analyses showed that only the “other protophones” conformed to the prediction of positivity significantly at 0 months, while the three phonatory protophones conformed only at 1 and 2 months (Table
The present study documents the emergence of functional flexibility in the first year and quantifies it in a way that is comparable with results on older infants in Oller et al. (
The results do not necessarily suggest that infants
We reason that the similarities found in protophones and adult speech are not a coincidence. It is important to recognize that infant protophones reveal many properties of speech in a simpler form. We used affect to determine communicative functions expressed by infants. Adults can also express communicative functions primarily through affect although there are many more options available to adults. For example, adults can simply state their communicative intentions directly: “Here is my prediction about publishability of your article.” This is a statement that marks a possible communicative intent (i.e., illocutionary force) of prediction. Similarly, “I hereby criticize your choice of wording.” This is a sentence that could be used to express a criticism of our paper. Neither of these sentences could be produced by an infant, and neither of the communicative intents (prediction or criticism) could be specified by a young infant's actions. There are many illocutionary forces available to any mature speaker of a language that cannot be produced by an infant: stipulation, denial, explanation, reiteration, and so on. Any adult can use any word or sentence to express different illocutionary forces on different occasions, and this represents the pinnacle of functional flexibility. Language makes such complexities possible. The finding that even in the first month of life, human infants showed functional flexibility in the production of protophones suggests that this foundation for speech runs very deep in human nature. The fact that we observed protophones accompanying different facial affect types thus appears to indicate that functional flexibility of protophones is an important milestone on the path to the speech capacity.
YJ coordinated the coders, designed the study, analyzed the data, and wrote the paper. DKO designed the study, analyzed the data, and wrote the paper.
This work was primarily supported by two grants from the National Institutes of Health R01 DC011027 and R01 DC006099.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Special thanks to Anne Warlaumont for developing IVICT, Rick Dale for developing IDIVA, and to IVOC research staff, Edina Bene, Neeraja Rangisetty, and Klaudia Kroboth, for the data collection.
The Supplementary Material for this article can be found online at:
1“Other protophones” are distinguished from the three phonatory protophone types (i.e. squeals, growls, and vocants) because their sources of acoustic energy (Fant,