Mini Review Article
The Role of Saliency in Learning First Words
Faculty of Arts and Humanities, Paderborn University, Paderborn, Germany
In word learning, one key accomplishment is establishing reference, that is, linking a word to its referent. According to classical theories, the term reference captures a mental event: A person uses a word to mentally recall a concept of an entity (an object or event) in order to bring it into the mental focus of an interaction. The developmental literature proposes different approaches to how children accomplish this link. Researchers agree that multiple processes (within and across phonological, lexical, and semantic areas) are responsible for word learning. Even so, recent research has highlighted saliency and perception as crucial factors in the early phases of word learning. Generally speaking, some approaches to solving the reference problem attribute a greater role to the salience of the referent’s properties, whereas others emphasize the social context needed to select the appropriate referent. In this review, we aim to systematize terminology. We propose that assessments of the impact of saliency on word learning are controversial because definitions of the term saliency weight the importance of perceptual versus social stimuli for the learning process differently. We further propose that defining early word learning in terms of paying attention to salient stimuli is too narrow. Instead, we emphasize that a new link between a word and its referent will succeed if the stimulus is relevant to the child.
Studies differ in how they explain the emergence of links between words and referents. Some explanations build on research in the psychology of perception (see Table 1) confirming that saliency (or salience) can capture attention quite effectively. Bottom-up saliency (from Latin salire: to jump) is by definition a property of objects that makes them stand (or jump) out of the surrounding context. For instance, a red jacket stands out in a crowd of black jackets (Itti and Koch, 2001: 194). Auditory properties such as loudness, pitch, or spectral shape can attract attention in a largely bottom-up way (e.g., Treue, 2003), as can visual features such as color, luminance, orientation, motion, or size. Saliency-based approaches propose that, during word learning, salient properties capture and sustain infants’ attention to an object while the referent is being labeled. Other explanations emphasize social interaction, its goals, and how partners establish reference in the service of these goals. Our aim is to systematize the use of terms and to sensitize the reader to the difference between saliency and relevance. We argue that relevance is achieved by embedding the child’s perspective into a social environment, that is, a history of joint actions. In the following, we first review studies that focus on the role of saliency in word learning.
Mechanistic View on Word Learning
The mechanistic view (e.g., Smith, 1995; Smith et al., 1996; Samuelson and Smith, 1998; Spencer et al., 2011; see also Plunkett, 1997, for a review) proposes that general mechanisms such as memory and attention drive the word learning process. The child’s task is to link the heard phonological form with an entity (an object or event) that he or she is attending to visually. Studies supporting this view reduce word learning to essentially two problems. First, the learning environment is ambiguous [a problem already identified by Quine (1960)] because it offers many potential referents and distractors (Trueswell et al., 2016). Second, even if infants identify the referent correctly, their attentional skills are still maturing, so they have difficulty sustaining their visual focus on it (Yu and Smith, 2016). Thus, the mechanistic view centers on the question of how infants solve the ambiguity problem and sustain their attention on a referent in order to establish a new word–object link.
One approach considers saliency to be a precondition for recruiting infants’ attention to an object. If this attention is temporally synchronized with labeling, it in turn establishes a link between a word and its referent in the infant’s memory (Gogate and Bahrick, 1998). We refer to these studies collectively as associationist (Hollich et al., 2000: 12), although they operationalize saliency differently (see below). A related position assigns a greater role to the child’s growing experience by postulating constraints and principles of learning. It is presented in more detail below.
Saliency—A Property in Objects
The associationist account postulates that infants solve the ambiguity problem for one of two reasons. Either they prefer salient objects, or they assume that adults will label the object that is most interesting from the infant’s point of view. This account implies that, for early word learning, infants rely more on perceptual saliency than on social stimuli (Moore et al., 1999; Hollich et al., 2000). Pruden et al. (2006) tested this by operationalizing social stimuli as the experimenter’s gaze and comparing its effect with that of perceptual saliency. In a “coincidental condition” (p. 269), the experimenter gazed at and simultaneously named a salient object. In a “conflict condition,” the experimenter gazed at and labeled a “boring” object while a salient object was present. Infants in the coincidental condition spent more time looking toward the salient object, indicating that they had mapped the new word onto the intended referent. Infants in the conflict condition also looked longer at the salient distractor. This indicates that they had mapped the label onto the salient object rather than onto the object the experimenter had labeled. The authors viewed their results as evidence that young infants weight object saliency more heavily than social cues in word learning. Only in the course of development do infants “move from learning words associatively to learning words based on the social cues a speaker emits” (p. 278). However, Golinkoff and Hirsh-Pasek (2006) noted that it is still not known how the qualitative developmental change toward weighting social cues more heavily than perceptual cues comes about.
Crucially, Pruden et al. (2006) did not explicitly define the term saliency, but they did manipulate it methodologically through attention-grabbing properties, that is, properties of objects that recruit and hold infants’ visual attention. The salient stimuli were brightly colored and could either make a noise or move, and they were paired with boring objects (dull color, neither motion nor noise). These properties are consistent with the psychological view of bottom-up saliency.1 Other studies extended this notion. For example, an object’s saliency increases if it is larger (Smith et al., 1996; Pereira et al., 2014) or more centered in the infant’s visual field than other toys. It also increases if the object rotates on a turntable (Moore et al., 1999), moves (Werker et al., 1998; Houston-Price et al., 2005, 2006), or is illuminated (Axelsson et al., 2012). Note, however, that infants attend to a moving object during training, yet they exclude it as a referent at test if its movement is not consistent (Houston-Price et al., 2006). In this sense, saliency is regarded as a “bottom-up sensory input that is clean” (Yu and Smith, 2012: 258), meaning that only one object dominates the field of view. In natural environments, parents can facilitate their infants’ word learning by creating such visually optimal moments, that is, by bringing the target object to the fore. Furthermore, infants often create situations themselves in which referential ambiguity is low (e.g., during toy play, by exploring one object at a time). Object naming in these moments is positively associated with word learning (Yu and Smith, 2012; Pereira et al., 2014).
With regard to the mechanistic view, these results indicate that word learning is driven by general processes of attention and memory, which are recruited by attention-grabbing features of objects. This saliency effect can be explained as follows: Cognitive learning mechanisms are facilitated when competitors are reduced and the referent is determined unambiguously (Axelsson et al., 2012).
Saliency—A Property Generated by the Perceiver
Another perspective does not deny the role of perception but highlights infants’ experience with word learning episodes. Infants are exposed to referents and their labels, and the cognitive demand of mapping a label onto some features of the referent gives rise to the necessary cognitive operations. Accordingly, children make use of constraints and principles (Markman, 1994) that narrow down referent selection. From this position, the saliency of an object’s property derives from the perceiver’s knowledge (and experience with the labeling task). Two examples are the whole-object assumption (Markman and Wachtel, 1988; Woodward, 1992) and mutual exclusivity (Markman and Hutchinson, 1984; Clark, 1987; Markman, 1989; Merriman et al., 1989; Waxman and Kosowski, 1990; Golinkoff et al., 1992, 1994), and they differ in nature. The whole-object assumption leads infants to take a novel label as referring to the whole object rather than to its parts or properties. Mutual exclusivity, in contrast, describes a bias that keeps infants from linking new words with already named objects. This bias rests on the underlying assumption that objects can bear only one label (Markman, 1989, 1990). Indications of mutual exclusivity have been observed in infants as young as 10 months (Mather and Plunkett, 2010), but the cognitive basis of this bias remains disputed. Is mutual exclusivity based on knowledge of familiar versus novel labels for objects (e.g., Mervis and Bertrand, 1994)? Or is it based on knowledge of an object’s novelty (e.g., Merriman et al., 1995; Mather and Plunkett, 2010; Horst et al., 2011)? Here, we limit our considerations to studies investigating the novelty bias.
To test this account, Mather and Plunkett (2012) presented 22-month-old infants with two stimuli. Both were name-unknown, but only one of them was truly novel; the infants had become familiar with the other object through prior exposure. After infants heard a new word, their attention, measured as looking time, increased more toward the novel object than toward the familiar one. The authors concluded that mutual exclusivity is a novelty-based mechanism. Their reasoning was that it seems cognitively easier for infants to search the environment for a perceptually novel object than to retrieve all familiar object names. Hence, when infants hear a novel word, the most novel object will appear to be the most salient one, and this facilitates the mapping process. The novelty bias is a good example of a learning constraint. However, we do not suggest that all constraints and principles are attributable to saliency in the same way as novelty.
One aspect is crucial to this position. Saliency does not reside in an object and capture attention bottom-up; instead, the perceiver generates it through top-down processes (Connor et al., 2004). When facing the ambiguity problem, infants rely on their prior knowledge to identify whether not only words but also objects are actually novel. The ability to hear a new label and map it onto a novel, name-unknown object characterizes infants as active learners in their environment. This perspective holds that word learning is a cognitive process that also requires a top-down mechanism (including memory of past experience) in the perceiver.
Social-Pragmatic and Interactionist View on Word Learning
Whereas in the mechanistic view, word learning is dependent on perceptual and attentional constraints, social-pragmatic theories (e.g., Bruner, 1983; Tomasello, 2000; Csibra and Gergely, 2006) claim that from early on, infants are sensitive to social cues (see Table 2).
Both social-pragmatic and interactionist perspectives agree that word learning is an inherently social process, but they differ in how referents become salient during social interaction. We elaborate on this difference below.
Saliency—A Property Emerging From Social Perception
One line of social-pragmatic studies (Tomasello and Akhtar, 1995; Baldwin et al., 1996; Carpenter et al., 1998) starts from the observation that infants are especially sensitive to objects that adults single out via ostensive means (eye gaze, gestures, or emotional expressions). These studies propose that such objects therefore become salient and can then be linked with new words (Akhtar et al., 1996; Csibra, 2010; Axelsson et al., 2012). In a study with 24-month-olds, Horst and Samuelson (2008) used ostensive naming (i.e., addressing the child directly, holding up the target object, and pointing at it) as a specific social behavior. This behavior singled out the target object and reduced competition from distractors. Retention of new words, and thus long-term learning, was observed only in the ostensive naming condition, not in a comparison condition with non-ostensive naming. This suggests that social cues not only facilitate infants’ encoding but, more crucially, support long-term retention.
Recent research on infants’ reactions to attention-directing social cues supports the assumption that caregivers’ actions do not just facilitate shared attention, but might also aid learning. For example, Deák et al. (2018) found that five properties are salient from a perceiver’s perspective: gaze shift, pointing gestures, speech, object sounds, and object manipulation. They argued, for example, that pointing gestures are salient due to the sweeping motion of the arm and hand (see also Rohlfing et al., 2012). Their results indicated that when used within a dyadic toy play interaction, 3- to 11-month-olds were sensitive to all five different caregiver social cues, with object manipulation being the most effective (see also Yu and Smith, 2012).
Effects of saliency as a property of social perception can be explained by natural pedagogy (Csibra and Gergely, 2009). This account posits that from early on (possibly even from birth; Csibra, 2010), infants are sensitive to ostensive signals. Because infants understand the referential nature of communicative signals (Gliga and Csibra, 2009), they follow these signals to identify a word’s referent (Baldwin, 1993; Baldwin et al., 1996).
In sum, this line of social-pragmatic research reveals that social cues evoke a high level of attention in perceivers and guide them toward objects relevant to the interaction. Furthermore, research on adult attention makes a similar distinction. Top-down attention is guided by goals, knowledge, or expectations, whereas bottom-up attention is driven by stimulus contrast or salience (e.g., Folk et al., 1992). This might point to a more general mechanism that drives word learning as well as visual attention.
Away From Saliency—Relevance Emerging in Interactions
In contrast to the line of research described above, interactionist theory turns away from the view that infants build reference by observing salient entities. Instead, it attributes a successful word–referent association to infants’ engagement in an interaction toward a joint goal. Proponents of this approach emphasize the importance of a pragmatic frame, which is established through repeated participation in an ongoing social event. This frame leads to interactional experience and to particular communicative acts (in the form of joint attention) that serve word learning (Bruner, 1983; Rohlfing et al., 2016).
Along these lines, Wildt and Rohlfing (2018) investigated word learning in 10-month-olds by directly comparing the associationist, social-pragmatic, and interactionist approaches. A stimulus set consisting of one perceptually highly salient and one less salient object was presented on a screen in an intermodal preferential looking paradigm (IPLP), and infants’ visual attention was measured with an eye tracker. Infants’ interest in the stimulus set was first assessed as a baseline, which confirmed that all infants preferred the salient stimulus. All infants then participated in an interaction in which both stimuli lay side by side on a table. The experimenter gazed at the “boring” stimulus, repeatedly demonstrated its function, and labeled it ostensively. After ostensive naming, infants were divided into two groups and encouraged to play with the less salient object. One group explored the object’s function on their own without any further input from the experimenter. The other group explored it in an interaction, in which the experimenter supported the infants in manipulating the boring object to achieve a joint goal. This design contrasted the impact of joint attention with the impact of joint action on word learning. We predicted that infants would map the new word onto the boring object because it had been demonstrated as relevant to the interaction. The salient object was always in the infants’ field of view, yet their subsequent looking behavior indicated that engagement was crucial for establishing reference. These results contradict the associationist approach and demonstrate that young infants’ early attention is not driven merely by perceptually salient objects.
In addition, the contrast between the joint action and the joint attention conditions reveals that establishing joint attention is not sufficient to develop a word–referent link for a boring object. Infants in the joint attention condition still preferred the salient object, as they had at baseline. In contrast, an interaction in which infants participated had a stronger effect on their perception, as they no longer showed a preference for the salient object. In this vein, the contribution of pointing behavior to later language development is recognized, and recent studies suggest that this relationship might depend on pointing to interaction-relevant entities (Białek et al., 2018). Again, we note a possible parallel to theories of adult attention that remains unexplored. Selection-for-action approaches assume that attentional selection mainly serves action control (e.g., Neumann, 1987). These approaches make no particular reference to interactions, but interactions and actions may share the crucial property that action capacities are limited. Further research needs to clarify whether saliency emerging from interactions accords with selection-for-action approaches in adults.
Taken together, the interactionist approach differs from the others in claiming that infants need not only perceptual or social saliency but also active experience in an interaction, such as achieving a relevant function or a joint goal. On this view, a referent does not become relevant merely by being salient in a joint attention or visual attention scenario. It becomes relevant by being “charged” with rich meaning from actions that have taken place within a social interaction (Nomikou et al., 2016) and that draw on what the child knows (Bloom et al., 1993).
We surveyed how the term saliency is used in three theoretical approaches to the study of word learning. Systematizing its uses, we found that they differ in how they prioritize the non-social versus the social information recruited as a cue in word learning (see Figure 1).
Figure 1. State of research on saliency (emerging from perceptual biases, social cues, or interactions) in the context of early word learning.
Within the mechanistic approaches, the difference lies in the internalization of saliency. In the associationist view, bottom-up mechanisms drive infants’ attention toward attention-grabbing properties. The approach using constraints and principles, by contrast, suggests that a perceiver’s top-down attention is based on past knowledge. It is the recruitment of prior experience in perceiving an object that makes the object salient or, to put it in more appropriate terms, relevant.
This review also addressed two approaches that consider saliency to be socially driven. Both assume that infants are sensitive to social cues from early on and that word learning is inherently social. One line of social-pragmatic studies attributes saliency to infants’ social perception and their responsiveness to cues such as eye gaze, pointing, or ostensive labeling. A second, interactionist line of studies claims that the infant needs the pragmatic frame of a joint action to recognize an object’s relevance to the joint goal of the interaction. In this latter view, a word–object link becomes charged with interwoven words and actions. Accordingly, infants must additionally be embedded in joint actions and achieve a joint goal in order to see the purpose of a word and to engage their memory processes.
Taken together, it becomes clear that a unified theory is still lacking. As a first step, we propose that future research focus more on relevance than on saliency, because relevance better encompasses the (social) context in which infants attend to objects.
EW and KR developed the framework and wrote the mini-review. This paper benefits from the expertise of IS in visual saliency for information processing.
This work was made possible within the Beethoven project EASE, funded by the NCN-DFG collaboration (RO 2443/5-1).
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
1. Note that the notion of pure bottom-up capture, which depends only on stimulus features and not on current goals, is a contested issue in experimental psychology; the conditions under which irrelevant salient stimuli capture attention are still not fully understood.
Baldwin, D. A., Markman, E. M., Bill, B., Desjardins, R. N., Irwin, J. M., and Tidball, G. (1996). Infants’ reliance on a social criterion for establishing word-object relations. Child Dev. 67, 3135–3153. doi: 10.1111/j.1467-8624.1996.tb01906.x
Białek, A., Białecka-Pikul, M., Filip, A., and Broda, M. (2018). Relevance matters: eighteen month-olds’ use of relevant informative pointing as a predictor of two-year-olds’ language abilities/La relevancia es importante. El uso de gestos deícticos relevantes e informativos por parte de niños de año y medio como factor predictor de las capacidades lingüísticas a los dos años. Infanc. Aprendizaje 41, 1–28.
Butterworth, G., and Jarrett, N. (1991). What minds have in common is space: spatial mechanisms serving joint visual attention in infancy. Br. J. Dev. Psychol. 9, 55–72. doi: 10.1111/j.2044-835x.1991.tb00862.x
Carpenter, M., Akhtar, N., and Tomasello, M. (1998). Fourteen- through 18-month-old infants differentially imitate intentional and accidental actions. Infant Behav. Dev. 21, 315–330. doi: 10.1016/s0163-6383(98)90009-1
Deák, G. O., Flom, R. A., and Pick, A. D. (2000). Effects of gesture and target on 12- and 18-month-olds’ joint visual attention to objects in front of or behind them. Dev. Psychol. 36, 511–523.
Flom, R., Deák, G. O., Phill, C. G., and Pick, A. D. (2004). Nine-month-olds’ shared visual attention as a function of gesture and object location. Infant Behav. Dev. 27, 181–194. doi: 10.1016/s0163-6383(04)00017-7
Folk, C. L., Remington, R. W., and Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. J. Exp. Psychol. Hum. Percept. Perform. 18, 1030–1044.
Gogate, L. J., and Bahrick, L. E. (1998). Intersensory redundancy facilitates learning of arbitrary relations between vowel sounds and objects in seven-month-old infants. J. Exp. Child Psychol. 69, 133–149. doi: 10.1006/jecp.1998.2438
Golinkoff, R. M., Hirsh-Pasek, K., Bailey, L. M., and Wenger, N. R. (1992). Young children and adults use lexical principles to learn new nouns. Dev. Psychol. 28, 99–108.
Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., and Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Front. Psychol. 6:1167. doi: 10.3389/fpsyg.2015.01167
Jaffe, J., Beebe, B., Feldstein, S., Crown, C. L., Jasnow, M. D., Rochat, P., et al. (2001). Rhythms of dialogue in infancy: coordinated timing in development. Monogr. Soc. Res. Child Dev. 66, i–viii, 1–149.
Johnson, M. H., Dziurawiec, S., Ellis, H., and Morton, J. (1991). Newborns’ preferential tracking of face-like stimuli and its subsequent decline. Cognition 40, 1–19. doi: 10.1016/0010-0277(91)90045-6
Kaldy, Z., and Blaser, E. (2009). How to compare apples and oranges: infants’ object identification tested with equally salient shape, luminance, and color changes. Infancy 14, 222–243. doi: 10.1080/15250000802707088
Merriman, W. E., Marazita, J., and Jarvis, L. (1995). “Children’s disposition to map new words onto new referents,” in Beyond Names for Things: Young Children’s Acquisition of Verbs, eds M. Tomasello and W. E. Merriman (Hillsdale, NJ: Erlbaum), 147–183.
Pascalis, O., de Schonen, S., Morton, J., Deruelle, C., and Fabre-Grenet, M. (1995). Mother’s face recognition by neonates: a replication and an extension. Infant Behav. Dev. 18, 79–85. doi: 10.1016/0163-6383(95)90009-8
Pegg, J. E., Werker, J. F., and McLeod, P. J. (1992). Preference for infant-directed over adult-directed speech: evidence from 7-week-old infants. Infant Behav. Dev. 15, 325–345. doi: 10.1016/0163-6383(92)80003-d
Pruden, S. M., Hirsh-Pasek, K., Golinkoff, R. M., and Hennon, E. A. (2006). The birth of words: ten-month-olds learn words through perceptual salience. Child Dev. 77, 266–280. doi: 10.1111/j.1467-8624.2006.00869.x
Rohlfing, K. J., Wrede, B., Vollmer, A. L., and Oudeyer, P. Y. (2016). An alternative to mapping a word onto a concept in language acquisition: pragmatic frames. Front. Psychol. 7:470. doi: 10.3389/fpsyg.2016.00470
Samuelson, L. K., and Smith, L. B. (1998). Memory and attention make smart word learning: an alternative account of Akhtar, Carpenter, and Tomasello. Child Dev. 69, 94–104. doi: 10.1111/j.1467-8624.1998.tb06136.x
Smith, L. B. (1995). “Self-organizing processes in learning to learn words: development is not induction,” in Basic and Applied Perspectives on Learning, Cognition, and Development. The Minnesota Symposia on Child Psychology, Vol. 28, ed. C. A. Nelson (Mahwah, NJ: Lawrence Erlbaum Associates), 1–32.
Spencer, J. P., Perone, S., Smith, L. B., and Samuelson, L. K. (2011). Learning words in space and time: probing the mechanisms behind the suspicious-coincidence effect. Psychol. Sci. 22, 1049–1057. doi: 10.1177/0956797611413934
Trueswell, J. C., Lin, Y., Armstrong, B. III, Cartmill, E. A., Goldin-Meadow, S., and Gleitman, L. R. (2016). Perceiving referential intent: dynamics of reference in natural parent–child interactions. Cognition 148, 117–135. doi: 10.1016/j.cognition.2015.11.002
Werker, J. F., Cohen, L. B., Lloyd, V. L., Casasola, M., and Stager, C. L. (1998). Acquisition of word–object associations by 14-month-old infants. Dev. Psychol. 34, 1289–1309.
Wildt, E., and Rohlfing, K. J. (2018). “What type of interactional presentation helps 10-month-olds to overcome the saliency-effect during referent selection?” in Poster Session Presented at the International Congress of Infant Studies (ICIS), Philadelphia, PA.
Keywords: word learning, saliency, relevance, joint attention, joint action
Citation: Wildt E, Rohlfing KJ and Scharlau I (2019) The Role of Saliency in Learning First Words. Front. Psychol. 10:1150. doi: 10.3389/fpsyg.2019.01150
Received: 28 November 2018; Accepted: 01 May 2019;
Published: 15 May 2019.
Edited by: Emily Mather, University of Hull, United Kingdom
Reviewed by: Carmel Houston-Price, University of Reading, United Kingdom
Yang Zhang, University of Minnesota Twin Cities, United States
Copyright © 2019 Wildt, Rohlfing and Scharlau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Eugenia Wildt