The Role of Words in Cognitive Tasks: What, When, and How?

Robinson, Christopher W; Best, Catherine A; Deng, Wei (Sophia); Sloutsky, Vladimir

doi:10.3389/fpsyg.2012.00095

HYPOTHESIS AND THEORY article

Front. Psychol., 11 April 2012

Sec. Human Developmental Psychology

Volume 3 - 2012 | https://doi.org/10.3389/fpsyg.2012.00095

This article is part of the Research Topic: Grounding Word Learning in Attention, Perception, and Memory

Christopher W. Robinson

Catherine A. Best

Wei (Sophia) Deng

Vladimir M. Sloutsky*

Department of Psychology and Center for Cognitive Science, The Ohio State University, Columbus, OH, USA

The current review focuses on how exposure to linguistic input, and count nouns in particular, affects performance on various cognitive tasks, including individuation, categorization and category learning, and inductive inference. We review two theoretical accounts of the effects of words. Proponents of one account argue that words have top-down effects on cognitive tasks, and, as such, function as supervisory signals. Proponents of the other account suggest that early in development, words, just like any other perceptual feature, are first and foremost part of the stimulus input and influence cognitive tasks in a bottom-up, non-supervisory fashion. We then review evidence supporting each account. We conclude that, although much research is needed, there is a large body of evidence indicating that words start out like other perceptual features and become supervisory signals in the course of development.

Word learning is a critically important task in early development and is a necessary step in language acquisition. Furthermore, it is often argued that language affects cognition. The reported effects range from imposing category boundaries on sensory continua to affecting the range of concepts that people acquire (Whorf, 1956; Gentner and Goldin-Meadow, 2003; Gleitman and Papafragou, 2005). For example, exposure to words may affect the way people track individual objects (Xu, 2002), learn and use categories (Yamauchi and Markman, 1998, 2000; Sloutsky and Fisher, 2004; Fulkerson and Waxman, 2007; Lupyan et al., 2007; Plunkett et al., 2008; Ferry et al., 2010), and make inductive inferences (Gelman and Markman, 1986; Graham et al., 2004; Sloutsky and Fisher, 2004; Keates and Graham, 2008).

The current review begins by focusing on several theoretical considerations of how word learning, and exposure to linguistic input more specifically, may affect performance on various cognitive tasks. While many of the reviewed findings may generalize to different word classes (e.g., verbs, adjectives, etc.), the current paper primarily focuses on how count nouns and non-linguistic sounds affect performance on a variety of tasks. We then discuss empirical findings examining the necessary components underlying word learning (i.e., processing of arbitrary auditory–visual pairings) and explain how these low-level cross-modal interactions can account for some of the effects of words on a variety of cognitive tasks including categorization, individuation, and induction.

Theoretical Developments: The Role of Words in Early Cognition

Although there is little disagreement as to whether words affect cognition, the nature of these effects, their developmental time course, and underlying mechanisms remain unknown. Some researchers believe that from early in development, words are names of objects and categories, and words act as invitations to form categories (Balaban and Waxman, 1997; Xu, 2002; Waxman and Booth, 2003). At the computational level (Marr, 1982), this account assumes that words function as supervisory signals that direct and guide learning. Thus, if two discriminable items share the same count noun (e.g., both are called “a dax”), the word serves as a top-down signal to the infant or child that these items are equivalent in some way (cf. Gliga et al., 2010).

A second possibility is that early in development, words, just like any other perceptual feature, are first and foremost part of the stimulus input, and they influence cognition in a bottom-up, non-supervisory fashion (Sloutsky and Lo, 1999; Sloutsky and Fisher, 2004; Colunga and Smith, 2005; Plunkett et al., 2008). The account that words are perceptual features affecting processing of visual input has yielded research findings where under some conditions, linguistic input facilitates learning (Samuelson and Smith, 1999; Colunga and Smith, 2005; Plunkett et al., 2008); whereas, under other conditions it hinders learning (Robinson and Sloutsky, 2007a; Best et al., 2011a,b). When reviewing the bottom-up account, we primarily focus on situations when auditory input hinders learning. As we explain elsewhere (Sloutsky and Fisher, 2005), this “auditory dominance” is a consequence of words being features.

A third account posits that words begin as part of the stimulus input, but eventually become supervisory signals (Casasola and Bhagwat, 2007; Casasola, 2008; Sloutsky, 2010; Plunkett, 2011). As children develop they may learn that count nouns have high predictive power in determining a category, and as a result, words may become supervisory signals. While there is little disagreement among theorists that words eventually become top-down supervisory signals (cf. Yamauchi and Markman, 1998; Casasola and Bhagwat, 2007; Lupyan et al., 2007; Casasola, 2008; Sloutsky, 2010; Plunkett, 2011), the precise developmental time course remains unknown.

Distinguishing among these notions of how words might influence cognition is of critical importance for understanding cognitive development. If early in development count nouns function as supervisory signals, then top-down effects may play a significant role in early cognitive development. Perhaps the most important implication is that at both the cognitive and the neural levels, lower-level processes (such as discrimination and generalization) may be subject to top-down control. Also, given that supervision (i.e., guided learning or feedback) results in the ability to learn substantially more complex categories than unsupervised learning (Rumelhart, 1989), if words are supervisory signals for infants and young children, our construal of what infants can and cannot learn early in development will be subject to substantial revision. Alternatively, if words become supervisory signals during the course of development, then top-down control need not exhibit an early onset and such control could be itself the product of development. Understanding how the role of words changes in the course of development has been a focus of research in our lab.

Empirical Support for Words as Features

Preliminary evidence suggesting that effects of words may stem from low-level effects, as opposed to supervisory signals, comes from studies examining children’s similarity judgments and inductive generalizations (Sloutsky and Lo, 1999; Sloutsky et al., 2001; Sloutsky and Fisher, 2004). In these tasks, participants were presented with a target item and two test items. In the similarity judgment tasks, participants selected the test item they deemed more perceptually similar to the target item. In the induction tasks, participants were told an unobservable property about the target, and they had to choose which test item also shared the same unobservable property. Furthermore, in no label conditions, items were labeled with generic phrases (e.g., “Look at this one. This one has yellow blood inside of its body.”), and in label conditions, items were labeled with count nouns (e.g., “This is a Guga. This Guga has yellow blood inside of its body.”).

Several findings from Sloutsky and colleagues’ similarity judgment tasks are relevant for exploring whether words are supervisory signals or features. First, Sloutsky and Lo (1999) demonstrated that words influenced the perceived similarity in 5- to 7-year-olds and in 7- to 9-year-olds but not in 9- to 11-year-olds (i.e., young children reported that two items looked more similar to each other if they shared a common label). These results suggest that some of the effects of words on higher-order tasks may stem from words increasing the perceptual similarity of compared items, as opposed to words being top-down supervisory signals. Second, consistent with this claim, words had comparable effects on similarity judgment and induction tasks at 4–5 years of age; whereas, words affected only inductive generalization (but not similarity judgment) in 11- to 12-year-olds (Sloutsky et al., 2001). This finding suggests that words were playing a different role for younger and older children. Finally, while words affected induction in a qualitative manner in older participants (i.e., older children relied almost exclusively on the word when making inductive inferences), young children took into account both words and appearance information, with words having greater attentional weight than the visual information. These results present evidence that the role of words changes with development, but they raise a number of important questions. Why do words often have greater attentional weights than visual information, and why do words and other types of auditory input have different effects on cognitive tasks early in development?

Mechanism Underlying Attention to Words: Auditory Dominance

When words (or other auditory stimuli) accompany visual stimuli, one has to process information presented cross-modally. In some situations, cross-modal presentation of information can facilitate processing. For example, when the same information can be expressed in multiple sensory modalities (e.g., rhythm, tempo), 5-month-old infants are more likely to learn this information when it is presented cross-modally than when the same information is presented unimodally (Bahrick and Lickliter, 2000; see Bahrick et al., 2004 for a review). At the same time, words and their corresponding referents are arbitrarily paired together within the world, and there are many situations when cross-modal presentation hinders processing of arbitrary, auditory–visual pairings. For example, studies examining auditory dominance in infants and young children show that infants and children often have difficulty discriminating visual stimuli when these are paired with an auditory stimulus (Sloutsky and Napolitano, 2003; Robinson and Sloutsky, 2004, 2007b, 2010a; Sloutsky and Robinson, 2008). This finding is noteworthy because these participants ably discriminate the same visual stimuli presented in silence (Sloutsky and Napolitano, 2003; Robinson and Sloutsky, 2004, 2007b, 2010a; Sloutsky and Robinson, 2008). Furthermore, cross-modal presentation does not appear to attenuate auditory processing (Sloutsky and Napolitano, 2003; Robinson and Sloutsky, 2010a). We refer to this asymmetric cost (i.e., cross-modal presentation attenuates visual but not auditory processing) as auditory dominance. We believe that auditory dominance underlies many of the effects of words on cognitive tasks. But what is the mechanism of auditory dominance?

In an attempt to elucidate the mechanism underlying auditory dominance, we have formulated a set of theoretical considerations pertaining to the allocation of attention in the course of cross-modal processing (for a more extensive review see Robinson and Sloutsky, 2010b). The overall idea is that attentional resources are finite, which results in modalities competing for attention. When multisensory stimuli are presented simultaneously, the stimulus that is faster to engage attention wins the competition. During the later stages of processing, infants and children begin processing the details of the stimuli; however, due to the selective nature of sustained attention (see Berg and Richards, 1997; Richards, 2001, 2005, for reviews), it is likely that processing of stimuli in the “winning” modality will be enhanced whereas processing of stimuli in the “losing” modality will be attenuated. At some point in the course of processing, the winning modality will release attention, thus, allowing for more attentional resources to be deployed to the losing modality. Given these assumptions: (a) auditory dominance effects should be more pronounced early in the course of processing because the auditory modality has not had a chance to release attention (Robinson and Sloutsky, 2008), (b) auditory dominance should be more pronounced in younger populations due to slower overall processing speeds (Robinson and Sloutsky, 2004), and (c) auditory stimuli that are slow to release attention (e.g., complex or novel) should exert stronger interference than auditory stimuli that are fast to release attention (Robinson and Sloutsky, 2007b, 2010a; Sloutsky and Robinson, 2008).
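The three predictions above can be illustrated with a toy calculation. This is a minimal sketch and not a model taken from the reviewed studies: the function `visual_processing_time`, its parameters, and all numbers are hypothetical. It assumes, for illustration only, that the auditory stimulus always wins the initial competition and that visual details are processed only after the auditory modality releases attention.

```python
def visual_processing_time(trial_ms, auditory_release_ms, speed=1.0):
    """Toy estimate of the time (ms) left for processing visual details.

    Illustrative assumptions: the auditory stimulus engages attention first,
    and visual details are processed only after the auditory modality
    releases attention. `speed` scales overall processing speed, so a
    slower processor (speed < 1) takes longer to release attention.
    """
    release_ms = auditory_release_ms / speed
    return max(0.0, trial_ms - release_ms)


# (a) Early in processing (short exposure), little time remains after release.
brief = visual_processing_time(trial_ms=1000, auditory_release_ms=800)
extended = visual_processing_time(trial_ms=3000, auditory_release_ms=800)

# (b) Slower overall processing (e.g., younger participants) delays release.
younger = visual_processing_time(trial_ms=3000, auditory_release_ms=800, speed=0.5)
older = visual_processing_time(trial_ms=3000, auditory_release_ms=800, speed=1.0)

# (c) Novel or complex sounds hold attention longer than familiar ones.
novel = visual_processing_time(trial_ms=3000, auditory_release_ms=1500)
familiar = visual_processing_time(trial_ms=3000, auditory_release_ms=400)

print(brief, extended, younger, older, novel, familiar)
```

In each pair, the first value is smaller than the second, which corresponds to stronger predicted interference with visual processing under that condition.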

Several factors may also give auditory input a “leg-up” on visual input. First, auditory stimuli are often transient; whereas, visual stimuli are often presented for longer durations. Thus, it may be adaptive to first allocate attention to stimuli that will quickly disappear. Second, almost all naturally occurring auditory stimuli are dynamic in nature as they change in pitch and amplitude across time. While some visual stimuli can also be dynamic, many visual stimuli are static for extended periods of time. Third, auditory stimuli are often processed faster than visual stimuli in adults (Green and von Gierke, 1984), and due to early maturation of the auditory system, this difference may be even more pronounced early in development.

Empirical Support for Auditory Dominance

Initial evidence for auditory dominance comes from a series of experiments examining change detection in 4-year-olds and adults (Sloutsky and Napolitano, 2003). For example, participants in some of the reported experiments were presented with an auditory–visual target item (AUD_TargetVIS_Target) followed by a test item. Participants had to respond same if the test item had the same auditory and visual components as the target and respond different if the auditory and/or visual component changed at test in any of three combinations (e.g., AUD_TargetVIS_New, AUD_NewVIS_Target, AUD_NewVIS_New). The auditory components consisted of unfamiliar non-linguistic sounds and the visual components consisted of unfamiliar images (e.g., landscapes). If participants encoded both auditory and visual stimuli, then they should correctly accept target items as the same, while correctly rejecting items that had either new visual or new auditory components as different. Adults were accurate across all three test trial types, suggesting they encoded both the auditory and visual components. In contrast, children failed to report a difference when only the visual input changed at test (AUD_TargetVIS_New). At the same time, children ably discriminated the visual stimuli when presented unimodally in a separate experiment; therefore, it was concluded that the auditory input overshadowed the corresponding visual input in children. This finding has been replicated using a variety of tasks, including familiarization and habituation procedures in infants (Robinson and Sloutsky, 2004, 2010a; Sloutsky and Robinson, 2008).
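The logic of this change-detection design can be laid out by enumerating the test items. The sketch below is illustrative only: the function names are hypothetical, and `overshadowed_response` is an idealized, all-or-none version of auditory overshadowing in which only the auditory component is encoded.

```python
from itertools import product


def correct_response(aud, vis):
    """Correct answer: 'same' only if both components match the target."""
    return "same" if (aud, vis) == ("target", "target") else "different"


def overshadowed_response(aud, vis):
    """Predicted answer if only the auditory component were encoded
    (an idealized, all-or-none assumption made for illustration)."""
    return "same" if aud == "target" else "different"


items = list(product(["target", "new"], repeat=2))
for aud, vis in items:
    print(f"AUD_{aud} VIS_{vis}:",
          correct_response(aud, vis), "vs", overshadowed_response(aud, vis))

# Items on which an overshadowed participant would answer incorrectly.
mismatches = [(a, v) for a, v in items
              if correct_response(a, v) != overshadowed_response(a, v)]
print(mismatches)
```

Under this assumption the two response patterns diverge only when the auditory component is unchanged and the visual component is new, which is the trial type on which children failed to report a difference.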

Differential Effects of Words and Sounds on Visual Processing

These interference effects can be mediated by the type and familiarity of the auditory stimulus (Robinson and Sloutsky, 2007b, 2010a; Sloutsky and Robinson, 2008). In fact, Napolitano and Sloutsky (2004) demonstrated that it is even possible to reverse dominance effects (i.e., achieve visual dominance) in 4-year-olds by using familiar visual stimuli and unfamiliar auditory stimuli. While such a reversal was not found in infants (Robinson and Sloutsky, 2010a), there are reasons to believe that it is possible to attenuate modality dominance effects. Recall that the underlying idea of auditory dominance is that auditory stimuli are often faster to engage attention than visual stimuli, and processing of the details of a visual stimulus does not begin until the auditory modality releases attention. Thus, auditory stimuli that are processed quickly and are fast to release attention (e.g., simple and/or familiar) should exert less cross-modal interference than auditory stimuli that are slow to release attention (e.g., complex and/or novel).

This hypothesis was tested in several experiments by (1) manipulating the type of auditory stimulus or (2) by pre-familiarizing 8- to 16-month-old infants to the auditory stimulus before pairing it with a corresponding visual stimulus (Robinson and Sloutsky, 2007b, 2010a; Sloutsky and Robinson, 2008). In the pre-familiarization conditions, infants were first exposed to the auditory stimuli (presented unimodally), then given a short break, and then they were tested in the experiment proper where we measured discrimination of the auditory and/or visual input. Discrimination of the visual stimuli in the different auditory conditions (e.g., unfamiliar sounds or pre-familiarized sounds) was compared to discrimination of the same visual stimuli in a silent baseline.

The results from these studies demonstrate that words and sounds have different effects on visual processing (Robinson and Sloutsky, 2007b; Sloutsky and Robinson, 2008). For example, using a continuous familiarization procedure (cf. Fantz, 1964; Roder et al., 2000), 14-month-old infants required less familiarization to reliably discriminate visual images when the images were paired with words (i.e., “Look at the dax”) than when the same images were paired with unfamiliar sounds (Robinson and Sloutsky, 2007b). However, comparisons to a unimodal visual baseline showed that this effect resulted from unfamiliar sounds attenuating visual processing, as opposed to words facilitating visual processing. Furthermore, pre-familiarization experiments corroborate this finding: when infants were pre-familiarized to the unfamiliar non-speech sounds prior to the experiment proper, interference effects disappeared and words and non-speech sounds had comparable effects on visual processing (Robinson and Sloutsky, 2007b). These findings are consistent with the proposed mechanism underlying auditory dominance, and they have direct implications for a variety of higher-order tasks that rely on processing of auditory and visual information.

Effects of Words on Cognitive Tasks: Individuation, Categorization, and Induction

Effects of words have been found in many cognitive tasks; however, we only focus on individuation, categorization, and induction. In what follows we consider the direction of these effects (i.e., whether the target task is facilitated or hindered by the presence of words) and their robustness (i.e., whether effects of words exceed those of non-speech sounds and those of the silent baseline).

Individuation

There have been a number of reports suggesting that infants are more likely to track individual objects across time and space when these individuals are associated with unique words. For example, in Xu (2002), 8-month-old infants were familiarized to a duck and a ball appearing and disappearing from behind an occluder. At test, the occluder dropped revealing either one object (unexpected event) or two objects (expected event). When each object was accompanied by a unique word (i.e., “a duck” and “a ball”), infants expected two objects to be behind the occluder. However, when the duck and ball were paired with two unfamiliar sounds or one word, infants did not appear to make this assumption.

To determine if hearing unique words facilitated individuation, Robinson and Sloutsky (2008) conducted two experiments that familiarized 8- and 14-month-old infants to either: (a) a duck and ball appearing and disappearing from behind an occluder or (b) two novel creatures appearing and disappearing in a basket. The visual stimuli were either paired with two unique words (e.g., “a duck” and “a ball”), two unique non-linguistic sounds, or the images were presented in silence. Across both reported experiments, hearing non-linguistic sounds attenuated learning compared to the silent condition. When infants were given ample time to process the images, words had no effect compared to the silent condition (Experiment 1); however, under shortened stimulus presentations, both words and sounds attenuated learning compared to the silent condition (Experiment 2). These findings are consistent with auditory dominance and further suggest that differential effects of words and sounds stem from unfamiliar sounds attenuating visual processing more than count nouns, as opposed to count nouns serving as a top-down supervisory signal which facilitates learning.

Categorization

Categorization is a fundamental skill for learning, so it is not unexpected that categorization abilities emerge early in infancy, with 3-month-old infants forming perceptual categories (e.g., Quinn et al., 1993). Yet, as infants learn to organize their visual world, they are also learning words that map onto objects within their surroundings. Given how often adults use words in everyday speech to convey category meaning, it is reasonable to assume that word learning and category learning are related. The nature of this relation is disputed, however. According to one account, even very young infants have some understanding that words (but not other types of auditory input) denote categories, with words facilitating categorization by highlighting common features (Ferry et al., 2010). According to other accounts, words are part of the input (i.e., features), which either facilitate or interfere with learning (Samuelson and Smith, 1999; Colunga and Smith, 2005; Robinson and Sloutsky, 2007a; Plunkett et al., 2008; Best et al., 2010, 2011b).

There have been reports that words (specifically count nouns) facilitate infants’ categorization. However, similar to individuation research, most of the studies pointing to facilitative effects of words on categorization did not include a silent baseline. Instead, these studies compared infants’ and children’s performance in a word condition with that in a non-linguistic sound or no label condition. For example, to estimate effects of words on category learning, Waxman and colleagues (e.g., Balaban and Waxman, 1997; Fulkerson and Waxman, 2007; Ferry et al., 2010) compared 3-, 6-, and 9-month-old infants’ learning of a category in a word condition where the same word (e.g., “a rabbit” or “do you see the toma”) was associated with members of the to-be-learned category to infants’ learning of a category in a sound condition where the same non-linguistic sound was associated with category members (but see Best et al., 2010 and Waxman and Braun, 2005, where effects of common words were also compared to unique words). As in auditory dominance research, words and sounds often had different effects, with only infants in the word conditions learning the categories (Balaban and Waxman, 1997; Fulkerson and Waxman, 2007; Robinson and Sloutsky, 2007a).

However, using the non-linguistic sound (or unique label) condition as a control makes sense only if it is established that sounds or unique words facilitate category responding, and it is important to determine if effects of words exceed the general facilitative effects of sounds. While work by Roberts and Jacob (1991) is often cited as evidence of general auditory facilitation effects, Roberts (1995) demonstrated that sounds and labels facilitate learning only when the presentation of auditory input was contingent on infants’ looking (e.g., infants did not hear words or sounds when looking away from visual stimuli). This suggests that the contingency rather than the presence of the auditory stimulus may be driving the facilitative effect. Furthermore, auditory dominance research demonstrates that both sounds and words can interfere with visual processing (Robinson and Sloutsky, 2007b; Sloutsky and Robinson, 2008). Therefore, without a unimodal visual baseline, it is unclear whether differences between two auditory conditions (e.g., words vs. sounds), if found, stem from words facilitating categorization, from sounds interfering with categorization, or from both (see Robinson and Sloutsky, 2007a for additional discussion).

The studies that have directly compared effects of words on categorization to a silent condition have yielded mixed results (Roberts and Jacob, 1991; Roberts, 1995; Waxman and Markow, 1995; Fulkerson and Haaf, 2003; Robinson and Sloutsky, 2007a; Plunkett et al., 2008). In Fulkerson and Haaf (2003) and Waxman and Markow (1995), effects of words did not exceed the silent condition when 9-, 12-, and 15-month-old infants were trained and tested on basic-level categories; however, words did appear to facilitate categorization above the silent condition when the categories were more abstract. Both words and sounds can facilitate categorization at 15 months when the presentation of auditory input is contingent on infants’ looking; however, neither words nor sounds facilitate categorization when this contingency is broken (Roberts and Cuff, 1989; Roberts and Jacob, 1991; Roberts, 1995). While Plunkett et al. (2008) did not find facilitative effects of words per se, their study demonstrated that words can affect the structure of the learned category: when presented with the same visual stimuli, 10-month-old infants who heard one word (e.g., “Look, dax”) formed one category; whereas, infants who heard two words (“Look, dax” and “Look, rif”) formed two categories.

Research from our lab demonstrates that words either have no effect on categorization or they interfere with categorization. For example, in Robinson and Sloutsky (2007a), 8- and 12-month-olds were familiarized to different exemplars from the same category, and each member of the category was either associated with the same word (e.g., “a cat”), the same non-linguistic sound, or no auditory input was provided (i.e., a silent condition). After familiarization, infants were simultaneously presented with a novel stimulus from the familiarized category and a novel stimulus from a novel category. Categorization was inferred from increased looking to the novel category items compared to the familiarized category items. At both 8 and 12 months of age, infants were more likely to form categories in the silent condition than in the word or sound conditions. While 12-month-olds were more likely to learn categories when items were accompanied by words than non-linguistic sounds, this effect was driven by non-linguistic sounds hindering categorization more than words. These findings are consistent with previous research examining auditory dominance (Robinson and Sloutsky, 2007a; Sloutsky and Robinson, 2008).
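The novelty-preference measure used to infer categorization in this paradigm can be expressed as a simple proportion. The following is a minimal sketch with hypothetical looking times; the function name and the data are illustrative and are not taken from Robinson and Sloutsky (2007a). A score of 0.5 corresponds to chance, and scores reliably above 0.5 are taken as evidence of categorization.

```python
def novelty_preference(look_novel_ms, look_familiar_ms):
    """Proportion of total looking directed at the novel-category item."""
    total = look_novel_ms + look_familiar_ms
    if total == 0:
        raise ValueError("no looking recorded on this trial")
    return look_novel_ms / total


# Hypothetical looking times (ms) for one infant across three test trials:
# (looking to novel-category item, looking to familiar-category item).
trials = [(3200, 2100), (2800, 2500), (3500, 1900)]

scores = [novelty_preference(novel, familiar) for novel, familiar in trials]
mean_score = sum(scores) / len(scores)

CHANCE = 0.5  # equal looking to both items
print(mean_score > CHANCE)
```

In practice, whether a group's scores differ from chance would be assessed statistically across infants rather than from a single infant's mean.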

The hindering effects of words on categorization did not completely disappear with age. We presented 4-year-old children with a category learning task where they had to learn two types of flowers, and we tested categorization at various points in the course of training (Best et al., 2011a). The experiment had a between-subjects design, with participants randomly assigned to one of two experimental conditions (i.e., word or silent). In the word condition, the two types of flowers were labeled during training [e.g., “These flowers are called zibblers (blickets)”]. At test we presented novel flowers and children had to determine if the flowers were zibblers or blickets. In the silent condition, we presented flowers in silence and children had to associate the two types of flowers with two creatures (i.e., creature 1 ate one type of flower and creature 2 ate a different type of flower). At test we presented novel flowers and children had to determine which creature ate that type of flower. The most interesting finding from this study was that hearing words during training hindered category learning compared to when objects were presented in silence, with only children in the silent condition reliably categorizing the novel flowers. This study, in conjunction with Robinson and Sloutsky (2007a), casts doubt on the claim that words are supervisory signals that facilitate category learning.

However, the above mentioned studies only focused on the outcome of learning, not on the process of learning. Thus, one limitation of previous research is that the mechanisms underlying the effects of words on categorization are often inferred by examining infants’ novelty preference at test, rather than directly testing how words affect attention in the course of category learning. We have recently addressed this issue by using an eye tracker to examine infants’ fixations to category-relevant and irrelevant features in the course of learning (Best et al., 2011b). Six- to 8-month-old infants in this study were familiarized to novel images, which were either presented in silence or paired with the same word (e.g., “Look at the feps. Do you see the feps?”). At test, we simultaneously presented a novel item from the familiar category and a novel item from a new category, and categorization was assessed by increased looking to the novel category.

If words facilitate categorization by directing infants’ attention to category-relevant features, then infants who hear the same word paired with different category members should show an increase in looking to category-defining features (in terms of first look or overall looking times). However, infants who heard the same word paired with different exemplars during familiarization did not increase looking to category-relevant features across training, nor did they accumulate more looking to relevant features than infants in a silent condition. In fact, the pattern was in the opposite direction, with infants who heard words during training reliably looking to category-irrelevant features. Furthermore, whereas infants in the silent condition exhibited a reliable novelty preference between 0 and 2000 ms within test trials, infants’ looking at test in the word condition never differed from chance performance. These findings demonstrate that words hindered category learning at 6–8 months of age and cast doubt on the claim that facilitative effects of words stem from words directing infants’ attention to category-relevant features.

In summary, while it is well documented that words and sounds can have different effects on category learning, most of the published findings do not include a silent condition to serve as a control. Thus, it is often unclear if common words are facilitating categorization or if non-linguistic sounds, unique words, or no label phrases are disrupting categorization. We have also demonstrated that words and sounds can have different effects on individuation and category learning (Robinson and Sloutsky, 2007a, 2008); however, consistent with auditory dominance research, this effect stems from non-linguistic sounds hindering categorization more than words. Furthermore, eye tracking data provide no support for the claim that words facilitate categorization by highlighting common features (Best et al., 2011b). While there are reasons to believe that words eventually become supervisory signals that facilitate categorization, the reported studies question whether such a mechanism is at play early in development.

Induction

The studies reviewed so far have indirectly tested whether words are supervisory signals or features by focusing on facilitation and interference effects. However, there are tasks developed for addressing this issue more directly. For example, in Yamauchi and Markman’s (2000) work, adults were presented with two tasks. In the classification task, adults were presented with bugs comprised of multiple features, and they had to determine whether the bug belonged to category 1 or category 2. Thus, participants had to use the features to predict the category label. In the induction task, participants were presented with bugs and corresponding words, and they had to use the words and features to infer a missing feature. If words are simply features, then there should be no difference in performance between the two tasks because participants are making inferences based on the same number of features. However, if words are category markers (i.e., a supervisory signal), then performance in the two tasks should differ because they can rely on the category marker in the induction task but not in the classification task. They found that adults relied almost exclusively on the words in the induction task, suggesting that words are more than features for adults.

Using a similar approach, Deng and Sloutsky (2012) tested whether words are features or category markers in 4- to 5-year-olds and adults. In this experiment, however, we also pitted the words (e.g., “This is a flurp”) against a feature that was more salient than the words. This manipulation was critically important: if words are more than features, then the salience of the competing feature should not matter, and participants should rely on words when performing induction. However, if words are features, then participants should rely on highly salient features when they are pitted against words.
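The logic of this manipulation can be illustrated with a toy sketch. It is not the model or analysis used in the studies above: the function names, the +1/-1 coding of cues, the numeric weights, and the "heavier cue wins" decision rule are all assumptions introduced purely for illustration. On a conflict trial the label points to one category and the salient feature to the other. Under a "words as features" reading the response should flip once the competing feature outweighs the label, whereas under a "words as category markers" reading it should not.

```python
# Toy illustration only: the weights and the decision rule are hypothetical,
# not parameters or models taken from the studies discussed in the text.

def feature_account(label_vote, salient_vote, w_label, w_salient):
    """Label is one weighted cue among others; the heavier cue wins.

    Votes are +1 (cue points to category A) or -1 (cue points to category B).
    """
    evidence = w_label * label_vote + w_salient * salient_vote
    return "A" if evidence > 0 else "B"


def marker_account(label_vote, salient_vote, w_label, w_salient):
    """Label denotes the category itself; feature salience is ignored."""
    return "A" if label_vote > 0 else "B"


# Conflict trial: the label points to A, the salient feature points to B.
label_vote, salient_vote = +1, -1

# Competing feature less salient (0.5) vs. more salient (2.0) than the label (1.0).
for w_salient in (0.5, 2.0):
    f = feature_account(label_vote, salient_vote, w_label=1.0, w_salient=w_salient)
    m = marker_account(label_vote, salient_vote, w_label=1.0, w_salient=w_salient)
    print(f"w_salient={w_salient}: feature account -> {f}, marker account -> {m}")
```

In this sketch the two accounts agree when the competing feature is weak and diverge only when it is more salient than the label, which is why the salience manipulation is diagnostic.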

The results indicate that young children exhibited overwhelming reliance on a highly salient feature and not on the category label, whether the label was novel (Experiment 1) or familiar (Experiment 2). Thus, in contrast to adults in Yamauchi and Markman (1998), children responded similarly across both tasks, suggesting that words are features for young children. The results are more complicated in adults: some adults exhibited consistent reliance on the salient feature and some relied on the label. Taken together, these results indicate that for young children (and for some adults) category labels may function as features, as little reliance on the category label was observed when it was pitted against the highly salient feature. At the same time, for some adults labels may be category markers. These results cast doubt on the view that labels start out as supervisory signals, suggesting instead that early in development labels are features, but they may become supervisory signals in the course of development.

The notion that words are features also predicts that, like other perceptual features, the phonological similarity of the word should affect children’s inductions. To test this hypothesis, Sloutsky and Fisher (2012) presented 5-year-olds and adults with lexical extension and property induction tasks, and they systematically manipulated the phonological similarity of the word. In the lexical extension task, a computer presented a target object and corresponding word (e.g., “gama”) and participants had to determine which of four test items would be called a “guma.” Children but not adults extended the phonologically similar word to a perceptually similar object. In the induction task, participants were presented with a target object and two test objects and they had to determine which test item shared an unobservable property with the target. Consistent with previous findings, words contributed in a quantitative manner for young children. Children were more likely to rely on the word to make inductions when the target and one of the test items shared the exact same word (e.g., gama and gama) than when the target and test items were labeled with phonologically similar words (e.g., gama and guma). More importantly, children were also more likely to rely on phonologically similar (yet highly discriminable) words than on phonologically dissimilar words (e.g., satu and kipa). Thus, as with other perceptual features, the effects of words on induction are influenced by the perceptual similarity of the words. While these findings are consistent with the “words as features” account, they pose a challenge for the idea that words are top-down supervisory signals that denote category membership.
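The graded ordering described above (identical labels, then similar labels, then dissimilar labels) can be made concrete with a rough sketch. It rests on a loud assumption: normalized edit distance over the written forms is used here only as a crude stand-in for phonological similarity, and it is not the measure used by Sloutsky and Fisher (2012). The function names are likewise hypothetical.

```python
# Rough sketch: orthographic edit distance as a crude proxy for phonological
# similarity. This proxy is an assumption made for illustration only.

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical labels, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Label pairs taken from the examples in the text.
for target, test in [("gama", "gama"), ("gama", "guma"), ("satu", "kipa")]:
    print(f"{target} vs. {test}: similarity = {label_similarity(target, test):.2f}")
# prints 1.00, 0.75, and 0.00 respectively
```

If labels contribute to induction in proportion to such a similarity value, reliance on the label would be expected to fall off gradually across the three pairs rather than in an all-or-none fashion.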

Future Directions

The studies reviewed in this paper point to clear developmental differences in the role of words in a variety of cognitive tasks and in the processing of arbitrary auditory–visual pairings more generally. For example, there is a gradual decrease in relying on words in similarity judgment tasks between 4 and 12 years of age and an increase in relying on words in induction and categorization tasks (Sloutsky and Lo, 1999; Sloutsky et al., 2001). Five-year-olds and adults use words differently when making inductive generalizations and lexical extensions (Deng and Sloutsky, 2012; Sloutsky and Fisher, 2012), and there are considerable differences in 4-year-olds’ and adults’ processing of arbitrary auditory–visual pairings (Sloutsky and Napolitano, 2003; Robinson and Sloutsky, 2004). However, drawing strong conclusions about the developmental trajectory early in development is difficult. This difficulty stems primarily from using different methodologies within infant and child populations. Given the findings by Deng and Sloutsky (2012) and Sloutsky and Fisher (2012), it seems reasonable to posit that young infants are also treating words as features; however, to fully capture the developmental trajectory, future research will need to test infants and children using identical procedures.

While the current review primarily focused on research within our lab, it will be important to reconcile the current infant findings with previous research. When effects of words are assessed by comparing performance in a word condition (e.g., the same word denotes all members of the category) to non-linguistic sounds, varying labels, and no labels, it is typically found that words have a different effect than other types of input (Balaban and Waxman, 1997; Xu, 2002; Fulkerson and Waxman, 2007; Ferry et al., 2010). However, when effects of words are assessed by making comparisons to a unimodal visual baseline, the findings are mixed, with some evidence suggesting that words interfere with learning (Roberts and Jacob, 1991; Roberts, 1995; Waxman and Markow, 1995; Fulkerson and Haaf, 2003; Robinson and Sloutsky, 2007a; Plunkett et al., 2008). While the former comparisons clearly demonstrate that different types of auditory input have different effects, it is difficult to determine what is driving this effect without a silent baseline (e.g., are non-linguistic sounds interfering with learning or are words facilitating learning?). If words act as top-down supervisory signals that facilitate categorization by directing attention to the category-relevant features, then this should be evident in eye tracking data: infants who hear words should accumulate more looking to the relevant features and should be more likely to learn categories than infants who do not hear words. While this hypothesis requires further consideration, our preliminary eye tracking study found no support for the claim that labels facilitate categorization (Best et al., 2011b).

Finally, while additional research is needed to examine the developmental trajectory, it will be important to determine what mechanisms best account for the developmental pattern. According to Sloutsky (2010), several components may underlie children’s abilities to use labels as top-down supervisory signals. First, because many words are presented auditorily and many objects are presented visually, children need to be efficient at processing arbitrary auditory–visual pairings. Second, because many causal or central features that define a category are implicit in nature and not directly observable in the input (e.g., essences, causal relations, etc.), children have to learn how to ignore the perceptual details of a stimulus and attend to these less obvious features. It seems reasonable to posit that this ability requires top-down selective attention and the development of the prefrontal cortex (Diamond and Goldman-Rakic, 1989; Bunge and Zelazo, 2006; Davidson et al., 2006), and therefore may not be present early in development.

Conclusion

In summary, associating words with objects and more abstract categories is a necessary step in language acquisition, and it is well established that words affect performance on a variety of cognitive tasks. The research in our lab suggests that words function as features, and effects of words on cognitive tasks are initially grounded in the dynamics of cross-modal processing. This proposal suggests that words functioning as features may either hinder task performance (i.e., when the task requires processing of details of visual input, such as when items are presented sequentially) or facilitate performance (i.e., when reliance primarily on words may be sufficient for performing the task, such as in match-to-sample and other tasks where stimuli are presented simultaneously). We reviewed a substantial body of evidence supporting this proposal and indicating that words start out as features affecting infants’ and children’s performance on cognitive tasks in a bottom-up manner, but that they may become supervisory signals in the process of development. Much additional research is needed to understand why, how, and when this transformation takes place.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Writing of this manuscript is supported by National Science Foundation Grant BCS-0720135, by Institute of Education Sciences, US Department of Education Grant R305H050125, and by National Institutes of Health Grant R01HD056105 to Vladimir M. Sloutsky.

References

Bahrick, L. E., and Lickliter, R. (2000). Intersensory redundancy guides attentional selectivity and perceptual learning in infancy. Dev. Psychol. 36, 190–201.

Bahrick, L. E., Lickliter, R., and Flom, R. (2004). Intersensory redundancy guides the development of selective attention, perception, and cognition in infancy. Curr. Dir. Psychol. Sci. 13, 99–102.