Gestural and symbolic development among apes and humans: support for a multimodal theory of language evolution

What are the implications of similarities and differences in the gestural and symbolic development of apes and humans? This focused review uses as a starting point our recent study that provided evidence that gesture supported the symbolic development of a chimpanzee, a bonobo, and a human child reared in language-enriched environments at comparable stages of communicative development. These three species constitute a complete clade, species possessing a common immediate ancestor. Communicative behaviors observed among all species in a clade are likely to have been present in the common ancestor. Similarities in the form and function of many gestures produced by the chimpanzee, bonobo, and human child suggest that shared non-verbal skills may underlie shared symbolic capacities. Indeed, an ontogenetic sequence from gesture to symbol was present across the clade but more pronounced in child than ape. Multimodal expressions of communicative intent (e.g., vocalization plus persistence or eye-contact) were normative for the child, but less common for the apes. These findings suggest that increasing multimodal expression of communicative intent may have supported the emergence of language among the ancestors of humans. Therefore, this focused review includes new studies, published since our 2013 article, that support a multimodal theory of language evolution.

A challenge facing researchers investigating the evolution of language is shifting definitions of language itself. Language is often defined circularly as whatever aspects of communication are uniquely human. While psychologists and biologists often define language as culturally specific shared systems of meaning, linguists often define language in terms of innate cognitive structures. A cross-disciplinary dialogue aimed at bridging these definitional chasms set forth two domains of language: a faculty of language in the broad sense (FLB) that consists of skills shared with other species and a faculty of language in the narrow sense (FLN) that is specific to humans (Fitch et al., 2005). They concluded that what constitutes FLN (including whether or not any aspects of language are unique to humans) and how language evolved remain open questions.

THE GESTURAL THEORY OF THE EVOLUTION OF LANGUAGE
The idea that language evolved from a primarily gestural mode of communication is centuries old (Condillac, 1746;Hewes, 1973;Corballis, 2002). Evidence for a gestural origin of language includes the relatively early emergence of bipedalism (freeing up the hands to gesture), the possibility that modern hand configurations arose much earlier than the modern vocal tract, the variability and flexibility of non-human primates' gestures relative to their vocal communication, shared neural substrates for manual action and language, and enhanced laterality of communicative gestures relative to other types of action (Greenfield, 1991, 2008;Lieberman, 1998;Rizzolatti and Arbib, 1998;Corballis, 2002;Hopkins et al., 2005;Molnar-Szakacs et al., 2006;Armstrong and Wilcox, 2007). Despite their flexibility, ape gestures often have specific meanings. Gestures produced by captive orangutans and captive and wild gorillas and chimpanzees are often used intentionally to serve consistent functions, as indexed by persistence on the part of the gesturer and the recipient's response (Genty et al., 2009;Cartmill and Byrne, 2010;Roberts et al., 2012a).
Although non-human primates use and respond to calls referring to specific aspects of the environment, these calls may be limited in their referential flexibility and the degree to which they are under voluntary control. Non-human primates may have a limited capacity for vocal imitation relative to humans, aquatic mammals and birds (Fitch, 2000). Indeed, early attempts to teach apes speech revealed pronounced constraints in their capacity to say words. In the early 1900s, Furness (1916) attempted to teach two juvenile chimpanzees and orangutans to speak. After extensive training, one of the orangutans learned to say "cup" and clung to him while she said "papa." Over the next 60 years, attempts to teach apes speech by co-rearing them with humans resulted in productive vocabularies of only a few utterances despite years of training (Kellogg and Kellogg, 1933;Hayes, 1951;Laidler, 1980). Despite the apes' great difficulty uttering words, researchers believed that the apes' comprehension of language far exceeded their ability to produce it. In 1966, Gardner and Gardner (1969) adopted a wild-captured, 10-month-old female chimpanzee, Washoe. They speculated that difficulties teaching apes to speak might be attributable to physiological rather than cognitive limitations. Relative to other apes, the human larynx is lower and the tongue is more flexible; this allows humans to produce a wider range of discriminable sounds than other apes (Lieberman, 1968;Fitch, 2000). Washoe's caregivers used a simple form of American Sign Language (ASL) to sign about daily activities. Like human children, Washoe learned language by engaging in shared routines with caregivers (Gardner and Gardner, 1969;Acredolo and Goodwyn, 1988). Although caregivers taught her to imitate (or copy) gestures, immediate imitation was more effective for shaping signs than for introducing novel signs.
Frontiers in Psychology www.frontiersin.org October 2014 | Volume 5 | Article 1228 | 2
While some signs were acquired via delayed imitation, ontogenetic ritualization (or shaping of previously uncommunicative behaviors into communicative signals through repeat interactions between individuals) probably played a greater role in her symbolic development than imitation. Indeed, Washoe's early signs were often shaped from spontaneous behaviors, such as pounding on the door to request that it be opened.
As Washoe's productive vocabulary increased to approximately 150 signs by 8 years, more names, pragmatic functions and meaningful word combinations emerged. When introduced into a colony of chimpanzees who were encouraged to sign together, Washoe employed techniques that humans had used to teach signs to her (molding, modeling, and signing on the learner's body) to teach signs to Loulis, an infant chimpanzee whom she adopted (Fouts and Fouts, 1989). This contrasts with reports that wild juvenile apes learn by observing adults who do not directly teach them (Matsuzawa, 2002).
Subsequent research demonstrated that additional chimpanzees, gorillas, and orangutans could acquire substantial productive vocabularies when gestural or lexigram (arbitrary visual symbols) rather than oral communication systems were used (Hayes, 1951;Gardner and Gardner, 1969;Patterson, 1978;Savage-Rumbaugh, 1987;Patterson et al., 1988;Fouts and Fouts, 1989;Miles, 1990;Bonvillian and Patterson, 1993). These apes grouped symbols into categories when their referent was absent, combined symbols into meaningful novel statements, exhibited word order preferences, and demonstrated the range of pragmatic functions exhibited by children including using language to deceive, to indicate novelty, to communicate declaratively, and to refer to non-present entities (Greenfield and Smith, 1976;Patterson, 1978;Gardner and Gardner, 1969;Savage-Rumbaugh, 1984, 1990;Savage-Rumbaugh, 1987;Miles, 1990;Lyn, 2007;Lyn et al., 2011a,c). While it is important to note that one can group symbols into categories based on perceptual similarity without an understanding of their symbolic functions (e.g., Vygotsky, 1962), errors during vocabulary tests provided additional evidence that language-enculturated apes represent lexigrams (which typically bear no perceptual similarity to their referent) in terms of semantic (but not syntactic) categories (Gardner and Gardner, 1969;Lyn, 2007).
Thus, ape language research provided strong support for the gestural theory of language by demonstrating that apes could communicate symbolically with visual rather than auditory communication systems. However, human language may involve integration of information across different modalities (Hickok and Poeppel, 2007). Indeed, recent evidence suggests that multimodal communication, or the ability to integrate information across different modalities, may have supported the evolution of language (Taglialatela et al., 2011).

A CALL FOR A MULTIMODAL THEORY OF THE EVOLUTION OF LANGUAGE
Although evidence suggests that non-human primates use gestures more flexibly than sounds (Pollick and de Waal, 2007), this evidence may be distorted by a tendency to focus on monkeys in the wild when assessing vocalizations and on apes in captivity when assessing gestures (Slocombe et al., 2011). Recent findings suggest that chimpanzee alarm calls are intentional: chimpanzees direct these calls more frequently to allies, persist until their allies are out of danger, and monitor responses (Schel et al., 2013). Both monkeys and apes are more likely to call around certain individuals, suggesting volitional control of vocalizations (Arbib et al., 2008). In addition, non-human primates' comprehension of vocal cues may be substantially more flexible than production. Apes exhibit referential flexibility by using and/or responding to sequences of calls as if they convey more information than the individual calls of which they are composed (Clarke et al., 2006;Clay and Zuberbühler, 2011). Although teaching apes to produce human sounds, which their vocal tracts are not designed to do, was ineffective, teaching apes to modify their own vocalizations into a set of sounds that a computer can discern as associated with different referents has not been systematically attempted but is theoretically possible.
While previous research suggested that chimpanzee gestures activate neural regions associated with language, this association was found only among the apes in the study who vocalized while gesturing (Taglialatela et al., 2011). Recent research has shown that chimpanzees in captivity may learn attention-getting calls socially and can be operantly conditioned to use calls that were not previously in their repertoire (Taglialatela et al., 2012;Russell et al., 2013). These findings have prompted calls for a multimodal theory of language evolution wherein language may have evolved from an integrated system of vocal, facial, and gestural signals (Slocombe et al., 2011).
Foundational to a multimodal theory of language acquisition is the neural integration of hands and mouth. In 1991, Greenfield proposed just such an integration. She noted that, in the first two years of human life, a common neural substrate (roughly Broca's area) underlies the organization of elements in both speech and manual action. The theory posited that an evolutionary homolog of the neural substrate for language production and manual action has provided a foundation for the evolution of human language. Support for this proposition comes from the discovery of a Broca's area homolog and related neural circuits in contemporary primates. More recent research has revealed an even tighter connection: semantic information conveyed by speech and gesture is processed in the same brain areas (reviewed in Levinson and Holler, 2014). Although the study that is the focus of this review was designed to evaluate the gestural theory of language evolution, our findings revealed unexpected support for the multimodal theory of language evolution. In retrospect, a multimodal theory of language evolution is more logical than a purely gestural theory because the human brain is essentially a multimodal device that converts different modalities of input (e.g., light, sound, touch) into an interpretable framework in order to respond to it. Primates and many other animals integrate information across multiple sensory modalities (Pack and Herman, 1995;Wallace et al., 1996). Given that the input that primates respond to is multimodal, it stands to reason that they would produce multimodal output. Therefore, the capacity for multimodal communication was likely present in the common ancestor of apes and humans.
It seems only logical that any language-based system that evolved would make use of the input and output modalities of the biological systems already in place and honed by millions of years of evolution. A key and unexpected finding of the study that is the focus of this review is that the frequency of multimodal communication may be greater in humans relative to apes when they are raised in similar rearing conditions. This suggests that multimodal communication was enhanced as humans diverged from their sibling species and that this enhancement may have supported the evolution of language.

A DEVELOPMENTAL INVESTIGATION OF THE GESTURAL THEORY OF LANGUAGE EVOLUTION
The study that is the focus of this review was designed to evaluate the gestural theory of language evolution by examining if gesture supports symbolic development across the clade. While phylogeny (evolution) does not repeat ontogeny (development), later stages of development cannot evolve without the ontogenetic foundation of earlier stages already being present; and, thus, later stages of ontogenetic development also tend to evolve later (Parker and McKinney, 1999). Thus, a longitudinal comparison of gestural and symbolic development across the clade provides essential information about the role of gesture in language evolution.

EVIDENCE THAT GESTURE SUPPORTS LANGUAGE DEVELOPMENT AND LANGUAGE EVOLUTION
Gesture is a precursor to symbolic communication for both humans and language-enculturated apes (Brakke and Savage-Rumbaugh, 1996;Iverson and Goldin-Meadow, 2005). Deictic gestures (context-dependent indication) help infants understand links between symbols and referents, allow infants to refer to objects before mastering their names, are more common than words early in development, and predict linguistic development in typical and atypical humans across cultures (Bates et al., 1975;Caselli and Volterra, 1990;Goldin-Meadow and Morford, 1994;Iverson and Goldin-Meadow, 2005;Rowe et al., 2008;Colonnesi et al., 2010;Iverson, 2010;Özçalışkan and Goldin-Meadow, 2010;Goldin-Meadow and Alibali, 2013).
Representational gestures (referring to a specific referent irrespective of context) may also predict linguistic development (Acredolo and Goodwyn, 1985). However, iconic gestures, or non-arbitrary representational gestures wherein the form or motion of an action or object is depicted, typically emerge approximately 6 months after first verbs (Özçalışkan et al., 2013).
Infants initially use both gestures and words referentially, but the use of words to represent and gestures to indicate increases with development (Capirci and Volterra, 2008). While words typically become more common than gestures within the second year of life, gestures remain important as part of two-element combinations (Greenfield and Smith, 1976;Iverson et al., 1994;Capirci and Volterra, 2008). Infants refer to objects in the gestural modality before they refer to them with speech and gesture-symbol combinations precede the development of symbol-symbol combinations in human children and language-enculturated apes (Greenfield and Smith, 1976;Iverson and Goldin-Meadow, 2005;Greenfield et al., 2008).
A primary aim of our study was to determine if this pattern of gestures preceding words was apparent across the clade. A secondary aim was to compare the gestures of humans and apes at comparable stages of development, as no previous study had used video data to compare the gestures of a bonobo, chimpanzee, and human at comparable stages of communicative development.

SIMILARITIES AND DIFFERENCES BETWEEN APE AND HUMAN GESTURES: A DEVELOPMENTAL PERSPECTIVE
Unlike the gestures of human children, the majority of ape gestures are dyadic, intended to draw another's attention to oneself, rather than triadic, intended to draw another's attention to an external entity (Pika, 2008). Also unlike those of most humans, ape gestures are frequently imperative (requests) and less frequently declarative (attempts to share experience with another; Lyn et al., 2011a). Nonetheless, declarative deictic gestures have been observed throughout the clade: in humans (e.g., Greenfield and Smith, 1976); chimpanzees, both language-trained (Lyn et al., 2011a) and in the wild; and bonobos, both language-trained (Lyn et al., 2011a) and in the wild (Vea and Sabater-Pi, 1998;Leavens, 2004).
As is the case for some (but not all) of the iconic gestures of human children (Acredolo and Goodwyn, 1985), iconic gestures can emerge spontaneously in bonobos and gorillas. Savage-Rumbaugh et al. (1977) were pioneers in describing spontaneous iconic gestures used by bonobos. Among a group of captive bonobos, they identified seven different iconic gestures used to position two partners for copulatory bouts, such as moving one's hand and forearm across one's body to induce one's partner to turn around. Gestural combinations were also used to induce a partner to turn around, e.g., first touching a part of the partner's body, then moving one's hand and forearm across one's own body. A gesture that is both iconic and indexical, the "directed scratch gesture," has also been observed among captive bonobos (Pika and  2006), while a beckoning gesture combining iconic and indexical (or deictic) elements has been described for bonobos in a Congo sanctuary. Iconic gestures have also been reported among captive and language-enculturated gorillas (Liebal and Pika, 2005;Tanner et al., 2006). Another similarity between apes and humans is the ability of chimpanzees, bonobos, and 1-year-old children to use gestures to communicate about absent and displaced objects (Liszkowski et al., 2009;Lyn et al., 2014;Roberts et al., 2014). Because of flawed methodology, displaced reference was not observed among apes in earlier research (Liszkowski et al., 2009). Thus, displaced reference, a hallmark of language, is visible across the clade in the gestural modality.
How gestures develop remains controversial. One thing that is clear is that social interaction is important for the gestural development of both human infants (Acredolo and Goodwyn, 1988) and chimpanzees. While imitation, or social learning through observation, is believed to play a strong role in human gestural development (e.g., Caselli and Volterra, 1990), some human gestures may emerge independently of imitation (Acredolo and Goodwyn, 1988). Imitation may contribute far less to the gestural development of apes relative to humans, as variability in gestures is typically similar within and across groups of apes (Liebal and Call, 2012). Group-specific gestures have been observed, albeit infrequently, among captive and wild apes, suggesting some gestural imitation (Pika and Liebal, 2006;Hobaiter and Byrne, 2011).
Ontogenetic ritualization, or conventionalization, may underlie much of ape gestural development (Halina et al., 2013), although this is not without controversy. This is a process of mutual anticipation in which particular social behaviors come to function as intentional communicative signals within a dyad. For example, Plooij (1979) documented how a mother's act of raising a baby chimpanzee's arm to groom him was subsequently transformed into a conventionalized gesture performed by the baby: he raised his arm to ask mother to groom him. This same type of conventionalization is described by Bruner (1975) for young children wherein a mother initiates an interactive routine and the infant signals for the recurrence of the routine by performing some portion of it. However, evidence suggests that ontogenetic ritualization contributes very little to human gestural development (Marentette and Nicoladis, 2012).
Although the specific forms they take may be culturally determined, the general form of many gestures may be inherited from the common ancestor of humans and apes. Researchers documented 66 distinct gestures over 266 days of observation of wild chimpanzees (Hobaiter and Byrne, 2011). Almost half of the gestures had been documented in other ape species. A study classifying the gestures of wild chimpanzees based on their form revealed similar gestures to those documented in other captive and wild ape populations, as well as among humans such as "arm beckoning" and "hand clapping" (Roberts et al., 2012b).
Despite the lack of prior research directly comparing the gestures of apes and humans, purported differences in the gestures of apes and humans have been used to support assertions of human uniqueness. For instance, Seidenberg and Petitto (1987) stated that apes do not produce iconic gestures, which may indicate that they cannot mentally represent entities, and that finger pointing requires an ability to draw attention to specific aspects of the environment that apes may lack. Similarly, Povinelli et al. (2003) asserted that apes do not really point because they lack an understanding of other minds. Tomasello (2007) also asserted that both declarative pointing and human language derive from a uniquely human capacity to understand others' minds and share experiences. He stated that apes differ from human children in that they do not share experience for its own sake, as evinced by the absence of showing gestures among apes.
However, substantial evidence contradicts the assertion that pointing is unique to humans. Index finger pointing has been observed among captive apes (Leavens and Hopkins, 1999), language-enculturated apes (Miles, 1990;Brakke and Savage-Rumbaugh, 1996;Krause and Fouts, 1997;Tanner et al., 2006) and apes in the wild (Inoue-Nakamura and Matsuzawa, 1997;Vea and Sabater-Pi, 1998). Early immersion in language-enriched environments may facilitate finger pointing among apes (Call and Tomasello, 1994). Finger pointing is infrequently observed among captive or wild apes who have not been language-enculturated, but reaching is commonly demonstrated by captive and wild apes (e.g., Leavens and Hopkins, 1999;Roberts et al., 2012b). Functional pointing without index finger extension has also been observed; chimpanzees in the wild use "directed scratches" to request grooming of specific body parts (Pika and Mitani, 2006). "Directed scratches" combine an indexical and an iconic element. Reaching and such "directed scratches" generally are used to request, whereas finger pointing often signifies indication. These findings suggest that ape gestures are more frequently imperative than the gestures of humans and that early exposure may increase declarative gesturing among apes, a point confirmed by other research (see below).
Similarly, it has been suggested that comprehension of declarative pointing is unique to humans among members of the clade consisting of bonobos, chimpanzees, and humans (Moll and Tomasello, 2007). This suggestion has been surprising, particularly given other species' (such as dogs and dolphins) ability to follow pointing gestures (e.g., Miklosi and Soproni, 2006;Pack and Herman, 2007). However, several studies have pointed to methodological differences as the largest driver of differences between apes and other animals in comprehension of declarative pointing (Mulcahy and Call, 2009;Lyn, 2010;Mulcahy and Hedge, 2012). Meta-analyses showed that when apes were tested with more distant object referents (as most other species had been) their comprehension of pointing was similar to that documented among other species (Mulcahy and Hedge, 2012). Comprehension of declarative gesturing also seems to be supported by language-enriched environments, with apes from language-enriched environments outperforming captive apes on distant and near pointing tasks.
Although infrequently observed, showing gestures have also been reported for a language-enculturated gorilla (Bonvillian and Patterson, 1999). Showing, a clearly declarative gesture, precedes pointing for humans (Bates et al., 1975), but emerged after pointing for the gorilla, whose pointing behaviors were often interpreted as requests for a caregiver to perform an action. The reduced frequency of declarative gestures produced early in development by the gorilla relative to human children suggests that pointing may serve a more imperative function for apes than humans. Indeed, greater relative frequency of imperative vs. declarative gestures was previously documented among the participants in the study that is the focus of this review (e.g., Lyn et al., 2011a).

CLADISTIC ANALYSIS OF GESTURES: KEY FINDINGS AND IMPLICATIONS FOR LANGUAGE EVOLUTION
We adapted a study by Iverson and Goldin-Meadow (2005) to determine whether gestures support symbolic development across the clade consisting of bonobos, chimpanzees, and humans. Approximately an hour a month of video data of a language-enculturated bonobo (Panbanisha) and chimpanzee (Panpanzee) between 12 and 26 months of age was compared to videos of a human child (GN) between 11 and 18 months of age. Thus, approximately 14 and 15 h of video footage were coded for the bonobo and chimpanzee, respectively, compared to 8 h of video footage of the child. Panbanisha and Panpanzee were raised together from soon after birth in a language-enriched environment wherein they were encouraged to communicate with lexigrams and gestures while engaging in mutually meaningful routines with caregivers (Brakke and Savage-Rumbaugh, 1996). Thus, the types of communicative input received by the apes and the human child were quite similar in that communication occurred during meaningful routines for all three species. However, input was not systematically controlled for and could have influenced our findings. Across 4 years, virtually all evidence of the apes' symbol production and comprehension was entered into a database at the end of the day. Actions preceded communicative gestures, primarily requests, for both apes. Gestures preceded communicative lexigram use for Panpanzee and co-emerged with communicative lexigram use for Panbanisha. Both apes exhibited comprehension and use of lexigrams across contexts indicating that lexigrams were not simply associative stimuli for them. Deferred imitation of prior dialogue provided a foundation for their emerging symbol combinations, as it did for two human children (Gillespie-Lynch et al., 2011). Combinations became increasingly independent of prior input among humans and apes with development.
In support of the gestural theory of language evolution, our study revealed pronounced similarities in the form and function of gestures produced by Panbanisha, Panpanzee, and the child (see Gillespie-Lynch et al., 2013). Increasing reliance on symbols relative to gestures was apparent with development, regardless of species. However, symbols became more frequent than gestures over the course of the study for the human but not the apes. Although all three species exhibited the predicted pattern of objects and events being more likely to be referred to via gesture before speech than the reverse, this pattern was only statistically significant for the human child, likely because she referred to more objects across modalities than did the apes.
However, findings also supported the multimodal theory of language evolution. Communicative intent, or evidence that a gesture was emitted or a symbol was used in order to influence another, is central to the definition of symbolic communication (e.g., Savage-Rumbaugh et al., 1986). All three species exhibited the same markers of communicative intent: eye gaze, vocalization, and persistence. Thus, these markers of communicative intent were likely present in our common ancestor. However, the child more frequently accompanied a gesture with more than one marker of communicative intent than the apes did. The species difference was particularly striking for multimodal communications consisting of gesture plus vocalization. Given that intentionality is a central aspect of symbolic communication, these differences in frequency suggest that increases in multimodal communication, particularly the combination of gesture plus vocalization, may have supported the evolution of language.
Findings also provided gestural evidence that ape communication is more instrumental than human communication. While all three species exhibited finger pointing, the human child pointed more and reached less (138 points vs. 151 reaches) than the bonobo (11 points vs. 271 reaches) and chimpanzee (17 points vs. 358 reaches). The human child was the only one to produce a number of gestures not exhibited by the other species including a declarative gesture, "show," and an iconic gesture, "open." Similarities in types of gestures and the developmental progression from gesture to symbol across the clade when all three species were raised in language-enriched environments suggest that the common ancestor of chimpanzees, bonobos, and humans had the capacity to learn to use a range of gestures, including finger-pointing, and that gestures likely supported the evolution of language. Differences in the frequency of communicative signals and gestures in humans relative to the other apes suggest that the capacity to use multimodal communication (especially gesture plus vocalization), as well as declarative and iconic gestures, was enhanced as humans diverged from their sibling species. Given that the ancestors of humans initially developed language without access to a language-enriched environment, the increasing frequency of multimodal signals and specific gestures in the human line may have supported the emergence of language.
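The point and reach counts reported above can be put on a common scale by expressing points as a share of all point-or-reach gestures for each participant. The short sketch below is only an illustration of that arithmetic: the counts are taken from the text, but the "point share" ratio and the names `counts` and `point_share` are assumptions introduced here and are not statistics or terms reported in the study.

```python
# Gesture counts as reported in the text: (points, reaches).
counts = {
    "human child": (138, 151),
    "bonobo": (11, 271),
    "chimpanzee": (17, 358),
}


def point_share(points, reaches):
    """Fraction of point-or-reach gestures that were points.

    This ratio is an illustrative summary, not a measure used in the study.
    """
    return points / (points + reaches)


shares = {who: point_share(p, r) for who, (p, r) in counts.items()}

for who, share in shares.items():
    print(f"{who}: {share:.1%} of point-or-reach gestures were points")
```

On these counts, points make up just under half of the child's point-or-reach gestures, compared with under 5% for each ape. The video samples differed in length across participants, so a within-participant ratio of this kind is easier to compare than the raw counts.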

IMPLICATIONS FOR THE EVOLUTION OF LANGUAGE: KEY ROLES FOR GESTURE AND FOR MULTIMODAL COMMUNICATION
Our findings, in conjunction with prior research, demonstrate that gesture precedes symbolic development across the clade consisting of bonobos, chimpanzees, and humans. Our findings also suggest that a combination of shared ancestry (Hobaiter and Byrne, 2011;Roberts et al., 2012b), ontogenetic ritualization (Halina et al., 2013), and imitation (Gardner and Gardner, 1969;Caselli and Volterra, 1990;Miles, 1990;Pika and Liebal, 2006;Gillespie-Lynch et al., 2011) supports communicative development across the clade. Differences in the frequency with which humans and apes use multimodal signals to indicate communicative intent, point, use iconic gestures, imitate others, and communicate declaratively suggest that increasing ability to engage in multimodal communication, especially gesture plus vocalization, may have scaffolded the evolution of language (Pika and Liebal, 2006;Lyn et al., 2011a;Gillespie-Lynch et al., 2013).
The child in our study paired gestures with multimodal indices of communicative intent, such as eye gaze, vocalization, and persistence, much more frequently than the apes. A pattern of apes rarely using multimodal communication is evident in research with captive and language-enculturated apes (Savage-Rumbaugh, 1987;Bodamer and Gardner, 2002;Leavens et al., 2010). For example, captive apes typically move into another's visual field before gesturing rather than capturing their attention through vocalization or touch (Liebal et al., 2004). However, wild chimpanzees vocalize while gesturing (Roberts et al., 2012b). Ape vocal, gestural and facial signals have typically been studied separately with different coding systems (Slocombe et al., 2011). Our study was one of the few to integrate coding across modalities, which made it possible to assess the frequency of multimodal communication (Gillespie-Lynch et al., 2013). Multimodal signals may elicit a stronger response than the sum of their parts (Slocombe et al., 2011). In infant development, abstraction is facilitated by multimodal cues, intersensory redundancy supports learning of arbitrary relations between vowels and objects, and multimodal information helps infants imitate speech, learn words (Legerstee, 1990;Gogate and Bahrick, 1998;Frank et al., 2009;Rader and Zukow-Goldring, 2010), and generate visual expectancies (Greenfield, 1972). Indeed, human language may be multimodal. McNeill (1992) asserts that speech-synchronized gestures should be considered part of speech.
Because the conclusion of our study was that language has a multimodal (particularly gestural + vocal) origin in both phylogeny and ontogeny, it is relevant to note how the visual and vocal modalities are integrated early in human language acquisition: Between one and two years of age, a common type of verbal communication is an indicative in which the child points at an entity and says its name (Greenfield and Smith, 1976). In similar fashion, bonobos in a sanctuary in the Republic of Congo coordinate sound and gesture into a unified communication. In such communications, sound and gesture complement each other to make up a complex meaning that is communicated to one or more conspecifics.
Although our findings suggest that gesture supported the evolution of language, they do not support the theory that iconic gestures supported the evolutionary transition from action through gesture to language (Tanner et al., 2006). Although traditional accounts of symbolic development suggest that iconicity supports word learning (e.g., Werner and Kaplan, 1984), iconic gestures emerge developmentally after verbs, perhaps because they require complex representational skills in order to decouple an action schema from an action goal and reinterpret it as something else (Özçalışkan et al., 2013). Acredolo and Goodwyn (1985) interpreted most of a child's early representational gestures as indexical (having shifting meanings depending on context) rather than iconic because they referred to routines wherein the gesture was learned. Many ape gestures that have been classified as iconic would be defined as indexical according to this definition. In infancy, Kanzi produced gestures that were interpreted as iconic (i.e., reach). These gestures likely indexed shared routines. Also contrary to the theory that language evolved from iconic gestural depictions of actions (e.g., Tanner et al., 2006), the early vocabularies of language-enculturated apes often include at least as many names of objects as references to actions (e.g., Brakke and Savage-Rumbaugh, 1996).
Our findings and those of others suggest that indexical gestures (such as pointing) paired with multimodal signals of communicative intent are more likely candidates for the types of gestures that may have supported the evolution of language than iconic gestures.
Increasing use of indexical gestures paired with multimodal indices of communicative intent by the ancestors of humans may have produced more frequent and more compelling opportunities for them to engage in shared attention toward external entities and objects. Engagement in shared attention is associated with enhanced language development among human children (e.g., Tomasello and Farrar, 1986). By encouraging shared attention, indexical gestures may have supported interpersonal environments conducive to the emergence of more symbolic (less context-bound) communication. Indeed, a system of self-reflective indexes may underlie symbolic communication more generally. Terrence Deacon (1998, p. 401) states that "language is made possible by a vast network of inter-referring indices. . . [that] effectively 'point' to one another." Indexical gestures paired with multimodal cues might have elicited stronger responses than gestures paired with fewer indices of communicative intent (e.g., Slocombe et al., 2011) and been more effective in supporting the types of abstractions that are essential to developing symbolic communication (e.g., Legerstee, 1990; Gogate and Bahrick, 1998; Frank et al., 2009; Rader and Zukow-Goldring, 2010).
Over a century of ape language research has yielded strong evidence in support of the importance of gesture in language evolution and revealed that apes can learn many aspects of language previously thought to be uniquely human when reared in language-enriched environments from infancy. Indeed, it remains possible that other species possess a form of language that has not yet been discerned. Early interactive routines, and opportunities to share attention more generally, may be central to the symbolic development of apes, as they are for human children (Acredolo and Goodwyn, 1988). In contrast to the varied ways that apes immersed in language-enriched environments use symbols, less flexible use of symbols is observed when apes are trained to communicate using operant conditioning (Terrace et al., 1979; Miles, 1983; Savage-Rumbaugh, 1987; O'Sullivan and Yeager, 1989). Our recent study added to prior research by providing evidence supporting the importance of gesture in the ontogeny and phylogeny of language, while at the same time suggesting that the gestural theory of language evolution should be expanded to include a multimodal foundation for the evolution of human language.

AUTHOR NOTE
We dedicate this manuscript to the memory of Panpanzee and Panbanisha, both of whom were taken from us too soon. We would like to thank the reviewers and editors of this paper for their careful attention to detail and constructive feedback.