# IS THE LANGUAGE FACULTY NONLINGUISTIC?

EDITED BY: Umberto Ansaldo and N. J. Enfield

PUBLISHED IN: Frontiers in Psychology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-914-3 DOI 10.3389/978-2-88919-914-3


# **IS THE LANGUAGE FACULTY NONLINGUISTIC?**

Topic Editors:

**Umberto Ansaldo,** The University of Hong Kong, China **N. J. Enfield,** The University of Sydney, Australia

"Constructed and Emergent" Image by N. J. Enfield

A line of research in cognitive science over several decades has been dedicated to finding an innate, language-specific cognitive system, a faculty which allows human infants to acquire languages natively without formal instruction and within short periods of time. In recent years, this search has attracted significant controversy in cognitive science generally, and in the language sciences specifically. Some maintain that the search has had meaningful results, though there are different views as to what the findings are: ranging from the view that there is a rich and rather specific set of principles, to the idea that the contents of the language faculty are, while specifiable, in fact extremely minimal. But other researchers rigorously oppose the continuation of this search, arguing that decades of effort have turned up nothing. The fact remains that the proposal of a language-specific faculty was made for a good reason, namely as an attempt to solve the vexing puzzle of language in our species. Much work has been developing to address this, and specifically, to look for ways to characterize the language faculty as an emergent phenomenon; i.e., not as a dedicated, language-specific system, but as the emergent outcome of a set of uniquely human but not specifically linguistic factors, in combination. A number of theoretical and empirical approaches are being developed in order to account for the great puzzles of language: language processing, language usage, language acquisition, the nature of grammar, and language change and diversification. This Research Topic aims at reviewing and exploring these recent developments and establishing bridges between these young frameworks, as well as with the traditions that have come before. The goal of this Research Topic is to focus on current developments in what many regard as a paradigm shift in the language sciences. In this Research Topic, we want to ask: If current explicit proposals for an innate, dedicated faculty for language are not supported by data or arguments, how can we solve the problems that UG was proposed to solve? Is it possible to solve the puzzles of language in our species with an appeal to causes that are not specifically linguistic?

**Citation:** Ansaldo, U., Enfield, N. J., eds. (2016). Is the Language Faculty Nonlinguistic? Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-914-3

# Table of Contents


Vyvyan Evans

### **4. Perspective**

*An Evaluation of Universal Grammar and the Phonological Mind* Daniel L. Everett

# Editorial: Is the Language Faculty Nonlinguistic?

#### Umberto Ansaldo<sup>1</sup> \* and N. J. Enfield<sup>2</sup>

*<sup>1</sup> Department of Linguistics, University of Hong Kong, Hong Kong, China, <sup>2</sup> Department of Linguistics, The University of Sydney, Sydney, NSW, Australia*

Keywords: Universal Grammar, development, innateness, evolution, phonology, syntax, semantics

#### **Editorial on the Research Topic**

#### **Is the Language Faculty Nonlinguistic?**

A line of research in cognitive science over several decades has been dedicated to mapping a hypothetically innate, language-specific cognitive system, a faculty that allows human infants to acquire languages natively without formal instruction and within short periods of time. In recent years, this search has attracted significant controversy in cognitive science generally, and in the language sciences specifically. Some maintain that the search has had meaningful results, though there are different views as to what the findings are: ranging from the view that there is a rich and rather specific set of principles, to the idea that the contents of the language faculty are, while specifiable, in fact extremely minimal. Other researchers rigorously oppose the continuation of this search, arguing that decades of effort have turned up nothing. The fact remains that the proposal of a language-specific faculty was made for a good reason, namely as an attempt to solve the vexing puzzle of language in our species. Much work has been developing to address this, and specifically, to look for ways to characterize the language faculty as an emergent phenomenon; i.e., not as a dedicated, language-specific system, but as the emergent outcome of a set of uniquely human but not specifically linguistic factors, in combination. A number of theoretical and empirical approaches are being developed in order to account for the great puzzles of language: language processing, language usage, language acquisition, the nature of grammar, and language change and diversification. The goal of this Research Topic is to ask whether a paradigm shift has indeed occurred that allows us to conceptualize language not as an innate, dedicated faculty, but as the result of general cognitive abilities adapted for linguistic use.

In the first of three review articles, Dąbrowska reviews the fundamental arguments in support of the Universal Grammar hypothesis. The focus is on the three most powerful arguments, namely universality, convergence, and poverty of stimulus. The author maintains that all three can be proven wrong: languages have been shown to display deep differences of structure; significant variation has been documented in speakers' knowledge of grammar; and grammatical constructions have been proven to be learnable through input. The second review by Christiansen and Chater takes issue with the latest, most minimal proposal for a language faculty (LF): recursion. Through a review and discussion of genetic, non-human primate and neuro-scientific research the authors argue that an innate LF is evolutionarily unlikely. The ability to process recursive structure emerges gradually through adaptation of domain-general sequence learning abilities. The relationship between domain-specificity and linguistic adaptation is the focus of the third review, by Culbertson and Kirby. The authors propose that our linguistic knowledge is best seen as a unique interaction of domain-general capacities with language. This can be illustrated by what they see as a powerful general bias towards simplicity of representation, which manifests itself cross-linguistically through universal tendencies such as compositionality, regularity, harmony, and isomorphism.

#### Edited and reviewed by:

*Manuel Carreiras, Basque Center on Cognition, Brain and Language, Spain*

> \*Correspondence: *Umberto Ansaldo uansaldo@gmail.com*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *17 May 2016* Accepted: *24 May 2016* Published: *10 June 2016*

#### Citation:

*Ansaldo U and Enfield NJ (2016) Editorial: Is the Language Faculty Nonlinguistic? Front. Psychol. 7:861. doi: 10.3389/fpsyg.2016.00861*

In their more theoretical article, Mattos and Hinzen shift the focus of the debate to the acquisition of declarative gestures in pre-verbal children. Even before the onset of one-word expressions, children show the ability to link lexical concepts to gestures. This, the authors argue, can only be explained by a system that is both symbolic and referential, and must be taken as a challenge to the alleged non-linguistic roots of natural pedagogy. In the second article of a more theoretical nature, Adger and Svenonius defend the view that "aspects of our best theories of syntactic phenomena are simply special cases of more general principles. But those more general principles are not established at the moment [...] generative syntax provides a potential way to reach those more general principles." A methodological point made here is that in evaluating domain specificity we need to ensure that we evaluate principles of actual explanatory power. A theoretical point maintains that principles might exist that are language-specialized, i.e., linguistic versions of more general cognitive principles. The third of these more theoretically oriented contributions, by Goldberg, concerns exactly what kind of evidence should be used in support of UG. Goldberg looks at the "subtle and intricate" implicit knowledge of language that speakers seem to possess. Even these cases, the author argues, do not warrant the positing of unlearned syntactic structures, as they can be explained by the functions of the constructions involved. Crucially these are learned, conventionalized, and only require domain-general constraints on perception, attention and memory.

Two original research papers offer strong views against innateness. Archangeli and Pulleyblank present a take on phonology based on the Emergent Grammar Hypothesis. In this view humans are understood to make sense of linguistic data primarily through three non-linguistic abilities: categorial thinking, sensitivity to frequency, and symbolic generalization. In three case studies ranging from English to Bantu and Esimbi, the authors show how diverse language data can be explained by such operational abilities. They propose an emergent basis for not only phonology but possibly morphological structures too. In a second original research paper, Evans approaches human language as a communicative system that must have two fundamental design features: a conceptual and a linguistic system, each of which contributes to meaning construction. Evans argues that both systems operate in a symbiotic relation and are semantic in nature, but the former is evolutionarily older and is the one to which the latter is adapted.

Finally, in a perspective piece, Everett takes issue in particular with the notion of a "phonological mind," or phonological nativism. The proposal, according to the author, suffers from at least two shortcomings. A theoretical problem is that properties invoked in phonological nativism are not successfully explained in evolutionary terms. A methodological problem confuses design features of any given system with innate, rather than acquired, constraints.

We are pleased to present a set of articles that approach our research question—Is the language faculty nonlinguistic?—from a range of angles, and with consideration of multiple stances on the question. Perhaps most importantly, this applies to the very idea of what a language faculty is. The concept can be understood in two distinct ways:


We might call (1) an axiom. Nobody hypothesizes that humans have a capacity for language. Rather, that capacity is the thing to be explained and understood. By contrast, (2) is a hypothesis, i.e., that the relevant mechanisms are not general but are specifically dedicated to language. These two concepts of a language faculty must not be confused. Progress with this central problem in the psychology of language will not only require a constructive approach to dialogue between those of differing views, it will require conceptual clarity at every step.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Ansaldo and Enfield. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **What exactly is Universal Grammar, and has anyone seen it?**

#### *Ewa Dąbrowska\**

*Department of Humanities, Northumbria University, Newcastle upon Tyne, UK*

Universal Grammar (UG) is a suspect concept. There is little agreement on what exactly is in it; and the empirical evidence for it is very weak. This paper critically examines a variety of arguments that have been put forward as evidence for UG, focussing on the three most powerful ones: universality (all human languages share a number of properties), convergence (all language learners converge on the same grammar in spite of the fact that they are exposed to different input), and poverty of the stimulus (children know things about language which they could not have learned from the input available to them). I argue that these arguments are based on premises which are either false or unsubstantiated. Languages differ from each other in profound ways, and there are very few true universals, so the fundamental crosslinguistic fact that needs explaining is diversity, not universality. A number of recent studies have demonstrated the existence of considerable differences in adult native speakers' knowledge of the grammar of their language, including aspects of inflectional morphology, passives, quantifiers, and a variety of more complex constructions, so learners do not in fact converge on the same grammar. Finally, the poverty of the stimulus argument presupposes that children acquire linguistic representations of the kind postulated by generative grammarians; constructionist grammars such as those proposed by Tomasello, Goldberg and others can be learned from the input. We are the only species that has language, so there must be something unique about humans that makes language learning possible. The extent of crosslinguistic diversity and the considerable individual differences in the rate, style and outcome of acquisition suggest that it is more promising to think in terms of a language-making capacity, i.e., a set of domain-general abilities, rather than an innate body of knowledge about the structural properties of the target system.

**Keywords: Universal Grammar, language universals, poverty of the stimulus, convergence, individual differences, language acquisition, construction grammar, linguistic nativism**

# **Introduction**

The Universal Grammar (UG) hypothesis is the idea that human languages, as superficially diverse as they are, share some fundamental similarities, and that these are attributable to innate principles unique to language: that deep down, there is only one human language (Chomsky, 2000a, p. 7). It has generated an enormous amount of interest in linguistics, psychology, philosophy, and other social and cognitive sciences. The predominant approach in linguistics for almost 50 years (Smith, 1999, p. 105, described it as "unassailable"), it is now coming under increasing criticism from a variety of sources. In this paper, I provide a critical assessment of the UG approach. I argue that there is little agreement on what UG actually is; that the arguments for its existence are either irrelevant, circular, or based on false premises; and that there are fundamental problems with the way its proponents address the key questions of linguistic theory.

#### *Edited by:*

*Umberto Ansaldo, University of Hong Kong, China*

#### *Reviewed by:*

*Randi Martin, Rice University, USA Nicholas D. Evans, Australian National University, Australia*

*\*Correspondence:*

*Ewa Dąbrowska, Department of Humanities, Faculty of Arts, Design and Social Sciences, Northumbria University, Newcastle upon Tyne NE1 8ST, UK ewa.dabrowska@northumbria.ac.uk*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 19 February 2015 Accepted: 08 June 2015 Published: 23 June 2015*

#### *Citation:*

*Dąbrowska E (2015) What exactly is Universal Grammar, and has anyone seen it? Front. Psychol. 6:852. doi: 10.3389/fpsyg.2015.00852*

# **What Exactly is UG?**

Universal Grammar is usually defined as the "system of categories, mechanisms and constraints shared by all human languages and considered to be innate" (O'Grady et al., 1996, p. 734; cf. also Chomsky, 1986, p. 3, 2007, p. 1; Pesetsky, 1999, p. 476). These are generally thought to include formal universals (e.g., principles, i.e., general statements which specify the constraints on the grammars of human languages, and parameters, which specify the options for grammatical variation between languages) as well as substantive universals (e.g., lexical categories and features). There is very little agreement, however, on what these actually are.

Chomsky (1986) sees UG as "an intricate and highly constrained structure" (p. 148) consisting of "various subsystems of principles" (p. 146). These include "X-bar theory, binding theory, Case theory, theta theory, bounding theory ... and so forth – each containing certain principles with a limited degree of parametric variation. In addition there are certain overriding principles such as the projection principle, FI (full interpretation), and the principles of licensing... [UG also contains] certain concepts, such as the concept of domain ... and the related notions of c-command and government" (p. 102). However, every major development in the theory since then was accompanied by very substantial revisions to the list of proposed universals. Thus the list of UG principles is quite different when we move to the Barriers period, and radically different in Minimalism (see below).

With respect to parameters, very few scholars have even attempted to give a reasonably comprehensive inventory of what these are. Two rare exceptions are Baker (2001), who discusses 10 parameters, and Fodor and Sakas (2004), who list 13. In both cases, the authors stress that the list is far from complete; but it is interesting to note that only three parameters occur on both lists (Tomasello, 2005; see also Haspelmath, 2007). There is no agreement even on approximately how many parameters there are: thus Pinker (1994, p. 112) claims that there are "only a few"; Fodor (2003, p. 734) suggests that there are "perhaps 20"; according to Roberts and Holmberg (2005, p. 541), the correct figure is probably "in the region of 50–100." However, if, following Kayne (2005), we assume that there is a parameter associated with every functional element, the number of parameters must be considerably larger than this. Cinque and Rizzi (2008), citing Heine and Kuteva's (2002) work on grammaticalization targets, estimate that there are about 400 functional categories. According to Shlonsky (2010, p. 424), even this may be a low estimate. Shlonsky (2010) also suggests that "[e]very feature is endowed with its own switchboard, consisting of half a dozen or so binary options" (p. 425), which implies that there are thousands of parameters.
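A rough order-of-magnitude check makes the last inference explicit. The Python sketch below simply multiplies the figures cited above. It assumes, purely for illustration, one switchboard per functional category and exactly six binary options per switchboard ("half a dozen or so"). The names used are hypothetical, and the result is not a figure given by any of the cited authors.

```python
# Figures restated from the text; treating each functional category as one
# feature with its own switchboard is an illustrative assumption.
FUNCTIONAL_CATEGORIES = 400   # estimate attributed to Cinque and Rizzi (2008)
OPTIONS_PER_SWITCHBOARD = 6   # "half a dozen or so" binary options (Shlonsky, 2010)


def implied_parameter_count(categories=FUNCTIONAL_CATEGORIES,
                            options=OPTIONS_PER_SWITCHBOARD):
    """Count of binary parameters implied by the assumptions above."""
    return categories * options


print(implied_parameter_count())  # 400 * 6 = 2400
```

Even with these deliberately modest inputs the count runs into the thousands, far from the "only a few" or "perhaps 20" suggested elsewhere.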

Things are no better when we consider substantive universals. While most generative linguists agree that the inventory of lexical categories includes N, V, and A, there is little agreement on what the functional categories are (see Newmeyer, 2008; Corbett, 2010; Pullum and Tiede, 2010; Boeckx, 2011). Newmeyer (2008) surveys some of the relevant literature and concludes:

"There is no way to answer this question that would satisfy more than a small number of generativists. It seems fair to say that categories are proposed for a particular language when they appear to be needed for that language, with little thought as to their applicability to the grammar of other languages. My guess is that well over two hundred have been put forward in current work in the principles-and-parameters tradition." (p. 51)

The situation, Newmeyer (2008) observes, is even less clear when it comes to features:

"Even more than for categories, features tend to be proposed ad hoc in the analysis of a particular language when some formal device is needed to distinguish one structure (or operation on a particular structure) from another. As a result, supplying even a provisional list of what the set of universal distinctive syntactic features might be seems quite hopeless." (p. 53)

Thus, some linguists see UG as a very elaborate structure, consisting of a large number of principles, parameters, and categories. At the other extreme, we have the strong minimalist thesis, according to which UG may comprise just the structure-building operation Merge (cf. Chomsky, 2004, 2012; Berwick et al., 2011). It seems that the only point of agreement amongst proponents of UG is that it exists; they do not agree on what it actually contains. What evidence, then, is there for the existence of specifically linguistic innate knowledge? I turn to this question in the next section.

# **Arguments for UG**

Over the years, a number of arguments have been put forward in support of the UG hypothesis. These include the following:


Arguments 1–4 are generally regarded as the most powerful ones; 5–10 are subsidiary in the sense that they only provide support for the idea of the innateness of language in general, rather than the innateness of a specific aspect of linguistic organization, and they are also open to other interpretations. I begin by evaluating the subsidiary arguments, and then move on to the more powerful ones.

## **Species Specificity**

"To say that language is not innate is to say that there is no difference between my granddaughter, a rock and a rabbit. In other words, if you take a rock, a rabbit and my granddaughter and put them in a community where people are talking English, they'll all learn English. If people believe that, then they believe that language is not innate. If they believe that there is a difference between my granddaughter, a rabbit, and a rock, then they believe that language is innate." (Chomsky, 2000b, p. 50)

Clearly, there is something unique about human biological make-up that makes it possible for humans, and only humans, to acquire language. However, *nobody* disputes this, so in the passage quoted above Chomsky is fighting a straw man. The crucial question is whether the relevant knowledge or abilities are language-specific or whether they can be attributed to more general cognitive processes—and this is far from clear.

There are a number of other characteristics which appear to be specific to our species. These include collaboration, cultural learning, the use of complex tools, and, surprisingly, the use of pointing and other means of drawing attention to particular features of the immediate environment, such as holding objects up for others to see.<sup>1</sup> This suggests there may be a more fundamental difference between humans and the rest of the animal kingdom. As Tomasello et al. (2005) put it, "saying that only humans have language is like saying that only humans build skyscrapers, when the fact is that only humans (among primates) build freestanding shelters at all" (p. 690). Tomasello et al. (2005) argue that language is a consequence of the basic human ability to recognize others' communicative intentions and to engage in joint attention, which also underlies other cultural achievements.

The ability to read and share intentions, including communicative intentions—i.e., theory of mind in the broad sense—is important for language for two reasons. First, it enables the language learner to understand what language is *for*: an animal that did not understand that other individuals have beliefs and intentions different from its own would have little use for language. Secondly, it provides the learner with a vital tool for learning language. In order to learn a language, one must acquire a set of form-meaning conventions; and to acquire these, learners must be able to guess at least some of the meanings conveyed by the utterances they hear.

The human ability to read and share intentions may not explain subjacency effects—the existence of other differences between humans and other species does not entail lack of UG, just as species specificity does not entail its existence. The point is that arguments for the innateness of language in a general sense (what Scholz and Pullum, 2002 call "general nativism") do not constitute arguments for the innateness of UG ("linguistic nativism") if UG is taken to be a specific body of linguistic knowledge. In other words, the fact that we are the only species that has language does not entail that we have innate knowledge of subjacency.

## **Ease and Speed of Child Language Acquisition**

It has often been suggested that children acquire grammatical systems of enormous complexity rapidly and effortlessly on the basis of very little evidence, and by "mere exposure," that is to say, without explicit teaching (see, for example, Chomsky, 1962, p. 529, 1976, p. 286, 1999; Guasti, 2002, p. 3). In fact, they get vast amounts of language experience. If we assume that language acquisition begins at age 1 and ends at age 5 and that children are exposed to language for 8 h a day, they get 11680 h of exposure (4 × 365 × 8 = 11680). At 3600 input words per hour (the average number of words heard by the children in the Manchester corpus),<sup>2</sup> this amounts to over 42 million words over 4 years.
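The arithmetic behind this estimate can be reproduced in a few lines. The following Python sketch is illustrative only: the function name `estimate_exposure` and its parameter names are hypothetical, and the default values simply restate the assumptions given above (acquisition from age 1 to age 5, 8 h of exposure per day, 3600 input words per hour).

```python
def estimate_exposure(start_age=1, end_age=5, hours_per_day=8,
                      words_per_hour=3600, days_per_year=365):
    """Return (hours, words) of language exposure under the stated assumptions.

    All names and defaults are illustrative; they restate the figures in the text.
    """
    years = end_age - start_age
    hours = years * days_per_year * hours_per_day
    words = hours * words_per_hour
    return hours, words


hours, words = estimate_exposure()
# 4 * 365 * 8 = 11680 hours; 11680 * 3600 = 42,048,000 words
print(f"{hours} h of exposure, about {words / 1e6:.1f} million words")
```

Varying the defaults shows how sensitive the total is to each assumption: raising `hours_per_day` to 10, for instance, gives 14600 h of exposure over the same period.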

Note that this is a rather conservative estimate: we know that language development begins before age 1 (Jusczyk, 1997; Karmiloff and Karmiloff-Smith, 2001) and continues throughout childhood and adolescence (Nippold, 1998; Berman, 2004, 2007; Nippold et al., 2005; Kaplan and Berman, 2015); moreover, children are exposed to language (through utterances directed to them, utterances directed to other people present, radio and television, and later school, reading and the internet) almost every waking hour of their lives.

Furthermore, we know that "mere exposure" is not enough, as demonstrated by studies of hearing children of deaf parents (Todd and Aitchison, 1980; Sachs et al., 1981; see also Dąbrowska, 2012, for some observations on the effects of the quality of the input). Consider, for example, Jim, one of the children studied by Sachs et al. (1981). In early childhood, Jim had very little contact with hearing adults but watched television quite frequently, and occasionally played with hearing children. His parents used sign language when addressing each other, but did not sign to the children. At age 3;9 (3 years and 9 months), the beginning of the study, Jim had very poor comprehension of spoken language, and severe articulation problems. His utterances were very short, with an MLU (mean length of utterance) of 2.9, which is typical for a child aged about 2;9.

<sup>1</sup>Our nearest relatives, the great apes, do not point and do not understand pointing gestures (Tomasello, 1999; Tomasello et al., 2005). Dogs do understand human pointing, which is believed to be a consequence of domestication (Hare et al., 2002); they do not, however, use pointing gestures themselves. "Pointing" dogs do not intentionally point things out to others: they merely look at the game, enabling the human hunter to follow their line of sight.

<sup>2</sup>The Manchester corpus is described in Theakston et al. (2001) and is available from CHILDES (MacWhinney, 1995).

He had low use of grammatical morphemes, producing them in only 37% of obligatory contexts, while MLU-matched controls supplied them 64–81% of the time; and many of his utterances had clearly deviant syntax (*My mommy my house e play ball; House e chimney my house e my chimney*). And, interestingly, although he was exposed to ASL at home, he did not sign. Jim's spoken language improved rapidly once he began interacting with adults on a one-on-one basis, and by age 6;11, he performed above age level on most measures, showing that he was not language impaired. Thus, although he was exposed to both spoken English (through television and occasional interaction with other children) and to ASL (through observing his parents), Jim did not acquire either language until he was given an opportunity to interact with competent users.

#### **Uniformity**

Some researchers (e.g., Stromswold, 2000; Guasti, 2002) have suggested that children acquire language in a very similar manner, going through the same stages at approximately the same ages, in spite of the fact that they are exposed to different input. Stromswold (2000), for instance, observes that

"Within a given language, the course of language acquisition is remarkably uniform*. . .*. Most children say their first referential words at 9 to 15 months*. . .* and for the next 6-8 months, children typically acquire single words fairly slowly until they have acquired approximately 50 words*. . .*. Once children have acquired 50 words, their vocabularies often increase rapidly*. . .*. At around 18 to 24 months, children learning morphologically impoverished languages such as English begin combining words to form two-word utterances*. . .*. Children acquiring such morphologically impoverished languages gradually begin to use sentences longer than two words; but for several months their speech often lacks phonetically unstressed functional category morphemes such as determiners, auxiliary verbs, and verbal and nominal inflectional endings *. . .*. Gradually, omissions become rarer until children are between three and four years old, at which point the vast majority of English-speaking children's utterances are completely grammatical." (p. 910)

This uniformity, Stromswold argues, indicates that the course of language acquisition is strongly predetermined by an innate program.

There are several points to be made in connection with this argument. First, many of the similarities that Stromswold mentions are not very remarkable: we do not need UG to explain why children typically (though by no means always) produce single-word utterances before they produce word combinations, or why frequent content words are acquired earlier than function words. Secondly, the age ranges she gives (e.g., 9–15 months for first referential words) are quite wide: 6 months is a very long time for an infant. Thirdly, the passage describes *typical* development, as evidenced by qualifiers like "most children," "typically," "often"—so the observations are not true of all children. Finally, by using qualifiers like "within a given language" and limiting her observations to "children acquiring morphologically impoverished languages" Stromswold implicitly concedes the existence of crosslinguistic differences. These are quite substantial: children acquiring different languages have to rely on different cues, and this results in different courses of development (Bavin, 1995; Jusczyk, 1997; Lieven, 1997); and they often acquire "the same" constructions at very different ages. For example, the passive is acquired quite late by English-speaking children—typically (though by no means always—see below) by age 4 or 5, and even later—by about 8—by Hebrew-speaking children (Berman, 1985). However, children learning languages in which the passive is more frequent and/or simpler master this construction much earlier—by about 2;8 in Sesotho (Demuth, 1989) and as early as 2;0 in Inuktitut (Allen and Crago, 1996).

Even within the same language, contrary to Stromswold's claims, there are vast individual differences both in the rate and course of language development (Bates et al., 1988; Richards, 1990; Shore, 1995; Goldfield and Snow, 1997; Peters, 1997; Huttenlocher, 1998). Such differences are most obvious, and easiest to quantify, in lexical development. The comprehension vocabularies of normally developing children of the same age can differ tenfold or more (Benedict, 1979; Goldfield and Reznick, 1990; Bates et al., 1995). There are also very large differences in the relationship between a child's expressive and receptive vocabulary early in development: some children are able to understand over 200 words before they start producing words themselves, while others are able to produce almost all the words they know (Bates et al., 1995). Children also differ with regard to the kinds of words they learn in the initial stages of lexical development. "Referential" children initially focus primarily on object labels (i.e., concrete nouns), while "expressive" children have more varied vocabularies with more adjectives and verbs and some formulaic phrases such as *thank you*, *not now*, *you're kidding*, *don't know* (Nelson, 1973, 1981). Last but not least, there are differences in the pattern of growth. Many children do go through the "vocabulary spurt" that Stromswold alludes to some time between 14 and 22 months, but about a quarter do not, showing a more gradual growth pattern with no spurt (Goldfield and Reznick, 1990).

Grammatical development is also far from uniform. While some children begin to combine words as early as 14 months, others do not do so until after their second birthday (Bates et al., 1995), with correspondingly large differences in MLU later in development—from 1.2 to 5.0 at 30 months (Wells, 1985). Some children learn to inflect words before they combine them into larger structures, while others begin to combine words before they are able to use morphological rules productively (Smoczyńska, 1985, p. 618; Thal et al., 1996). Some children are very cautious learners who avoid producing forms they are not sure about, while others are happy to generalize on the basis of very little evidence. This results in large differences in error rates (Maratsos, 2000). Considerable individual differences have also been found in almost every area of grammatical development where researchers have looked for them, including word order (Clark, 1985), case marking (Dąbrowska and Szczerbiński, 2006), the order of emergence of grammatical morphemes (Brown, 1973), auxiliary verbs (Wells, 1979; Richards, 1990; Jones, 1996), questions (Gullo, 1981; Kuczaj and Maratsos, 1983; de Villiers and de Villiers, 1985), passives (Horgan, 1978; Fox and Grodzinsky, 1998), and multiclause sentences (Huttenlocher et al., 2002).

Children also differ in their learning "styles" (Peters, 1977; Nelson, 1981; Peters and Menn, 1993). "Analytic" (or "referential") children begin with single words, which they articulate reasonably clearly and consistently. "Holistic" (or "expressive") children, on the other hand, begin with larger units which have characteristic stress and intonation patterns, but which are often pronounced indistinctly, and sometimes consist partly or even entirely of filler syllables such as [dadada]. Peters (1977) argues that holistic children attempt to approximate the overall shape of the target utterance while analytic children concentrate on extracting and producing single words. These different starting points determine how the child "breaks into" grammar, and therefore have a substantial effect on the course of language development. Analytic children must learn how to combine words to form more complex units. They start by putting together content words, producing telegraphic utterances such as *there doggie* or *doggie eating*. Later in development they discover that different classes of content words require specific function words and inflections (nouns take determiners, verbs take auxiliaries and tense inflections, etc.), and gradually learn to supply these. Holistic children, in contrast, must segment their rote-learned phrases and determine how each part contributes to the meaning of the whole. Unlike analytic children, they sometimes produce grammatical morphemes very early in acquisition, embedded in larger unanalyzed or only partially analyzed units; or they may use filler syllables as place-holders for grammatical morphemes. As their systems develop, the fillers gradually acquire more phonetic substance and an adult-like distribution, and eventually evolve into function words of the target language (Peters and Menn, 1993; Peters, 2001). Thus, while both groups of children eventually acquire similar grammars, they get there by following different routes.<sup>3</sup>

## **Maturational Effects**

Language acquisition is sometimes claimed to be "highly sensitive to maturational factors" and "surprisingly insensitive to environmental factors" (Fodor, 1983, p. 100; see also Gleitman, 1981; Crain and Lillo-Martin, 1999; Stromswold, 2000), which, these researchers suggest, indicates that the language faculty develops, or matures, according to a biologically determined timetable.

The claim that language acquisition is insensitive to environmental factors is simply incorrect, as demonstrated by the vast amount of research showing that both the amount and quality of input have a considerable effect on acquisition—particularly for vocabulary, but also for grammar (e.g., Huttenlocher, 1998; Huttenlocher et al., 2002; Ginsborg, 2006; Hoff, 2006). There is no doubt that maturation also plays a very important role—but this could be due to the development of the cognitive prerequisites for language (Slobin, 1973, 1985; Tomasello, 2003) rather than the maturation of the language faculty. Likewise, while it is possible that critical/sensitive period effects are due to UG becoming inaccessible at some point in development, they could also arise as a result of older learners' greater reliance on declarative memory (Ullman, 2006), developmental changes in working memory capacity (Newport, 1990), or entrenchment of earlier learning (Elman et al., 1996; MacWhinney, 2008). Thus, again, the existence of maturational effects does not entail the existence of an innate UG: they are, at best, an argument for general innateness, not linguistic innateness.

### **Dissociations between Language and Cognition**

A number of researchers have pointed out that some individuals (e.g., aphasics and children with Specific Language Impairment) show severe language impairment and relatively normal cognition, while others (e.g., individuals with Williams syndrome (WS), or Christopher, the "linguistic savant" studied by Smith and Tsimpli, 1995) show the opposite pattern: impaired cognition but good language skills. The existence of such a double dissociation suggests that language is not part of "general cognition"—in other words, that it depends at least in part on a specialized linguistic "module."

The existence of double dissociations in adults is not particularly informative with regard to the innateness issue, however, since modularization can be a result of development (Paterson et al., 1999; Thomas and Karmiloff-Smith, 2002); hence, the fact that language is relatively separable in adults does not entail innate linguistic knowledge. On the other hand, the developmental double dissociation between specific language impairment (SLI) and WS is, on the face of it, much more convincing. There are, however, several reasons to be cautious in drawing conclusions from the observed dissociations.

First, there is growing evidence suggesting that WS language is impaired, particularly early in development (Karmiloff-Smith et al., 1997; Brock, 2007; Karmiloff-Smith, 2008). Children with WS begin talking much later than typically developing children, and their language develops along a different trajectory. Adolescents and adults with WS show deficits in all areas of language: syntax (Grant et al., 2002), morphology (Thomas et al., 2001), phonology (Grant et al., 1997), lexical knowledge (Temple et al., 2002), and pragmatics (Laws and Bishop, 2004). Secondly, many, perhaps all, SLI children have various non-linguistic impairments (Leonard, 1998; Tallal, 2003; Lum et al., 2010)—making the term *Specific* Language Impairment something of a misnomer. Thus the dissociation is, at best, partial: older WS children and adolescents have relatively good language in spite of a severe cognitive deficit; SLI is a primarily linguistic impairment.

More importantly, it is debatable whether we are really dealing with a double dissociation in this case. Early reports of the double dissociation between language and cognition in Williams and SLI were based on indirect comparisons between the two populations. For instance, Pinker (1999) discusses a study conducted by Bellugi et al. (1994), which compared WS and Down's syndrome adolescents and found that the former have much better language skills, and van der Lely's work on somewhat younger children with SLI (van der Lely, 1997; van der Lely and Ullman, 2001), which found that SLI children perform less well than typically developing children. However, a study which compared the two populations directly (Stojanovik et al., 2004) suggests rather different conclusions. Stojanovik et al. (2004) gave SLI and WS children a battery of verbal and non-verbal tests. As expected, the SLI children performed much better than the WS children on all non-verbal measures. However, there were no differences between the two groups on the language tests—in fact, the SLI children performed slightly better on some measures, although the differences were not statistically significant. Clearly, one cannot argue that language is selectively impaired in SLI and intact in WS if we find that the two populations' performance on the same linguistic tests is indistinguishable.

<sup>3</sup> It should be emphasized that these styles are idealizations. Most children use a mixture of both strategies, although many have a clear preference for one or the other.

To summarize: There is evidence of a partial dissociation in SLI children, who have normal IQ and below-normal language—and, as pointed out earlier, a variety of non-linguistic impairments which may be the underlying cause of their linguistic deficit. There is, however, no evidence for a dissociation in Williams syndrome: WS children's performance on language tests is typically appropriate for their mental age, and well below their chronological age.

## **Neurological Separation**

The fact that certain parts of the brain—specifically, the perisylvian region including Broca's area, Wernicke's area and the angular gyrus—appear to be specialized for language processing has led some researchers (e.g., Pinker, 1995; Stromswold et al., 1996; Stromswold, 2000, p. 925; Musso et al., 2003) to speculate that they may constitute the neural substrate for UG. Intriguing though such proposals are, they face a number of problems. First, the language functions are not strongly localized: many other areas outside the classical "language areas" are active during language processing; and, conversely, the language areas may also be activated during non-linguistic processing (Stowe et al., 2005; Anderson, 2010; see, however, Fedorenko et al., 2011). More importantly, studies of neural development clearly show that the details of local connectivity in the language areas (as well as other areas of the brain) are not genetically specified but emerge as a result of activity and their position in the larger functional networks in the brain (Elman et al., 1996; Müller, 2009; Anderson et al., 2011; Kolb and Gibb, 2011). Because of this, human brains show a high amount of plasticity, and other areas of the brain can take over if the regions normally responsible for language are damaged. In fact, if the damage occurs before the onset of language, most children develop normal conversational skills (Bates et al., 1997; Aram, 1998; Bates, 1999; Trauner et al., 2013), although language development is often delayed (Vicari et al., 2000), and careful investigations do sometimes reveal residual deficits in more complex aspects of language use (Stiles et al., 2005; Reilly et al., 2013). Lesions sustained in middle and late childhood typically leave more lasting deficits, although these are relatively minor (van Hout, 1991; Bishop, 1993; Martins and Ferro, 1993). 
In adults, the prospects are less good, but even adults typically show some recovery (Holland et al., 1996), due partly to regeneration of the damaged areas and partly to a shift to other areas of the brain, including the right hemisphere (Karbe et al., 1998; Anglade et al., 2014). Thus, while the neurological evidence does suggest that certain areas of the brain are particularly well-suited for language processing, there is no evidence that these regions actually contain a genetically specified blueprint for grammar.

# **Language Universals**

Generative linguists have tended to downplay the differences between languages and emphasize their similarities. In Chomsky's (2000a) words,

"*. . .* in their essential properties and even down to fine detail, languages are cast to the same mold. The Martian scientist might reasonably conclude that there is a single human language, with differences only at the margins." (p. 7)

Elsewhere (Chomsky, 2004, p. 149) he describes human languages as "essentially identical." Stromswold (1999) expresses virtually the same view:

"In fact, linguists have discovered that, although some languages seem, superficially, to be radically different from other languages *. . .*, in essential ways all human languages are remarkably similar to one another." (p. 357)

This view, however, is not shared by most typologists (cf. Croft, 2001; Haspelmath, 2007; Evans and Levinson, 2009). Evans and Levinson (2009), for example, give counterexamples to virtually all proposed universals, including major lexical categories, major phrasal categories, phrase structure rules, grammaticalized means of distinguishing between subjects and objects, use of verb affixes to signal tense and aspect, auxiliaries, anaphora, and WH movement, and conclude that

"*. . .*.languages differ so fundamentally from one another at every level of description (sound, grammar, lexicon, meaning) that it is very hard to find any single structural property they share. The claims of Universal Grammar *. . .* are either empirically false, unfalsifiable or misleading in that they refer to tendencies rather than strict universals." (p. 429)

Clearly, there is a fundamental disagreement between generative linguists like Chomsky and functionalists like Evans and Levinson (2009). Thus, it is misleading to state that "linguists have discovered that *. . .* in essential ways all human languages are remarkably similar to one another"; it would have been more accurate to prefix such claims with a qualifier such as "some linguists think that*. . .*."

One reason for the disagreement is that generative and functional linguists have a very different view of language universals. For the functionalists, universals are inductive generalizations about observable features of language, discovered by studying a large number of unrelated languages—what some people call descriptive, or "surface" universals. The generativists' universals, on the other hand, are cognitive or "deep" universals, which are highly abstract and cannot be derived inductively from observation of surface features. As Smolensky and Dupoux (2009) argue in their commentary on Evans and Levinson's paper,

"Counterexamples to des-universals are not counterexamples to cog-universals *. . .* a hypothesised cog-universal can only be falsified by engaging the full apparatus of the formal theory." (p. 468)

This is all very well—but how exactly do we "engage the full apparatus of the formal theory"? The problem with deep universals is that in order to evaluate them, one has to make a number of subsidiary (and often controversial) assumptions which in turn depend on further assumptions—so the chain of reasoning is very long indeed (cf. Hulst, 2008; Newmeyer, 2008). This raises obvious problems of falsifiability. Given that most deep universals are parameterized, that they may be parameterized "invisibly," and that some languages have been argued to be exempt from some universals (cf. Newmeyer, 2008), it is not clear what would count as counterevidence for a proposed universal.

The issue is particularly problematic for substantive universals. The predominant view of substantive universals (lexical categories, features, etc.) is that they are part of UG, but need not be used by all languages: in other words, UG makes available a list of categories, and languages "select" from this list. But as Evans and Levinson (2009) point out,

"*. . .* the claim that property X is a substantive universal cannot be falsified by finding a language without it, because the property is not required in all of them. Conversely, suppose we find a new language with property Y, hitherto unexpected: we can simply add it to the inventory of substantive universals*. . .*. without limits on the toolkit, UG is unfalsifiable." (p. 436)

Apart from issues of falsifiability, the fact that deep universals are theory internal has another consequence, nicely spelled out by Tomasello (1995):

"Many of the Generative Grammar structures that are found in English can be found in other languages—if it is generative grammarians who are doing the looking. But these structures may not be found by linguists of other theoretical persuasions because these structures are defined differently, or not recognised at all, in other linguistic theories." (p. 138)

In other words, deep universals may exist—but they cannot be treated as evidence for the theory, because they are assumed by the theory.

Returning to the more mundane, observable surface universals: although absolute universals are very hard to find, there is no question that there are some very strong universal tendencies, and these call for an explanation. Many surface universals have plausible functional explanations (Comrie, 1983; Hawkins, 2004; Haspelmath, 2008). It is also possible that they derive from a shared protolanguage or that they are in some sense "innate," i.e., that they are part of the initial state of the language faculty—although existing theories of UG do not fare very well in explaining surface universals (Newmeyer, 2008).

Generative linguists' focus on universals has shifted attention from what may be the most remarkable property of human languages—their diversity. Whatever one's beliefs about UG and the innateness hypothesis, it is undeniable that some aspects of our knowledge—the lexicon, morphological classes, various idiosyncratic constructions, i.e., what generative linguists sometimes refer to as the "periphery"—must be learned, precisely because they are idiosyncratic and specific to particular languages. These aspects of our linguistic knowledge are no less complex (in fact, in some cases considerably more complex) than the phenomena covered by "core" grammar, and mastering them requires powerful learning mechanisms. It is possible, then, that the cognitive mechanisms necessary to learn about the periphery may suffice to learn core grammar as well (Menn, 1996; Culicover, 1999; Dąbrowska, 2000a).

### **Convergence**

"*. . .* it is clear that the language each person acquires is a rich complex construction hopelessly underdetermined by the fragmentary evidence available [to the learner]. Nevertheless individuals in a speech community have developed essentially the same language. This fact can be explained only on the assumption that these individuals employ highly restrictive principles that guide the construction of the grammar." (Chomsky, 1975, p. 11)

"The set of utterances to which any child acquiring a language is exposed is equally compatible with many distinct descriptions. And yet children converge to a remarkable degree on a common grammar, with agreement on indefinitely many sentences that are novel. Mainly for this reason, Chomsky proposed that the child brings prior biases to the task." (Lidz and Williams, 2009, p. 177)

"The explanation that is offered must also be responsive to other facts about the acquisition process; in particular, the fact that every child rapidly converges on a grammatical system that is equivalent to everyone else's, despite a considerable latitude in linguistic experience – indeed, without any relevant experience in some cases. Innate formal principles of language acquisition are clearly needed to explain these basic facts." (Crain et al., 2009, p. 124)

As illustrated by these passages, the (presumed) fact that language learners converge on the same grammar despite having been exposed to different input is often regarded as a powerful argument for an innate UG. It is interesting to note that all three authors quoted above simply assume that learners acquire essentially the same grammar: the convergence claim is taken as self-evident, and is not supported with any evidence. However, a number of recent studies which have investigated the question empirically found considerable individual differences in how much adult native speakers know about the grammar of their language, including inflectional morphology (Indefrey and Goebel, 1993; Dąbrowska, 2008), a variety of complex syntactic structures involving subordination (Dąbrowska, 1997, 2013; Chipere, 2001, 2003), and even simpler structures such as passives and quantified noun phrases (Dąbrowska and Street, 2006; Street, 2010; Street and Dąbrowska, 2010, 2014; for recent reviews, see Dąbrowska, 2012, 2015).

For example, Street and Dąbrowska (2010) tested adult native English speakers' comprehension of simple sentences with universal quantifiers such as (1–2) and unbiased passives (3); the corresponding actives (4) were a control condition.


Participants listened to each test sentence and were asked to select the matching picture from an array of two. For the quantifier sentences the pictures depicted objects and containers in partial one-to-one correspondence (e.g., three mugs, each with a toothbrush in it plus an extra toothbrush; three mugs, each with a toothbrush in it plus an extra mug). For actives and passives, the pictures depicted a transitive event (e.g., a girl hugging a boy and a boy hugging a girl).

Experiment 1 tested two groups, a high academic attainment (HAA) group, i.e., postgraduate students, and a low academic attainment (LAA) group, who worked as shelf-stackers, packers, assemblers, or clerical workers and who had no more than 11 years of formal education. The HAA participants consistently chose the target picture in all four conditions. The LAA participants were at ceiling on actives, 88% correct on passives, 78% on simple locatives with quantifiers, and 43% correct (i.e., at chance) on possessive locatives with quantifiers. The means for the LAA group mask vast differences between participants: individual scores in this group ranged from 0 to 100% for the quantifier sentences and from 33 to 100% for passives.

Street and Dąbrowska argue that the experiment reveals differences in linguistic knowledge (competence), not performance, pointing out that the picture selection task has minimal cognitive demands (and can be used with children as young as 2 to test simpler structures); moreover, all participants, including the LAA group, were at ceiling on active sentences, showing that they had understood the task, were cooperative, etc. (For further discussion of this issue, see Dąbrowska, 2012.)

Experiment 2 was a training study. LAA participants who had difficulty with all three of the experimental constructions (i.e., those who scored no more than 4 out of 6 correct on each construction in the pre-test) were randomly assigned to either a passive training group or a quantifier training group. The training involved an explicit explanation of the target construction followed by practice with feedback. Subsequently, participants were given a series of post-tests: immediately after training, a week later, and 12 weeks after training. The results revealed that performance improved dramatically after training, but only on the construction trained, and that the effects of training were long-lasting—that is to say, the participants performed virtually at ceiling even on the last post-test. This indicates that the participants were not language impaired, and that their poor performance on the pre-test is attributable to lack of knowledge rather than failure to understand the instructions or to cooperate with the experimenter.
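To put the pre-test criterion in perspective, the short sketch below works out what pure guessing would look like on a two-picture task with six items per construction. It assumes, purely for illustration, that a guessing participant picks each picture independently with probability 0.5; the function names are hypothetical, and the calculation is not part of the original study.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k correct answers out of n independent guesses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(k, n, p=0.5):
    """Probability of k or fewer correct answers out of n independent guesses."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

# Chance of scoring "no more than 4 out of 6" when guessing between two pictures
# (assumption: each item is an independent 50/50 guess).
p_meets_criterion = prob_at_most(4, 6)

# Chance of a perfect score (6 out of 6) by guessing alone.
p_perfect_by_chance = binom_pmf(6, 6)

print(f"P(score <= 4 of 6 | guessing) = {p_meets_criterion:.3f}")
print(f"P(score == 6 of 6 | guessing) = {p_perfect_by_chance:.3f}")
```

Under these assumptions, a guessing participant would usually fall within the selection criterion, whereas a perfect score on six items would be rare by chance alone. This is consistent with reading near-ceiling post-test performance as evidence of learning rather than luck.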

The existence of individual differences in linguistic attainment is not, of course, incompatible with the existence of innate predispositions and biases. In fact, we know that differences in verbal ability are heritable (Stromswold, 2001; Misyak and Christiansen, 2011), although it is clear that environmental factors also play an important role (see Dąbrowska, 2012). However, the Street and Dąbrowska experiments as well as other studies mentioned earlier in this section suggest that the convergence argument is based on a false premise. Native speakers do not converge on the same grammar: there are, in fact, considerable differences in how much speakers know about some of the basic constructions of their native language.

# **Poverty of the Stimulus and Negative Evidence**

The most famous, and most powerful, argument for UG is the poverty of the stimulus argument: the claim that children have linguistic knowledge which could not have been acquired from the input which is available to them:

"*. . .*every child comes to know facts about the language for which there is no decisive evidence from the environment. In some cases, there appears to be no evidence at all." (Crain, 1991)

"People attain knowledge of the structure of their language for which no evidence is available in the data to which they are exposed as children." (Hornstein and Lightfoot, 1981, p. 9)

"Universal Grammar provides representations that support deductions about sentences that fall outside of experience*. . .*. These abstract representations drive the language learner's capacity to project beyond experience in highly specific ways." (Lidz and Gagliardi, 2015)

The textbook example of the poverty of the stimulus is the acquisition of the auxiliary placement rule in English Y/N questions (see, for example, Chomsky, 1972, 2012; Crain, 1991; Lasnik and Uriagereka, 2002; Berwick et al., 2011). On hearing pairs of sentences such as (5a) and (5b) a child could infer the following rule for deriving questions:

Hypothesis A: Move the auxiliary to the beginning of the sentence.

However, such a rule would incorrectly derive (6b), although the only grammatical counterpart of (6a) is (6c).


In order to acquire English, the child must postulate a more complex, structure dependent rule:

Hypothesis B: Move the first auxiliary after the subject to the beginning of the sentence.

Crucially, the argument goes, children never produce questions such as (6b), and they know that such sentences are ungrammatical; furthermore, it has been claimed that they know this without ever being exposed to sentences like (6c) (see, for example, Piattelli-Palmarini, 1980, p. 40, pp. 114–115; Crain, 1991).
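The contrast between Hypothesis A and Hypothesis B can be made concrete with a small sketch. The sentences, the auxiliary list, and the hand-supplied subject length below are all hypothetical illustrations (they are not the numbered examples referred to in the text), and the "structural" rule simply takes the subject boundary as given rather than computing it with a parser.

```python
# Toy auxiliary inventory (assumption: enough for the two example sentences).
AUXILIARIES = {"is", "are", "was", "were", "can", "will", "must"}

def question_linear(tokens):
    """Hypothesis A (structure-independent): front the first auxiliary in linear order."""
    i = next(k for k, w in enumerate(tokens) if w in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def question_structural(tokens, subject_len):
    """Hypothesis B (structure-dependent): front the first auxiliary after the subject.

    subject_len is the number of tokens in the subject noun phrase. It is
    supplied by hand here, standing in for a real syntactic analysis.
    """
    i = next(k for k in range(subject_len, len(tokens)) if tokens[k] in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

# Hypothetical declaratives: one with a simple subject, one with a relative clause.
simple = "the dog is barking".split()
complex_ = "the dog that is barking is hungry".split()

simple_a = " ".join(question_linear(simple))
simple_b = " ".join(question_structural(simple, subject_len=2))
complex_a = " ".join(question_linear(complex_))
complex_b = " ".join(question_structural(complex_, subject_len=5))

print(simple_a)   # both rules agree on the simple sentence
print(simple_b)
print(complex_a)  # linear rule extracts the auxiliary from inside the subject
print(complex_b)  # structural rule fronts the main-clause auxiliary
```

On the simple sentence the two rules give the same question, so such data cannot distinguish them. They come apart only when the subject itself contains an auxiliary, which is why the availability of such sentences in the input matters for the argument.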

A related issue, sometimes conflated with poverty of the stimulus, is lack of negative evidence. Language learners must generalize beyond the data that they are exposed to, but they must not generalize too much. A learner who assumed an overly general grammar would need negative evidence—evidence that some of the sentences that his or her grammar generates are ungrammatical—to bring the grammar in line with that of the speech community. Since such evidence is not generally available, learners' generalizations must be constrained by UG (Baker, 1979; Marcus, 1993).

Let us begin with the negative evidence problem. Several observations are in order. First, while parents do not reliably correct their children's errors, children do get a considerable amount of *indirect* negative evidence in the form of requests for clarification and adult reformulations of their erroneous utterances. Moreover, a number of studies have demonstrated that children understand that requests for clarification and recasts are negative evidence, and respond appropriately, and that corrective feedback results in improvement in the grammaticality of child speech (Demetras et al., 1986; Saxton et al., 1998; Saxton, 2000; Chouinard and Clark, 2003). Negative evidence can also be inferred from absence of positive evidence: a probabilistic learner can distinguish between accidental non-occurrence and a non-occurrence that is statistically significant, and infer that the latter is ungrammatical (Robenalt and Goldberg, in press; Scholz and Pullum, 2002, 2006; Stefanowitsch, 2008).
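The idea of inferring negative evidence from absence can be illustrated with a very simple calculation. The sketch below is not the procedure used in any of the studies cited; it assumes, hypothetically, that a learner expects a form to occur with some fixed probability per utterance and that utterances are independent, and asks how likely it would be never to hear the form at all. All numbers and function names are made up for illustration.

```python
def prob_never_observed(p_expected, n_utterances):
    """Probability that a form with per-utterance probability p_expected
    fails to occur in n_utterances independent utterances."""
    return (1.0 - p_expected) ** n_utterances

def is_suspicious_gap(p_expected, n_utterances, alpha=0.05):
    """True if total absence would be unlikely (below alpha) were the form grammatical."""
    return prob_never_observed(p_expected, n_utterances) < alpha

# Hypothetical expectation: the form should turn up about once per 1,000 utterances.
P_EXPECTED = 0.001

small_sample = is_suspicious_gap(P_EXPECTED, 100)      # little data so far
large_sample = is_suspicious_gap(P_EXPECTED, 10_000)   # much more data

print(prob_never_observed(P_EXPECTED, 100), small_sample)
print(prob_never_observed(P_EXPECTED, 10_000), large_sample)
```

With only a hundred utterances, never hearing the form is unsurprising and could easily be accidental. After ten thousand utterances the same gap would be very unlikely if the form were possible, so a learner of this kind could treat it as indirect evidence of ungrammaticality.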

Secondly, as Cowie (2008) points out, the acquisition of grammar is not the only area where we have to acquire knowledge about what is not permissible without the benefit of negative evidence. We face exactly the same problem in lexical learning and learning from experience generally: few people have been explicitly told that custard is not ice cream, and yet somehow they manage to learn this. Related to this, children do make overgeneralization errors—including morphological overgeneralizations like *bringed* and *gooder* and overgeneralizations of various sentence-level constructions (e.g., *I said her no*, *She giggled me*), and they do recover from them (cf. Bowerman, 1988). Thus, the question isn't "What sort of innate constraints must we assume to prevent children from overgeneralizing?" but rather "How do children recover from overgeneralization errors?"—and there is a considerable amount of research addressing this very issue (see, for example, Brooks and Tomasello, 1999; Brooks et al., 1999; Tomasello, 2003; Ambridge et al., 2008, 2009, 2011; Boyd and Goldberg, 2011).

Let us return to the poverty of the stimulus argument. The structure of the argument may be summarized as follows:


As with any deductive argument, the truth of the conclusion (4) depends on the validity of the argument itself and the truth of the premises. Strikingly, most expositions of the poverty of the stimulus argument in the literature do not take the trouble to establish the truth of the premises: it is simply assumed. In a well-known critique of the POS argument, Pullum and Scholz (2002) analyze four linguistic phenomena (plurals inside compounds, anaphoric *one*, auxiliary sequences, auxiliary placement in Y/N questions) which are most often used to exemplify it, and show that the argument does not hold up: in all four cases, either the generalization that linguists assumed children acquired is incorrect or the relevant data is present in the input, or both. With respect to the auxiliary placement rule, for example, Pullum and Scholz (2002) estimate that by age 3, most children will have heard between 7500 and 22000 utterances that falsify the structure-independent rule.

Lasnik and Uriagereka (2002) and others argue that Pullum and Scholz (2002) have missed the point: knowing that sentences like (6c) are grammatical does not entail that sentences like (6b) are not; and it does not tell the child how to actually form a question. They point out that "not even the fact that [6c] is grammatical proves that something with the effect of hypothesis B is correct (*and the only possibility* [my italics]), hence does not lead to adult knowledge of English" (Lasnik and Uriagereka, 2002; p. 148), and conclude that "children come equipped with a priori knowledge of language*. . .* because it is *unimaginable* [my italics] how they could otherwise acquire the complexities of adult language" (pp. 149–150).

Note that Lasnik and Uriagereka (2002) have moved beyond the original poverty of the stimulus argument. They are not arguing merely that a particular aspect of our linguistic knowledge must be innate because the relevant data is not available to learners (poverty of the stimulus); they are making a different argument, which Slobin (cited in Van Valin, 1994) refers to as the "argument from the poverty of the imagination": "I can't imagine how X could possibly be learned from the input; therefore, it must be innate." Appeals to lack of imagination are not very convincing, however. One can easily construct analogous arguments to argue for the opposite claim: "I can't imagine how X could have evolved (or how it could be encoded in the genes); therefore, it must be learned." Moreover, other researchers may be more imaginative.

# **The Construction Grammar Approach**

Lasnik and Uriagereka (2002) conclude their paper with a challenge to non-nativist researchers to develop an account of how grammar could be learned from positive evidence. The challenge has been taken up by a number of constructionist researchers (Tomasello, 2003, 2006; Dąbrowska, 2004; Goldberg, 2006; for reviews, see Diessel, 2013; Matthews and Krajewski, 2015). Let us begin by examining how a constructionist might account for the acquisition of the auxiliary placement rule.

## **Case Study: The Acquisition of Y/N Questions by Naomi**

Consider the development of Y/N questions with the auxiliary *can* in one particular child, Naomi (see Dąbrowska, 2000b, 2004, 2010a, which also discuss data for two other children from the CHILDES database).<sup>4</sup> The first recorded questions with *can* appeared in Naomi's speech at age 1;11.9 (1 year, 11 months and 9 days) and were correctly inverted:


Seven days later there are some further examples, but this time the subject is left out, although it is clear from the context that the subject is Naomi herself:


In total, there are 56 tokens of this "permission formula" in the corpus, 25 with explicit subjects.

The early questions with *can* are extremely stereotypical: the auxiliary is always placed at the beginning of the sentence (there are no "uninverted" questions), and although the first person pronoun is often left out, the agent of the action is invariably Naomi herself. There are other interesting restrictions on her usage during this period. For example, in Y/N interrogatives with *can*, if she explicitly refers to herself, she always uses the pronoun *I* (25 tokens)—never her name. In contrast, in other questions (e.g., the formulas *What's Nomi do?*, *What's Nomi doing?*, and *Where's Nomi?*—45 tokens in total) she always refers to herself as *Nomi*. Furthermore, while she consistently inverts in first person questions with *can* and *could*, all the other Y/N questions with first person subjects are uninverted.

As the formula is analyzed, usage becomes more flexible. Two weeks after the original *can I. . .?* question, a variant appears with *could* instead of *can*:


Five weeks later, we get the first question with a subject other than *I*:

2;0.28 can you draw eyes?

The transcripts up to this point contain 39 questions with *can*, including 10 with explicit subjects.

So we see a clear progression from an invariant formula (*Can I get down?*) through increasingly abstract formulaic frames (*Can I* + ACTION?, ABILITY VERB + *I* + ACTION?) to a fairly general constructional schema in which none of the slots is tied to particular lexical items (ABILITY VERB + PERSON + ACTION?).

Questions with other auxiliaries follow different developmental paths. Not surprisingly, the first interrogatives with *will* were requests (*will you ACTION?*); this was later generalized to questions about future actions, and to other agents (*will* PERSON ACTION?). The earliest interrogatives with *do* were offers of a specific object (*do you want THING?*). This was later generalized to *do you ACTION?*; but for a long time, Naomi used "*do* support" almost exclusively with second person subjects.

Thus, Naomi started with some useful formulas such as request for permission (*Can I ACTION?*), request that the addressee do something for her (*Will you ACTION?*), and offers of an object (*Do you want THING?*). These were gradually integrated into a network of increasingly general constructional schemas. The process is depicted schematically in **Figure 1**. The left hand side of the figure shows the starting point of development: formulaic phrases. The boxes in the second column represent low-level schemas which result from generalizations over specific formulaic phrases. The schemas contain a slot for specifying the type of activity; this must be filled by a verb phrase containing a plain verb. The schemas in the third column are even more abstract, in that they contain two slots, one for the activity and one for the agent; they can be derived by generalizing over the low-level schemas. Finally, on the far right, we have a fully abstract Y/N question schema. The left-to-right organization of the figure represents the passage of time, in the sense that concrete schemas developmentally precede more abstract ones. However, the columns are not meant to represent distinct stages, since the generalizations are local: for example, Naomi acquired the *Can NP VP?* schema about 6 months before she started to produce *Will you VP?* questions. Thus, different auxiliaries followed different developmental patterns, and, crucially, there is no evidence that she derived questions from structures with declarative-like word order at any stage, as auxiliaries in declaratives were used in very different ways. It is also important to note that the later, more abstract schemas probably do not replace the early lexically specific ones: there is evidence that the two continue to exist side by side in adult speakers (Langacker, 2000; Dąbrowska, 2010b).
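The idea of a network of schemas at different levels of abstraction can be sketched as a simple matcher. The Python below is a toy illustration, not a model of acquisition and not the author's formalism: the slot names AUX, PERSON and ACTION, the filler sets, and the function names are assumptions made here, and the sketch only checks which schemas license a given question rather than showing how the schemas are learned.

```python
# Toy slot fillers, assumed for illustration only.
SLOT_FILLERS = {
    "AUX": {"can", "could", "will", "do"},
    "PERSON": {"i", "you", "nomi"},
}

# Schemas from most concrete to most abstract; upper-case items are slots.
SCHEMAS = [
    ["can", "i", "get", "down"],   # formulaic phrase
    ["can", "i", "ACTION"],        # low-level schema: one slot
    ["can", "PERSON", "ACTION"],   # two slots
    ["AUX", "PERSON", "ACTION"],   # fully abstract Y/N question schema
]


def matches(schema, words):
    """Return True if the word list instantiates the schema.

    ACTION is always the final slot and absorbs one or more remaining
    words; every other slot or literal consumes exactly one word.
    """
    for position, item in enumerate(schema):
        if item == "ACTION":
            return len(words) > position
        if position >= len(words):
            return False
        word = words[position]
        if item in SLOT_FILLERS:
            if word not in SLOT_FILLERS[item]:
                return False
        elif word != item:
            return False
    return len(words) == len(schema)


def licensing_schemas(utterance):
    """List the schemas (as strings) that license an utterance."""
    words = utterance.lower().rstrip("?").split()
    return [" ".join(s) for s in SCHEMAS if matches(s, words)]
```

In this sketch *Can I get down?* is licensed by all four schemas, *Can you draw eyes?* only by the two most abstract ones, and *Will you draw eyes?* only by the fully abstract schema. Keeping the concrete entries in the list alongside the abstract ones mirrors the point that later schemas need not replace earlier, lexically specific units.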

Dąbrowska and Lieven (2005), using data from eight high-density developmental corpora, show that young children's novel questions can be explained by appealing to lexically specific units which can be derived from the child's linguistic experience. Dąbrowska (2014) argues that such units can also account for the vast majority of adult utterances, at least in informal conversation.

One might object that, since the slots in the formulas can be filled by words or phrases, this approach assumes that the child knows something about constituency. This is true; note, however, that constituency is understood differently in this framework: not as a characteristic of binary branching syntactic trees with labeled nodes, but merely an understanding that some combinations of words function as a unit when they fill a particular slot in a formula. In the constructionist approach, constituency is an emergent property of grammar rather than something that is present from the start, and it is sometimes fluid and variable (cf. Langacker, 1997). Constituency in this sense—i.e., hierarchical organization—is something that is a general property of many cognitive structures and is not unique to language.

## **Understanding Language, Warts, and All**

Languages are shot through with patterns. The patterns exist at all levels: some are very general, others quite low-level. Languages are also shot through with idiosyncrasies: constructional idioms, lexical items which do not fit easily into any grammatical class, irregular morphology. The generative program focuses on uncovering the deepest, most fundamental generalizations, and relegates the low-level patterns and idiosyncrasies—which are regarded as less interesting—to the periphery. But low-level patterns are a part of language, and a satisfactory theory of language must account for them as well as more general constructions.

<sup>4</sup>Naomi's linguistic development was recorded by Sachs (1983). The transcripts are available from the CHILDES database (MacWhinney, 1995).

Construction grammar began as an attempt to account for constructional idioms such as the *X-er the Y-er* (e.g., *The more the merrier*; *The bigger they come, the harder they fall*—see Fillmore et al., 1988) and *what's X doing Y?* (e.g., *What's this fly doing in my soup?*, *What are you doing reading my diary?*—see Kay and Fillmore, 1999). Such constructional idioms have idiosyncratic properties which are not predictable from general rules or principles, but they are productive: we can create novel utterances based on the schema. As construction grammar developed, it quickly became apparent that whatever mechanisms were required to explain low-level patterns could also account for high-level patterns as a special case: consequently, as Croft (2001) put it, "the constructional tail has come to wag the syntactic dog" (p. 17). As suggested earlier, the same is true of acquisition: the learning mechanisms that are necessary to learn relational words can also account for the acquisition of more abstract constructions.

## **Back to Poverty of the Stimulus**

It is important to note that the way the poverty-of-the-stimulus problem is posed (e.g., "how does the child know that the auxiliary inside the subject cannot be moved?") presupposes a generative account of the phenomena (i.e., interrogatives are derived from declarative-like structures by moving the auxiliary). The problem does not arise in constructionist accounts, which do not assume movement.

More generally, generativist and constructionist researchers agree about the basic thrust of the POS argument: the child cannot learn about the properties of empty categories, constraints on extraction, etc., from the input. What they disagree about is the conclusion that is to be drawn from this fact. For generative researchers, the fact that some grammatical principles or notions are unlearnable entails that they must be part of an innate UG. Constructionist researchers, on the other hand, draw a completely different conclusion: if X cannot be learned from the input, then we need a better linguistic theory—one that does not assume such an implausible construct.

Thus, one of the basic principles of the constructionist approach is that linguists should focus on developing "child-friendly" grammars (Langacker, 1987, 1991, 2008; Goldberg, 2003; Tomasello, 2003, 2006; Dąbrowska, 2004) rather than postulate an innate UG. Construction grammar attempts to capture all that speakers know about their language in terms of constructions—form-meaning pairings which can be simple or complex and concrete or partially or entirely schematic (i.e., they can contain one or more "slots" which can be elaborated by more specific units, allowing for the creation of novel expressions). Most construction grammar researchers also assume that children prefer relatively concrete, lexically-specific patterns which can be easily inferred from the input; more schematic patterns emerge later in development, as a result of generalization over the concrete units acquired earlier (Johnson, 1983; Dąbrowska, 2000b; Tomasello, 2003, 2006; Diessel, 2004). Crucially, the mechanisms required to learn constructional schemas are also necessary to acquire relational terms such as verbs and prepositions (Dąbrowska, 2004, 2009). Since we know that children are able to learn the meanings and selectional restrictions of verbs and prepositions, it follows that they are able to learn constructional schemas as well.

# **Conclusion**

As we have seen, contemporary views on what is or is not in UG are wildly divergent. I have also argued that, although many arguments have been put forward in favor of some kind of an innate UG, there is actually very little evidence for its existence: the arguments for the innateness of specific linguistic categories or principles are either irrelevant (in that they are arguments for general innateness rather than linguistic innateness), based on false premises, or circular.

Some generative linguists respond to criticisms of this kind by claiming that UG is an *approach* to doing linguistics rather than a specific hypothesis. For example, Nevins et al. (2009), in their critique of Everett's work on Pirahã, assert that

"The term Universal Grammar (UG), in its modern usage, was introduced as a name for the collection of factors that underlie the uniquely human capacity for language—whatever they may turn out to be *. . .*. There are many different proposals about the overall nature of UG, and continuing debate about its role in the explanation of virtually every linguistic phenomenon. Consequently, there is no general universal-grammar model for which [Everett's claims] could have consequences – only a wealth of diverse hypotheses about UG and its content." (p. 357)

This view contrasts sharply with other assessments of the UG enterprise. Chomsky (2000a), for instance, claims that the Principles and Parameters framework was "highly successful" (p. 8), that it "led to an explosion of inquiry into a very broad range of typologically diverse languages, at a level of depth not previously envisioned" (Chomsky, 2004, p. 11), and that it was "the only real revolutionary departure in linguistics maybe in the last several thousand years, much more so than the original work in generative grammar" (Chomsky, 2004, p. 148). If Nevins et al. (2009) are right in their assertion that the UG literature is no more than a collection of proposals which, as a set, do not make any specific empirical predictions about languages, then such triumphalist claims are completely unjustified.

Is it a fruitful approach? (Or perhaps a better question might be: Was it a fruitful approach?) It was certainly fruitful in the sense that it generated a great deal of debate. Unfortunately, it does not seem to have got us any closer to answers to the fundamental questions that it raised. One could regard the existing disagreements about UG as a sign of health. After all, debate is the stuff of scientific inquiry: initial hypotheses are often erroneous; it is by reformulating and refining them that we gradually get closer to the truth. However, the kind of development we see in UG theory is very different from what we see in the natural sciences. In the latter, the successive theories are gradual approximations to the truth. Consider an example discussed by Asimov (1989). People once believed that the earth is flat. Then, ancient Greek astronomers established that it was spherical. In the seventeenth century, Newton argued that it was an oblate spheroid (i.e., slightly squashed at the poles). In the twentieth century, scientists discovered that it is not a perfect oblate spheroid: the equatorial bulge is slightly bigger in the southern hemisphere. Note that although the earlier theories were false, they clearly approximated the truth: the correction in going from "sphere" to "oblate spheroid," or from "oblate spheroid" to "slightly irregular oblate spheroid" is much smaller than when going from "flat" to "spherical." And while "slightly irregular oblate spheroid" may not be entirely accurate, we are extremely unlikely to discover tomorrow that the earth is conical or cube-shaped. We do not see this sort of approximation in work in the UG approach: what we see instead is wildly different ideas being constantly proposed and abandoned. After more than half a century of intensive research we are no nearer to understanding what UG is than we were when Chomsky first used the term.

This lack of progress, I suggest, is a consequence of the way that the basic questions are conceptualized in the UG approach, and the strategy that it adopts in attempting to answer them. Let us consider a recent example. Berwick et al. (2011) list four factors determining the outcome of language acquisition:


They go on to assert that the goal of linguistic theory is to explain how these factors "conspire to yield human language" (p. 1223), and that "on any view, (1) is crucial, at least in the initial mapping of external data to linguistic experience" (p. 1209).

There are three problems with this approach. First, it *assumes* that innate language-specific factors are "crucial." It may well be that this is true; however, such a statement should be the outcome of a research program, not the initial assumption.

Secondly, Berwick et al. (2011) appear to assume that the four types of factors are separate and isolable: a particular principle can be attributed to factor 1, 2, 3, or 4. The problem is that one cannot attribute specific properties of complex systems to individual factors, since they emerge from the interaction of various factors (Elman et al., 1996; Bates, 2003; MacWhinney, 2005). Asking whether a particular principle is "innate" or due to "external stimuli" is meaningless—it is both: genes and the environment interact in myriad ways at different levels (molecular, cellular, at the level of the organism, and in the external environment, both physical and social). Asking whether something is "domain general" or "domain specific" may be equally unhelpful. Presumably everybody, including the staunchest nativists, agrees that (the different components of) what we call the language faculty arose out of some non-linguistic precursors. Bates (2003) argues that language is "a new machine built out of old parts"; she also suggests that the "old parts" (memory consolidation, motor planning, attention) "have kept their day jobs" (Bates, 1999). However, it is perfectly possible that they have undergone further selection as a result of the role they play in language, so that language is now their "day job," although they continue to "moonlight" doing other jobs.

Finally, Berwick et al. (2011), like most researchers working in the UG tradition, assume that one can determine which aspects of language can be attributed to which factor by ratiocination rather than empirical enquiry: "the best overall strategy for identifying the relative contributions of (1–4) to human linguistic knowledge is to formulate POS arguments that reveal a priori assumptions that theorists can reduce to more basic linguistic principles" (p. 1210). This "logical" approach to language learnability is a philosophical rather than a scientific stance, somewhat reminiscent of Zeno's argument that motion could not exist. Zeno of Elea was an ancient Greek philosopher who "proved," through a series of paradoxes (Achilles and the tortoise, the dichotomy argument, the arrow in flight), that motion is an illusion. However, Zeno's paradoxes, intriguing as they are, are not a contribution to the study of physics: in fact, we would not have modern physics if we simply accepted his argument.

Virtually everyone agrees that there is something unique about humans that makes language acquisition possible. There is a growing consensus, even in the generativist camp, that the "big mean UG" of the Principles and Parameters model is not tenable: UG, if it exists, is fairly minimal,<sup>5</sup> and most of the interesting properties of human languages arise through the interaction of innate capacities and predispositions and environmental factors. This view has long been part of the constructivist outlook (Piaget, 1954; Bates and MacWhinney, 1979; Karmiloff-Smith, 1992; MacWhinney, 1999, 2005; O'Grady, 2008, 2010), and it is encouraging to see that the two traditions in cognitive science are converging, to some extent at least.

The great challenge is to understand exactly how genes and environment interact during individual development, and how languages evolve and change as a result of interactions between individuals. To do this, it is crucial to examine interactions at different levels. Genes do not interact with the primary linguistic data: they build proteins which build brains which learn to "represent" language and the external environment by interacting with it via the body. It is unlikely that we will be able to tease apart the contribution of the different factors by ratiocination: the interactions are just too complex, and they often lead to unexpected results (Thelen and Smith, 1994; Elman et al., 1996; Bates, 2003; MacWhinney, 2005). We have already made some headway in this area. Further progress will require empirical research and the coordinated efforts of many disciplines, from molecular biology to psychology and linguistics.


<sup>5</sup> In fact, Roberts and Holmberg (2011) suggest that "UG does not have to be seen as either language-specific or human-specific," thus capitulating on the central claims of the UG approach. Note that this dilutes the innateness hypothesis to the point where it becomes trivial: if UG is neither language specific nor human specific, then saying that it exists amounts to saying that we are different from rocks.



**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Dąbrowska. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The language faculty that wasn't: a usage-based account of natural language recursion

Morten H. Christiansen<sup>1,2,3</sup>\* and Nick Chater<sup>4</sup>

*<sup>1</sup> Department of Psychology, Cornell University, Ithaca, NY, USA, <sup>2</sup> Department of Language and Communication, University of Southern Denmark, Odense, Denmark, <sup>3</sup> Haskins Laboratories, New Haven, CT, USA, <sup>4</sup> Behavioural Science Group, Warwick Business School, University of Warwick, Coventry, UK*

In the generative tradition, the language faculty has been shrinking—perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty.

Keywords: recursion, language evolution, cultural evolution, usage-based processing, language faculty, domain-general processes, sequence learning

#### Edited by:

*N. J. Enfield, University of Sydney, Australia*

#### Reviewed by:

*Martin John Pickering, University of Edinburgh, UK; Bill Thompson, Vrije Universiteit Brussel, Belgium*

#### \*Correspondence:

*Morten H. Christiansen, Department of Psychology, Uris Hall, Cornell University, Ithaca, NY 14853, USA; christiansen@cornell.edu*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *05 May 2015* Accepted: *27 July 2015* Published: *27 August 2015*

#### Citation:

*Christiansen MH and Chater N (2015) The language faculty that wasn't: a usage-based account of natural language recursion. Front. Psychol. 6:1182. doi: 10.3389/fpsyg.2015.01182*

# Introduction

Over recent decades, the language faculty has been getting smaller. In its heyday, it was presumed to encode a detailed "universal grammar," sufficiently complex that the process of language acquisition could be thought of as analogous to processes of genetically controlled growth (e.g., of a lung, or chicken's wing) and thus that language acquisition should not properly be viewed as a matter of learning at all. Of course, the child has to home in on the language being spoken in its linguistic environment, but this was seen as a matter of setting a finite set of discrete parameters to the correct values for the target language, while the putative bauplan governing all human languages was viewed as innately specified. Within the generative tradition, the advent of minimalism (Chomsky, 1995) led to a severe theoretical retrenchment. Apparently baroque innately specified complexities of language, such as those captured in the previous Principles and Parameters framework (Chomsky, 1981), were seen as emerging from more fundamental language-specific constraints. Quite what these constraints are has not been entirely clear, but an influential article (Hauser et al., 2002) raised the possibility that the language faculty, strictly defined (i.e., not emerging from general-purpose cognitive mechanisms or constraints) might be very small indeed, comprising, perhaps, just the mechanism of recursion (see also, Chomsky, 2010). Here, we follow this line of thinking to its natural conclusion, and argue that the language faculty is, quite literally, empty: that natural language emerges from general cognitive constraints, and that there is no innately specified special-purpose cognitive machinery devoted to language (though there may have been some adaptations for speech; e.g., Lieberman, 1984).

The structure of this paper is as follows. In The Evolutionary Implausibility of an Innate Language Faculty, we question whether an innate linguistic endowment could have arisen through biological evolution. In Sequence Learning as the Basis for Recursive Structure, we then focus on what is, perhaps, the last bastion for defenders of the language faculty: natural language recursion. We argue that our limited ability to deal with recursive structure in natural language is an acquired skill, relying on non-linguistic abilities for sequence learning. Finally, in Language without a Language Faculty, we use these considerations as a basis for reconsidering some influential lines of argument for an innate language faculty.<sup>1</sup>

# The Evolutionary Implausibility of an Innate Language Faculty

Advocates of a rich, innate language faculty have often pointed to analogies between language and vision (e.g., Fodor, 1983; Pinker and Bloom, 1990; Pinker, 1994). Both appear to pose highly specific processing challenges, which seem distinct from those involved in more general learning, reasoning, and decision making processes. There is strong evidence that the brain has innately specified neural hardwiring for visual processing; so, perhaps we should expect similar dedicated machinery for language processing.

Yet on closer analysis, the parallel with vision seems to lead to a very different conclusion. The structure of the visual world (e.g., in terms of its natural statistics, e.g., Field, 1987; and the ecological structure generated by the physical properties of the world and the principles of optics, e.g., Gibson, 1979; Richards, 1988) has been fairly stable over the tens of millions of years over which the visual system has developed in the primate lineage. Thus, the forces of biological evolution have been able to apply a steady pressure to develop highly specialized visual processing machinery, over a very long time period. But any parallel process of adaptation to the linguistic environment would have operated on a timescale shorter by two orders of magnitude: language is typically assumed to have arisen in the last 100,000–200,000 years (e.g., Bickerton, 2003). Moreover, while the visual environment is stable, the linguistic environment is anything but stable. During historical time, language change is consistently observed to be extremely rapid: indeed, the entire Indo-European language group may have a common root dating back just 10,000 years (Gray and Atkinson, 2003).

This implies that the linguistic environment is a fast-changing "moving target" for biological adaptation, in contrast to the stability of the visual environment. Can biological evolution occur under these conditions? One possibility is that there might be co-evolution between language and the genetically specified language faculty (e.g., Pinker and Bloom, 1990). But computer simulations have shown that co-evolution between slowly changing "language genes" and a more rapidly changing language environment does not occur. Instead, the language rapidly adapts, through cultural evolution, to the existing "pool" of language genes (Chater et al., 2009). More generally, in gene-culture interactions, fast-changing culture rapidly adapts to the slower-changing genes and not vice versa (Baronchelli et al., 2013a).

It might be objected that not all aspects of the linguistic environment are unstable; indeed, advocates of an innate language faculty frequently posit strong regularities that they take to be universal across human languages (Chomsky, 1980; though see Evans and Levinson, 2009). Such universal features of human language would, perhaps, be stable features of the linguistic environment, and hence provide a possible basis for biological adaptation. But this proposal involves a circularity, because one of the reasons to postulate an innate language faculty is to explain putative language universals: thus, such universals cannot be assumed to pre-exist, and hence to provide a stable environment for, the evolution of the language faculty (Christiansen and Chater, 2008).

Yet perhaps a putative language faculty need not be a product of biological adaptation at all. Could it perhaps have arisen through exaptation (Gould and Vrba, 1982), that is, as a side effect of other biological mechanisms, which have themselves adapted to entirely different functions (e.g., Gould, 1993)? The possibility that a rich innate language faculty (e.g., one embodying the complexity of a theory such as Principles and Parameters) might arise as a distinct and autonomous mechanism by, in essence, pure chance seems remote (Christiansen and Chater, 2008). Without the selective pressures driving adaptation, it is highly implausible that a new and autonomous piece of cognitive machinery (which, in traditional formulations, the language faculty is typically assumed to be, e.g., Chomsky, 1980; Fodor, 1983) might arise from the chance recombination of pre-existing cognitive components (Dediu and Christiansen, in press).

These arguments do not necessarily count against a very minimal notion of the language faculty, however. As we have noted, Hauser et al. (2002) speculate that the language faculty may consist of nothing more than a mechanism for recursion. Such a simple (though potentially far-reaching) mechanism could, perhaps, have arisen as a consequence of a modest genetic mutation (Chomsky, 2010). We shall argue, though, that even this minimal conception of the contents of the language faculty is too expansive. Instead, the recursive character of aspects of natural language need not be explained by the operation of a dedicated recursive processing mechanism at all, but, rather, as emerging from domain-general sequence learning abilities.

# Sequence Learning as the Basis for Recursive Structure

Although recursion has always figured in discussions of the evolution of language (e.g., Premack, 1985; Chomsky, 1988; Pinker and Bloom, 1990; Corballis, 1992; Christiansen, 1994), the new millennium saw a resurgence of interest in the topic following the publication of Hauser et al. (2002), controversially claiming that recursion may be the only aspect of the language faculty unique to humans. The subsequent outpouring of writings has covered a wide range of topics, from criticisms of the Hauser et al. claim (e.g., Pinker and Jackendoff, 2005; Parker, 2006) and how to characterize recursion appropriately (e.g., Tomalin, 2011; Lobina, 2014), to its potential presence (e.g., Gentner et al., 2006) or absence in animals (e.g., Corballis, 2007), and its purported universality in human language (e.g., Everett, 2005; Evans and Levinson, 2009; Mithun, 2010) and cognition (e.g., Corballis, 2011; Vicari and Adenzato, 2014). Our focus here, however, is to advocate a usage-based perspective on the processing of recursive structure, suggesting that it relies on evolutionarily older abilities for dealing with temporally presented sequential input.

<sup>1</sup>Although we do not discuss sign languages explicitly in this article, we believe that they are subject to the same arguments as we here present for spoken language. Thus, our arguments are intended to apply to language in general, independently of the modality within which it is expressed (see Christiansen and Chater, Forthcoming 2016, in press, for further discussion).

#### Recursion in Natural Language: What Needs to Be Explained?

The starting point for our approach to recursion in natural language is that what needs to be explained is the observable human ability to process recursive structure, and not recursion as a hypothesized part of some grammar formalism. In this context, it is useful to distinguish between two types of recursive structures: tail recursive structures (such as 1) and complex recursive structures (such as 2).


Both sentences in (1) and (2) express roughly the same semantic content. However, whereas the two levels of tail recursive structure in (1) do not cause much difficulty for comprehension, the comparable sentence in (2) with two center-embeddings cannot be readily understood. Indeed, there is a substantial literature showing that English doubly center-embedded sentences (such as 2) are read with the same intonation as a list of random words (Miller, 1962), cannot easily be memorized (Miller and Isard, 1964; Foss and Cairns, 1970), are difficult to paraphrase (Hakes and Foss, 1970; Larkin and Burns, 1977) and comprehend (Wang, 1970; Hamilton and Deese, 1971; Blaubergs and Braine, 1974; Hakes et al., 1976), and are judged to be ungrammatical (Marks, 1968). Even when the processing of center-embeddings is facilitated by adding semantic biases or providing training, only little improvement is seen in performance (Stolz, 1967; Powell and Peters, 1973; Blaubergs and Braine, 1974). Importantly, the limitations on processing center-embeddings are not confined to English. Similar patterns have been found in a variety of languages, ranging from French (Peterfalvi and Locatelli, 1971), German (Bach et al., 1986), and Spanish (Hoover, 1992) to Hebrew (Schlesinger, 1975), Japanese (Uehara and Bradley, 1996), and Korean (Hagstrom and Rhee, 1997). Indeed, corpus analyses of Danish, English, Finnish, French, German, Latin, and Swedish (Karlsson, 2007) indicate that doubly center-embedded sentences are almost entirely absent from spoken language.

By making complex recursion a built-in property of grammar, the proponents of such linguistic representations are faced with a fundamental problem: the grammars generate sentences that can never be understood and that would never be produced. The standard solution is to propose a distinction between an infinite linguistic competence and a limited observable psycholinguistic performance (e.g., Chomsky, 1965). The latter is constrained by memory limitations, attention span, lack of concentration, and other processing constraints, whereas the former is construed to be essentially infinite in virtue of the recursive nature of grammar. There are a number of methodological and theoretical issues with the competence/performance distinction (e.g., Reich, 1969; Pylyshyn, 1973; Christiansen, 1992; Petersson, 2005; see also Christiansen and Chater, Forthcoming 2016). Here, however, we focus on a substantial challenge to the standard solution, deriving from the considerable variation across languages and individuals in the use of recursive structures: differences that cannot readily be ascribed to performance factors.

In a recent review of the pervasive differences that can be observed throughout all levels of linguistic representations across the world's current 6,000–8,000 languages, Evans and Levinson (2009) observe that recursion is not a feature of every language. Using examples from Central Alaskan Yup'ik Eskimo, Khalkha Mongolian, and Mohawk, Mithun (2010) further notes that recursive structures are far from uniform across languages, nor are they static within individual languages. Hawkins (1994) observed substantial offline differences in perceived processing difficulty of the same type of recursive constructions across English, German, Japanese, and Persian. Moreover, a self-paced reading study involving center-embedded sentences found differential processing difficulties in Spanish and English (even when morphological cues were removed in Spanish; Hoover, 1992). We see these cross-linguistic patterns as suggesting that recursive constructions form part of a linguistic system: the processing difficulty associated with specific recursive constructions (and whether they are present at all) will be determined by the overall distributional structure of the language (including pragmatic and semantic considerations).

Considerable variations in recursive abilities have also been observed developmentally. Dickinson (1987) showed that recursive language production abilities emerge gradually, in a piecemeal fashion. On the comprehension side, training improves comprehension of singly embedded relative clause constructions both in 3- to 4-year-old children (Roth, 1984) and adults (Wells et al., 2009), independent of other cognitive factors. Level of education further correlates with the ability to comprehend complex recursive sentences (Dąbrowska, 1997). More generally, these developmental differences are likely to reflect individual variations in experience with language (see Christiansen and Chater, Forthcoming 2016), differences that may further be amplified by variations in the structural and distributional characteristics of the language being spoken.

Together, these individual, developmental and cross-linguistic differences in dealing with recursive linguistic structure cannot easily be explained in terms of a fundamental recursive competence, limited by fixed biological constraints on performance. That is, the variation in recursive abilities across individuals, development, and languages is hard to explain in terms of performance factors, such as language-independent constraints on memory, processing or attention, imposing limitations on an otherwise infinite recursive grammar. Invoking such limitations would require different biological constraints on working memory, processing, or attention for speakers of different languages, which seems highly unlikely. To resolve these issues, we need to separate claims about recursive mechanisms from claims about recursive structure: the ability to deal with a limited amount of recursive structure in language does not necessitate the postulation of recursive mechanisms to process it. Thus, instead of treating recursion as an a priori property of the language faculty, we need to provide a mechanistic account able to accommodate the actual degree of recursive structure found across both natural languages and natural language users: no more and no less.

We favor an account of the processing of recursive structure that builds on construction grammar and usage-based approaches to language. The essential idea is that the ability to process recursive structure does not depend on a built-in property of a competence grammar but, rather, is an acquired skill, learned through experience with specific instances of recursive constructions and limited generalizations over these (Christiansen and MacDonald, 2009). Performance limitations emerge naturally through interactions between linguistic experience and cognitive constraints on learning and processing, ensuring that recursive abilities degrade in line with human performance across languages and individuals. We show how our usage-based account of recursion can accommodate human data on the most complex recursive structures that have been found in naturally occurring language: center-embeddings and cross-dependencies. Moreover, we suggest that the human ability to process recursive structures may have evolved on top of our broader abilities for complex sequence learning. Hence, we argue that language processing, implemented by domain-general mechanisms rather than recursive grammars, is what endows language with its hallmark productivity, allowing it to "... make infinite employment of finite means," as the celebrated German linguist, Wilhelm von Humboldt (1836/1999: p. 91), noted more than a century and a half ago.

### Comparative, Genetic, and Neural Connections between Sequence Learning and Language

Language processing involves extracting regularities from highly complex sequentially organized input, suggesting a connection between general sequence learning (e.g., planning, motor control, etc., Lashley, 1951) and language: both involve the extraction and further processing of discrete elements occurring in temporal sequences (see also e.g., Greenfield, 1991; Conway and Christiansen, 2001; Bybee, 2002; de Vries et al., 2011, for similar perspectives). Indeed, there is comparative, genetic, and neural evidence suggesting that humans may have evolved specific abilities for dealing with complex sequences. Experiments with non-human primates have shown that they can learn both fixed sequences, akin to a phone number (e.g., Heimbauer et al., 2012), and probabilistic sequences, similar to "statistical learning" in human studies (e.g., Heimbauer et al., 2010, under review; Wilson et al., 2013). However, regarding complex recursive non-linguistic sequences, non-human primates appear to have significant limitations relative to human children (e.g., in recursively sequencing actions to nest cups within one another; Greenfield et al., 1972; Johnson-Pynn et al., 1999). Although more carefully controlled comparisons between the sequence learning abilities of humans and non-human primates are needed (see Conway and Christiansen, 2001, for a review), the currently available data suggest that humans may have evolved a superior ability to deal with sequences involving complex recursive structures.

The current knowledge regarding the FOXP2 gene is consistent with the suggestion of a human adaptation for sequence learning (for a review, see Fisher and Scharff, 2009). FOXP2 is highly conserved across species but two amino acid changes have occurred after the split between humans and chimps, and these became fixed in the human population about 200,000 years ago (Enard et al., 2002). In humans, mutations to FOXP2 result in severe speech and orofacial motor impairments (Lai et al., 2001; MacDermot et al., 2005). Studies of FOXP2 expression in mice and imaging studies of an extended family pedigree with FOXP2 mutations have provided evidence that this gene is important to neural development and function, including of the cortico-striatal system (Lai et al., 2003). When a humanized version of Foxp2 was inserted into mice, it was found to specifically affect cortico-basal ganglia circuits (including the striatum), increasing dendrite length and synaptic plasticity (Reimers-Kipping et al., 2011). Indeed, synaptic plasticity in these circuits appears to be key to learning action sequences (Jin and Costa, 2010); and, importantly, the cortico-basal ganglia system has been shown to be important for sequence (and other types of procedural) learning (Packard and Knowlton, 2002). Crucially, preliminary findings from a mother and daughter pair with a translocation involving FOXP2 indicate that they have problems with both language and sequence learning (Tomblin et al., 2004). Finally, we note that sequencing deficits also appear to be associated with specific language impairment (SLI) more generally (e.g., Tomblin et al., 2007; Lum et al., 2012; Hsu et al., 2014; see Lum et al., 2014, for a review).

Hence, both comparative and genetic evidence suggests that humans have evolved complex sequence learning abilities, which, in turn, appear to have been pressed into service to support the emergence of our linguistic skills. This evolutionary scenario would predict that language and sequence learning should have considerable overlap in terms of their neural bases. This prediction is substantiated by a growing body of research in the cognitive neurosciences, highlighting the close relationship between sequence learning and language (see Ullman, 2004; Conway and Pisoni, 2008, for reviews). For example, violations of learned sequences elicit the same characteristic event-related potential (ERP) brainwave response as ungrammatical sentences, and with the same topographical scalp distribution (Christiansen et al., 2012). Similar ERP results have been observed for musical sequences (Patel et al., 1998). Additional evidence for a common domain-general neural substrate for sequence learning and language comes from functional imaging (fMRI) studies showing that sequence violations activate Broca's area (Lieberman et al., 2004; Petersson et al., 2004, 2012; Forkstam et al., 2006), a region in the left inferior frontal gyrus forming a key part of the cortico-basal ganglia network involved in language. Results from a magnetoencephalography (MEG) experiment further suggest that Broca's area plays a crucial role in the processing of musical sequences (Maess et al., 2001).

If language is subserved by the same neural mechanisms as used for sequence processing, then we would expect a breakdown of syntactic processing to be associated with impaired sequencing abilities. Christiansen et al. (2010b) tested this prediction in a population of agrammatic aphasics, who have severe problems with natural language syntax in both comprehension and production due to lesions involving Broca's area (e.g., Goodglass and Kaplan, 1983; Goodglass, 1993; see Novick et al., 2005; Martin, 2006, for reviews). They confirmed that agrammatism was associated with a deficit in sequence learning in the absence of other cognitive impairments. Similar impairments to the processing of musical sequences by the same population were observed in a study by Patel et al. (2008). Moreover, success in sequence learning is predicted by white matter density in Broca's area, as revealed by diffusion tensor magnetic resonance imaging (Flöel et al., 2009). Importantly, applying transcranial direct current stimulation (de Vries et al., 2010) or repetitive transcranial magnetic stimulation (Uddén et al., 2008) to Broca's area during sequence learning or testing improves performance. Together, these cognitive neuroscience studies point to considerable overlap in the neural mechanisms involved in language and sequence learning<sup>2</sup>, as predicted by our evolutionary account (see also Wilkins and Wakefield, 1995; Christiansen et al., 2002; Hoen et al., 2003; Ullman, 2004; Conway and Pisoni, 2008, for similar perspectives).

#### Cultural Evolution of Recursive Structures Based on Sequence Learning

Comparative and genetic evidence is consistent with the hypothesis that humans have evolved more complex sequence learning mechanisms, whose neural substrates subsequently were recruited for language. But how might recursive structure recruit such complex sequence learning abilities? Reali and Christiansen (2009) explored this question using simple recurrent networks (SRNs; Elman, 1990). The SRN is a type of connectionist model that implements a domain-general learner with sensitivity to complex sequential structure in the input. This model is trained to predict the next element in a sequence and learns in a self-supervised manner to correct any violations of its own expectations regarding what should come next. The SRN model has been successfully applied to the modeling of both sequence learning (e.g., Servan-Schreiber et al., 1991; Botvinick and Plaut, 2004) and language processing (e.g., Elman, 1993), including multiple-cue integration in speech segmentation (Christiansen et al., 1998) and syntax acquisition (Christiansen et al., 2010a). To model the difference in sequence learning skills between humans and non-human primates, Reali and Christiansen first "evolved" a group of networks to improve their performance on a sequence-learning task in which they had to predict the next digit in a five-digit sequence generated by randomizing the order of the digits, 1–5 (based on a human task developed by Lee, 1997). At each generation, the best performing network was selected, and its initial weights (prior to any training), i.e., its "genome," were slightly altered to produce a new generation of networks. After 500 generations of this simulated "biological" evolution, the resulting networks performed significantly better than the first generation SRNs.
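The prediction step of an SRN can be sketched as follows. This is a minimal illustration in plain Python and not the implementation used by Reali and Christiansen (2009): the layer sizes, the weight range, the tanh and softmax nonlinearities, and all function names are assumptions chosen for brevity, and neither training nor the evolutionary selection of initial weights is shown.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Small random initial weights (the simulated "genome" in the text)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    """Encode a symbol index as a vector with a single 1.0 entry."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def srn_step(x, context, w_in, w_ctx, w_out):
    """One time step: combine the current input with the previous hidden state."""
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * c for w, c in zip(w_ctx[j], context))
        )
        for j in range(len(w_in))
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps], hidden


def predict_sequence(symbols, n_symbols, n_hidden, rng):
    """Return one next-symbol distribution per position in the sequence."""
    w_in = make_weights(n_hidden, n_symbols, rng)
    w_ctx = make_weights(n_hidden, n_hidden, rng)
    w_out = make_weights(n_symbols, n_hidden, rng)
    context = [0.0] * n_hidden  # the context layer starts empty
    predictions = []
    for s in symbols:
        probs, context = srn_step(one_hot(s, n_symbols), context, w_in, w_ctx, w_out)
        predictions.append(probs)
    return predictions


rng = random.Random(0)
digits = [0, 1, 2, 3, 4]
rng.shuffle(digits)  # a randomized five-digit sequence, loosely after the task
predictions = predict_sequence(digits, n_symbols=5, n_hidden=8, rng=rng)
```

The hidden state returned at each step is fed back as the context for the next step, which is what gives the network its sensitivity to sequential structure; an untrained network such as this one produces near-uniform predictions.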

Reali and Christiansen (2009) then introduced language into the simulations. Each miniature language was generated by a context-free grammar derived from the grammar skeleton in **Table 1**. This grammar skeleton incorporated substantial flexibility in word order insofar as the material on the right-hand side of each rule could be ordered as it is (right-branching), in the reverse order (left-branching), or have a flexible order (i.e., the constituent order is as is half of the time, and the reverse the other half of the time). Using this grammar skeleton, it is possible to instantiate 3<sup>6</sup> (= 729) distinct grammars, with differing degrees of consistency in the ordering of sentence constituents. Reali and Christiansen implemented both biological and cultural evolution in their simulations: As with the evolution of better sequence learners, the initial weights of the network that best acquired a language in a given generation were slightly altered to produce the next generation of language learners, with the additional constraint that performance on the sequence learning task had to be maintained at the level reached at the end of the first part of the simulation (to capture the fact that humans are still superior sequence learners today). Cultural evolution of language was simulated by having the networks learn several different languages at each generation and then selecting the best learnt language as the basis for the next generation. The best learnt language was then varied slightly by changing the direction of a rule to produce a set of related "offspring" languages for each generation.
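The count of 729 grammars follows from six rules with three ordering options each. The sketch below enumerates that space and lists the single-rule variants of a given grammar. The rule labels are placeholders rather than the actual rules of Table 1, and treating every possible one-rule change as an "offspring" is an assumption made for illustration; the text does not say how many offspring languages were produced per generation.

```python
import itertools

ORDERINGS = ("right", "left", "flexible")
# Placeholder labels for the six skeleton rules (hypothetical names).
RULES = ("rule1", "rule2", "rule3", "rule4", "rule5", "rule6")


def all_grammars():
    """Every assignment of an ordering to each of the six rules."""
    return [
        dict(zip(RULES, choice))
        for choice in itertools.product(ORDERINGS, repeat=len(RULES))
    ]


def offspring(grammar):
    """All grammars that differ from `grammar` in the ordering of exactly one rule."""
    variants = []
    for rule in RULES:
        for ordering in ORDERINGS:
            if ordering != grammar[rule]:
                child = dict(grammar)
                child[rule] = ordering
                variants.append(child)
    return variants


grammars = all_grammars()
fully_flexible = {rule: "flexible" for rule in RULES}
children = offspring(fully_flexible)
print(len(grammars), len(children))  # 729 12
```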

Although the simulations started with language being completely flexible, and thus without any reliable word order constraints, after fewer than 100 generations of cultural evolution, the resulting language had adopted consistent word order constraints in all but one of the six rules. When comparing the networks from the first generation at which language was introduced and the final generation, Reali and Christiansen (2009) found no difference in linguistic performance. In contrast, when comparing network performance on the initial (all-flexible) language vs. the final language, a very large difference in learnability was observed. Together, these two analyses suggest that it was the cultural evolution of language, rather than biological evolution of better learners, that allowed language to become more easily learned and more structurally consistent across these simulations. More generally, the simulation results provide an existence proof that recursive structure can emerge in natural language by way of cultural evolution in the absence of language-specific constraints.

*S, sentence; NP, noun phrase; VP, verb phrase; PP, adpositional phrase; PossP, possessive phrase; N, noun; V, verb; adp, adposition; poss, possessive marker. Curly brackets indicate that the order of constituents can be as is, the reverse, or either way with equal probability (i.e., flexible word order). Parentheses indicate an optional constituent.*

<sup>2</sup> Some studies purportedly indicate that the mechanisms involved in syntactic language processing are not the same as those involved in most sequence learning tasks (e.g., Peña et al., 2002; Musso et al., 2003; Friederici et al., 2006). However, the methods and arguments used in these studies have subsequently been challenged (de Vries et al., 2008; Marcus et al., 2003, and Onnis et al., 2005, respectively), thereby undermining their negative conclusions. Overall, the preponderance of the evidence suggests that sequence-learning tasks tap into the mechanisms involved in language acquisition and processing (see Petersson et al., 2012, for discussion).

#### Sequence Learning and Recursive Consistency

An important remaining question is whether human learners are sensitive to the kind of sequence learning constraints revealed by Reali and Christiansen's (2009) simulated process of cultural evolution. A key result of these simulations was that the sequence learning constraints embedded in the SRNs tend to favor what we will refer to as recursive consistency (Christiansen and Devlin, 1997). Consider rewrite rules (2) and (3) from **Table 1**:

NP → {N (PP)}

PP → {adp NP}

Together, these two skeleton rules form a recursive rule set because each calls the other. Ignoring the flexible version of these two rules, we get the four possible recursive rule sets shown in **Table 2**. Using these rule sets we can generate the complex noun phrases seen in (3)–(6):


The first two rule sets from **Table 2** generate recursively consistent structures that are either right-branching (as in 3) or left-branching (as in 4). The prepositions and postpositions, respectively, are always in close proximity to their noun complements, making it easier for a sequence learner to discover their relationship. In contrast, the final two rule sets generate recursively inconsistent structures, involving center-embeddings: all nouns are either stacked up before all the postpositions (5) or after all the prepositions (6). In both cases, the learner has to work out that from and cities together form a prepositional phrase, despite being separated from each other by another prepositional phrase involving with and smog. This process is further complicated by an increase in memory load caused by the intervening prepositional phrase. From a sequence learning perspective, it should therefore be easier to acquire the recursively consistent structure found in (3) and (4) compared with the recursively inconsistent structure in (5) and (6). Indeed, all the simulation runs in Reali and Christiansen (2009) resulted in languages in which both recursive rule sets were consistent.
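The contrast between consistent and inconsistent rule sets can be made concrete with a small generator. The sketch below expands the NP and PP rules in each of their two fixed orders and measures how far an adposition ends up from its noun complement. The words from, cities, with, and smog come from the discussion above; the head noun buildings, the function names, and the gap measure are assumptions introduced for illustration, and the output is not claimed to match examples (3)–(6) word for word.

```python
def noun_phrase(nouns, adpositions, head_first, adposition_first):
    """Expand NP -> N (PP) and PP -> adp NP, with each rule in a fixed order.

    head_first:        True gives NP -> N (PP); False gives NP -> (PP) N
    adposition_first:  True gives PP -> adp NP; False gives PP -> NP adp
    """
    head, rest = nouns[0], nouns[1:]
    if not rest:
        return [head]  # innermost NP has no PP
    inner = noun_phrase(rest, adpositions[1:], head_first, adposition_first)
    pp = [adpositions[0]] + inner if adposition_first else inner + [adpositions[0]]
    return [head] + pp if head_first else pp + [head]


def gap(words, adposition, noun):
    """Distance in word positions between an adposition and its noun complement."""
    return abs(words.index(adposition) - words.index(noun))


NOUNS = ["buildings", "cities", "smog"]  # "buildings" is an assumed head noun
ADPOSITIONS = ["from", "with"]

phrases = {
    "consistent, right-branching": noun_phrase(NOUNS, ADPOSITIONS, True, True),
    "consistent, left-branching": noun_phrase(NOUNS, ADPOSITIONS, False, False),
    "inconsistent, adpositions first": noun_phrase(NOUNS, ADPOSITIONS, False, True),
    "inconsistent, adpositions last": noun_phrase(NOUNS, ADPOSITIONS, True, False),
}

for label, words in phrases.items():
    print(f"{label}: {' '.join(words)} (gap from/cities = {gap(words, 'from', 'cities')})")
```

In the two consistent orders the adposition sits next to its noun (a gap of 1), whereas in the two inconsistent orders the intervening phrase pushes them apart (a gap of 3), which is the added burden on a sequence learner described above.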

Christiansen and Devlin (1997) had previously shown that SRNs perform better on recursively consistent structure (such as those in 3 and 4). However, if human language has adapted by way of cultural evolution to avoid recursive inconsistencies (such as 5 and 6), then we should expect people to be better at learning recursively consistent artificial languages than recursively inconsistent ones. Reeder (2004), following initial work by Christiansen (2000), tested this prediction by exposing participants to one of two artificial languages, generated by the artificial grammars shown in **Table 3**. Notice that the consistent grammar instantiates a left-branching grammar from the grammar skeleton used by Reali and Christiansen (2009), involving two recursively consistent rule sets (rules 2–3 and 5–6). The inconsistent grammar differs only in the direction of two rules (3 and 5), which are right-branching, whereas the other three rules are left-branching. The languages were instantiated using 10 spoken non-words to generate the sentences to which the participants were exposed. Participants in the two language conditions would see sequences of the exact same lexical items, only differing in their order of occurrence as dictated by the respective grammar (e.g., consistent: jux vot hep vot meep nib vs. inconsistent: jux meep hep vot vot nib). After training, the participants were presented with a new set of sequences, one by one, for which they were asked to judge whether or not these new items were generated by the same rules as the ones they saw previously. Half of the new items incorporated subtle violations of the sequence ordering (e.g., grammatical: cav hep vot lum meep nib vs. ungrammatical: cav hep vot rud meep nib, where rud is ungrammatical in this position).

The results of this artificial language learning experiment showed that the consistent language was learned significantly better (61.0% correct classification) than the inconsistent one (52.7%). It is important to note that because the consistent grammar was left-branching (and thus more like languages such as Japanese and Hindi), knowledge of English cannot explain the results. Indeed, if anything, the two right-branching rules in the inconsistent grammar bring that language closer to English<sup>3</sup>. To further demonstrate that the preference for consistently recursive sequences is a domain-general bias, Reeder (2004) conducted a second experiment, in which the sequences were instantiated using black abstract shapes that cannot easily be verbalized. The results of the second study closely replicated those of the first, suggesting that there may be general sequence learning biases that favor recursively consistent structures, as predicted by Reali and Christiansen's (2009) evolutionary simulations.

<sup>3</sup>We further note that the SRN simulations by Christiansen and Devlin (1997) showed a similar pattern, suggesting that a general linguistic capacity is not required to explain these results. Rather, the results would appear to arise from the distributional patterns inherent to the two different artificial grammars.

#### TABLE 2 | Recursive rule sets.

*NP, noun phrase; PP, adpositional phrase; prep, preposition; post, postposition; N, noun. Parentheses indicate an optional constituent.*

#### TABLE 3 | The grammars used by Christiansen (2000) and Reeder (2004).

*S, sentence; NP, noun phrase; VP, verb phrase; PP, adpositional phrase; PossP, possessive phrase; N, noun; V, verb; post, postposition; prep, preposition; poss, possessive marker. Parentheses indicate an optional constituent.*

The question remains, though, whether such sequence learning biases can drive cultural evolution of language in humans. That is, can sequence-learning constraints promote the emergence of language-like structure when amplified by processes of cultural evolution? To answer this question, Cornish et al. (under review) conducted an iterated sequence learning experiment, modeled on previous human iterated learning studies involving miniature language input (Kirby et al., 2008). Participants were asked to participate in a memory experiment, in which they were presented with 15 consonant strings. Each string was presented briefly on a computer screen after which the participants typed it in. After multiple repetitions of the 15 strings, the participants were asked to recall all of them. They were requested to continue recalling items until they had provided 15 unique strings. The recalled 15 strings were then recoded in terms of their specific letters to avoid trivial biases such as the location of letters on the computer keyboard and the presence of potential acronyms (e.g., X might be replaced throughout by T, T by M, etc.). The resulting set of 15 strings (which kept the same underlying structure as before recoding) was then provided as training strings for the next participant. A total of 10 participants were run within each "evolutionary" chain.
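The recoding step between participants can be sketched as a one-to-one letter substitution, which changes the surface letters while leaving the pattern of repeats within and across strings intact. This is an illustrative sketch only: the consonant inventory, the example strings, and the function names are assumptions, and the source does not describe how the substitution was actually chosen.

```python
import random

CONSONANTS = list("BCDFGHJKLMNPQRSTVWXZ")  # assumed inventory


def recode(strings, rng):
    """Replace each letter consistently with another, using a one-to-one mapping."""
    used = sorted(set("".join(strings)))
    replacements = rng.sample(CONSONANTS, len(used))
    mapping = dict(zip(used, replacements))
    return ["".join(mapping[ch] for ch in s) for s in strings]


def shape(s):
    """Pattern of repeats within a string, e.g. 'XTX' and 'MBM' both give (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in s)


rng = random.Random(1)
recalled = ["XTX", "XTM", "MTTX", "BXT"]  # hypothetical recalled strings
training_for_next = recode(recalled, rng)
```

Because the mapping is one-to-one, two strings that shared a subsequence before recoding still share one afterwards, so the next participant inherits the structure but not the particular letters.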

The initial set of strings used for the first participant in each chain was created so as to have minimal distributional structure (all consonant pairs, or bigrams, had a frequency of 1 or 2). Because recalling 15 arbitrary strings is close to impossible given normal memory constraints, it was expected that many of the recalled items would be strongly affected by sequence learning biases. The results showed that as these sequence biases became amplified across generations of learners, the sequences gained more and more distributional structure (as measured by the relative frequency of repeated two- and three-letter units). Importantly, the emerging system of sequences became more learnable. Initially, participants could only recall about 4 of the 15 strings correctly but by the final generation this had doubled, allowing participants to recall more than half the strings. Importantly, this increase in learnability did not evolve at the cost of string length: there was no decrease across generations. Instead, the sequences became easy to learn and recall because they formed a system, allowing subsequences to be reused productively. Using network analyses (see Baronchelli et al., 2013b, for a review), Cornish et al. demonstrated that the way in which this productivity was implemented strongly mirrored that observed for child-directed speech.
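One way to quantify distributional structure of this kind is to count how many two- or three-letter units recur across the string set. The measure below is an assumed operationalization for illustration; the text does not specify the exact statistic used by Cornish et al., and the example strings are invented.

```python
from collections import Counter


def ngram_counts(strings, n):
    """Count every contiguous n-letter unit across all strings."""
    counts = Counter()
    for s in strings:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts


def repeated_share(strings, n):
    """Fraction of n-gram tokens whose type occurs more than once in the set."""
    counts = ngram_counts(strings, n)
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0


unstructured = ["XTM", "BKZ", "DFG"]  # no bigram occurs twice
structured = ["XTM", "XTB", "MXT"]    # the bigram "XT" is reused in every string

print(repeated_share(unstructured, 2))  # 0.0
print(repeated_share(structured, 2))    # 0.5
```

A set in which subsequences are reused scores higher on this measure, which corresponds to the sense in which the later-generation strings formed a system.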

The results from Cornish et al. (under review) suggest that sequence learning constraints, such as those explored in the simulations by Reali and Christiansen (2009) and demonstrated by Reeder (2004), can give rise to language-like distributional regularities that facilitate learning. This supports our hypothesis that sequential learning constraints, amplified by cultural transmission, could have shaped language into what we see today, including its limited use of embedded recursive structure. Next, we shall extend this approach to show how the same sequence learning constraints that we hypothesized to have shaped important aspects of the cultural evolution of recursive structures also can help explain specific patterns in the processing of complex recursive constructions.

## A Usage-based Account of Complex Recursive Structure

So far, we have discussed converging evidence supporting the theory that language in important ways relies on evolutionarily prior neural mechanisms for sequence learning. But can a domain-general sequence learning device capture the ability of humans to process the kind of complex recursive structures that have been argued to require powerful grammar formalisms (e.g., Chomsky, 1956; Shieber, 1985; Stabler, 2009; Jäger and Rogers, 2012)? From our usage-based perspective, the answer does not necessarily require the postulation of recursive mechanisms as long as the proposed mechanisms can deal with the level of complex recursive structure that humans can actually process. In other words, what needs to be accounted for is the empirical evidence regarding human processing of complex recursive structures, and not theoretical presuppositions about recursion as a stipulated property of our language system.

Christiansen and MacDonald (2009) conducted a set of computational simulations to determine whether a sequence-learning device such as the SRN would be able to capture human processing performance on complex recursive structures. Building on prior work by Christiansen and Chater (1999), they focused on the processing of sentences with center-embedded and cross-dependency structures. These two types of recursive constructions produce multiple overlapping non-adjacent dependencies, as illustrated in **Figure 1**, resulting in rapidly increasing processing difficulty as the number of embeddings grows. We have already discussed earlier how performance on center-embedded constructions breaks down at two levels of embedding (e.g., Wang, 1970; Hamilton and Deese, 1971; Blaubergs and Braine, 1974; Hakes et al., 1976). The processing of cross-dependencies, which exist in Swiss-German and Dutch, has received less attention, but the available data also point to a decline in performance with increased levels




*S, sentence; NP, noun phrase; PP, prepositional phrase; PossP, possessive phrase; rel, relative clauses (subscripts, sub and obj, indicate subject/object relative clause); VP, verb phrase; N, noun; V, verb; prep, preposition; poss, possessive marker. For brevity, NP rules have been compressed into a single rule, using "|" to indicate exclusive options. The subscripts i, t, o, and c denote intransitive, transitive, optionally transitive, and clausal verbs, respectively. Subscript numbers indicate noun-verb dependency relations. Parentheses indicate an optional constituent.*

of embedding (Bach et al., 1986; Dickey and Vonk, 1997). Christiansen and MacDonald trained networks on sentences derived from one of the two grammars shown in **Table 4**. Both grammars contained a common set of recursive structures: right-branching recursive structure in the form of prepositional modifications of noun phrases, noun phrase conjunctions, subject relative clauses, and sentential complements; left-branching recursive structure in the form of prenominal possessives. The grammars furthermore had three additional verb argument structures (transitive, optionally transitive, and intransitive) and incorporated agreement between subject nouns and verbs. As illustrated by **Table 4**, the only difference between the two grammars was in the type of complex recursive structure they contained: center-embedding vs. cross-dependency.

The grammars could generate a variety of sentences, with varying degree of syntactic complexity, from simple transitive sentences (such as 7) to more complex sentences involving different kinds of recursive structure (such as 8 and 9).


The generation of sentences was further restricted by probabilistic constraints on the complexity and depth of recursion. Following training on either grammar, the networks performed well on a variety of recursive sentence structures, demonstrating that the SRNs were able to acquire complex grammatical regularities (see also Christiansen, 1994)<sup>4</sup>. The

<sup>4</sup>All simulations were replicated multiple times (including with variations in network architecture and corpus composition), yielding qualitatively similar results.

networks acquired sophisticated abilities for generalizing across constituents in line with usage-based approaches to constituent structure (e.g., Beckner and Bybee, 2009; see also Christiansen and Chater, 1994). Differences between networks were observed, though, on their processing of the complex recursive structure permitted by the two grammars.

To model human data on the processing of center-embedding and cross-dependency structures, Christiansen and MacDonald (2009) relied on a study conducted by Bach et al. (1986) in which sentences with two center-embeddings in German were found to be significantly harder to process than comparable sentences with two cross-dependencies in Dutch. Bach et al. asked native Dutch speakers to rate the comprehensibility of Dutch sentences involving varying depths of recursive structure in the form of cross-dependency constructions and corresponding right-branching paraphrase sentences with similar meaning. Native speakers of German were tested using similar materials in German, where center-embedded constructions replaced the cross-dependency constructions. To remove potential effects of processing difficulty due to length, the ratings from the right-branching paraphrase sentences were subtracted from the complex recursive sentences. **Figure 2** shows the results of the Bach et al. study on the left-hand side.

SRN performance was scored in terms of Grammatical Prediction Error (GPE; Christiansen and Chater, 1999), which measures the network's ability to make grammatically correct predictions for each upcoming word in a sentence, given prior context. The right-hand side of **Figure 2** shows the mean sentence GPE scores, averaged across 10 novel sentences. Both humans and SRNs show similar qualitative patterns of processing difficulty (see also Christiansen and Chater, 1999). At a single level of embedding, there is no difference in processing difficulty. However, at two levels of embedding, cross-dependency structures (in Dutch) are processed more easily than comparable center-embedded structures (in German).

#### Bounded Recursive Structure

Christiansen and MacDonald (2009) demonstrated that a sequence learner such as the SRN is able to mirror the differential human performance on center-embedded and cross-dependency recursive structures. Notably, the networks were able to capture human performance without the complex external memory devices (such as a stack of stacks; Joshi, 1990) or external memory constraints (Gibson, 1998) required by previous accounts. The SRN's ability to mimic human performance likely derives from a combination of intrinsic architectural constraints (Christiansen and Chater, 1999) and the distributional properties of the input to which it has been exposed (MacDonald and Christiansen, 2002; see also Christiansen and Chater, Forthcoming 2016). Christiansen and Chater (1999) analyzed the hidden unit representations of the SRN (its internal state) before and after training on recursive constructions and found that these networks have an architectural bias toward local dependencies, corresponding to those found in right-branching recursion. To process multiple instances of such recursive constructions, however, the SRN needs exposure to the relevant types of recursive structures. This exposure is particularly important when the network has to process center-embedded constructions because the network must overcome its architectural bias toward local dependencies. Thus, recursion is not a built-in property of the SRN; instead, the networks develop their human-like abilities for processing recursive constructions through repeated exposure to the relevant structures in the input.
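For readers unfamiliar with the architecture, the following sketch shows the forward pass of a generic Elman-style SRN: the hidden state at each step depends on the current input and on a copy of the previous hidden state, and the network has no stack or other external memory. It is not the authors' implementation. The layer sizes, the random untrained weights, the tanh hidden units, and the softmax output are all assumptions chosen for illustration, and training is omitted.

```python
import math
import random


class SimpleRecurrentNetwork:
    """Minimal Elman-style SRN forward pass (untrained, random weights)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)       # input -> hidden
        self.w_ctx = matrix(n_hidden, n_hidden)  # previous hidden -> hidden
        self.w_out = matrix(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden          # copy of previous hidden state

    def reset(self):
        """Clear the context units, e.g., at a sentence boundary."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Process one input vector; return a distribution over next items."""
        hidden = []
        for row_in, row_ctx in zip(self.w_in, self.w_ctx):
            net = sum(w * v for w, v in zip(row_in, x))
            net += sum(w * v for w, v in zip(row_ctx, self.context))
            hidden.append(math.tanh(net))
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        self.context = hidden  # the only memory carried to the next step
        return [e / total for e in exps]


if __name__ == "__main__":
    net = SimpleRecurrentNetwork(n_in=3, n_hidden=4, n_out=3)
    word = [1.0, 0.0, 0.0]  # one-hot code for a hypothetical word
    first = net.step(word)
    second = net.step(word)  # same input, different context
    print(first)
    print(second)
```

Because the context units are the network's only record of earlier input, any information about a pending dependency has to survive repeated overwriting of the hidden state, which is one way to picture the bias toward local dependencies discussed above.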

As noted earlier, this usage-based approach to recursion differs from many previous processing accounts, in which unbounded recursion is implemented as part of the representation of linguistic knowledge (typically in the form of a rule-based grammar). Of course, this means that systems of the latter kind can process complex recursive constructions, such as center-embeddings, beyond human capabilities. Since Miller and Chomsky (1963), the solution to this mismatch has been to impose extrinsic memory limitations exclusively aimed at capturing human performance limitations on doubly

center-embedded constructions (e.g., Kimball, 1973; Marcus, 1980; Church, 1982; Just and Carpenter, 1992; Stabler, 1994; Gibson and Thomas, 1996; Gibson, 1998; see Lewis et al., 2006, for a review).

To further investigate the nature of the SRN's intrinsic constraints on the processing of multiple center-embedded constructions, Christiansen and MacDonald (2009) explored a previous result from Christiansen and Chater (1999) showing that SRNs found ungrammatical versions of doubly center-embedded sentences with a missing verb more acceptable than their grammatical counterparts<sup>5</sup> (for similar SRN results, see Engelmann and Vasishth, 2009). A previous offline rating study by Gibson and Thomas (1999) found that when the middle verb phrase (was cleaning every week) was removed from (10), the resulting ungrammatical sentence in (11) was rated no worse than the grammatical version in (10).


However, when Christiansen and MacDonald tested the SRN on similar doubly center-embedded constructions, they obtained predictions for (11) to be rated better than (10). To test these predictions, they elicited on-line human ratings for the stimuli from the Gibson and Thomas study using a variation of the "stop making sense" sentence-judgment paradigm (Boland et al., 1990, 1995; Boland, 1997). Participants read a sentence, word-by-word, while at each step they decided whether the sentence was grammatical or not. Following the presentation of each sentence, participants rated it on a 7-point scale according to how good it seemed to them as a grammatical sentence of English (with 1 indicating that the sentence was "perfectly good English" and 7 indicating that it was "really bad English"). As predicted by the SRN, participants rated ungrammatical sentences such as (11) as better than their grammatical counterpart exemplified in (10).

The original stimuli from the Gibson and Thomas (1999) study had certain shortcomings that could have affected the outcome of the online rating experiment. Firstly, there were substantial length differences between the ungrammatical and grammatical versions of a given sentence. Secondly, the sentences incorporated semantic biases making it easier to line up a subject noun with its respective verb (e.g., apartment–decorated, service–sent over in 10). To control for these potential confounds, Christiansen and MacDonald (2009) replicated the experiment using semantically-neutral stimuli controlled for length (adapted from Stolz, 1967), as illustrated by (12) and (13).


The second online rating experiment yielded the same results as the first, thus replicating the "missing verb" effect. These results have subsequently been confirmed by online ratings in French (Gimenes et al., 2009) and a combination of self-paced reading and eye-tracking experiments in English (Vasishth et al., 2010). However, evidence from German (Vasishth et al., 2010) and Dutch (Frank et al., in press) indicates that speakers of these languages do not show the missing verb effect but instead find the grammatical versions easier to process. Because verb-final constructions are common in German and Dutch, requiring the listener to track dependency relations over a relatively long distance, substantial prior experience with these constructions likely has resulted in language-specific processing improvements (see also Engelmann and Vasishth, 2009; Frank et al., in press, for similar perspectives). Nonetheless, in some cases the missing verb effect may appear even in German, under conditions of high processing load (Trotzke et al., 2013). Together, the results from the SRN simulations and human experimentation support our hypothesis that the processing of center-embedded structures is best explained from a usage-based perspective that emphasizes processing experience with the specific statistical properties of individual languages. Importantly, as we shall see next, such linguistic experience interacts with sequence learning constraints.

#### Sequence Learning Limitations Mirror Constraints on Complex Recursive Structure

Previous studies have suggested that the processing of singly embedded relative clauses is determined by linguistic experience, mediated by sequence learning skills (e.g., Wells et al., 2009; Misyak et al., 2010; see Christiansen and Chater, Forthcoming 2016, for discussion). Can our limited ability to process multiple complex recursive embeddings similarly be shown to reflect constraints on sequence learning? The embedding of multiple complex recursive structures, whether in the form of center-embeddings or cross-dependencies, results in several pairs of overlapping non-adjacent dependencies (as illustrated by **Figure 1**). Importantly, the SRN simulation results reported above suggest that a sequence learner might also be able to deal with the increased difficulty associated with multiple, overlapping non-adjacent dependencies.

Dealing appropriately with multiple non-adjacent dependencies may be one of the key defining characteristics of human language. Indeed, when a group of generativists and cognitive linguists recently met to determine what is special about human language (Tallerman et al., 2009), one of the few things they could agree about was that long-distance dependencies constitute one of the hallmarks of human language, and not recursion (contra Hauser et al., 2002). de Vries et al. (2012) used a variation of the AGL-SRT task (Misyak et al., 2010) to determine whether the limitations on processing of multiple non-adjacent dependencies might depend on general constraints on human sequence learning, instead of being unique to language. This task incorporates the structured, probabilistic input of artificial grammar learning (AGL; e.g., Reber, 1967) within a modified two-choice serial reaction-time (SRT; Nissen and Bullemer, 1987) layout. In the de Vries et al.

<sup>5</sup>Importantly, Christiansen and Chater (1999) demonstrated that this prediction is primarily due to intrinsic architectural limitations on the processing of doubly center-embedded material rather than insufficient experience with these constructions. Moreover, they further showed that the intrinsic constraints on center-embedding are independent of the size of the hidden unit layer.

study, participants used the computer mouse to select one of two written words (a target and a foil) presented on the screen as quickly as possible, given auditory input. Stimuli consisted of sequences with two or three non-adjacent dependencies, ordered either using center-embeddings or cross-dependencies. The dependencies were instantiated using a set of dependency pairs that were matched for vowel sounds: ba-la, yo-no, mi-di, and wu-tu. Examples of each of the four types of stimuli are presented in (14–17), where the subscript numbering indicates dependency relationships.


Thus, (14) and (16) implement center-embedded recursive structure and (15) and (17) involve cross-dependencies. To determine the potential effect of linguistic experience on the processing of complex recursive sequence structure, study participants were either native speakers of German (which has center-embedding but not cross-dependencies) or Dutch (which has cross-dependencies). Each participant was exposed to only one of the four types of stimuli, e.g., doubly center-embedded sequences as in (16), in a fully crossed design (length × embedding × native language).
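The two orderings can be made explicit with a short sketch. It uses the four dependency pairs listed above, but the function names `build_sequence` and `dependency_links` are hypothetical, and the sketch does not reproduce the actual stimulus lists or the randomization used by de Vries et al. (2012).

```python
# Dependency pairs named in the text: first element -> dependent element.
PAIRS = {"ba": "la", "yo": "no", "mi": "di", "wu": "tu"}


def build_sequence(openers, ordering):
    """Return the openers followed by their dependents.

    ordering="center": dependents appear in reverse (nested) order.
    ordering="cross":  dependents appear in the same order as the openers.
    """
    closers = [PAIRS[o] for o in openers]
    if ordering == "center":
        closers.reverse()
    elif ordering != "cross":
        raise ValueError("ordering must be 'center' or 'cross'")
    return list(openers) + closers


def dependency_links(n, ordering):
    """Positions (opener index, dependent index) for n dependencies."""
    if ordering == "center":
        return [(i, 2 * n - 1 - i) for i in range(n)]
    if ordering == "cross":
        return [(i, n + i) for i in range(n)]
    raise ValueError("ordering must be 'center' or 'cross'")


if __name__ == "__main__":
    openers = ["ba", "wu", "yo"]
    print(build_sequence(openers, "center"))  # ba wu yo no tu la
    print(build_sequence(openers, "cross"))   # ba wu yo la tu no
    print(dependency_links(3, "center"))      # [(0, 5), (1, 4), (2, 3)]
    print(dependency_links(3, "cross"))       # [(0, 3), (1, 4), (2, 5)]
```

In the center-embedded ordering the first dependency spans the whole sequence while the last is adjacent, whereas in the cross-dependency ordering every dependency spans the same distance.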

de Vries et al. (2012) first evaluated learning by administering a block of ungrammatical sequences in which the learned dependencies were violated. As expected, the ungrammatical block produced a similar pattern of response slow-down for both center-embedded and cross-dependency items involving two non-adjacent dependencies (similar to what Bach et al., 1986, found in the natural language case). However, an analog of the missing verb effect was observed for the center-embedded sequences with three non-adjacencies but not for the comparable cross-dependency items. Indeed, an incorrect middle element in the center-embedded sequences (e.g., where tu is replaced by la in 16) did not elicit any slow-down at all, indicating that participants were not sensitive to violations at this position.

Sequence learning was further assessed using a prediction task at the end of the experiment (after a recovery block of grammatical sequences). In this task, participants would hear a beep replacing one of the elements in the second half of the sequence and were asked to simply click on the written word that they thought had been replaced. Participants exposed to the sequences incorporating two dependencies performed reasonably well on this task, with no difference between center-embedded and cross-dependency stimuli. However, as for the response times, a missing verb effect was observed for the center-embedded sequences with three non-adjacencies. When the middle dependent element was replaced by a beep in center-embedded sequences (e.g., ba<sub>1</sub> wu<sub>2</sub> yo<sub>3</sub> no<sub>3</sub> &lt;beep&gt; la<sub>1</sub>), participants were more likely to click on the foil (e.g., la) than the target (tu). This was not observed for the corresponding cross-dependency stimuli, once more mirroring the Bach et al. (1986) psycholinguistic results that multiple cross-dependencies are easier to process than multiple center-embeddings.

Contrary to psycholinguistic studies of German (Vasishth et al., 2010) and Dutch (Frank et al., in press), de Vries et al. (2012) found an analog of the missing verb effect in speakers of both languages. Because the sequence-learning task involved non-sense syllables, rather than real words, it may not have tapped into the statistical regularities that play a key role in real-life language processing<sup>6</sup>. Instead, the results reveal fundamental limitations on the learning and processing of complex recursively structured sequences. However, these limitations may be mitigated to some degree, given sufficient exposure to the "right" patterns of linguistic structure, including statistical regularities involving morphological and semantic cues, and thus lessening sequence processing constraints that would otherwise result in the missing verb effect for doubly center-embedded constructions. Whereas the statistics of German and Dutch appear to support such amelioration of language processing, the statistical make-up of linguistic patterning in English and French apparently does not. This is consistent with the findings of Frank et al. (in press), demonstrating that native Dutch and German speakers show a missing verb effect when processing English (as a second language), even though they do not show this effect in their native language (except under extreme processing load, Trotzke et al., 2013). Together, this pattern of results suggests that the constraints on human processing of multiple long-distance dependencies in recursive constructions stem from limitations on sequence learning interacting with linguistic experience.

#### Summary

In this extended case study, we argued that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on top of domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. We have shown how this perspective can account for the degree to which humans are able to process complex recursive structure in the form of center-embeddings and cross-dependencies. Processing limitations on recursive structure derive from constraints on sequence learning, modulated by our individual native language experience.

We have taken the first steps toward an evolutionarily informed usage-based account of recursion, where our recursive

<sup>6</sup>de Vries et al. (2012) did observe a nontrivial effect of language exposure: German speakers were faster at responding to center-embedded sequences with two non-adjacencies than to the corresponding cross-dependency stimuli. No such difference was found for the Germans learning the sequences with three non-adjacent dependencies, nor did the Dutch participants show any response-time differences across any of the sequence types. Given that center-embedded constructions with two dependencies are much more frequent than with three dependencies (see Karlsson, 2007, for a review), this pattern of differences may reflect the German participants' prior linguistic experience with center-embedded, verb-final constructions.

abilities are acquired piecemeal, construction by construction, in line with developmental evidence. This perspective highlights the key role of language experience in explaining cross-linguistic similarities and dissimilarities in the ability to process different types of recursive structure. And although we have focused on the important role of sequence learning in explaining the limitations of human recursive abilities, we want to stress that language processing, of course, includes other domain-general factors. Whereas distributional information clearly provides important input to language acquisition and processing, it is not sufficient, but must be complemented by numerous other sources of information, from phonological and prosodic cues to semantic and discourse information (e.g., Christiansen and Chater, 2008, Forthcoming 2016). Thus, our account is far from complete but it does offer the promise of a usage-based perspective of recursion based on evolutionary considerations.

# Language without a Language Faculty

In this paper, we have argued that there are theoretical reasons to suppose that special-purpose biological machinery for language can be ruled out on evolutionary grounds. A possible countermove adopted by the minimalist approach to language is to suggest that the faculty of language is very minimal and only consists of recursion (e.g., Hauser et al., 2002; Chomsky, 2010). However, we have shown that capturing human performance on recursive constructions does not require an innate mechanism for recursion. Instead, we have suggested that the variation in processing of recursive structures as can be observed across individuals, development and languages is best explained by domain-general abilities for sequence learning and processing interacting with linguistic experience. But, if this is right, it becomes crucial to provide explanations for the puzzling aspects of language that were previously used to support the case for a rich innate language faculty: (1) the poverty of the stimulus, (2) the eccentricity of language, (3) language universals, (4) the source of linguistic regularities, and (5) the uniqueness of human language. In the remainder of the paper, we therefore address each of these five challenges, in turn, suggesting how they may be accounted for without recourse to anything more than domain-general constraints.

### The Poverty of the Stimulus and the Possibility of Language Acquisition

One traditional motivation for postulating an innate language faculty is the assertion that there is insufficient information in the child's linguistic environment for reliable language acquisition to be possible (Chomsky, 1980). If the language faculty has been pared back to consist only of a putative mechanism for recursion, then this motivation no longer applies—the complex patterns in language which have been thought to pose challenges of learnability concern highly specific properties of language (e.g., concerning binding constraints), which are not resolved merely by supplying the learner with a mechanism for recursion.

But recent work provides a positive account of how the child can acquire language, in the absence of an innate language faculty, whether minimal or not. One line of research has shown, using computational results from language corpora and mathematical analysis, that learning methods are much more powerful than had previously been assumed (e.g., Manning and Schütze, 1999; Klein and Manning, 2004; Chater and Vitányi, 2007; Hsu et al., 2011, 2013; Chater et al., 2015). But more importantly, viewing language as a culturally evolving system, shaped by the selectional pressures from language learners, explains why language and language learners fit together so closely. That is, the remarkable phenomenon of language acquisition from a noisy and partial linguistic input arises from a close fit between the structure of language and the structure of the language learner. However, the origin of this fit is not that the learner has somehow acquired a special-purpose language faculty embodying universal properties of human languages, but, instead, that language has been subject to powerful pressures of cultural evolution to match, as well as possible, the learning and processing mechanisms of its speakers (e.g., as suggested by Reali and Christiansen's, 2009, simulations). In short, the brain is not shaped for language; language is shaped by the brain (Christiansen and Chater, 2008).

Language acquisition can overcome the challenges of the poverty of the stimulus without recourse to an innate language faculty, in light both of new results on learnability, and the insight that language has been shaped through processes of cultural evolution to be as learnable as possible.

### The Eccentricity of Language

Fodor (1983) argues that the generalizations found in language are so different from those evident in other cognitive domains that they can only be subserved by highly specialized cognitive mechanisms. But the cultural evolutionary perspective that we have outlined here suggests, instead, that the generalizations observed in language are not so eccentric after all: they arise, instead, from a wide variety of cognitive, cultural, and communicative constraints (e.g., as exemplified by our extended case study of recursion). The interplay of these constraints, and the contingencies of many thousands of years of cultural evolution, is likely to have resulted in the apparently baffling complexity of natural languages.

### Universal Properties of Language

Another popular motivation for proposing an innate language faculty is to explain putatively universal properties across all human languages. Such universals can be explained as consequences of the innate language faculty—and variation between languages has often been viewed as relatively superficial, and perhaps as being determined by the flipping of a rather small number of discrete "switches," which differentiate English, Hopi and Japanese (e.g., Lightfoot, 1991; Baker, 2001; Yang, 2002).

By contrast, we see "universals" as products of the interaction between constraints deriving from the way our thought processes work, from perceptuo-motor factors, from cognitive limitations on learning and processing, and from pragmatic sources. This view implies that most universals are unlikely to be found across all languages; rather, "universals" are more akin to statistical trends tied to patterns of language use. Consequently, specific universals fall on a continuum, ranging from being attested only in some languages to being found across most languages. An example of the former is the class of implicational universals, such as that verb-final languages tend to have postpositions (Dryer, 1992), whereas the presence of nouns and verbs (minimally as typological prototypes; Croft, 2001) in most, though perhaps not all (Evans and Levinson, 2009), languages is an example of the latter.

Individual languages, on our account, are seen as evolving under the pressures from multiple constraints deriving from the brain, as well as cultural-historical factors (including language contact and sociolinguistic influences), resulting over time in the breathtaking linguistic diversity that characterizes the roughly 6,000–8,000 currently existing languages (see also Dediu et al., 2013). Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages appear to lack the noun-verb distinction (e.g., Straits Salish), whereas others have a proliferation of fine-grained syntactic categories (e.g., Tzeltal); and some languages do without morphology (e.g., Mandarin), while others pack a whole sentence into a single word (e.g., Cayuga). Cross-linguistically recurring patterns do emerge due to similarity in constraints and culture/history, but such patterns should be expected to be probabilistic tendencies, not the rigid properties of a universal grammar (Christiansen and Chater, 2008). From this perspective it seems unlikely that the world's languages will fit within a single parameterized framework (e.g., Baker, 2001), and more likely that languages will provide a diverse, and somewhat unruly, set of solutions to a hugely complex problem of multiple constraint satisfaction, as appears consistent with research on language typology (Comrie, 1989; Evans and Levinson, 2009; Evans, 2013). Thus, we construe recurring patterns of language along the lines of Wittgenstein's (1953) notion of "family resemblance": although there may be similarities between pairs of individual languages, there is no single set of features common to all.

### Where do Linguistic Regularities Come From?

Even if the traditional conception of language universals is too strict, the challenge remains: in the absence of a language faculty, how can we explain why language is orderly at all? How is it that the processing of myriads of different constructions has not created a chaotic mass of conflicting conventions, but a highly, if partially, structured system linking form and meaning?

The spontaneous creation of tracks in a forest provides an interesting analogy (Christiansen and Chater, in press). Each time an animal navigates through the forest, it is concerned only with reaching its immediate destination as easily as possible. But the cumulative effect of such navigating episodes, in breaking down vegetation and gradually creating a network of paths, is by no means chaotic. Indeed, over time, we may expect the pattern of tracks to become increasingly ordered: kinks will become straightened; paths between ecologically salient locations (e.g., sources of food, shelter or water) will become more strongly established; and so on. We might similarly suspect that language will become increasingly ordered over long periods of cultural evolution.

We should anticipate that such order will emerge because the cognitive system does not merely learn lists of lexical items and constructions by rote; it generalizes from past cases to new cases. To the extent that the language is a disordered morass of competing and inconsistent regularities, it will be difficult to process and difficult to learn. Thus, the cultural evolution of language, both within individuals and across generations of learners, will impose a strong selection pressure on individual lexical items and constructions to align with each other. Just as stable and orderly forest tracks emerge from the initially arbitrary wanderings of the forest fauna, so an orderly language may emerge from what may, perhaps, have been the rather limited, arbitrary and inconsistent communicative system of early "proto-language." In particular, for example, the need to convey an unlimited number of messages will lead to a drive to recombine linguistic elements in systematic ways, yielding increasingly "compositional" semantics, in which the meaning of a message is associated with the meaning of its parts, and the way in which they are composed together (e.g., Kirby, 1999, 2000).

#### Uniquely Human?

There appears to be a qualitative difference between communicative systems employed by non-human animals, and human natural language: one possible explanation is that humans, alone, possess an innate faculty for language. But human "exceptionalism" is evident in many domains, not just in language; and, we suggest, there is good reason to suppose that what makes humans special concerns aspects of our cognitive and social behavior, which evolved prior to the emergence of language, but made possible the collective construction of natural languages through long processes of cultural evolution.

A wide range of possible cognitive precursors for language have been proposed. For example, human sequence processing abilities for complex patterns, described above, appear significantly to outstrip processing abilities of non-human animals (e.g., Conway and Christiansen, 2001). Human articulatory machinery may be better suited to spoken language than that of other apes (e.g., Lieberman, 1968). And the human abilities to understand the minds of others (e.g., Call and Tomasello, 2008) and to share attention (e.g., Knoblich et al., 2011) and to engage in joint actions (e.g., Bratman, 2014), may all be important precursors for language.

Note, though, that from the present perspective, language is continuous with other aspects of culture, and almost all aspects of human culture, from music and art to religious ritual and belief, moral norms, ideologies, financial institutions, organizations, and political structures, are uniquely human. It seems likely that such complex cultural forms arise through long periods of cultural innovation and diffusion, and that the nature of such propagation will depend on a multitude of historical, sociological, and, most likely, cognitive factors (e.g., Tomasello, 2009; Richerson and Christiansen, 2013). Moreover, we should expect that different aspects of cultural evolution, including the evolution of language, will be highly interdependent. In the light of these considerations, once the presupposition that language is sui generis and rooted in a genetically-specified language faculty is abandoned, there seems little reason to suppose that there will be a clear-cut answer concerning the key cognitive precursors for human language, any more than we should expect to be able to enumerate the precursors of cookery, dancing, or agriculture.

### Language as Culture, Not Biology

Prior to the seismic upheavals created by the inception of generative grammar, language was generally viewed as a paradigmatic, and indeed especially central, element of human culture. But the meta-theory of the generative approach was taken to suggest a very different viewpoint: that language is primarily a biological, rather than a cultural, phenomenon: the knowledge of the language was seen not as embedded in a culture of speakers and hearers, but primarily in a genetically-specified language faculty.

We suggest that, in light of the lack of a plausible evolutionary origin for the language faculty, and a re-evaluation of the evidence for even the most minimal element of such a faculty, the mechanism of recursion, it is time to return to viewing language as a cultural, and not a biological, phenomenon. Nonetheless, we stress that, like other aspects of culture, language will have been shaped by human processing and learning biases. Thus, understanding the structure, acquisition, processing, and cultural evolution of natural language requires unpicking how language has been shaped by the biological and cognitive properties of the human brain.

#### Acknowledgments

This work was partially supported by BSF grant number 2011107 awarded to MC (and Inbal Arnon) and ERC grant 295917-RATIONALITY, the ESRC Network for Integrated Behavioural Science, the Leverhulme Trust, Research Councils UK Grant EP/K039830/1 to NC.

### References

of Language: Social Function and the Origins of Linguistic Form, ed C. Knight (Cambridge: Cambridge University Press), 303–323.

monkeys. J. Neurosci. 33, 18825–18835. doi: 10.1523/JNEUROSCI.2414-13.2013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Christiansen and Chater. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Simplicity and Specificity in Language: Domain-General Biases Have Domain-Specific Effects

#### Jennifer Culbertson\* and Simon Kirby

Language Evolution and Computation Research Unit, Linguistics and English Language, University of Edinburgh, Edinburgh, UK

The extent to which the linguistic system—its architecture, the representations it operates on, the constraints it is subject to—is specific to language has broad implications for cognitive science and its relation to evolutionary biology. Importantly, a given property of the linguistic system can be "specific" to the domain of language in several ways. For example, if the property evolved by natural selection under the pressure of the linguistic function it serves then the property is domain-specific in the sense that its design is tailored for language. Equally though, if that property evolved to serve a different function or if that property is domain-general, it may nevertheless interact with the linguistic system in a way that is unique. This gives a second sense in which a property can be thought of as specific to language. An evolutionary approach to the language faculty might at first blush appear to favor domain-specificity in the first sense, with individual properties of the language faculty being specifically linguistic adaptations. However, we argue that interactions between learning, culture, and biological evolution mean any domain-specific adaptations that evolve will take the form of weak biases rather than hard constraints. Turning to the latter sense of domain-specificity, we highlight a very general bias, simplicity, which operates widely in cognition and yet interacts with linguistic representations in domain-specific ways.

#### Edited by:

N. J. Enfield, University of Sydney, Australia

#### Reviewed by:

Carla Hudson Kam, University of British Columbia, Canada; Maryia Fedzechkina, University of Pennsylvania, USA

#### \*Correspondence:

Jennifer Culbertson jennifer.culbertson@ed.ac.uk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 31 August 2015 Accepted: 07 December 2015 Published: 12 January 2016

#### Citation:

Culbertson J and Kirby S (2016) Simplicity and Specificity in Language: Domain-General Biases Have Domain-Specific Effects. Front. Psychol. 6:1964. doi: 10.3389/fpsyg.2015.01964

Keywords: language evolution, domain-specificity, simplicity, typological universals, compositionality, word order, regularization

# INTRODUCTION

One of the fundamental issues in cognitive science is the extent to which specifically linguistic mechanisms and representations underpin our knowledge of language and the way it is learned. This is in part because this issue has deep implications for the underlying uniqueness of a system we typically consider exclusive to humans. It has also been highly divisive in the sense that researchers from distinct traditions often have polar starting assumptions as to the likelihood of domain-specific properties of the language system. Here we will suggest that there are in fact (at least) two ways in which a given feature of the linguistic system may be considered to have domain-specific properties:

(1) If that feature evolved by natural selection under the pressure of the linguistic function it serves.

(2) If that feature is domain-general but interacts with the linguistic system and its representations in a way that is unique.

These two types of domain-specificity are quite different in terms of their implications for the evolution of language, and below we will discuss a set of results from computational models suggesting that domain-specificity of the first kind is unlikely to take the form of hard constraints on the linguistic system. Rather, if such constraints exist, they are likely to be weak biases, amplified through cultural evolution. This has important implications for linguistic theory, since, as we discuss below, many mainstream frameworks explicitly argue for hard domain-specific constraints and reject the notion of weak bias. The second type of domain-specificity, on the other hand, is likely to be widespread, and highlights the importance of collaborative efforts between experts in linguistic theory—who study the architecture and representations of language—and experts studying cognition across domains and species.

# DOMAIN-SPECIFICITY AND EVOLUTION

In this section, we focus on the first sense of domain-specificity set out above, which interprets the issue in functional terms. This is perhaps the most obvious sense in which a particular aspect of the cognitive system might be specific to language, and it is the one which places a heavier burden on biological evolution. Importantly, it is the ultimate rather than proximate function that is relevant here; knowing that some feature of the cognitive system is used in processing or acquiring language is not, in and of itself, an argument for domain-specificity. We can no more argue that such a feature is language-specific because it is active in language processing than we can argue for an aspect of cognition being chess-specific simply because it is active in the brain of a chess player. Rather, we need to consider the ultimate function of the cognitive architecture in question by looking to its evolutionary history. An aspect of our cognitive architecture is specific to language if it arose as an adaptive response to the problem of learning or using language<sup>1</sup>.

This argument places evolution right at the core of the question of the existence of language-specific features of our cognitive architecture. While some cross-species comparative data exist to help us trace the functional sources of various cognitive capacities (see Fitch, 2010 for review), these data are limited by the degree to which the relevant aspects of language are autapomorphies (completely novel traits that are not found in any other species). Recent research has turned to computational modeling to provide a more direct testing ground for specific hypotheses about how the capacities involved in language may have evolved. In particular, a number of papers have looked at whether domain-specific hard constraints on language can evolve from a prior stage where biases were less strong or not present at all (e.g., Kirby and Hurford, 1997; Briscoe, 2000; Smith and Kirby, 2008; Chater et al., 2009; Thompson, 2015). This is important, since many linguistic theories conceive of the language capacity as including a set of constraints of this kind: for example, Biberauer et al. (2014), working in the Minimalist framework (Chomsky, 1993), argue for a constraint which places a hard (inviolable) restriction on the distribution of the feature triggering movement (they call it the "Final-Over-Final" constraint, in a nod to the structural description of word orders the constraint rules out). Similarly, in Optimality Theory (Prince and Smolensky, 1993/2004), although a particular constraint may be violated in a given language, the standard mechanism for explaining typological data is to restrict the set of constraints. For example, Culbertson et al. (2013) describe an OT grammar for word order in the noun phrase which completely rules out particular patterns by using a limited set of so-called alignment constraints (see also Steddy and Samek-Lodovici, 2011).

To investigate how hard domain-specific constraints of this type might evolve, Chater et al. (2009) describe a simulation of a population of language-learning agents. The genes of these agents specify whether learning of different aspects of language is tightly constrained or highly flexible. Agents in the simulation that successfully communicate are more likely to pass on their genes to future generations. The question that Chater et al. (2009) ask is whether genes encoding constraints evolve in populations which start out highly flexible under the selection pressure for communication. If they do, then this would support a language faculty in which language acquisition is constrained by domain-specific principles. This process, whereby traits that were previously acquired through experience become nativised, is known as the Baldwin Effect (Baldwin, 1896; Maynard Smith, 1986; Hinton and Nowlan, 1987), and a number of authors have suggested it played a role in the evolution of the language faculty (Kirby and Hurford, 1997; Jackendoff, 2002; Turkel, 2002). However, Chater et al. (2009) argue that the fact that languages change over time makes the situation of language evolution quite different from that of other learned traits. In their simulations, if the rate of language change is high enough, it is impossible for genetic evolution to keep up: language presents a moving target, and domain-specific constraints cannot evolve.

Chater et al.'s (2009) model is a critique of a particular view of the language faculty in which hard innate constraints are placed on the form languages can take. Because of this they do not model a scenario in which the strength of bias is allowed to evolve freely (although they do show that their model gives similar results whether genes encode hard constraints, or very strong biases). However, there is growing support for a more nuanced view of language acquisition in which learners have biases that come in a range of strengths (e.g., Morgan et al., 1989; Wilson, 2006; Hudson Kam and Newport, 2009; Smith and Wonnacott, 2010; Culbertson and Smolensky, 2012; Culbertson et al., 2013; Chater et al., 2015). If the genes underpinning the language faculty were able to specify everything from a very weak bias all the way to a hard constraint, then perhaps this would allow evolution to take a gradual path from an unbiased learner to a strongly-constraining, domain-specific language faculty. To find out if this is the case, we need a model that shows how bias strength affects the nature of the languages that emerge in a population.

<sup>1</sup>Note this is true even if we then happen to use this aspect of our cognitive system for other, additional purposes. The fact that we use our language faculty for solving crosswords does not constitute an argument against domain-specificity of that faculty.

The iterated learning model (Kirby et al., 2007) starts from the observation that the way languages evolve culturally is driven by the way in which languages are learned<sup>2</sup>. This model of cultural evolution suggests that the languages spoken by a population will not necessarily directly reflect the learning biases of that population (**Figure 1**). In particular, in many cases, cultural evolution will tend to amplify weak learning biases. This has important implications for how constraints on the language faculty actually come to be reflected in properties of language. For example, the observation that some property of language is universally, or near universally, present in language is not sufficient for us to infer that there is a corresponding strong constraint in our language faculty. Indeed, if Kirby et al. (2007) are correct, then the strength of any constraint in the language faculty may be unrelated to the strength of reflection of that constraint cross-linguistically. Weak learning biases may be sufficient to give rise to exceptionless, or near exceptionless, universals.
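The amplification of a weak bias can be illustrated with a deliberately small toy chain. The sketch below is not the model of Kirby et al. (2007); it assumes just two hypothetical languages, a noise rate `eps` with which a speaker produces an utterance typical of the other language, `n` utterances heard per learner, and a prior `prior_l0` favoring language 0. All of these names and values are assumptions made for illustration. It computes exactly how often language 0 is spoken in the long run when each learner picks the most probable language (`"map"`) and, for contrast, when each learner samples a language from the posterior (`"sample"`).

```python
from math import comb


def p_adopt_l0(k, n, prior_l0, eps, learner):
    """Probability that a learner adopts language 0 after hearing
    k type-0 utterances out of n."""
    like_l0 = (1 - eps) ** k * eps ** (n - k)   # P(data | language 0)
    like_l1 = eps ** k * (1 - eps) ** (n - k)   # P(data | language 1)
    post_l0 = prior_l0 * like_l0 / (prior_l0 * like_l0 + (1 - prior_l0) * like_l1)
    if learner == "sample":
        return post_l0                          # sample from the posterior
    return 1.0 if post_l0 > 0.5 else 0.0        # "map": pick the most probable


def stationary_share_l0(n, prior_l0, eps, learner):
    """Long-run proportion of generations that speak language 0."""
    def p_next_l0(p_type0):
        # Sum over the number k of type-0 utterances among the n heard.
        return sum(
            comb(n, k) * p_type0 ** k * (1 - p_type0) ** (n - k)
            * p_adopt_l0(k, n, prior_l0, eps, learner)
            for k in range(n + 1)
        )

    to_l0_from_l0 = p_next_l0(1 - eps)  # teacher speaks language 0
    to_l0_from_l1 = p_next_l0(eps)      # teacher speaks language 1
    # Stationary distribution of a two-state Markov chain.
    return to_l0_from_l1 / (to_l0_from_l1 + (1 - to_l0_from_l0))


PRIOR_L0 = 0.6  # weak bias toward language 0 (assumed value)
map_share = stationary_share_l0(n=2, prior_l0=PRIOR_L0, eps=0.3, learner="map")
sample_share = stationary_share_l0(n=2, prior_l0=PRIOR_L0, eps=0.3, learner="sample")
print(f"prior {PRIOR_L0:.2f} | MAP chain {map_share:.2f} | sampling chain {sample_share:.2f}")
```

With these assumed settings, the chain of MAP learners ends up on language 0 about 85% of the time even though the prior favors it only 60:40, whereas the chain of sampling learners simply reproduces the prior. In this toy the amplification comes from ambiguous data (here, one utterance of each type) always being resolved in favor of the prior.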

Smith and Kirby (2008) examine the implications of iterated learning for the biological evolution of the language faculty. Their simulation explicitly models three processes involved in the origins of linguistic structure: individual learning of languages from data; cultural evolution of languages in a population through iterated learning; and biological evolution of learning biases themselves. They show that neither hard constraints nor strong biases emerge from the evolutionary process even when agents are being selected for their ability to communicate using a shared language. This is a consequence of the amplifying effect of cultural evolution; the fitness of an organism is not derived directly from that organism's genes, but rather from the organism's phenotype. In the case of language evolution this is the actual language an individual has learned. If weak learning biases are amplified by cultural evolution, then the difference between a weak bias and a hard constraint is neutralized: both can lead to strong effects on the distribution of languages. What this means is that iterated learning effectively masks the genes underpinning the language faculty from the view of natural selection. They are free to drift; strongly-constraining domain-specific constraints on language learning are likely to be lost due to mutation, or not arise in the first place (see also, Thompson, 2015 for a detailed analysis of the evolutionary dynamics in this case).

Taken together these modeling results show that domain-specific hard constraints on language learning are unlikely to evolve, because languages change too fast (Chater et al., 2009) and because cultural evolution amplifies the effect of weak biases (Kirby et al., 2007). However, the results of this latter model suggest a further conclusion: weak biases for language learning are more evolvable by virtue of cultural evolution's amplifying effect. Any tiny change from neutrality in learning can lead to big changes in the language that the population uses. Just as culture masks the strength of bias from the view of natural selection, it unmasks non-neutrality. We argue that linguists should not shy away from formulating domain-specific aspects of the language faculty in terms of weak, defeasible biases. This is the type of language faculty that is most likely to evolve.

Although we propose that strong domain-specific biases on language should be avoided on evolutionary grounds, this does not mean that strong domain-general biases are impossible. These may be the result of very general architectural or computational considerations that govern the way cognition operates, for example (falling under the third of Chomsky's, 2005 three factors in language design). Equally, the way we learn language might be shaped by relatively strong domain-general biases that arise as a result of evolution for something other than language, for which the amplifying effect of culture does not apply. Biases such as these may nevertheless interact with language and linguistic representations in domain-specific ways. In the next section we will examine a learning bias that is arguably the most domain-general of all, simplicity, and show how its application in a range of different aspects of language leads to domain-specific outcomes.

FIGURE 1 | The link between genes and the universal properties of language is mediated by development and cultural transmission. The extent to which these two processes have non-trivial dynamics is an important consideration when proposing evolutionary accounts of language. Fitness does not depend directly on the genes underpinning the language faculty, but rather the linguistic phenotype (i.e., languages). This opens up the possibility for development and cultural transmission to shield genetic variation from the view of natural selection (Figure adapted from Kirby et al., 2007). © 2007 by The National Academy of Sciences of the USA.

<sup>2</sup>Our emphasis in this article will be on learning, but there are other mechanisms that operate at the individual level but whose effect is felt at the population level. For example, the way in which hearers process input, and the way in which speakers produce output is likely to have a significant impact. See Kirby (1999) for an extended treatment of precisely how processing and learning interact with cultural transmission to give rise to language universals, and Futrell et al. (2015), Fedzechkina et al. (2012), and Jaeger and Tily (2011) for recent accounts of specific links between processing and language structure. However, the debate about domain generality/specificity plays out differently for processing than for learning, and as such will not be the focus of this review. In particular, here we discuss simplicity as a highly general learning bias that unifies a range of different domains both within and beyond language, and it is not clear that an equivalent notion of simplicity exists for processing.

#### SIMPLICITY

Simplicity has been proposed as a unifying principle of cognitive science (Chater and Vitányi, 2003). The tradition of arguing for a general simplicity bias has a long history in the context of scientific reasoning dating back to William of Occam in the 14th century who stated that we should prefer the simplest explanation for some phenomenon all other things being equal. In other words, when choosing among hypotheses that explain data equally well, the simpler one should be chosen.

This principle can be extended straightforwardly from scientific reasoning to cognitive systems. When faced with an induction problem we must have some way of dealing with the fact that there are many candidate hypotheses that are consistent with the observed data (typically an infinite number). So, for example, in a function learning task how do we interpolate from seen to unseen points when there are an infinite number of possible functions that could relate the two (**Figure 2**)? Or, to give a more trivial example, why is it that we assume that the sun will continue to rise every day when there is an infinite range of hypotheses available to us which predict it won't?

FIGURE 2 | There is an infinite set of possible functions interpolating from seen points to unseen points in these graphs. Our intuition is that the linear function on the left represents a more reasonable hypothesis than the one on the right, despite the fact that both fit the data perfectly well. In other words, we have prior expectations about what functions are more likely than others. In this case, the prior includes a preference for linearity (cf. Kalish et al., 2007).

Here again the simplicity bias provides an answer by giving us a way to distinguish between otherwise equally explanatory hypotheses. While a full treatment of why simplicity rather than some other bias is the correct way to solve this problem is beyond the scope of this article (accessible introductions are given in Mitchell, 1997; Chater et al., 2015), we can give an intuitive flavor in terms of Bayesian inference. According to Bayes' rule, induction involves combining the probability distribution over hypotheses defined by the data with a prior probability distribution over these hypotheses. More formally, the best hypothesis, $h$, for some data, $D$, will maximize $P(D|h)P(h)$.

$$h_{\text{best}} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h)P(h)$$

What can this tell us about simplicity? We can express this equivalently by taking logs of these probabilities. The best hypothesis is the one that minimizes the sum of negative log probabilities of the data given that hypothesis, $-\log_2 P(D|h)$, and the prior probability of the hypothesis itself, $-\log_2 P(h)$.

$$\begin{aligned} h_{\text{best}} &= \arg\min_{h \in H} \left[ -\log_2 P(h|D) \right] \\ &= \arg\min_{h \in H} \left[ -\log_2 P(D|h) - \log_2 P(h) \right] \end{aligned}$$

Information theory (Shannon and Weaver, 1949) tells us that this last quantity, $-\log_2 P(h)$, is the description length of $h$ in bits (assuming an optimal encoding scheme for our space of hypotheses). So, all other things being equal, learners will choose hypotheses that can be described more concisely, that is, hypotheses that are simpler.
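The equivalence between maximizing the posterior and minimizing a total description length can be checked on a small worked case. The sketch below uses three hypothetical hypotheses whose names, priors, and likelihoods are invented purely for illustration; the probabilities are powers of two so that the code lengths come out as whole bits.

```python
from math import log2

# Hypothetical hypotheses with made-up probabilities (illustration only).
# "linear" and "wiggly" fit the data equally well; "constant" fits it poorly.
HYPOTHESES = {
    "constant": {"prior": 1 / 2, "likelihood": 1 / 1024},
    "linear": {"prior": 1 / 4, "likelihood": 1 / 8},
    "wiggly": {"prior": 1 / 64, "likelihood": 1 / 8},
}


def posterior_score(name):
    """Unnormalized posterior P(D|h) * P(h)."""
    h = HYPOTHESES[name]
    return h["likelihood"] * h["prior"]


def description_length_bits(name):
    """Bits to encode the data given h, plus bits to encode h itself."""
    h = HYPOTHESES[name]
    return -log2(h["likelihood"]) - log2(h["prior"])


map_choice = max(HYPOTHESES, key=posterior_score)
mdl_choice = min(HYPOTHESES, key=description_length_bits)

for name in HYPOTHESES:
    print(f"{name:8s} {description_length_bits(name):4.0f} bits")
print("MAP choice:", map_choice, "| shortest description:", mdl_choice)
```

Under these assumed numbers the two criteria select the same hypothesis, and among the two hypotheses that fit the data equally well the one with the shorter code (the higher prior) wins.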

Importantly, an information theoretic view of the equation above also suggests learners will prefer representations that provide (to a greater or lesser extent) some compression of the data they have seen. What does this mean for the nature of language? It suggests that languages will be more prevalent to the extent that they are compressible. In general, a language will be compressible if there are patterns within the set of sentences of that language that can be captured by a grammatical description. More precisely, a compressible set of sentences is one whose minimum description length is short. The description length is simply the sum of the length of the grammar ($-\log_2 P(h)$ in the equation above) and the length of the data when described using that grammar (given by the $-\log_2 P(D|h)$ term).

This argument has allowed us to relate our intuitive understanding of simplicity, as a reasonable heuristic in choosing between explanations, to a rational model of statistical inference in a relatively straightforward way. Of course, there are a lot of practical questions that this leaves unanswered. How, for example, can we tell in a given domain what counts as a simpler hypothesis? Unfortunately, there is no computable general measure of complexity (Li and Vitányi, 1997); nevertheless, we propose that notions of relative simplicity should guide our search for domain-general biases underpinning phenomena of interest in language.

So, we argue that—whatever other biases learners have when they face some learning problem—they are also likely to be applying an overarching simplicity bias (Chomsky, 1957; Clark, 2001; Brighton, 2002; Kemp and Regier, 2012; Chater et al., 2015).

It is important to note that when we talk about simplicity in the context of language, it is in terms of the overall compressibility of that language, e.g., how much redundancy and systematicity does it exhibit that can be captured simply in a grammatical description, and how much irreducible unpredictability remains in the data. We might also be interested in ways in which languages differ in the length of their utterances, but this is a largely orthogonal issue. Indeed, it is possible for a language with shorter strings to have a longer grammar: consider cases of irregular morphology in which regularization might simplify a paradigm at the cost of removal of short irregulars.

The generality of the bias for simplicity suggests there will be many linguistic phenomena affected by it. Below, we discuss cases which have been documented both in linguistic typological and experimental studies, with an emphasis on morphology and syntax (for discussion of experimental findings related to phonological simplicity, see Moreton and Pater, 2012a,b). We will begin with a basic design feature of language, compositionality, that can be characterized by the interaction of simplicity with a competing pressure for expressivity. We then move on to three additional examples of increasingly narrow phenomena: regularization of unconditioned variation, consistent head ordering or word order harmony, and isomorphic mapping from semantic structure to linear order. Each example illustrates a slightly different way in which this domain-general bias interacts with features that are particular to the linguistic domain.

## Compositionality

For our first example we will consider a basic property of language, often called a "design feature" (Hockett, 1960): the compositional nature of the mapping between meanings and forms. Language is arguably unique among naturally occurring communication systems in consisting of utterances whose meaning is a function of the meaning of their sub-parts and the way they are put together. For example, the meaning of the word "stars" is derived from the meaning of the root star combined with the meaning of the plural morpheme -s. Similarly, the meaning of a larger unit like "visible stars" is a function of the meanings of the individual parts of the phrase. Switching the order to "stars visible" changes the meaning of the unit in a predictable way<sup>3</sup>.

This ubiquitous feature of language makes it arguably unique among naturally occurring communication systems, the vast majority—perhaps all—of which are holistic rather than compositional (Smith and Kirby, 2012). The striking divergence from holism that we see in language (above the level of the word) is therefore of great interest to those studying the evolution of language. The fact that human communication is also highly unusual in consisting of learned rather than innate mappings between meanings and signals suggests that relating the origins of compositionality to learning biases is a good place to start in the search for an explanation.

A language that maps meanings onto signals randomly (see **Figure 3A**) will be less compressible—and hence, less simple in our terms—than one which maps them onto signals in a predictable way (see **Figure 3B**). Where both signals and meanings have internal, recombinable structure, then this predictability will be realized as compositional mappings. To see why this is, consider representing language as a transducer relating meanings and signals. The transducer in **Figure 4A** gives the most concise representation of an example holistic language, whereas the transducer in **Figure 4B** gives the most concise representation of an equivalent compositional language in which subparts of the signals map onto subparts of the meanings. What should be immediately apparent is that compositional languages are more compressible.
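The contrast in compressibility can be made concrete with a crude proxy. The sketch below does not implement the transducers of Figure 4 or a proper coding scheme; it simply counts the characters needed to list every signal segment a learner would have to store. The meaning space, the signals, and the morphemes are all invented for illustration. Because both languages assign each meaning exactly one signal, encoding the data given the grammar costs the same in either case, so only the size of the listing differs.

```python
from itertools import product

COLORS = ["red", "blue", "green"]
SHAPES = ["square", "circle", "triangle"]
MEANINGS = list(product(COLORS, SHAPES))  # 9 structured meanings

# Holistic language: one unrelated, invented signal per whole meaning.
HOLISTIC = dict(zip(MEANINGS, ["kapi", "luno", "meba", "tiso", "wuga",
                               "hepo", "nari", "dofu", "zeki"]))

# Compositional language: one invented morpheme per feature value.
COLOR_MORPH = {"red": "ka", "blue": "lu", "green": "me"}
SHAPE_MORPH = {"square": "pi", "circle": "no", "triangle": "ba"}


def compositional_signal(meaning):
    """Build a signal by concatenating the color and shape morphemes."""
    color, shape = meaning
    return COLOR_MORPH[color] + SHAPE_MORPH[shape]


def listing_cost(*rule_tables):
    """Crude description length: characters needed to list every segment."""
    return sum(len(segment) for table in rule_tables for segment in table.values())


holistic_cost = listing_cost(HOLISTIC)
compositional_cost = listing_cost(COLOR_MORPH, SHAPE_MORPH)
distinct_signals = {compositional_signal(m) for m in MEANINGS}

print("holistic listing:     ", holistic_cost, "characters")
print("compositional listing:", compositional_cost, "characters")
print("distinct compositional signals:", len(distinct_signals))
```

In this toy the holistic listing grows with the number of meanings (the product of the feature values), while the compositional listing grows only with the number of feature values (their sum), and the compositional language still gives every meaning its own signal.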

<sup>3</sup> In this case, placing the adjective after the noun leads to the interpretation "the stars visible (tonight)." This is a systematic rule of English: post-nominal attributive adjectives are stage-level predicates, denoting temporary properties (Cinque, 1993).

Brighton (2002) uses this contrast to model the cultural evolution of compositionality in an iterated learning framework (Kirby et al., 2007). Individual agents in his simulation learn transducers to map between a structured set of meanings and signals made up of sequences of elements. Crucially, the learners have a prior bias in favor of simpler transducers. In fact, the prior probability of a particular transducer is inversely related to its coding length in bits in precisely the way outlined in our discussion of simplicity above. Each agent learns their language by observing meaning-signal pairs produced by the previous agent in the simulation, and then goes on to produce meaning-signal pairs for transmission to the next generation. As the language in these simulations is repeatedly learned and reproduced, the bias of the agents in favor of simplicity shapes the evolutionary dynamic. Despite the fact that these models involve no biological evolution, the grammars adapt gradually over cultural generations from ones that are random and holistic to ones that are compositional<sup>4</sup>.
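One way to picture a prior that is inversely related to coding length is sketched below. It assumes the standard minimum-description-length form, in which the prior weight of a hypothesis is proportional to 2 raised to the negative of its length in bits; the function name `simplicity_prior` and the bit counts are hypothetical and are not taken from Brighton (2002).

```python
def simplicity_prior(coding_lengths):
    """Turn {hypothesis: coding length in bits} into a normalised prior.

    Assumes prior weight proportional to 2 ** (-bits), so each extra bit
    of description halves the prior probability of a hypothesis.
    """
    weights = {h: 2.0 ** (-bits) for h, bits in coding_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical coding lengths for two candidate transducers.
lengths = {"short_transducer": 10, "long_transducer": 14}
prior = simplicity_prior(lengths)

# A 4-bit difference in length corresponds to a 16-fold difference in prior.
ratio = prior["short_transducer"] / prior["long_transducer"]
print(round(ratio, 6))
```

Under this assumption a learner needs progressively more data to be persuaded of a longer transducer, which is the sense in which the bias favors, but does not mandate, simpler grammars.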

This result makes intuitive sense if you think about the process of transmission from the point of view of the emerging rules and regularities in the mapping between meanings and signals. A highly specific feature of the evolving language (e.g., a particular idiosyncratic label for a single meaning, like went as the past tense of GO) will be harder to learn than a generalization over a large number of meanings (e.g., a morpheme, like –ed, that shows up in the signals associated with a wide range of meanings). Particularly if learners only see a subset of all possible meanings, this inevitably leads to a preferential transmission of broader and broader generalizations that apply across large parts of the language. Hurford (2000) puts it pithily, stating "social transmission favors linguistic generalization."

The simplicity bias thus appears to predict one of the fundamental design features of human language. However, things are not quite so straightforward. Consider a language in which every meaning is expressed by the same signal (**Figure 3C**). This degenerate language will be even more compressible than the compositional one, suggesting that a domain-general bias for simplicity is not sufficient to explain the origins of compositional structure. Cornish (2011) argues that in fact all simulations of iterated learning purporting to demonstrate the emergence of compositionality have in some way implemented a constraint that rules out degeneracy. It is simply impossible for the learners in these simulation models to acquire a language that maps many meanings to one signal. Similarly, in the first laboratory analog of these iterated learning simulations, Kirby et al. (2008) report that degenerate languages rapidly evolve over a few generations of human learners.

Kirby et al. (2015) argue that a countervailing pressure for expressivity is required to avoid the collapse of languages in iterated learning experiments to this degenerate end point. The obvious pressure arises not from learning, but from use. If pairs of participants learn an artificial language and then go on to use it in a dyadic interaction task, then there are two pressures on the language in the experiment: a pressure to be compressible arising from participants' domain-general simplicity bias in learning, and a pressure to be expressive arising from participants' use of the language to solve a communicative task. Kirby et al. (2015) show that compositionality only arises when both pressures are in play. In this case then, a domain-general bias is only explanatorily adequate once we take into account features of its domain of application. In other words, the case of compositionality illustrates that the simplicity bias is domain-specific in the sense that we cannot understand how it shapes language without also appealing to the special function of language as a system of communication.
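The interaction of the two pressures can be illustrated with a toy comparison of degenerate, compositional, and holistic languages. This is a minimal sketch under stated assumptions: the languages, the `cost` measure, and the `expressivity` score (distinct signals divided by meanings) are hypothetical simplifications, not the measures used by Kirby et al. (2015).

```python
from itertools import product

meanings = list(product(["square", "circle", "triangle"], ["red", "blue", "green"]))
shape_syl = {"square": "ka", "circle": "mo", "triangle": "ti"}
color_syl = {"red": "pu", "blue": "le", "green": "na"}

languages = {
    "degenerate": {m: "ka" for m in meanings},
    "compositional": {(s, c): shape_syl[s] + color_syl[c] for s, c in meanings},
    "holistic": dict(zip(meanings, ["wixo", "peda", "luni", "gato", "sebu",
                                    "moki", "rahe", "tyfa", "zuco"])),
}

# Signals that the shortest rule table for each language has to list
# (an assumption made for illustration).
rule_tables = {
    "degenerate": ["ka"],
    "compositional": list(shape_syl.values()) + list(color_syl.values()),
    "holistic": list(languages["holistic"].values()),
}


def cost(rules):
    """Crude description length: one unit per rule plus one per character."""
    return sum(1 + len(rule) for rule in rules)


def expressivity(language):
    """Share of meanings that receive their own distinct signal."""
    return len(set(language.values())) / len(language)


# Simplicity alone selects the degenerate language.
simplest = min(languages, key=lambda name: cost(rule_tables[name]))

# Simplicity among fully expressive languages selects the compositional one.
expressive = [name for name in languages if expressivity(languages[name]) == 1.0]
simplest_expressive = min(expressive, key=lambda name: cost(rule_tables[name]))

print(simplest, simplest_expressive)  # prints: degenerate compositional
```

The costs come out as 3, 18, and 45, reproducing the ordering degenerate < compositional < holistic; only when expressivity is also required does the compositional language win.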

#### Regularization

There is converging evidence from multiple strands of research including pidgin/creole studies, sociolinguistics, language acquisition, and computational cognitive science suggesting that language tends to minimize unpredictable or unconditioned variation. Variation can be introduced by non-native speaker errors, contact with speakers of other languages, or in the case of newly emerging languages, variation may reflect a lack of conventionalized grammar. In the latter case, there is evidence that new generations of learners regularize and conventionalize these noisy systems (e.g., Sankoff, 1979; Mühlhäusler, 1986; Meyerhoff, 2000; Senghas and Coppola, 2001). Natural language and laboratory language learning research has further shown that both children and adults learn and reproduce conditioned variation relatively well compared to unpredictable variation (e.g., Singleton and Newport, 2004; Hudson Kam and Newport, 2005, 2009; Smith et al., 2007; Smith and Wonnacott, 2010; Culbertson et al., 2012). For example, Singleton and Newport (2004) report the case of a child acquiring American Sign Language (ASL) from late-learner parents. While the parents' realization of several grammatical features of ASL was variable, the child did not reproduce this variation. Rather, he regularized his parents' variable productions, resulting in a much more consistent system (though in some aspects it differed from ASL). Following up on this finding using an experimental paradigm, Hudson Kam and Newport (2009) report that, when trained on a grammar with unpredictable use of determiners, child learners (and to a lesser extent adults) regularize those determiners, using them according to a consistent rule.

Computational modeling has formalized this in terms of learners' a priori expectations, namely that observed data come from a deterministic generative process (Reali and Griffiths, 2009; Culbertson and Smolensky, 2012; Culbertson et al., 2013).

<sup>4</sup>Brighton (2002) makes the simplicity bias of the learners in his model overt by counting the number of bits in the encoding of transducers that generate the data the learners see. However, this does not mean that we believe this kind of representation of grammars is necessary for an implementational or algorithmic account of what language learners are doing when they learn language. Rather, this is a computational-level account in Marr's (1982) terms. It is an empirical question whether the particular ranking of grammars in terms of simplicity that we can derive from this particular representation matches precisely the ranking that applies in the case of real language learners, but we are confident that the crucial ordering degenerate < compositional < holistic is correct. This matches the behavior of participants in the lab (Kirby et al., 2015), and broadly similar results are found in both connectionist and symbolic models of iterated learning (Kirby and Hurford, 2002; Brace et al., 2015).

This has a natural interpretation in terms of simplicity, since the description of a language that only allows one option in a particular context will be shorter than one that allows multiple variants<sup>5</sup>. More generally, as we have seen already, there is a straightforward relationship between the entropy of the distribution of variants and the coding length of that distribution. More predictable processes can be captured by shorter overall descriptions: they are compressible (Ferdinand, 2015). However, the expectation that the world will be deterministic is to some extent dependent on the domain in question. Most obviously, prior experience in a given domain can override this expectation; for example, we expect that a coin being tossed will be fair and therefore outcomes will be random (Reali and Griffiths, 2009). In a carefully controlled study comparing learning of unpredictable variation in a linguistic vs. non-linguistic domain, Ferdinand (2015) found that regularization occurs in both domains. However, across a number of conditions manipulating system complexity, the bias is stronger for linguistic stimuli. Regularization thus illustrates a case in which the strength of a bias is domain-specific, perhaps dependent on previous experience and functional pressures relevant to that domain.
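The link between entropy and coding length can be illustrated with a small calculation. The sketch below computes the Shannon entropy, in bits per production, of a speaker's choice between variants; the variant labels `det_a` and `det_b` and the 60/40 split are hypothetical and are not data from any of the studies cited above.

```python
import math
from collections import Counter


def entropy_bits(productions):
    """Shannon entropy (bits per production) of the observed variant distribution."""
    counts = Counter(productions)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical input with unpredictable variation between two determiners.
unpredictable = ["det_a"] * 6 + ["det_b"] * 4
# Hypothetical output of a learner who has regularized to a single variant.
regularized = ["det_a"] * 10

print(round(entropy_bits(unpredictable), 3))  # prints: 0.971
print(entropy_bits(regularized))              # prints: 0.0
```

Entropy gives the minimum average number of bits needed to encode each choice, so the regularized system costs nothing to describe beyond the rule itself, whereas the variable system costs close to one bit for every production.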

While most recent work on regularization focuses on unconditioned or random variation, there is some evidence that even conditioned variation is avoided in language. For example, English is losing its system of irregular (variable) past tense marking in favor of a single rule (add -ed) despite this variation being lexically conditioned (Hooper, 1976). Similarly, while some languages allow widespread lexically or semantically conditioned variation in adjective placement, most languages tend to order adjectives more or less consistently before or after the noun (Dryer, 2013). This can be related straightforwardly to simplicity; a grammar with a single (high-level) rule or constraint applying to all words of a given type is more compressible than one in which different such words must obey different rules. For example, a grammar with a single rule stating that adjectives must always precede nouns is simpler than one which has to specify that certain adjectives precede and others follow.

#### Harmony

Interestingly, this reflex of simplicity applies not only to word order within a word class, but also across classes of words. Some of the best known typological universals describe correlations among word orders across different phrase types. For example, Greenberg (1963) lists a set of universals, collated from a sample of 30 languages, including the following:

Universal 2: In languages with prepositions, the genitive almost always follows the governing noun, while in languages with postpositions it almost always precedes.

Universal 18: When the descriptive adjective precedes the noun, the demonstrative and the numeral, with overwhelmingly more than chance frequency, do likewise.

These universals are part of the evidence for word order harmony—the tendency for a certain class of words to appear in a consistent position, either first or last, across different phrase types in a given language (Greenberg, 1963; Chomsky, 1981; Hawkins, 1983; Travis, 1984; Dryer, 1992; Baker, 2001; for experimental evidence see Culbertson et al., 2012; Culbertson and Newport, 2015). At its root, this is just an extension of the same very general statement of within-category order consistency. However, absent a notion of what ties certain categories of words together, the connection between harmony and simplicity remains opaque. For example, the two universals quoted above make reference to a single category—noun—and how it is ordered relative to a number of other categories. Based on syntactic class alone, simplicity predicts that nouns should be ordered consistently relative to all these other categories. This is, of course, the wrong prediction; Universal 2 actually says that the order of nouns relative to adpositions is the opposite of the order of nouns relative to genitives. While adpositions and genitives thus tend to appear on different sides of the noun, it turns out that adjectives, demonstratives, and numerals often pattern with genitives (note that English is a counterexample). These tendencies are exemplified in (3).

(3) a. Preposition N {Adj, Num, Dem, Gen}

b. {Adj, Num, Dem, Gen} N Postposition

To make sense of this, we need a notion that connects adpositions as they relate to nouns, with nouns as they relate to the other categories. The most popular such notion provided by linguistic theory is the head-dependent relation. In this example, the noun is a head with respect to nominal modifiers—including genitive phrases, adjectives, numerals, and demonstratives. By contrast, the noun is a dependent in an adpositional construction. When stated in this way, harmony falls out: in the world's languages, there is a tendency for heads to consistently precede or follow their dependents. The former type is often called head-initial, the latter head-final. Coming back to simplicity then, a language which has a single high-level rule stating that heads either precede or follow their dependents is simpler than one which has specific ordering rules for heads in distinct phrase types. Simplicity therefore predicts that the more specific rules a grammar has, the less likely it should be.
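The idea that a harmonic grammar needs fewer rules can be sketched with a toy rule count. The encoding below is an assumption made purely for illustration: a grammar is a mapping from phrase types to head direction, and its cost is one default rule plus one specific rule for every phrase type that departs from the default. The function `rule_count` and the example grammars are hypothetical.

```python
from collections import Counter


def rule_count(head_directions):
    """One default ordering rule plus one specific rule per deviating phrase type."""
    _, n_default = Counter(head_directions.values()).most_common(1)[0]
    return 1 + (len(head_directions) - n_default)


# Hypothetical grammars over three phrase types.
harmonic = {"VP": "head-initial", "PP": "head-initial", "NP": "head-initial"}
mixed = {"VP": "head-initial", "PP": "head-initial", "NP": "head-final"}

print(rule_count(harmonic), rule_count(mixed))  # prints: 1 2
```

On this way of counting, every phrase type that breaks harmony adds a rule, so a simplicity bias penalizes grammars in proportion to how many phrase types need their own ordering statement.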

<sup>5</sup>Note that this requires taking into account the simplicity of the generating grammar and the simplicity (compressibility) of the data. A grammar which allows free variation may be simpler than a grammar which generates conditioned variation; however, the random data produced by the former grammar are not compressible.

Importantly, a clear understanding of whether this prediction is borne out depends on the precise definition of the relevant relation between word categories. This turns out to be controversial. For example, particular theories differ in what is deemed to be a head, and whether "head" is in fact the relevant notion at all (Hawkins, 1983; Zwicky, 1985; Hudson, 1987; Dryer, 1992; Corbett et al., 1993). Dryer (1992) provides typological evidence that head order does not correlate across all phrase types. For example, he reports that the order of verb (head) and object (dependent) correlates with the order of preposition (head) and noun (dependent) within a language, but not with noun (head) and adjective (dependent) order. This is unexpected if the simplicity bias is indeed based on head-dependent order. He therefore argues that a different notion, related to the average length or complexity of particular phrase types, must be used in order to see that languages do indeed prefer higher-level rules governing order across multiple phrase types. Regardless of whether Dryer's precise formulation is correct, what this suggests is that merely stating that simplicity is a factor in determining word order does not allow us to determine which grammars are in fact the simplest. In order to do this, we need a theory of linguistic representations which tells us which categories should be treated as parallel and in what contexts.

From the perspective of the learner, there is also a clear sense in which the simplicity bias as it relates to word order harmony depends on linguistic representations. Given three words, in the absence of any knowledge about the relations between and among them, there is no way simplicity can be used by a learner to make inferences about likely orderings. These representations must be present (e.g., learned) before a simplicity bias can be active. How and when they develop—i.e., when particular syntactic categories are differentiated, when abstract higher-level categories like head develop, etc.—will dictate how simplicity impacts learners' inferences.

#### Isomorphic Mapping

The relation between word order and semantic interpretation in a number of domains also appears to be affected by a simplicity bias. For example, Greenberg's (1963) Universal 18 describes how nominal modifiers are ordered relative to the noun. Universal 20 builds on this, describing how those modifiers tend to be ordered relative to one another.

Universal 20 (as restated by Cinque, 2005):

In pre-nominal position the order of demonstrative, numeral, and adjective (or any subset thereof) is Dem-Num-Adj.

In post-nominal position the order is either Dem-Num-Adj or Adj-Num-Dem.

Interestingly, while both post-nominal orders are indeed possible, additional typological work since Greenberg (1963) indicates that the second order is much more common. In fact, Dem-Num-Adj-N and N-Adj-Num-Dem are the two most common orders found in the world's languages by far. Part of this is likely due to the harmony bias described above; assuming nominal modifiers are covered by the relevant notion of dependent, these two orders are harmonic, while alternative possibilities are not (e.g., Dem-Num-N-Adj). However, harmony does not explain why N-Adj-Num-Dem would be more common than N-Dem-Num-Adj. An explanation of this difference depends on how syntax (specifically, linearization) interacts with underlying semantic structure.

Several theoretical lines of research converge on a universal semantic representation of these modifiers and their relation to the noun. On one view, this representation reflects iconicity of relations (Rijkhoff, 2004). For example, adjectives modify inherent properties of nouns, numerals count those larger units, and demonstratives connect those countable units to the surrounding discourse. This describes a nesting representation as in **Figure 5A**. Research in formal linguistics further suggests a hierarchical relation between these elements in terms of semantic combination, illustrated in **Figure 5B**. Crucially, these abstract relations are preserved in linear orders that have the adjective closest to the noun and the demonstrative most peripheral, that is, orders that can be read directly off **Figure 5A**. Notice that N-Adj-Num-Dem is one such order, while N-Dem-Num-Adj is not (the modifiers must be swapped around to get this order). Recent laboratory studies suggest a corresponding cognitive bias in favor of isomorphic mappings between nominal semantics and linear order (Culbertson and Adger, 2014). Typological frequency differences in this domain can therefore be much better explained once we take into account the underlying semantic structure and an isomorphism bias.

This is not the only case of isomorphic mappings from semantics to linear order. Indeed, perhaps the most well-known case is the mirror principle in the domain of verbal inflection (Baker, 1985; Bybee, 1985; Rice, 2000). Languages tend to order inflectional morphemes like tense and aspect in a way that reflects semantic composition, as shown in **Figure 6**<sup>6</sup>.

<sup>6</sup> Interestingly, the acquisition of semantics literature provides a related observation. Musolino et al. (2000) show that when asked to interpret ambiguous sentences with quantificational elements, children strongly prefer the interpretation that corresponds to the surface syntactic position of those elements. For example, the sentence "Every horse didn't jump over the fence," could involve every taking scope over not (meaning no horses jumped over the fence), or not scoping over every (meaning not every horse jumped over the fence). The first interpretation is isomorphic to the linear order, and this is the interpretation preferred by young children (see also Musolino and Lidz, 2003).

Biases in favor of isomorphism between semantics and linear order can again be reduced to a general simplicity bias. In very general terms, more transparent or predictable relations between order and meaning are simpler than ones with extra arbitrary stipulations. Brighton and Kirby (2006) show that isomorphic<sup>7</sup> mappings between signals and meanings arise naturally from iterated learning under general simplicity considerations. Put in more precise terms, to derive surface order from semantics, each branch of the hierarchical structure (or each rectangle in the nested schematic) in **Figure 5** represents a choice point for linearization. For isomorphic orders, that is all that is required: N-Adj-Num-Dem means choosing (1) Adj after N, (2) Num after [N-Adj], and (3) Dem after [N-Adj-Num]. Similarly, a non-harmonic but isomorphic order like Dem-Num-N-Adj is (1) Adj after N, (2) Num before [N-Adj], and (3) Dem before [Num-N-Adj]. By contrast, non-isomorphic orders require additional choice points or rules. N-Dem-Num-Adj, for example, cannot be derived from the semantic hierarchy alone; the simplest route is Dem-Num-Adj-N (three choice points) plus one additional rule placing N first. The isomorphism bias again illustrates that the notion of simplicity, however general, must be formulated with reference to specific hypotheses about the domain in question (here, about conceptual iconicity or formal compositional semantics).
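The choice-point reasoning above can be checked by enumeration. The sketch below assumes the nested structure [[[N Adj] Num] Dem] and treats each modifier as a single binary choice (before or after the constituent built so far); the function name `isomorphic_orders` is hypothetical.

```python
from itertools import permutations, product

# Modifiers listed from innermost to outermost in the assumed nesting.
MODIFIERS = ["Adj", "Num", "Dem"]


def isomorphic_orders():
    """All linear orders reachable with one before/after choice per modifier."""
    orders = set()
    for choices in product(["before", "after"], repeat=len(MODIFIERS)):
        phrase = ["N"]
        for modifier, side in zip(MODIFIERS, choices):
            # Attach the modifier to the edge of the constituent built so far.
            phrase = [modifier] + phrase if side == "before" else phrase + [modifier]
        orders.add("-".join(phrase))
    return orders


iso = isomorphic_orders()
all_orders = {"-".join(p) for p in permutations(["N"] + MODIFIERS)}

print(len(iso), len(all_orders))  # prints: 8 24
print("N-Adj-Num-Dem" in iso)     # prints: True
print("Dem-Num-N-Adj" in iso)     # prints: True
print("N-Dem-Num-Adj" in iso)     # prints: False
```

Only 8 of the 24 logically possible orders fall out of the three choice points alone; the remaining 16, including N-Dem-Num-Adj, need at least one further rule, which is what makes them less simple on this account.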

# CONCLUSION

There is little doubt that the language faculty includes capacities and constraints that are domain-general or co-opted from other cognitive systems. Whether it also includes domain-specific features is both less clear, and more likely to split along philosophical lines; traditionally, generative linguistics has argued for a Universal Grammar containing (among other things) linguistically contentful principles that place hard constraints on what is learnable. We have suggested, based on results obtained using computational models of language evolution, that domain-specific hard constraints are much less likely to have evolved than weak biases. This is essentially because the cultural evolution of language exerts cognition-external pressures that mean linguistic phenotypes no longer directly reflect the underlying genotype. The strength of any particular bias is underdetermined by the cross-linguistic distribution of language types. At the same time, these cognition-external pressures allow weak genetically-encoded biases to have potentially large typological effects. While this does not categorically rule out the existence of very strong (or inviolable) biases that have evolved specifically for language, it clearly suggests we should not treat them as the default hypothesis. The idea that weak biases for language-specific structures or patterns are more likely is in line with recent trends in linguistics. Researchers in phonology and syntax have begun using formal models which encode probabilistic biases in order to better capture empirical data from typology and learning (e.g., Hayes and Wilson, 2008; Pater, 2009; Culbertson et al., 2013; White, 2014).

Regardless of whether the language faculty contains domain-specific capacities, the representations which make up our linguistic knowledge and the function of language as a system of communication mean that domain-general capacities will interact with language in unique ways. This is most convincingly illustrated by looking at an uncontroversially general bias: the bias in favor of representational simplicity. The examples we have discussed here show that a simplicity bias is reflected in a range of language universals that cut across very different aspects of the linguistic system: compositionality, regularity, harmony, and isomorphism. In each case, the simplicity bias interacts with linguistic representations to give rise to domain-specific effects. In the case of compositionality, simplicity interacts with the major unique function of language as a communication system that must be expressive. It is only via the interaction of these two pressures that compositional systems will emerge. The regularization bias, which describes the established finding that language learners tend to reduce random or unconditioned variation, shows domain-specific effects in terms of its strength. Word order harmony, the tendency for languages to order heads consistently before or after dependents, depends crucially on a language- and even theory-specific notion of the relevant categories. Finally, the notion of isomorphism between semantic or conceptual structure and surface word order crucially requires an articulated hypothesis about the specific semantic relations among dependent elements.

In all these cases, distinct hypotheses about linguistic categories, their representations, and how they relate to one another will make distinct predictions about how simplicity is cashed out. This means that an understanding of language, how it is learned, and how it evolved will necessarily require input from linguists formulating theories of the architecture and representations of language. The fact that many aspects of the capacity for language also come from broader cognition means linguists in turn must take into account findings from research on other cognitive domains, and indeed on related capacities in other species.

<sup>7</sup>These authors use the term "topographic" rather than "isomorphic" because of similarity to the neuroanatomical organizing principle of topographic maps. For our purposes the terms are interchangeable, since both give rise to the property that neighboring representations in one domain map to neighboring representations in the other.

# REFERENCES

Baker, M. (1985). The mirror principle and morphosyntactic explanation. Linguist. Inq. 16, 373–415.

Baker, M. (2001). The Atoms of Language: The Mind's Hidden Rules of Grammar. New York, NY: Basic Books.

Baldwin, J. M. (1896). A new factor in evolution (continued). Am. Nat. 30, 536–553. doi: 10.1086/276428

Chomsky, N. (1957). Syntactic Structures. New York, NY: Mouton de Gruyter.

Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Culbertson and Kirby. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The linguistic roots of natural pedagogy

*Otávio Mattos<sup>1</sup>\* and Wolfram Hinzen<sup>1,2,3</sup>*

*<sup>1</sup> Grammar and Cognition Lab, Departament de Lingüística General, Universitat de Barcelona, Barcelona, Spain, <sup>2</sup> Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain, <sup>3</sup> Department of Philosophy, University of Durham, Durham, UK*

Natural pedagogy is a human-specific capacity that allows us to acquire cultural information from communication even before the emergence of the first words, encompassing three core elements: (i) a sensitivity to ostensive signals like eye contact that indicate to infants that they are being addressed through communication, (ii) a subsequent referential expectation (satisfied by the use of declarative gestures) and (iii) a biased interpretation of ostensive-referential communication as conveying relevant information about the referent's kind (Csibra and Gergely, 2006, 2009, 2011). Remarkably, the link between natural pedagogy and another human-specific capacity, namely language, has rarely been investigated in detail. We here argue that children's production and comprehension of declarative gestures around 10 months of age are in fact expressions of an evolving faculty of language. Through both declarative gestures and ostensive signals, infants can assign the roles of third, second, and first person, building the 'deictic space' that grounds both natural pedagogy and language use. Secondly, we argue that the emergence of two kinds of linguistic structures (i.e., proto-determiner phrases and proto-sentences) in the one-word period sheds light on the different kinds of information that children can acquire or convey at different stages of development (namely, generic knowledge about kinds and knowledge about particular events/actions/states of affairs, respectively). Furthermore, the development of nominal and temporal reference in speech allows children to cognize information in terms of spatial and temporal relations. In this way, natural pedagogy transpires as an inherent aspect of our faculty of language, rather than as an independent adaptation that predates language in evolution or development (Csibra and Gergely, 2006). This hypothesis is further testable through predictions it makes on the different linguistic profiles of toddlers with developmental disorders.

Keywords: language development, natural pedagogy, pointing, child communication, learning from communication, declarative gestures, concepts, knowledge about kinds

#### *Edited by:*

*Umberto Ansaldo, University of Hong Kong, Hong Kong*

#### *Reviewed by:*

*Yang Zhang, University of Minnesota, USA*

*Pilar Prieto, ICREA-Universitat Pompeu Fabra, Spain*

#### *\*Correspondence:*

*Otávio Mattos, Grammar and Cognition Lab, Departament de Lingüística General, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain otaviomattos@ymail.com*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 01 March 2015 Accepted: 07 September 2015 Published: 23 September 2015*

#### *Citation:*

*Mattos O and Hinzen W (2015) The linguistic roots of natural pedagogy. Front. Psychol. 6:1424. doi: 10.3389/fpsyg.2015.01424*

# Introduction

In an article dedicated to exploring some core similarities and differences between humans and nonhuman apes, Tomasello and Herrmann (2010) argue that our species has "more sophisticated cognitive skills for dealing with the social world in terms of intention-reading, social learning, and communication" (Tomasello and Herrmann, 2010, p. 5). The authors suggest that these skills are necessary for language but precede it in development (and presumably in evolution), as children can communicate before the emergence of speech through declarative gestures like pointing. In this way, they are already able to manifest to adults through pointing the referents about which they intend to communicate and learn. Language would add to this scenario other "fundamentally cooperative communicative devices – known as linguistic conventions (or symbols) – whose meanings derive from a kind of cooperative agreement that we will all use them in the same way" (Tomasello and Herrmann, 2010, p. 5).

The idea of a human-specific form of communication that precedes the emergence of language can also be observed in some archeologists' interpretations of the archeological record of our hominin ancestors:

> Could the (Neanderthal) knapper of Marjorie's core have learned the significance and role of, say, the distal convexity without recourse to language? (*...*) We believe that the answer is yes. *If a teacher drew a novice's attention repeatedly to the distal convexity (by pointing, for example), this would have been enough*. However, we believe that (this technology) would have been very difficult to learn without some sort of guided attention; it probably required active instruction, and *active instruction relies on joint attention and theory of mind. It does not require language*. (Wynn and Coolidge, 2010; our italics).

That said, we think that this thesis is *wrong* for reasons that we set out in this article. First, we will argue that the comprehension and production of declarative gestures by infants reflect structural aspects of human language. In particular, we suggest that declarative gestures are the first expression of determiner phrases in development, to which they are developmentally linked, corresponding to the assignment of the role of 'third person' in communicative acts. In combination with ostensive signals (like eye contact), which are used to define the initial first and second persons involved in communicative acts, declarative gestures in this way complete the 'deictic space' within which both natural pedagogy and language use naturally occur. Its foundations are centrally affected in infants with autism spectrum conditions, where not only the personal pronouns but also declarative gestures as well as determiner phrases at large can be affected (Lee et al., 1994; Modyanova, 2009; Hobson et al., 2010; Curtin and Vouloumanos, 2013; Shield and Meier, 2014; Hinzen and Schroeder, 2015).

Having linked the 'pre-linguistic communication' mediated by ostensive signals and declarative gestures to the faculty of language<sup>1</sup>, we will reflect on the kind of knowledge that children can acquire or convey through communication in light of the linguistic structures that emerge throughout the one-word period. We will suggest that at the 'proto-determiner phrase stage' children can only acquire knowledge that is generalized to kinds and that the emergence of the 'proto-sentence stage' in language development allows them to cognize information in terms of temporal and spatial relations, i.e., to "reconstruct from some parts of the adult's (communication) a local, episodic content for the informative intention" (Csibra, 2010, p. 157). However, children's first assertions are bound to the here-and-now of speech. Language development not only expands these spatial and temporal limits, but also improves the capacity of children to understand and produce statements with sentential arguments that are anaphorically connected to entities and/or propositions that are given in the discourse.

We will argue for a faculty of language whose core function is to perform (through the production of linguistic structures) different referential acts in the spatial, temporal, and discourse domains, grounding all human-specific forms of referential communication — including infants' use of declarative gestures. In this way, language would be inherent to human-specific aspects of communication from very early in development, instead of being a 'tool' designed by and at the disposal of human communication only at later ages. Our view contrasts with the perspective of formal linguistics, which has left the referential aspect of language largely aside during the last 50 years, confining itself to an 'internalist' inquiry as defined in Chomsky (2000). Independent linguistic evidence as synthesized in Hinzen and Sheehan (2013), however, suggests that the full spectrum of forms of reference available to humans patterns along with grammatical configurations, rather than being governed by non-linguistic factors. Reference is thus inherent to grammar.

This illustrates that we are not merely continuing the old Humboldtian debate about the relative primacy of either language or thought, by arguing in favor of a 'language-first' view. Instead we advocate that a specific capacity, namely natural pedagogy, is inherently integrated with language, making them two sides of the same human-specific coin. In this way there would be a single evolving system, and the prediction is that natural pedagogy and language will never dissociate. An obvious way to explore this hypothesis further empirically is to compare typically developing children and children with communicative disorders regarding their capacity to learn different kinds of information through communication. In such a study, we would expect that particular problems in language development (e.g., a delay in the individual onset of proto-determiner phrases and proto-sentences) would be significantly associated to an atypical development of natural pedagogy (see Language and Learning from Communication as Two Non-Dissociable Capacities).

Connecting language to natural pedagogy could also motivate a new proposal within the currently stagnant debate about the origins and evolution of our linguistic capacity (Hauser et al., 2014). In contrast with living non-human apes, who basically learn traditions by *emulating* older generations (i.e., trying to reproduce the end result of actions by trial and error; Tennie et al., 2009)<sup>2</sup>, humans rely on communication as their main source of knowledge (Coady, 1973). If linguistic structures are *inherent* to human-specific forms of communication, as we defend here, then in exploring these structures we could better understand the main "social-cognitive skills that enable (humans) to develop, in concert with others in their cultural groups, creative ways of coping with whatever challenges may arise" and "deal with everything from the Arctic to the tropics" (Tomasello and Herrmann, 2010, p. 7). Perhaps the emergence of the so-called Mousterian stone tool technology in hominin evolution relied on this human-specific mechanism; after all, it succeeded the Acheulean technology, which is the stone tool tradition that has remained the longest in human evolution and yet "true and persistent innovation does appear to be lacking" in it (Ambrose, 2001; Nowell and White, 2010, p. 76). If we can show that the faculty of language is not simply 'a symbolic system' (an idea that perhaps is implicit in Csibra and Gergely, 2006, and in Tomasello and Herrmann, 2010) but *the* symbolic *and* referential system behind all human-specific forms of referential communication, the interpretation given by Wynn and Coolidge (2010; see above) that pointing "would have been enough" to teach apprentices how to produce the Mousterian tool in question would favor our hypothesis that at least a proto-language was in place by that point.

<sup>1</sup>In this paper we will mainly focus on declarative gestures, though we recognize the central role of ostensive signals not only for natural pedagogy, but for language use and development.

<sup>2</sup>Whiten et al. (2009) criticized the idea that apes are exclusively emulators, suggesting that they are also able to *imitate* others' strategies to achieve specific results. Be that as it may, for our present discussion it is enough to say that the transmission of traditions through communication is only observed in humans (Csibra and Gergely, 2006).

In summary, we will argue here for a faculty of language as a 'non-encapsulated' universal capacity that is inherent to aspects of communication and meaning that are human-specific, and we will do so by focusing on a core capacity for humans, namely natural pedagogy. In order to ground the present perspective, in the first section we will explore the connections between declarative gestures and the faculty of language in more detail, while in the second our focus will be on the relation between different linguistic structures and the kinds of knowledge that children can acquire or convey through communication. We will conclude by suggesting that human communication, and specifically our species-specific capacity to acquire cultural knowledge through it, is deeply rooted in the faculty of language.

## Declarative Gestures: Language's Illegitimate Child

Csibra and Gergely (2006, 2009, 2011) state that only humans among all living species have *natural pedagogy*: i.e., the capacity to transmit cultural knowledge through communication to new generations and the capacity of new generations to learn cultural knowledge from communication. Briefly, an adult manifests his communicative intention to a child by directing an 'ostensive signal' (e.g., eye contact) to her and then the child instinctively expects to receive new information about some object in the immediate surrounding world — a piece of information that she generalizes to every object of the same kind. Evidence shows that by 4 months of age infants already react to adult ostensive signals, but only by 10 months of age do these stimuli induce them (i) to expect and follow declarative gestures like pointing or gaze-shift to identify a referent in the world and (ii) to consider the adult's attitude toward the referent an informative behavior (Csibra, 2010). In other words, at 10 months of age infants expect and come to be part of a 'deictic space' within which cultural information can be acquired by connecting the third person (established at this moment exclusively through declarative gestures), the second and the first person (established through ostensive signals).

Csibra and Gergely (2006) argue that declarative gestures are our earliest form of referent assignment not only in development, but also in evolution. These gestures and, broadly speaking, "the ability to teach and to learn from teaching (are) a primary, independent, and possibly phylogenetically even earlier adaptation than language" (Csibra and Gergely, 2006, p. 2). Within this view, only symbolic and iconic gestures, but not indexical gestures like pointing, would be associated with language. Our goal in this section is to challenge this statement, presenting evidence that the human use of indexical gestures and natural pedagogy reflect structural aspects of language.

The relation between declarative gestures and language development has been explored in many studies (see for example Butterworth and Morissette, 1996; Markus et al., 2000). Colonnesi et al. (2010) examined twenty-five of these studies (734 children in total), concluding that pointing is related to speech both longitudinally and concurrently: (i) longitudinally, the amount of pointing produced by infants predicts their speech production rates (see also Butterworth, 2003) and (ii) concurrently, pointing is used in integration with speech. Importantly, they found statistically significant associations between declarative pointing and language already by 10–11 months of age — when infants start to produce declarative gestures but still do not produce words — and the strongest associations between 15 and 20 months of age. These associations were found for declarative pointing (i.e., a gesture that 'declares' a referent, e.g., when a child points at a dog) but not for imperative gestures (i.e., a gesture that children use to induce others to take an object for them, using other people as tools to solve an immediate problem).

Children first start to produce co-speech gesture combinations to convey 'reinforced information' (for example, pointing at a dog and saying 'dog'), and only later in development do they produce 'supplemented information' (for example, pointing at a dog and saying 'go'), a kind of combination in which each modality (speech and gesture) conveys different pieces of information (Goldin-Meadow and Butcher, 2003; Iverson and Goldin-Meadow, 2005; Özçalışkan and Goldin-Meadow, 2009; Cartmill et al., 2014). Importantly, the emergence of the latter never precedes the former in development, and each of these combinations predicts the individual onset of specific linguistic structures in speech: the individual onset of reinforced co-speech gestures predicts the individual onset of determiner phrases in speech, while the onset of supplemented co-speech gestures predicts the individual onset of sentences in speech (Goldin-Meadow and Butcher, 2003; Özçalışkan and Goldin-Meadow, 2009; Cartmill et al., 2014). The successive emergence of 'proto-determiner phrases' and 'proto-sentences' in the one-word period moreover parallels the fact that the words that children are producing at around 14 months of age are nouns related to people (e.g., 'baby,' 'dad' etc.), objects (e.g., 'banana') and animals (e.g., 'rabbit'), and expressive utterances like 'hello,' while only at around 19 months of age do they start to produce verb-like words like 'woof' and 'yes/no' answers. This developmental pattern is observed in signing and speaking children alike, as well as in monolinguals and bilinguals (Nelson, 1973; Holowka et al., 2002)<sup>3</sup>.

When humans produce or comprehend declarative gestures they are necessarily connecting referents in the external world to concepts in their internal world. Natural pedagogy can only transmit knowledge about kinds because such a connection exists. Our claim here is that the mechanism underlying this bridge between external and internal world *is* the faculty of language, which is our symbolic and referential system *par excellence*: the development of this very faculty leads children from the use of declarative gestures (alone or combined with meaningless vocalizations or one-word utterances) to a more complex set of 'resources', by which different forms of reference (such as nominal and temporal reference) and concepts can be linked in multiple ways, giving rise to a pedagogy that conveys different kinds of information<sup>4</sup>. This is why declarative pointing and speech are strongly related throughout development<sup>5</sup>, and, as we will suggest in the remainder of this section, this is also the reason why non-human animals (chimpanzees, cats, dogs, dolphins etc.) *do not* produce or comprehend declarative gestures (pointing and gaze following) in the same way that infants at 10 months of age do.

Evidence demonstrates that chimpanzees do not comprehend pointing as a declarative gesture (Povinelli et al., 2003; Miklósi and Soproni, 2006). Povinelli et al. (1997) trained seven chimpanzees to use the experimenter's pointing gestures to locate a treat hidden in one of several possible locations. After many trials, the apes responded to these gestures very accurately, so the researchers increased the distance between the correct location of the treat and the distal end of the experimenter's pointing. In this situation, the success rate of five of the seven chimpanzees decreased from 100% of correct choices to chance levels, leading the researchers to conclude that "apes were simply focusing on the local configuration of the experimenter's hand and the box" (Povinelli et al., 2003, p. 60). However, since two apes still performed above the chance level, the researchers conducted a new experiment with the seven chimpanzees: in one case the experimenter was closer to the incorrect location and in another case the tip of the experimenter's finger was equidistant from the two possible locations (in both cases, of course, the experimenter was pointing to the correct location). Results showed that all chimpanzees made the wrong choice in the condition where the experimenter was placed closer to the incorrect location; in the other condition, all apes performed randomly. Finally, and essential to our discussion, the authors also observed that 3-year-old children were perfectly accurate from the first trial onward in the same experimental procedure<sup>6</sup>.

The study of Povinelli et al. (1997) thus shows that after much training chimpanzees can learn that some perceptual aspects of the experimenter's physical disposition can be used as 'hints' to determine the location of the treat. This contrasts strongly with infants, who spontaneously start comprehending and producing declarative gestures by 10 months of age (Butterworth, 2003; Cartmill et al., 2012). On the other hand, chimpanzees seem to perform much better in tasks involving gaze and head movement: they follow the experimenter's line of sight even when it projects outside their perceptual field (an ability that emerges in children only by 18 months of age; Butterworth, 2003) and they also take into account that this line of sight cannot cross opaque screens (Povinelli and Eddy, 1996). Can this be evidence that chimpanzees comprehend others' gazing at a target as a declarative gesture, just as humans do?

We believe that the answer is no, but before explaining our position we also want to consider briefly the ability of some non-primates to take into account human pointing gestures. Cats, dogs, dolphins, and seals perform the experiment described before (Povinelli et al., 1997) much better than chimpanzees and they do it at a high level from the beginning of the test, just like children (Miklósi and Soproni, 2006). Furthermore, dogs seem to improve their performance even more when the pointing gesture is preceded by eye contact (Miklósi and Soproni, 2006), which is a strong parallel with children's sensitivity to adults' ostensive signals. All this raises the question whether both sensitivity to ostensive signals and declarative gestures, far from being specific to humans, might be something that can independently emerge in cooperative species (e.g., dolphins) and/or can be the evolutionary consequence of domestication (which would also explain why dogs perform better than wolves in the experiment mentioned above) (Miklósi and Soproni, 2006; Topál et al., 2009).

<sup>3</sup>While we strongly agree with Iverson and Goldin-Meadow (2005) that 'gesture and speech form a single integrated system,' for these authors human gesture 'paves the way' for or 'facilitates' language development. By contrast, we suggest that infants' declarative gestures are themselves the expression of emerging linguistic structures, structures that gradually become more complex throughout the development of the faculty of language. This perspective makes sense of the humanly unique features of declarative gestures such as their bipartite structure, the inherent intentionality (with a 't') and intensionality (with an 's') of the forms of reference involved (see further discussion at the end of this section), and their central role in the emergence of natural pedagogy.

<sup>4</sup>Importantly, declarative gestures not only start out as part of our referential, linguistic system, but they crucially *remain* an inherent aspect of this system once it has developed fully. In particular, this kind of gesture is a fundamental ingredient in demonstrative reference with deictic expressions such as 'this' and 'that', which are universal (Diessel, 2006). Deictic reference has long been noted to be disturbed in people on the autism spectrum (Hobson et al., 2010), a disturbance that is, as we would predict, part of larger significant anomalies in the referential use of language in this population (Modyanova, 2009). Interestingly, deictic gestures do not seem to be impaired in children with SLI (Iverson and Braddock, 2011), and therefore we would expect them to have a better control of the grammar of nominal structure compared to children on the autism spectrum, although they do show problems with it as well, such as producing significantly more substitutions of definite articles than age-matched TD children (Polite et al., 2011; Chondrogianni and Marinis, 2015).

<sup>5</sup>Our perspective in this sense is compatible with McNeill's (2014) general view that some gestures and speech comprise a single, integrated multimodal system, while there are also early gestures not related to it. The latter, according to him, are quite different from gestures that are unified with speech in what he calls a 'dual semiosis', i.e., when "gesture and speech become co-expressive rather than supplemental" (Levy and McNeill, 2015, p. 173).

<sup>6</sup>For Povinelli et al. (2003), the reason behind children's success in this experiment is their capacity for theory of mind, something that they claim to be absent in chimpanzees. We, on the other hand, suggest that their comprehension of declarative gestures is above all related to the referential mechanism of human language. The described study cannot exclude our position, which is supported by the evidence presented throughout this section. Independently of that, much evidence, reviewed in De Villiers (2007), suggests that full and explicit theory of mind is language-dependent. In this way, even if we attribute some form of theory of mind to one or another non-linguistic species, this does not mean that the members of this species think propositionally and have a capacity for intentional reference (see Fitch, 2010, pp. 187–194).

The main problem for this line of thought is that these experiments do not show that the same interpretative bias lies behind the correct behavioral response of chimpanzees, dogs, and infants (Povinelli and Eddy, 1996; Topál et al., 2009). For example, babies at 6 months of age also seem to be able to follow adults' gaze (Butterworth, 2003), but they do this differently from infants at 10 months of age, in two respects. Firstly, the precise identification of the target is determined by the salience of the object in the situation, a mechanism that Butterworth (2003) called the 'ecological mechanism of joint visual attention.' In our view, an analogous 'ecological mechanism' can be suggested for animals like dogs: they seem to try to satisfy the instructor's expectation by bringing him (or finding) some salient object whose location is indicated by pointing or gazing (Topál et al., 2009). Secondly, we use declarative gestures for more than directing others' attention to salient objects, and infants by 10 months of age are aware of this: they expect to receive new information about the kind of the assigned referent.

Therefore, both chimpanzees and dogs are able to perform as well as infants in tasks involving, respectively, gaze following and pointing, and both seem to be sensitive to ostensive signals (Miklósi and Soproni, 2006), but we have seen that only in the case of infants do ostensive signals create the expectation that new information about the kind of the assigned referent will be transmitted. This can be explained in light of the faculty of language, which is at the same time a referential *and* a symbolic system, i.e., a system that connects the external world to our internal, conceptual world. Although infants by 10 months of age still do not produce words, this system has already started to develop: they can only acquire knowledge about *kinds* because (i) they hold concepts in relation to these kinds, and (ii) they can link these concepts to assigned referents in the situational context (Hinzen and Sheehan, 2013, ch. 2; Bickerton, 2014).

The use of artificial language by apes illustrates very well the unique character of human language as a referential and conceptual system. Cartmill and Maestripieri (2012) observed that apes can use arbitrary gestural symbols that are not linked to internal states like emotions, they can map these symbols to objects of the world and they can learn these symbols from passive observation. However, the authors affirm that although apes are (i) "provided with individual units that are analogous to human words (i.e., referential, arbitrary, taught)" (Cartmill and Maestripieri, 2012, p. 19), they (ii) "do not display any aptitude in combining the units in a systematic or meaningful way." The problem here is that reference emerges in human language only *from* the structure of phrases, not from words alone (Rozendaal and Baker, 2008, 2010; Martin and Hinzen, 2014); being able to "combine the units in a systematic or meaningful way" (ii) is therefore a necessary condition for human referentiality. For example, the arguments of the sentences 'a cat meows,' 'the cat meows' or 'this cat meows' are not 'referential isolated words' but determiner phrases, i.e., they combine referential operators with nouns. In short, the word 'cat' alone is not referential at all. Furthermore, the position of the determiner phrase in the sentence structure can prevent its referentiality: in 'a thief entered,' the determiner phrase 'a thief' picks out a referent, while in 'that guy is a thief' the same determiner phrase 'a thief' works as a predicate (picking out a property ascribed to a referent instead of a referent).

Referentiality in humans is a combinatorial phenomenon *par excellence*; an inaptitude in "combining the units" therefore suggests that apes cannot display the kind of referentiality produced by human language either. This combinatorial aspect of human referentiality explicitly guides infants' use of declarative gestures: at the beginning these are often produced with meaningless vocalizations (Cartmill et al., 2012), which give way to one-word utterances by 12 months of age. Importantly, children's initial vocabulary seems to be related to the number of different kinds of objects that they point to before the one-word period (Iverson and Goldin-Meadow, 2005), which indicates that lexical concepts are already in place at this moment, being combined with declarative gestures in children's communication. In the terms of Martin and Hinzen (2014), in a definite description like 'the dog,' the determiner 'the' is the 'edge' of the phrase and regulates its referentiality (determining definiteness in this instance), while 'dog' is the 'interior' of the phrase and determines the descriptive content involved in the act of reference. Therefore, infants' declarative gestures express the referential *edge* of the determiner phrase, while their words (pronounced or not) are related to the conceptual *interior* of this nominal structure (which is linked to their knowledge about kinds)<sup>7</sup>. In short, while Cartmill and Maestripieri (2012) state that non-human apes can use an artificial language referentially but not combinatorially, we state that human language is referential because it is combinatorial: not combinatorial in a generic sense (of a type, for example, that can be found in artificial languages or music as well), but in a sense related specifically to grammar, which correlates with the genesis of referentiality in language.

To stress our point, we agree with Petitto (2000, p. 383) that it remains uncontroversial that "all chimpanzees fail to master key aspects of human language structure, even when you give them a way to bypass their inability to speak – for example, by exposing them to (*...*) natural signed languages" (see also Tomasello, 2008). For her, and for us as well, this indicates that chimpanzees lack cross-modal mechanisms that ground the development of both signing and speaking of any natural language, rather than merely mechanisms for perceiving and expressing speech sounds. In our view, however, these cross-modal linguistic mechanisms do not only involve the necessary ability to "detect aspects of the patterning of language (*...*) the temporal and distributional regularities initially corresponding to the syllabic and prosodic levels of natural language organization" (Petitto, 2000, p. 397), but also the capacity to perform reference. Indeed, this referential mechanism seems to play an important role in the acquisition of native phonetic structures: at 9 months of age, infants enhance the discrimination of sounds that co-occur with distinct referents (Yeung and Werker, 2009), at the same time that their ability to statistically learn phonetic categories starts to decrease (Yoshida et al., 2010).

<sup>7</sup>Iverson and Thelen (1999) observe that speech and gesture are strongly synchronized in adults but not in children, even when gesture and vocalizations occur together. The authors then propose, based on neurophysiological and neuropsychological evidence, to account for the timing relationship between them throughout development as follows: "During the time when infants are just beginning to acquire many new words, speech requires concentration and effort, much like the early stages of any skill learning. As infants practice their new vocal skills, thresholds for hand–mouth activity decrease, and (*...*) (when) the level of activation generated by words is well beyond that required to reach threshold, it has the effect of capturing gesture and activating it simultaneously" (p. 35). In our view, this explanation accounts well for the fact that declarative gestures and 'nonpronounced words'/meaningless vocalizations/words could still be connected to the same linguistic structure in infants' mind, even when their gestural and oral production are not strongly synchronized yet.

The combinatorial nature of referentiality in humans (i.e., a referentiality grounded on linguistic structures formed by a referential edge and a semantic interior) explains a further, long-noted aspect of 'intentionality' (with a 't'), namely 'intensionality' (with an 's'), which is induced by the lexical description of the nominal phrase. By (human) intentionality (with a 't') we mean the deliberate reference to things based on internal concepts, while intensionality (with an 's') arises because, if I know a referent under one description, I may of course not know it under an indefinite number of others; in other words, descriptions applicable to the same referent could be non-equivalent in the subject's mind. Thus I may not know that a colleague, Mr. Smith, is also my wife's secret lover, or my daughter's most hated teacher. My thought or statement that Mr. Smith is an honorable gentleman is therefore inaccurately (or at least misleadingly) reported as the thought or statement that my wife's secret lover is an honorable gentleman, even if the two descriptions pick out exactly the same man. Now, it would be equally misleading for someone to say, if I *point* to what is (for me) Mr. Smith, that I pointed to my wife's lover: the description stands between the referent and the person referring, as it were, and also in pointing, reference is systematically dependent on description. If declarative gestures exhibit intensionality in this sense (and consequently intentionality, as the latter is inherent to the former), it is hard to see how they are not inherently linguistic, given the inherent difficulty of establishing intensionality for any non-linguistic animal (Davidson, 1982)<sup>8,9</sup>.

Natural pedagogy, then, could, as we have argued, be the comprehension side of a coin that has proto-determiner phrases as its production side. Through natural pedagogy, infants connect assigned referents in the external world to concepts in the internal world, promoting an 'exchange' in which their current knowledge 'explains' the stimuli and interlocutors' behavior toward the stimuli modifies infants' current knowledge. The emergence of proto-sentences in language development will be equally related to the emergence of a new pedagogy: one that is based, as we will argue in the next section, on the transmission of knowledge about *facts*.

Therefore, if we take as 'declarative' only the gestures that are used as expressions of nominal 'edges,' linking the external world to our conceptual/internal world, these gestures are not only human-specific but linguistically based. In this way we disagree with views that describe declarative gestures as merely something used to "re-direct(s) the partner's attention to some distant object or event" (Leavens, 2004, p. 395). This is a necessary but not a sufficient condition for declarative gestures in the sense that we have assumed here. 'Declarative gestures' as defined by Leavens (2004) can be comprehended by distantly related species like dogs, cats, dolphins, seals, and also chimpanzees (in this latter case only gaze and head movement), hence a necessary distinction is missed. Declarative gestures in our sense seem to have only emerged in hominin evolution, being not only related to the emergence of natural pedagogy but also to the emergence of a (proto-) language that allowed our ancestors to produce (at least) proto-determiner phrases<sup>10</sup>.

In the following section we will try to demonstrate that natural pedagogy can be better understood if we take into consideration the specific developmental stage of language that parallels its emergence. In doing so, we will be able to understand not only natural pedagogy but also the emergence of other forms of communicative learning.

## Language and Learning from Communication as Two Non-Dissociable Capacities

In this section we will defend the hypothesis that the faculty of language and the capacity to learn from communication are intrinsically related. In order to do so, we will argue that the earliest form of communicative learning to emerge in development, natural pedagogy, can be better understood in light of the first kind of linguistic structure that infants produce, namely what we called proto-determiner phrases. On the other hand, the emergence of sentence-like structures in language development gives rise to another form of 'pedagogy': one that conveys information about particular events, actions, and states of affairs. Both pedagogies presuppose a 'communicative triangulation' between the speaker (the grammatical first person), a hearer (the second), and an assigned referent (the third), but only sentential structures can produce statements about the world, statements that, by their very nature, can be true or false. Finally, we will show that language development gradually frees children's statements from their temporal, spatial, and anaphoric ties, allowing them to talk about entities that are not physically present in the situational context, events that happened or will happen in a remote past or future and entities and/or claims that were previously mentioned in a conversation.

<sup>8</sup>Throughout this paper, we assume a crucial distinction between animal abstraction and human concepts, explicated in more detail in Hinzen and Sheehan (2013, ch. 2). Animals can form abstract perceptual stimulus classes, which order their experience in adaptive ways. This is a necessary but not a sufficient condition for human concepts. Concepts are abstractions that necessarily exist as the 'interior' of linguistic structures. These linguistic structures allow us to establish connections between the external and the internal world without the necessity of a perceptual mediation. In non-human animals, their perceptual input activates and 'combines' with their abstract knowledge, but human abstractions can be associated with linguistic 'edges' instead of percepts.

<sup>9</sup>Full (explicit) theory of mind inherently involves an understanding of both intensionality and intentionality, since beliefs that we attribute to agents have intentional contents (they are intentionally directed at objects), and these contents feature concepts that can give rise to intensionality effects (objects referred to do not have the properties that the concepts of them capture and vice versa). It is in line with the present viewpoint that there is extensive evidence for a developmental link between language, explicit theory of mind, and intensionality (Rakoczy et al., 2015), as well as language (specifically, the understanding of finite clausal complements around the fourth birthday) and explicit theory of mind (De Villiers, 2007; De Villiers and De Villiers, 2012). Further evidence for this link comes from children with autism spectrum conditions (Paynter and Peterson, 2010), and from overlaps in the neural correlates of theory of mind and the language comprehension network (Ferstl et al., 2008). Astington and Jenkins' (1999) classic longitudinal study of 3-year-olds found that, controlling for earlier theory of mind, earlier language abilities predicted later theory-of-mind test performance, while the reverse, controlling for earlier language, was not the case. On the other hand, theory of mind is arguably a composite function involving a number of different cognitive abilities, including face recognition (in seeing infants), empathy, tracking intentions and goals, and other abilities besides language.

<sup>10</sup>Tomasello (2006, p. 520) suggests that "asking why only humans use language is like asking why only humans build skyscrapers (*...*) (and so) asking why apes do not have language may not be our most productive question. A much more productive question (*...*) (is) why apes do not even point". But it follows from our account that these two questions are precisely related: the answer why apes do not point may lie in the fact that they do not have a faculty of language.

Csibra and Gergely (2006, 2009, 2011) point out that natural pedagogy is specific to humans, not because no other animal can communicate or learn, but because non-human animals are not able to learn generic knowledge *from* communication. The problem is that animal forms of communication like alarm calls (i) always convey fixed configurations of message and referent and (ii) are always restricted to the immediate situation of the subjects: for example, they alert conspecifics to the presence of predators, indicating with a single signal that, say, an aerial predator is approaching (Csibra and Gergely, 2011). Natural pedagogy, however, can convey a potentially unlimited range of information about the same referent, and this information is generalized to other objects of the same kind. In other words, we can point at a bird and communicate many different things about it, and the hearer will apply this information at other times and places to entities of the same kind. This suggests that at the proto-DP stage, where sentential configurations are still missing, new information is not actually tied to time and space. As we shall see below, what changes in the proto-S stage is not the elements of abstraction (e.g., lexical concepts), which entail, *ipso facto*, generality and function predicatively even in the proto-DP stage, but children's capacity to grammatically cognize temporal and spatial relations through sentences.

As noted, humans use ostensive signals (e.g., eye contact) to demonstrate their communicative intention to an interlocutor (Csibra, 2010), and adult ostensive signals cause infants from approximately 10 months onward not only to follow the adult's deictic gestures (like gaze-shift or pointing) but also to expect novel information about the referent's kind. Furthermore, infants within ostensive communication assume that this novel information is available to everyone, reacting when subjects other than the interlocutor do not take the generic information into account (Gergely et al., 2007). In this way, infants do not relate an interlocutor's positive attitude toward, say, a plate of broccoli to his or her mental state, but to the properties of broccoli as a kind (e.g., 'broccoli is good'), and consider that this property is available to other subjects as well.

Our hypothesis is that children's capacity to acquire and transmit knowledge through communication develops in connection with language. Natural pedagogy is thus related to the emergence of proto-determiner phrases, and this very fact gives us insight into why natural pedagogy transmits generic knowledge about kinds. The explanation is the following: sentence structures, but not determiner phrases, relate information to sentential arguments and to a time span, i.e., a time that can precede, contain, or follow the time of utterance, as in the past-tensed statement 'the book was on the table' (Klein, 1998, 2006). Therefore, when acquiring knowledge through natural pedagogy, infants seem to take assigned referents as 'physical expressions of concepts,' in such a way that any new information about these referents automatically constitutes new information about the concepts with which these referents are associated. The sentential complexity needed to restrict a predicate to a time and context is simply not yet there.

Relating natural pedagogy to the proto-DP stage can also explain why 12-month-old infants seem to point declaratively essentially to obtain generalizable information about the world and not to inform interlocutors about the situational context (Southgate et al., 2007). In our view, children can only inform others when they are able to take referents as arguments of sentential predicates, as in the case described by Lock (1997) in which a child uttered the word 'dog' and, when her mother asked 'what is the dog doing?', said 'woof'. Before that, however, they use declarative gestures exclusively to indicate the objects of their interest, stimulating adults to convey new information about their kinds. This is indeed the only scenario that we could expect. If children at the proto-DP stage can only extract generic knowledge from communication, how could they convey non-generic information about the situational context?

For this reason, we think that we should nuance Csibra and Gergely's (2006, p. 6) argument that natural pedagogy is connected to "the predicate-argument (knowledge-referent) structure of human communication." This is true if we consider that natural pedagogy involves the connection of properties (semantic/conceptual knowledge) to referents, but false if we infer from this that semantic content and referents are connected through *sentence-like* constructions, as this kind of structure only emerges in child development by 18 months of age (i.e., approximately 8 months after the emergence of natural pedagogy) (Goldin-Meadow and Butcher, 2003; Iverson and Goldin-Meadow, 2005; Özçalışkan and Goldin-Meadow, 2009). Suggesting that natural pedagogy involves *sentential* predicate-argument structures would go against the developmental pattern of language described in the previous section and undermine a linguistic explanation for the human-specific capacity to acquire, through communication, different kinds of information: respectively, *knowledge about kinds* and knowledge about particular events, actions, and states of affairs, which we will call here simply *'knowledge about facts.'* From this perspective, we hypothesize that at the proto-DP stage children would be able to learn through communication that 'broccoli' (as a kind) is good, but not that something specific happened to their plate of broccoli, for example that it fell down. The onset of the latter capacity would predict (or be predicted by) the onset of proto-sentence production.

We are currently exploring this hypothesis through a longitudinal study that aims to (i) analyze children's production of gestural and oral communication throughout the one-word period and (ii) verify children's capacity to acquire information about specific events, using a version of Ganea et al.'s (2007) experimental design with stuffed toys. In their study, infants were told that a particular stuffed toy that had earlier been named had undergone a change in state while out of view. Subsequently, the infants' capacity to identify it exclusively on the basis of its new state was verified. Although the aim of the authors was to check children's capacity to incorporate "(communicative) information into one's mental representation of the absent object," we have decided to go one step further and see whether children's success in this test is significantly correlated with the individual onset of proto-sentence production. We also involve children with communicative disorders, specifically regarding their production of communicative gestures (i.e., declarative, descriptive, and symbolic gestures) and words.

An essential distinction between knowledge about kinds and knowledge about facts is that only the latter can bear truth value: it is connected to sentence structures, which are our only means to acquire and convey true/false information about the world<sup>11</sup>. This seems to be in consonance with Prasada (2000, p. 67), who says that a key aspect of knowledge about kinds is that "(it is) not rendered false by the existence of instances that lack the essential property" (e.g., the existence of a three-legged dog does not make us abandon the idea of dogs being four-legged<sup>12</sup>). In this way, the production of sentential structures by the human mind would not be necessary for the acquisition of generic knowledge about kinds through communication, although, of course, we can express generic information through them (e.g., 'dogs are four-legged').

Determiner phrases allow us to cognize object reference but not temporal reference<sup>13</sup>, which is a fundamental component of non-generic statements (Klein, 1998, 2006; Sheehan and Hinzen, 2011; Martin and Hinzen, 2014). When adults make claims about particular events or situations, these are always referred to as preceding, containing, or following the time of utterance (Bonomi, 1995; Klein, 1998, 2006), in such a way that the truth of these assertions is limited to their specific 'temporal frames'. For example, if I say 'Cristina was drunk,' the finite verb 'was' indicates that this claim is about a situation that *precedes* the time of utterance, therefore shifting temporal reference to the past and restricting truth to this time span. Importantly, that 'Cristina *was* drunk' is true does not indicate that 'Cristina *is* drunk' is necessarily false: 'was' does not establish when the situation ends, it only indicates *for which time span the state of affairs described by the statement is supposed to be assessed as true*<sup>14</sup>.

Someone could suggest that the so-called 'tenseless languages' challenge our hypothesis about the intrinsic connection between assertion and temporal reference in grammar. Speakers of, for example, Germanic and Romance languages use finite morphology to mark the time span of events referred to in assertions, but languages like Yucatec Maya (Bohnemeyer, 2009) and Tupí-Guarani (Tonhauser, 2011) are said to be tenseless. However, the question in these cases is how interlocutors connect statements to time spans and not whether these statements are or are not linked to them (Bohnemeyer, 2009). For our purposes, then, it is enough to say that languages have different means of encoding the time span of assertions and that these means emerge gradually in language development.

Another possible criticism is that linguistic resources like finite morphology and temporal adverbs are not yet available when children start to make assertions (Blom, 2003; Dimroth et al., 2003; Jolink, 2005), and therefore their claims would not be circumscribed to any temporal frame. Evidence nevertheless shows that children's untensed claims are by default related to the time of utterance: from the proto-sentence stage to approximately 31 months of age, children seem to make claims only about events, actions, and states of affairs that happen at around the moment of their speech (Morford and Goldin-Meadow, 1997). The ability to make reference to remote events in the past or future seems to be related to the development of finiteness in language, which starts to emerge by 24 months of age and is fully mastered by 36 months of age (Blom, 2003; Dimroth et al., 2003; Jolink, 2005).

Morford and Goldin-Meadow (1997) also noted that the home-signing deaf children in their study, despite the lack of a conventional language model to learn from, first started to talk about events that happened or were about to happen at around the time of their signing and only later communicated about events in a distant past or future. Therefore, although the lack of linguistic input seems to have delayed the maturation and performance of temporal reference in the home-signing deaf children of the study (they talked about both near and distant events less often, and started to do so over a year later than hearing children), the development of temporal reference followed the same stages observed for hearing children. It therefore appears that temporal reference is such a fundamental milestone in the development of the faculty of language (and consequently, of human communication) that even in the absence of linguistic input, the home-signing deaf children developed their own means to talk about remote past or future events, e.g., creating novel gestures or adapting some conventional gestures from their hearing community in order to mark temporal displacement.

<sup>11</sup>In formal terms, a predicate of the form 'dog' that is part of a pointing gesture at the proto-DP stage need not automatically be interpreted non-propositionally, after a translation into a formal language. That it corresponds to a proposition would mean that the child, effectively, is expressing the proposition that the object pointed to is a dog. In this case, there are propositions the moment that there are pointing gestures. In particular, where 'dog' is a noun, (N dog), the property of being a dog obtained through abstraction would be λx. dog(x). The formula [dog(x)]<sup>*g*[*a/x*]</sup> can then be defined as true in model M iff the individual constant *a* is a dog in M under the variable assignment *g*. A child's act of pointing can now be understood as an assignment in this sense, and the reinforced pointing gesture as conveying the proposition that the object pointed at is a dog. We do not question that such a formal translation is possible. Our empirical claim is that, at the point of the proto-DP, a full model in which propositions can be *cognitively* evaluated as true or false is not yet available. We thank Hannes Rieser for conversations on this issue.

<sup>12</sup>Prasada (2000) is not talking about statements of statistical prevalence like "all dogs are four legged" or "X% of dogs are four legged". According to the author, knowledge about kinds allows us to "explain the existence of an essential property in an exemplar by citing the kind of thing it is" (Prasada, 2000, p. 66), as in the following example cited by the author on page 67:

Why does that have four legs? (pointing at a dog)

Because it is a dog.

<sup>13</sup>We are not saying here that determiner phrases cannot specify temporal information lexically, in their 'interior' (the NP-part of a complex DP), which a simple example like 'John's smile at last night's party' would be enough to falsify. We are claiming that a complex DP like this one is crucially different from a sentence like 'John smil*ed* at last night's party', which establishes temporal deixis grammatically. In the former expression, which unlike the latter cannot as such be true or false, the prepositional phrase 'at last night's party' descriptively precisifies the assigned referent. In the latter, the verbal inflection does not have any descriptive function for the referentiality of the sentential argument ('John'), but plants a new referential 'flagpole' (a temporal one) to which the lexical concept 'smile' is attached. The result is reference to an event as opposed to an object, together with a temporal relation of this event to the time of the speech event.

<sup>14</sup>Klein (1998) illustrates this point with the sentence 'John was dead.' Unless you believe in the possibility of resurrection, John is still dead and will continue being dead. Therefore, the finite element 'was' only indicates that he supposedly died before the time of utterance, not the end of the situation.
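The idea that tense fixes the time span for which an assertion is assessed, as discussed in this section for 'Cristina was drunk', can be sketched schematically. The symbols below are illustrative assumptions and do not appear in the works cited: $t_u$ stands for the time of utterance, $T$ for the time span the assertion is about, and $\varphi(t)$ for the described state of affairs holding at time $t$.

```latex
% Assumed notation (not in the source): t_u = time of utterance,
% T = time span the assertion is about, \varphi(t) = the described
% state of affairs holds at time t. Requires the amsmath package.
\begin{align*}
\text{past:}    \quad & T < t_u   && \text{($T$ precedes the time of utterance)} \\
\text{present:} \quad & t_u \in T && \text{($T$ contains the time of utterance)} \\
\text{future:}  \quad & t_u < T   && \text{($T$ follows the time of utterance)} \\[4pt]
\text{truth:}   \quad & \forall t \in T:\ \varphi(t) && \text{(the assertion is assessed only within $T$)}
\end{align*}
```

On this sketch, 'Cristina was drunk' only requires $\varphi$ to hold throughout some $T$ with $T < t_u$ and says nothing about times outside $T$, which is why its truth is compatible with 'Cristina is drunk' also being true.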

Apart from releasing children's statements from their 'temporal ties,' language development also frees them from their 'spatial' and 'anaphoric' constraints. Let us consider the following example: 'A racoon chased the cat.' In this sentence, the indefinite noun phrase 'a racoon' introduces a new referent into the conversation (in languages like English and French, indefinite noun phrases cannot be used to refer to given referents; De Cat, 2004; Rozendaal and Baker, 2008), while the definite noun phrase 'the cat' refers either to a given referent in the discourse (i.e., to a cat that was previously mentioned in the conversation) or to a cat that the interlocutors mutually know from before (Rozendaal and Baker, 2008). In relation to adding new referents to a conversation, we have seen that children in the one-word period still do not use indefinite or definite noun phrases to assign referents but rather use declarative gestures, which makes these toddlers highly dependent on the situational context<sup>15</sup>. With regard to anaphoric reference to elements (entities or propositions) that were previously mentioned in a conversation, children simply seem to omit them in their utterances (as in the example mentioned before in which the child said just 'woof,' omitting the agent of the action, the dog, that had already been referred to in her conversation with her mother). This represents an insuperable barrier to managing conversations with many competing given referents, as is probably the case in most adult conversations; indeed, this seems to be a problem even for children at the beginning of the multi-word period (Salazar Orvig et al., 2010).

In this way, children's statements are at the beginning completely tied to the here-and-now of speech and generally restricted to a few referents (if not a single one). Then, throughout language development, children gradually shed these ties. By 24 months of age they start assigning referents that are not necessarily present in the situational context through determiner phrases in speech, and by 31 months of age they start to talk about events located in a remote past or future through linguistic resources like tense morphology, temporal adverbs, etc. Finally, the emergence of anaphoric resources in language allows children to grammatically articulate different given elements of a conversation in new, asserted information, as in the case of the simple sentence 'she did it' (Lambrecht, 1994), in which all constituents have an anaphoric form but the sentence itself adds a new fact for the interlocutor.

To summarize, we have argued in this section that knowledge about kinds is grounded in (proto-)DP structures, which emerge approximately 8 months before (proto-)sentences in development. Only sentence structures can bind information to a time span and to sentential arguments, and this is the reason why the knowledge conveyed through natural pedagogy is never restricted to the referent in the situational context but generalized to all other objects of the same kind. Furthermore, we have argued that the development of linguistic resources for nominal and temporal reference in speech not only frees children's statements from their spatial and temporal ties, but also allows children to grammatically connect their assertions to entities and/or propositions that were previously mentioned in a conversation. All in all, therefore, language and communicative learning go hand in hand in a very specific sense: the kind of knowledge that humans can exchange through communication is grounded in the linguistic structures that we are able to cognize in the course of development. In our view, communicative learning is rooted in the faculty of language rather than being a different and unconnected human-specific trait. This is a parsimonious conclusion considering that, in general, evolution is a conservative process, which means that "novel applications generally arise via utilization of preexisting mechanisms" instead of "depending upon *de novo* mutation and selection" (Richman and Naftolin, 2006, p. 7).

# Conclusion

We have defended a perspective in which language and learning from communication form two non-dissociable capacities. From this perspective, natural pedagogy represents an initial challenge, since it was originally proposed as a non-linguistic (although human-specific) capacity, both in development and evolution (Csibra and Gergely, 2006). However, we have argued in Section "Declarative Gestures: Language's Illegitimate Child" that declarative gestures, which are fundamental for natural pedagogy as they are the first form of referent assignment that infants can understand and produce, are the Achilles heel of this hypothesis. Firstly, children's initial vocabulary seems to be linked to the number of different kinds of objects that they point to before the onset of the one-word period (Iverson and Goldin-Meadow, 2005), which indicates that lexical concepts are being combined with declarative gestures at this moment. Furthermore, although by 10 months of age infants are still unable to produce words, they have started to understand lexical concepts insofar as they acquire generic information about referents' kinds. These symbols are also behind both the intentionality (with a 't') and the intensionality (with an 's') of declarative gestures. We have seen in Section "Declarative Gestures: Language's Illegitimate Child" that, despite the fact that animals like dogs seem to be sensitive to ostensive signals and to understand the directionality of pointing, they never expect to receive new, generic information from communication (Miklósi and Soproni, 2006; Topál et al., 2009). Humans seem to comprehend declarative gestures in a way that can only be explained in light of a system that is symbolic and referential at the same time, a system that no other living animal has. Evidence and parsimony suggest that language is the best candidate that we can appeal to in this regard.

<sup>15</sup>There is a dispute regarding whether children can also use pointing to 'now-empty locations' to indicate an object that is no longer present (see Liszkowski et al., 2007, for a defense of this claim and Southgate et al., 2007, for a criticism of it). This discussion is not fundamental here because in both cases pointing has a deictic function (i.e., children use it contingently on the immediate surrounding world, even if they are trying to denote a 'now-absent object'). Be that as it may, we will adopt for explanatory reasons the claim made by Southgate et al. (2007) that children can only use pointing in reference to present or occluded objects.

Moreover, combinations of declarative gestures and lexical concepts obey a developmental pattern: children start by combining pointing and isolated words to 'reinforce' the identity of referents in the situational context (e.g., pointing at a dog plus the word 'dog'), and only later in development do they combine gesture and isolated words to produce 'supplementary' meaning (e.g., pointing at a dog plus the word 'go'). We have seen that the individual onset of these stages predicts, respectively, the individual onset of determiner phrases and sentences in two-word speech, which is why we called them the proto-DP and proto-S stages.

In the same way that natural pedagogy and the proto-DP stage are two sides of the same coin, the emergence of the proto-S stage in development gives rise to a pedagogy with new properties. While natural pedagogy conveys *knowledge about kinds*, the pedagogy based on sentence structures conveys *knowledge about facts*. Knowledge about kinds would be not only generic but unfalsifiable, while knowledge about facts can be non-generic and falsifiable — being bound both to sentential arguments (expressed through definite and indefinite noun phrases, bare plurals, pronouns etc.) and to verbal inflections that specify for which time span the piece of information is supposed to be assessed as true (the past, present, or future of the time of utterance). For example, from our perspective children's capacity to understand through communication that a specific stuffed toy has fallen or got wet would rely on their mental ability to build sentence structures — a prediction testable in different populations, as noted.

Furthermore, we tried to explore in more detail the proto-DP and proto-S stages that we outlined in Section "Language and Learning from Communication as Two Non-Dissociable Capacities". First, we have seen that at the proto-DP stage, infants and young children are able to introduce referents for a conversation, but they cannot talk about them. In our view, this is because they do not yet produce sentential predicate-argument structures. Second, we have argued that at the beginning of the proto-S stage, children's statements are bound to the place and moment of the conversation: they can only introduce referents through declarative gestures, and their statements are never related to a remote past or future (Morford and Goldin-Meadow, 1997). The more the use of determiner phrases and finiteness in speech increases, the more communication becomes *relational*, allowing children to introduce referents that are not present in the situational context (i.e., the 'here' of the interlocutors) and to talk about distant events in the past or in the future (i.e., the 'now' of the interlocutors) (Morford and Goldin-Meadow, 1997; Rozendaal and Baker, 2008). Finally, we have also argued that language development improves children's capacity to perform anaphoric reference to different given elements in a conversation, either entities or propositions (Lambrecht, 1994), which allows interlocutors to grammatically articulate them with their assertions.

In short, the faculty of language is responsible for giving rise to the different kinds of information that we can transmit or acquire through communication throughout our lives. Language does so by producing structures that are formed by a semantic 'interior' and a referential 'edge'. These structures ground different forms of nominal reference, such as 'a cat,' 'the cat,' 'this cat,' etc.<sup>16</sup> (Martin and Hinzen, 2014), as well as different forms of temporal reference, such as 'he *refused* a job'. Assertions necessarily involve both temporal and nominal reference (the latter through the sentential arguments of the assertion), and their truth value seems to emerge as a 'spandrel' from the convergence of these 'referentialities' (together with other grammatical and prosodic features that mark the assertive character of the sentence). If we take the faculty of language to be a merely symbolic system (as Enfield, 2009, and Tomasello and Herrmann, 2010, do), we cannot explain the ontology of the semantics involved, and consequently cannot explain its fundamental role in communicative learning either.

It is natural that as inquiry into language proceeds, our vision of what language is (ontology) changes along with our perspective on it (theory). A conventional formal definition of 'language' and 'linguistic structure' has widely influenced the language sciences. Although methodological abstractions such as those involved in the formalist paradigm can be well motivated at a given time, they can also cease to be useful, as Chomsky (1965) in particular stressed. We have argued here that, rather than being an 'encapsulated' capacity with primarily formal properties, the faculty of language could be inherent to aspects of thought, meaning, and communication that are human-specific. This insight can also provide a new starting point for investigating language disorders and have an impact on their clinical definitions, which, insofar as they involve the term 'language', are necessarily theory-dependent<sup>17</sup>.

All in all, language (as identified and described in the terms laid out in this article) could play a more essential role in cognitive development than is often supposed, leading to the co-development of specific grammatical patterns and the different forms of human communication<sup>18</sup>. The range of this perspective could potentially be further supported through cognitive studies that explore the connection between referential linguistic structures and communicative and social abilities in neurotypical and neurodiverse populations in a comparative fashion, as well as neurophysiological and neuropsychological studies that aim to verify overlaps of our language circuitry with other cognitive capacities such as natural pedagogy.

<sup>16</sup>Not forgetting, as we mentioned in Section "Declarative Gestures: Language's Illegitimate Child", that the position occupied by the determiner phrase in the sentence structure can prevent referentiality. Thus, in the sentence 'that guy is a thief,' the determiner phrase 'a thief' works as a predicate and does not pick out any referent.

<sup>17</sup>This in particular concerns aspects of language impairment in Autism Spectrum Disorders, Specific Language Impairment, and Schizophrenia, on which we have commented elsewhere (Hinzen et al., 2015; Hinzen and Rosselló, 2015; for a synthetic statement see Hinzen and Sheehan, 2013, ch. 8). In all of these cases, language deviance may be an inherent aspect of core symptoms.

<sup>18</sup>This would be in line with the 'un-Cartesian' linguistic project of Hinzen and Sheehan (2013), which, as a program of research, does not separate human-specific forms of thought, reference, and communication from the forms of grammatical complexity with which they co-occur in our species and from which it appears they cannot be separated.

# Acknowledgments

This research was enabled by the Arts and Humanities Research Council UK, grant nr. AH/L004070/1, and the Spanish ministry for Education, Culture, and Sport, grant nr. FFI2013-40526-P.

We would like to thank Professor Joana Rosselló and Kristen Schroeder for their revisions and valuable comments on our manuscript, as well as Professor Hannes Rieser for his pertinent observations on core ideas of the present discussion from the field of formal semantics.

# References

Chomsky, N. (1965). *Aspects of the Theory of Syntax*. Cambridge, MA: MIT Press.


understanding aspectuality emerge together in development. *Child Dev.* 86, 486–502. doi: 10.1111/cdev.12311


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Mattos and Hinzen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Linguistic explanation and domain specialization: a case study in bound variable anaphora

David Adger <sup>1</sup> \* and Peter Svenonius <sup>2</sup>

*<sup>1</sup> Linguistics, Queen Mary University of London, London, UK, <sup>2</sup> Center for Advanced Study in Theoretical Linguistics, Department of Language and Linguistics, University of Tromsø – The Arctic University of Norway, Tromsø, Norway*

The core question behind this Frontiers research topic is whether explaining linguistic phenomena requires appeal to properties of human cognition that are specialized to language. We argue here that investigating this issue requires taking linguistic research results seriously, and evaluating these for domain-specificity. We present a particular empirical phenomenon, bound variable interpretations of pronouns dependent on a quantifier phrase, and argue for a particular theory of this empirical domain that is couched at a level of theoretical depth which allows its principles to be evaluated for domain-specialization. We argue that the relevant principles are specialized when they apply in the domain of language, even if analogs of them are plausibly at work elsewhere in cognition or the natural world more generally. So certain principles may be specialized to language, though not, ultimately, unique to it. Such specialization is underpinned by ultimately biological factors, hence part of UG.

#### Edited by:

*Umberto Ansaldo, University of Hong Kong, Hong Kong*

#### Reviewed by:

*Alan Garnham, University of Sussex, UK; Colin Phillips, University of Maryland, USA*

#### \*Correspondence:

*David Adger, Linguistics, Queen Mary University of London, Mile End Road, London E1 4NS, UK d.j.adger@qmul.ac.uk*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *22 May 2015* Accepted: *07 September 2015* Published: *24 September 2015*

#### Citation:

*Adger D and Svenonius P (2015) Linguistic explanation and domain specialization: a case study in bound variable anaphora. Front. Psychol. 6:1421. doi: 10.3389/fpsyg.2015.01421*

Keywords: universal grammar, domain specificity, bound variable anaphora, syntax semantics interface

# 1. Introduction

A core question in the cognitive science of language is whether explaining linguistic phenomena requires appeal to properties of human cognition that are specific to the language-using capacity of human beings. A common approach is to propose that domain general principles are at play in language without showing how these principles have the empirical reach of well established generalizations known within linguistics (Bybee and McClelland, 2005; Christiansen and Chater, 2015). This is not a strategy that is likely to lead to progress. A more promising alternative is to attempt to match up known generalizations about language with proposals about domain general principles (e.g., Culicover and Jackendoff, 2012). It seems to us, however, that a reasonable way to answer the question of domain specificity, given the current state of knowledge in cognitive science, is to develop theoretical approaches to linguistic phenomena which have as much empirical reach and explanatory depth as possible, and to evaluate the posits of such theories for domain generality. That third approach is what we engage in here.

There is nothing particularly totemic in the issue, at least from the perspective of generative syntax. We should hope that aspects of our best theories of syntactic phenomena are simply special cases of more general principles. But those more general principles are not established at the moment, at least not in such a way as to provide deep explanations of even rather elementary properties of human syntax. Indeed, we think that generative syntax provides a potential way to reach those more general principles, and that human language is a particularly rich domain for the development of theories of some depth that may allow us to glimpse any deeper underlying regularities. The goal of this article is, then, to present a well-developed theoretical proposal for an important linguistic phenomenon and to show how the principles that underpin the proposal reveal that abstract, high-level principles of the computational construction of pairings of sound and meaning are at play. We then evaluate whether these principles are specific to language, concluding that the principle that licenses linguistic structures is plausibly so, while the principles that regulate how structures are interpreted are at least specialized to language, though they may not even be specific to cognition.

We will make the general argument here through the phenomenon of bound variable anaphora. The argument goes as follows: (i) the phenomenon is a real phenomenon of human language in general; (ii) there is a compelling generative theory that limns its empirical contours rather exactly; (iii) there are no equally empirically wide or theoretically compelling competing accounts; (iv) some explanatory devices in the successful theory appear to be specialized for language, as far as current understanding goes (even if analogs of them may be observed elsewhere in cognition).

Often generative syntactic analyses can be impenetrable to those trained outside of the discipline, so we attempt here to drill down to the core essentials and to make these accessible, drawing out the more general theoretical implications for cognition, and examining to what extent the theoretical principles we use are specific to linguistic cognition.

# 2. Structural Constraints on Interpretation

#### 2.1. Introducing Bound Variable Interpretations

The phenomenon we will use to make the argument here is known as bound variable anaphora. Take the English sentence in (1):

(1) No woman denies that she has written a best selling novel.

What is the meaning of this sentence? There are two that are readily discernible (Evans, 1980). One is that, from a group of women, not one denied that some individual (say Julie) had written a best selling novel. This meaning is easily accessible given either a preceding discourse to provide context or an individual that is salient in the context where the sentence is uttered. For example:

(2) Hello everyone. This is Julie, who's recently been in the news again. Now, no woman denies that she has written a best selling series of novels featuring female protagonists, but some deny that these novels are good for equal rights.

Following Evans, we'll call this meaning, where the pronoun receives its interpretation from the context, the referential meaning.

The second meaning is simply that, if you have a group of women, and you check all of them one by one, you will not find any who deny that they themselves have written a best selling novel. This is called the bound variable meaning.

We also find this ambiguity effect with quantifier phrases containing quantifiers other than no. For example, all of the following sentences have the same ambiguity; the pronoun can have a referential or a bound variable interpretation:

	- b. Did any woman say that she had met the Shah?
	- c. Every woman persuaded her son to organize her birthday party.
	- d. Each author decided that she should be at the signing.

We find bound variable anaphora in various languages (Déchaine and Wiltschko, 2014). For example, the Algonquian language Passamaquoddy displays the same effect (Bruening, 2001):


The following examples from Scottish Gaelic also show the same effect:


We have given these non-English examples to show that this phenomenon is not simply a grammatical quirk of English or other well studied European languages. The exact empirical contours of bound variable anaphora, as outlined here and explained below, are not, however, detectable in every language. For a language to display this particular pattern, it needs to have determiner quantifiers, which not all languages possess (Bach et al., 1995). Further, it must have a determiner quantifier that is singular. English has both singular determiner quantifiers (as in "every boy") and plural ones (e.g., "all boys"). Some languages, however, lack singular determiner quantifiers. Further, the language must ideally be able to use singular pronouns with the singular quantifier to create the relevant reading. This is also not available to all languages. Indeed, in English, the plural pronoun is often used in informal discourse, especially when the gender of the quantified noun phrase is unknown or avoided: for example "Every author was able to choose their own cover." In such circumstances, the plural pronoun can be construed as referring to a group of individuals that is constructed out of all the authors, similarly to the behavior of they in following discourse in English: "Every author was grumpy. They had been locked out of the decision about their book covers" (Kamp and Reyle, 1993; Rullmann, 2003). The existence of this strategy makes discerning true bound variable readings with plural pronouns challenging. Beyond these basic requirements, languages place various other restrictions on their pronouns which mean that quite careful investigation is required to determine whether there is a bound variable construction. However, we can control for these relevant factors by cross-linguistic investigation, and when the various conditions listed are met, the phenomenon reveals itself to be very consistent.

Bound variable interpretations of pronouns, then, arise when the meaning of a singular pronoun is dependent in a particular way on the meaning of a singular quantifier phrase elsewhere in the sentence (the importance of number and person features for bound variable meanings across languages is discussed in Kratzer (2009) and Adger (2011); see Harbour (2014) for a compatible theory of grammatical number). When a bound variable interpretation is available in the examples we have seen, a referential interpretation is also available, leading to the ambiguity.

Let us turn now to structural constraints on the availability of this interpretation. In certain cases, it turns out that the bound variable meaning vanishes, and only the referential reading is left. For example:

	- b. The man that every woman loved said she had met the Shah.
	- c. The man that didn't love any woman said she had met the Shah.
	- d. That every woman seemed so sad persuaded me to organize her birthday party.
	- e. Because every author hates you, she will try to kill you.

If one pauses to think about the meanings of these sentences, it turns out that they are not interpreted as involving the pronoun's meaning varying with the quantifier in the way we have just seen. Compare, for example, (9-c) with (3-a). (3-a) can be paraphrased as "Given a set of women salient in the context, for each choice of some woman you make from that set, that woman you have chosen said that she herself had met the Shah." A corresponding paraphrase for (9-c) would be "Given a salient set of women in the context, for each choice you make from that set, the man that didn't love the woman you have chosen said that that woman had met the Shah." But that paraphrase doesn't capture the meaning of the sentence in (9-c). In fact, the sentence only has a paraphrase that goes something like "Given a salient set of women in the context, the man that didn't love any woman you may choose from that set said that she (some other female person in the context) had met the Shah." That is, the pronoun she is not ambiguous between the two interpretations: it is only referential. This is an odd meaning out of context, but is the only meaning available.

This same effect holds for the other sentences, and countless more pairs like them. Although we have illustrated the phenomenon just by appealing to what meanings are intuitively available for sentences here, it is experimentally robust (Kush et al., 2015).

We also see bound variable readings disappear in Passamaquoddy and in Scottish Gaelic, in certain circumstances. (The <sup>∗</sup> in the examples here marks not ungrammaticality, but rather the unavailability of the bound variable reading).

(10) <sup>∗</sup>

| Ipocol | psi=te | wen | Sipayik | k-nacitaham-oq, | kt-oqeci=hc | nehpuh-uk |
|---|---|---|---|---|---|---|
| because | all=EMPH | someone | Sipayik | 2-hate-INV | 2-try=FUT | kill-INV |

"Because everyone at Sipayik hates you, he will try to kill you."

And in Gaelic

(11) a. <sup>∗</sup>

| Thuirt | duine | a | bhruidhinn | ris | gach | caileag | gun | robh | i | tinn |
|---|---|---|---|---|---|---|---|---|---|---|
| say.PAST | man | that | spoke | to | each | girl | that | be.PAST | she | sick |

"A man that was talking to each girl said she was sick."

b. <sup>∗</sup>

| Air sgath 's | gun | do bhuail | thu | gach | balach, | ruith | e | air falbh |
|---|---|---|---|---|---|---|---|---|
| because | that | hit.PAST | you | each | boy | run.PAST | he | away |

"Because you hit each boy, he ran away."

In examples like those in (9), (10), and (11), the quantifier precedes the pronoun just as it does in the examples in (1) and (3). However, the bound variable reading is available in (1) and (3) and is unavailable in (9), (10), and (11). So the issue is not (merely) one of precedence. Various proposals have been put forward in the generative literature as to what, exactly, is responsible for the difference. The current consensus is that there are two interrelated factors involved: semantic scope and syntactic command (Safir, 2004; Barker, 2012; Déchaine and Wiltschko, 2014).

#### 2.2. Scope

Scope is simply a name for the fact that the interpretation of certain units of language is computed as a subpart of the interpretation of larger units, a cognitive factor that plausibly exists elsewhere than in language. The larger unit is said to take wide scope over the smaller unit. Consider the following cases:

(12)
	- a. An author read every book.
	- b. An author thought every book was good.
	- c. An author thought Julie had read every book.

In (12-a), there are two meanings. In one meaning, we interpret the phrase an author as dependent on the interpretation we provide for every book; that is, the semantic computation that builds the meaning of every book includes a meaning assigned to an author. In the other, the dependency is the other way around. We can make this intuition explicit by sketching a procedure to compute the meaning of the quantifier phrases. Let us take a simpler example first:

(13) Every book is interesting.

We can treat computing the meaning of every book as involving three separate computational procedures (Peters and Westerståhl, 2006):

	- b. Identify the property which is characterized by the "scope" (the rest of the clause)—in this case, being interesting.
	- c. Apply a quantificational operator (in this case every) to determine whether every element of the set of books is such that the property of being interesting holds of it.

Similarly, we compute the meaning of an author by taking a set of authors and checking whether a condition represented by the rest of the sentence holds of one of the elements of that set.

	- b. Identify the property which is characterized by the "scope" (the rest of the clause)—in this case, winning this week's lottery.
	- c. Apply a quantificational operator (in this case an) to determine whether at least one element of the set of authors is such that the property of winning this week's lottery holds of that element.
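The three-step procedure just described (pick out a restriction set, identify the scope property, apply the quantificational operator) can be made concrete with a minimal Python sketch. The function names and the toy model (`books`, `authors`, `interesting`, `lottery_winners`) are our own illustrative assumptions, not part of the analysis itself.

```python
from typing import Callable, Hashable, Iterable

Entity = Hashable
Property = Callable[[Entity], bool]


def every(restriction: Iterable[Entity], scope: Property) -> bool:
    """True if the scope property holds of every member of the restriction set."""
    return all(scope(x) for x in restriction)


def a(restriction: Iterable[Entity], scope: Property) -> bool:
    """True if the scope property holds of at least one member of the restriction set."""
    return any(scope(x) for x in restriction)


def no(restriction: Iterable[Entity], scope: Property) -> bool:
    """True if the scope property holds of no member of the restriction set."""
    return not any(scope(x) for x in restriction)


# Toy model, invented purely for illustration.
books = {"b1", "b2", "b3"}          # step (a): the restriction set for "every book"
interesting = {"b1", "b2", "b3"}    # extension of the scope property "is interesting"
authors = {"anna", "bea"}           # step (a): the restriction set for "an author"
lottery_winners = {"bea"}           # extension of "won this week's lottery"

# Steps (b) and (c): build the scope property and apply the operator to it.
every_book_is_interesting = every(books, lambda x: x in interesting)
an_author_won = a(authors, lambda x: x in lottery_winners)
no_author_won = no(authors, lambda x: x in lottery_winners)
```

Each quantifier is treated here as a relation between a set and a property, which is one common way of rendering the restriction/operator/scope division; nothing hinges on this particular encoding.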

These trivial cases are then put together for our example (12-a). We can take either the set of books first, and then compute the condition that holds of every book as involving an author, or we can take an author first, and then see whether the condition involving every book holds of an author. This gives us two distinct meanings.

Let's take every book first:

(17) Take a set of books salient in the context. Now go through the books one by one, and for each choice you make of a book, see whether an author (from a salient set of authors) has read that book. Going through the set of books, ensure that for all of the choices of book some author has read the book chosen.

This process implies that it is possible to have a different author for each book. This is the wide scope reading for every, as the computation of an author takes place within the computation for every book. The other meaning of an author read every book works out as follows:

(18) Take a set of authors salient in the context. Now go through the authors one by one and for each choice made, go through the set of books salient in the context and see whether the author you have chosen has read every member of the set of books. Ensure that there is at least one author of whom this condition holds.

This is the narrow scope reading for every. The crucial empirical difference is that in the wide scope reading for every book, we can have a different author picked for each different book, while in the narrow scope reading, once we've picked our author, that author needs to have read every book for the interpretation to be true.
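The difference between the procedures in (17) and (18) comes down to which loop is nested inside which. The following Python sketch, with an invented toy model in which each book has been read by a different author, shows the two readings coming apart; the names `read`, `wide_scope_every`, and `narrow_scope_every` are hypothetical labels of ours.

```python
# Toy model, invented for illustration: each book was read by a different author.
books = {"b1", "b2"}
authors = {"anna", "bea"}
read = {("anna", "b1"), ("bea", "b2")}   # (reader, book) pairs


def wide_scope_every(authors, books, read):
    """Procedure (17): for each choice of book, some author has read that book.

    The search for an author happens inside the loop over books.
    """
    return all(any((au, bk) in read for au in authors) for bk in books)


def narrow_scope_every(authors, books, read):
    """Procedure (18): at least one author has read every book.

    The loop over books happens inside the search for an author.
    """
    return any(all((au, bk) in read for bk in books) for au in authors)


wide = wide_scope_every(authors, books, read)      # different author per book is allowed
narrow = narrow_scope_every(authors, books, read)  # one author must cover all books
```

In this model the wide scope reading for every comes out true and the narrow scope reading comes out false, which is exactly the empirical difference described above.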

It turns out that there are structural constraints on the scope of quantifiers. Consider the sentence in (12-b): this doesn't have the wide scope reading for every. Neither does the sentence in (12-c). This is because a quantifier cannot scope outside the tensed clause it is in. This idea, that certain semantic effects are bound into local syntactic domains, is of venerable descent in linguistics, originally due to Langacker (1969). We'll call it the Command Generalization:

(19) The Command Generalization: A quantifier scopes over everything in the minimal finite clause it appears in.

#### 2.3. Applying Scope to Bound Variables

The generalization that seems to be most effective in determining when a quantifier phrase can bind a pronoun is the following (this is just a descriptive generalization, not a theory, as yet):

(20) The Scope Generalization: For a quantifier to bind a pronoun it must scope over that pronoun.

Consider the following example:

(21) Every woman says that she has written a best selling novel.

This sentence has the following rough paraphrase: take a set of women. Now go through that set one by one, and see whether, for each choice of a woman, that woman said that she, herself, wrote a best selling novel. For the sentence to come out true, all of the choices of individuals from the set of women should work.

Now compare that to the following case:

(22) A man who every woman likes says that she has written a best selling novel.

If the quantifier phrase every woman could scope over the rest of the sentence, it should be able to bind the pronoun. But we can independently tell that every woman is restricted in its scope. If we put a quantifier phrase like an author in place of she, we get:

(23) A man who every woman likes says that an author has written a best selling novel.

We can see that every woman doesn't, descriptively, scope over an author, because the sentence doesn't have a reading where the authors potentially change for each choice made from the set of women. So the Scope Generalization correctly correlates the capacity of a quantifier to scope over the pronoun with its ability to bind the pronoun. The Command Generalization captures why the quantifier doesn't have wide scope over the pronoun in this sentence: the quantifier is "trapped" within the finite (relative) clause who every woman likes.

Together, the Scope Generalization and the Command Generalization do a good job of capturing the data we have seen. Consider again, our first example:

(24) No woman denies that she has written a best selling novel.

Here, the smallest finite clause containing the quantifier phrase no woman is the whole sentence. That sentence contains a further clause that she has written a best selling novel and that clause contains the pronoun. So no woman scopes over the pronoun she and she can therefore have a bound reading, in the way described above. For the sake of visualization, we can represent this as a tree-like structure, where the scope of a quantifier phrase is its sister in the tree:

Compare this with the corresponding example from (9), which lacks a bound variable interpretation:

(26) A man who no woman likes denies that she has written a best selling novel.

No woman is in a finite (relative) clause of its own who no woman likes. It cannot therefore take scope over the whole sentence, so the pronoun she cannot be bound. Again, we can visualize the structure in a tree-like fashion:

(27)

Here the scope of the quantifier phrase is again its sister in the tree, but the sister of no woman is just the verb likes, and so the quantifier phrase does not scope over the pronoun.

Our descriptive generalizations also capture the fact that the bound reading vanishes in examples like the following:

	- b. She didn't believe that I had been introduced to any woman.
	- c. She expected that each author's book signing would be private.

Here, the quantifier phrases are inside an embedded finite clause, and the Command Generalization stops them scoping over the whole sentence, so the pronoun cannot be bound. (28-a), for example, can't have a paraphrase where for each individual chosen from a set of women, that individual persuaded the Shah to imprison her.<sup>1</sup>
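Taken together, the two descriptive generalizations amount to a simple check on a tree: find the minimal finite clause containing the quantifier, and ask whether the pronoun sits inside it. The Python sketch below illustrates that check under strong simplifying assumptions of ours: the `Node` class, the flat bracketings, and the identification of a quantifier phrase by its first word are all hypothetical conveniences, and the crossover cases are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)
    finite: bool = False   # True for finite clauses


def clause(*children):
    """Build a finite clause node."""
    return Node("S", list(children), finite=True)


def phrase(*children):
    """Build any non-clausal phrase node."""
    return Node("XP", list(children))


def find_path(node: Node, word: str, path: Tuple[Node, ...] = ()) -> Optional[Tuple[Node, ...]]:
    """Return the nodes from the root down to the node directly containing `word`."""
    path = path + (node,)
    for child in node.children:
        if isinstance(child, str):
            if child == word:
                return path
        else:
            found = find_path(child, word, path)
            if found is not None:
                return found
    return None


def minimal_finite_clause(root: Node, word: str) -> Optional[Node]:
    """Lowest finite clause containing `word` (the quantifier's scope domain)."""
    path = find_path(root, word)
    if path is None:
        return None
    for node in reversed(path):
        if node.finite:
            return node
    return None


def can_bind(root: Node, quantifier: str, pronoun: str) -> bool:
    """Command + Scope Generalizations: the pronoun must lie inside the
    minimal finite clause that contains the quantifier."""
    domain = minimal_finite_clause(root, quantifier)
    return domain is not None and find_path(domain, pronoun) is not None


novel = ("has", "written", "a", "best", "selling", "novel")
# (24) No woman denies that she has written a best selling novel.
ex24 = clause(phrase("no", "woman"),
              phrase("denies", clause("that", "she", phrase(*novel))))
# (26) A man who no woman likes denies that she has written a best selling novel.
ex26 = clause(phrase("a", "man", clause("who", phrase("no", "woman"), "likes")),
              phrase("denies", clause("that", "she", phrase(*novel))))
# (28-b) She didn't believe that I had been introduced to any woman.
ex28b = clause("she", phrase("didn't", "believe",
               clause("that", "I", phrase("had", "been", "introduced", "to",
                                          phrase("any", "woman")))))
```

On these bracketings, `can_bind(ex24, "no", "she")` holds, while `can_bind(ex26, "no", "she")` and `can_bind(ex28b, "any", "she")` do not, matching the judgments reported for these sentences.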

Summarizing, we have seen that the phenomenon of bound variable anaphora is a real phenomenon, appearing crosslinguistically in unrelated languages when the conditions allow it to be detected. We have also seen that its empirical distribution can be described by a number of high-level descriptive generalizations:


Returning to the core issue, these generalizations appear to involve concepts that are quite specific to language: quantifier, binding, pronoun, scope, minimal finite clause. If we accept the generalizations in this form, it would seem that we are committed to highly domain specific analyses for this phenomenon. Indeed, that conclusion was adopted by generative grammar in some form in the 1980s and is consistent with a view of the evolution of language that sees it as an accretion of small evolutionary steps (e.g., Pinker and Bloom, 1990). However, current proposals derive these generalizations from more abstract principles and it is these, we believe, that should be evaluated for domain-specificity.


<sup>1</sup>There is one final aspect to the phenomenon of bound variable interpretations which is not captured by scope and command: sometimes a quantifier phrase can take scope over a pronoun, but it cannot bind it. If all that is required is the Scope Generalization, examples like the following should be well formed with a bound variable reading:

The examples in (i) do not have bound variable readings. This doesn't follow from what we have said so far.

These phenomena (noted for questions by Postal, 1971, extended to quantifiers by Chomsky, 1976, and dubbed Strong and Weak Crossover, respectively) cannot be captured by the Scope and Command Generalizations alone. Various approaches have been taken to this phenomenon, the Weak Crossover case is variable across languages, and there is no clear consensus on its analysis. We will not attempt to capture this data here.

# 3. A Theoretical Account

Generative accounts of linguistic phenomena are couched at a level of analysis that is close to Marr's (1982) Computational Level. That is, the theory specifies a system that guarantees a particular pairing of sounds and meanings across a potentially unbounded domain. A helpful analogy would be an axiomatized theory for arithmetic that can specify, for a potentially infinite set of pairs of integers, what the sum is. How people actually add, that is, how they use this system, is distinct from what the system is. The kind of empirical effect described above, where structures are or are not ambiguous between referential and bound variable interpretations of pronouns, is specified by the system at the computational level, rather than being a side effect of processing. How the system is put into use in parsing, production, etc., is a distinct question (Chomsky, 1967 et seq.).

Within current generative grammar, one approach that has been taken to the core question of how to pair up particular linguistic forms of sentences with their meanings is the theory of Merge. Merge is a principle of structure generation that is incorporated into a theory of what legitimate syntactic structures can be. It says that a syntactic unit can be combined with another syntactic unit to make a new syntactic unit, providing unbounded resources for the use of language.

We can recursively define a syntactic unit as follows (cf. Chomsky, 1995):

	- b. If A and B are syntactic units then Merge(A, B) = {A, B} is a syntactic unit.

This theory takes us from a finite list (of word-like atomic lexical items) to an unbounded set of hierarchical structures. (31) is a theory of what the legitimate structures in human language are, presumably neurally implemented (Embick and Poeppel, 2015). But these structures cannot be used as language unless they interface with the systems of sound and meaning. The definition of syntactic unit, incorporating Merge, in (31) is not sufficient for specifying language unless we add a set of principles for mapping those objects to interpretations in terms of sound and meaning. This is a point that often goes under-appreciated in the literature, following Hauser et al. (2002), about whether language consists just of recursion.<sup>2</sup>
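The recursive definition of syntactic unit can be sketched in a few lines of Python. We assume, as the base case, that any item drawn from a finite lexicon counts as a syntactic unit; the `LEXICON` contents and the helper names are hypothetical, and unordered `frozenset` objects stand in for the sets that Merge builds.

```python
from typing import FrozenSet, Union

# A syntactic unit is either a lexical item (a string) or a two-membered set of units.
SyntacticUnit = Union[str, FrozenSet["SyntacticUnit"]]

# Assumed toy lexicon: the finite list of atomic items.
LEXICON = frozenset({"that", "he", "she", "danced", "no", "woman", "said"})


def merge(a: SyntacticUnit, b: SyntacticUnit) -> SyntacticUnit:
    """Merge(A, B) = {A, B}: an unordered set, so no linear order is encoded."""
    return frozenset({a, b})


def is_syntactic_unit(x) -> bool:
    """Base case: lexical items. Recursive case: a set of two syntactic units."""
    if isinstance(x, str):
        return x in LEXICON
    return (isinstance(x, frozenset)
            and len(x) == 2
            and all(is_syntactic_unit(member) for member in x))


def depth(x: SyntacticUnit) -> int:
    """Number of Merge layers; unbounded because Merge applies to its own output."""
    if isinstance(x, str):
        return 0
    return 1 + max(depth(member) for member in x)


inner = merge("he", "danced")          # {he, danced}
embedded = merge("that", inner)        # {that, {he, danced}}
```

Because the output is a set, `merge("he", "danced")` and `merge("danced", "he")` are the same object: hierarchy is represented, but order is left to the mapping to phonology. This sketch does not handle merging a unit with itself, which would collapse to a one-membered set.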

One such mapping principle has to do with the periodicity that regulates the transfer of syntactic objects to the phonological and semantic systems: the idea is that this mapping takes place at certain points in the construction of a syntactic object (again, keeping to the computational level here). We will take these points to be finite clauses; though that is a simplification (Chomsky, 2008), it is sufficient for our purposes here. This is our first interface mapping principle:

(32) Transfer: Transfer the minimal structure containing the finite complementizer to phonological and semantic computations. Once a structure has been transferred, it is no longer accessible to further syntactic computation.<sup>3</sup>

The phonological and semantic computations transduce information delivered by the structure building system into forms that can be used by mechanisms of processing, production, planning, etc.

These two very general theoretical principles, Merge and Transfer, are motivated by empirical phenomena unconnected to bound variable anaphora. Merge is motivated by the need to capture basic constituency and hierarchy effects in human language, while Transfer (of finite clauses) is motivated by the special status finite clauses have in syntactic phenomena in general: they are the locus of subject case assignment, of semantic tense specification, and of locality domains for displacement operations (Adger, 2015, for review). However, these two ideas, as we will show, take us a long way in capturing the empirical distribution of the bound variable interpretation phenomenon, which we now turn to.

We notate syntactic units as sets. When a syntactic unit is transferred, the result is notated as a set, flanked by a phonological representation above and a semantic one below.

We simplify phonological representations massively by using orthographic representations and a simple concatenation operator ⌢ to represent string order. There is far more structure in phonological representations, including information about prosody, phonological phrasing, and segmental properties, but we will ignore this here.

We simplify semantic representations by using a simplified logical representation with variables and connectives augmented by a representation for natural language quantifiers. Following much work in semantics, as well as the discussion above, we take a quantified sentence to have three semantically contentful parts: a restriction, the quantifier itself, and a scope (Barwise and Cooper, 1981). These correspond to the computational operations described above: identifying a salient set in the context, quantifying over it, and determining whether a condition holds of the members of the set picked out by the quantifier. We notate these three parts, as is standard, by writing the quantifier plus the variable it binds, a colon, then the restrictor in square brackets followed by the scope in square brackets, thus:

(33) Q x:[...x...][...x...]

This set of simplifying assumptions about the interface mappings will suffice for our purposes here.

Now consider the derivation of the sentence in (34). This derivation should be understood as a computational specification of a sound-meaning pairing, much as a proof in logic is a computational specification of a theorem derivable from a set of axioms. This computational specification is part of a particular linguistic action [say an utterance of (34)], but does not causally determine the action.

<sup>2</sup> In their words, "We propose in this hypothesis that FLN [the faculty of language in the narrow sense] comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces" (Hauser et al., 2002, p. 1573) [emphasis ours].

<sup>3</sup>What constitutes a finite complementizer across languages will be left unspecified here. For English, the embedding finite complementizer is that, while matrix finite clauses have no pronounced finite complementizer. In the derivations below, we'll simply assume that finite matrix clauses are transferred once completed, but it should be borne in mind that technically there is a more complex syntax involved. There is also a question to be answered about relative clauses, where, as we will see, the minimal structure containing the finite complementizer is transferred only once the particular requirements of that complementizer are all satisfied. This provides certain elements (in wh-questions, topicalizations and relative clauses, but not quantifiers) with a limited capacity to evade locality effects. We will abstract away from these further details here.

	- b. Merge(that, {he, danced}) = {that, {he, danced}}, which undergoes Transfer, since that is a finite complementizer:

that⌢he⌢danced ←PHON {that, {he, danced}} SEM→ y danced

Here the hierarchy partly determines order and the pronoun is semantically translated as the variable y.


x=y]

As the phonological and semantic information is transferred to the relevant interfaces, information about linear order, pronunciation, and semantic interpretation accretes. Crucially, the statement that the variable x has the same value as y is added within the scope of the interpretation of the quantifier noone, just as in the informal paraphrase given in the last section. This ensures that it is interpreted as bound. Of course, we can equate x to another variable not in the scope of the quantifier, in which case we get the referential reading, thus accounting for the core ambiguity we began with. Equation of variables in itself could conceivably be a purely semantic, possibly non-linguistic process, at the heart of anaphoric dependency of all sorts, but the bound interpretation is constrained by how the building up of structures interacts with their interpretation.

Now let us look at a case where variable binding is not possible:

(36) Friends that no woman knew said that she danced.

In the following derivation, steps (a–c) build up the verb phrase said that she danced and steps (d–h) independently build up the subject Friends that no woman knew. Although (d–h) is ordered after (a–c), this is just an artifact of writing down the derivation. One can think of these as separate derivations taking place in parallel.

(37)
	- a. Merge(she, danced) = {she, danced}
	- b. Merge(that, {she, danced}) =

that⌢she⌢danced ←PHON {that, {she, danced}} SEM→ y danced

Steps (a-b) build up the embedded clause that she danced, which contains the pronoun of interest.


The subject friends that no woman knew involves a further Merge operation that takes the object of the verb knew, which is the unit friends, and Merges it with the whole structure that no woman knew friends. This happens in English because of a property of relative complementizers that triggers this displacement. Languages vary in whether relative clauses involve this kind of displacement Merge, with some leaving the object in its base position (Cole, 1987).

At this point, the whole relative clause is built up. Following the Transfer principle, what is transferred is the unit containing the relative complementizer that:

that⌢no⌢woman⌢knew ←PHON {friends, {that, {{no, woman}, {knew, friends}}}} SEM → λy: No x:[x is a woman] [x knows y]

In English, as just mentioned, only the higher of the two occurrences of friends is pronounced. In other languages, the lower occurrence is pronounced. We do not know of languages where both occurrences are pronounced. This suggests another mapping principle:

**Pronounce Once**: When a single object appears at more than one position in a structure, pronounce only one instance.

This principle, together with the Transfer principle, gives us the phonological representation above.
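A minimal Python sketch of Pronounce Once follows. Since the sets built by Merge carry no order, we assume here that the structure has already been given an English-like order and represent it as nested tuples; the function `linearize`, the `displaced` argument, and the choice to pronounce the first (highest) occurrence are illustrative assumptions that fit English, not a general claim about other languages.

```python
def linearize(tree, displaced, _seen=None):
    """Flatten an ordered binary structure into a list of words.

    Any item listed in `displaced` occupies more than one position in the
    structure; it is pronounced only at the first occurrence encountered
    (the higher one, on the English-like ordering assumed here).
    """
    if _seen is None:
        _seen = set()
    if isinstance(tree, str):
        if tree in displaced:
            if tree in _seen:
                return []          # lower occurrence: silent
            _seen.add(tree)
        return [tree]
    words = []
    for child in tree:
        words.extend(linearize(child, displaced, _seen))
    return words


# Ordered counterpart of {friends, {that, {{no, woman}, {knew, friends}}}}.
relative = ("friends", ("that", (("no", "woman"), ("knew", "friends"))))

pronounced = "⌢".join(linearize(relative, displaced={"friends"}))
unfiltered = "⌢".join(linearize(relative, displaced=set()))
```

With `friends` marked as displaced, the lower occurrence stays silent and the string is `friends⌢that⌢no⌢woman⌢knew`; without the principle both occurrences would surface. A language that pronounces the lower occurrence would need the opposite choice inside `linearize`.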

The semantics associated with this piece of structure is the tripartite structure we are familiar with, whose domain is restricted to a set of women, and whose scope is the verb phrase of the relative clause (basically the verb knew and its object). We adopt a standard approach to relative clause semantics (Heim and Kratzer, 1998): the transferred object friends is just translated to a variable bound by the relative complementizer that, and we notate this semantics in the standard way as λy:[...y...].

i. Merge({friends, {that, {{no, woman}, {knew, friends}}}}, {said, {that, {she, danced}}})


The final chunk of the derivation combines the whole subject with its VP. The VP is built up in step (c), and the output of that is Merged with the output of step (h). Phonologically, we simply concatenate these in the order required by English. Semantically, we take the bare noun friends to be interpreted with an existential quantifier some. We identify the variable this quantifier binds with that of the relative clause, and that is the variable that is the subject of the verb phrase. The pronoun in the embedded clause is translated as a further variable.

At this point, however, it is not possible to connect x and w, since the interpretation of the quantifier phrase no woman has already been completed, and the variable x has been fully interpreted, before w is encountered. This derives the simple cases of the Scope Generalization directly from very general principles of the relationship between syntax and semantics: the pronoun cannot be interpreted as bound unless it is computed within the scope of the quantifier.

The more outré effects of the Scope Principle are also amenable to the same set of basic principles. Recall that a quantifier can scope over everything inside the finite clause it is immediately contained within. With this in mind, consider the derivation of (38):

	- b. Merge({every, author}, danced) = {{every, author}, danced}
	- c. Merge(that, {{every, author}, danced}) = {that, {{every, author}, danced}}

that⌢every⌢author⌢danced ←PHON {that, {{every, author}, danced}}


she⌢believed⌢that⌢every⌢author⌢danced ←PHON {she, {believed, {that, {{every, author}, danced}}}} SEM→ y believed that Every x:[x is an author][x danced]

The variable x is fully computed with values assigned, before y is introduced. It follows that the meaning of the pronoun she cannot depend on the quantifier, so the bound variable interpretation is correctly predicted to be unavailable.

Compare this to the following case:

	- b. Merge({every, author}, publicist) = {{every, author}, publicist}
	- c. Merge(loved, her) = {loved, her}
	- d. Merge({{every, author}, publicist}, {loved, her}) = {{{every, author}, publicist}, {loved, her}}
	- e. Merge({every, author}, {{{every, author}, publicist}, {loved, her}}) =

every⌢author's⌢publicist⌢loved⌢her ←PHON {{every, author}, {{{every, author}, publicist}, {loved, her}}} SEM→ Every x:[x is an author][THE y:[y is publicist of x][y loves w and w=x]]

In step (e), the Merge operation allows the quantifier phrase every author to scope, in its finite clause, higher than the pronoun. This computational step is usually called Quantifier Raising, and is a syntactic way of marking the semantic scope of the quantifier, but in the theoretical system it is just another application of the operation Merge.

Just as we saw with the relative clause case, a single syntactic unit (in this case the quantifier phrase every author) is Merged with the larger unit that contains it, creating two occurrences of the phrase. One occurrence of this quantifier phrase is now high in the structure. This means that when its semantics is computed, it takes scope over the whole clause. The upshot of this is that the variable introduced by the pronoun is introduced at a point where the variable bound by the quantifier is still being computed. This allows them to be identified (notated here as w = x) and the bound variable reading to arise.

On the phonological side of the computation, one of the occurrences of the quantifier phrase is not transferred to the phonological component following the mapping principle Pronounce Once (just as we saw with the relative clause). For the case of quantifiers in English, it is the higher rather than the lower occurrence that is not transferred, giving us the effect that the quantifier is interpreted high in the structure, but pronounced low. No extension of the computational technology already appealed to is necessary to capture this. Which occurrence is pronounced is a point of cross-linguistic variation; for example in Hungarian the higher occurrence is pronounced (see Kiss, 1981).

We might ask whether we could follow the same kind of derivation we have just seen, and allow the quantifier to Merge higher in (38), hence generating the unattested binding possibilities. However, recall that transfer applies to finite clauses and that once a finite clause is transferred, no further syntactic computation can apply to it. Given this, the quantifier phrase in (38) cannot be moved to a position where it scopes over the pronoun.
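The blocking effect just described can be illustrated with a small sketch. The encoding is assumed for exposition only: Merge is set formation over `frozenset`, a transferred finite clause is wrapped in a hypothetical `Transferred` class that the search never opens, and `raise_quantifier` stands for the re-Merge step usually called Quantifier Raising.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transferred:
    """A finite clause already sent to PHON and SEM; opaque to further syntax."""
    content: frozenset


def merge(a, b):
    """Merge modelled as unordered set formation."""
    return frozenset({a, b})


def accessible(tree, target):
    """True if `target` can be found without looking inside a Transferred chunk."""
    if tree == target:
        return True
    if isinstance(tree, frozenset):
        return any(accessible(child, target) for child in tree)
    return False


def raise_quantifier(tree, qp):
    """Re-Merge `qp` with the structure containing it, if it is still accessible."""
    if not accessible(tree, qp):
        raise ValueError("quantifier phrase is sealed inside a transferred clause")
    return merge(qp, tree)


qp = merge("every", "author")

# (38): the embedded finite clause is transferred before the higher clause is built.
embedded = Transferred(merge("that", merge(qp, "danced")))
matrix_38 = merge("she", merge("believed", embedded))

# A single finite clause: nothing has been transferred when the quantifier raises.
clause = merge(merge(qp, "publicist"), merge("loved", "her"))
raised = raise_quantifier(clause, qp)

try:
    raise_quantifier(matrix_38, qp)
    blocked = False
except ValueError:
    blocked = True

print(blocked)  # True
```

In the single-clause case the quantifier phrase is re-Merged above the pronoun, whereas in the (38) configuration the attempt fails because the only occurrence of the quantifier phrase sits inside material that has already been transferred.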

The principles sketched here are sufficient to capture the phenomena we have surveyed. The effects of the Scope and Command Generalizations emerge from possible Merge operations interacting with the way that finite clauses are transferred to the phonology and the semantics. We have succeeded in making the descriptive generalizations special cases of much more general principles of structure building and how structures are mapped to the interfaces. We have not shown here how these more general principles play a role in explanations of other phenomena, as this would entail a book rather than a paper. However, these general principles of structure building and mapping to the interface are effective in deriving a slew of generalizations about the syntactic structure of human languages.

### 3.1. Further Predictions

The theoretical work we have just done, however, goes beyond our core generalizations, because bound variable interpretations interact in a complex way with other phenomena. The following cases do not follow from the generalizations directly, but they do follow from the theoretical system:

	- b. Which of his relatives forced the sybils to decree that no man was innocent?

In (42-a), the pronoun can receive a bound variable interpretation, which is not available in (42-b). Why should this be?

Consider (42-a) in more detail. It includes the phrase which of his relatives, which is interpreted as the object of the verb love in the embedded clause. This entails that it is initially Merged with love in a derivation that then later involves the Merge of no man. The phrase which of his relatives is then Merged again with the finite clause, and the remainder of that finite clause is transferred to the phonological and semantic systems, just as we saw for relative clauses above. This means that our derivation will reach a point that looks as follows (we do not show the internal structure of which of his relatives):

(43) that⌢no⌢man⌢love ←PHON {{[which of his relatives]}, {that, {{no, man}, {may, {love, {[which of his relatives]}}}}}} SEM→ y: no x:[x is a man][x may love y: y is a relative of z and z=x]

Here the variable z is introduced for the pronoun his at a point in the computation where the phrase which of his relatives is in the scope of the quantifier phrase no man. When the finite clause that no man may love is transferred, the syntactic unit which of his relatives is in the object position, and so what is transferred to the semantic computation is a structure where the pronoun's interpretation is computed within the computation of the quantifier phrase. Because of this, we can add the condition that z = x, where x is the variable introduced by the quantifier phrase. The higher occurrence of the phrase which of his relatives then undergoes further Merge, after the introduction of the material in the higher clause, to derive the whole sentence with the bound reading.

Compare this, however, to (42-b). Here the phrase which of his relatives is the subject of the higher verb force. It is never, therefore, in the scope of the quantifier phrase no man at any point in the derivation, and there is therefore no means of allowing the pronoun his to be bound by that quantifier. The underlying system of computations that build structure and transfer it to phonological and semantic systems correctly predicts a rather sophisticated distribution of form-meaning relations, going well beyond the basic descriptive generalizations.

We have now come most of the way through the argument. We have introduced the phenomenon of bound variable readings and seen that it is present cross-linguistically; we have outlined the core aspects of the phenomenon and shown how the descriptive generalizations about the phenomenon derive from a theoretical account built on deep, abstract principles stated at a computational level of analysis that specifies the sound-meaning relationships for an unbounded set of structures. We have also shown how that system extends to the interactions between bound variable anaphora and other syntactic and semantic phenomena. Before we evaluate the domain-specificity or domain-generality of these principles, however, we should ask whether there is a compelling alternative account of this phenomenon that does not appeal to operations that build and interpret structure.

#### 3.2. A Cognitive Grammar Account

The answer to this question is that there is not. The only in-depth discussion of the phenomenon that is non-generative and covers a similar range of empirical phenomena is van Hoek (1996), who provides an investigation of bound variable anaphora within the framework of Cognitive Grammar. Van Hoek argues that whether a pronoun can be bound is dependent on the salience or prominence of the quantificational antecedent. For the relevant cases, she defines salient as occupying the Figure in a Figure-Ground structure. Figure-Ground relations are plausibly used across cognition (Talmy, 1975). The Figure-Ground relationship is conceived of purely semantically in van Hoek's work. We give here a standard specification of how this relation is to be understood within language (Talmy, 2000, p. 312):

	- b. The Ground is a reference entity, one that has a stationary setting relative to a reference frame, with respect to which the Figure's path, site, or orientation is characterized.

No doubt the notion of Figure-Ground relation is an important semantic schema in cognition. However, contrary to van Hoek's proposal, it does not seem to be implicated in defining salience for bound variable anaphora. There are numerous cases where the subject of a sentence is the Ground, rather than the Figure, but this does not impact on the distribution of bound variable anaphora.

Talmy gives examples such as the room filled with smoke, where the Figure is the smoke which moves or changes with respect to the room, which is therefore the Ground. In van Hoek's approach, we would expect the object to act as a salient antecedent for a pronoun in the subject position, but this is not what we find, using examples modeled on Talmy's pattern Ground filled with Figure:

	- b. <sup>∗</sup> Its vase filled with each blooming flower.

Here we find that a quantifier phrase which is semantically the Ground can bind a pronoun in the Figure, and conversely that a quantifier phrase that is the Figure cannot bind a Ground pronoun.

The verb contain, by definition, also has a Figure as object and Ground as subject. Again, if the Figure is always salient, van Hoek's system predicts the wrong binding possibilities:

	- b. <sup>∗</sup> Its initial chapter contains a synopsis of each book.

Some action verbs, especially those of consumption, have been analyzed as involving a Figure object moving with respect to a Ground subject. Once again, the binding patterns we see empirically are unexpected on an approach like van Hoek's.

	- b. <sup>∗</sup> His child gobbled up each father.

In all of these cases, the Figure is the object, and hence, in van Hoek's proposals, the possible binding relations should have exactly the reverse distribution from the standard cases. One might try to rescue the system by proposing some special semantic relation to be associated with subjecthood that overrides Figure-Ground relations, but that, of course, would be circular in the absence of an independently verifiable, purely semantic specification for what a subject is. Van Hoek provides no such specification.

One might attempt to supplement van Hoek's proposal by appealing to information structure effects on salience. For example, we could ensure that the relevant set of books is pre-established in the context, and that universal quantification over this set is also pre-established, and further we can ensure that the quantifier phrase is a Figure. But still the structural facts override all of these potential cues and determine what the binding possibilities are. Binding from a highly salient Figure object into a pronoun in the subject position is impossible:

(48) There are a whole lot of new books on display at the convention this year and they've all got something in common: <sup>∗</sup> Its initial chapter contains a synopsis of each book.

We do not want to deny that pragmatic principles may have an impact on the processing of bound variable anaphora as it is clear that this is a factor in understanding the full empirical range of effects (Ariel, 1990). Effects of temporal order (the quantificational binder normally precedes the bound pronoun, though see (42-a) for an example of the opposite) may well fall into this category. However, such principles do not, by themselves, explain the empirical distribution of the phenomenon.

There is a larger issue connected to domain specificity that emerges from attempts, like van Hoek's, to explain bound variable anaphora, and other syntactic phenomena, by appeal to nonstructural, cognition-wide, properties. Structure, when at play, always trumps the effect of semantic, informational, pragmatic or social properties. If phenomena are not structurally constrained, then we need explanations for why such factors do not regularly play a part in determining bound variable anaphora.

Languages vary according to what kinds of expressions can be bound by quantifiers (Déchaine and Wiltschko, 2014), but they are always restricted structurally. This is striking, especially since pronouns can refer to entities which are salient or prominent in the discourse context in a variety of ways.

For example, pronominal elements like that and it can be differentiated by a measure of givenness (Gundel et al., 1993). According to Gundel et al. (1993), it refers to the focus of attention in the discourse at the time, whereas that picks out a referent which is "activated" in the discourse, i.e., brought into current short-term memory, normally by being mentioned, but is not in the focus of attention. This is illustrated in the following pair (modeled on examples from Gundel et al.), where the subject in (49-a) is naturally understood as the focus of attention and can be referred back to by it. In contrast, the dog in (49-b) is not naturally understood as the focus of attention, and hence it is infelicitous in the continuation (as indicated by #), but since the referent is activated, it can be referred to by that.

	- b. Ikea delivered playground equipment to my neighbor with the rottweiler this morning. #It's the same dog that ate my cat's food last week [ok: That's the same dog that ate my cat's food last week].

Since notions like focus of attention are linguistically relevant in the choice of it vs. that, we might expect to find a language in which the same categories of givenness are relevant to quantifier binding. For example, the focus of attention, if quantified, would be able to bind a pronoun, as in the following example.

(50) Every one of my neighbor's dogs chased one of my cats. #It's the same dog that ate the cat's food last week.

Here, the bound reading would be that there are pairings of dogs and cats, where the dog that chased a cat also ate that particular cat's food. Such a reading is impossible in English as seen in (50), even though the quantified subject is in focus and should therefore be a legitimate antecedent for English it.

Compare the salient bound reading when the structural conditions on quantifier binding are met, in (51).

(51) Every dog chased one of my cats before it ate the cat's food.

What is important is not that English doesn't allow a bound reading in (50), but that no language has been reported which does. This suggests that the mechanism for assigning reference to pronouns is not highly variable across languages; it can pick up a non-quantified focus of attention without structural conditions, as in (49-a), but it can be bound by a quantifier only when introduced in the phase of the derivation in which the quantifier is interpreted, as argued above and as illustrated by the infelicity of (50).

In fact, much more exotic language systems are imaginable, and it is quite striking that they are unattested. There is remarkable cultural diversity, for example concerning how important social hierarchy is to a society. Some societies have complex systems of rank and class and their languages have complex ways of encoding respect and deference and entitlement, as in Japanese. Other societies are relatively egalitarian and their languages lack these honorifics, for example traditional Khoi society (Lee, 1979).

If languages interacted with general cognition in unrestrained ways, we might expect to find a language in which the honorific system was so important that it mattered for aspects of syntax such as quantifier binding. Imagine a language in which only quantifier phrases denoting socially superior entities could bind pronouns. In this language, a speaker could have a bound reading for (52-a), but (52-b) could only have the referential reading for the pronoun.

	- b. Every slave called to his master.

Once again, such a language is unattested, suggesting that at least some aspects of pronominal reference resolution are language specific, and not permeable to arbitrary cognitive domains.

# 4. Domain Specific, Domain Specialized or Domain General?

The purpose of this section is not to argue that the principles (Merge, Transfer, Pronounce Once) so far discussed are, or are not, specific to language, but rather to sketch out the kinds of issues that can be addressed, and directions for investigation that can be pursued, once principles with explanatory depth and empirical reach are established. The principles we have identified can easily be understood as specific to language (a traditional view). This section argues that it is perhaps possible to understand them as language-specialized versions of very general cognitive and computational factors, though this is speculative. Crucially, however, these principles are mysterious when viewed from the perspective of communication, interaction, and general learning, concepts which provide little theoretical traction on important empirical phenomena of syntax and semantics.

Explanation of the unbounded link between structure and meaning requires a recursively specified procedure, or its equivalent. This is an underappreciated point. Some cognitive mechanism must be able to generate, and not simply retrieve, a form-meaning pair, since the number of such pairs is both in practice and in principle too large to store. Once there is such a mechanism, there is a generative system that restricts the possible form-meaning pairings. The fact that some structures (for example those involving center embedding of elements of the same category) are difficult to process, or are never used, is irrelevant to the question of whether there is such a procedure, for reasons understood since Miller (1956), contra Christiansen and Chater (2015). The particular formulation of Merge we have given, in addition, generates constituent structures with maximal levels of branchingness (two, for the formulation we adopt here) and a scaffolding on which to hook compositional construction of meaning. We have modeled this operation as a set formation operation applying to elements in a restricted domain. We will also use the term Merge as the name for the modeled physical properties.

As we have presented Merge, its domain is restricted to what we called "syntactic units" in (31). It operates on discrete linguistic units (morphemes or words) to create larger, structured, discrete units (phrases). There seem to be few other cases of systems displaying this kind of generative nature elsewhere in human cognition. Arithmetic and tonal music have been discussed as recursive generative systems that involve similar structure-building operations (Hurford, 1987; Rohrmeier, 2008). Suppose that we posit A-Merge and T-Merge alongside L-Merge for the structure-building operations involved in arithmetic, tonal music, and language, respectively. The difference, if any, would lie in what domain these various kinds of Merge are restricted to: tonal music combines elements with sound but no meaningful content, and arithmetic combines elements with abstract content (which can be counted) but no fixed sound, while language combines elements which are pairings of meaning and sound (or other externalizable form).
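The idea that L-Merge, A-Merge and T-Merge might be one operation differing only in its domain can be pictured with a short sketch. The factory `make_merge` and the three toy lexicons are hypothetical choices made for illustration; nothing in the sketch is meant as a claim about the actual primitives of arithmetic or tonal music.

```python
def make_merge(atoms):
    """Return a Merge operation restricted to units built from `atoms`."""

    def in_domain(unit):
        # A unit is in the domain if it is an atom or a set built only from atoms.
        if isinstance(unit, frozenset):
            return all(in_domain(u) for u in unit)
        return unit in atoms

    def merge(a, b):
        if not (in_domain(a) and in_domain(b)):
            raise ValueError("unit lies outside the domain of this Merge")
        return frozenset({a, b})   # the same set formation in every domain

    return merge


# Toy domains (assumptions): words, numerals, and pitch names.
l_merge = make_merge({"that", "she", "danced"})
a_merge = make_merge({1, 2, 3})
t_merge = make_merge({"C", "E", "G"})

clause = l_merge("that", l_merge("she", "danced"))
number = a_merge(1, a_merge(2, 3))
chord = t_merge("C", t_merge("E", "G"))

try:
    l_merge("she", 1)          # a numeral is not a linguistic unit here
    mixed_ok = True
except ValueError:
    mixed_ok = False

print(mixed_ok)  # False
```

The three operations share a single definition; only the set of admissible atoms differs, which is one way of reading the suggestion that the difference, if any, lies in the domain to which each kind of Merge is restricted.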

Humans have natural capacities for arithmetic and tonal music differing substantially from the natural abilities of the other primates (Tomonaga and Matsuzawa, 2000; Carey, 2001). There is good evidence, in fact, that nonhuman primates lack Merge (Yang, 2013), which entails that there was an evolutionary event which led to human brains having Merge. At the same time, many cultures do not develop any arithmetic (Izard et al., 2008) or tonal music (Lomax, 1968; Wallin et al., 2000), so it is fairly clear that the pressures of natural selection could not have led to humans as a species having these particular abilities (as Darwin, 1871, noted). One is led to the conclusion that either A-Merge and T-Merge are the same thing as L-Merge, or byproducts of it, or else a single evolutionary event led to all of the different kinds of Merge. These three apparent alternatives may simply reduce to a matter of how the terms are defined.

It is clear that language is used as a communication system and that it makes central use of Merge; but it is less clear that Merge-based communication provided an evolutionary advantage that caused Merge-endowed brains to be selected for. It is just as plausible that Merge-endowed brains had some other advantage, for example in planning, or in reasoning, or in memory. In fact, Chomsky (1966, 2010) has speculated that the generative system of language might essentially be a system of thought, not of communication; communication would be something one can do with language, once it is "externalized" (i.e., pronounced audibly, or articulated visually or tactilely).

This scenario changes the terms of the question of whether Merge is specific to language; language in fact takes on a much larger role as a central part of cognition. In this scenario, Merge is not specific to language-qua-communication system. Merge is rather a property of a more general system of symbolic thought, a core component of language understood to be a generative system. The question of whether arithmetic and/or tonal music are also instantiations of it is secondary, since plausible evolutionary paths suggest that arithmetic and tonal music were not causally central to the phylogenetic emergence of Merge.<sup>4</sup>

We might consider the other two principles we appealed to in an analogous manner. These principles govern the way that the structures generated by Merge are interpreted by phonological and semantic systems. The first of these principles is the following.

(53) Transfer: Transfer the minimal structure containing the finite complementizer to phonological and semantic computations. Once a structure has been transferred, it is no longer accessible to further syntactic computation.

This principle actually has three components: (i) it imposes a periodicity on the transfer of information between the structure-creating and the interpretive systems; (ii) it imposes an opacity condition so that transferred structure is inaccessible for further computation; (iii) it specifies finiteness as a flag for the application of transfer. We take these in turn.

The theoretical architecture we defended as an analysis of bound variable anaphora (and many other syntactic phenomena) is stated, as we said, at the computational level—it specifies what function is computed. But the particular principles we have used are fundamentally computational in a different sense too: they involve the alteration of discrete structures according to a set of rules applying to these structures. Merge creates and manipulates an unbounded set of discrete structures of certain forms from a finite list of discrete inputs (roughly, abstract representations of words or morphemes). It is the computational nature of Merge that allows it to provide an explanation for the fundamental fact that human languages can be unbounded in how they connect forms to meanings. Periodicity in computation, a core aspect of (53), is plausibly a general natural law, going beyond domain general laws of cognition (Strogatz and Stewart, 1993). Periodicity also appears to be ubiquitous in biological phenomena, possibly evolving as a side effect of efficiency conditions relating successful organisms to their environments within constraints imposed by physical law (Glass and Mackey, 1988). It is certainly speculative, but at least the periodicity part of (53) may be a factor that is domain general, not only with respect to human cognition, but also to physical or computational systems in general. If that is true, then the organization of information transfer between Merge-built structures and other systems of the mind is not language specific, not cognition specific, not human specific and possibly not even biology specific.

However, there is more to (53) than just the periodicity of the transfer of syntactic information. There is also the notion that syntactic information, once transferred, is no longer accessible to further computation. This idea is not only important in capturing the limitations on quantifier scope, but also for locality effects elsewhere in syntax, such as the ubiquitous locality effects seen in long-distance dependencies (Chomsky, 1973, et seq). Locality of this sort may also be reducible to more general properties. Any computational system requires organized space (such as a lookup table), which stores information that is used multiple times in a computation. Again, there is some speculation here, but it does not seem implausible that such storage space is limited in human cognition, so that once the syntactic information is transferred, the relevant storage space is no longer available at the next stage of the computation. Working memory in other areas of human cognition (when used, for example, in processing language or other information) is known to be restricted (Miller, 1956; Baddeley, 1992); storage space in the computation that defines well formed structures in a language may be likewise restricted. This would be a case of a general principle of space optimization, which applies across cognition and hence is domain general, operating in a specialized way within the syntactic system to restrict the space available for computation.

It is important, however, to note that these domain general principles (periodicity and space optimization) are applying to linguistic data structures (structures generated by Merge) not as principles of processing, but at a Marrian computational level, as principles that constrain the range of possible syntactic objects. We draw much the same lesson here as we did in our discussion of Merge: the same abstract principle may be at work in different domains of cognition, and how it plays out in those domains will be affected by the nature of the primitives of those domains. So the operation of the principles is specialized to the particular structures in the relevant domain, but the principles themselves may be entirely general.

The final aspect of the Transfer principle we have not discussed is the idea that it involves finiteness. Finiteness appears to be a formal property, with some connection to both meaning (especially to the interpretation of tense) and to morphological form (the shape of complementizers, case assignment etc.), but it operates within the syntactic system independently of them (see Adger, 2007, for linguistic evidence). Further linguistic investigation is required to understand the relationship between quantifier scope and finiteness, especially since not all languages mark finiteness overtly, but all languages seem to restrict the scope of quantifiers in similar ways. We think it likely that there will be some formal specification of the point of transfer, as empirically quantifier scope seems to always respect finite clause boundaries when they are detectable, but exactly how this plays out across a richer range of languages is still something of an open question.

The final principle we appealed to in our explanation of the workings of bound variable anaphora is the following.

(54) Pronounce Once: When a single object appears at more than one position in a structure, pronounce only one instance.

<sup>4</sup>Though there are proposals that accord musical ability a more central role in language evolution, cf. Darwin (1871), Brandt et al. (2012) and references there.

This principle is at work in the interpretation of structures where a single syntactic unit is present at two distinct places in the generated structure. Phenomenologically, we hear a single pronunciation of some constituent, but there is linguistic (including psycholinguistic) evidence for its presence elsewhere in the structure. We saw this principle at work in our analysis of (55) [repeated from (42-a)]:

(55) Which of his relatives did the sybils decree that no man may love?

The bound pronoun his behaves as though it is in the scope of no man, although the phrase it is embedded within (which of his relatives) is clearly not in a surface position that would allow that. The solution is to take which of his relatives to be Merged with love, where no man can scope over it, and then to Merge again, ending up in its surface position. Independently, we also need to explain how this phrase is interpreted as the object of the verb love, so the proposal that it Merges with that verb is motivated. There is good psycholinguistic evidence that the human sentence processing mechanism is sensitive to the presence of a single constituent in multiple positions in the parse tree it constructs while comprehending a sentence with such long-distance dependencies (Lewis and Phillips, 2015, for review and references) and that it detects unpronounced constituents in general (Cai et al., 2015), providing evidence from processing for this linguistic analysis.

(54) is stipulated here as a language specific principle. It applies to realize syntactic objects as phonological objects in a way that is dependent on the nature of the structure. Chomsky (2013) has speculated that it might be understood as emerging from a particular kind of reduction of computation, perhaps minimization of the phonological computation that is required. On the assumption that a series of phonological rules need to apply to the output of the syntax, if there are two instantiations of the structure, the same phonological rules will have to apply to both instantiations, increasing the amount of computation. If there are a great many dependencies to be formed in a particular structure, the same phonology would appear multiple times. If the phonological computation can simply be done once, phonological processing is dramatically reduced.
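The computational saving at issue can be made concrete with a toy count. In this sketch, which rests entirely on illustrative assumptions, each pronounced word costs one spell-out step (a stand-in for whatever phonological rules must apply), structures are nested tuples, and the internal bracketing given to which of his relatives is hypothetical, since the text does not show it.

```python
def spell_out_cost(tree, displaced, pronounce_once):
    """Count word-level spell-out steps needed to pronounce `tree`."""
    seen = False

    def walk(node):
        nonlocal seen
        if node == displaced:
            if pronounce_once and seen:
                return 0               # a later occurrence is left silent
            seen = True
        if isinstance(node, str):
            return 1                   # one step per pronounced word
        return sum(walk(child) for child in node)

    return walk(tree)


# Hypothetical internal structure for the displaced phrase.
wh = ("which", ("of", ("his", "relatives")))

# The structure in (43), with the phrase present at two positions.
structure = (wh, ("that", (("no", "man"), ("may", ("love", wh)))))

print(spell_out_cost(structure, wh, pronounce_once=False))  # 13
print(spell_out_cost(structure, wh, pronounce_once=True))   # 9
```

The gap between the two counts grows with the size of the displaced phrase and with the number of positions it occupies, which is the sense in which spelling out a single occurrence reduces phonological computation.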

It seems unlikely, as Chomsky notes, that this principle is functionally motivated to enhance parsing, as the absence of a phonological signal marking a grammatical dependency like a relative clause is inimical to constructing the correct parse. Similarly, this principle applied to quantifier scope leads to an increase in grammatical ambiguity, again a property which would seem difficult to motivate on functional grounds.

If this principle is not functionally motivated by communicative or parsing pressures, might it be exapted from elsewhere in cognition, as we suggested for aspects of periodicity and locality? It is certainly the case that a fundamental aspect of human cognition is the keeping track of an identical object in time and space. Leslie et al. (1998) propose an internal representation for objects that functions as an index (much like pointing) and use this to explain the relationship between perceptual and conceptual representations of objects (cf. Pylyshyn, 1989). Speculating again, it may be the case that a mechanism that is used for objecthood in a domain outside of language is at play, though the structures to which it applies are linguistic, rather than visual or conceptual. If this is the case, then the index is phonologically realized, but points to different instances in syntactic space of the same syntactic unit. Once again, a cognition-general property is specialized to the way that linguistic knowledge is structured.

The suggestions we have made are speculative, but the core point is that by developing theoretically deep explanations of linguistic phenomena, we can begin to evaluate the domain specificity of the abstract principles proposed in the knowledge that these principles are solidly based in the empirical phenomena of language.

# 5. Conclusion

We have outlined a general phenomenon at the syntax-semantics interface, shown how it is cross-linguistically valid, and provided both a descriptive outline of its empirical properties and a theory of some depth explaining why those properties are as they are. We have also argued that no reasonable alternative (currently) exists. All current approaches that achieve a good level of empirical success are generative in a sense recognizable from the kind of theory we sketch here (although they may be expressed in different generative frameworks, such as Categorial Grammar, Jacobson, 1999, or Lexical Functional Grammar, Dalrymple et al., 1997).

This paper makes a methodological point and a theoretical point. The methodological point is that principles to be evaluated for domain specificity should be principles that actually do explanatory work in capturing linguistic phenomena. That is, we need to understand the nature of the linguistic phenomena first, and use that understanding to ask more general questions of cognitive science. Any alternative approach that ignores or dismisses a vast range of empirically impeccable work and attempts to show that some proposed principle of communication or learning explains something general about language is insufficient. Any such alternative needs to have, or at the very least be in principle capable of extending to, the kind of empirical coverage and explanatory depth of current generative linguistic theory.

The more theoretical point we have made is that three core principles, motivated from work in theoretical linguistics, suggest something interesting when evaluated in terms of domain specificity. At a very abstract level, some of these principles may well be at play outside of the human language faculty, as principles of the optimization of space, periodicity of information transfer, and object identity. However, when instantiated in the human language faculty, they operate over linguistic entities created by Merge. Merge itself, we argued on the basis of cross-species comparison, appears to be unique to humans and therefore the result of some evolutionary event. It is not obvious that Merge plays a role elsewhere in human cognition (aside, perhaps, from possibly language-related areas such as music and arithmetic), or in natural law more generally, but further investigation may change our current perspective on this.

What does this discussion have to contribute to the question of whether there is an innate, language-specific cognitive system? It suggests that there are principles that play a role in explaining empirical linguistic facts which may be language-specialized versions of more general cognitive principles. The human brain, then, appears to be set up in a way that involves the canalized development of such specialization. That is part of, if not the whole of, Universal Grammar.

#### Funding

Arts and Humanities Research Council of the United Kingdom (Grant AH/G109274/1: Atomic Linguistic Elements of Phi).

#### Acknowledgments

Many thanks to two Frontiers reviewers, to Benjamin Bruening, Brian MacWhinney, and Marcus Pearce for helpful discussion, to Brendan Fry for a factual correction, and to David Hall for comments on an earlier draft.

### References

Marr, D. (1982). Vision. San Francisco, CA: Freeman and Co.

Safir, K. (2004). The Syntax of Anaphora. Oxford: Oxford University Press.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Adger and Svenonius. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Subtle Implicit Language Facts Emerge from the Functions of Constructions

Adele E. Goldberg\*

*Department of Psychology, Princeton University, Princeton, NJ, USA*

Much has been written about the unlikelihood of innate, syntax-specific, universal knowledge of language (Universal Grammar) on the grounds that it is biologically implausible, unresponsive to cross-linguistic facts, theoretically inelegant, and implausible and unnecessary from the perspective of language acquisition. While relevant, much of this discussion fails to address the sorts of facts that generative linguists often take as evidence in favor of the Universal Grammar Hypothesis: subtle, intricate knowledge about language that speakers implicitly know without being taught. This paper revisits a few such often-cited cases and argues that, although the facts are sometimes even more complex and subtle than is generally appreciated, appeals to Universal Grammar fail to explain the phenomena. Instead, such facts are strongly motivated by the *functions of the constructions* involved. The following specific cases are discussed: (a) the distribution and interpretation of anaphoric *one*, (b) constraints on long-distance dependencies, (c) subject-auxiliary inversion, and (d) cross-linguistic linking generalizations between semantics and syntax.

#### Edited by:

*N. J. Enfield, University of Sydney, Australia*

#### Reviewed by:

*Jeffrey Lidz, University of Maryland, USA*

*Elizabeth Closs Traugott, Stanford University, USA*

#### \*Correspondence:

*Adele E. Goldberg adele@princeton.edu*

#### Specialty section:

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

Received: *26 October 2015* Accepted: *17 December 2015* Published: *28 January 2016*

#### Citation:

*Goldberg AE (2016) Subtle Implicit Language Facts Emerge from the Functions of Constructions. Front. Psychol. 6:2019. doi: 10.3389/fpsyg.2015.02019*

Keywords: anaphoric one, island constraints, subject-auxiliary inversion, universal grammar, grammatical constructions

# INTRODUCTION

We all recognize that humans have a different biological endowment than the prairie vole, the panther, and the grizzly bear. We can also agree that only humans have human-like language. Finally, we agree that adults have representations that are specific to language (for example, their representations of constructions). The question that the present volume focuses on is whether we need to appeal to representations concerning syntax that have not been learned in the usual way (that is, on the basis of external input and domain-general processes) in order to account for the richness and complexity that is evident in all languages. The Universal Grammar Hypothesis is essentially a claim that we do. It asserts that certain syntactic representations are "innate,"<sup>1</sup> in the sense of not being learned, and that these representations both facilitate language acquisition and constrain the structure of all real and possible human languages.<sup>2</sup>

I take this Universal Grammar Hypothesis to be an important empirical claim, as it is often taken for granted by linguists and it has captured the public imagination. In particular, linguists often assume that infants bring with them to the task of learning language, knowledge of noun, verb, and adjective categories, a restriction that all constituents must be binary branching, a multitude of inaudible but meaningful "functional" categories and placeholders, and constraints on possible word orders. This is what Pearl and Sprouse seem to have in mind when they note that positing Universal Grammar to account for our ability to learn language is "theoretically unappealing" in that it requires learning biases that "appear to be an order (or orders) of magnitude more complex than learning biases in any other domain of cognition" (Pearl and Sprouse, 2013, p. 24).

<sup>1</sup>I put the term "innate" in quotes because the term lacks an appreciation of the typically complex interactions between genes and the environment before and after birth (see Deák, 2000; Blumberg, 2006; Karmiloff-Smith, 2006 for relevant discussion).

<sup>2</sup>Universal Grammar seems to mean different things to different researchers. In order for it to be consistent with its nomenclature and its history in the field, I take the Universal Grammar Hypothesis to claim that there exists some sort of universal but unlearned ("innate") knowledge of language that is specific to grammar.

The present paper focuses on several phenomena that have featured prominently in the mainstream generative grammar literature, as each has been assumed to involve a purely syntactic constraint with no corresponding functional basis. When constraints are viewed as arbitrary in this way, they appear to be mysterious and are often viewed as posing a learnability challenge; in fact, each of the cases below has been used to argue that an "innate" Universal Grammar is required to provide the constraints to children a priori.

The discussion below aims to demystify the restrictions that speakers implicitly obey, by providing explanations of each constraint in terms of the functions of the constructions involved. That is, constructions are used in certain constrained ways and are combined with other constructions in constrained ways, because of their semantic and/or discourse functions. Since children must learn the functions of each construction in order to use their language appropriately, the constraints can then be understood as emerging as by-products of learning those functions. In each case, a generalization based on the communicative functions of the constructions is outlined and argued to capture the relevant facts better than a rigid and arbitrary syntactic stipulation (see also DuBois, 1987; Hopper, 1987; Michaelis and Lambrecht, 1996; Kirby, 2000; Givón, 2001; Auer and Pfänder, 2011). Thus, recognizing the functional underpinnings of grammatical phenomena allows us to account for a wider, richer range of data, and allows for an explanation of that data in a way that purely syntactic analyses do not.

In the following sections, functional underpinnings of the distribution and interpretation of various constructions are offered, including anaphoric *one*, various long-distance dependencies, subject-auxiliary inversion, and cross-linguistic linking generalizations.

# ANAPHORIC ONE

## Anaphoric One's Interpretation<sup>3</sup>

There are many interesting facts of language; let's consider this one. The last word in the previous sentence refers to an "interesting fact about language" in the first clause; it cannot refer to an interesting fact that is about something other than language. This type of observation has been taken to imply that one anaphora demonstrates "innate" knowledge that full noun phrases (or "DP"s) contain a constituent that is larger than a noun but smaller than a full noun phrase: an N′ (interesting fact of language above), and that one anaphora must refer to an N′ and may never refer to a noun without its grammatical complement (Baker, 1978; Hornstein and Lightfoot, 1981; Radford, 1988; Lidz et al., 2003b). However, as many researchers have made clear, anaphoric one actually can refer to a noun without its complement, as it does in the following attested examples from the COCA corpus (Davies, 2008; for additional examples and discussion see Lakoff, 1970; Jackendoff, 1977; Dale, 2003; Culicover and Jackendoff, 2005; Payne et al., 2013; Goldberg and Michaelis, 2015).<sup>4</sup>


In each case, the "of phrase" (e.g., of alcoholism in 1) is a complement according to standard assumptions and therefore should be included in the smallest available N′ that the syntactic proposal predicts one can refer to. Yet in each case, one actually refers only to the previous noun (problem, war, sense, and sign, respectively, in 1–4), and does not include the complement of the noun.

In the following section, I outline an explanation of one's distribution and interpretation, which follows from its discourse function. To do this, it is important to appreciate anaphoric one's close relationship to numeral one, as described below.

## The Syntactic and Semantic Behavior of One are Motivated by its Function

Leaving aside for a moment the wide range of linguistic and non-linguistic entities that one can refer to, let us consider the linguistic contexts in which one itself occurs. Goldberg and Michaelis (2015) observe that anaphoric one has the same grammatical distribution as numeral one (and other numerals), when the latter are used without a head noun. The only formal distinction between anaphoric one and the elliptical use of numeral one is that numeral one receives a sentence accent, as indicated by capital letters in **Table 1**, whereas anaphoric one must be unstressed (Goldberg and Michaelis, 2015).

The difference in accent between cardinal and anaphoric one reflects a key difference in their functions. Whereas cardinal one is used to assert the quantity "1," anaphoric one is used when quality or existence, not quantity, is at issue. That is, if asked about quantity as in (5), a felicitous response involves cardinal one, which is necessarily accented (5a; cf. 5b). If the type of entity is at issue as in (6), then anaphoric one, which is necessarily unaccented, is used (6b; cf. 6a):

5. a. She has (only) ONE. (cardinal ONE)

   b. #She has a big one. (anaphoric one)

6. a. #She has (only) ONE. (cardinal ONE)

   b. She has a BIG one. (anaphoric one)

<sup>3</sup>This section is based on Goldberg and Michaelis (2015), which contains a much more complete discussion of anaphoric one and its relationship to numeral one (and other numerals).

<sup>4</sup>A version of the first sentence also allows one to refer to an interesting fact that is not about language:

a. There are many interesting facts of language, but let's consider this one about music.

It is this fact, that anaphoric one is used when quality and not quantity is at issue, that explains why anaphoric one so readily picks out an entity, recoverable in the discourse context, that often corresponds to an N′: anaphoric one often refers to a noun and its complement (or modifier) because the complement or modifier supplies the quality. But the quality can be expressed explicitly as it is in (6b; with big) or in (1–4) with the overt complement phrases.<sup>5</sup> If existence (and not quality or quantity) is at issue, anaphoric one can refer to a full noun phrase as in (7):

7. [Who wants a drink?] I'll take one.

Thus, given the fact that anaphoric one exists in English, its semantic relationship to cardinal numeral one predicts its distribution and interpretation. Anaphoric one is used when the quality or existence of an entity evoked in the discourse—not its cardinality—is relevant.

The only additional fact that is required is a representation of the plural form, ones, and both the form and the function of ones are motivated because ones is a lexicalized extension of anaphoric one (Goldberg and Michaelis, 2015). Ones differs from anaphoric one only in being plural both formally and semantically; like singular anaphoric one, plural ones evokes the quality or existence and not the cardinality of a type of entity recoverable in context.

There are several lessons that can be drawn from this simple case. First, if we are too quick to assume a purely syntactic generalization without careful attention to attested data, it is easy to be led astray. Moreover, it is important to recognize relationships among constructions. In particular, anaphoric one is systematically related to numeral one, and a comparison of the functional properties of these closely related forms serves to explain their distributional properties.

TABLE 1 | Distributional contexts for anaphoric one and the elliptical use of cardinal one.


*The two differ only in that only numeral one receives a sentence accent and asserts the quantity "1."*

<sup>5</sup>To fully investigate the range of data that have been proposed to date in the literature, judgment data should be collected in which contexts are systematically varied to emphasize definiteness, quality, existence and cardinality.

There remain interesting questions about how children learn the function of anaphoric one. But once we acknowledge that children do learn its function—and they must in order to use it in appropriate discourse contexts—there is nothing mysterious about its formal distribution.

# CONSTRAINTS ON LONG DISTANCE DEPENDENCIES

## The Basic Facts

Most languages allow constituents to appear in positions other than their most canonical ones, and sometimes the distance between a constituent's actual position and its canonical position can be quite long. For example, when questioned, the phrase which/that coffee in (8) is not where it would appear in a canonical statement; instead, it is positioned at the front of the sentence, and there is a gap (indicated by "\_\_\_\_") where it would normally appear.

8. Which coffee did Pam say Sam likes \_\_\_\_ better than tea? (cf. Pam said Sam likes that coffee better than tea.)

This type of relationship is often discussed as if the constituent "moved" or was "extracted" from its canonical position, although no one has believed since Fodor et al. (1974) that the movement is anything more than a metaphor. I use more neutral terminology here and refer to the relation between the actual position and the canonical position as a long-distance dependency (LDD).

There are several types of LDD constructions, including wh-questions, the topicalization construction, cleft constructions, and relative clause constructions. These are exemplified in **Table 2**.

Ross (1967) long ago observed that certain other types of constructions resist containing the gap of a LDD. That is, certain constructions are "islands" from which constituents cannot escape. Combinations of an "island construction" with a LDD construction result in ill-formedness (see **Table 3**):

TABLE 2 | Examples of long distance dependency (LDD) constructions: constructions in which a constituent appears in a fronted position instead of where it would canonically appear.<sup>6</sup>


<sup>6</sup>Other LDD constructions include comparatives (Bresnan, 1972; Merchant, 2009) and "tough" movement constructions (Postal and Ross, 1971), which should fall under the present account as well; more study is needed to investigate these cases systematically from the current perspective (see Hicks, 2003; Sag, 2010 for discussion).



## A Clash Between the Functions of LDD Constructions and the Functions of Island Constructions

Several researchers have observed that INFORMATION STRUCTURE plays a key role in island constraints (Takami, 1989; Deane, 1991; Engdahl, 1997; Erteschik-Shir, 1998; Polinsky, 1998; Van Valin, 1998; Goldberg, 2006, 2013; Ambridge and Goldberg, 2008). Information structure refers to the way that information is "packaged" for the listener: constituents are topical in the discourse, part of the potential focus domain, or are backgrounded or presupposed (Halliday, 1967; Lambrecht, 1994). Different constructions that convey "the same thing" typically exist in a given language in order to provide different ways of packaging the information, and thus information structure is perhaps the most important reason why languages have alternative ways to say the "same" thing. As explained below, the ill-formedness of island effects arises essentially from a clash between the function of the LDD construction and the function of the island construction. First, a few definitions are required.

The FOCUS DOMAIN is that part of a sentence that is asserted. It is thus "one kind of emphasis, that whereby the speaker marks out a part (which may be the whole) of a message block as that which he wishes to be interpreted as informative" (Halliday, 1967: 204). Similarly, Lambrecht (1994: 218) defines the focus relation as relating "the pragmatically non-recoverable to the recoverable component of a proposition [thereby creating] a new state of information in the mind of the addressee." Which parts of a sentence fall within the focus domain can be determined by a simple negation test: when the main verb is negated, only those aspects of a sentence within the potential focus domain are negated. Topics, presupposed constituents, constituents within complex noun phrases, and parenthetical remarks are not part of the focus domain, as they are not negated by sentential negation:<sup>7</sup>

9. Pam, as I told you before, didn't sell the book to the man she just met.

negates that the book was sold; does not negate that she just met a man or that the speaker is repeating herself.

It has long been observed that the gap in a LDD construction is typically within the potential focus domain of the utterance (Takami, 1989; Erteschik-Shir, 1998; Polinsky, 1998; Van Valin, 1998; see also Morgan, 1975): this predicts that topics, presupposed constituents, constituents within complex noun phrases, and parentheticals are all island constructions and they are (see previous work and Goldberg, 2013 for examples).

It is necessary to expand this view slightly by defining BACKGROUNDED CONSTITUENTS to include everything in a clause except constituents within the focus domain and the subject. Like the focus domain, the subject argument is part of what is made prominent or foregrounded by the sentence in the given discourse context, since the subject argument is the default TOPIC of the clause or what the clause is "about" (MacWhinney, 1977; Chafe, 1987; Langacker, 1987; Lambrecht, 1994). That is, a clausal topic is a "matter of [already established] current interest which a statement is about and with respect to which a proposition is to be interpreted as relevant" (Michaelis and Francis, 2007: 119). The topic serves to contextualize other elements in the clause (Strawson, 1964; Kuno, 1976; Langacker, 1987; Chafe, 1994). We can now state the restriction on LDDs succinctly:

⋆ Backgrounded constituents cannot be "extracted" in LDD constructions (Backgrounded Constituents are Islands; Goldberg, 2006, 2013).

The claim in ⋆ entails that only elements within the potential focus domain or the subject are candidates for LDDs. Notice that constituents properly contained within the subject argument are backgrounded in that they are not themselves the primary topic, nor are they part of the focus domain. Therefore, subjects are "islands" to extraction.

Why should ⋆ hold? The restriction follows from a clash of the functions of LDD constructions and island constructions. As explained below, a referent cannot felicitously be both discourse-prominent (in the LDD construction) and backgrounded in discourse (in the island construction). That is, LDD constructions exist in order to position a particular constituent in a discourse-prominent slot; island constructions ensure that the information that they convey is backgrounded in discourse. It is anomalous for an argument, which the speaker has chosen to make prominent by using a LDD construction, to correspond to a gap that is within a backgrounded (island) construction.

What is meant by a discourse-prominent position? The wh-word in a question LDD is a classic focus, as are the fronted elements in "cleft" constructions, another type of LDD. The fronted argument in a topicalization construction is a newly established topic (Gregory and Michaelis, 2001).<sup>8</sup> Each of these LDD constructions operates at the sentence level and the main clause topic and focus are classic cases of discourse-prominent positions.

<sup>7</sup>Backgrounded constituents can be negated with "metalinguistic" negation, signaled by heavy lexical stress on the negated constituent (I didn't read the book that Maya gave me because she didn't GIVE me any book!). But then metalinguistic negation can negate anything at all, including intonation, lexical choice, or accent. Modulo this possibility, the backgrounded constituents of a sentence are not part of what is asserted by the sentence.

<sup>8</sup>The present understanding of discourse prominence implicitly acknowledges that the notions of topic and focus are not opposites: both allow for constituents to be interpreted as being prominent (see, e.g., Arnold, 1998 for experimental and corpus evidence demonstrating the close relationship between topic and focus).

The relative clause construction is a bit trickier because the head noun of a relative clause—the "moved" constituent—is not necessarily the main clause topic or focus, and so it may not be prominent in the general discourse. For this reason, it has been argued that relative clauses involve a case of recycling the formal structure and constraints that are motivated in the case of questions to apply to a distinct but related case: relative clauses (Polinsky, 1998). But in fact, the head noun in a relative clause construction is prominent when it is considered in relation to the relative clause itself: the purpose of a relative clause is to identify or characterize the argument expressed by the head noun. In this way, the head noun should not correspond to a constituent that is backgrounded within the relative clause. Thus, there is a clash for the same reason that sentence level LDD constructions clash with island constructions, except that what is prominent and what is backgrounded is relative to the content of the NP: the head noun is prominent and any island constructions within the relative clause are backgrounded.

We should expect the ill-formedness of LDDs to be gradient and degrees of ill-formedness are predicted to correspond to degrees of backgroundedness, when other factors related to frequency, plausibility, and complexity are controlled for. This idea motivated an experimental study of various clausal complements, including "bridge" verbs, manner-of-speaking verbs, and factive verbs and exactly the expected correlation was found (Ambridge and Goldberg, 2008): the degree of acceptability of extraction showed a strikingly strong inverse correlation with the degree of backgroundedness of the complement clause—which was operationalized by judgments on a negation test. Thus, the claim is that each construction has a function and that constructions are combined to form utterances; constraints on "extraction" arise from a clash of discourse constraints on the constructions involved.

The functional account predicts that certain cases pattern as they do, even though they are exceptional from a purely syntactic point of view (see also Engdahl, 1997). These include the cases in **Table 4**. Nominal complements of indefinite "picture nouns" fall within the focus domain, as do certain adjuncts, while the recipient argument of the double object construction, as a secondary topic, does not (see Goldberg, 2006, 2013 for discussion). Therefore, the first two cases in **Table 4** are predicted to allow LDDs while the final case is predicted to resist LDDs.<sup>9</sup> No special assumptions or stipulations are required.

There is much more to say about island effects (see e.g., Sprouse and Hornstein, 2013). The hundreds of volumes written on the subject cannot be properly addressed in a short review such as this. The goal of this section is to suggest that a recognition of the functions of the relevant constructions involved can explain which constructions are islands and why; much more work is required to explore whether this proposal accounts for each and every LDD construction in English and other languages.

TABLE 4 | Cases that follow from an information structure account, but not from an account that attempts to derive the restrictions from configurations of syntactic trees.

# SUBJECT AUXILIARY INVERSION (SAI)

## SAI's Distribution

Subject-auxiliary inversion (e.g., is this it?) has a distribution that is quite unique to English. In Old English, it followed a more general "verb second" pattern, which still exists in Germanic and a few other languages. But English changed, as languages do, and today, subject-auxiliary inversion requires an auxiliary verb and is restricted to a limited range of constructions, enumerated in (10–17):


When SAI is used, the entire subject argument appears after the first main clause auxiliary as is clear in a comparison of (18a) and (18b):

	- b. The girl who was in the back of the room has had enough to eat. (non-inverted).

Notice that the very first auxiliary in the corresponding declarative sentence (was) cannot be inverted (see 19a), nor can the second (or other) main clause auxiliary (see 19b).

	- b. <sup>∗</sup>Had the girl who was in the back of the room has enough to eat? (only the first main clause auxiliary can be inverted).

This makes sense once we realize that one sentence's focus is often the next sentence's topic.

<sup>9</sup>Cross-linguistic work is needed to determine whether secondary topics generally resist LDDs as is the case in the English double-object construction, or whether the dispreference is only detectable when an alternative possibility is available, as in English, where questioning the recipient of the to-dative is preferred (see note 10).

<sup>10</sup>Support for this judgment comes from the fact that questions of the recipient of the to-dative outnumber those of the recipient of the double-object construction in corpus data by a factor of 40 to 1 (Goldberg, 2006: 136).

Thus, the generalization at issue is that the first auxiliary in the full clause containing the subject is inverted with the entire subject constituent.
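The generalization can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a model proposed here: the `Clause` type, the toy `AUXILIARIES` set, and both function names are hypothetical. It contrasts a structure-dependent rule, which fronts the first auxiliary of the clause containing the subject, with a purely linear rule, which fronts the first auxiliary found in the word string.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy auxiliary inventory, assumed only for this illustration.
AUXILIARIES = {"is", "was", "has", "did"}


@dataclass
class Clause:
    subject: List[str]      # the entire subject constituent, relative clause included
    auxiliaries: List[str]  # auxiliaries of the clause that contains the subject
    rest: List[str]         # remainder of the predicate


def as_question(words: List[str]) -> str:
    """Join words, capitalize the first letter, and add a question mark."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "?"


def structural_inversion(clause: Clause) -> str:
    """Front the first auxiliary of the clause; keep the subject intact."""
    first_aux, *other_aux = clause.auxiliaries
    return as_question([first_aux, *clause.subject, *other_aux, *clause.rest])


def linear_inversion(clause: Clause) -> str:
    """Front the first auxiliary in the flat word string, ignoring structure."""
    words = [*clause.subject, *clause.auxiliaries, *clause.rest]
    index = next(i for i, w in enumerate(words) if w in AUXILIARIES)
    return as_question([words[index], *words[:index], *words[index + 1:]])


# Sentence (18b), segmented by hand into subject, auxiliary, and remainder.
example = Clause(
    subject="the girl who was in the back of the room".split(),
    auxiliaries=["has"],
    rest="had enough to eat".split(),
)

print(structural_inversion(example))
print(linear_inversion(example))
```

Applied to (18b), the structural rule keeps the subject constituent whole and fronts *has*, whereas the linear rule fronts *was* and strands the relative clause without its auxiliary, which is the kind of ill-formed string described above.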

SAI occurs in a range of constructions in English and each one has certain unique constraints and properties (Fillmore, 1999; Goldberg, 2009); for example, in the construction with negative adverbs (e.g., 12), the adverb is positioned clause initially; curses (e.g., 13) are quite particular about which auxiliary may be used (May a million fleas infest your armpits! vs. ∗Might/will/shall a million fleas infest your armpits!); and inversion in comparatives (e.g., 14) is restricted to a formal register. Thus, any descriptively adequate account of SAI in English must make reference to these properties of individual constructions.

The English constructions evolved diachronically from a more general constraint which is still operative in German main clauses. But differences exist across even these closely related languages. The German constraint applies to main verbs, while English requires an auxiliary verb, and in English the auxiliary is commonly in first, not second, position (e.g., did I get that right?). Also, verb-second in German is a main clause phenomenon, but in English, SAI is possible in embedded clauses as well (20–21):


Simple recurrent connectionist networks can learn to invert the correct auxiliary on the basis of simpler input that children uncontroversially receive (Lewis and Elman, 2001). This model is instructive because it is able to generalize correctly to produce complex questions (e.g., Is the man who was green here?), after receiving training on simple questions and declarative statements with a relative clause. The network takes advantage of the fact that both simple noun phrases (the boy) and complex noun phrases (The boy who chases dogs) have similar distributions in the input (see also Pullum and Scholz, 2002; Reali and Christiansen, 2005<sup>11</sup>; Ambridge et al., 2006; Rowland, 2007; Perfors et al., 2011).

The reason simple and complex subjects have similar distributions is that the subject is a coherent semantic unit, typically referring to an entity or set of entities. For example, in (22a–c), he, the boy, and the boy in the front row, all identify a particular person and each sentence asserts that the person in question is tall.

22.a. He is tall.


Thus the distributional fact that is sufficient for learning the key generalization is that subjects, whether simple or complex, serve the same function in sentences.

We might also ask why SAI is used in the range of constructions it is, and why these constructions use this formal feature instead of placing the subject in sentence-final position or some other arbitrary feature. Consider the function of the first auxiliary of the clause containing the subject. This auxiliary indicates tense and number agreement (23), but an auxiliary is not required for these functions, as the main verb can equally well express them (24).

23. a. She did say.


The first auxiliary of the clause containing the subject obligatorily serves a different purpose related to negative or emphasized positive polarity (Langacker, 1991). That is, if a sentence is negated, the negative morpheme occurs immediately after—often cliticized to—the first auxiliary of the clause that contains the subject (25):

25. She hadn't been there.

And if positive polarity is emphasized, it is the first auxiliary that is accented (26):

26. She HAD been there. (cf. She had been there).

If the corresponding simple positive sentence does not contain an auxiliary, the auxiliary verb do is drafted into service (27):

27.a. She DID swim in the ocean.


Is it a coincidence that the first auxiliary of the main clause that contains the subject conveys polarity? Intriguingly, most SAI constructions offer different ways to implicate a negative proposition, or at least to avoid asserting a simple positive one (Brugman and Lakoff, 1987; Goldberg, 2006)<sup>12</sup>. For example, yes/no questions ask whether or not the proposition is true; counterfactual conditionals deny that the antecedent holds; and the inverted clause in a comparative can be paraphrased with a negated clause as in (28):

28. He was faster than was she. She was not as fast as he was.

Exclamatives have the form of rhetorical yes/no questions, and in fact they commonly contain tag questions (e.g., Is he a jerk, or what?!) (Goldberg and Giudice, 2005). They also have the pragmatic force of emphasizing the positive polarity, which we have seen is another function of the first auxiliary. Likewise, the positive conjunction (so did she) emphasizes positive polarity as well.

<sup>11</sup>See Kam et al. (2008) for discussion of the difficulties of using only bi-grams. Since we assume that meaningful units are combined to form larger meaningful units, resulting in hierarchical structure, this critique does not undermine the present proposal.

<sup>12</sup>Labov (1968) discusses another SAI construction used in AAVE, which requires a negated auxiliary (e.g., Can't nobody go there.).

Thus the form of SAI in English is motivated by the functions of the vast majority of SAI constructions: in order to indicate non-canonical polarity of a sentence—either negative polarity or emphasized positive polarity—the auxiliary required to convey polarity is inverted. Once the generalization is recognized to be iconic in this way, it becomes much less mysterious both from a descriptive and an acquisition perspective.

There is only one case where SAI is used without implicating either negative polarity or emphasizing positive polarity: nonsubject wh-questions. This case appears to be an instance of recycling a formal pattern for use with a construction that has a related function to one that is directly motivated (see also Nevalainen, 1997). In particular, wh-questions have a function that is clearly related to yes/no questions since both are questions. But while SAI is directly motivated by the non-positive polarity of yes/no questions, this motivation does not extend to wh-questions (also see Goldberg, 2006 and Langacker, 2012 for a way to motivate SAI in wh-questions more directly). Nonetheless, to ignore the relationship between the function of the first auxiliary as an indicator of negative polarity or emphasized positive polarity, and the functions of SAI constructions, which overwhelmingly involve exactly the same functions, is to overlook an explanation of the construction's formal property and its distribution. Thus, we have seen that the fact that the subject is treated as a unit (so that any auxiliary within the subject is irrelevant) is not mysterious once we recognize that it is a semantic unit. Moreover, the fact that it is the first auxiliary of the clause that is inverted is motivated by the functions of the constructions that exhibit SAI.

# CROSS-LINGUISTIC GENERALIZATIONS ABOUT THE LINKING BETWEEN SEMANTICS AND SYNTAX

The last type of generalization considered here is perhaps the most straightforward. There are certain claims about how individual semantic arguments are mapped to syntax that have been claimed to require syntactic stipulation, but which follow straightforwardly from the semantic functions of the arguments.

Consider the claimed universal that the number of semantic arguments equals the number of overt complements expressed (the "θ criterion"; see also Lidz et al., 2003a). While the generalization holds, roughly, in English, it does not in many (perhaps the majority) of the world's languages, which readily allow recoverable or irrelevant arguments to be omitted. Even in English, particular constructions circumvent the general tendency. For example, short passives allow the semantic agent or causer argument to be unexpressed (e.g., The duck was killed), and the "deprofiled object construction" allows certain arguments to be omitted because they are irrelevant (e.g., Lions only kill at night) (Goldberg, 2000). Thus, the original syntactic claim is too strong. A more modest, empirically accurate generalization is captured by the following:

Pragmatic Mapping Generalization (Goldberg, 2004):


The pragmatic mapping generalization makes use of the fact that language is a means of communication and therefore requires that speakers say as much as is necessary but not more (Paul, 1889; Grice, 1975). Note that the pragmatic mapping generalization does not make any predictions about semantic arguments that are recoverable or irrelevant. This is important because, as already mentioned, languages and constructions within languages treat those arguments variably.

Another general cross-linguistic tendency is suggested by Dowty (1991), who proposed a linking generalization that is now widely cited as capturing the observable (i.e., surface) cross-linguistic universals about how syntactic relations and semantic arguments are linked. Dowty argued that in simple active clauses, if there is both a subject and an object, and if there is an agent-like semantic argument and an undergoer-like semantic argument, then the agent will be expressed by the subject, and the undergoer will be expressed by the direct object (see also Van Valin, 1990). Agent-like entities are entities that are volitional, sentient, causal or moving, while undergoers are those arguments that undergo a change of state, are causally affected or are stationary. Dowty further observed that his generalization is violated in syntactically ergative languages, which are quite complicated and do not neatly map the agent-like argument to a subject. In fact, there are no syntactic tests for subjecthood that are consistent across languages, so there is no reason to assume that the grammatical relation of subject is universal (Dryer, 1997).

At the same time, there does exist a more modest "linking" generalization that is accurate: actors and undergoers are generally expressed in prominent syntactic slots (Goldberg, 2006). This simpler generalization, which I have called the salient-participants-in-prominent-slots generalization, has the advantage that it accurately predicts that an actor argument without an undergoer, and an undergoer without an actor, are also expressed in prominent syntactic positions.

The tendency to express salient participants in prominent slots follows from well-documented aspects of our general attentional biases. Humans' attention is naturally drawn to agents, even in non-linguistic tasks. For example, visual attention tends to be centered on the agent in an event (Robertson and Suci, 1980). Speakers also tend to adopt the perspective of the agent of the event (MacWhinney, 1977; Hall et al., 2013). Infants as young as 9 months have been shown to attribute intentional behavior even to inanimate objects that have appropriate characteristics (e.g., motion, apparent goal-directedness) (Csibra et al., 1999). That is, even pre-linguistic infants attend closely to the characteristics of agents (volition, sentience, and movement) in visual as well as linguistic tasks.

The undergoer in an event is also attention-worthy, as it is generally the endpoint of a real or metaphorical force (Langacker, 1987; Talmy, 1988; Croft, 1991). The tendency to attend closely to endpoints of actions that involve a change of state exists even in 6-month-old infants (Woodward, 1998), and we know that the effects of actions play a key role in action-representations both in motor control of action and in perception (Prinz, 1990, 1997). For evidence that undergoers are salient in non-linguistic tasks, see also Csibra et al. (1999); Bekkering et al. (2000); Javanovic et al. (2007). For evidence that endpoints or undergoers are salient in linguistic tasks, see Regier and Zheng (2003); Lakusta and Landau (2005), and Lakusta et al. (2007). Thus, the observation that agents and undergoers tend to be expressed in prominent syntactic positions is explained by general facts about human perception and attention.

Other generalizations across languages are also amenable to functional explanations. There is a strong universal tendency for languages to have some sort of construction that can reasonably be termed a "passive." But these passive constructions only share a general function: they are constructions in which the topic and/or agent argument is essentially "demoted," appearing optionally or not at all. In this way, passive constructions offer speakers more flexibility in how information is packaged. But whether or which auxiliary appears, whether a given language has one, two, or three passives, whether or not intransitive verbs occur in the pattern, and whether or how the demoted subject argument is marked, all differ across different languages (Croft, 2001), and certain languages such as Choctaw do not seem to contain any type of passive (Van Valin, 1980). That is, the only robust generalization about the passive depends on its function and is very modest: most, but not all, languages have a way to express what is normally the most prominent argument in a less prominent position.

# CONCLUSION

When it was first proposed that our knowledge of language was so complex and subtle and that the input was so impoverished that certain syntactic knowledge must be given to us a priori, the argument was fairly compelling (Chomsky, 1965). At that time, we did not have access to large corpora of child-directed speech so we did not realize how massively repetitive the input was; nor did we have large corpora of children's early speech, so we did not appreciate how closely children's initial productions reflect their input (see e.g., Mintz et al., 2002; Cameron-Faulkner et al., 2003). We also had not yet fully appreciated how statistical learning worked, nor how powerful it was (e.g., Saffran et al., 1996; Gomez and Gerken, 2000; Fiser and Aslin, 2002; Saffran, 2003; Abbot-Smith et al., 2008; Wonnacott et al., 2008; Kam and Newport, 2009). Connectionist and Bayesian modeling had not yet revealed that associative learning and rational inductive inferences could be used to address many aspects of language learning (see e.g., Elman et al., 1996; Perfors et al., 2007; Alishahi and Stevenson, 2008; Bod, 2009). The important role of language's function as a means of communication was widely ignored (but see e.g., Lakoff, 1969; Bolinger, 1977; DuBois, 1987; Langacker, 1987; Givón, 1991). Finally, the widespread recognition of emergent phenomena was decades away (e.g., Karmiloff-Smith, 1992; Lander and Schork, 1994; Elman et al., 1996). Today, however, armed with these tools, we are able to avoid the assumption that all languages must be "underlyingly" the same in key respects or learned via some sort of tailor-made "Language Acquisition Device" (Chomsky, 1965). In fact, if Universal Grammar consists only of recursion via "merge," as Chomsky has proposed (Hauser et al., 2002), it is unclear how it could even begin to address the purported poverty of the input issue in any case (Ambridge et al., 2015).

Humans are unique among animals in the impressive diversity of our communicative systems (Dryer, 1997; Croft, 2001; Tomasello, 2003:1; Haspelmath, 2008; Evans and Levinson, 2009; Everett, 2009). If we assume that all languages share certain important formal parallels "underlyingly" due to a tightly constrained Universal Grammar, except perhaps for some simple parameter settings, it would seem to be an unexplained and maladaptive feature of languages that they involve such rampant superficial variation. In fact, there are cogent arguments against positing innate, syntax-specific, universal knowledge of language, as it is biologically and evolutionarily highly implausible (Christiansen and Kirby, 2003; Chater et al., 2009; Christiansen and Chater, 2016).

Instead, what makes language possible is a certain combination of prerequisites for language, including our pro-social motivation and skill (e.g., Hermann et al., 2007; Tomasello, 2008); the general trade off between economy of effort and maximization of expressive power (e.g., Levy, 2008; Futrell et al., 2015; Kirby et al., 2015; Kurumada and Jaeger, 2015); the power of statistical learning (Saffran et al., 1996; Gomez and Gerken, 2000; Saffran, 2003; Wonnacott et al., 2008; Kam and Newport, 2009); and the fact that frequently used patterns tend to become conventionalized and abbreviated (Heine, 1992; Dabrowska, 2004; Bybee et al., 1997; Verhagen, 2006; Traugott, 2008; Bybee, 2010; Hilpert, 2013; Traugott and Trousdale, 2013; Christiansen and Chater, 2016).

While these prerequisites for language are highly pertinent to the discussion of whether we need to appeal to a Universal Grammar, the present paper has attempted to address a different set of facts. Many generative linguists take the existence of subtle, intricate knowledge about language that speakers implicitly know without being taught as evidence in favor of the Universal Grammar Hypothesis. By examining certain of these well-studied cases, we have seen that, while the facts are sometimes even more complex and subtle than is generally appreciated, they do not require that we resort to positing syntactic structures that are unlearned. Instead, these cases are explicable in terms of the functions of the constructions involved. That is, the constructionist perspective views intricate and subtle generalizations about language as emerging on the basis of domain-general constraints on perception, attention, and memory, and on the basis of the functions of the learned, conventionalized constructions involved. This paper has emphasized the latter point.

Constructionists recognize that languages are not unconstrained in their variation and that various systematic patterns recur in unrelated languages. While certain generalizations follow from domain-general processing constraints (see e.g., McRae et al., 1998; Hawkins, 1999; Futrell et al., 2015), this paper has argued that many constraints and generalizations follow from the functions of the constructions involved. That is, speakers can combine conventional constructions in their language on the fly to create new utterances, but the functions of each of the constructions involved must be respected. This allows speakers to use language in dynamic, but delimited ways.


### AUTHOR CONTRIBUTIONS

AG wrote the paper in its entirety with appropriately cited references.

#### ACKNOWLEDGMENTS

I would like to thank Elizabeth Traugott, Jeff Lidz, and Nick Enfield for very helpful feedback on an earlier draft of this paper.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Goldberg. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phonology without universal grammar

#### Diana Archangeli<sup>1,2</sup>\* and Douglas Pulleyblank<sup>3</sup>\*

<sup>1</sup> Department of Linguistics, University of Hong Kong, Pok Fu Lam, Hong Kong, <sup>2</sup> Department of Linguistics, University of Arizona, Tucson, AZ, USA, <sup>3</sup> Department of Linguistics, University of British Columbia, Vancouver, BC, Canada

The question of identifying the properties of language that are specific human linguistic abilities, i.e., Universal Grammar, lies at the center of linguistic research. This paper argues for a largely Emergent Grammar in phonology, taking as the starting point that memory, categorization, attention to frequency, and the creation of symbolic systems are all nonlinguistic characteristics of the human mind. The articulation patterns of American English rhotics illustrate categorization and systems; the distribution of vowels in Bantu vowel harmony uses frequencies of particular sequences to argue against Universal Grammar and in favor of Emergent Grammar; prefix allomorphy in Esimbi illustrates the Emergent symbolic system integrating phonological and morphological generalizations. The Esimbi case has been treated as an example of phonological opacity in a Universal Grammar account; the Emergent analysis resolves the pattern without opacity concerns.

Keywords: linguistics, phonology, morphology of words, universal grammar, emergent properties, Esimbi, English, ultrasound and language

# 1. Introduction

In exploring the role of "Universal Grammar" in phonology, our starting point here is the observation in Deacon (1997) that "[l]anguages are under powerful selection pressure to fit children's likely guesses, because children are the vehicle by which a language gets reproduced." (Deacon, 1997, p. 109). At issue is the source of those "likely guesses": are they due to an innate capability specific for language, the Universal Grammar hypothesis (UG), or are they simply the abilities that infants use to learn about all aspects of their world, the Emergent Grammar hypothesis (EG)?

We know that humans perceive gradient information categorically, and that we are good at categorizing in general (e.g., Rosch et al., 1976; Zacks and Tversky, 2001; Zacks et al., 2006; Seger and Miller, 2010). We know that humans make use of Bayesian probabilities (e.g., Tenenbaum and Griffiths, 2001). And we know that infants are very aware of skewed frequencies in language (Maye et al., 2002; Gerken and Bollt, 2008; Dawson and Gerken, 2011). We know that humans create symbolic systems to represent their knowledge (Deacon, 1997). Under the Emergent Grammar hypothesis (e.g., Hopper, 1987, 1998; MacWhinney and O'Grady, 2015) the infant language learner is expected to make use of these abilities in understanding the language environment in which s/he is immersed.


#### Edited by:

N. J. Enfield, University of Sydney, Australia

Reviewed by:

Brian MacWhinney, Carnegie Mellon University, USA Gwendolyn Hyslop, University of Sydney, Australia

#### \*Correspondence:

Diana Archangeli, Department of Linguistics, University of Arizona, Communications Building Room 109, PO Box 210025, Tucson, AZ 85721, USA dba@email.arizona.edu; Douglas Pulleyblank, Department of Linguistics, University of British Columbia, Totem Field Studios, 2613 West Mall, Vancouver, BC V6T 1Z4, Canada douglas.pulleyblank@ubc.ca

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 28 March 2015 Accepted: 03 August 2015 Published: 04 September 2015

#### Citation:

Archangeli D and Pulleyblank D (2015) Phonology without universal grammar. Front. Psychol. 6:1229. doi: 10.3389/fpsyg.2015.01229

By stripping away aspects of phonological systems that can be determined by remembering items, categorizing them, attending to frequencies and creating symbolic systems, we will have a better understanding of the role of UG in phonology: UG is responsible for the residue that cannot be explained as emergent properties.

Consider the multiple tasks facing the infant learner on encountering language sounds. Among them are (a) the challenge of isolating specific sounds from the sound stream, (b) assigning specific sounds to sound classes, and (c) building a grammar to characterize the occurring sounds. Under both models, the first step involves grouping similar sounding sounds into single categories, a categorization task. After that point the two models differ. Under EG, the next step is a higher order categorization, identifying groups of sounds as similar in some way, such as articulation, acoustics, or behavior. (See Ellis et al., 2015 for a review of categorization and the internal structure of categories, particularly with respect to usage-based linguistic models. See Mielke, 2004 on why features cannot be innately defined, but must be learned.) This similarity leads to positing a category for that group of sounds: these categories correspond roughly to the familiar "distinctive features," though there is no a priori set of features to map the sounds to, and in fact, a behavioral category is not necessarily an acoustic or articulatory category, and vice versa. The grammar involves further steps of abstraction, for example expressing observations about co-occurrences of feature categories (such as "all round vowels are back").

In contrast, under UG, once sounds have been identified, a very different task arises, of mapping these sounds to an innate set of features. This is a challenge because the fit is imprecise: when we compare sounds across languages we see that there is a lot of variation in the realization of features (Lindau and Ladefoged, 1986; Ladefoged and Maddieson, 1996; Pulleyblank, 2006). Building the grammar is a further challenge: in addition to encoding observations about the behavior of the features that function distinctively in the language, the learner must also encode the directive to disregard features that do not function in the language. For example, under Optimality Theory, this would include learning which feature-constraints to promote and which to demote in the constraint hierarchy. The contrasts are summarized in **Table 1**.

In essence, under EG the learner's task is to work from concrete sounds to an increasingly symbolic system; under UG the tasks continually change, from categorizing, to mapping categories to abstract symbols, to organizing grammatical statements in a way that matches the observed categories (and creating those statements if they are not part of the genetic endowment).

TABLE 1 | The challenge.

In this paper, we explore the contribution of EG to the acquisition of adult grammars, looking first at the effects of categorization (English /ɹ/, Section 2), then examining the role of frequency (Bantu harmony, Section 3), and finally considering the impact of this approach on building a symbolic system, the overall morphophonological grammar (Esimbi prefixes, Section 4).

# 2. Categories and Generalization

Mielke et al. (2010), Mielke et al. (accepted), and Archangeli et al. (2011) report on a study of the articulation of American English /ɹ/ in different syllable-, consonant-, and vowel-contexts. It is well-known that there are both bunched and retroflex articulations of /ɹ/ in American English, where a retroflex /ɹ/ has the tip raised and the dorsum lowered and a bunched /ɹ/ has the tip lowered and the dorsum raised (Zhou et al., 2008); while some speakers use one articulation and some use the other, still others use both (Delattre and Freeman, 1968; Ong and Stone, 1998; Guenther et al., 1999; Campbell et al., 2010). Mielke et al. (accepted) demonstrates that for those speakers who use both bunched and retroflex articulations, the distribution of articulations is highly systematic for each speaker, and highly categorical.

Mielke et al. (accepted) observes that, by and large, subjects who use both bunched and retroflex do so categorically by environment. Interestingly, these environments are not shared across speakers. Rather, each speaker using both articulations has developed his/her own pattern of bunched and retroflex environments. Furthermore, there appears to be no evidence that the different articulations are perceptible, hence the speaker-specific systems appear to be "covert."

A further point of interest is the nature of these covert grammars. **Table 2a** shows Optimality Theoretic grammars (Prince and Smolensky, 1993; McCarthy and Prince, 1995; McCarthy, 2002) for four of the 11 such speakers. In each case, the constraint <sup>∗</sup>ɹ ("avoid bunched /ɹ/") is ranked above <sup>∗</sup>ɻ ("avoid retroflex /ɻ/"), encoding the fact that these speakers preferred retroflex /ɻ/ to bunched /ɹ/ except in a specific set of environments. The several constraints that outrank "avoid bunched" provide the simplest characterization for each speaker of the special environments where /ɹ/ is bunched, not retroflex. As inspection of **Table 2a** reveals, there is a high degree of similarity in these grammars, but none are the same. (See Mielke et al., accepted for discussion of the phonetic properties of the conditions governing the two articulations of /ɹ/.)

In contrast, compare the grammars for dark and light /l/ in four languages where the distinction is allophonic (of 17 languages reported on), **Table 2b**. In these cases, the pattern is a characteristic of the language, not of individual speakers. Most striking is the relative simplicity of these four overt systems, with only one or two constraints outranking the core grammar defining how /l/ is articulated in each language.


iv. Alabama \*ulC >> \*lC >> \*ł >> \*l

TABLE 2 | Covert and overt systems: column headings in (a) indicate general properties of the relevant contexts; individuals implement the contexts in different ways.

#### 2.1. Summary

While we see that the overt systems have simplified rules/constraint hierarchies, possibly to make them more robustly identifiable, of real interest here are the covert systems. The patterns of covert /ɹ/ allophony show the categorical use of distinct articulations. More importantly, we see that individuals make individual generalizations even in the absence of consistent input in the environment: Categorization and generalization into a symbolic system is simply what humans do when encountering data. (For further discussion, see Archangeli, 2009).

These observations are consistent with the conclusion that humans are driven to generalize. These generalizations are based on data available to the learner but may go beyond observable patterns. Shared patterns result when certain sound types or sound sequences have an observable skewed distribution, the topic of our next section.

# 3. Frequency and Generalization

A key difference between UG and EG is that UG leads us to expect categorical effects: a rule applies, or it does not apply; a constraint is dominant, or it is subordinate. Under this categorical approach, arbitrary exceptions are troubling, yet it is well-known that language is "messy." This has led to models which assume UG yet abandon strict categorical behavior by allowing exceptions to rules (e.g., Chomsky and Halle, 1968) and to constraint-based models which assign values to constraints without imposing discrete ranking (e.g., Legendre et al., 1990; Boersma, 1997; Boersma and Hayes, 2001; Goldrick and Daland, 2009; Pater, 2009). In contrast, exceptions are expected and normal under the Emergentist model because frequency does not require absolute, categorical behavior, but simply a skewed distribution in order to identify a pattern.

In this section, we present evidence showing that frequency data is consistent with EG, not with UG. The discussion is based on Archangeli et al. (2012b).

Archangeli et al. (2012b) considers three differing predictions of the UG and EG models. We address two of them here, summarized in **Table 3**. First, how well do the data match the rules/constraint rankings of the language? We call this "goodness of fit"; UG predicts very few exceptions to rules/constraint rankings, so the data should fit the grammar very tightly. EG, by contrast, builds a grammar from the bottom up, so predicts a range of fits from tight to loose even within the same grammar. Second, the bottom-up EG model means that the learner is figuring out phonological patterns (and making classificatory errors) even before morphological categories are established. Consequently, EG predicts that a pattern that is morphologically restricted to one domain will gradiently extend into other domains (e.g., a pattern restricted to verbs will nonetheless be found, though to a lesser extent, in nouns). UG predicts the absence of extension into other morphological domains: the rules/constraints are defined and exceptions should not occur. (Similarly, a pattern that is phonologically restricted is expected under EG to extend to a broader phonological domain, a prediction UG does not make. We suppress that discussion here in the interests of space; see Archangeli et al., 2012b for details.)

This study required data with very specific properties. In addition to identifying languages with some pattern having both morphological and phonological restrictions, the languages had to be organized into searchable databases and there needed to be comparable control languages.

An appropriate pattern was found with Bantu height harmony. In many Bantu languages, verb suffixes alternate between high vowels and mid vowels, with the mid vowels occurring after other mid vowels. The pattern is described as morphologically restricted to verbs. It is also phonologically asymmetric, with [e] typically not followed by a high front vowel (<sup>∗</sup>e...i) and with [o] followed by neither front nor back high vowels (<sup>∗</sup>o...i, <sup>∗</sup>o...u). The paradigm in **Table 4** illustrates the pattern.

The harmonic pattern leads to an expected skewing of the distribution of vowels in these languages: we expect even distribution of all Vi...Vj sequences except with three sequences, e...i, o...i, and o...u. Each of these three sequences is unexpected in test-case verbs but expected in the other two environments, Bantu nouns and the control languages. Archangeli et al. (2012b) focuses specifically on these three sequences.

TABLE 4 | Bantu Height Harmony in Ciyao (Ngunga, 2000).

Relevant data sets of Bantu languages with a five-vowel system [i, e, a, o, u] and height harmony are found in Bukusu, Chichewa, Ciyao, Ikalanga, Jita, and Nkore-Kiga, in the Comparative Bantu OnLine Dictionary (CBOLD: http://www.cbold.ish-lyon.cnrs.fr/). Control cases (with the same five vowels and no harmony) were found in freelang.net (http://www.freelang.net/): Ainu, Fulfulde, Hebrew, Japanese, Kiribati, and Maori.

In the test words, Archangeli et al. (2012b) counted sequences of two vowels, V1...V2 for all V1, V2, ignoring intervening consonants in all words and ignoring prefix vowels in the test languages (because the harmonic pattern does not extend to prefixes). These counts were used to determine the expected distribution of V1...V2 sequences for each V1...V2 pair in each language; for test languages, the data were further subdivided into nouns and verbs. Comparing the observed with the expected distributions (chi square, with observed/expected ratios converted to log2 values) revealed which sequences were overrepresented and which were under-represented. As noted above, of special interest are the sequences e...i, o...i, and o...u, each of which is expected to be underrepresented, given the harmony pattern. A value of 0 shows distribution as expected; negative values show under-represented sequences (−1 appears half as often as expected, etc.) and positive values show overrepresentation.
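The counting procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Archangeli et al. (2012b): the function names and the toy word list are hypothetical, the treatment of V1...V2 as successive vowels is an assumption, and so is the choice to derive expected counts from the product of the V1 and V2 marginal frequencies. The chi-square test and the exclusion of prefix vowels are omitted.

```python
import math
from collections import Counter

VOWELS = set("ieaou")  # five-vowel system, as in the test and control languages


def vowel_pairs(word):
    """Return successive V1...V2 pairs, ignoring intervening consonants."""
    vowels = [ch for ch in word if ch in VOWELS]
    return list(zip(vowels, vowels[1:]))


def log2_observed_over_expected(words):
    """Map each (V1, V2) pair to log2(observed / expected).

    Assumption: expected counts come from the product of the marginal
    frequencies of V1 and V2 (independence of the two positions).
    """
    observed = Counter(pair for word in words for pair in vowel_pairs(word))
    total = sum(observed.values())
    first, second = Counter(), Counter()
    for (v1, v2), count in observed.items():
        first[v1] += count
        second[v2] += count

    scores = {}
    for v1 in first:
        for v2 in second:
            expected = first[v1] * second[v2] / total
            count = observed[(v1, v2)]
            # An unattested sequence is scored as negative infinity.
            scores[(v1, v2)] = math.log2(count / expected) if count else -math.inf
    return scores


# Hypothetical toy "lexicon" (made-up strings, not data from any language).
toy_words = ["tete"] * 3 + ["teti"] + ["tite"] + ["titi"] * 3
scores = log2_observed_over_expected(toy_words)
print(scores[("e", "i")])  # -1.0: e...i occurs half as often as expected
```

In the toy list, e...i is observed once where two occurrences would be expected, which yields the value of −1 used in the text to illustrate a sequence that "appears half as often as expected."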

In determining goodness-of-fit, a tight fit for a disallowed pattern is shown by extremely negative values (non-occurrence is -∞), while a loose fit is shown by somewhat negative values. In all three cases, the control language averages are very close to 0, while the verbs in test languages average significantly below 0.

At the same time, each of these key sequences is found in verbs, in some if not all of the test languages. As Archangeli et al. (2012b) shows, there are only three languages where one of the sequences is not found in the verb sample. In all three cases, the unattested sequence is o...u; it is not found in Chichewa, Ciyao, or Nkore-Kiga. A sequence like e...i, in contrast, is rare but does occur occasionally; for example, Ciyao has verb stems like -nyésíma "glitter" and -gwésima "be dull-witted," exceptions to the general prohibition against a mid vowel followed by a high front vowel.

A close, tight fit between data and generalization would show no occurrences of these sequences in any of the languages. But in all cases, while the distribution of the key sequences in verbs is well below the 0-line, the distance from the 0-line varies by language and by vowel sequence. In short, we do not see the tight fit predicted by UG; instead we see gradient adherence to the pattern as predicted by EG.

The expectation with morphological extension under EG is that the distribution of the three key sequences will also be depressed in nouns (less than 0, but greater than the verbs); UG expects these sequences to show normal random distribution (near 0). The facts support the EG hypothesis: There is a skewing toward under-representation of these sequences in nouns, though it is not as pronounced as in verbs. Furthermore, the more skewed the verb sequence is, the more skewed the noun sequence as well.

In this section, we have summarized the argument in Archangeli et al. (2012b) that frequencies of V...V sequences in the Bantu test languages show a loose fit to the pattern, and a gradient extension of the morphosyntactic domain, precisely as predicted in **Table 3** by a minimal innate linguistic endowment for phonology, the Emergentist model.

Archangeli et al. (2012a) goes a step further, expressing prohibited and preferred sequences as conditions; overrepresented sequences such as e...e and o...e lead to the promotion of conditions such as if V1 = [e, o] then V2 = [e], while under-represented sequences such as e...i and o...i do not induce promotion of some condition, etc. These conditions express the grammatical generalizations that phonologists converge on, and so provide a means of discovering phonological patterns in a language without appeal to innate constraints or constraint (or rule) schema. From these demonstrations, we conclude that the language-learning infant can discover and express phonological patterns in their language without appeal to innate linguistic universals, at least in the kinds of cases considered: The general strategy of attending to the frequency of different sequences leads to identifying and symbolizing patterns.

Our goal to this point has been to demonstrate the merit of Emergent Grammar: the predictions of EG fit the data better than do the predictions of UG. We turn now to a very different type of question, namely, the implications of EG for other aspects of grammar. That is, does the nature of an analysis change significantly if we adopt EG? In the next section, we argue that there are clear differences in the way a language is represented.

#### 4. Implications for Grammars

In this section, we explore the prefix vowel patterns in Esimbi, a Tivoid language, a member of the Bantoid branch of Niger-Congo (Stallcup, 1980a,b; Hyman, 1988; Coleman et al., unpublished manuscript; Kalinowski, unpublished manuscript; Koenig et al., unpublished manuscript; Stallcup, unpublished manuscript).<sup>1</sup> While the surface vowels of roots do not alternate, some prefix vowels do alternate, depending largely on the root to which they are attached. Esimbi vowels are given in **Table 5a**, and the forms in **Table 5b** show no surface trigger for the difference in prefix vowel height: the class 7, 8, 9, and 10 prefixes are high with the roots for "bone" and "back" but mid with the roots for "belly" and "cane rat."<sup>2</sup>

<sup>1</sup>This section is based on a more complete study of Esimbi vowels (Archangeli and Pulleyblank, unpublished manuscript).

<sup>2</sup>The language is generally analyzed as having three level tones, a rising tone and a falling tone. Tone is marked as in our sources, though in some instances tones



TABLE 6 | Verbs with infinitive prefix (Hyman, 1988) (tone not included in source).


A standard generative approach to the pattern would be to assign underlying height values to roots, cause prefixes to harmonize with roots in terms of height, and then to neutralize all root vowels to high (see, e.g., Hyman, 1988). This results in surface opacity under the assumption that the prefix height is a phonological alternation because there is no surface phonological trigger for the prefix alternation. Under EG, we ask first what the learner is likely to generalize based on frequency of category distributions. We then turn to the question of whether these generalizations resolve the opacity problem.

#### 4.1. Root Properties

Without going into detail here, we assume that identifying morphs and classes of morphs in a concatenating language like Esimbi is a challenge that the learner has faced and overcome. (See Archangeli and Pulleyblank, 2012, Forthcoming 2016a,b for those details.) We start here with the point at which the learner has already started identifying nouns and verbs as distinct from each other, and is noting that phonologically different forms of verbs appear with different meanings. This enables the learner to identify, for a sequence such as uri, that there is a verb root, ri "eat," and an infinitival marker u.

As the data in **Table 6** show, verb roots vary in length from 1 to 3 syllables. However, despite the 8 vowels in the vowel inventory, **Table 5a**, verb roots are restricted to a limited set of vowels, the high vowels [i, ɨ, u]. Furthermore, the vowels in a verb root are overwhelmingly identical, all [i], all [ɨ], or all [u], but no combinations. In short, root vowels are high; root vowels agree in frontness and in rounding. This pattern is further confirmed by inspection of nouns, representative examples given in **Table 7**, which shows that this distribution of height and identity holds of all roots, not just of verbs.

Review of prefixes in Esimbi shows that any of the eight vowels may occur as a prefix, one property that distinguishes them from roots. (The vowel [ɨ] occurs only in invariable prefixes, not in the prefixes that alternate; our focus is on the alternating prefixes.) Our first set of generalizations, a–e below, captures the restrictions on roots, restrictions that do not extend to prefixes. We express the sequential conditions as unbounded restrictions on particular feature sequences (Smolensky, 1993; Pulleyblank, 2002; Heinz, 2010). We also assume that generalizations about the sounds of the language include statements like \*[front, round], etc.; we do not include these statements in our discussion.


#### 4.2. Prefix Distribution

The prefixes are far more challenging. In **Tables 6** and **7**, we see that the correct form of the prefix depends in part on the particular prefix (e.g., the infinitive prefix is back and rounded, one of [u, o, ɔ], while the singular 9 prefix is front and unrounded, one of [i, e, ɛ], etc.), while selection of a specific morph from within each prefix set depends on which root the prefix is attached to.

In figuring out the morphs of Esimbi, a further set of generalizations is possible, relating prefix morphs to each other. This set of generalizations is definitive in some cases, shown in f–i, but in other cases options are available, as in j–l.<sup>3</sup>


Lexical generalizations of this sort are potentially useful to the learner: when a new form is encountered, it is possible to "fill in the blanks" in the lexicon. Thus, if a new form with an [i] prefix is encountered, the learner anticipates items with [e] and with [ɛ] as the corresponding prefix.

Which prefix morph is selected depends on the root to which the prefix is attached, as summarized in **Table 8**. An examination of these patterns establishes that roots need to be partitioned into three sets, A, B, and C, corresponding to the three rows in **Table 6** and to the partitions within the three "root V" blocks in **Table 7**.

differ depending on the source and not all tones are marked orthographically. There are unresolved issues in the analysis of tone in Esimbi (Coleman et al., unpublished manuscript).

<sup>3</sup>These generalizations can be made even more general. For instance, f could be stated as "If a prefix has a high front vowel, its morph set includes all front vowels"; statements f and h can be generalized over: "If a prefix has a high vowel, its morph set includes all vowels with like rounding/backing," and so on. However, further discussion of this type of generalization would take us far afield from our main point here.



[Typos in the tones of 'fish' and 'hoe' in Hyman (1988) have been corrected (Larry Hyman, p.c.).]

TABLE 8 | Esimbi prefix descriptive summary.


As summarized in **Table 8**, a Set A root selects the highest/most advanced morph possible (selected morphs in bold): { **i**, e, ɛ }; { **u**, o, ɔ }; { **o**, ɛ, ɔ, a }. Set C roots select the lowest/most retracted morph possible: { i, e, **ɛ** }; { u, o, **ɔ** }; { o, ɛ, ɔ, **a** }. Set B selects morphs that are not peripheral in terms of height among the possible morphs: { i, **e**, ɛ }; { u, **o**, ɔ }; { o, **ɛ**, **ɔ**, a }.

At this point, we have identified lexical properties of both prefixes and roots in Esimbi. Roots are assigned to one of three sets, A, B, C, and as far as we can tell, the assignments are arbitrary. That is, there is no phonological property of a root that could be used to determine which prefix occurs with that root. Prefixes are identified as a collection of morphs. What remains is to identify the generalizations by which roots select the appropriate morph from each set.<sup>4</sup>

#### 4.3. Esimbi Prefix Selection

The general strategy we propose when selecting among alternatives is to identify the form that best fits whatever requirements there are for a given situation; for Esimbi prefixes, that means selection of the morph that best fits the requirements of the root to which it is attached. Essentially, with Set A roots, the root prefers a high and advanced vowel if possible, while with Set C roots, the preference is for a retracted vowel, preferably low. With Set B roots, the root gives no guidance and so the most representative morph of the set is selected.

#### 4.3.1. Set A Roots: Prefer High Advanced Vowels

Consider first roots of Set A, exemplified in **Table 9i** (there are no Set A roots with the root vowel [ɨ]). Set A roots require the highest, most advanced morph of the set. The key generalization for Set A roots is that these roots prefer that a prefix be high and be advanced, **Table 9ii**. As laid out in Archangeli and Pulleyblank (2015), the grammatical expression of this kind of preference is part of the lexical representation of the verb roots. For Esimbi, Set A is defined by a specified preference for a preceding high vowel and a preceding advanced vowel, **Table 9iii**.

Where the prefix morph set includes a vowel that is high and advanced, that vowel is selected because it is a perfect match: as shown in **Table 9iv**, the prefix { i } is selected over other members of the morph set { i, e, ɛ }, and as shown in **Table 9v**, { u } is selected out of { u, o, ɔ }. If the prefix morph set does not contain a high advanced vowel, as with the morph set { o, ɛ, ɔ, a }, then an advanced vowel is the best selection possible, as shown in **Table 9vi**. Defaults (underlined) are discussed in Sections 4.3.3, 4.3.4.

Our formal representation of selection, shown in **Table 9** as well as in **Tables 10**–**12**, bears similarities to Optimality Theoretic tableaux (Prince and Smolensky, 1993; McCarthy, 2002). Differences lie in the nature of constraints (learned vs. innate) and the "candidate set" (the Cartesian product of relevant morph sets vs. an infinite set). Tables like those in **Tables 9iv–vi** are interpreted in a fashion similar to Optimality Theory tableaux (Prince and Smolensky, 1993), with the following differences. First, the upper left cell shows the morpho-syntactic features to be manifested in a phonological form (see Archangeli and Pulleyblank, Forthcoming 2016a for more on this point). Second, the conditions across the top row are the conditions learned based on exposure to data; they are not innate "universals." Third, the possibilities in the lefthand column are all logically possible combinations of the relevant morphs: a finite set limited by the Cartesian product of the morphs involved (not an infinite set as in Optimality Theory). As with Optimality Theoretic tableaux, dashed vertical lines show unranked conditions and solid vertical lines show critical rankings; the symbol \* is used to show when a form does not satisfy a particular condition and \*! shows crucial violations that eliminate a form from consideration. The thumbs-up symbol indicates the form selected, given the morphs and conditions. See Archangeli and Pulleyblank (Forthcoming 2016b) for deeper comparison and contrast.

<sup>4</sup>While much of our proposal here is compatible with that of Donegan (2015), this is a difference: Donegan requires that the hearer "undo" phonological processes to access the lexical item; in our model, there is no single abstract underlying form and so no phonological processes to undo. In this way, our work is more similar to the Cognitive Grammar model in Nesset (2008).

#### TABLE 9 | Analysis of prefix selection for Esimbi Set A words.

ii. Set A conditions: Set A<sub>HI, ATR</sub>

Set A roots prefer a preceding [High] vowel.

Set A roots prefer a preceding [ATR] vowel.

iii. Set A example representations

{ bì<sub>HI, ATR</sub> } 'goat' { sùmu<sub>HI, ATR</sub> } 'thorn'

iv. Assessment of Set A root { sú<sub>HI, ATR</sub> } FISH.9/10 with { i, e, ɛ } SINGULAR.9 prefix

v. Assessment of Set A root { tili<sub>HI, ATR</sub> } END.3/6 with { u, o, ɔ } SINGULAR.3 prefix

vi. Assessment of Set A root { tili<sub>HI, ATR</sub> } END.3/6 with { o, ɛ, ɔ, a } PLURAL.6 prefix

vii. Prefix selection, Set A<sub>HI, ATR</sub> roots

characterization: Set A<sub>HI, ATR</sub>

implementation: Select High, Select ATR >> Default

consequence: { i, u } ≻ { o, e } ≻ { ɛ, ɔ, a }

The selection generalization and the implementation of best fit are summarized in **Table 9vii**.
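The best-fit selection summarized in **Table 9vii** can be approximated computationally. The sketch below is a hedged illustration rather than the authors' formalism: the names `FEATURES`, `ROOT_PREFERENCES`, `DEFAULTS`, and `select_prefix` are hypothetical, "E" and "O" are ASCII stand-ins for the retracted mid front and back vowels, the feature values and defaults are read off the prose of Sections 4.3.3 and 4.3.4, and pooling the violations of unranked conditions is an assumption of the example.

```python
# Feature specifications for the alternating prefix vowels, taken from the
# descriptions in the text ("E", "O" = retracted mid front/back vowels).
FEATURES = {
    "i": {"high", "atr", "front"},
    "e": {"atr", "front"},
    "E": {"rtr", "front"},
    "u": {"high", "atr", "back", "round"},
    "o": {"atr", "back", "round"},
    "O": {"rtr", "back", "round"},
    "a": {"low", "rtr", "back"},
}

# Lexically specified preferences of each root set (Set B has none).
ROOT_PREFERENCES = {"A": ("high", "atr"), "B": (), "C": ("low", "rtr")}

# Default morph of each prefix morph set, as identified in the text.
DEFAULTS = {
    ("i", "e", "E"): "e",
    ("u", "o", "O"): "o",
    ("o", "E", "O", "a"): "O",
}


def select_prefix(morph_set, root_set):
    """Pick the morph that best fits the root's preferences.

    Assumption: violations of the (unranked) root preferences are pooled,
    and the default is consulted only to break remaining ties.
    """
    preferences = ROOT_PREFERENCES[root_set]
    default = DEFAULTS[morph_set]

    def cost(morph):
        violations = sum(f not in FEATURES[morph] for f in preferences)
        return (violations, morph != default)  # preferences >> Default

    return min(morph_set, key=cost)


for root_set in "ABC":
    print(root_set, [select_prefix(ms, root_set) for ms in DEFAULTS])
```

Because the candidates are just the members of a finite morph set, the search is a plain minimum over a handful of forms, which mirrors the contrast the text draws with the infinite candidate set of Optimality Theory.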

#### 4.3.2. Set C Roots: Prefer Low Retracted Vowels

With Set C roots, the analysis is very similar; the key difference is that these roots select for low, retracted vowels in their prefixes. Examples are given in **Table 10i**. In this case, the generalization is that low retracted vowels are preferred. In the absence of a low retracted vowel, either low or retracted vowels are preferred. This selects { a } over { o, ɛ, ɔ }, { ɛ } over { i, e }, and { ɔ } over { u, o }. Set C is defined and exemplified in **Tables 10ii–iii**.

Where the prefix morph set includes a vowel that is low and retracted, that vowel is selected because it is a perfect match: { a } is selected over { o, ɛ, ɔ }, as shown in **Table 10iv**. If the prefix morph set does not contain such a vowel, as with the morph sets { i, e, ɛ } and { u, o, ɔ }, then a retracted vowel is preferred to the two advanced vowels, as shown in **Tables 10v,vi**, respectively. The selection generalization and the implementation are summarized in **Table 10vii**.

The preference for low retracted vowels selects { a } for CLASS 6, the one prefix morph set with a low retracted vowel. In the other two prefix morph sets, there is no low-voweled morph, and the next best thing is a match for the retracted feature, selecting { ɛ } for CLASS 9–10 and { ɔ } for CLASS 3.

#### TABLE 10 | Analysis of prefix selection for Esimbi Set C words.

i. Set C nouns

ii. Set C conditions: Set C<sub>LO, RTR</sub>

Set C roots prefer a preceding [Low] vowel.

Set C roots prefer a preceding [RTR] vowel.

iii. Set C representations

{ zù<sub>LO, RTR</sub> } 'snake' { bɨ<sub>LO, RTR</sub> } 'broom' { simi<sub>LO, RTR</sub> } 'grain'

v. Assessment of Set C root { zù<sub>LO, RTR</sub> } SNAKE.9/10 with { i, e, ɛ } SINGULAR.9 prefix

vi. Assessment of Set C root { simi<sub>LO, RTR</sub> } GRAIN.3/6 with { u, o, ɔ } SINGULAR.3 prefix

#### 4.3.3. Set B Roots: Phonological, But Not Morphological, Selection

Selection of the prefix morph for B roots is a bit more interesting, involving both selection of a "default" morph and an interaction of phonological sequencing restrictions with morph selection. We consider the default effect first. Set B nouns are illustrated in **Table 11i**.

We propose that Set B roots place no restrictions on morph vowels, leaving the selection to be determined for each affix by other criteria, such as the properties of the morph set itself. Since Set B roots do not impose any selectional restrictions on morph choice, the default form of each prefix is selected, illustrated in **Table 11ii** for o-ki TAIL.3.SG and in **Table 11iii** for ɔ́-tu EAR.6.PLURAL.

Of some interest, however, and unexplained at this point, is why the morph set in **Table 11iii** includes the vowel [ɛ], since neither root Set A nor root Set C selects [ɛ], and since [ɛ] is not the default vowel for the morph set. To address this point, let us consider where { ɛ } appears with Set B items. Representative examples are given in **Table 12i**.

#### TABLE 11 | Analysis of non-low prefix selection for Esimbi Set B words.

i. Set B nouns

ii. Assessment of Set B root { ki } TAIL.3/6 with { u, o, ɔ } SINGULAR.3 prefix

iii. Assessment of Set B root { tu } EAR.3/6 with { o, ɛ, ɔ, a } PLURAL.6 prefix

#### TABLE 12 | Analysis of prefix selection for Esimbi Set B words and the prefix set with the low vowel option.

ii. Front/back generalizations: Avoid sequences that disagree for [back], [front] within words.

a. \*[back]...[front]<sub>WORD</sub>

b. \*[front]...[back]<sub>WORD</sub>

iii. Assessment of Set B root { ki } TAIL.3/6 with { o, ɛ, ɔ, a } PLURAL.6 prefix

iv. Assessment of Set B root { tu } EAR.3/6 with { o, ɛ, ɔ, a } PLURAL.6 prefix

Inspection of these forms reveals a familiar restriction: not only must vowels agree for backness in roots but also, as these data show, in words as well. That is, two of the root restrictions seen above (\*[back]...[front]<sub>ROOT</sub> and \*[front]...[back]<sub>ROOT</sub>) hold more broadly than of the root alone. These two restrictions, stated in **Table 12ii**, hold of words, and so can drive the selection among morphs.

As seen in **Table 12iii**, the front/back requirement takes priority over default morph choice. Note that the front/back phonotactic serves to choose a particular morph, not to require morphs to change their form. Otherwise, the default [ɔ] is selected, **Table 12iv**. Where the morph set contains no morph satisfying the phonotactic condition, then the condition serves no deciding role. For example, since all class 9 morphs { i, e, ɛ } are front, it is impossible to satisfy the phonotactic when this prefix occurs with a back vowel root, e.g., [ì-sú] "fish" (**Tables 9i,iv**), and other criteria determine which form to select.
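The interaction between the word-level front/back phonotactic and default selection for Set B roots can be sketched as follows. This is an illustrative assumption-laden sketch, not the authors' implementation: `BACKNESS`, `violates_backness`, and `select_set_b_prefix` are hypothetical names, "E" and "O" are ASCII stand-ins for the retracted mid front and back vowels, and only roots with the vowels [i] and [u] are modeled.

```python
# Backness of the alternating prefix vowels and of the root vowels i, u,
# following the feature descriptions in the text.
BACKNESS = {
    "i": "front", "e": "front", "E": "front",
    "u": "back", "o": "back", "O": "back", "a": "back",
}


def violates_backness(prefix_vowel, root_vowels):
    """True if the prefix disagrees in backness with any root vowel."""
    return any(BACKNESS[prefix_vowel] != BACKNESS[v] for v in root_vowels)


def select_set_b_prefix(morph_set, default, root_vowels):
    """Choose a prefix morph for a Set B root.

    The phonotactic is checked first; the default only breaks ties, so it
    wins whenever the phonotactic fails to distinguish the candidates.
    """
    def cost(morph):
        return (violates_backness(morph, root_vowels), morph != default)

    return min(morph_set, key=cost)


plural_6 = ("o", "E", "O", "a")
print(select_set_b_prefix(plural_6, "O", "i"))  # E: only front morph fits ki
print(select_set_b_prefix(plural_6, "O", "u"))  # O: default, fits tu
```

When no morph in the set can satisfy the phonotactic, every candidate incurs the same violation and the default decides, which is the "no deciding role" case described in the text.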

#### 4.3.4. Excursus on Identifying Defaults

In this section, we consider how the default morph might be identified during acquisition. While completely arbitrary designation of a default morph may be necessary in at least some instances, there is more that can be said in general.

First, consider that the default morph must be in an elsewhere relation with selected morphs. For example, with the morph set { i, e, ɛ }, Set A roots select morph { i } and Set C roots select morph { ɛ }. In the absence of such specific selections, the default is therefore the only remaining morph, namely { e }. While the selected morphs must have specific properties to match selectional criteria, there is no such requirement of the default morph. We might therefore expect that in at least certain cases, default morphs would not yield as straightforwardly to a unique characterization. This is certainly true in the Esimbi case. The three default morphs { e }, { o }, and { ɔ } do not share any consistent features as unique identifiers of the set. They can be front, back, unrounded, rounded, advanced, retracted; even their mid-vowel height, while a necessary property in these prefix cases, is not a sufficient property (consider, for example, the set { o, ɛ, ɔ, a }).

Independent of such selectional issues, we might expect default morphs to exhibit certain properties. For example, all else being equal, we would expect that if morphs differ in their frequency, the more frequent morph is the default morph. While we consider this hypothesis reasonable, we do not have the data to assess it for Esimbi.

An additional property we hypothesize to hold of default morphs is representability, that is, the default morph best represents the full set of morphs. Consider three cases. If there is a single morph in a set, then obviously that morph is fully representative of the set. It is the "default" in that it will occur independent of specific requirements, but since there is only one form the notion of "default" is not interesting. If there are two morphs, then it is impossible to speak of one or the other better representing the set as each morph represents an identical (but opposite) divergence from the set's (putative) default. In such binary cases, we might refer to frequency to establish the default morph, but representability will be irrelevant. In cases with more than two morphs, however, we can assess overall properties of the morph set, and identify a particular morph as being representative of those properties.

We will consider the morph sets one by one, starting with the set { i, e, ɛ }. In this set, all vowels are front and unrounded. This clearly establishes that the prototypical version of this morph set should be front and unrounded. Differences in the morphs are restricted to differences in the features [high] and [ATR]. With respect to [high], two of the three vowels (the majority) are nonhigh while only one is high. This establishes that the prototypical value should be nonhigh. Similarly, two of the three morphs are advanced, establishing that the prototypical value should be advanced. Put together, a consideration of all features individually establishes that the prototypical morph for { i, e, ɛ } should be front, unrounded, nonhigh and advanced, that is, { e }.

A similar assessment of { u, o, ɔ } establishes that the prototypical morph in the morph set should be { o }. All vowels are back, all are rounded, two of the three are nonhigh, two of the three are advanced, hence back, rounded, nonhigh, advanced.

Turning to the four-vowel morph set { o, ɛ, ɔ, a }, the same assessment of representability establishes { ɔ } as the default. All vowels in the set are nonhigh. Three of the four vowels are nonlow. Three are back. Three are retracted. Interestingly, two of the vowels are rounded and two are unrounded. Hence a consideration of representability establishes that the prototypical vowel for this set should be nonhigh, nonlow, back, and retracted; rounding is not determined. A consideration of the prototypical properties uniquely identifies { ɔ } as the default (nonhigh, nonlow, back, retracted) in spite of the fact that rounding is indeterminate.
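The representability reasoning applied to the three morph sets can be restated as a feature-by-feature majority vote. The sketch below is one possible rendering under stated assumptions: the names `VOWELS`, `prototype`, and `representative_default` are hypothetical, "E" and "O" are ASCII stand-ins for the retracted mid front and back vowels, features are treated as binary, and a tie leaves a feature undetermined.

```python
FEATURES = ("high", "low", "atr", "back", "round")

# Binary feature values following the descriptions in the text.
VOWELS = {
    "i": dict(high=True, low=False, atr=True, back=False, round=False),
    "e": dict(high=False, low=False, atr=True, back=False, round=False),
    "E": dict(high=False, low=False, atr=False, back=False, round=False),
    "u": dict(high=True, low=False, atr=True, back=True, round=True),
    "o": dict(high=False, low=False, atr=True, back=True, round=True),
    "O": dict(high=False, low=False, atr=False, back=True, round=True),
    "a": dict(high=False, low=True, atr=False, back=True, round=False),
}


def prototype(morph_set):
    """Majority value of each feature; tied features are left out."""
    proto = {}
    for feature in FEATURES:
        plus = sum(VOWELS[m][feature] for m in morph_set)
        minus = len(morph_set) - plus
        if plus != minus:
            proto[feature] = plus > minus
    return proto


def representative_default(morph_set):
    """Return the unique morph matching the prototype, or None."""
    proto = prototype(morph_set)
    matches = [
        m for m in morph_set
        if all(VOWELS[m][f] == value for f, value in proto.items())
    ]
    return matches[0] if len(matches) == 1 else None


for morph_set in [("i", "e", "E"), ("u", "o", "O"), ("o", "E", "O", "a")]:
    print(morph_set, representative_default(morph_set))
```

For a two-morph set the function returns `None`, in line with the observation that representability cannot distinguish the members of a binary set and that frequency would have to decide instead.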

#### 4.4. Summary: Opacity Revisited

In this very brief discussion of Esimbi, we have shown that prefix forms in Esimbi have both idiosyncratic and systematic properties. The fact that there are three different prefix morph sets is idiosyncratic under this analysis, as is which prefix morph set is selected by a particular root. Each of these properties is characterized as part of the lexical representation for prefixes (morph set) and of roots (selection of prefix morph set). The choice of morphs from each set is systematic given the generalizations proposed for the language. The systematic properties are defined in terms of the features of the morphs, for specific selection within a class, for default selection, and for word-level phonotactic well-formedness.

The issue of surface opacity, raised in the discussion of **Table 5b**, is a non-issue under this analysis. The problem derives from assuming that patterns such as these are entirely phonological. Assuming that a phonological difference in the roots is the source of the difference in prefix height requires that height distinctions be encoded in roots even though there is no surface evidence in the roots for the required distinction. "Markedness" constraints force uniformity of height features in roots; "faithfulness" constraints must reference features in roots which surface roots show no evidence of.

Emergent Grammar recognizes all types of generalizations that the learner might make. Among these are generalizations over sets of lexical items that are arbitrary based on their surface forms—such as a set of roots that selects for a high, advanced prefix. It is the recognition of such lexical generalizations coexisting with phonological generalizations that eliminates opacity as an issue in Esimbi prefix selection.

It is important to remember that Emergent Grammar principles led to this analysis of the interactions between phonology and morphology in Esimbi. At this point in our development of the model, learners identify morph sets, with selectional restrictions specific to morphs or morph sets, as well as purely phonological selectional criteria (such as the Esimbi prohibition against mixing back and front within a word). The generalizations proposed are all types of generalizations that might arise from making categories out of similar frequent items, coupled with a strong pressure to generalize and create a more abstract, symbolic representation.

# 5. Conclusion

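We make three basic assumptions about human capabilities that are non-linguistic, but that are recruited to deal with language data: the ability to categorize, the ability to attend to frequencies, and a preference for generalization.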


We presented three case studies, each illuminating a way in which these capabilities are implemented in language. First, we considered the case of English /ɹ/, arguing for the spontaneous creation of categories and generalizations over those categories, even in the absence of external evidence. Second, we reviewed the implications of the frequency of specific patterns with respect to languages showing Bantu height harmony vs. languages without a height harmony pattern. Finally, we presented the case of Esimbi prefixes, showing the role of categorizing morphs into sets and generalizing over both morphological and phonological categories to select the appropriate prefix morph. This analysis also demonstrated the ability to characterize morphophonological interactions of considerable surface complexity, but without appealing to the power of complex innate linguistic capacities.

To conclude, we have argued that conceptually, there are good reasons to explore how far we can get without UG. As seen with our case studies, phonological analysis without appeal to UG has promising empirical coverage. In other words, assuming categorization, attention to frequencies, and a preference for generalization gets us a long way toward a phonological system with minimal appeal to innate linguistic-specific capabilities. In answer to the question raised by the topic of this issue, we have yet to discover a persuasive role for innate linguistic endowment in phonology of the type frequently assumed. At the same time, we find that Emergent exploration of the phonologies of different languages frequently reveals an interaction with lexical representations of morphs, suggesting there may be a largely Emergent component to language morphologies as well.

# Author Contributions

The two authors shared equally in all aspects of this project.

## Funding

This research was supported in part by grant #410-2011-0230 to DP from the Social Sciences & Humanities Research Council of Canada.

# Acknowledgments

The Esimbi analysis benefits from discussion with colleagues and students at the University of Colorado in Boulder, the University of Arizona, and the University of British Columbia. We are especially grateful to Cristin Kalinowski and to Brad Koenig for generously sharing pre-publication versions of their work on Esimbi.



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Archangeli and Pulleyblank. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# Design Features for Linguistically-Mediated Meaning Construction: The Relative Roles of the Linguistic and Conceptual Systems in Subserving the Ideational Function of Language

#### Vyvyan Evans\*

Bangor University, Bangor, UK

#### Edited by:

Umberto Ansaldo, University of Hong Kong, Hong Kong

#### Reviewed by:

Stefan Hartmann, University of Mainz, Germany Michael Pleyer, Universität Heidelberg, Germany

> \*Correspondence: Vyvyan Evans v.evans@bangor.ac.uk

#### Specialty section:

This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology

Received: 14 August 2015 Accepted: 27 January 2016 Published: 19 February 2016

#### Citation:

Evans V (2016) Design Features for Linguistically-Mediated Meaning Construction: The Relative Roles of the Linguistic and Conceptual Systems in Subserving the Ideational Function of Language. Front. Psychol. 7:156. doi: 10.3389/fpsyg.2016.00156

Recent research in language and cognitive science proposes that the linguistic system evolved to provide an "executive" control system on the evolutionarily more ancient conceptual system (e.g., Barsalou et al., 2008; Evans, 2009, 2015a,b; Bergen, 2012). In short, the claim is that embodied representations in the linguistic system interface with non-linguistic representations in the conceptual system, facilitating rich meanings, or simulations, enabling linguistically mediated communication. In this paper I build on these proposals by examining the nature of what I identify as design features for this control system. In particular, I address how the ideational function of language (our ability to deploy linguistic symbols to convey meanings of great complexity) is facilitated. The central proposal of this paper is as follows. The linguistic system of any given language user, of any given linguistic system, spoken or signed, facilitates access to knowledge representation (concepts) in the conceptual system, which subserves this ideational function. In the most general terms, the human meaning-making capacity is underpinned by two distinct, although tightly coupled representational systems: the conceptual system and the linguistic system. Each system contributes to meaning construction in qualitatively distinct ways. This leads to the first design feature: given that the two systems are representational (they are populated by semantic representations), the nature and function of the representations are qualitatively different. This proposed design feature I term the bifurcation in semantic representation.
After all, it stands to reason that if a linguistic system has a different function, vis-à-vis the conceptual system, which is of far greater evolutionary antiquity, then the semantic representations will be complementary, and as such, qualitatively different, reflecting the functional distinctions of the two systems, in collectively giving rise to meaning. I consider the nature of these qualitatively distinct representations. Second, language itself is adapted to the conceptual system (the semantic potential) that it marshals in the meaning construction process. Hence, a linguistic system itself exhibits a bifurcation, in terms of the symbolic resources at its disposal. This design feature I dub the bifurcation in linguistic organization. As I shall argue, this relates to two distinct reference strategies available for symbolic encoding in language: what I dub words-to-world reference and words-to-words reference. In slightly different terms, this design feature of language amounts to a distinction between a lexical subsystem and a grammatical subsystem.

Keywords: meaning construction, language faculty, access semantics, LCCM theory, design features for meaning

# INTRODUCTION


This paper relates to a broad research programme concerning the nature of the human capacity for language, and investigating what makes language special. For much of the second half of the twentieth century, investigation of this capacity has often been driven, in general terms, by asking the following question: What is the nature of language? One response to this question, and the prevailing view in Anglo–American language science, at least until relatively recently, was that it would ultimately be possible to identify principles, specific to language, that accounted for what makes it unique (e.g., Hauser et al., 2002). In particular, such principles were presumed to make language functionally distinct from other aspects of human cognition, as well as qualitatively distinct from, and functionally far more sophisticated than, the communicative systems exhibited by other species. And these principles were assumed to be part of the human genetic endowment. This functional specificity, a species-specific feature of the human mind, is often referred to as the language faculty, and is embodied most notably in the tradition pioneered by Chomsky (1965), and thereafter.

In this paper, I propose a somewhat different perspective, which arises because I begin by asking a slightly different question. My starting point is to ask: what is language for? It is presumably unarguable that, from the perspective of language as a communicative system, it exhibits two main functions. The first can be characterized as an ideational function: language serves to convey ideas, ranging from stating one's name, making an idle comment on the weather, to declaring undying love (e.g., Evans, 2009, 2015b). And the second can be characterized as an interactive-interpersonal function (e.g., Levinson, 2006; Heine and Kuteva, 2007; Tomasello, 2008). Here language serves to signal intentional actions: actions in the sense that linguistic utterances have illocutionary force (Searle, 1969). That is, they attempt to influence the mental states, wishes, feelings and behavior of our interlocutor, influencing and even changing aspects of the world in the process.

Evidence for the interpersonal-interactive function of language comes from grammatical organization, as well as language-specific discourse conventions, demonstrating that language is fundamentally dyadic in nature. For instance, the languages of the world virtually all appear to include a pronoun system that maintains a role for second-person ('you'). Seemingly universal aspects of linguistic organization such as interrogatives (questions), imperatives (commands), and deontic modality (e.g., You may. . .), provide linguistic resources that seek to influence others (Heine and Kuteva, 2007). Moreover, knowledge of language use includes a complex system of turn-taking conventions associated with competent language use during ongoing discourse (Sacks et al., 1974; Sacks, 1992). Finally, languages appear to universally assume a speaker-hearer distinction. The distinction in English, for instance, between definite vs. indefinite articles (e.g., the vs. a), is evidence of this: the use of the indefinite article signals that while the entity being referred to is known by the speaker, it is left as unidentified for the hearer. Similarly, spatial deictic expressions—terms that take their reference from the speaker's spatial location, 'deictic' derives from the Greek deixis, meaning 'pointing'—in languages often assume a speaker-hearer dichotomy.

An example of spatial deixis, in English, is the deictic expression this, which designates something proximal to the speaker (and hearer), while that points to an entity that is distal. Some languages have spatial deictics that disambiguate between proximity and distance from speaker and hearer, for instance 'close to speaker, but not hearer,' and 'distal to speaker, but not hearer.' Lexical items such as these again point to an organizational principle involving a speaker/hearer, and hence an interpersonal-interactive context.

For language to facilitate its interactive-interpersonal function, it stands to reason that language must have evolved as a means of expressing ideational complexity (Hurford, 2007). After all, we can, presumably, only influence the mental states of others once we have a fairly sophisticated symbolic means of expressing our own thoughts and feelings, in a bid to encode and externalize these, in order to have an impact on others and the world around us. And while the two communicative functions may have co-evolved—the interpersonal-interactive function may have led to increased ideational complexity, a more sophisticated means of expressing ideational complexity, in turn, enhanced our ability to engage interactively with others (e.g., Deacon, 1997; Hurford, 2012; Evans, 2015b)—in this paper, I primarily focus on the ideational function of language.

In particular, I argue that language, in fulfilling its ideational function, takes advantage of the semantic potential of the evolutionarily prior conceptual system, a system that, in outline at least, we share with other great apes (gorillas, chimpanzees, bonobos, and orangutans), primates more generally, and indeed, many other mammalian species (e.g., Barsalou, 2005; Call and Tomasello, 2008; Hurford, 2012). From this perspective, what makes language special is not that it is functionally distinct, for instance, an informationally encapsulated faculty or module of mind (e.g., Fodor, 1983). On the contrary, the linguistic system of any given language user, of any given linguistic system, spoken or signed, facilitates access to knowledge representation (concepts) in the conceptual system, in order to construct meaning, during the course of communication. The relationship, then, between a linguistic system, and the human conceptual system, is that of a symbiotic assembly, co-evolved and co-adapted in order to enable meaning construction in the course of communication (Evans, 2015a,b). In order to achieve this, this meaning-making complex exhibits a number of design features, facilitating meaning construction. In this article I examine the two central design features of this meaning-making complex, which enable human meaning construction: the design features for a bifurcation in semantic representation, and for a bifurcation in linguistic organization.

In the most general terms, the human meaning-making capacity is underpinned by two distinct, although tightly coupled representational systems: the conceptual system and the linguistic system. Each system contributes to meaning construction in qualitatively distinct ways. This leads to the first design feature: given that the two systems are representational (they are populated by semantic representations), the nature and function of the representations are qualitatively different (see The Bifurcation in Semantic Representation Design Feature). After all, it stands to reason that if a linguistic system has a different function, vis-à-vis the conceptual system, which is of far greater evolutionary antiquity, then the semantic representations will be complementary, and as such, qualitatively different, reflecting the functional distinctions of the two systems, in collectively giving rise to meaning. I consider the nature of these qualitatively distinct representations.

Second, language itself is adapted to the conceptual system (the semantic potential) that it marshals in the meaning construction process (see The Bifurcation in Linguistic Organization Design Feature). Hence, a linguistic system itself exhibits a bifurcation, in terms of the symbolic resources at its disposal. As I shall argue, this relates to two distinct reference strategies available for symbolic encoding in language: what I dub words-to-world reference and words-to-words reference. In slightly different terms, this design feature of language amounts to a distinction between a lexical subsystem and a grammatical subsystem.

# BACKGROUND

In this section I present the proposal that the conceptual and linguistic systems have distinct, albeit complementary, functions in subserving the ideational function of language. This section provides the necessary background for the discussion, later in the paper, of the two design features that enable this.

# The Conceptual and Linguistic Systems

In previous work (Evans, 2009, 2015b), I have argued that human-like meaning-making is contingent upon a bifurcation in the two representational systems upon which linguistically mediated communication depends. Linguistic communication is contingent on an evolutionarily prior conceptual system. The human conceptual system, shared, at least in outline with the other great apes, evolved not for communication, but rather for functions such as reason, choice, learning, categorization and advance planning, in the quotidian world of threat and opportunity (Evans, 2009). Much later, and probably for much of the 2.8 million years of the evolutionary trajectory of the genus Homo, a linguistic system has been evolving built on the cooperative intelligence that emerged with the genus Homo (Evans, 2015b; see Deacon, 1997; Tomasello, 2014). And the linguistic system makes use of the qualitatively distinct representational format of the conceptual system, for purposes of communication. On this account, language provides a means of bootstrapping representations in the conceptual system for linguistically mediated communication (Evans, 2015a,b).

Our species shares in outline, especially with the other great apes, a complex conceptual system (e.g., Barsalou, 2005; Hurford, 2007, 2012; Evans, 2009). A conceptual system evolved not for communication, but for a range of more pressing, quotidian concerns, such as categorization, learning, forward-planning, wayfinding, and so on (Barsalou, 1992). But while many higher-order species possess sophisticated conceptual systems, humans appear to be alone in possessing language (e.g., Evans, 2014). In addition, the conceptual prowess of humans, as manifested, perhaps most notably, by the ideational and material culture characteristics of all human groups, is both quantitatively and qualitatively distinct from any other extant species (Tomasello, 1999, 2014).

One implication of this fact is that it may be language—and the cognitive and biological changes that were necessitated by it over the 2.8 million years of the ancestral human evolutionary trajectory—that has provided the sine qua non: language may be the key in unlocking the otherwise mute semantic potential of the human conceptual system (see, for instance, Mithen, 1996; Deacon, 1997; Evans, 2015b).

From this perspective, a linguistic system provides our species with added value: it provides an "executive" control function (an idea I shall develop during the course of the paper), operating over embodied concepts in the conceptual system (Barsalou, 2005; Barsalou et al., 2008; see also Evans, 2009). The idea I advance here is that language provides the framework that facilitates the composition of concepts for purposes of communication. This is achieved as language consists of a grammatical system, with words and grammatical constructions cueing activations of specific body-based states in the brain (Bergen, 2012: Chapter 5). On this account, language allows us to control and manipulate the conceptual system, which, after all, must have originally evolved for more rudimentary functions, such as object recognition and classification. Under the control of language, we can make use of body-based (not exclusively sensorimotor) concepts in order to develop abstract thought.

In short, representations in the linguistic system co-conspire with representations in the conceptual system in the process of meaning construction. The linguistic representations must have evolved to complement concepts in the conceptual system; accordingly, it stands to reason that the nature of linguistic representations must have a different quality from the rich, multimodal concepts in the conceptual system. After all, if language really does provide an executive control function, specialized for tapping into the conceptual system's meaning potential, then it stands to reason that language evolved a complementary function; the nature of semantic representation in language must be qualitatively different from the representations (concepts) that populate the conceptual system.

# Evidence for the Embodied Nature of Concepts

Before continuing, I briefly review some of the evidence for thinking that the conceptual system is populated by representations that are embodied in nature. The embodied (or grounded) cognition account of concepts blurs the distinction between perception/interoception and cognition (e.g., Barsalou, 1999, for an early, influential account). On this view, concepts are directly grounded in the perceptual and interoceptive brain states that give rise to them. This embodied cognition perspective takes a modal view of concepts: the semantic substrate of concepts is directly grounded in, and arises from, the sorts of modalities that the concept is a representation of (see Barsalou, 2008 and Shapiro, 2010 for reviews; notable exemplars of this view include, e.g., Damasio, 1994; Clark, 1997; Glenberg, 1997; Barsalou, 1999; Lakoff and Johnson, 1999; Zwaan, 2004; Gallese and Lakoff, 2005; Chemero, 2009; Evans, 2009; Vigliocco et al., 2009).

The embodied cognition view assumes that concepts arise directly from the perceptual experiences themselves. Take the example of the experience of dogs. When we perceive and interact with dogs, this leads to extraction of perceptual and functional attributes of dogs, which are stored in memory in analog fashion: our concept for 'dog,' on this view, closely resembles our perception and experience of a dog. When we imagine a dog, this is made possible by reactivating, or to use the technical term, simulating the perceptual and interoceptive experience of interacting with a dog—these include sensorimotor experiences when we pat and otherwise interact with a dog, as well as affective states, such as the pleasure we experience when a dog responds by wagging its tail, and so forth. But while the simulated dog closely resembles our conscious perceptual and interoceptive experience, it is, according to embodyists, attenuated.

In other words, the concept for 'dog' is not the same as the vivid experience of perceiving a dog. When we close our eyes and imagine a dog, we are at liberty to simulate an individual dog—perhaps our own pet—or a type of dog, or a dog composed of aspects of our past experiences of and with dogs. But the simulation is attenuated with respect to the perceptual experience of a dog—it doesn't have the same vivid richness that comes with directly perceiving a dog in the flesh.

Importantly, the claim made by the embodied cognition perspective is that the simulation is directly grounded in the same brain states (in fact, a reactivation of aspects of the brain states) that are active when we perceive and interact with the dog. The simulation is then available for language and thought processes. As the reactivation of some aspects of the perceptual and interoceptive experiences of a dog is, in part, constitutive of the concept for 'dog,' the concept is an analog of the perceptual experience. It is analog in the sense that it is very much like our perceptual experience of dogs: the concept must, in part, be constituted of body-based representations (the sensorimotor experiences that comprise our perceptual experience) and, therefore, must be stored in broadly the same brain regions that process the perceptual experience to begin with. This constitutes an embodied perspective as concepts are made up, in part, of the very same body-based experiences that comprise our perceptual and interoceptive experiences.

Two main lines of empirical evidence suggest that the embodied cognition view of concepts, rather than the disembodied account, is on the right track. These relate to how the brain processes concepts, and how human subjects perform in behavioral tasks, when they must call up conceptual representations. Together, these two lines of evidence strongly suggest that concepts make use of the same brain regions that process the perceptual experiences that the concepts are representations of: it doesn't matter whether you are perceiving a particular experience (percept), or later, thinking about it after the event (concept), the same brain states are activated in both cases. This suggests that the same mental substrate that underpins perception also underpins cognition, and our representations (or concepts) of perceptual experiences.

Brain-based demonstrations reveal that the brain's sensorimotor and other modal systems (systems that are activated when we perceive a particular experience) are also activated during conceptual processing: when we think about or recall the experience, or even when we use or understand language relating to the experience. As we shall see below, for instance, motor regions of the brain that are deployed for perceiving a particular tool, such as a hammer, and the way it is used, are automatically activated during non-perceptual tasks, such as thinking or talking about hammering. In short, a raft of studies provides clear evidence that the same motor processes in the brain are automatically engaged when subjects perform perceptual and conceptual tasks (Barsalou, 2003, 2008; Rizzolatti and Craighero, 2004; Gallese and Lakoff, 2005; Pulvermüller et al., 2005; Boulenger et al., 2008).

Behavioral demonstrations involve applying a stimulus of some kind to human subjects, and then observing their behavior when performing a particular task. Many of the relevant studies have involved sentence comprehension and lexical decision tasks (and I will have more to say about the relationship between language and concepts below).

However, one representative and important study required subjects to perform a lexical decision task employing action verbs relating to either arm or leg actions (Pulvermüller et al., 2005). The experiment made use of a technique, in cognitive neuroscience, known as transcranial magnetic stimulation (TMS). This is a non-invasive technique in which a magnetic coil held against the scalp induces a weak electric current in specific brain regions in order to stimulate them.

Subjects were asked to read words that related either to arm movement, such as punch, or leg movement, like kick. Immediately after reading, the TMS pulse was passed through either the leg region of the brain's motor cortex or the arm region. Subjects were then asked to signal when they had understood the word. The experimenters found that when subjects received a pulse to the 'arm' region of the brain, they processed arm words more quickly. And when exposed to an electric current to the leg region, they understood leg words more quickly. What this reveals is that words—which relate to mental representations, concepts—were influenced by activation of the perceptual areas of the brain dedicated to perceiving either leg or arm actions. And consequently, this provides powerful evidence that perceptual experiences underpin conceptual representations, as manifested in language.

# The Nature of Simulations


If the linguistic and conceptual systems together constitute a meaning-making complex, how do simulations arise? A linguistically mediated simulation is a general purpose computation, performed by the brain, which provides language users with an approximation of a speaker's linguistically mediated communicative intention.

The proposal is that words and other linguistic symbols are in fact cues that guide the way in which body-based states processed and stored by the brain are composed, in order to facilitate linguistically mediated meaning construction (Glenberg and Robertson, 1999; Fischer and Zwaan, 2008; Evans, 2009, 2013; Bergen, 2012: chapter 5). To illustrate, consider the use of red in the following example sentences:


In the first example, the use of red evokes a bright, vivid red. In the second, a dun or browny red is typically called to mind. This reveals that the meaning, or, more precisely, the perceptual simulation of red, is not, in any sense, there in the word. After all, red could, in principle, lead to activation of the full panoply of distinct hues we normally associate with red. These range, for instance, from the orange-red of fire, to the auburn-red of henna, to the crimson-red of blood, to the truly red of lipstick, and so forth. Knowledge of all these different shades arises from our interaction in and with the world, which we can, in principle, call to mind, and visualize in our mind's eye in the absence of language.

In these sentences, the word red provides access to this meaning potential: all our stored experiences for red. But while the sensory experience of redness is not coming from language itself, the word cues the perceptual and interoceptive states stored in the brain, associated with red in all its glory. And these body-based states are reactivated during language use. Put another way, the word form red gives rise to distinct simulations for different hues of red.

But importantly, what's remarkable about the meaning-making complex—the linguistic and conceptual systems-assembly—is that the sentences in (1) enable us to construct just the right shade of red: a contextually appropriate shade. The linguistic context, in each sentence, guides the construction of the simulation, such that we obtain the 'correct' perceptual hue in each case.

# The Bifurcation in Semantic Representation Design Feature

If the function of language is to index or activate body-based concepts in the conceptual system, what is the difference between representations in the conceptual system vis-à-vis those in the linguistic system? The first design feature of linguistically mediated meaning construction, I argue in this section, constitutes a qualitative distinction in the two types of representation: the design feature for a bifurcation in semantic representation. This distinction I operationalise in terms of analog knowledge (indigenous to the conceptual system), and parametric knowledge (indigenous to the linguistic system).

# Arguments for Semantic Representations Indigenous to Language

There are a number of reasons for thinking that language comes equipped with semantic representations that are distinct from those that reside in the conceptual system—embodied concepts. I briefly review five here, based on Evans (2015a).

First, if language had no indigenous semantic content, we would be unable to use language to evoke ideas we haven't yet experienced. This follows as the brain states wouldn't yet exist for the corresponding experiences. But it appears to be the case that language can do just that, facilitating the evocation of just those experiences not yet witnessed (Taylor and Zwaan, 2009; Vigliocco et al., 2009). For instance, I can describe a dance move to someone, using language, and more or less convey the move, even though my interlocutor may have never had previous experience of the move. While seeing and acting provide a directly perceived, multimodal context, enabling the formation of conceptual representations, an approximation can nevertheless be facilitated via language. While direct experience of the dance move (the experience of seeing, acting, and interacting) gives rise to body-based representations that are analog in nature, language, in contrast, doesn't work like that. The representations are more sketchy. Nevertheless, language can be used, even in the absence of prior experience, in order to evoke a partial representation of the dance move. This demonstrates that conceptualisations can arise via the medium of language.

Second, although activations of body-based brain states arise automatically in response to language use, they are not necessary for language to be successfully used. Patients with Parkinson's and motor neuron disease display difficulty in carrying out motor movements, as motor representations in the brain are damaged. Yet, both sets of patients are able to use and understand corresponding action verbs (Bak et al., 2001; Boulenger et al., 2008). This reveals that simulations arise not just from embodied brain states.

Third, language itself appears to encode a type of semantic representation that is qualitatively distinct from the sorts of rich, multimodal representations that populate the conceptual system. Consider, for instance, the semantic divergence between the use of the definite article, the, with the indefinite, a. One key distinction concerns specificity, as well as whether the information being introduced is already present or not, in the current discourse: whether the subject under discussion is given or new. That said, the and a don't have specific referents in the world, nor are they ideas that can be visualized, in the way that, say the noun dog, or even a scene associated with the more abstract nominal jealousy can be visualized.

What this reveals is that so-called grammatical or function words appear to provide a relatively schematic semantic representation: a type of content that is qualitatively distinct from concepts. The grammatical structure of language may provide an indigenous level of semantic representation, distinct from non-linguistic concepts.

Fourth, language appears to directly influence perception. In one study, the distinction in the linguistic encoding of color was exploited to investigate the non-linguistic effects of language (Thierry et al., 2009). It was found that differences across languages, for instance, Greek vs. English, in terms of encoding of monolexemic color terms led to distinctions in the perception and categorisation of color space. This finding strongly suggests that language provides semantic content independent of the conceptual system, consequently leading to cognitive restructuring in non-linguistic cognition.

The fifth reason relates to what I have termed, in earlier work, the illusion of semantic unity (Evans, 2009). Otherwise distinct aspects of semantic space can, under the influence of language, come to be viewed as unified. For instance, the polysemy exhibited by language can relate a number of distinct semantic parameters, providing the appearance of homogeneity. Take the English lexical item over, as in the following examples:


What these examples reveal is that in English a variety of distinct semantic parameters ('above,' 'on the other side,' 'covering,' and 'completion') are encoded by the same form. While the relationship between these semantic units is motivated (Tyler and Evans, 2003; Evans, 2015b), the units are nevertheless distinct. But the consequence of English employing the same form to encode a range of distinct, albeit semantically related, meanings is that English speakers perceive the semantic units to form a coherent semantic range. In contrast, other languages divide similar semantic space across different lexical items. The consequence is that the appearance of semantic unity is just that, an illusion, an artifact of the way in which individual languages cut up and/or unify semantic space. It also provides further evidence that language provides a level of semantic content independent of the conceptual system, which it nuances during the process of meaning construction.

# Parametric vs. Analog Concepts

This discussion of the semantic content, derived via the linguistic system, and distinct from non-linguistic concepts, brings us to the design feature of linguistically mediated meaning construction under discussion in this section. The semantic content associated with a mental simulation appears to arise from a symbiotic coupling of two qualitatively distinct knowledge types. For instance, the content associated with so-called content words, such as the open-class noun waiter, self-evidently relates to information "above" the level of language. When we imagine a waiter, this involves rich information concerning the appearance, dress, location, and tasks involved in being a waiter. Information of this kind is multimodal in nature, involving information that is sensorimotor and/or interoceptive. In short, it is analog: the information called to mind approximates the veridical "immersed" experience of perceiving and interacting with a waiter (cf. Zwaan, 2004).

In contrast, the so-called function or grammatical words and constructions concern information that is neither rich, nor multimodal, in the same way. In fact, the information conveyed is far more schematic in nature (Talmy, 2000; Evans and Green, 2006; Evans, 2009, 2013; see also Bergen, 2012: Chapter 5). To illustrate, if we exclude the semantic content associated with the open-class content words, in (3), we are left with a type of schematic representation that is not straightforwardly imageable, or perceptual. In short, the representations associated with grammatical structure appear not to relate, in a straightforward way, to perceptual representations. And yet, such representations are nevertheless meaningful:

(3) **Those** decorator**s are** ruin**ing my** wall**s**

In (3), by excluding the content words—decorator, ruin and wall—what remains is the function words, which I've highlighted in bold font. These are the inflections –ing and –s and the lexical items those, are, and my. In addition, the grammatical categories noun and verb also encode schematic semantic units, those of THING and PROCESS, independently of the specific lexical items that fill them—decorator, wall and ruin (Langacker, 1987, 2008; Evans, 2015b). So, the semantic representation of just these closed-class elements, together with the syntactic configuration in which they are embedded, can be captured as in (4):

(4) Those somethings are somethinging my somethings.

The gloss for this semantic representation can be provided as in (5):

(5) More than one entity close to the speaker is presently in the process of doing something to more than one entity belonging to the speaker.

This provides quite a lot of semantic content.

That said, this semantic representation is, nevertheless, highly schematic. We don't have the details of the scene: we don't know what the entities in question are, nor do we know what is being done by the agent to the patient. Nevertheless, this illustration reveals the following: there appears to be a type of semantic representation that is unique to the linguistic system. Moreover, this representation provides information relating to how a simulation should be constructed (see Bergen, 2012 for a related point).

After all, the grammatical organization of the sentence entails that the first entity is the agent and the second entity the patient: the first entity is performing an action that affects the second entity. This level of semantic representation derives exclusively from language, rather than from representations in the conceptual system. It provides a set of instructions as to the relative significance, and the relation that holds, between the two entities in the sentence. In short, the function words and the grammatical construction, in the sense of Goldberg (1995, 2006), involve semantic content, albeit of a highly schematic sort (Talmy, 2000; Evans, 2009).
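The step from (3) to (4) can be mimicked with a few lines of code. What follows is only a minimal sketch: the hand-made segmentation of each word into stem and suffix, the open-class flags, and the names `EXAMPLE_3` and `schematize` are illustrative assumptions, not part of the analysis itself and not a morphological parser.

```python
# Toy, hand-annotated version of example (3). Each entry is
# (stem, suffix, is_open_class). The segmentation is an assumption
# made for illustration only.
EXAMPLE_3 = [
    ("those", "", False),
    ("decorator", "s", True),
    ("are", "", False),
    ("ruin", "ing", True),
    ("my", "", False),
    ("wall", "s", True),
]


def schematize(tokens, placeholder="something"):
    """Replace each open-class stem with a placeholder.

    Closed-class words and inflections are kept, since these are the
    elements said to carry the schematic (parametric) content.
    """
    words = [
        (placeholder if is_open_class else stem) + suffix
        for stem, suffix, is_open_class in tokens
    ]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


print(schematize(EXAMPLE_3))
# prints: Those somethings are somethinging my somethings.
```

The output reproduces (4): everything that survives the substitution belongs to the closed-class system, which is why the result remains interpretable along the lines of the gloss in (5).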

This distinction, in terms of the nature of the content associated with content words on the one hand, and function elements on the other, constitutes the second design feature for human meaning construction. Words like decorator, ruin and wall give rise to rich experiences, which are analog in nature: they relate to entities which we have directly experienced and about which we retain detailed knowledge. Accordingly, I refer to knowledge of that sort as analog concepts—concepts that are directly grounded in the experiences that give rise to them. How then does semantic structure (in language) differ from this level of conceptual structure—which is to say, from analog concepts?

To illustrate, consider the use of the adjective red, and the noun redness:


In both instances, the same perceptual hue is evoked, caused by the toxin we attribute to bee stings. But the simulation associated with the sentences is slightly distinct. In the first example, red, an adjective, gives rise to an interpretation in which the person's hand has the property of being red, a consequence of the bee sting. But in the second, the bee sting causes a particular ailment, deriving from the use of the noun redness.

As we have seen, a noun encodes a semantic unit: THING; this is what I refer to as a semantic parameter, a schematic semantic 'atom' of meaning, one specialized for being encoded in language (Evans, 2015b,c). In contrast, an adjective encodes the parameter PROPERTY (OF A THING). The consequence of the grammatical categories noun vs. adjective encoding distinct parameters is that the way in which the conceptual structure (the mental representation of red that resides in the conceptual system) becomes activated is nuanced by the language-specific representations (the parameters) encoded by grammatical structure. In short, the interpretations deriving from the examples in (6) diverge in subtle, albeit important, ways. The interpretation arising from (6a), that the perceptual hue arises due to a skin property, is due to the use of the adjective. In contrast, the interpretation in (6b), with a divergent simulation, that of a skin ailment, is a consequence of the use of the noun. Put another way, language provides a level of knowledge that is more schematic (I use the term parametric) than the rich, analog concepts available from the conceptual system. And these semantic parameters, specific to language, I term parametric concepts.

My proposal is that analog concepts, which are semantic representations that populate the conceptual system, had, in evolutionary terms, to precede the existence of language. Parametric concepts constitute a species of concept that arose as a consequence of the emergence of language. They provide a level of schematic representation directly encoded by language: parametric concepts guide how analog concepts are activated and, consequently, how simulations are constructed in the service of linguistically mediated meaning construction. For instance, the forms red and redness both index the same perceptual state(s). But they package the conceptual content in a different way, giving rise to distinct simulations: redness = ailment; red = property of skin. The schematic parametric concepts, which is to say, that part of semantic representation that is native to language, relate to THING vs. PROPERTY. Parametric concepts are language-specific affordances, rather than affordances of the conceptual system.

# THE BIFURCATION IN LINGUISTIC ORGANIZATION DESIGN FEATURE

While I've presented a proposal that there is a distinction between two semantic representational systems (the conceptual vs. the linguistic), which are each populated by qualitatively distinct representations (analog vs. parametric), the second design feature for linguistically mediated meaning construction relates to language itself. Language exhibits a bifurcation in terms of the nature of the linguistic symbols that populate it: the design feature of a bifurcation in linguistic organization. And this design feature is fundamental in terms of enabling language to engage with the representations in the conceptual system, and hence, in terms of guiding the parcellation of analog knowledge in meaning construction.

# Two Types of Symbolic Reference

Language appears to employ two qualitatively distinct types of symbolic reference (Evans, 2015b). The first constitutes what I dub a words-to-world direction of symbolic reference: the type of symbolic reference which de Saussure (1916) largely focused on. In this type, signs are conventionally associated with specific objects and events in the world, and/or in the mind of the language user. The symbolic relation holds between a referential vehicle from the linguistic system, and an entity or idea outside the system. For instance, the English word /dɒg/ refers to the pet of choice for many western households, as represented in **Figure 1**.

**Figure 1** captures this type of relation. It shows that a given sign—sign1, sign2, and so forth—is symbolically related to objects and events in the world and/or the mind. The symbolic relation, established by convention, is represented by the directed arrow, connecting a particular sign with its referential target.

Importantly, the nature of the referential target constitutes a potentially large body of knowledge that you and I may have concerning dogs, knowledge which is dynamically updated: each time you step outside your front door, and see a dog across the street, your knowledge is updated. The symbol refers not just to specific exemplars of dogs, for instance, the Welsh Corgi depicted in **Figure 1**, but to other breeds too. It may also include a wide range of knowledge you possess concerning dogs, including their behavior and life cycle, their appearance, their status in human life and culture, as well as a plethora of information you'll have gleaned through direct experience with dogs, including dogs you may have known, as well as information derived through cultural transmission. Hence, the referential target of a sign in fact relates to a complex web of knowledge, what I term the semantic potential of the target, developed in my theory of Access Semantics (Evans, 2009, 2013, 2015b).

The second symbolic reference strategy involves what I dub a words-to-words direction of symbolic reference (**Figure 2**). Here, the symbolic relation holds not between a sign, and an entity in the world and/or the mind. Instead, reference holds between one linguistic symbol and another. To illustrate, consider the following referring expression:

#### (7) a dog

While the noun phrase (NP), a dog, as a whole, refers in a words-to-world direction, the indefinite article refers to the noun (N), dog: it has a words-to-words direction of reference. Indeed, the semantic function of the indefinite article is specialized for words-to-words reference: whatever it is that the symbol, dog, refers to, the indefinite article tells us that the sign to which it refers, in this case, dog, is both univalent (there's just one of it) and non-specific (the hearer can't be expected to have specific information about the entity); it is for this reason that the symbol a is termed the 'indefinite' article.

One way of thinking about the indefinite article is that, in part, it encodes a schematic slot, what has been termed an elaboration site (Langacker, 1987), which is completed by a noun. In short, the English indefinite article requires a noun to elaborate it, and hence to complete its meaning. Notice that while the overall function of the referring expression, a dog, is to identify an individual entity in the world (a words-to-world direction of reference), the English symbol a is specialized for a words-to-words direction: it assumes a distinction in lexical classes, such as noun vs. indefinite article.

Now consider a more complex example of words-to-words symbolic reference, focusing on the noun aim, in the following attested example:

(8) The Government's **aim** is to make GPs more financially accountable, in charge of their own budgets, as well as to extend the choice of the patient. (Schmid, 2000).

In (8), aim can be thought of as a shell noun (Schmid, 2000): it refers to the entire conceptual complex that I've underlined. The underlined portion of the discourse chunk, whilst, on the face of it, relating to a complex set of ideas, is encapsulated as a coherent conceptual whole. Importantly, this is achieved via words-to-words symbolic reference: the noun, aim, provides a linguistic "shell," enabling reference to the complex idea that it points to. Evidence for this function comes from the next sentence in the discourse, which I present below:

(9) The Government's **aim** is to make GPs more financially accountable, in charge of their own budgets, as well as to extend the choice of the patient. Under **this new scheme**, family doctors are required to produce annual reports for their patients. . .(Schmid, 2000).

Having established a shell noun complex (the underlined portion) by virtue of a referring shell noun, aim, it is then possible to continue treating the complex as a single coherent conceptual entity in ongoing discourse. Evidence for this comes from the new shell NP, this new scheme, which, again, I've highlighted in (9). This shell NP refers back to the underlined shell noun complex, established by the symbol aim, in the first sentence of the discourse chunk. In short, both aim and this new scheme refer symbolically in a words-to-words fashion, providing a means of packaging a complex idea (a shorthand mnemonic) without the need to continue to spell out the entire idea itself.

Language, then, appears to make use of both words-to-world and words-to-words types of symbolic reference. **Figure 3** captures both directions of symbolic reference.

These two types of symbolic reference, exploited by language, are qualitatively different: words-to-words symbolic reference is more abstract than words-to-world symbolic reference. This follows as reference in this direction is to another symbol, rather than to an idea or entity in the world (or mind) per se. It presumes the existence of a linguistic level of semantic representation which can be referred to, independently of entities in the world. Moreover, this distinction reflects a fundamental design feature of language: the distinction between a lexical system and a grammatical system.


# How do the Two Reference Strategies Contribute to Meaning Construction?

The words-to-world referential strategy constitutes our ability to use linguistic symbols to cue or activate representations in the conceptual system. In contrast, the words-to-words strategy constitutes the means, afforded by the linguistic system, to construct the semantic scaffolding for a simulation. The semantic scaffolding enables the relevant part of the conceptual system's vast semantic potential to become activated during meaning construction. In short, words-to-words reference provides the basis for words-to-world reference to narrow in on just that aspect of the conceptual system that is relevant for linguistically situated meaning construction, as when the word red, in (1) enabled activation of two distinct perceptual hues. To illustrate the way in which this works, consider the following linguistic example:

(10) A waiter served the customers.<sup>1</sup>

This sentence features words, and other linguistic constructions, that serve two distinct reference strategies. Let's consider the words-to-world strategy first. This strategy equates, in linguistic terms, with the so-called content words in the sentence. I've highlighted these in bold in (11):

#### (11) A **waiter serve**d the **customer**s.

A content word, as discussed earlier, is usually taken to be a word that concerns rich content. In (11) these are the nouns waiter and customer, and the verb serve. These words relate to relatively rich aspects of the scene being described, in particular, the participants in the scene and the relationship that holds between them. Moreover, we are able to map these words onto rich and detailed scenarios, stored in the conceptual system, relating to our experience of interacting in the world. We each have rich and varied experiences of restaurants, eateries and other venues that sell food for consumption in situ, including the format and moves involved in such service encounters. We know that a waiter is someone who liaises with the customer on choice of food, and with the kitchen where the food is prepared. The waiter's function is to communicate with both parties, and to deliver the food, once prepared, to be consumed by the customer, in return for pay, and often, for a tip. In short, these content words encode a words-to-world relation: they enable language users to map the words onto specific participants and the relations holding between them; in slightly different terms, they facilitate access to the rich analog knowledge that resides in the conceptual system: knowledge we have about a restaurant frame.

In contrast, the sentence also consists of function words and grammatical constructions (within which the content and function elements are embedded). I've placed the function elements in bold in (12):

(12) **A** waiter serve**d the** customer**s**.

Function words encompass those schematic notions which, in the most simplistic of terms, aren't imageable. For instance, while we can call to mind, should we wish, a waiter or a customer, or imagine what is entailed by a waiter serving a customer, it's not clear what is called to mind by grammatical words such as a, or the, the past tense marker -ed, or the bound plural morpheme -s. These elements, on their own, are not specialized for indexing particular entities in the world per se. Rather, their function is to say something about how we should interpret the other words in the sentence that they relate to. For instance, the past tense marker constrains our interpretation of the verb serve: it situates the serving event as having taken place before now. In this way, the past tense marker guides how the knowledge to which serve facilitates access, whatever that knowledge may be in our conceptual system, becomes activated. Similarly, the plural marker provides a means of interpreting the free morpheme, the noun customer, to which it is morphologically bound.

One line of evidence for distinguishing between content and function words, between words-to-world and words-to-words reference, takes the following form: if we change the content words, we obtain a different scene, yet the structural elements, provided by words-to-words reference, remain the same. Consider the following:

(13) A rockstar smashed the guitars.

In (13), when changing just the three content words, an entirely different experiential complex (a simulation) arises, one involving a rockstar smashing guitars. This reveals that the function of words-to-world reference concerns people, things, events, properties of things and events, and so on. But the semantic scaffolding remains the same, as the words-to-words relations are unchanged: a, -d, the and -s. These aspects of the sentence concern whether the participants (rockstar/guitars) evoked can be easily identified by the interlocutor (the use of the indefinite article a vs. the definite article the), that the event took place before now (the use of the past tense marker, -d), and how many participants were involved (the presence, or absence, of the plural marker -s).
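The observation that (10) and (13) share a scaffolding can be pictured as a fixed frame with open slots. The sketch below is a deliberately crude illustration under stated assumptions: the `SCAFFOLD` string, the `past_tense` spelling rule (regular verbs only) and the `build_sentence` helper are hypothetical names introduced here, and the frame ignores details such as a vs. an.

```python
# Hypothetical frame holding only the closed-class material of (10)/(13):
# indefinite article, past tense, definite article, plural marker.
SCAFFOLD = "A {agent} {action} the {patient}s."


def past_tense(stem):
    """Toy spelling rule for regular verbs: -d after a final e, otherwise -ed."""
    return stem + ("d" if stem.endswith("e") else "ed")


def build_sentence(agent, action, patient):
    """Fill the open-class slots; the function elements stay untouched."""
    return SCAFFOLD.format(
        agent=agent, action=past_tense(action), patient=patient
    )


for content_words in [("waiter", "serve", "customer"),
                      ("rockstar", "smash", "guitar")]:
    print(build_sentence(*content_words))
# prints: A waiter served the customers.
# prints: A rockstar smashed the guitars.
```

Swapping the three content words changes the scene that is simulated, while the frame, and so the information about identifiability, time and number, stays constant.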

Moreover, the semantic scaffolding provided by words-to-words reference encompasses more than just the function words. It also includes the full range of grammatical constructions in which the content words participate. This includes the lexical class in which words participate: waiter and customer are nouns, while serve is a verb, as well as word order: in these example sentences, we have a declarative word order. And finally, the sentences all invoke active, rather than passive voice. In each case, these grammatical constructions (lexical class, word order, and voice) all facilitate a words-to-words referential strategy: they constrain how we should interpret the participants in the event, and the nature of the relationship holding between them.

Let's focus on lexical class first. Consider the following expressions:


While the expressions in (14) involve the same phonological forms, lift and thumb belong to different lexical classes in each expression. In (14a) thumb is a verb, and lift is a noun. In contrast, in (14b) lift is a verb and thumb a noun. This follows because one of the things we happen to know about English is that the article typically precedes a noun. And on the basis of this distributional analysis, lift in (14a) and thumb in (14b) are nouns. Moreover, because we also know that verbs can serve an imperative function, especially when they appear in first position in an expression, thumb is a verb in (14a), while lift is a verb in (14b).

<sup>1</sup>Example based on Evans (2009, p. 102).

Now, the fact that the same phonological forms can shift their lexical class, as they do in these examples, reveals that the lexical classes, noun vs. verb, are functional categories independent of the phonological forms themselves. The categories noun vs. verb have functional significance independently of their lexical instantiations, and serve to constrain how we should interpret the phonological forms, and their referential targets, in each case. In (14a), consequently, the scene involves a hitch-hiking scenario, whilst in (14b) a different scenario is evoked, involving physical movement of someone's anatomy.
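The distributional reasoning just described (a word following an article is a noun; an expression-initial word can be an imperative verb) is simple enough to state as a procedure. The following is a minimal sketch only: the function name `tag_lexical_class` and the two input strings are assumptions made for illustration, reconstructed from the description of the lift/thumb expressions, and the two rules are far too crude to serve as a general tagger.

```python
ARTICLES = {"a", "an", "the"}


def tag_lexical_class(expression):
    """Assign a lexical class from position alone.

    Rule 1: a word directly after an article is tagged as a noun.
    Rule 2: a non-article word in first position is tagged as a verb
            (imperative reading).
    Anything else is left as 'unknown'.
    """
    tokens = expression.lower().split()
    tagged = []
    for i, token in enumerate(tokens):
        if token in ARTICLES:
            tag = "article"
        elif i > 0 and tokens[i - 1] in ARTICLES:
            tag = "noun"
        elif i == 0:
            tag = "verb"
        else:
            tag = "unknown"
        tagged.append((token, tag))
    return tagged


# Assumed inputs, for illustration only.
print(tag_lexical_class("thumb a lift"))
# prints: [('thumb', 'verb'), ('a', 'article'), ('lift', 'noun')]
print(tag_lexical_class("lift a thumb"))
# prints: [('lift', 'verb'), ('a', 'article'), ('thumb', 'noun')]
```

The procedure never inspects the phonological form of lift or thumb: the class assignment comes entirely from the surrounding grammatical configuration, which is the sense in which noun and verb are functional categories.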

Similarly, the declarative word order in (10) signals that the scenario being evoked is one that the speaker knows, or assumes to be true, and is presenting it as such to the interlocutor. If we alter the word order, by adding the function word did so that waiter is no longer the first element in the sentence, as in (15), we no longer have a declarative construction, but rather an interrogative. And now we have a different perspective on the scenario: the speaker is no longer presenting the scenario as fact, but, in fact, signaling that they don't know whether the scenario is true.

#### (15) Did a waiter serve the customers?

What this shows is that the declarative, and indeed, interrogative word orders in English constrain in rather important ways the way the information (the words-to-world strategy) is being packaged. Moreover, the ideational function and hence interactive-interpersonal function of both sentences is rather different: (15) invites a response in a way that (10) doesn't.

And finally, active voice designates a particular point of view, which constrains the nature of the relationship holding between the participants in a scene. In (10), the point of view is being designated as located with the agent, the waiter. If we change the grammatical construction to passive, as in (16), the point of view is now situated with the customers, even though the waiter remains the active participant (the agent) in the words-to-world relation designated:

(16) The customers were served by a waiter.

The upshot of all this is that while the content of the simulation is achieved by language working to provide a structure for analog concepts (the scaffolding upon which the scene is constructed), language both affects, and consequently transforms, non-linguistic content in significant ways; in short, the conceptual content is packaged, for communicative purposes in the course of linguistically mediated meaning construction, by virtue of language-specific representations. **Table 1** provides a summary of what is conveyed by function words for sentence (10), whilst **Table 2** provides a summary from the perspective of analog concepts, accessed by content words (see Evans, 2009).

# Linguistic Access to the Conceptual Meaning Potential

The lexical vs. grammatical subsystems can be analyzed in terms of words-to-world and words-to-words alignment, and in terms of analog vs. parametric knowledge. And this provides the critical design feature for meaning construction.

I have argued that analog knowledge does not in fact come from language: the distinction between lexicon and grammar provides a design feature for access to the conceptual system. Open-class words provide access, while the grammatical system facilitates the parcellation of analog knowledge to which open-class words facilitate access. In short, a class of lexical elements, which I have loosely referred to as content words, and in English associated most notably, but perhaps not exclusively, with the 'big four' (noun, verb, adjective and adverb), facilitates access to analog knowledge, to the conceptual system. On this Access Semantics account (aka The Theory of Lexical Concepts and Cognitive Models, or LCCM Theory, developed in Evans, 2009, 2013, 2015a), language has a ready-made means of facilitating access to a type of knowledge not present within the linguistic system. It can reuse existing knowledge, evolved for ends other than communication, for purposes of communication. The words-to-words function of the grammatical subsystem enables the parcellation of analog knowledge and hence a means of sophisticated meaning construction. As we've seen, knowledge of this type is schematic, providing a semantic scaffolding that nuances the analog information, giving rise to complex and subtle meaning, as in the distinction between a skin condition rather than unwanted colouration of the skin, as in the examples in (6).

#### TABLE 2 | Content deriving from words-to-words referring expressions.

On this account, a subset of linguistic symbols provides access to the conceptual system, in both words-to-world and words-to-words directions; red and redness provide both types of symbolic reference. The parcellation of knowledge associated with analog information is driven by the parametric content conventionally associated with these forms: whether the perceptual hue is interpreted as a property of an entity or an entity in its own right, reified independently of whatever it happens to be a property of.

# CONCLUSION

In this paper, I have examined proposals for two central design features of the human capacity for linguistically mediated meaning construction: a bifurcation in semantic representation, and a bifurcation in linguistic organization. The striking claim to emerge is that language is tightly coupled with non-linguistic representations in the conceptual system, which did not evolve for communication. But language has evolved in order to bootstrap these representations for linguistically mediated communication.

The over-arching design feature of the human meaning-making capacity amounts to two distinct representational systems: the conceptual system and the linguistic system. Each system contributes to meaning construction in qualitatively distinct ways. Second, given that the two systems are representational (they are populated by semantic representations), the nature and function of the representations are qualitatively different. After all, since a linguistic system has a different function vis-à-vis the conceptual system, which is of far greater evolutionary antiquity, the semantic representations are complementary and, as such, qualitatively different, reflecting the functional distinctions of the two systems in collectively giving rise to meaning.

And finally, language itself is adapted to the conceptual system (the semantic potential) that it marshals in the meaning construction process. Hence, a linguistic system itself exhibits a bifurcation, in terms of the symbolic resources at its disposal. This relates to two distinct reference strategies available to linguistic symbols: words-to-world reference and words-to-words reference. In slightly different terms, this design feature of language amounts to a distinction between a lexical subsystem, and a grammatical subsystem.

The overall conclusion to emerge from this discussion is the following. The ideational function of language (its communicative potential) is, in large measure, a function of the way in which it is adapted to, and interfaces with, the conceptual system. Rather than language being a distinct module or faculty of mind, it subserves meaning construction through a close and symbiotic relationship with the conceptual system: it has evolved and is designed to exploit those nonlinguistic representations for purposes of linguistically mediated communication. But to achieve this, it has evolved a means of words-to-words symbolic reference, a grammatical capacity which appears to be a species-specific trait. And it is the parametric knowledge units, associated with morphosyntax and lexical items, that enable our species to harness the otherwise mute semantic potential of concepts in order to convey meaning.

# AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Evans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


# An Evaluation of Universal Grammar and the Phonological Mind<sup>1</sup>

*Daniel L. Everett\**

*Department of Arts and Sciences, Bentley University, Waltham, MA, USA*

This paper argues against the hypothesis of a "phonological mind" advanced by Berent. It establishes that there is no evidence that phonology is innate and that, in fact, the simplest hypothesis seems to be that phonology is learned like other human abilities. Moreover, the paper fleshes out the original claim of Philip Lieberman that Universal Grammar predicts that not everyone should be able to learn every language, i.e., the opposite of what UG is normally thought to predict. The paper also underscores the problem that the absence of recursion in Pirahã represents for Universal Grammar proposals.

Keywords: phonology, recursion, universal grammar, linguistic universals, syntax

# INTRODUCTION: TWO CONCEPTIONS OF LANGUAGE

#### *Edited by:*

*N. J. Enfield, The University of Sydney, Australia*

#### *Reviewed by:*

*Sascha Sebastian Griffiths, Queen Mary University of London, UK Robert Van Valin Jr., Heinrich-Heine-Universität Düsseldorf, Germany*

> *\*Correspondence: Daniel L. Everett deverett@bentley.edu*

#### *Specialty section:*

*This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology*

*Received: 01 December 2015 Accepted: 06 January 2016 Published: 08 February 2016*

#### *Citation:*

*Everett DL (2016) An Evaluation of Universal Grammar and the Phonological Mind. Front. Psychol. 7:15. doi: 10.3389/fpsyg.2016.00015*

From Panini in India to Plato in Greece, scholars have for centuries studied human language to reveal the essence of human nature (cf. Everett, accepted, for recent arguments against the very idea of "human nature").<sup>2</sup> Simplifying somewhat, the modern study of language has investigated how languages diverge over time (diachronic linguistics). It examines the physical properties of speech sounds, borrowing from physiology and physics to understand how sounds are made, how they are transmitted across a medium, how they are heard, and what their articulatory and physical properties are both in isolation and in context (phonetics). This scientific tradition has also examined how larger spans of sounds are organized into a phonology (syllables, "feet," and so on). It also investigates word-formation (morphology), how sentences are put together (syntax), how stories are structured (discourse theory), what meaning is and how it interacts with language forms (semantics), and how language is shaped via the apex of human linguistic development: conversations (pragmatics). And it asks about the universality of its findings.

From this rich history of linguistic studies, we have reached a divide: some researchers believe that language structures emerge from universal principles of grammar. In this tradition, as early as the 17th century, culminating in the Port-Royal grammar of 1660, linguists and philologists began to postulate a universal base for human languages. Such researchers made the case that all languages likely trace back to some original blueprint. Then in the latter half of the 20th century, Noam Chomsky took on the challenge of understanding and investigating this blueprint, by looking to biology as the source of grammar, proposing that all languages are simply local manifestations of a biologically transmitted Universal Grammar.

Other researchers believe that grammar is in essence a local phenomenon. This is the alternative I explore in what follows. In this view language and its components (grammar, phonetics, phonology, semantics, and so on) are perceived as local, cultural-communicational outputs, with little or no evidence for a genetic blueprint for grammar. I largely approach this issue negatively in what follows, arguing against UG proposals, from the perspective of Everett (2012a; see also Everett, accepted, in progress, among others). I argue that there is no evidence for UG, not even from the most articulated grammatical proposals in its favor to date, Hauser et al. (2002) and Berent (2013a), along the way indirectly supporting my own theory (see references) that language is a tool shaped by culture (among other things) for communication. This is by no means a novel position, though it is a path less followed. It represents in fact the traditional position of the most influential North American linguists of the early 20th century, Franz Boas and Edward Sapir. Much of what follows draws heavily from arguments and texts provided in Everett (accepted), updated where appropriate.<sup>3</sup>

<sup>1</sup>This paper is excerpted in large part from Everett (accepted) forthcoming. The section against the phonological mind, for example, is almost verbatim from that larger work.

<sup>2</sup>Of course, the idea that there is no human nature is not original with me. See, e.g., Marx's (1845/1976) "Thesen über Feuerbach."

## "MAN-IN-A-CAN" VIEW OF LANGUAGE

The modern idea of Universal Grammar (UG) emerged from Chomsky's work. The basic thesis of UG is that there is something about the genetic component of human nature that guarantees that there will be a core of "knowledge" common to all humans. If so, then languages are essentially the same and only superficially different.<sup>4</sup>

Yet an often overlooked genetic criticism of UG, raised by Lieberman (2013, p. 56ff), is that UG predicts the opposite of what it is claimed to predict. UG was proposed to account for a hypothesized (never demonstrated) acquisitional homogeneity, which children across cultures are said to achieve for their native languages, as well as for the claim that all languages are built on the same grammatical plan, a plan located somehow, somewhere in the human genome (in a way that has never been specified in the literature). As Lieberman points out, however, if language were actually specified on the genes, it would be subject to mutations, presenting a non-trivial problem for UG.

To take a concrete example, consider one commonly assumed "parameter" of UG, the so-called "pro-drop" parameter – intended to account for the ability of speakers of a language to omit overt subjects from sentences. Thus in Portuguese, a pro-drop language (whether the parameter is fixed by a single gene, which is unlikely, or by relationships among multiple genes), one can utter "Está chovendo," while in English the literal translation *Is raining* is ungrammatical. Instead, English speakers must say "It is raining," for the reason that English apparently requires subjects and thus lacks "pro-drop."<sup>5</sup> The question that arises is whether it is possible for there to be a mutation that would prevent a particular person from learning a pro-drop language. Such a mutation might subsequently spread through a population via genetic drift or some such, though that is not crucial. We need only find a single individual who cannot learn English-like or Portuguese-like languages, with no other cognitive deficit (simplifying, and assuming that pro-drop is a genetically based parameter). This is a valid question to put to any nativist theory. In fact, rather than view this negatively, it can be seen positively, as a strong prediction by the theory of UG.<sup>6</sup> It would strongly support UG to find an individual or a population whose only "cognitive quirk" were the inability to learn pro-drop.<sup>7</sup>

A UG-proponent might rebut this argument, however, by claiming that the "language instinct" is an organ and is no more subject to mutation than arms, legs, livers, hearts, etc. But all *are* subject to mutations. There are many genetic disorders of the body and brain (e.g., sickle-cell anemia, dwarfism, autism, and so on). Such disorders are usually fatal or produce reductions in offspring and thus are not selected for. Sickle-cell anemia, for example, shortens the lives of carriers relative to healthy people (not good) but lengthens it relative to people stricken with malaria (good). It spreads through a population in spite of the unpleasant end it brings to its hosts, because it nonetheless provides local advantages. In language a local advantage might be to learn one's parents' language more quickly, even at the expense of being able to learn other languages. There are indeed mutations responsible for people being born with different genes for body shape, etc. In other words, and quite ironically, if grammar is carried on the genes, then the strongest evidence for UG would be the discovery that *not all people may be able to learn every language.*<sup>8</sup>

There are claims for some mutations in language by proponents of UG, but these are not the same. For example, consider the following quote from Reboul (2012, p. 312):

*"I would like to end this paper by discussing one of Everett's claims regarding the non-biological nature of language... if language is biological, one would expect to find "culture-gene mutations affecting specific languages of the world" (Everett, 2012a, p. 42) and these do not exist. In fact, recent findings (see Dediu and Ladd, 2007; Nettle, 2007) suggest that such mutations exist. Dediu and Ladd established a strong correlation between (geographically dispersed) tone-languages and allele frequencies for two genes (ASPM and Microcephalin) in the populations speaking those languages as compared with speakers of non-tonal languages. The interpretation is that these specific alleles would facilitate the learning of tonal languages through better acoustic discrimination."*

The idea that language (I-language, grammar, etc.) is carried by the genes definitely predicts that it is subject to mutations. This is not an argument – it is an *entailment* of nativist theory. And cultures, as Everett (2012a) points out, provide one source of selectional pressure. What counts against nativism of the Chomskyan variety is the clear failure of this prediction. Moreover, the "counterexample" to my claim that Reboul provides merely strengthens my case.

<sup>3</sup>There are many other sources, however. Evans (2014) is another important source, one of the very best, of arguments against Universal Grammar. But see also Sampson (1999) and Tomasello (2005), among others.

<sup>4</sup>Often UG is confused in the popular media with "Deep Structure," perhaps owing in part to the work on the "Universal Base Hypothesis" of the Generative Semanticists, the forefathers of functionalism. Though many linguists laugh at this confusion, it is somewhat understandable.

<sup>5</sup>This is not entirely true, of course. One can say, while dying in a movie, for example (in the indicative mood), "Must save Susan" or "Feel no hunger," etc.

<sup>6</sup>Note that specific theories which assume UG make many predictions, though none that UG itself is causally implicated in so far as I can tell.

<sup>7</sup>But, in fact, there is not a shred of evidence for this. In fact, there is no evidence at all for any language-specific cognitive deficit (not even the Whorfian-labeled "Specific Language Impairment," Everett, in progress).

<sup>8</sup>Moreover, given what we know about "dual inheritance theory" and other examples of quick genetic changes due to cultural pressures, e.g., lactose tolerance in populations where milk is a part of the diet beyond infancy – a mutation that spread within the past 6,000 years, well within the time frame of, say, pro-drop (which goes back to Indo-European), there is nothing implausible about quick changes.

The claim that Reboul is supposed to be criticizing is the idea that one population could, through selection of some genetic features of language, be unable to learn the language of another population. The findings of Dediu and Ladd, if true (and I doubt it), far from falsifying my claim, support it. This is because they show that evolution can enhance perception by human populations of the phonological/phonetic forms that they commonly use. Their results apparently show that some populations speaking tonal languages become better at perceiving tones than others. But this contradicts my claim not at all, because *all* languages use pitch. Therefore, this enhancement would benefit all speakers of all populations and could not become the basis for one population losing the ability to learn the language of another. However, this gets us back to our original question. If this tonal restriction were indeed an example of a cultural (speaking a tone language) pressure affecting one's genes, then the absence of the opposite effect, the principal prediction of Chomskyan Universal Grammar – a genetic mutation that would render one population unable to learn the grammar of another – becomes even more mysterious.

Unfortunately, the most serious problem for UG is that as the years have passed, it has reached the point that it is vague and it makes no predictions about language proper – it is disconnected from empirical content. For example in response to a now famous paper by Evans and Levinson (2009), "The Myth of Language Universals," the UG community objected to the idea that UG predicts universals in Evans and Levinson's "naive" sense. Critics claimed that Evans and Levinson confused UG with Greenbergian universals (as discussed below).

To give a closer-to-home, concrete illustration of a lack of empirical constraints on the content of Chomskyan linguistics, let's look at the so-called Pirahã recursion debate. I have in past publications (see especially Everett, 2005, Everett, 2012a,b) criticized Noam Chomsky's claim that all languages are built on a recursive grammatical procedure he calls "Merge," defined as in (1):

(1) Merge (α, β) → {α, {α, β}}.

If α is a verb, e.g., 'eat' and β a noun, e.g., 'eggs,' then this will produce a verb phrase (i.e., where alpha is the head of the phrase), 'eat eggs.' As I said in Everett (2012b), "The operation Merge incorporates two highly theory-internal assumptions that have been seriously challenged in recent literature (see Everett, accepted, in progress). The first is that all grammatical structures are binary branching, since Merge can only produce such outputs. The second is that Merge requires that all syntactic structures be endocentric (i.e., headed by a unit of the same category as the containing structure, e.g., a noun heading a noun phrase, a verb a verb phrase, etc.)."
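The definition in (1) can be made concrete with a minimal sketch. It assumes Python `frozenset` objects as a stand-in for the unordered sets in the definition; the function name `merge` and the auxiliary 'will' in the second step are illustrative choices, not part of any cited formalism.

```python
def merge(alpha, beta):
    """Merge(alpha, beta) -> {alpha, {alpha, beta}}; alpha labels (heads) the result."""
    return frozenset({alpha, frozenset({alpha, beta})})

# 'eat' (verb) merged with 'eggs' (noun) gives a verb-headed object, 'eat eggs'.
vp = merge("eat", "eggs")

# The output of Merge can itself be an input to Merge. This is what makes the
# procedure recursive, and each step combines exactly two objects (binary branching).
# The word 'will' is a hypothetical second head, used only for illustration.
bigger = merge("will", vp)
```

Because every output contains its head plus one two-membered set, the sketch also makes the two assumptions quoted above visible: the result is always binary and always labeled by one of its own parts.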

My criticism is based on the fact that the Amazonian language, Pirahã, among others (see Kornai, 2014; Jackendoff and Wittenberg, in preparation), lacks recursive structures (Everett, 2005, Everett, 2012b; Futrell et al., in preparation) – and thus, a fortiori, Merge. My claim is that the absence of recursion is the result of cultural values, rather than a culture-independent grammar. One of the most common objections raised to this criticism of Chomskyan theory is that the superficial appearance of lacking recursion in a language does not necessarily mean that the language could not be derived from a recursive process like Merge. There are ways to rescue the theory. And of course this is correct.

From this latter observation, some conclude that the (misguided in their perspective) suggestion that Pirahã represents a problem for Chomskyan theory is due to the failure to distinguish between Greenbergian and Chomskyan universals. Greenbergian universals (Greenberg, 1966) have always referred to linguistic phenomena that can actually be observed (and thus easily falsified). These claims are tightly constrained empirically.

On the other hand, Chomskyan universals are quite different because they are never directly observable. Chomsky's concept of universals is that they are restrictions on language development, not necessarily observed directly in actual surface structures of languages. Formal universals are grammatical principles or processes or constraints common to all languages – that is, supposedly following from UG – at some level of abstraction from the observable data. These abstractions can only be appreciated, it seems, by the appropriate theoretician. Unfortunately, this makes formal universals difficult to falsify because they can always be saved by abstract, unseen principles or entities, e.g., so-called "empty categories."

In this sense, the Chomskyan claim regarding recursion (Hauser et al., 2002) would be that all languages are formed by a recursive process, even though the superficial manifestation of that process may not look recursive to the untrained eye. A language without Merge would lack utterances of more than two words according to Chomsky (by this strange reasoning, all utterances greater than three words would support Chomsky, a rather low threshold of evidence). So long as we can say that a sentence is the output of Merge, limited in some way, then it was produced recursively, even though superficially non-recursive. The Greenbergian perspective, on the other hand, would be that either you see recursion or it is not there.

Both positions are completely rational and sensible. But, as I have said, the Chomskyan view renders the specific claim that all languages are formed by Merge untestable. In Chomsky's earlier writings he claimed that if two grammars produce the same surface strings (weak generative capacity), we could still test them by examining the structures they assign to those strings (strong generative capacity). Since my work on Pirahã recursion (as well as Wari'; Everett and Kern, 1997; Everett, 2009) has shown that the predictions Merge makes are problematic (falsified if that were possible with such abstract universals), I have dealt exclusively with strong generative capacity. On the other hand, a linguist could add ancillary hypotheses to their accounts in order to save Merge, entailing two consequences: (i) Merge loses all predictive power and (ii) Merge provides a longer, hence less parsimonious, account of the same structures (Everett, 2012b).

Nativism, again, is the idea not only that we are innately capable of language (everyone surely believes this), but that our capabilities are specific to particular domains, e.g., grammar. Now veterinarians who artificially inseminate animals, such as thoroughbreds or other competitive breeds, occasionally refer to their metal-encased syringes of semen as "man-in-a-can." This is a good metaphor for some theories like UG, which place the development of human abilities in the genes rather than the environment, i.e., those that lean strongly to the nature side of the nature-nurture continuum, predicting that all languages emerge from the same biological can.

Though I have argued (Everett, 2012a) that there is no convincing evidence for UG from universals, acquisition, or language deficits, some have countered such arguments by claiming that "emerging" languages (creoles, Nicaraguan Sign Language, Homesigns, and so on) manifest UG principles that could not have been learned. Everett (2012a, 2015) argues that they show nothing of the sort.

Stepping back a bit, it is clear that all creatures have instincts or innate capacities. Even so, the evidence presented for such capacities is often weak. This is particularly true for claims on cognitive nativism. In fact, if Everett (accepted) is correct, then higher-level cognitive capacities in *Homo sapiens* are the least likely places to find instincts. If one is claiming that a cognitive characteristic is innate or an instinct, they must do the following at a minimum:


If you can't meet these minimal requirements, talk of instincts, UG, nativism, etc. is premature. Yet because almost no claim for instincts gets beyond 1, as Blumberg (2006, p. 205) says, such talk is "bedtime stories" for adults (see also: http://www.pointofinquiry.org/mark\_blumberg\_freaks\_of\_nature/).

What does this mean? It means that if you see claims for a morality instinct, an art instinct, a language instinct, etc. you are reading nothing more than speculation, unless it gets significantly beyond level 1 above. I am not aware of any that do.

To offer a more detailed example of the shortcomings of UG proposals, let's consider the research program developed by Iris Berent on "the phonological mind." My theory of "dark matter" (Everett, accepted) implies that instincts should be minimized in *Homo sapiens*. This is not because instincts are incompatible with culture or dark matter as defined in Everett (accepted). Rather, they simply become less relevant to our understanding. If humans learn from and participate in their surroundings and language, then it turns out that instincts become less compelling (Prinz, 2012, 2013). Of course, the concept of instincts is common enough in the literature on animal behavior, in Evolutionary Psychology, as well as in Chomskyan linguistics. At the same time, everyone agrees general learning is responsible for at least some of how people come to learn about the world, their society, and themselves. My claim (Everett, accepted) is that, given our capacity for general learning, instincts complicate the picture of human development, going against the inherent cognitive and cerebral plasticity of the species. In my view, appeal to epistemological nativism should be excised by Occam's Razor.

In what follows, I want to give a concrete example of what I mean by discussing and rejecting recent work on phonological nativism (Berent, 2013a). To anticipate somewhat, the problems faced by all nativist proposals include the following: (i) the non-linear relationship of genotype to phenotype; (ii) failure to link "instincts" to environment – today's instincts are often tomorrow's learning, once we learn more about the environmental pressures to acquire certain knowledge; (iii) problematic definitions of innateness; (iv) failure to rule out learning before proposing an instinct; (v) the unclear content of what is left over for instincts after acquired dark matter (all tacit knowledge) is accounted for; (vi) lack of an evolutionary account for the origin of the instincts.

In Berent's (2013a) *The Phonological Mind*, the author argues in detail for apparently innate preferences for some sounds and sound sequences (and signs and sign sequences) in all languages. I want to briefly review the more detailed criticisms of Everett (accepted) of her proposals, limited to a small portion of her monograph.<sup>9</sup> From the outset we should observe that the most serious shortcoming of her notion of innate phonological knowledge, in fact a problem for all nativist theories, is the "origin problem." The question needing to be answered is "Where did the phonological knowledge come from?" Without an account of the *evolution of an instinct*, proposing nativist hypotheses is pure speculation. Rather, at best, we can take nonevolutionary evidence for an instinct as explananda rather than explanans. Berent's specific claim is that her experimental results from English, Spanish, French, and Korean support her proposal that there is knowledge of some type that leads to a sonority sequencing generalization (SSG) inborn in all *Homo sapiens*.

To understand her arguments, however, we must first understand the terms she uses, beginning with "sonority." Sonority is the property of one sound being inherently louder than another sound. For example, when the vowel [a] is produced in any language the mouth is open wider than for other vowels and, like other vowels, [a] offers very little impedance to the flow of air out of our lungs and mouths. This makes [a] the loudest sound relatively speaking of all phonemes of English. A sound with less inherent loudness, e.g., [k] is said to be less sonorous. Several of Berent's experiments demonstrate that speakers of all the languages she tested, children and adults,

<sup>9</sup>Much of the following is taken from Everett (accepted).

Since [a] is the most sonorous element, it is in the nucleus position. [s] and [t] are at the margins, onset and coda, as they should be. Now take the hypothetical syllables, [bli] and [lbi].

Both [bli] and [lbi] have what phonologists refer to as "complex onsets," multiple phonemes in a single onset (the same can happen with codas, as with "pant," in which [n] and [t] form a complex coda). Now, according to the SSG, since [b] is less sonorous than [l], it should come first in the onset. This means that [bli] is organized as a well-formed syllable should be, i.e., from the least sonorous segment to the most sonorous, [i], and then, if there were a coda, to a segment less sonorous than [i] (softer → louder). Therefore, the correct syllabic organization is shown in the following diagram (**Figure 2**).

Such preferences emerge even when the speakers' native languages otherwise allow grammatical strings which appear to violate the SSG. Since the SSG is so important to the work on a phonological instinct, we need to take a closer look at it. To make it concrete, let's consider one proposal regarding the so-called sonority hierarchy (as we will see, not only do many phoneticians consider this hierarchy to be a spurious observation, but it is also inadequate to account for many phonotactic generalizations, suggesting that not sonority but some other principle is behind Berent's experimental results).<sup>10</sup> One form of this hierarchy comes from Selkirk (1984; from most sonorant on left to least on right):

<sup>10</sup>Sonority is a formal property of sounds in which it is easier to produce "spontaneous voicing (vibration of the vocal folds while producing the sound)," though the lay person can refer to sonority as relative loudness with little loss of accuracy.

[a] > [e o] > [i u] > [r] > [l] > [m n N] > [z v ð] > [s f θ] > [b d g] > [p t k].

The hierarchy has often been proposed as the basis for the SSG, which might also be thought of as organizing syllables left to right into a crescendo, peak, and decrescendo of sonority, going from the least sonorant (least inherently loud) to the most sonorant (most inherently loud) and back down, in inverse order, to the least sonorant (in fact, I was once a proponent of the SSG myself; see Everett, 1995, for a sustained attempt to demonstrate the efficacy of this hierarchy in organizing Banawá syllable structure).
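To make the crescendo-peak-decrescendo idea concrete, here is a minimal sketch of an SSG check built on the hierarchy just given. Several things are assumptions of the sketch rather than claims from the literature: each character is treated as one segment, the numeric ranks are arbitrary apart from their order, and the rise and fall are required to be strict (published versions of the SSG differ on whether plateaus are tolerated).

```python
# Sonority classes from most to least sonorous, in the order of the hierarchy above.
CLASSES = ["a", "eo", "iu", "r", "l", "mnN", "zvð", "sfθ", "bdg", "ptk"]

# Map each segment to a rank; a higher number means more sonorous.
SONORITY = {seg: len(CLASSES) - i for i, cls in enumerate(CLASSES) for seg in cls}

def respects_ssg(syllable):
    """True if sonority strictly rises to a single peak and strictly falls after it."""
    values = [SONORITY[seg] for seg in syllable]
    peak = values.index(max(values))
    rising = all(x < y for x, y in zip(values[:peak], values[1:peak + 1]))
    falling = all(x > y for x, y in zip(values[peak:], values[peak + 1:]))
    return rising and falling

print(respects_ssg("bli"))  # onset [b] < [l] < nucleus [i]
print(respects_ssg("lbi"))  # onset falls from [l] to [b] before the nucleus
```

Under these assumptions the check accepts [bli] and rejects [lbi], which is the contrast the hypothetical syllables above were chosen to illustrate.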

Without reviewing all of her experimental results (which all roughly show the same thing – preference in subjects for the SSG in some conditions), consider the following evidence that Berent (2013b, p. 322) brings to bear:

*"... Syllables with ill-formed onsets (e.g., lba) tend to be systematically misidentified (e.g., as leba)—the worse formed the syllable, the more likely the misidentification. Thus, misidentification is most likely in lba followed by bda, and is least likely in bna. Crucially, the sensitivity to syllable structure occurs even when such onsets are unattested in participants' languages, and it is evident in adults [64,67–70,73] and young children..."*

Again, as we have seen, a licit syllable should build from least sonorant to most sonorant and then back down to least sonorant, across its onset, nucleus, and coda. This means that while [a] is the ideal syllable nucleus for English, a voiceless stop like [p, t, k] would be the least desirable (though in many languages this hierarchy is violated regularly, e.g., Berber). Thus a syllable like [pap] would respect the hierarchy, but there should be no syllable like [opa] (though of course there is a perfectly fine *bisyllabic* German word *opa* "grandpa"). For the latter word, the SSG would only permit this to be syllabified as two syllables [o] and [pa] with each vowel its own syllable nucleus. This is because both [o] and [a] are more sonorous than [p] so [p] must be either the coda or the onset of a syllable in which one of these two vowels is the nucleus.<sup>11</sup> Moreover, according to the SSG, a syllable like [psap] should be favored over a syllable [spap]. This gets us to the obvious question: why is "misidentification" by Korean speakers least likely in *bna* (even though Korean itself lacks such sequences)? Because, according to Berent, all humans are born with an SSG instinct.

I do not think anything of the kind follows. To show this, I first want to argue that there is no SSG, period: not phonetically, grammatically, or even functionally. Second, I argue that even if we ignored the first argument, i.e., even if some other, better (though yet undiscovered) principle than the SSG were appealed to, the arguments for a phonology instinct do not go through. Third, I offer detailed objections to every conclusion she draws from her work, concluding that there is no such thing as the "phonological mind."

<sup>11</sup>For independent reasons – but reasons that once again show the inadequacy of the SSG – onsets are preferred to codas, thus favoring the syllabification of *o.pa* over *op.a*. The reason that a simple preference such as "prefer onsets" is a problem for the SSG is that the preference clearly shows that the SSG is unable to provide an adequate theory of syllabification (at least on its own).

Let's address first the reasons behind the claim that the SSG is not an explanation for phonotactics. The reasons are three: (i) there is no phonetic or functional basis for the generalization; (ii) the SSG that Berent appeals to is too weak – it fails to capture important, near-universal phonotactic generalizations; (iii) the generalization is too strong – it rules out commonly observed patterns in natural languages, e.g., English, that violate it. But then if the SSG has no empirical basis in phonetics or phonology and is simply a spurious observation, it is unavailable for grammaticalization and therefore cannot serve as the basis for the evolution of an instinct (though, of course, some other concept or principle might be). One might reply that if the SSG is unable to explain all phonotactic constraints, that doesn't mean that we should throw it out. Perhaps we can simply supplement the SSG with other principles. But why accept a disjointed set of "principles" to account for something that may have an easier account based more solidly in phonetics and perception? Before we can see this, though, let's look at the SSG in more detail.

The ideas of sonority and sonority sequencing have been around for centuries. Ohala (1992) claims that the first reference to a sonority hierarchy was in 1765. Certainly there are references to this in the nineteenth and early twentieth centuries. As Ohala observes, however, references to the SSG as an explanation for syllable structure are circular, descriptively inadequate, and not well-integrated with other phonetic and phonological phenomena.

According to Ohala, both the SSG and the syllable itself are theoretical constructs that lack universal acceptance. There is certainly no complete phonetic understanding of either, a fact that facilitates circularity in discussing them. If we take a sequence such as *alba*, most phonologists would argue that the word has two syllables, and that the syllable boundary must fall between /l/ and /b/, because the syllable break a.lba would produce the syllable [a], which is fine, but also the syllable [lba] which violates the SSG ([l] is more sonorous than [b] and thus should be closer to the nucleus than [b]). On the other hand, if the syllable boundary is al.ba, then both syllables respect the SSG, [al] because [a] is a valid nucleus and [l] a valid coda and [ba] because [b] is a valid onset and [a] is a valid nucleus. The fact that [l] and [b] are in separate syllables by this analysis means that there is no SSG violation, which there was in [a.lba]. Therefore, SSG guides the parsing (analysis) of syllables. However, this is severely circular if the sequences parsed by the SSG then are used again as evidence for the SSG.

The SSG is also descriptively inadequate because it is at once too weak and too strong. For example, most languages strongly disprefer sequences such as /ji/, /wu/, and so on, or, as Ohala (1992, p. 321) puts it "... offglides with lowered F2 and F3 are disfavored after consonants with lowered F2 and F3."<sup>12,13</sup> Ohala's generalization here is vital for phonotactics crosslinguistically and yet it falls outside the SSG, since the SSG allows all such sequences. This means that if a single generalization or principle, of the type Ohala explores in his article, can be found that accounts for the SSG's empirical range plus these other data, it is to be preferred. Moreover, the SSG would then hardly be the basis for an instinct and Berent's experiments would be merely skirting the edges of the real generalization. As we see, this is indeed what seems to be happening in her work. The SSG simply has no way of allowing a *dw* sequence, as in *dwarf* or *tw* in *twin* while prohibiting *bw*. Yet [dw] and [tw] are much more common than [bw], according to Ohala (though this sequence is observed in some loanwords, e.g., *bwana*), facts entirely missed by the SSG.

Unfortunately, Berent neither notices the problem that such sequences raise for the SSG "instinct" nor does she experimentally test the SSG based on a firm understanding of the relevant phonetics. Rather, she assumes that since the SSG is "grammaticalized" and now an instinct the phonetics no longer matter. But this is entirely circular. Here, the lack of phonetic experience and background in phonological analysis seem to have led to hasty acceptance of the SSG, based on the work of a few phonologists, without careful investigation of its empirical adequacy. This is a crucial shortcoming when it comes to imputing these behaviors to "core knowledge" (knowledge that all humans are hypothesized to be born with). It hardly needs mentioning, however, that a spurious observation of a few phonologists is not likely to serve as an instinct.

To take another obvious problem for the SSG, sequences involving syllable-initial sibilants are common crosslinguistically, even though they violate the SSG. Thus the SSG encounters problems in accounting for English words like "spark," "start," "skank," etc. Since [t], [k], [p] – the voiceless stops – are not as loud/sonorous as [s], they should come first in the complex onset of the syllable. According to the SSG, that is, [psark], [tsart], should be grammatical words of English (false) while [spark], [start], etc. should be ungrammatical – also false. Thus the SSG is too strong (incorrectly prohibits [spark]) and too weak (incorrectly predicts [psark]) to offer an account of English phonotactics. Joining these observations to our earlier ones, we see that the SSG not only allows illicit sequences such as /ji/ while prohibiting perfectly fine sequences such as /sp/, it simply is not up to the task of English phonotactics more generally. And although many phonologists have noted such exceptions, there is no way to handle them except via ancillary hypotheses (think "epicycles") if the basis of one's theory of phonotactics is the SSG.
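The "too strong and too weak" point can be laid out as a small comparison between what the SSG predicts for an onset and what English actually allows. This is only a sketch: the two sonority ranks are arbitrary numbers that preserve the ordering stated above ([s] more sonorous than the voiceless stops), and the `attested` table simply encodes the judgments given in this paragraph.

```python
# Relative sonority of just the segments needed here (higher = more sonorous).
SONORITY = {"p": 1, "t": 1, "k": 1, "s": 3}

def onset_rises(onset):
    """True if sonority strictly rises across the onset, as the SSG requires."""
    ranks = [SONORITY[seg] for seg in onset]
    return all(x < y for x, y in zip(ranks, ranks[1:]))

# Whether each onset begins well-formed English words, per the text above.
attested = {"sp": True, "st": True, "sk": True, "ps": False, "ts": False}

# Onsets where the SSG prediction and the English facts disagree.
mismatches = [onset for onset, ok in attested.items() if onset_rises(onset) != ok]
print(mismatches)
```

On these five onsets the prediction and the facts disagree in every case: the SSG rejects the attested sibilant-initial clusters and accepts the unattested stop-initial ones.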

I conclude that Berent's phonology instinct cannot be based on the SSG, because the latter doesn't exist. She might claim instead that the instinct she is after is based on a related principle or that the SSG was never intended to account for all of phonotactics, only a smaller subset, and that phonotactics more broadly require a set of principles. Or we might suggest that the principles behind phonotactics are not phonological at all, but phonetic, having to do with relative formant relationships, along the lines adumbrated by Ohala. But while such alternatives might better fit the facts she is invested in, a new principle or set of principles cannot rescue her proposal. This is because the evidence she provides for an instinct fails no matter what principle she might appeal to. To see why let's consider what Berent infelicitously refers to (Berent, 2013b, p. 320) as "the seven wonders of phonology." She takes all of these as evidence for "phonological core knowledge." I see them all as red herrings, rather than as evidence for a phonological mind or an instinct. These "wonders" are:

<sup>12</sup>Formants are caused by resonance in the vocal tract. They are concentrations of acoustic energy around a particular frequency in the speech stream. Different formant frequencies and amplitudes result from changing shapes of the tract. For any given segment there will be several formants, each spaced at 1000 Hz intervals. By resonance in the vocal tract, I mean a place in the vocal apparatus where there is a space for vibration – the mouth, the lips, the throat, the nasal cavity, and so on.

<sup>13</sup>F2 and F3 refer to the second and third formants of the spectrographic representation or acoustic effects of producing sounds.


These are worth exploring, however, because Berent's work is a model for other claims of grammatical innateness and far better articulated than most. Therefore, let's consider each of them in turn.

"Algebraic rules" are nothing more than the standard rules that linguists have used since Panini (4th century BCE). For example, Berent uses an example of such a rule that she refers to as the "AAB rule" in Semitic phonologies. In Semitic languages, as is well known, consonants and vowels mark the morphosyntactic functions of words, using different spacings and sequences (internal to the word) of Cs or Vs based on conjugation or *binyanim* – the order of consonants and intercalated vowels. An example of what the variables here are is illustrated below:

Modern Hebrew

| Template | Example | Gloss |
| --- | --- | --- |
| CaCaC | katav | 'write' |
| niCCaC | niršam | 'register' |
| hiCCiC | himšix | 'continue' |
| CiCeC | limed | 'teach' |
| hitCaCeC | hitlabeš | 'get dressed' |

In other languages such functions would most frequently be marked by suffixes, infixes, prefixes, and so on. So, clearly, taking only this single, common example, variables are indeed found in phonological rules.
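The sense in which the C slots act as variables can be sketched in a few lines. This is only an illustration: `fill_template` is a hypothetical helper, and the consonant triples are simply read off the surface forms of the Modern Hebrew examples rather than taken from a lexicon.

```python
def fill_template(template, root):
    """Replace each 'C' slot in the template with the next root consonant.

    Lowercase symbols in the template are copied through unchanged.
    """
    if template.count("C") != len(root):
        raise ValueError("template slots and root consonants must match in number")
    consonants = iter(root)
    return "".join(next(consonants) if symbol == "C" else symbol
                   for symbol in template)


# Roots are given as tuples of consonants (assumed from the surface forms).
examples = [
    ("CaCaC", ("k", "t", "v")),
    ("niCCaC", ("r", "š", "m")),
    ("hiCCiC", ("m", "š", "x")),
    ("CiCeC", ("l", "m", "d")),
    ("hitCaCeC", ("l", "b", "š")),
]

for template, root in examples:
    print(template, "->", fill_template(template, root))
```

The same function produces every form in the list from a template and a consonant triple, which is all that "variables in a phonological rule" amounts to here.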

Now, Berent's AAB rule (more precisely, it should be stated as a constraint "∗AAB," where ∗ indicates that the sequence AAB is ungrammatical) is designed to capture the generalization that the initial consonants of a word cannot be the same. Thus a word like ∗*sisum* would be ungrammatical, because the first two consonants are /s/ and /s/, violating the constraint. The constraint is algebraic because A and B are variables ranging across different phonological features (though A must be a consonant). But calling this an algebraic rule and using this as evidence for an instinct makes little sense. Such rules are regularly learned and operate in almost every area of human cognition. For example, one could adopt a constraint on dining seating arrangements of the type <sup>∗</sup>G1G1X, i.e., the first two chairs at a dinner table cannot be occupied by people of the same gender (G), even though between the chairs there could be flower vases, etc. Humans learn to generalize across instances, using variables frequently. Absolutely nothing follows from this regarding instincts.
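That the same variable-based constraint applies equally well outside phonology can be shown with one generic check. The sketch below is an assumption-laden illustration, not Berent's formalization: `violates_aab`, `consonants`, the five-letter vowel set, and the guest names are all hypothetical.

```python
def violates_aab(items, key=lambda item: item):
    """Return True if the first two items are identical under `key`."""
    if len(items) < 2:
        return False
    return key(items[0]) == key(items[1])


VOWELS = set("aeiou")


def consonants(word):
    """Keep only the consonant letters of a romanized word (rough heuristic)."""
    return [ch for ch in word if ch not in VOWELS]


print(violates_aab(consonants("sisum")))  # s, s, m -> True
print(violates_aab(consonants("katav")))  # k, t, v -> False

# The dinner-table analogue *G1G1X: guests as (name, gender) pairs.
seating = [("Ana", "F"), ("Bea", "F"), ("Carl", "M")]
print(violates_aab(seating, key=lambda guest: guest[1]))  # True
```

Nothing in `violates_aab` is specific to speech sounds; only the `key` function changes between the phonological case and the seating case.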

Universality is appealed to by Berent as further evidence for a phonology instinct. But as any linguist can affirm (especially in light of controversies over how to determine whether something is universal or not in modern linguistic theory), there are many definitions, uses, and abuses of the term "universality" in linguistics. For example, some linguists, e.g., Greenberg (1966) and Evans and Levinson (2009) argue that for something to be meaningfully universal, it actually has to be observable in every language. That is, a universal is a concrete entity. If it is not found in all languages, it is not universal. That is simple enough, but some linguists, e.g., Chomsky (1986), prefer a more abstract conception of universal such that for something to be universal it need only be available to human linguistic cognition. This set of universal affordances is referred to as the "toolbox." I have argued against this approach in many places, for being imprecise and often circular (in particular Everett, 2012a,b). But in any case, Berent clearly follows the notion of "universal" advocated by Chomsky and Jackendoff, inter alia. Such universals need not be observed in all languages. Thus Berent would claim that the SSG is universal, not because it is obeyed in all its particulars in every language – like me, she would recognize that English allows violations of the SSG – but because her experiments with speakers of various languages show that they have preferences and so on that seem to be guided by knowledge of the SSG, even when their own native languages do not follow the SSG in particulars or have a simple syllable structure that is by definition unable to guide their behavior in experiments. 
If a Korean speaker, for example, shows preference for or perceptual illusions with some onset clusters and not others – in spite of the fact that there are no such clusters in Korean (and thus s/he could not have learned them, presumably), then this shows the universality of the SSG (as part of the linguistic toolbox).

But there is a huge leap taken in reasoning from this type of behavior to the presence of innate constraints on syllable structure. For example, there are phonetic reasons why Korean (or any) speakers prefer or more easily perceive, let us say, [bna] sequences rather than [lba], even though neither sequence is found in Korean. One simple explanation that comes to mind (and highlighted by phoneticians, though overlooked by many phonologists), is that the sequence [bna] is easier to perceive than [lba] because the interconsonantal transition in the onset of the former syllable produces better acoustic cues than in the second. Berent tries to rule out this kind of interpretation by arguing that the same restrictions show up in reading. But reading performance is irrelevant here for a couple of reasons. First, we know too little about the relationship between speaking and reading cognitively to draw any firm conclusions about similarity or dissimilarities in their performance to use as a comparison, in spite of a growing body of research on this topic. Second, in looking at new words speakers often try to create the phonology in their heads and so this "silent pronunciation" could guide such speakers' choices, etc. Everyone (modulo pathology) has roughly the same ears matched to roughly the same vocal apparatus. Thus although phonologies can grammaticalize violations of functionally preferable phonotactic constraints, one would expect that in experiments that clearly dissociate the experimental data from the speaker's own language, the functionality of the structures, e.g., being auditorily easier to distinguish, will emerge as decisive factors, accounting for speakers' reactions to nonnative sequences that respect or violate sonority sequencing, etc. In fact, there is a name for this, though with a somewhat different emphasis, in Optimality Theoretic Phonology (Prince and Smolensky, 1993/2004; McCarthy and Prince, 1994) – the "emergence of the unmarked." 
So there is nothing special I can see about the universality of these preferences. First, as we have seen, the SSG is not the principle implicated here, because there is no such principle. It is a spurious generalization. Second, local phonologies may build on cultural preferences to produce violations of preferable phonetic sequences, but the hearers are not slaves to these preferences. Let us say that a language has a word like "lbap." In spite of this, the phonetic prediction would be that in an experimental situation, the speakers would likely prefer "blap" and reject "lbap," since the former is easier to distinguish clearly in a semantically or pragmatically or culturally neutral environment. In other words, when asked to make judgments in an experiment about abstract sequences, it is unsurprising that the superiority of the functionality of some structures emerges as decisive. Such motivations reflect the fact that the ear and the vocal apparatus evolved together. Therefore, what Berent takes to be a grammatical and cognitive universal is neither, but rather a fact about perceptual ability, unrelated to a phonology instinct.

Next, Berent talks about "shared design." This is just the idea that all known phonological systems derive from similar phonological features. But this is not a "wonder" of any sort. There is nothing inherently instinctual in building new phonological systems from the same vocal apparatus and auditory system, using in particular the more phonetically grounded components of segmental sequencing.

Another purported "wonder" is what Berent refers to as "scaffolding." This is nothing more than the idea that our phonologies are reused. They serve double duty – in grammar and as a basis for our reading and writing (and other related skills). This is of course false for some writing systems (e.g., Epi-Olmec hieroglyphics, where speaking and writing are based on nearly non-overlapping principles). In fact "reuse" is expected in cognitive or biological systems to avoid unnecessary duplication of effort. It is not only a crucial feature of brain functioning (Anderson, 2014), but it is common among humans to reuse technology – e.g., the use of cutting instruments for a variety of purposes, from opening cans to carving ivory. Therefore, reuse is a common strategy of cognition, evolution, resource management, and on and on, and is thus orthogonal to the question of instincts.

Next, Berent talks about "regenesis," the appearance of the same (apparently) phonological principles in new languages, in particular when principles of spoken phonology, e.g., the SSG according to Berent, show up in signed systems. The claim is that the SSG emerges when humans generate a new phonological system *de novo*. But even here, assuming we can replace the invalid SSG with a valid principle, we must use caution in imputing "principles" to others as innate knowledge. We have just seen, after all, how the particular phonetic preference Berent calls the SSG could occur without instincts.

But even if we take her claims and results at face value, "regenesis" still offers no support for nativism. In spoken languages, the notion simply obscures the larger generalization or set of generalizations that *people always prefer the best-sounding sequences perceptually*, even when cultural effects in their native languages override these. Berent again attempts to counter this with research on sequences of signs in signed languages. Yet there is no sound-based principle in common between signed and spoken languages – by definition, since one lacks sounds altogether and the other lacks signs. Both will of course find it useful to organize word-internal signs or sounds to maximize their perceptibility, but no one has ever successfully demonstrated that signed languages have "phonology" in the same sense as spoken languages. In fact, I have long maintained that, in spite of broadly similar organizational principles, the organization of signs in visual languages and of sounds in spoken languages is grounded in entirely different sets of features (for example, where is the correlate of the feature "high tone" or F2 transition in signed languages?) and thus that talking of them both as having "phonologies" is nothing more than a misleading metaphor.

Another "wonder" Berent appeals to in order to show that phonology is an instinct is the common poverty of the stimulus argument or what she refers to as "early onset." Children show the operation of sophisticated linguistic behaviors early on, so early in fact that a particular researcher might not be able to imagine how it might have been learned, jumping to the conclusion that it must not have been learned but emerges from the child's innate endowment. Yet all Berent shows in discussing early onset is the completely unremarkable fact that children rapidly learn and prefer those sound sequences that their auditory and articulatory apparatuses have together evolved to recognize and produce most easily. This commonality is not linguistic *per se*. It is physical, like not trying to pick up a ton of bricks with only the strength in one's arms. Or, more appropriately, like not using sounds that people cannot hear, e.g., at frequencies that only dogs can hear.

Finally, Berent argues for "core phonological knowledge" based on what she terms "unique design." This means that phonology has its own unique properties. But this shows nothing about innate endowment. Burrito-making has its own unique features, as does mathematics, both eminently learnable (like phonology). Berent's discussion fails to explain why these unique features could not have been learned, nor why there would be any evolutionary advantage such that natural selection would favor them.

Summing up to this point, Berent has neither established that speakers are following sonority organization that is embedded in their "core knowledge," nor that her account is superior to more intuitively plausible phonetic principles. Nor are any of her "seven wonders of phonology" remotely wondrous.

And yet, in spite of all of my objections up to this point, there is a far more serious obstacle to accepting the idea of a phonological mind, mentioned at the outset of this discussion. This is what Blumberg (2006) refers to as the problem of "origins" which we have mentioned and which is discussed at length in several recent books (Blumberg, 2006; Buller, 2006; Richardson, 2007; among others) – an obstacle Berent ignores entirely – an all too common omission from proponents of behavioral nativism. Put another way, how could this core knowledge have evolved? More seriously, relative to the SSG, how could an instinct based on any related principle have evolved? As we have seen, to answer the origins problem, Berent would need to explain (as Tinbergen, 1963 among others, discusses at length) the survival pressures, population pressures, environment and so on at the time of the evolution of a valid phonotactic constraint – if the trait appears as a mutation in one mind what leads to its genetic spread to others in a population – what was its fitness advantage? In fact, the question doesn't even make sense regarding the SSG, since there is no such principle. But even if a better-justified generalization could be found, coming up with any plausible story of the origin of the principle is a huge challenge, as are definitions of innate, instinct, and the entire line of reasoning based on innate knowledge, inborn dark matter.


# CONCLUSION

In this paper I have argued for three points: first, UG makes only one ironic prediction: *not all people should be able to learn all languages*. Second, the most recent incarnation of UG – recursion (Hauser et al., 2002) – is either falsified or it has no empirical content.<sup>14</sup> Third, what is arguably the most well-developed case for grammatical nativism, Berent (2013a), itself fails to offer convincing evidence for grammatical nativism. Because of the importance and novelty of Berent's arguments, I have spent the majority of the space allotted arguing against her concept of a "phonological mind."

# FUNDING

Funding was provided by Bentley University for this research.

<sup>14</sup>Because Hauser et al. (2002) claim that recursion just is the sole item of the Narrow Faculty of Language (FLN), its absence in a language cannot be equated with the lack of clicks in English or the lack of more than three vowels in Hawaiian. Those who claim this kind of thing would be committing a category mistake.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Everett. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*