The Grammar of Exchange: A Comparative Study of Reciprocal Constructions Across Languages

Cultures are built on social exchange. Most languages have dedicated grammatical machinery for expressing this. To demonstrate that statistical methods can also be applied to grammatical meaning, we here ask whether the underlying meanings of these grammatical constructions are based on shared common concepts. To explore this, we designed video stimuli of reciprocated actions (e.g., “giving to each other”) and symmetrical states (e.g., “sitting next to each other”), and with the help of a team of linguists collected responses from 20 languages around the world. Statistical analyses revealed that many languages do, in fact, share a common conceptual core for reciprocal meanings but that this is not a universally expressed concept. The recurrent pattern of conceptual packaging found across languages is compatible with the view that there is a shared non-linguistic understanding of reciprocation. Nevertheless, there are considerable differences between languages in the exact extensional patterns, highlighting that even in the domain of grammar, semantics is highly language-specific.

Linguists have long known that concepts of reciprocity are expressed in various ways through the structure of language: from lexicon ("feast," "exchange"), to special morphology in some languages, to full-blown grammatical constructions (e.g., "gave to each other," "shook one another's hands"). Indeed, many languages have grammatical constructions evolved specially for the purpose of expressing reciprocal actions and reciprocal states (e.g., "loved one another"). By reciprocal construction we mean here a grammatical frame or template that has the expression of reciprocity as at least part of its central functions. In the case of reciprocity, many languages have constructions based on the nominal model, like English each other. Other languages, however, encode the same or a similar concept by means of a verbal affix. Most languages have more than one construction for expressing reciprocity.
The question we ask here is to what extent the meanings underlying such constructions are actually similar. We focus on grammatical constructions rather than lexical resources for a number of reasons. First, there has been considerable qualitative linguistic research on the meaning of reciprocal constructions, mostly in the familiar European languages. Logicians and semanticists have noted, for example, that although English each other looks just like a complex noun, in fact it operates as a complicated quantifier: "John and Mary hit each other" in the canonical case means roughly x hit y, where John and Mary are permuted through the variable slots (John and Mary take turns as both subject and object). But they have puzzled over how "they sat next to each other in a row" can possibly satisfy the approximate meaning above (for all x and y, x sat next to y and y sat next to x). This prior qualitative work provides some semantic parameters underlying reciprocal semantics that we can use to investigate cross-linguistic semantics in a more systematic way. Second, although languages also

Introduction
Reciprocity lies at the heart of human social life. Apes do not exchange food (although they may permit "tolerated theft" by the young), while humans put commensality at the center of social activities. Much recent theory and research has been dedicated to the origin of human cooperation, a puzzling phenomenon from an evolutionary point of view (see, for example, Boyd and Richerson, 2005; Enfield and Levinson, 2006; Tomasello, 2009). The general consensus is that cooperation and reciprocal exchange could only have evolved in the context of group selection (or extended kin selection) if the group was a culture-bearing unit, that is, was able through social learning to transmit behaviors that were of selective advantage to individual members of the group.
Early in the course of human prehistory, over a million years ago when the first cultural traditions of tool manufacture are manifest and thus when we first took the cultural turn as our special mode of adaptation, the predisposition to sharing and reciprocity would have had to have been in place. If this is correct, reciprocity lies deep in our mental makeup, and one would expect to find many manifestations of it in human psychology and behavioral predisposition.
A straightforward prediction is that concepts of reciprocity should be reflected universally in human cognition and language. There ought to be a universal core concept, and its expression in language should be pervasive. But, given that the whole point of culture is to provide a novel means of fast adaptation to different ecological niches, the cultural manifestation of these underlying concepts could be rather diverse (Evans and Levinson, 2009). This paper attempts to test the extent to which the grammaticalized encoding of reciprocity is universal cross-culturally, and to what extent it is malleable.
These are familiar problems facing any cross-cultural study of concepts. Recently there has been a new method developed which offers a solution (Levinson et al., 2003; Levinson and Wilkins, 2006; Majid et al., 2007, 2008). An extensional set of stimuli is designed to sample the denotational space of a semantic domain in a structured way that is cross-linguistically comparable. The strategy is not dissimilar to the celebrated study of color terms by Berlin and Kay (1969), which used a systematic set of color chips to collect speaker descriptions, allowing precise calibration of similarities and differences in color terminology across languages. An advantage of this approach is that it makes minimal presuppositions about the categories in each language, thus allowing precise quantification of the degrees of similarity and differences in categories across languages.
In this paper, we apply this method to a diverse sample of languages to try to answer the question: To what extent are the meanings of reciprocal constructions across languages based on a single, universal core? Given the special role of reciprocity in human evolution discussed above, we would expect a common semantic core crosslinguistically. But, it could be that reciprocity encoded in grammar is a matter of renewed cultural convention in each ecological setting, in which case one might predict considerable differences cross-linguistically. To test these hypotheses we developed a set of movie stimuli, each representing a different kind of reciprocal situation with differing degrees of interaction between participants, asymmetrical as well as symmetrical exchange, simultaneous vs. delayed exchange, etc. Free elicitation data was then collected from speakers of 20 different languages, and the resulting data analyzed with multivariate statistical techniques. In the next sections we describe these methods in turn.

Materials and Methods
The stimulus set
The goal of the current study was to compare the extensional range of reciprocal constructions from different languages. In order to provide a basis for comparison, a set of videoclips was constructed that depicted a range of reciprocal events (Evans et al., 2004; these videos can be downloaded from http://fieldmanuals.mpi.nl/volumes/2004/reciprocals/). The basis of the stimulus design was previous work by formal semanticists on the nature of reciprocal semantics (e.g., Dalrymple et al., 1998, after Langendoen, 1978). We began with the "canonical reciprocal event" (as in Nedjalkov, 2007), such as "John and Mary hugged each other": There are just two participants (John and Mary), the subevents are simultaneous (John embraces Mary at the same time as Mary embraces John), and symmetrical (what John does to Mary, Mary does to John). Additional parameters were then varied in order to determine how far reciprocal constructions extend out to various conditions which relax the features of the canonical reciprocal situation. The crucial parameters are the number of participants, configuration, symmetry, temporal organization, and event-type (see Table 1; Figure 1). Note that Dalrymple et al. (1994) claim that languages have the same reciprocal semantics over these parameters, so their inclusion in our stimulus set is highly pertinent. Let us take each of these parameters in turn.
The first parameter was the number of participants, which varied between two and eleven. The second parameter, configuration, comes into operation once multiple participants are involved, since then the possible permutations of who acts on whom can also vary. Where all participants act on each other symmetrically the
express reciprocity in lexical categories, the precise event-types that get lexical encoding can be quite diverse, thus making it more difficult to design an adequate test of cross-cultural semantic categories. Focusing on grammatical constructions bypasses this problem.
In English the main reciprocal encoding strategy is the "each other" construction. This construction in fact can be used felicitously with a wide range of situations, as long noted by linguists (Langendoen, 1978; Dalrymple et al., 1994). For example, "The congregation all shook hands with each other" could mean that each and every pair of persons exchanged handshakes ("strong" reading), but it could mean that there was a general scrum in which all individuals shook many hands but not exhaustively ("melee" reading), or even just that those near to each other shook hands. Even more surprisingly, the actions need not necessarily be reciprocated: Many will find "The woman and the burglar chased each other down the street" acceptable even if only the woman is doing the chasing, and only the burglar the fleeing ("asymmetrical reading"). Notice that "They gave each other books" suggests, but does not require, simultaneity: perhaps they each got a book on their respective birthdays. In addition, we can say "the boxes were stacked on top of each other," an asymmetrical chaining of states rather than actions.
To what extent are all these patterns of meaning extension shared across languages? Is there a core meaning shared by all languages, with variable extension to weaker conditions? Or is there only family resemblance between cultural notions of reciprocity, so that the meanings of the constructions overlap in a chain or mosaic, so two languages at opposite extremes share no common core?
No investigation has hitherto been conducted on a sufficient sample of languages to ascertain which of these alternatives is the case. Earlier work on a few contrastive languages has come to rather opposite conclusions. Dalrymple et al. (1994) examined two unrelated languages, English and Chicheŵa (a Bantu language), for similarities in meanings of grammaticalized reciprocal constructions. Their investigation led them to conclude that reciprocals share a universal core meaning of "strong" reciprocity (all participants act on all other participants) with systematic relaxations of this core meaning permissible in specific sentential contexts. In a different tradition, Wierzbicka (2009) reviewed reciprocal constructions in English, Polish, Russian, and Japanese and concluded that there are four distinct (although related) prototypes for reciprocity, with details differing subtly from language to language. According to her account these meanings shade from reciprocity to mutuality to joint and collective action, prompting her to ask how realistic it is to choose one of these points as "the" focal definition of reciprocal meaning cross-linguistically.
This work and many other in-depth cross-linguistic studies (see, for example, Nedjalkov, 2007; König and Gast, 2008) make the point, first, that the domain resists a simple logical understanding of reciprocity, and second, that although there is a recognizable family resemblance between the concepts encoded grammatically in languages, it is not easy to describe the similarities and differences. The cross-linguistic comparison of reciprocal meanings, then, runs up against the more general problem faced by any systematic comparison across languages: how can one compare the different conceptual packages evolved by different languages in such a way that we are able to make realistic and helpful cross-linguistic comparisons while avoiding artificial "essentializing" definitions (cf. Croft, 2001; Goldberg, 2006; Evans and Levinson, 2009)?
The final parameter is action or event-type. Since we are interested in the range of events a construction can be applied to, it is important to establish whether constraints in applicability are due to the actions being depicted, the lexical verbs involved or even to properties of morpho-syntax (e.g., argument structure constraints or alternatively form-based constraints).
Not all combinations of all these semantic parameters could be shown to participants, since this would have meant a total of over 4000 videoclips. A representative selection of the semantic space was constructed such that each of the above parameters was depicted. Additional videoclips that depicted non-reciprocal scenarios were also included in order to establish the borders of reciprocal constructions. These all featured two participants, where only one acted on the other, e.g., one person talking with the other listening (clip #1), one person hitting another with no response (clip #17), or one person giving another a watch (clip #26). In order not to exceed our participants' patience and attention, we limited the final stimulus set to 64 videoclips.

The language sample
Data from 20 languages was collected using the reciprocals videoclip stimuli. Researchers elicited descriptions of the clips, and in some cases acceptability judgments in addition. (Detailed descriptions of the constructions used in most of the languages can be found in a forthcoming volume, Evans et al., 2011). The number of speakers per language for whom data was elicited varied from one to nine, with the average being three.
The languages are typologically, genetically, and geographically diverse, include languages from every continent, and sample 15 maximal clades or language families (note that Papuan is not a language family, merely meaning "Language spoken in Melanesia that is not Austronesian"). There is, however, an over-representation of Australian, Papuan and South-East Asian languages, partly reflecting the availability of data-gathering opportunities for project members, but also to compensate for the converse areal bias toward Eurasia and Africa in previous work (such as Nedjalkov, 2007). The data were collected by language specialists in each field site as indicated in Figure 2. Data sessions were conducted in the native language of the speaker, or in a suitable contact language, as appropriate.

Procedure
The 64 videoclips were shown to consultants in a fixed random order. The consultant viewed each clip and described what they had seen. If the consultant merely described all the subevents, the researcher probed for a compact description of the whole event. Descriptions were audio- or video-recorded for later transcription and coding (see below).
The following analyses are based on the first spontaneously produced reciprocal construction. Intuitive judgments of grammaticality and acceptability have been repeatedly shown to be unreliable guides to linguistic analysis (Schütze, 1996; Tremblay, 2005; Dąbrowska, 2010), but this is especially true in semantics. In an elegant series of studies, Labov (1978) demonstrated that spontaneous descriptions of stimuli (line drawings of objects such as cups and bowls) showed a perfect series of implicational hierarchies, such that if an object could be described as a cup, then all objects lower in the
event is fully saturated (strong reciprocity). For example, when four people are involved all participants could symmetrically act on all the others (strong in Figure 1). But as we move away from this type, various other configurations are logically possible. A and B, and likewise C and D, could be in strong symmetrical interaction, but with no interaction outside the pairs (pairs in Figure 1). Or A could act on B who could act on C who acts on D, resulting in a linear series of events (chain in Figure 1). Or A and C could each act on D who alone acts on C, etc. (melee in Figure 1), and so on.
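The configuration types described above can be made concrete by treating each clip as a set of directed "who acts on whom" pairs. The following is a minimal sketch under that assumption; the function names and the four participant labels are illustrative and do not come from the stimulus materials. Melee is left out because its exact edge pattern is not fully specified here.

```python
from itertools import permutations

def is_strong(participants, acts_on):
    """True if every participant acts on every other one (full saturation)."""
    return all((a, b) in acts_on for a, b in permutations(participants, 2))

def is_symmetric(acts_on):
    """True if every act from a to b is matched by an act from b to a."""
    return all((b, a) in acts_on for a, b in acts_on)

people = ["A", "B", "C", "D"]

# Strong: all participants act on all others.
strong = set(permutations(people, 2))
# Pairs: A and B interact, C and D interact, nothing crosses the pairs.
pairs = {("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")}
# Chain: A acts on B, who acts on C, who acts on D.
chain = {("A", "B"), ("B", "C"), ("C", "D")}

for name, relation in [("strong", strong), ("pairs", pairs), ("chain", chain)]:
    print(name, is_strong(people, relation), is_symmetric(relation))
```

On this representation, the pairs configuration is symmetric without being saturated, while the chain is neither, which is one way to see how the parameters can vary independently.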
The third parameter concerns whether the action is symmetrical or not. That is, are the participants both actors and recipients of the target event? The fourth parameter is whether the subevents are simultaneous or sequential. This is straightforward when there are just two participants in the event, but where more than two participants are involved "both" is also a possibility, since some of the subevents could happen simultaneously but others sequentially. For example, where there are six participants in three pairs, each pairwise event could happen simultaneously, but the pairs could act on each other sequentially, i.e., first A and B act on each other, then C and D and finally E and F.
all constructions were "dedicated" reciprocal constructions. For example, in Jahai the main construction employed by speakers to describe videoclips was a distributive construction; in Hup an "interactional" construction predominated, etc. A criterion for inclusion in this study was that the construction was used to describe a canonical reciprocal event, i.e., an event in which two participants acted on each other simultaneously and symmetrically. If the construction was used for any such event, then it was included in the following analyses, regardless of what its core function may be. The number of constructions included for each language varied from 0 (for Kilivila) to 7 (for English), with an average of 3 constructions per language.

The coding
For each videoclip we coded whether each spontaneously produced construction was applied. For example, if English speaker #1 used the each other construction to describe clip #11, which shows a man and a woman both simultaneously talking to one another, then a 1 was coded for the each other construction; if this construction was not used, then a 0 was coded. The end result is a single matrix with language-specific constructions as columns and videoclips as rows. This basic data can be manipulated in various ways to address different questions.
scale could also be described as a cup. But, when participants were explicitly asked to judge whether an object was a cup, the implicational hierarchy collapsed, and the lawful use of terms was no longer apparent. Labov argued that participants who were explicitly asked whether an object was a cup were using a constructed definition against which they were comparing the object, whereas those who were just asked to name the object were drawing on an underlying definition which was much more systematic. He concluded "It is an unfortunate fact (for linguists) that the more people think about language, the more confused they become" (Labov, 1978, p. 229). One other aspect of Labov's data that is pertinent here is that explicit judgments lead to more restricted application than free descriptions. Labov's participants used cup for fewer stimuli when they had to make a deliberative decision whether a particular stimulus was an instance of a cup than when they just had to describe the stimuli.
Since we are interested in the range of events that a reciprocal construction can be applied to, free responses are more appropriate.
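The coding scheme described in this section (videoclips as rows, language-specific constructions as columns, 1 where a construction was spontaneously applied to a clip) can be sketched as follows. This is only an illustration: the record format, the function name, and the label `hypothetical_Y` are assumptions, and the toy records are not data from the study. Only the pairing of clip #11 with the English each other construction echoes the example given in the text.

```python
def build_coding_matrix(records, clips, constructions):
    """Return a clips x constructions matrix of 0/1 codes.

    records: iterable of (clip_id, construction_label) pairs, one per
    observed use of a construction to describe a clip.
    """
    used = set(records)
    return [
        [1 if (clip, construction) in used else 0 for construction in constructions]
        for clip in clips
    ]

# Toy input (invented for illustration, not study data).
clips = [11, 17]
constructions = ["each_other", "hypothetical_Y"]
records = [(11, "each_other"), (11, "hypothetical_Y")]

matrix = build_coding_matrix(records, clips, constructions)
for clip, row in zip(clips, matrix):
    print(clip, row)
```

Keeping the matrix as plain rows of 0s and 1s makes it easy to derive the other views of the data, such as language-by-language agreement or clip-by-clip distances.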

Results
The constructions
Individual researchers, experts on the languages they study, were responsible for coding their own language data (see Figure 2). Constructions were identified on language internal grounds. Not

Figure 2 | Languages in sample.
languages are also different in some way from the other languages. Note that Mah Meri shows a very different pattern from Jahai, even though these languages are closely related. We return below to some of the reasons for the outlier status of these languages. Another way of analyzing the same data is to use cluster analysis (see Figure 4), a technique that groups together items based on the amount of agreement. Each terminal node in the figure represents a language and nodes are grouped together based on similarity, which is directly reflected in the length of the lines before clusters join: short lines indicate more similarity, long lines less similarity. We find two main clusters: the first subsuming most of the languages in the sample, the second grouping Kilivila and Indo-Pakistani Sign Language. The first large cluster can be considered an "agreement" cluster: these languages all roughly agree about which clips should be encoded with a reciprocal construction.
The big agreement cluster breaks into four subclusters. Lao, Hup, Mawng, Kuuk Thaayorre, and Kayardild, all strongly agree with one another, as does the following group starting with Savosavo and running through Rotokas. English is more different again from these two groups. Olutec, Tsafiki, Mah Meri, and Mundari are more different again, but still share similarities to the other languages mentioned so far, as represented by the fact that they join with the other languages into one major cluster. The major difference is between Kilivila and Indo-Pakistani Sign Language and the other languages.
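The grouping logic behind such a tree can be sketched with a small agglomerative clustering routine. The linkage rule and distance measure used for Figure 4 are not stated here, so average linkage and the toy distances below are assumptions made purely for illustration, and the labels are placeholders rather than languages from the sample.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    labels: item names; dist: symmetric matrix of pairwise distances.
    Returns a list of merges as (members_a, members_b, height).
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
                height = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or height < best[0]:
                    best = (height, x, y)
        height, x, y = best
        merges.append((
            tuple(labels[i] for i in clusters[x]),
            tuple(labels[i] for i in clusters[y]),
            height,
        ))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Toy distances (invented): three similar items and one outlier.
labels = ["L1", "L2", "L3", "L4"]
dist = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
merges = average_linkage(labels, dist)
for members_a, members_b, height in merges:
    print(members_a, members_b, round(height, 2))
```

The merge heights play the role of the line lengths in the tree: items that join at a low height are similar, and an outlier joins the rest only at the final, highest merge.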
Both the factor analysis and the cluster analysis indicate that Kilivila and Indo-Pakistani Sign Language are outliers. In the case of Kilivila, the reason is self-evident, since it is a language with no dedicated reciprocal, but the outlier status of the one sign language in the sample is a novel finding. Indo-Pakistani Sign has one construction that appears in response to canonical reciprocal
Overall similarity of languages in classifying reciprocal events
The question we ask to begin with is how similar are languages to one another in their overall strategy for encoding reciprocal events? We can address this question by comparing the overall pattern of classification presented by each language through the factor-analytic methods described by Romney et al. (1986; see also Majid et al., 2008). The underlying assumption in this analysis is that if speakers of two languages share a common representation of what a reciprocal event is, then they will agree on which events should be encoded with a reciprocal construction, and conversely which should not. The encoding may not be identical since there may be differences in the construal of the clips, or a speaker may just not have chosen to encode that clip with a reciprocal construction on this occasion even though that clip could receive reciprocal encoding, or for some other reason. In a statistical analysis we can capture whether there is a common pattern of categorization regardless of this variation and, perhaps more importantly, still be able to distinguish genuine cases of difference.
We begin with the basic matrix described above, crossing language-specific construction with videoclip. This was used to create a new language-by-language matrix which indicates how similar languages are to one another as measured in terms of the extensions of their reciprocal constructions. The new matrix was constructed by establishing the number of times two languages agreed that a particular event could be coded by a reciprocal construction. This count was divided by the total number of videoclips in the stimulus set to get the proportion of matches, and then corrected for chance agreement (see Romney et al., 1986 for formulae). The resulting matrix was factor analyzed using principal components analysis. If there is a universal semantics of reciprocity then languages will correlate positively with each other and factor scores on the first factor will be positive. Moreover, the first factor eigenvalue will be considerably higher than the second factor score, and a substantial amount of the variance will be explained. If, however, languages differ in reciprocal semantics then they will correlate negatively with other languages and load negatively on the first factor. Furthermore, the amount of variance explained would be predicted to be negligible.
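The steps above can be sketched in a few lines. Several choices here are assumptions rather than the procedure actually used: agreement is counted as matching codes (both 1 or both 0), the chance correction is the simple two-option form 2m - 1 (the formulae of Romney et al., 1986 are not reproduced in the text), and power iteration with deflation stands in for a full principal components routine. The toy matrix is invented.

```python
import math

def corrected_agreement(matrix):
    """Language-by-language agreement from a clips x languages 0/1 matrix."""
    n_clips, n_langs = len(matrix), len(matrix[0])
    agree = [[0.0] * n_langs for _ in range(n_langs)]
    for i in range(n_langs):
        for j in range(n_langs):
            matches = sum(1 for row in matrix if row[i] == row[j])
            # Assumed correction for two response options: 2 * proportion - 1.
            agree[i][j] = 2.0 * matches / n_clips - 1.0
    return agree

def leading_eigenpair(a, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(a)
    start = [float(k + 1) for k in range(n)]
    scale = math.sqrt(sum(x * x for x in start))
    v = [x / scale for x in start]
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, av)), v

def deflate(a, value, v):
    """Remove the component already extracted, exposing the next factor."""
    n = len(a)
    return [[a[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]

# Toy data (invented): rows are clips, columns are four "languages";
# the last column never uses a reciprocal construction.
toy = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
agreement = corrected_agreement(toy)
first, loadings = leading_eigenpair(agreement)
second, _ = leading_eigenpair(deflate(agreement, first, loadings))
# The diagonal is all 1s, so the eigenvalues sum to the number of languages.
share = first / len(agreement)
print(round(first, 2), round(second, 2), round(share, 2))
```

In the toy matrix the fourth column has the lowest agreement with the other three, which is the kind of pattern that would show up as a low or negative loading on the first factor.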
In fact, there was considerable agreement across languages, as reflected in eigenvalue scores and variance explained. The eigenvalue of the first factor was 13.41, nine times higher than that of the second factor (1.48), and accounted for nearly three-quarters of the variance (74.52%). Figure 3 shows how languages load on the first two factors. The x axis depicts "consensus": languages that load positively on this factor agree with one another on which clips are to be encoded by a reciprocal construction. From the plot it is clear that although there is substantial agreement in the extensional pattern of reciprocal constructions, some languages do not share this strategy for encoding reciprocal events. Kilivila is the most different from the other languages, and loads the most negatively on the first factor. This is trivially the case, since Kilivila does not have any reciprocal constructions, and therefore Kilivila speakers never encoded any of the clips with a reciprocal construction. Agreement with other languages is necessarily low.
Less obviously, however, Indo-Pakistani Sign Language also correlates very little with other languages in how reciprocal events are encoded. Mah Meri and Mundari also load negatively on the first factor, indicating that the
reciprocal constructions of these two languages can be very similar to one another, even though the first language has only one dedicated construction while the second has two, three, or more. However, even languages with multiple reciprocal encoding strategies often have a clearly identifiable main strategy with the remainder of the constructions playing a rather minor role (see footnote 1). Therefore, we now concentrate on the main reciprocal encoding strategies in order to determine whether constructions share extensional range across languages. To do this, a matrix consisting of the main strategy from each language (19 columns; see footnote 2) and the videoclip stimuli (64 rows) was constructed. From this basic matrix, there are at least two ways to examine how similar constructions are to one another. The first is to determine whether constructions group and distinguish the reciprocal videoclips in the same way. Videoclips are "grouped together" when the same construction is applied to them; they are distinguished when a specific construction is applied to one but not the other. Using multidimensional scaling, we can capture the degree to which languages categorize the videoclips as semantically similar by plotting them in n-dimensional space, just as we plotted languages earlier. In such an analysis, clips that are often grouped together across language constructions will be plotted close together in space, clips that are not grouped together will be plotted further away. The extensional range of a specific construction can be indicated by use of Venn diagrams.
A second way to compare the similarity of the constructions to one another is to plot the constructions themselves, rather than the videoclips. The logic is similar to that of the analysis of the videoclips. To the extent that constructions are used over the stimulus space in similar ways, they are similar to one another and will be plotted close together in space.
events, and this construction has a very limited range of applicability, being used for only 18 videoclips. The restricted range is due to two factors. Events where symmetry cannot be shown by parallel handshapes require a different construction altogether, a classifier-construction. This means that clips featuring chasing, following, being next-to, etc. received distinct coding. Furthermore, events that are asymmetric, including chaining events, do not get encoded with a reciprocal construction. Again, a separate construction is required, and this construction was not used with canonical reciprocal events that feature complete saturation. Thus Kilivila and Indo-Pakistani Sign share the property of treating all or most of the clips as "non-reciprocal" events.
Mundari and Mah Meri also do not apply reciprocal constructions as widely as the other languages. In Mundari, the reciprocal infix strategy was used for strict and melee situations, but not for chaining situations. The only case where it was applied to a chaining event was when the subevents occurred simultaneously. A different construction was used for chaining events, combining the reciprocal infixation with a serialized verb whose base meaning is "take." This construction was not included in these analyses, since it was never applied to a canonical reciprocal event. For Mah Meri, clips depicting symmetrical physical states, e.g., "be.next.to," "lean," were impossible to describe with the reciprocal construction because intransitive and stative verbs cannot enter into reciprocal constructions. Events with non-volitional or unintentional contact, e.g., "bump" likewise cannot be encoded with a reciprocal construction. Moreover, Mah Meri has a strict requirement for exact sameness of action for the reciprocal construction to apply. Verbs have detailed semantics, and often there is no hypernym, so if two events within a clip varied in the particulars of the subevents, a concise description was not possible.

Extensional range of reciprocal constructions
So far the analyses indicate that there is considerable agreement between languages in which events should receive reciprocal encoding, but what is the basis of this similarity? In the above analyses,
1 For example, many languages have a bare reciprocal construction (as in "they kissed") but it is used for only a few of the reciprocal videoclips. When all constructions are included in the analyses, these constructions are clearly distinguished from the main reciprocal coding strategies which we concentrate on here.

2 There is no construction for Kilivila.

Majid et al. The grammar of exchange
Overall this analysis shows that across languages there is relatively little differentiation of the various reciprocal parameters built into our stimulus set. If the main reciprocal strategies of languages differed substantially in the kinds of action or "event-types" they included in their scope then there would be much more differentiation between the clips in this plot. The relatively dense clustering of events on the right-hand side of the plot below shows that this is not the case. Note though that saturated (strong) events are among those furthest to the right, and that non-reciprocal asymmetric events are furthest to the left. Dimension 1, thus, distinguishes strong reciprocal events from asymmetric events. Chaining and melee events are separated widely on Dimension 2, with strong events falling centrally. Thus there is structure inside this dense cluster, with canonical symmetric exchanges central, and departures of one kind or another more peripheral. To better understand what parameters underlie Dimensions 1 and 2, we correlated the loadings of the videoclips on these dimensions with the parameters identified in Table 1. The results are plotted in Table 2. For Dimension 1, all parameters but number of participants correlated with videoclip loading. For Dimension 2 only symmetry correlated positively, with symmetrical events appearing lower in the plot than asymmetrical ones. So which events do the different constructions encompass in their range? We can illustrate holding this same plot constant as a general map of the place of the clips in semantic space, and meanwhile superimpose the extensions of each language-specific

Semantic space of reciprocal events
To uncover how videoclips were categorized across the constructions of different languages, we conducted a multidimensional scaling analysis with a binary Euclidean distance in SPSS with the ALSCAL algorithm. The stress of the resulting model was quite high at 0.20 (scores under 0.15 are generally considered acceptable), but the RSQ was suitably high at 0.85 to be considered an adequate model of the data (anything over 0.6 is considered acceptable on this measure). Figure 5 displays the semantic space of the video clips. The numbers refer to specific videoclips; for convenience, configuration types for some of the clips are illustrated within the figure. Full descriptions of all the clips can be found in Table A1 in the Appendix. The videoclips form a dense cluster on the right-hand side, with a smaller number pulled apart on the left. The stimuli on the left are the clips most likely to be excluded from the range of the reciprocal constructions we consider here. These include two-participant asymmetrical events and asymmetric radial events. Most languages clearly distinguished these events from the others.
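The input to such a scaling analysis is a clip-by-clip distance matrix. A minimal sketch of that step follows, assuming the usual definition of binary Euclidean distance as the square root of the number of constructions on which two clips receive different codes. The ALSCAL fitting itself is not reproduced, and the toy rows are invented.

```python
import math

def binary_euclidean(row_a, row_b):
    """Square root of the number of positions where two 0/1 rows differ."""
    return math.sqrt(sum(1 for a, b in zip(row_a, row_b) if a != b))

def distance_matrix(rows):
    """Pairwise binary Euclidean distances between clips (rows)."""
    return [[binary_euclidean(a, b) for b in rows] for a in rows]

# Toy coding matrix (invented): rows are clips, columns are constructions.
rows = [
    [1, 1, 1],  # described with every construction
    [1, 1, 0],  # excluded by one construction
    [0, 0, 0],  # never given a reciprocal description
]
distances = distance_matrix(rows)
for line in distances:
    print([round(d, 3) for d in line])
```

Clips that the constructions treat alike end up at small distances, so a scaling routine fed this matrix would place them close together, while clips that are rarely given a reciprocal description sit far from the rest.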
Dimension 1 is on the x axis and Dimension 2 on the y axis. The configuration type on the right-hand side is the strong reciprocal relation. Moving clock-wise, the next configuration type is melee, followed by asymmetric radial events on the bottom left-hand side, then asymmetric events, followed by chaining events at the top. The subscripts "t 1 , t 2 " etc. on the configuration icons indicate that the events happened sequentially. construction. Figures 6-8 depict some typical extensional ranges for specific constructions in particular languages that we find in our sample. The Venn diagrams indicate which videoclips a particular language construction was applied to. The broadest extensional range was exhibited by the Lao kan3 construction and the Hup interactional (Figure 6), which were permissive in their range. These constructions are used to convey that the individuals are doing an activity together, thus including asymmetrical events such as A delousing B (#51) within their scope. Note how they nevertheless differ in whether they exclude partially symmetric chains (Hup) or radial asymmetrical exchanges (Lao).
Most of the languages did not have constructions with anything like this broadness of application. The majority of constructions analyzed here excluded asymmetrical two-participant events, but included the majority of the remaining clips, as exemplified by Jahai, Khoekhoe, English, and Savosavo in Figure 7. Notice, for example, that all these languages regularly include chaining and non-saturated scenes, but become selective when just two or three participants are involved in asymmetric events. The plots (setting aside Jahai for a moment) suggest why the formal semantics approach has failed to nail down reciprocal meanings: there appears to be an impressionistic filter, of the kind that legitimates general collective involvement in an exchange of actions. Jahai is the most permissive, but excludes a static saturated state (four sticks leaning against each other), presumably because of the specific properties of its distributive construction, as used for this domain.
There were, however, a few languages that had a much more restricted range. We have already noted that Indo-Pakistani Sign Language is, by language, an outlier, on the grounds of a very parsimonious use of its reciprocal construction. Figure 8 (bottom) shows a plot of this construction. As one can see, not only are all chaining events excluded, so are static scenes. Nevertheless the construction is not restricted to the "canonical" saturated type (some asymmetric melee scenes are included)3. The Mah Meri double distributive construction also had a very restrictive application (Figure 8, top), but showed the complementary pattern to the Indo-Pakistani Sign Language construction, in that it was much more likely to be used for chaining events.
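The extensional ranges shown in the Venn diagrams can be thought of as sets of clip numbers, one set per construction. The following is a minimal Python sketch of that idea. The construction labels and clip memberships are hypothetical placeholders chosen for illustration, not the data of this study, and the helper names `coverage` and `jaccard` are our own.

```python
# Each construction is represented by the set of clip IDs it was applied to.
# All memberships below are hypothetical placeholders, not study results.
extensions = {
    "broad_construction": {1, 2, 3, 4, 5, 6},
    "typical_construction": {1, 2, 3, 4},
    "restricted_construction": {1, 2},
}


def coverage(extension, all_clips):
    """Proportion of the stimulus set that a construction was applied to."""
    return len(extension) / len(all_clips)


def jaccard(a, b):
    """Overlap of two extensions: shared clips divided by clips in either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# The full (toy) stimulus set and the clips shared by every construction.
all_clips = set().union(*extensions.values())
shared_core = set.intersection(*extensions.values())

for name, ext in extensions.items():
    print(name, round(coverage(ext, all_clips), 2))
print("shared core:", sorted(shared_core))
```

Under this representation, a permissive construction has high coverage, and two constructions with complementary patterns of extension have low overlap outside the shared core.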
What these plots nicely illustrate is just how differently these language-specific constructions map onto extensional space. Although all of them are grounded in the cluster to the right of the plot, that is partly because they had to include saturated "strong" reciprocity to qualify. Beyond that, they extend to the left or up and down to engulf quite differing regions of clip space.
3 The auxiliary construction was applied to clips #19 and #57, which are described as chaining events. The prototypical chaining event has an action moving sequentially along a set of participants, e.g., A hits B, B then hits C, C then hits D, etc. However, both clips #19 and #57 can also be construed as pairwise reciprocals: e.g., for clip #57, A and B, C and D, and E and F hit each other at t1, and then B and C, and D and E hit each other at t2.
FIGURE 8 | Extensional range for Mah Meri (top) and Indo-Pakistani Sign Language (bottom) constructions.

Semantic similarity of constructions
Another way to see the similarity of the constructions to one another is to plot constructions, rather than clips. We ask here: to what extent is each of the reciprocal constructions used across the languages similar or dissimilar to the others? Constructions from unrelated languages will be grouped together to the extent that they partition the clip space in similar ways; to the extent that they partition the space in different ways, they will lie far apart from each other. Using multidimensional scaling once again, but taking constructions as our unit of analysis, we can get an overview of the similarity of construction types. The same procedure described above was applied. The stress of the model was 0.15, and the RSQ was 0.91. Figure 9 shows a plot of the main reciprocal encoding strategies. Going from right to left, constructions become progressively more restricted in how many clips they are applied to: Lao and Hup are most inclusive; Mah Meri and Indo-Pakistani Sign Language most restrictive (see Figures 6-8 for the extensional spaces). Notice, too, that many of the constructions cluster at the more inclusive end of extensional space: in effect, less than a third of languages are highly selective in their application of reciprocals.
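The dissimilarity measure and the two fit statistics quoted for the scaling analyses can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reimplementation of ALSCAL: binary Euclidean distance is taken to be the square root of the number of mismatching cells, stress is computed with Kruskal's formula 1, RSQ is computed as the squared correlation between dissimilarities and fitted distances, and the raw dissimilarities are used directly as disparities (a metric simplification). The usage matrix and the two-dimensional configuration are hypothetical toy values.

```python
import math


def binary_euclidean(u, v):
    """Square root of the number of cells on which two binary rows differ."""
    return math.sqrt(sum(1 for a, b in zip(u, v) if a != b))


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def stress1(dissimilarities, fitted):
    """Kruskal's stress formula 1, using raw dissimilarities as disparities."""
    d = upper_triangle(dissimilarities)
    f = upper_triangle(fitted)
    numerator = sum((x - y) ** 2 for x, y in zip(d, f))
    denominator = sum(y ** 2 for y in f)
    return math.sqrt(numerator / denominator)


def rsq(dissimilarities, fitted):
    """Squared Pearson correlation between dissimilarities and fitted distances."""
    d = upper_triangle(dissimilarities)
    f = upper_triangle(fitted)
    mean_d, mean_f = sum(d) / len(d), sum(f) / len(f)
    cov = sum((x - mean_d) * (y - mean_f) for x, y in zip(d, f))
    var_d = sum((x - mean_d) ** 2 for x in d)
    var_f = sum((y - mean_f) ** 2 for y in f)
    return cov ** 2 / (var_d * var_f)


# Hypothetical toy data: rows are constructions, columns are clips
# (1 = construction applied to the clip). Chosen so that an exact
# two-dimensional configuration exists.
usage = [[0, 0], [1, 0], [0, 1], [1, 1]]
dissim = [[binary_euclidean(r, s) for s in usage] for r in usage]

# Hypothetical configuration, standing in for the output of a scaling run.
config = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
fitted = [[math.dist(p, q) for q in config] for p in config]

print(stress1(dissim, fitted), rsq(dissim, fitted))
```

Because the toy configuration reproduces the dissimilarities exactly, stress comes out at (or within rounding of) 0 and RSQ at 1. With real data, where clips or constructions cannot be placed perfectly in two dimensions, stress rises and RSQ falls.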
We have already noted that dedicated reciprocal constructions themselves are not all of the same kind. We can see from Figure 9 that there are broad morpho-syntactic types that can be distinguished.
According to Dalrymple et al. (1994) the meaning of reciprocal constructions should be independent of the particular construction types that express it. Thus there should be no difference in the semantic ranges of reciprocals that happen to be expressed by verb-coding as opposed to nominal-coding. The contrary prediction will seem more natural to many linguists, especially where a single language deploys more than one construction, and one expects contrastive interpretations (Wierzbicka, 2009). That is, verb-coding reciprocals might be predicted to be more similar to other verb-coding reciprocals, while nominal-coding reciprocals might in turn be more similar to other nominal-coding ones. We tested whether reciprocal constructions with the same sort of marking were more similar to each other in their semantic ranges than those with different marking. To do this, we evaluated the distances between constructions in the similarity space in Figure 9 by comparing pairwise distances between constructions. Nominal-marking constructions (such as English each other and Yélî Dnye numo/noko) are no more similar to other argument-coded constructions than they are to verb-coded ones, t(53) = 0.59, p = 0.63. And, in fact, verb-coded reciprocal constructions are more different from other verb-coded constructions than they are from argument-marking constructions, t(79) = 2.27, p < 0.03. The variability in semantic range between argument-coded and verb-coded constructions is of equal size, t(44) = 1.88, p < 0.07. There is a slight tendency for argument-coded constructions to be more similar to each other than verb-coded constructions are to one another, but this could be an artifact of the small number of argument-coded constructions in this sample. This suggests that the mode of grammatical coding does not constrain the semantic range of reciprocal meanings expressible within core reciprocal constructions. In this respect Dalrymple et al. (1994) prove to be correct. But, as demonstrated in the previous sections, they incorrectly predict uniform semantic ranges for reciprocal constructions.
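The logic of this comparison can be sketched as follows: take the coordinates of the constructions in the similarity space, split the pairwise distances into same-coding and different-coding pairs, and compare the two sets. The Python sketch below assumes a pooled-variance two-sample t statistic, which is our assumption and not a documented detail of the analysis. The coordinates and labels are hypothetical, and the function names `split_distances` and `pooled_t` are introduced here for illustration. Note also that pairwise distances share constructions and so are not strictly independent observations.

```python
import math
from itertools import combinations


def split_distances(points, labels, target):
    """Distances among `target`-coded pairs, and between `target` and other codings."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == target and labels[j] == target:
            within.append(d)
        elif target in (labels[i], labels[j]):
            between.append(d)
    return within, between


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical coordinates of five constructions in a two-dimensional space.
points = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (0.5, 1.5), (1.5, 1.0)]
labels = ["verb", "verb", "verb", "nominal", "nominal"]

within, between = split_distances(points, labels, "verb")
t, df = pooled_t(within, between)
print(len(within), len(between), df)
```

A t value near zero would indicate that same-coding constructions are no closer to each other than to constructions with the other coding, which is the pattern reported above for nominal-marking constructions.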

Summary of results
The results of this study suggest both systematic similarities and some striking differences in the way that languages treat reciprocity. On the similarity side, languages carve out contiguous categories of cross-linguistic extensional space for their reciprocal constructions (see Figures 6-8). Even when they extend constructions used elsewhere to areas of this semantic space, they do so in a systematic way, so that extensionally the patterns are similar (but not identical) to those of dedicated reciprocal constructions.
Most of the languages in our sample showed high agreement in the kinds of situations that reciprocal constructions could felicitously be applied to, as indicated by the strong agreement in the overall language comparisons. On the other hand, there are some fundamental differences between languages (Figures 2 and 4) and between the constructions (Figures 6-9) they use to cover the domain, so that there are clearly no simple, strong universals in this domain, pace Dalrymple et al. (1998). Some languages have no reciprocals (Kilivila in our sample); some have up to five or six distinct constructions (English, Barupu, Tsafiki). The ranges of extension vary notably. Some languages seem to use a sloppy "general mutual involvement" criterion, as in English or especially Lao; others, like Mah Meri or Indo-Pakistani Sign Language, have a much more restrictive delimitation. Yet these last two differ: Indo-Pakistani Sign Language excludes chaining-type events from the scope of its reciprocal construction but happily includes melee-type events. Mah Meri, on the other hand, shows the opposite pattern: the reciprocal construction of this language is extended to chaining events but shows restricted applicability to melee-type events (see Figure 8).
An interesting question is whether the diversity in meaning or extensional range could partly be an outcome of the constructional coding in the different languages. Could the differences in semantic extension, for example, be predicted from the formal coding of the constructions (as verbal, nominal, or adverbial)?
The answer appears to be no. We hypothesized that verb-coded reciprocal constructions might be more similar to each other than they were to argument-coded constructions, and vice versa. But our analyses show that this is not the case (see Figure 9, where, e.g., both a verbal construction in Indo-Pakistani Sign Language and an argument or nominal construction in Mah Meri are outliers to the left on Dimension 1, while other languages that use either an argument-based or verbal-based strategy are clustered together to the right). Sharing the constructional mode of reciprocal coding does not seem to make two reciprocal constructions semantically more similar to each other. Overall, then, this suggests that for each language, the child language learner not only has to learn the language-specific semantic extensions of the constructions in its language, but also, independently, how they are expressed formally. The child can apparently not use the syntax to bootstrap into the semantics, or vice versa.

Discussion
The large-scale cross-linguistic comparison conducted in this study demonstrates that there is considerable semantic overlap in the notion of reciprocity across languages. The majority of languages compared show considerable agreement in exactly which types of scenarios should be considered to be of a "reciprocal" nature, and which not. This points to a common conceptual understanding across a myriad of cultural and ecological settings. However, this commonality is balanced by considerable cross-cultural variation. Moreover, some languages have not evolved any dedicated means to encode reciprocity in grammar. The single language in our sample that did not have a dedicated reciprocal construction is Kilivila. The result is all the more surprising since the Kilivila speakers of the Trobriand Islands are famed for their elaborate systems of symbolic exchange (Malinowski, 1922), eloquently making the point that the conceptualization of reciprocity is not dependent on grammatical encoding. Kilivila speakers have an elaborate terminology for all these forms of ritual exchange, not dissimilar to the terminology of stock exchanges, but the fact remains that the notion of reciprocity is not abstracted out and grammaticalized in the language.
Even considering languages with a dedicated reciprocal construction, we found both broad and fine-scale differences in the calibration of what exactly counts as a reciprocal event. Languages with broad reciprocal constructions still vary in exactly which scenes they take to be acceptable members of the category. More intriguingly, perhaps, the data supports the view that there may indeed be a core common situation of reciprocity with radial extension to the other situation types we have isolated. The fact that Indo-Pakistani Sign Language (and also Mundari) uses its reciprocal construction for melee-type events and not chaining ones, while Mah Meri shows the reverse pattern of extension, suggests that these different permutations are each natural extensions, open to linguistic coding. This then suggests a rather subtle learning problem for the child or second-language learner: a range of possible, naturally related meanings, which the learner must select from, without over-extension. The learning problem is compounded when one considers that reciprocal constructions are going to be rarely exemplified. The British National Corpus lists 103 occurrences of each other per million words, roughly ten times less frequent than all the reflexives like himself, themselves (1,184 across all person/number forms). Adding one another to each other moves this up a tad, to 130 occurrences per million words, i.e., roughly 1 per 10,000 words, but still about two orders of magnitude rarer than the regular pronoun it, with 10,562 per million.
Traditional methods of linguistic analysis founder against significant variation in the object of study; indeed they tend to idealize away from it. For example, there is no easy way to take 10 or 20 speakers from a single language and show how their slight differences in linguistic behavior follow from their slightly different intensional categories. Now try the same for similar numbers of speakers of 20 languages. That is the challenge, and the current method, demonstrated in this paper, rises to (the extensional version of) the challenge with ease. What the approach employed here is precisely designed for, then, is producing a measurable, precise picture of the patterning of cross-linguistic semantic variation. It answers questions like: How far do languages actually differ in the way they categorize events of type X? How important are the various potentially relevant semantic parameters in cross-linguistic categorization? Do all languages include both semantic parameter X and parameter Y in their categorizations of this phenomenon? Does the conceptual patterning of the domain track geography, phylogenetic structure, typological makeup of the language, coding mechanism, culture or what? Which languages differ most, and which cluster together, in their categorizations?
It is important, though, to be clear about the limitations of this approach. The method used here maps extensions (i.e., the situations a construction applies to) not intensions (i.e., a general formulation of what the construction means). The extensional maps give an approximate guide to the intensional terrain, for the simple reason that intensional differences generally produce extensional differences (where that is not the case, the child learner is in equal deep water with the analyst). Thus, with due caution, extensions provide a quantifiable proxy for the direct measurement of conceptual differences.
The second great advantage of the multivariate-extensional approach is that it allows the significant categories to emerge, rather than defining them prior to the study. This is a major asset, as definitional problems in a domain like reciprocity can be intractable: What counts as a reciprocal construction cross-linguistically? Exactly what situations should it include, and what should it exclude? This sort of problem bedevils just about any other kind of typological comparison, with disagreement rife on which criteria to adopt because of the relative arbitrariness of different criteria in the absence of any fixed prototype which all languages can be assumed to make reference to. Should a construction in language X be excluded as a reciprocal because it allows some asymmetric uses, or does not allow chaining, or takes in reflexive or sociative or distributive meanings?
These definitional decisions prove difficult or even impossible for linguists to reach consensus on. But for the approach taken here, they can simply be bypassed: the statistics are able to measure the patterning of categorizations over languages and stimuli directly, without requiring us to make a prior analytic decision on what to include. In practice, of course, there are underlying decisions in the choice of the stimuli, but these can be balanced across different expectations, so letting the data decide. There are also coding decisions to be made, but the problems here are largely language internal. In addition, for purposes of presentation or analysis we may construct a category, as we did when choosing constructions for analysis which denote at least one "canonical reciprocal event." But the data can be reanalyzed making other decisions.
For these reasons, we believe that multivariate-extensional methods have an important role in cross-linguistic comparison of semantics. The data and analyses discussed in this article are a proof of concept that such approaches can be applied to the semantics of grammar as well as to the semantics of lexical categories, a domain in which multivariate-extensional analysis has been more extensively tested.

Acknowledgments
The research reported here was funded by the Australian Research Council (DP 37771) and the Max Planck Gesellschaft. We would like to thank all researchers named in Figure 2 for data collection, coding and scholarship. Further material on this study will appear as part of a much more detailed book-length presentation (Evans et al., 2011). We would like to thank several other researchers whose involvement helped enrich the way we think about reciprocals, especially Leila Behrens, Robyn Loughnane, Sebastian Fedden, Volker Gast, Martina Faller, and Ekkehard König. For assistance in the project we would also like to thank Nan van de Meerendonk, Renske Schilte, Michelle Stapel, Kimberley den Brok, Ludy Cilissen, and Peter Nijland, as well as the actors in the video stimuli. Presentations bringing this data together were held at the Reciprocals Across Languages workshop at Max Planck Institute for Psycholinguistics, Nijmegen, April 19-21, 2006, organized by the authors.

Appendix