Consistency in Motion Event Encoding Across Languages

Syntactic templates serve as schemas, allowing speakers to describe complex events in a systematic fashion. Motion events have long served as a prime example of how different languages favor different syntactic frames, in turn biasing their speakers toward different event conceptualizations. However, there is also variability in how motion events are syntactically framed within languages. Here, we measure the consistency of event encoding in two languages, Spanish and Swedish. We test a dominant account in the literature, namely that variability within a language can be explained by specific properties of the events. This event-properties account predicts that descriptions of one and the same event should be consistent within a language, even in languages where there is overall variability in the use of syntactic frames. Spanish and Swedish speakers (N = 84) described 32 caused motion events. While the most frequent syntactic framing in each language was as expected based on typology (Spanish: verb-framed, Swedish: satellite-framed, cf. Talmy, 2000), Swedish descriptions were substantially more consistent than Spanish descriptions. Swedish speakers almost invariably encoded all events with a single syntactic frame and systematically conveyed manner of motion. Spanish descriptions, in contrast, varied much more regarding syntactic framing and expression of manner. Crucially, variability in Spanish descriptions was not mainly a function of differences between events, contrary to the prediction of the event-properties account. Rather, Spanish variability in syntactic framing was driven by speaker biases. A similar picture arose for whether Spanish descriptions expressed manner information or not: Even after accounting for the effect of syntactic choice, a large portion of the variance in Spanish manner encoding remained attributable to differences among speakers.
The results show that consistency in motion event encoding starkly differs across languages: Some languages (like Swedish) bias their speakers toward a particular linguistic event schema much more than others (like Spanish). Implications of these findings are discussed with respect to the typology of event framing, theories on the relationship between language and thought, and speech planning. In addition, the tools employed here to quantify variability can be applied to other domains of language.


INTRODUCTION
To what extent does our particular language constrain how we describe and conceptualize complex events? While a key property of language is that an unlimited number of ideas can be generated with finite means (Chomsky, 2002), a growing literature shows that particular languages also impose biases on what speakers express and how they express it. Cross-linguistic differences of this kind have enjoyed substantial theoretical interest in recent decades, largely as a consequence of cognitive linguistic approaches to language and their central tenet that "grammar reduces to the structuring and symbolization of conceptual content" (Langacker, 1999, p. 1) as well as their emphasis on subtle aspects of how we conceive of events (Goldberg, 2003). In this view, cross-linguistic differences may tell us something fundamental about how the world is conceptualized as a function of language (e.g., Lucy, 1992; Slobin, 1996; Wolff and Holmes, 2011; Boroditsky, 2012). Previous cross-linguistic descriptions of event encoding have mostly sought generalizations at the level of whole languages. In contrast, the aim of the present work is to focus on variability of event encoding patterns within languages, in line with the idea that variability in language is not simply noise but instead contributes valuable information for theoretical development (e.g., Dabrowska, 2012, 2016; Verhagen and Mos, 2016; Cunnings and Fujita, 2020; Verhagen et al., 2020).
The cross-linguistic contrast studied here concerns lexicalization patterns in motion event encoding, i.e., differences in which conceptual information of an event is linguistically expressed and how it is packaged into syntactic structure (Talmy, 1991, 2000; Berman and Slobin, 1994; for review, see Filipović and Ibarretxe-Antuñano, 2015). Languages can be classified into types depending on how they characteristically encode the fundamental motion component of PATH, i.e., the trajectory followed by the moving entity with respect to a landmark (Talmy, 2000). In satellite-framed (henceforth S) languages like Swedish, path is expressed outside of the main verb root. 1 In contrast, verb-framed (henceforth V) languages like Spanish characteristically encode path in the main verb root (Talmy, 2000).
To illustrate, example (1) gives a typical Swedish S-description of the event shown in Figure 1. Path is expressed outside of the main verb root, in the prepositional phrase in i grottan "into the cave." The main verb skjuter ("pushes") conveys information about MANNER, i.e., the particular way in which motion is brought about or carried out. Since main verbs are syntactically obligatory and path is expressed elsewhere, the verb slot is free to encode manner, and S-languages exhibit a strong tendency to express manner in motion descriptions (Slobin and Hoiting, 1994; Slobin, 2004; Slobin et al., 2014). In contrast, Spanish descriptions of the same event tend to be verb-framed (V): Path is expressed in the main verb root, as in (2) (entra "moves into"/"enters"). Because the main verb slot is taken up by the path verb, manner, if mentioned at all, needs to be encoded elsewhere. One option is to express manner in an adjunct, such as the gerund empujando ("pushing") in (2). However, manner is also frequently omitted in Spanish and other V-languages, as in (3), where the preposition con ("with") leaves manner unspecified.
Abbreviations: Linguistic: S, satellite-framed; V, verb-framed; NP, noun phrase; PP, prepositional phrase; Statistical: SD, standard deviation; SE, standard error.
1 The present study follows several authors in adopting a broad notion of satellite as "anything that is not a verb root but encodes an event component" (Croft et al., 2010, p. 206; see Filipović, 2007; Beavers et al., 2010; Verkerk, 2014 for essentially the same operationalization).
(1) Han He
The previous distinction refers to "typical" or "characteristic" encoding patterns in a language. However, it is now widely recognized that a strict typology is somewhat of a theoretical straitjacket because of considerable within-language variability (Matsumoto, 2003; Slobin, 2004; Kopecka, 2006; Bohnemeyer et al., 2007; Filipović, 2007; Nikitina, 2008; Ibarretxe-Antuñano, 2009; Beavers et al., 2010; Croft et al., 2010; Slobin et al., 2011; Goschler and Stefanowitsch, 2013; Verkerk, 2014). 2 English, for instance, is considered an S-language, but it has a set of Latinate verbs that denote path of motion, such as cross or ascend, thus allowing for V-constructions (e.g., "he enters the cave"). Similarly, even though Italian is predominantly a V-language, it has a system of satellite-like verb particles like giù "down" or via "away" that encode path and combine with manner verbs very much as in canonical S-languages (Iacobini and Masini, 2006). What explains this variability?
Here we test a prominent account in the literature of why there is within-language variability, which we call the event-properties account. It is most explicitly formulated in Croft et al. (2010) and states that variability can be explained by specific properties of the events, such as whether an action reaches a goal or whether the type of motion is typical or atypical. The idea is that event framing patterns need not apply to a language as a whole but rather to complex event types (Croft et al., 2010). According to this account, some languages might apply a single pattern across the board (resulting in little overall variation), while others will fine-tune their syntactic patterns to particular events. In the latter case, what appears to be within-language variability is really just variability between event types (Croft et al., 2010).
FIGURE 1 | Sequence of stills taken from a stimulus event in the present study.
FIGURE 2 | Hypothetical scenario in line with the event-properties account, where within-language variability is mostly explained by event properties (speakers play a minor role). (Left) Each column represents a speaker. (Right) Each column represents an event. Fill color shows percentage of descriptions following each framing pattern, either by speaker (left) or by event (right). In this scenario, consistency between speakers is high when describing the same event.
FIGURE 3 | Hypothetical scenario against the event-properties account: Variability is mostly due to differences between speakers (event properties play a minor role). In this scenario, consistency between speakers is low when describing the same event. For figure interpretation, see caption of Figure 2.
The event-properties account seems compatible with previous observations in the literature. For example, Aske (1989) observed that S-type constructions are not ruled out in Spanish altogether, but only for events with telic paths, i.e., paths involving a change of location, such as crossing a physical boundary (e.g., entering a cave). However, the evidence for this "boundary-crossing constraint" (Slobin and Hoiting, 1994; Slobin, 2004) in V-languages is mixed (Naigles et al., 1998; Kopecka, 2009; Iacobini and Vergaro, 2014; Martínez Vázquez, 2015). Relatedly, Slobin has argued that manner encoding in V-languages depends on the salience of the manner component in the event, such that manner is specified only when the pattern of movement is really at issue (Slobin, 2005). In the same vein, Papafragou et al. (2006) found that whether or not Greek speakers encoded manner in their event descriptions depended on how inferable manner was: they omitted it when manner was obvious from the context (e.g., a man walking up the stairs), but not when it was not (e.g., a man crawling up the stairs).
Under the event-properties account, speakers of the same language have consistent ways of describing an event, even if differences arise between event types. An alternative is that some languages lack a systematic way of encoding events. Both alternatives can result in within-language variability, and only an analysis of the sources of variability can adjudicate between them. To illustrate, Figures 2 and 3 depict two hypothetical languages, in both of which the proportions of V- and S-descriptions are identical (70% and 30%, respectively). Figure 2 would support the event-properties account: some events predominantly elicit V-descriptions while others strongly elicit S-descriptions (right panel). Variability by speaker would be relatively minor, with most speakers clustering around the language average (left panel). Such a scenario would indicate that speakers systematically activate event-specific linguistic templates (cf. Goldberg, 1995). In contrast, Figure 3 shows a scenario in which some speakers show a strong V-preference, others show an S-preference, and the rest fall somewhere in between (Figure 3, left panel). The events may still bias descriptions toward V or S to some extent, but their role is relatively minor: Any particular event elicits some V- and some S-descriptions, with proportions close to the language average (Figure 3, right panel). Against the event-properties account, this scenario would suggest that no particular pattern of event framing is strongly associated with motion events, not even when controlling for individual events. 3 Teasing apart these two scenarios has important consequences for theories on the relation between language and thought and, in particular, for linguistic relativity, a topic that has sparked considerable interest in recent decades (Boroditsky, 2012; Casasanto, 2016; Bylund, 2019).
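The two scenarios can be made concrete with a toy example. The following minimal Python sketch builds two invented 10-speaker by 10-event data sets with the same overall 70/30 split of V- and S-descriptions, in extreme versions of Figures 2 and 3: in one, framing is fixed by the event; in the other, it is fixed by the speaker. The data and all function names (`scenario_events`, `scenario_speakers`, `by_speaker`, `by_event`) are hypothetical and serve only to show that identical language-level proportions can hide opposite sources of variability.

```python
# Hypothetical illustration of Figures 2 and 3 (all data invented).
# Rows are speakers, columns are events, cells are framing labels.
N_SPEAKERS = 10
N_EVENTS = 10


def scenario_events():
    """Event-driven language: events 0-6 always elicit V, events 7-9 always S."""
    return [["V" if e < 7 else "S" for e in range(N_EVENTS)] for _ in range(N_SPEAKERS)]


def scenario_speakers():
    """Speaker-driven language: speakers 0-6 always use V, speakers 7-9 always S."""
    return [["V" if s < 7 else "S" for _ in range(N_EVENTS)] for s in range(N_SPEAKERS)]


def proportion_v(cells):
    """Share of V-descriptions in a list of framing labels."""
    return sum(c == "V" for c in cells) / len(cells)


def by_speaker(data):
    """Proportion of V per speaker (one value per row, as in the left panels)."""
    return [proportion_v(row) for row in data]


def by_event(data):
    """Proportion of V per event (one value per column, as in the right panels)."""
    return [proportion_v([row[e] for row in data]) for e in range(len(data[0]))]


for name, data in [("event-driven", scenario_events()), ("speaker-driven", scenario_speakers())]:
    overall = proportion_v([c for row in data for c in row])
    print(name, "overall V:", overall)          # 0.7 in both toy languages
    print("  by speaker:", by_speaker(data))
    print("  by event:  ", by_event(data))
```

In the event-driven language every speaker sits at 0.7 while events split into 1.0 and 0.0; in the speaker-driven language the pattern is reversed, so every event sits at the language average of 0.7.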
Linguistic relativity posits a relation between an individual's language and their conceptual representation of the world (Whorf, 1956; Lucy, 1992; Levinson, 2003). The two necessary assumptions are that languages systematically differ in how they linguistically categorize the world and that language affects thought. From these two assumptions, it follows that speakers of different languages will think differently (Swoyer, 2011). However, if there is little consistency between how speakers describe the same events (as in Figure 3), this clearly undermines the first assumption that languages provide a systematic categorization of the world, and thus weakens the case for relativistic effects (cf. Kay, 1996).
Indeed, variability between speaker descriptions gives a measure of how codable a concept or event is in a language (e.g., Brown and Lenneberg, 1954; Majid et al., 2018). Highly codable events are consistently described within a language community (i.e., there is low between-speaker variability). One of the first experimental tests of linguistic relativity was in fact centrally based on speaker variability: Brown and Lenneberg (1954) measured consistency in how speakers of the same language (English) named different patches of color. They hypothesized that colors that were more consistently labeled (i.e., more codable) should lead to better performance on individuals' recognition memory, because accessible labels strongly associated with the stimulus would support memory processes. This hypothesis was corroborated by their results (Brown and Lenneberg, 1954).
Surprisingly, although Talmy's typology has been applied to predict cognitive phenomena, little attention has been paid to speaker variability. 4 The general approach of dividing languages into types, like S- and V-languages (and other types), may be appropriate from a purely descriptive point of view that only attends to linguistic structural patterns, which was arguably the main motivation in Talmy (2000). However, if we wish to predict general cognitive patterns that might correlate with certain linguistic habits, as in linguistic relativity research, it is clearly of relevance to know how tight the link is between the situation to be described and the linguistic schema by which it is described.

THE PRESENT STUDY
The aim of the present study is to examine variability in motion encoding across and within languages as a window onto linguistic event representation. We test the event-properties account, according to which variability should be a function of event properties and thus there should be only little within-language variability for each event.
We elicited caused motion descriptions of the same events and under identical experimental conditions from comparable populations of Spanish and Swedish speakers. Spanish and Swedish are generally considered prototypical cases of a V- and an S-language, respectively, and thus provide a useful comparison. The choice of caused motion is motivated by previous studies that have qualitatively suggested that there is variability in how this type of event is described in French (a V-language), but not in English (an S-language) (Hendriks et al., 2008; Hickmann and Hendriks, 2010). Since variability itself was the phenomenon of interest, speaker sample sizes were larger than in most previous studies (42 speakers per language, more than doubling typical samples of 12–20 speakers).
We first verify that the two languages indeed show the typologically expected pattern at the aggregated level of language: a V-preference in Spanish and an S-preference in Swedish. Next, we examine in closer detail variability in event framing to test how well event properties can account for it. Lastly, we turn to variability in the expression of manner information and assess how well it fits with the event-properties account.
To gain a detailed understanding of the sources of variability in the data, we use visualizations, the information-theoretical notion of entropy, and Bayesian mixed models. To allow for similar analyses in future work, all data and R-based scripts necessary for reproducing the results are publicly shared through a Dataverse repository (see Data Availability Statement).

Participants
The participants were 42 native Spanish speakers of the Peninsular variety (M age = 23.7, SD = 3.2; 25 females) and 42 native Swedish speakers (M age = 23.9, SD = 3.8; 25 females). Spanish and Swedish participants were recruited among university students at the Universidad Complutense de Madrid (Spain) and at Stockholm University (Sweden), respectively. All participants used their first language routinely. None of them reported being bilingual from birth or having any expert knowledge of a foreign language. 5

Stimuli
The target events consisted of 32 video animations, each ∼7 s long, originally designed by Hickmann and Hendriks (2010). In each event, the same agent moved different objects in different manners and along different paths. Several aspects of the events were systematically crossed in the stimuli, namely the manner in which the agent caused the object to move (pushing or pulling), the way in which the object itself moved (rolling or sliding) and the path followed by agent and object (up, down, into or across a landmark). For each combination of these values, there were two events: in one of them motion proceeded from left to right, and in the other, from right to left. The events comprised eight different landmarks (two per path) and 16 different objects (four per manner of cause and manner of object combination). See the Data Availability Statement for a full description of the target events. There were also seven distractor items that showed unrelated motion events in which inanimate objects moved along different trajectories, and one training item similar to the target events.
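The arithmetic of the stimulus design can be checked by enumerating the crossed factors. The sketch below assumes only the factor levels named in the text (the string labels themselves are our own shorthand) and confirms that 2 manners of cause, 2 manners of object motion, 4 paths and 2 directions yield 32 target events, with 8 landmarks and 16 objects.

```python
from itertools import product

# Factor levels as described in the Stimuli section (labels are shorthand).
CAUSE_MANNERS = ["push", "pull"]           # how the agent moves the object
OBJECT_MANNERS = ["roll", "slide"]         # how the object itself moves
PATHS = ["up", "down", "into", "across"]   # path relative to the landmark
DIRECTIONS = ["left-to-right", "right-to-left"]

# Full crossing of the four factors: one target event per combination.
events = list(product(CAUSE_MANNERS, OBJECT_MANNERS, PATHS, DIRECTIONS))
print(len(events))  # 32

# Two landmarks per path; four objects per cause-manner x object-manner cell.
n_landmarks = len(PATHS) * 2
n_objects = len(CAUSE_MANNERS) * len(OBJECT_MANNERS) * 4
print(n_landmarks, n_objects)  # 8 16
```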

Procedure
All participants were tested individually in a quiet room after providing informed consent. They were told that they would see a figure called Hopi ("Popi" in Spanish, "Hoppi" in Swedish) doing different things and that their task was to describe what had happened in each scene after the animation had played in its entirety. 6 Great care was taken not to prime the participants with any example sentences. No strict limitations were given as to the length of the descriptions, but participants were told to focus on "what happened" rather than on the details of the scenery. All participants started by describing the training item, which served to introduce the agent and clarify any questions they might have. The target events were played in four semi-randomized lists counterbalanced across participants, with the seven distractor items interspersed at regular intervals. Descriptions were audio-recorded for later transcription.

Coding
For each transcription, the target description comprised all clauses that referred to the dynamic motion event. The 2,688 target descriptions (1,344 per language) were coded for two dependent variables: framing strategy and manner encoding. Framing strategy was determined solely on the basis of where path was expressed (cf. Talmy, 2000), resulting in three possible values: (a) V-FRAMED (V) if path was expressed in the main verb, (b) S-FRAMED (S) if path was only expressed outside of the main verb, and (c) NO PATH if path was not expressed. A path expression was any expression that conveyed one of the four path values in the stimuli: up, down, into, or across. Cases in which path was redundantly expressed in the main verb and elsewhere were treated as V (e.g., sube para arriba "he ascends up"). Targets that contained more than one main clause were treated as V if at least one of the main verbs expressed path (e.g., empuja la mesa y la mete en la cueva "he pushes the table and inserts [path] it in the cave").
The second variable, manner encoding, was treated as a binary variable: a target description either expressed manner information or not. Manner was mostly conveyed as part of a verbal root (e.g., Swedish puttar "pushes"), but other means of conveying manner were also counted (e.g., Spanish delante "in front of him" in sube con un regalo delante "he ascends with a present in front of him").
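The framing rules above amount to a small decision procedure. The following Python sketch is one possible encoding, assuming a hypothetical annotation format in which each main clause is marked for whether its verb root expresses path and whether path is expressed elsewhere; `MainClause` and `code_framing` are illustrative names, not the authors' coding tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MainClause:
    """Hypothetical annotation of one main clause in a target description."""
    verb_has_path: bool            # main verb root expresses path (e.g., entra, sube)
    path_elsewhere: bool = False   # path expressed outside the verb (PP, particle, adverb)


def code_framing(clauses: List[MainClause]) -> str:
    """Return "V", "S" or "NoPath" following the coding rules in the text."""
    # V whenever at least one main verb expresses path, even if path is also
    # expressed redundantly elsewhere or the description has several main clauses.
    if any(c.verb_has_path for c in clauses):
        return "V"
    # S: path (up, down, into, across) is expressed, but only outside the main verb(s).
    if any(c.path_elsewhere for c in clauses):
        return "S"
    # No path expression at all.
    return "NoPath"


# Usage with hand-annotated examples from the text
print(code_framing([MainClause(verb_has_path=True, path_elsewhere=True)]))  # sube para arriba -> V
print(code_framing([MainClause(False), MainClause(True, True)]))            # empuja ... y la mete en la cueva -> V
print(code_framing([MainClause(False, True)]))                              # manner verb + path PP -> S

# Size of the coded data set: 42 speakers x 32 events per language
print(42 * 32, 2 * 42 * 32)  # 1344 2688
```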

Entropy Computation
To quantify variability by speakers and items, we use entropy. The entropy H of a variable quantifies that variable's degree of randomness or variability (Cover and Thomas, 2005). High entropy values indicate high variability (high randomness), whereas entropy values close to zero indicate low variability (low randomness, i.e., a predictable outcome).
For a categorical variable like Framing, which can take on discrete values (S, V, or NoPath), the entropy is defined as
$$H(\mathit{Framing}) = -\sum_{x \in \{S,\, V,\, \mathit{NoPath}\}} p(x) \log_2 p(x),$$
where x denotes each of the three possible values of Framing and p(x) is its probability. In our analysis, we compute entropy by speakers and events as follows (see Data Availability Statement for example calculations).
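A few reference values help in reading the entropy scores reported below. The derivation assumes base-2 logarithms, so that H is measured in bits; the base is not stated explicitly in the text and is an assumption here. With three framing values, H ranges from 0 (a fully predictable choice) to log2 3 (all three values equally likely).

```latex
% Entropy of the three-level Framing variable (bits), with 0 \log_2 0 := 0
H = -\sum_{x \in \{S,\,V,\,\mathit{NoPath}\}} p(x)\,\log_2 p(x)

% One frame used throughout: p = (1, 0, 0)
H = -(1 \cdot \log_2 1) = 0

% Even split between V and S, no NoPath: p = (1/2, 1/2, 0)
H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1

% All three frames equally likely: p = (1/3, 1/3, 1/3)
H = -3 \cdot \tfrac{1}{3} \log_2 \tfrac{1}{3} = \log_2 3 \approx 1.58
```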
For speakers, p(x) in the formula above is estimated with the proportion of events a participant described with a given framing. This entropy score per speaker is a measure of how variable that speaker's descriptions are. A speaker who follows a very consistent pattern (i.e., whose descriptions show little variability) will have entropy close to zero. In contrast, speakers who do not show consistent patterns will have high entropy. For each language, we obtain a distribution of speaker entropy scores.
We also compute entropy for each event in an analogous fashion: p(x) is now estimated with the proportion of speakers who described a given event using each framing. If the pattern with which an event is described is highly consistent, entropy will be close to zero; but if variability is high for that event, entropy will also be high. The prediction in the literature that within-language variability is explained by differences between event properties means that entropy values computed by events should be close to zero.
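The two computations differ only in whether proportions are taken along a speaker's row or an event's column. The following minimal Python sketch assumes the data are arranged as a speakers by events matrix of framing labels and uses base-2 logarithms; the function names and the toy matrix are invented for illustration and are not the shared R scripts.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits, with p(x) estimated by observed proportions."""
    n = len(labels)
    # p * log2(1/p) summed over the observed categories (unobserved ones contribute 0).
    return sum((k / n) * log2(n / k) for k in Counter(labels).values())


def entropy_by_speaker(data):
    """One score per speaker (row): how variable that speaker's framing choices are."""
    return [entropy(row) for row in data]


def entropy_by_event(data):
    """One score per event (column): how variable speakers are when describing it."""
    return [entropy([row[e] for row in data]) for e in range(len(data[0]))]


# Invented toy data: 4 speakers x 3 events
toy = [
    ["S", "S", "S"],        # consistent speaker -> 0 bits
    ["V", "V", "V"],        # consistent with the other frame -> also 0 bits
    ["V", "S", "NoPath"],   # maximally variable speaker -> log2(3) bits
    ["V", "S", "S"],
]
print([round(h, 2) for h in entropy_by_speaker(toy)])  # [0.0, 0.0, 1.58, 0.92]
print([round(h, 2) for h in entropy_by_event(toy)])    # [0.81, 0.81, 1.5]
```

Note that the all-S and all-V speakers both receive an entropy of 0: the measure registers consistency regardless of which frame is used.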
Entropy presents a number of desirable properties. First, it is a mathematically well-defined notion that enjoys growing use in the language sciences (e.g., Montemurro and Zanette, 2011; Gries, 2012). Second, entropy is flexible and can be computed for categorical variables with any number of levels as well as for continuous variables. Lastly, entropy is a direct quantification of variability that abstracts away from the actual patterns in the data. Thus, one can compare variability across languages even when their dominant patterns differ (e.g., S vs. V).
Table 1 shows the proportion of descriptions adopting each framing strategy per language. As expected, the most common strategy was verb-framing (V) in Spanish and satellite-framing (S) in Swedish. However, these proportions already reveal a striking difference in consistency: Swedish descriptions were almost exclusively of the S-type (97%), whereas in Spanish only 59% of descriptions followed the language-dominant V-pattern. The template of a typical Swedish description is given in (4): a transitive construction in which the main verb expressed manner (push, roll, etc.) and path was encoded in the prepositions of a directional prepositional phrase (into, over, etc.). An example is shown in (5). A related S-pattern involved the use of path particles immediately after the verb, followed by the object NP and then a prepositional phrase, as in (6). In Spanish, both patterns (V and S) were frequent (59% and 35% of descriptions, respectively), even though V-framing dominated. An example of a Spanish V-description is shown in (7), and one of an S-description, encoding manner in the main verb and path in a prepositional phrase, is given in (8).

Variability by Speakers and Items
Figure 4 reveals the source of variability in each language by breaking down framing strategies by speakers and events. The top panels show the Spanish data and the bottom panels the Swedish data. The left panels break down the data by speakers: Each column along the x-axis represents a speaker and its fill shows the proportion of descriptions that follow each event framing. Analogously, the right panels show variability by events, with columns representing events and fills showing the proportion of speakers who described each event with each framing strategy.
For Spanish, Figure 4 (top panels) suggests that variability between speakers was substantially larger than variability between events. The top left panel shows that some Spanish speakers consistently used V-framing as in example (7) (columns at the left end), while others almost exclusively used S-framing as in (8) (columns at the right end). The rest of the speakers fell somewhere along this spectrum. In contrast, the top right panel shows that differences between events in Spanish descriptions were less marked: No event exclusively elicited one type of framing strategy or another. Instead, each event was to some extent described with V- or S-framing. The low consistency in how individual events were framed speaks against the event-properties account and Croft et al.'s (2010) prediction that Talmy's typology would apply to "individual complex event types within a language" (p. 202). The data show that even one and the same event can be syntactically framed in substantially different ways by different speakers of the same language; that is, no consistent pattern arises in Spanish, even when looking at events individually.
For Swedish, in contrast, the lower panels in Figure 4 show that variability was low both by speakers and by events, which follows from the fact that the pattern was highly consistent at the group level (Table 1). Descriptions followed the S-patterns exemplified in (5) and (6). Interestingly, the little variability that is found in Swedish descriptions is introduced by a few occasionally divergent speakers, rather than by some oddly described events, against the predictions of the event-properties account. 7

Quantifying Variability: Entropy Analysis
Entropy is a suitable measure to quantify variability (or, conversely, consistency) by speakers and events (see Entropy Computation for methodological details). The prevalent idea in the literature that differences between events explain within-language variability implies that there should be high consistency in how events are described within a language, i.e., that entropy by events should be low. Figure 5 shows entropy over event framing in both languages, computed by speakers and events (left and right panels, respectively). Swedish speakers mostly had zero or near-zero entropy values (left panel, purple triangles), which reflects their consistent use of the same S-pattern. Spanish speakers, on the other hand, showed a much wider distribution of entropy scores (left panel, green dots): Speakers with low entropy values correspond to those who were consistent in their framing choices, sticking to either V- or S-framing. Speakers with the highest entropy values are those who were about equally likely to use any of the three framing patterns (V, S, or no path) and were thus most unpredictable. A Mann-Whitney test indicated that entropy by speaker was reliably larger in Spanish (Mdn = 0.9) than in Swedish (Mdn = 0), W = 1,688, p < 0.001. 8 The interpretation of entropy by events is analogous (Figure 5, right panel): Entropy over event framing was consistently low in Swedish, as there was little variability in how each event was described. This contrasts with the high entropy values by event in Spanish descriptions, showing that all events were described with variable patterns. A Wilcoxon matched-pairs signed-rank test indicated that entropy over event framing was reliably greater in Spanish (Mdn = 1.1) than in Swedish (Mdn = 0.2), V = 528, p < 0.001.
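The two tests match the structure of the data: speakers are different people in each language (independent samples, hence Mann-Whitney), whereas the same 32 events were described in both languages (paired samples, hence Wilcoxon signed-rank). The sketch below computes the two statistics by hand, assuming they follow the definitions of R's `wilcox.test` (W counts pairs with a Spanish score above a Swedish one, ties counting one half; V sums the ranks of positive paired differences after dropping zeros). Under that assumption, the largest possible V for 32 events is 32 × 33 / 2 = 528, so the reported V = 528 would mean that Spanish entropy exceeded Swedish entropy for every single event. The function names and toy numbers are illustrative only.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_w(x, y):
    """Independent samples: number of pairs (a, b) with a > b, ties counting 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def wilcoxon_v(x, y):
    """Paired samples: sum of ranks of the positive differences x - y (zeros dropped)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(v) for v in d])
    return sum(r for r, v in zip(ranks, d) if v > 0)


# Toy checks with invented numbers
print(mann_whitney_w([0.9, 1.2, 0.0], [0.0, 0.0, 0.3]))  # 7.0 out of 3 * 3 = 9 pairs
print(wilcoxon_v([3, 2, 5, 4], [1, 2, 1, 0]))             # 6.0 = 3 * 4 / 2, the maximum

# Upper bounds for the sample sizes in the study
print(42 * 42)       # 1764: maximum W for 42 vs. 42 speakers
print(32 * 33 // 2)  # 528: maximum V for 32 events with no zero differences
```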
In sum, variability was greater in Spanish than in Swedish, both by speakers and events: The Spanish data show high entropy by events in combination with a wide spread of entropy by speakers. This means that the overall language-level variability observed in Spanish does not simply depend on event properties, because otherwise entropy by events should be low. Instead, it is largely due to differences between speakers, showing that no single framing strategy is consistently used by Spanish speakers.

Variability in Manner Encoding
So far we have focused on framing strategy, that is, the syntactic packaging of the path component. We now turn to the other aspect of motion descriptions that differs between S- and V-languages, namely the encoding of manner. As before, we examine patterns of variability in whether or not manner information is expressed. If manner encoding in V-languages depends on the salience (Slobin, 2005) or inferability (Papafragou et al., 2006) of the manner component in the event, then variability by events should be low in Spanish: For any given event, manner should be specified if it is salient or non-inferable, and omitted otherwise.
Overall, Swedish speakers mentioned manner very consistently: 97% of descriptions did so, as in examples (1), (5), and (6). The pattern was much more mixed in Spanish: 63% of descriptions did mention manner, as in examples (2) and (8), while 37% did not, as in (3) or (7). Figure 6 shows speaker and event variability in manner encoding in Spanish (top panels) and Swedish (bottom panels). The picture is strikingly similar to what we found for event framing. In Spanish, speakers were the largest source of variability: Some Spanish speakers had a strong preference to encode manner, others an almost equally strong preference to omit it, and most fell somewhere in between (top left panel). Differences between events were less pronounced and mostly clustered tightly about the language mean (top right panel). Swedish speakers, on the other hand, consistently encoded manner (Figure 6, bottom left panel), which follows from the overall language proportions (see Figure 7). Note that the little variability present in the Swedish data was also introduced by speakers rather than by events, as was the case for event framing.
In sum, because manner is held constant within each event by design, the substantial variability in how each individual event was described in Spanish runs counter to the event-properties account. Again, the data show that Spanish speakers, in contrast to Swedish speakers, do not consistently choose to encode the same semantic content (specifically, manner) for one and the same event.

Expression of Manner as a Function of Framing Strategy
To understand how manner encoding relates to framing strategy, Figure 7 plots the percentage of descriptions expressing manner in each language as a function of framing. For Swedish (right panel), it shows, as expected, that there is little variability with respect to both framing strategy (typically S) and manner encoding (typically expressed). For Spanish (left panel), on the other hand, it suggests that framing strategy was a predictor of manner encoding: Approximately half (49%) of Spanish V-descriptions contained manner information, as in example (2), whereas the other half did not [see example (3)]. In contrast, most Spanish S-descriptions (82%) did convey manner, as in example (8). Across languages, descriptions that did not express path ("no path") were rare (4%); most of them (85%) included manner information (see text footnote 7 for an example).
To confirm the relation between framing strategy and manner encoding in Spanish (Figure 7, left panel), we fitted a Bayesian logistic mixed model on the Spanish data, using the R package brms v. 2.10.0 (Bürkner, 2017). Manner was the binary dependent variable (1 = expressed, 0 = not expressed) and framing strategy was the only fixed-effects predictor (contrast coded: −1 = V, 1 = S); "no path" descriptions were excluded from this analysis (5.6%). The random effects included by-speaker intercepts and slopes for framing, and by-event intercepts and slopes for framing (i.e., the maximal model justified by the design; see Barr et al., 2013). We used the brms default, non-informative priors for fixed and random effects. For details of model fitting, see Data Availability Statement.
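Written out, the model described above can be sketched as follows, with i indexing speakers and j indexing events. The correlated (bivariate normal) random intercepts and slopes are an assumption based on the usual maximal specification; in brms syntax this would presumably correspond to a formula such as `manner ~ framing + (1 + framing | speaker) + (1 + framing | event)` with a Bernoulli family, where the variable names are hypothetical. The square roots of the diagonal entries of the two covariance matrices are the by-speaker and by-event SDs reported in the text.

```latex
% Bayesian logistic mixed model for Spanish manner encoding (sketch)
\operatorname{logit} \Pr(\mathit{Manner}_{ij} = 1)
  = (\beta_0 + u_{0i} + w_{0j}) + (\beta_1 + u_{1i} + w_{1j})\,\mathit{Framing}_{ij},
  \qquad \mathit{Framing}_{ij} \in \{-1\ (\mathrm{V}),\ +1\ (\mathrm{S})\}

(u_{0i}, u_{1i})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma_{\mathrm{speaker}}), \qquad
(w_{0j}, w_{1j})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma_{\mathrm{event}})
```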
This analysis indicated that framing strategy was indeed a reliable predictor of manner encoding in Spanish: log-odds of expressing manner for S- vs. V-framing = 1.51, SE = 0.27 (95% Bayesian credible interval = [1.00, 2.09]; see text footnote 9). Thus, on average, S-framing increased the odds of manner encoding by a factor of about 4.5 (e^1.51 ≈ 4.5) relative to V-framing in Spanish.
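Because the estimate is on the log-odds scale, exponentiating it yields an odds ratio. The following is a minimal sketch of that arithmetic. It assumes, as the text implies, that 1.51 is the S-minus-V difference in log-odds. The same transformation can be applied to the interval bounds because exponentiation is monotonic. The last lines show that an odds ratio is not a ratio of probabilities: with the raw rates reported above (82% vs. 49%), the probability ratio is much smaller.

```python
import math

log_odds_diff = 1.51          # S- vs. V-framing, as reported in the text
ci_low, ci_high = 1.00, 2.09  # 95% credible interval, as reported in the text

# Exponentiate to move from the log-odds scale to the odds-ratio scale
odds_ratio = math.exp(log_odds_diff)
ci_or = (math.exp(ci_low), math.exp(ci_high))

print(f"odds ratio = {odds_ratio:.2f}")               # odds ratio = 4.53
print(f"95% CrI = [{ci_or[0]:.2f}, {ci_or[1]:.2f}]")  # 95% CrI = [2.72, 8.08]

# For contrast: a ratio of probabilities, using the raw percentages
p_s, p_v = 0.82, 0.49
print(f"probability ratio = {p_s / p_v:.2f}")         # probability ratio = 1.67
```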
The Bayesian logistic mixed model additionally allows us to examine whether speakers or events were associated with larger variability in manner encoding, after accounting for the population-level effect of syntactic framing. This information is captured in the model's variance parameters associated with each of the random effects (Schielzeth and Nakagawa, 2013). Variance associated with speakers was consistently larger than that associated with events, both for random intercepts (by-speaker intercepts: standard deviation [SD] = 2.37, SE = 0.39 vs. by-event intercepts: SD = 0.78, SE = 0.16) and for random slopes for framing (by-speaker slopes: SD = 1.10, SE = 0.27 vs. by-event slopes: SD = 0.29, SE = 0.15). Thus, the SDs associated with speakers were more than three times as large as those associated with events.
9 In a Bayesian framework, a 95% credible interval means that the parameter of interest has a 95% probability of being contained in that interval (see, e.g., Kruschke and Liddell, 2018).
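The speaker-versus-event comparison can be recomputed from the reported SDs. The sketch below computes the SD ratios. As an additional illustration that is not reported in the text, it also computes a rough latent-scale partition of the intercept variance. This uses the common convention for logistic models of taking π²/3 (the variance of the standard logistic distribution) as the residual variance. The partition ignores the random slopes, whose contribution depends on the framing code, and treats point estimates as if they were known. It is therefore only an approximation.

```python
import math

# Posterior SD estimates as reported in the text
sd = {
    "speaker_intercept": 2.37, "event_intercept": 0.78,
    "speaker_slope": 1.10, "event_slope": 0.29,
}

# Ratios of speaker to event SDs
ratio_intercept = sd["speaker_intercept"] / sd["event_intercept"]
ratio_slope = sd["speaker_slope"] / sd["event_slope"]
print(f"intercept SD ratio = {ratio_intercept:.2f}")  # intercept SD ratio = 3.04
print(f"slope SD ratio = {ratio_slope:.2f}")          # slope SD ratio = 3.79

# Rough latent-scale partition of intercept variance (slopes ignored)
var_speaker = sd["speaker_intercept"] ** 2
var_event = sd["event_intercept"] ** 2
var_residual = math.pi ** 2 / 3  # variance of the standard logistic distribution
total = var_speaker + var_event + var_residual
print(f"speaker share = {var_speaker / total:.2f}")  # speaker share = 0.59
print(f"event share = {var_event / total:.2f}")      # event share = 0.06
```

Both ratios exceed 3, which is consistent with the statement that the speaker SDs were more than three times as large as the event SDs.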
In sum, framing strategy was a statistical predictor of manner encoding in Spanish. Once this effect was accounted for, however, speakers still accounted for much more of the variability in the data than events did, further indicating that Spanish lacks a consistent pattern for describing motion events.

DISCUSSION
The present study on cross-linguistic representation of motion events focused on variability in event encoding. We evaluated the event-properties account of within-language variability, which states that "Talmy's typological classification applies to individual complex event types within a language, not to languages as a whole" (Croft et al., 2010, p. 202). We tested the prediction that follows from this account, namely that descriptions of the same event should be consistent within a language. Each of the two languages we compared is generally taken to represent one dominant type of motion encoding: Spanish, a V-language (Talmy, 2000), and Swedish, an S-language (Ragnarsdóttir and Strömqvist, 2004; Gullberg and Burenhult, 2012).
First, we found that already at the group level, Spanish and Swedish showed a striking difference in variability: Swedish descriptions were almost entirely consistent in following the S-pattern and systematically conveying manner information, whereas Spanish descriptions varied greatly with regard to framing strategy and manner encoding, replicating similar earlier results in French (Hendriks et al., 2008).
Critically, we found little evidence for the event-properties account of within-language variability. Instead, the data show that variability arises because no single framing pattern is consistently applied among Spanish speakers, even for one and the same event. That is, most of the variability in Spanish was not the result of some events consistently eliciting V-descriptions and others S-descriptions, or of some events eliciting manner in the descriptions and others not. Instead, it was the speakers who varied substantially. Spanish speakers formed a spectrum in their individual preferences for framing strategy, ranging from those who only produced V-descriptions to those who almost exclusively used the S-pattern; most participants used both patterns to some extent, but the choice of one or the other was not systematically linked to specific events. A similar gradient of preferences was found in manner encoding, from speakers who always expressed manner to those who hardly ever did. Even after controlling for the effect of framing strategy on manner encoding in a logistic mixed model, speakers emerged as a substantially larger source of variability than events.

Typologies of Motion: How Entrenched Are Framing Patterns?
The results show that Swedish is much more accurately characterized as an S-language than Spanish is as a V-language when it comes to the description of caused motion. Classifying languages into a typology still captures general trends: the majority of Spanish descriptions were indeed verb-framed. However, the lack of a consistent syntactic framing in Spanish, even for one and the same event, demonstrates that for Spanish speakers there is no strong association between motion events and a linguistic schema to encode them.
In this sense, speaker variability offers an index of how entrenched an abstract linguistic pattern is among speakers of a language (cf. Dabrowska and Street, 2006), where entrenchment denotes the "degree to which the formation and activation of a cognitive unit is routinized and automated" (Schmid, 2007, p. 119). Highly entrenched constructions are automatically activated when certain situations in the world are to be described by a speaker (Langacker, 1999); they result in a "gestalt formation" (Divjak and Caldwell-Harris, 2015, p. 61) that links perceptual input and a target linguistic conceptualization.
Our data indicate that the S-pattern of describing motion events is deeply entrenched in Swedish: it is routinized and automated. When asked to describe a caused motion event, Swedish speakers retrieve a very specific linguistic construction (cf. Goldberg, 1995; Croft, 2001), which specifies the grammatical structure of the sentence and determines how semantic components like path and manner map onto the different syntactic slots. However, the same is not true for the V-pattern in Spanish. Spanish descriptions alternate in syntactic patterns and semantic content, not only between, but also within, speakers. In other words, no linguistic schema is deeply entrenched in Spanish.
The lack of attention to speaker variability in previous cross-linguistic work is surprising given that entrenchment is widely recognized as a psychological phenomenon that is essential to language (Langacker, 1999; Schmid, 2007, 2017; Ambridge et al., 2008; Caldwell-Harris et al., 2012; Divjak and Caldwell-Harris, 2015). As argued in Dabrowska (2016), ignoring individual variability comes at the cost of missing out on a "window onto the cognitive and experiential underpinnings of language" (p. 485). While classifying languages regarding their dominant lexicalization patterns has some descriptive value (Talmy, 2000), it is necessary to find out how consistently these patterns are applied, as this offers an index of the degree to which they are entrenched. In other words, typological descriptions need to quantify and characterize variability within a language. In what follows, we discuss two areas for which the current findings have implications and where a more refined quantification of variability will help theory development.

Implications for Linguistic Relativity
The current findings have several implications for the linguistic relativity hypothesis, the claim that the language we speak affects our mental categories (see Wolff and Holmes, 2011; Boroditsky, 2012; Gleitman and Papafragou, 2012). A mechanism commonly hypothesized to drive Whorfian effects is that language provides conceptual schemas that become automatized through repeated use, making them easily available also in situations where language is not used (Whorf, 1956; Lucy, 1992; Levinson, 2003). Assuming such a mechanism, a typological feature like the S/V distinction is useful for testing relativistic hypotheses only to the extent that it predicts the habitual linguistic experience of speakers of a language. But if speaker variability with respect to a feature is large within a language, as in the present case for Spanish, then the power to detect a classical Whorfian effect between languages would be reduced even if such an effect existed. The mixed findings in the literature on Whorfian effects in the motion domain (Gennari et al., 2002; Papafragou et al., 2002; Kersten et al., 2010; Papafragou and Selimis, 2010; Montero-Melis and Bylund, 2017) could be a consequence of this within-language variability (cf. Loucks and Pederson, 2011;. Therefore, future research testing effects of linguistic relativity should not only look for cross-linguistic contrasts where languages follow either pattern A or pattern B, but also explicitly measure the consistency with which these patterns are applied in each language.
Large speaker variability also has an upside if tests of the Whorfian hypothesis embrace it instead. Rather than testing the Whorfian hypothesis between languages, researchers might seek differences at the level of speakers. The individual linguistic biases of a speaker (for example, in terms of encoding manner) should then become the main predictor. This idea is not new (e.g., Brown and Lenneberg, 1954). It underlies all Whorfian research that employs training paradigms (e.g., Dolscheid et al., 2013) and much of the literature on relativistic effects in second language speakers, where linguistic proficiency is treated as the key predictor of performance on non-linguistic tasks (for an overview, see Bylund and Athanasopoulos, 2014). The approach of the present study is relevant to that literature, because it allows researchers to specify the conditions under which it makes sense to look at effects of language on cognition at the speaker level rather than (or in addition to) at the language level, namely when variability at the speaker level is large (cf. Cunnings and Fujita, 2020).
An interesting novel hypothesis based on the current results is that the degree of entrenchment itself could affect the mental representations speakers form of whole event categories. This is markedly different from the hypothesis tested in previous research on relativity effects in the motion domain (i.e., whether speakers of different languages pay more attention to path or manner). A highly entrenched construction for describing caused motion events, such as the one found in Swedish, might result in a robust memory schema that defines a single abstract mental template for all these events (cf. Gilboa and Marlatte, 2017). This abstract family resemblance might lead Swedish speakers to readily think of a set of events as belonging to the same kind, while this might not be obvious to speakers of Spanish, who lack a single entrenched linguistic construction to describe these events.
This predicted effect can be seen as an instance of "ontological Whorfianism," whereby language invites us to group particulars into a single category that we would not group together were it not for language (Reines and Prinz, 2009). Because the neurobiology of schemas is increasingly well understood (e.g., Gilboa and Marlatte, 2017; Heidlmayr et al., 2020), an exciting prospect is to test whether merely watching the same events elicits differential neural processing across speakers of different languages, indicative of more schematic processing when an entrenched linguistic construction exists (as in Swedish) compared to when it does not (as in Spanish).

Variability as a Window Into Speech Planning
The large variability in Spanish descriptions implies that Spanish speakers face substantial choices, both syntactic and semantic, when producing a description. This raises three questions about how this variability affects speech planning.
First, does syntactic choice facilitate or inhibit speech planning for Spanish speakers? Two competing claims exist in the literature. Some earlier evidence supports the notion that syntactic flexibility benefits speech planning, resulting in shorter speech latencies and fewer errors, in line with a flexible and incremental view of speech planning (Ferreira, 1996). However, there is also evidence to the contrary, i.e., that syntactic choice slows down production, in line with a competitive model of language production in which having several options hinders production (Hwang and Kaiser, 2014). While the present study was not designed to contrast these two claims (e.g., we did not collect speech onset latencies), preliminary analyses of speech disfluencies reported elsewhere suggest that Spanish speakers overall made more speech errors than Swedish speakers (Montero-Melis et al., 2016). These analyses also showed that syntactic variability at the individual level was correlated with more speech errors: speakers who had higher entropy over syntactic frames (see section 4.3) were also more likely to produce speech errors, such as pauses or false starts (Montero-Melis et al., 2016). In sum, there is some preliminary evidence that Spanish speakers may pay the price of diminished fluency in speech production for having a more flexible way of describing caused motion.
Second, the variability in Spanish descriptions raises an interesting question regarding thinking for speaking, the idea that the particular grammatical choices favored by a language affect how we conceptualize an event while speaking (Slobin, 1996). Do Spanish speakers always think of manner but sometimes omit it in their descriptions, or do they omit manner (when they do so) because they did not think of it in the first place? This question can be framed in terms of Levelt's (1989) model of speech production. According to this model, the first stage of speech planning happens in the "conceptualizer," which establishes a pre-linguistic event model in which event structure and participants are defined but not yet mapped onto linguistic units. Only at the next stage does the "formulator" decide how to structure the sentence syntactically. The question, then, is whether variability in Spanish descriptions originates at the level of the conceptualizer (in line with, e.g., von Stutterheim and Nüse, 2003) or the formulator. Previous evidence from a nonlinguistic task comparing Spanish and Swedish speakers indicates that Spanish speakers paid less attention to manner (Montero-Melis and Bylund, 2017), suggesting that the effect originates at the level of the conceptualizer. However, more direct tests of this hypothesis would be possible by capitalizing on the between-speaker variability in Spanish. For example, current decoding approaches (see Haxby et al., 2014) would in principle allow us to gauge whether brain activity prior to speech onset (i.e., during speech planning) encodes the specific manner of the event irrespective of whether manner is later expressed in speech. An affirmative answer would be evidence against an effect in the conceptualizer.
A final question is how the kind of variability seen in Spanish affects the creation of situation models, that is, "mental representations of the people, objects, locations, events, and actions described in a text" (Zwaan, 1999, p. 15; see text footnote 10). Accounts of how situation models are constructed differ in how much importance they assign to structural aspects of the linguistic message (see Zwaan, 2016). Some authors consider it a design feature of language that it underspecifies any situation that is described (Gleitman and Papafragou, 2012), while other accounts assume that the listener arrives at a different construal depending on subtle aspects of linguistic realization (Goldberg, 1995, 2003; Langacker, 1999). Thus, an interesting question is whether situation models are resistant to variability in the descriptions, both within and across languages. That is, do Spanish speakers build similar situation models when processing a motion description, regardless of the linguistic schema the description is based on? And do the situation models of Spanish speakers differ in predictable ways from those of Swedish speakers? This question links language processing to linguistic relativity: if cross-linguistic differences lead to differences in conceptualization, they arguably do so because different structural patterns give rise to different situation models.

The Relevance of (Quantifying) Variability
Recent years have seen a reappraisal of the importance of variability for linguistic theories, with increased emphasis on the role of experience and variation in forming linguistic representations (Bybee, 2010; Dabrowska, 2012; Ellis et al., 2013; Kapatsinski, 2014; Gries, 2015; Hoffmann et al., 2019; Verhagen et al., 2020). In this respect, work on event conceptualization has lagged behind: most studies have focused on central tendencies in the form of language averages, and only exceptionally has variability been broken down by speakers (e.g., Berthele, 2013) or events (e.g., Cadierno et al., 2016). So while previous work may offer quite detailed qualitative reports of variability (e.g., Hickmann and Hendriks, 2010), a more thorough quantitative analysis of variability was needed.
The present study illustrated how to analyze variability, and this approach may prove useful beyond the domain of motion (see also Gries, 2015; Gries and Ellis, 2015). We used the information-theoretic notion of entropy and the variance components in (Bayesian) mixed models as two ways of quantifying the intuition of consistency across speakers and events. Entropy is a flexible and mathematically well-defined measure of variability, allowing for quantification and comparison of variability between languages. Here, we were able to show that Swedish is much more accurately characterized as an S-language than Spanish is as a V-language when it comes to the description of caused motion. More generally, our analysis approach could be a welcome addition to typology, as it allows for a more flexible comparison of variability across languages. An advantage for cross-linguistic settings is that entropy can be computed over categorical variables with any number of levels, obviating the need to reduce different patterns to binary values. Thus, even where a linguistic phenomenon is expressed in different ways across and within languages, entropy represents this variation as a single underlying distribution, whose variability or "randomness" can be straightforwardly quantified.
10 We thank an anonymous reviewer for bringing up this idea.
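The point that entropy handles categorical variables with any number of levels can be illustrated by comparing two distributions over four framing categories. The sketch computes Shannon entropy in bits directly from category proportions. As an optional extra not described in the text, it also divides entropy by log2 k, the maximum for k categories. This puts distributions over different numbers of categories on a common 0 to 1 scale. The category set and proportions are invented for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability distribution (zeros ignored)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def normalized_entropy(probs):
    """Entropy divided by its maximum, log2(k), for k categories."""
    k = len(probs)
    return entropy_bits(probs) / math.log2(k) if k > 1 else 0.0

# Invented proportions over four hypothetical categories:
# V-framing, S-framing, no path, other
consistent = [0.02, 0.95, 0.03, 0.00]
variable = [0.55, 0.30, 0.10, 0.05]

for name, dist in [("consistent", consistent), ("variable", variable)]:
    print(name, round(entropy_bits(dist), 2), round(normalized_entropy(dist), 2))
```

Whether to normalize is a design choice. Raw entropy in bits is directly interpretable (one bit corresponds to an even split between two options). The normalized version is convenient when languages are coded with different numbers of categories.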

Limitations
As a caveat, the present study comprised a limited set of caused motion events and focused on only two languages. To further generalize the current findings, it will be necessary to extend the present approach to other types of motion events and other languages. An open question is whether the lack of consistency reported here for Spanish holds more generally across V-languages; to address this, other V-languages need to be tested to see whether they pattern like Spanish. Also, the sampled participant populations were relatively homogeneous (in terms of age, regional and social background, level of education, etc.); more heterogeneous samples will be needed before definitive conclusions can be drawn about languages as a whole. Even more importantly, the current study measures variability in production only. Therefore, it is not possible to fully tease apart two scenarios that could give rise to the small variability observed in Swedish descriptions: Does the language strongly constrain how these events can be syntactically framed, because V-framing is not even possible? Or do speakers have different options but still consistently choose S-framing? While it is clear that path verbs are rare in Swedish (so the options are at least somewhat constrained), a proper evaluation of this issue requires collecting acceptability ratings for the different types of constructions as applied to each event (see text footnote 11).

CONCLUSION
Variability in event encoding offers a window into the mental representation of event schemas. When it comes to within-language variability, the dominant view in the literature has been that event properties account for it, but this account had not been empirically tested in a controlled experimental design. The present study tested this claim and found little evidence supporting it. Swedish descriptions were found to be highly consistent, whereas Spanish ones were much more variable. The analyses show that Spanish descriptions are variable, not because they are fine-tuned to different events, but because there is no consistent way to describe the same events across speakers. This suggests that some languages have more entrenched linguistic structures than others to describe the very same events. A discrete typology obscures this relevant fact; therefore, the quantification of variability should be part and parcel of any typology at the language level. Entropy offers a suitable and flexible tool to quantify such variability, and thus the analysis presented here can be adopted in future typological descriptions.
The present findings have implications for theories of event representation in language and cognition: Linguistic templates that are not deeply entrenched among speakers of a language are unlikely to result in strong mental schemas. A more fruitful approach to testing linguistic relativity effects in the domain of motion is thus to not focus exclusively on the dominant pattern of motion encoding, but rather on the fact that some languages have a highly entrenched pattern, while others do not. This may lead to differences in event ontology across speakers of different languages (cf. Reines and Prinz, 2009). Finally, we highlighted the implications of this lack of entrenchment for speech planning, suggesting that variability of the sort reported here can provide an effective test ground for open questions about the effects of syntactic choice on sentence planning. In sum, the present study contributes to the view (cf. Verhagen et al., 2020) that variability in language provides a valuable source of information that should be analyzed and interpreted rather than dismissed as noise.

DATA AVAILABILITY STATEMENT
The data sets and R scripts necessary to reproduce all analyses reported in this paper are accessible at https://doi.org/10.7910/DVN/Z12TY1. The repository contains:
• README file explaining how to reproduce the analyses on your own computer.
• Basic data file in csv format.
• Item description in csv format.
• Supporting information: R-generated knitr report with supporting information and the code that generated all analyses reported in the paper (in html format).
• Files containing the R code necessary to generate the Supporting Information file.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.

ACKNOWLEDGMENTS
I am very grateful to Florian Jaeger, Emanuel Bylund, and Markus Ostarek for helpful feedback on earlier drafts of this manuscript and to Arantxa Hurtado and Alejandra Donoso for help with data collection.