Cross-linguistic evidence for memory storage costs in filler-gap dependencies with wh-adjuncts

Stepanov, Arthur; Stateva, Penka

doi:10.3389/fpsyg.2015.01301

ORIGINAL RESEARCH article

Front. Psychol., 04 September 2015

Sec. Psychology of Language

Volume 6 - 2015 | https://doi.org/10.3389/fpsyg.2015.01301

This article is part of the Research Topic Encoding and Navigating Linguistic Representations in Memory

Arthur Stepanov^*

Penka Stateva

Center for Cognitive Science of Language, University of Nova Gorica, Nova Gorica, Slovenia

This study investigates processing of interrogative filler-gap dependencies in which the filler integration site or gap is not directly subcategorized by the verb. This is the case when the wh-filler is a structural adjunct such as how or when rather than subject or object. Two self-paced reading experiments in English and Slovenian provide converging cross-linguistic evidence that wh-adjuncts elicit a kind of memory storage cost similar to that previously shown in the literature for wh-arguments. Experiment 1 investigates the storage costs elicited by the adjunct when in Slovenian, and Experiment 2 the storage costs elicited by how quickly and why in English. The results support the class of theories of storage costs based on the metric in terms of incomplete phrase structure rules or incomplete syntactic head predictions. We also demonstrate that the endpoint of the storage cost for a wh-adjunct filler provides valuable processing evidence for its base structural position, the identification of which remains a rather murky issue in current grammatical research.

Introduction

It has long been known that processing syntactic dependencies, in which two elements are syntactically related and linearly separated by intervening material, may be difficult for sentence comprehenders. An early study in Wanner and Maratsos (1978) showed that such difficulties arise, in particular, in processing incomplete filler-gap dependencies, in which a wh-phrase is syntactically related to the gap in the subcategorized position of the verb:

(1) Which book do you think that Colin recommended _ to the librarian?

Such long-distance dependencies are a source of syntactic complexity that the parser has to deal with over and above what is required for processing phrase structure and specific lexical items. One line of explanation for these difficulties faced by the human parser is that syntactic dependencies of this kind incur a tax on the working memory needed to temporarily store the antecedent or predictor, until a suitable element with which it can be associated is encountered in the partially processed input (Chomsky and Miller, 1963; Abney and Johnson, 1991; Gibson, 1991, 1998; Stabler, 1994; Lewis, 1996). Thus, in (1), the filler (which book) must somehow be temporarily stored in working memory until a suitable integration point is found. The working memory tax associated with storage cost leads to particular behavioral effects such as increased response times, or specific brain activity patterns at the neural level. Chen et al. (2005), in one of their self-paced reading experiments, manipulated the type of structure between a relative clause, where a wh-dependency is established, and a sentential complement, where it is not, as in the following examples:

(2) a. The announcement [that the baker from a small bakery in New York City received the award] helped the business of the owner.

b. The announcement [which the baker from a small bakery in New York City received ___ ] helped the business of the owner.

The critical region was the subject NP “the baker from a small bakery in New York City.” By hypothesis, temporarily storing the wh-filler “which” initiates an incomplete syntactic dependency and a prediction of a subcategorizing or thematic-role assigning verb to complete the dependency. Thus, assuming that both conditions otherwise involve the same number of lexical integrations, the storage effect predicts that the critical region in (2b) should elicit longer reading times, in the form of a distributed slowdown, than in (2a), where no wh-dependency is initiated¹. Indeed, Chen et al. (2005) observed that reading times in (2b) were greater than in (2a) (see also Gordon et al., 2002; Grodner et al., 2002 for related studies).

In the Event Related Potentials paradigm, storage costs for wh-fillers appear as a modulation of left anterior negativity (a negative voltage deflection in the frontal, often left-lateralized, regions of the scalp) spread over the region between the filler and the gap (Kluender and Kutas, 1993; King and Kutas, 1995), followed also by a P600, a positive deflection effect at the gap or pre-gap position (Kaan et al., 2000). Phillips et al. (2005) also observed that the sustained negativity persisting throughout the wh-dependency until the point of its completion is independent of the length of a filler-gap dependency, appearing both in short-distance (single clause) and long-distance (multi-clausal) wh-dependencies. The authors interpret this sustained negativity as a reflection of the cost of holding the wh-phrase in working memory. A similar pattern of Event Related Potentials was also observed in German and Japanese (Fiebach et al., 2002; Ueno and Garnsey, 2007).

In the present study, we investigate whether wh-adjuncts like how quickly, when and why elicit a similar kind of storage cost as wh-arguments do. Wh-adjuncts are notably different from wh-arguments in ways that directly affect processing. Semantically, adjuncts can never have a basic semantic type: canonically, they may function as predicates of events in the sense of event semantics (Davidson, 1980), or proposition or event modifiers in the sense of compositional semantics (Heim and Kratzer, 1998). In that capacity, wh-adjuncts are special in that their base (that is, semantically and syntactically determined) position is not predicted by subcategorization and/or thematic role assigning properties of the verb. The absence of direct association with the verb raises an a priori possibility that wh-adjuncts do not instantiate a filler-gap dependency at all: rather, they could be simply processed in their surface position. This lines up with certain grammatical theories that do not postulate syntactic displacement or dependency in the case of wh-adjuncts, as opposed to wh-arguments, and assume that wh-adjuncts are base-generated in their surface position (see, e.g., Hukari and Levine, 1995 for an overview, and the discussion below). We can then ask the following:

(3) a. Do wh-adjuncts instantiate filler-gap dependencies similar to wh-arguments, in the absence of thematic and/or subcategorization association with the verb?

b. Do all adjuncts incur similar storage costs?

As will become clear from the following discussion, we believe that the answer to (3a) is “yes,” but the answer to (3b) is most likely “no.” We do expect storage costs for (most) wh-adjuncts because their surface position must be syntactically linked to a base position linearly separated from it by intervening material. At the same time, recent advances in syntactic theory inform us that adjuncts differ with respect to their base positions. Consequently, we might expect different adjuncts to display different storage costs. The following section provides a basic overview of the major syntactic peculiarities that enter into processing considerations regarding filler-gap dependencies involving wh-adjuncts, and outlines the challenges presented by wh-adjuncts in light of the existing theories of storage costs and filler-gap dependencies in general.

Base/Integration Points of Wh-adjuncts

Syntactically, an adjunct is realized as a sister to an abstract syntactic node denoting the predicate that the adjunct modifies. An example of modifying an event predicate is shown in (4):

(4) [tree diagram: an AdvP adjunct attached as sister to a VP node, i.e., [VP [VP …] AdvP]]

The actual attachment site of an adjunct may vary depending on the type of predicate that it modifies. Current theoretical research recognizes a multitude of positions in the syntactic structure of the sentence, where adjuncts of various semantic types may appear (e.g., Cinque, 1999). For the present purposes, we may adopt a simplified version of that typology and pinpoint at least four classes of adjuncts as in Table 1 (cf. also Ernst, 2001; Rizzi, 2001).

Table 1. Grammatical typology of adjuncts and their wh-counterparts.

The order of listing the adjunct types in Table 1 roughly corresponds to their base structural position in the syntactic tree, or closeness to the root node. The attachment site of each adjunct type is determined by the corresponding phrase structure rule (e.g., S → AdvP S, or VP → VP AdvP) and corresponds to speakers' semantic intuitions. Structurally the “highest” are speaker-oriented adverbs, which we will not consider in this study. The next highest position is occupied by reason adjuncts and their corresponding wh-counterpart why (see also Experiment 2 for further details), placed above the sentential S node, in the domain of COMP (or Complementizer Phrase in current syntactic terminology). This is followed by subject-oriented adjuncts that are adjoined to the S node (or Infl Phrase in many contemporary syntactic theories), not considered in this study either. Finally, VP-oriented adjuncts are adjoined to VP, and are thus relatively “low” in the syntactic structure.

A major consequence of this diversity of the syntactic and semantic properties of wh-adjuncts is that, in contrast to arguments, the linear position of the adjunct in the sentence cannot be reliably predicted from the linear position of the corresponding verb and the information about the canonical word order in a language. Rather, the association of an adjunct to the verb in this case can only be loose and indirect [note that the lower VP constituent in (4) can syntactically be arbitrarily complex; other material can also be inserted by iterating the VP node]. In parsing, this translates into a state of affairs whereby a wh-filler cannot be reliably associated with a specific lexical stimulus, such as the verb. Rather, it may appear at an arbitrarily long distance from it². With regard to VP-modifying wh-adjuncts, one can envision two possibilities as to where their integration point might lie.

The first possibility is that, despite the irrelevance of the thematic/subcategorization information, the parser follows some lexically-driven strategy to integrate the adjunct at or near the verb, similarly to wh-argument dependencies. Indeed, some grammatical models postulate a close syntactic relationship between (wh-)adjuncts and the verb outside the realm of the thematic relations. Such a postulated relationship usually has a featural character: some morphosyntactic feature on the adjunct and a feature on its licensing head such as V or Infl must match or agree. In such approaches, different features may correspond to different adjuncts (Travis, 1988; Laenzlinger, 1996; Ernst, 2001). It is thus possible that the featural association of a wh-adjunct and the verb is reflected in the processing pattern, resembling or approaching the pattern of association of wh-argument fillers with corresponding verbs.

The second possibility is that the base position of the wh-adjunct (and, correspondingly, its integration point in a filler-gap dependency) requires online computation of an abstract syntactic node [cf. the lower VP in (4)] as a way of identification of the event-denoting predicate hosting the wh-adjunct, as well as some predictive information about its linear position with respect to that node. To illustrate schematically, let us assume a parsing algorithm with a storage component, using both bottom-up and top-down strategies and consulting the phrase structure module of the grammar. If such an algorithm is used to process the embedded part of a sentence like (5), the adverbial phrase how would be stored (possibly along with a pointer associating it with the relevant phrase structure rule; see also the discussion of the SLASH feature below) until the VP node is constructed online. Completion of the VP would trigger a subsequent application of the rule VP → VP AdvP, retrieving the stored phrase and integrating it at a grammatically permissible site [cf. (4)].

(5) I didn't know how John fixed the car

(6) [tree diagram]
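The storage-and-retrieval scenario described for (5) can be sketched as a toy procedure. This is only an illustration under simplifying assumptions, not the parsing algorithm the authors have in mind: the function name `parse_embedded_clause`, the hand-assigned category tags, and the decision to treat the VP as complete once the object noun has been read are all hypothetical.

```python
# Toy illustration of holding a wh-adjunct in a store until the VP is built.
# Assumptions (not from the source): words arrive pre-tagged, and the VP
# counts as complete as soon as the object noun has been read.

def parse_embedded_clause(tagged_words):
    """Return per-word snapshots of the store and the integration point."""
    store = []                # temporary memory for unintegrated fillers
    trace = []                # (word, store contents while reading it)
    integrated_after = None
    for word, tag in tagged_words:
        if tag == "WH-ADV":
            store.append(word)              # filler is held in memory
        trace.append((word, list(store)))
        vp_complete = tag == "N-OBJ"        # simplifying assumption
        if vp_complete and store:
            # Apply VP -> VP AdvP: retrieve the filler and attach it.
            store.pop()
            integrated_after = word
    return trace, integrated_after


clause = [("how", "WH-ADV"), ("John", "N-SUBJ"), ("fixed", "V"),
          ("the", "DET"), ("car", "N-OBJ")]
trace, site = parse_embedded_clause(clause)
for word, held in trace:
    print(f"{word:6s} store={held}")
print("filler integrated after:", site)
```

In this sketch the filler stays in the store for every word from how through car, which is the stretch of input over which a storage cost would be expected on the phrase-structure view.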

Formally, the difference between the two possibilities lies in the syntactic category of an element combining with an adjunct: a lexical head such as V⁰ vs. a phrasal category such as VP, in the corresponding grammatical rule(s) guiding integration of the adjunct during online processing. In the absence of thematic and/or subcategorization criteria for integration, syntax seems to be a major relevant cue for predicting the integration site in this case (possibly supported by other cues such as plausibility). Consequently, no lexically-based strategy would be at issue; rather, the relevant integration algorithm would have to make reference to the syntactic category information in determining the integration point (see also Gibson, 1998, 2000 concerning processing costs of structural integrations). Similar considerations apply with respect to the processing costs of temporary storage of a wh-adjunct, as discussed below.

In sum, the grammatical distribution of (wh-)adjuncts in general is much more complex than that of (wh-)arguments. Correspondingly, general processing predictions with respect to the base site of the filler-gap dependency headed by a wh-adjunct can hardly be formulated. Rather, gap predictions must be formulated item-specifically. In the absence of thematic/subcategorization information, such predictions have to take into account, at the very least, the semantic type of an adjunct, on the one hand, and the phrase structure rules generating it, on the other.

Storing Wh-adjuncts: Theoretical Predictions

From the storage perspective, investigating wh-adjunct dependencies is theoretically illuminating in at least two different respects. The first one concerns the role of the thematic factor in current theories of storage costs, and more generally, the role of lexically-based strategies of computation of these costs. The second regards the interaction and potential convergence of the processing and grammatical predictions concerning the endpoint of the storage costs, also in the context of the Active Filler Strategy. Below we consider these aspects in turn.

The Lexically-based vs. Syntactically-based Views on Storage Costs

Current processing theories of incomplete filler-gap dependencies focus, explicitly or implicitly, on the issue of the temporary storage of the wh-filler, its integration into the syntactic structure of the input, or both. For instance, the memory-based accounts of filler-gap dependencies (Gibson, 1998, 2000) assume that integration and storage incur separate memory costs, and the overall processing cost of a filler-gap dependency is a function of both of these measures. These accounts are based on the general idea that integrations and storage share the same pool of memory resources and that this pool of resources is limited; consequently, exceeding the set limit at some point slows down performance (Baddeley, 1990; Just and Carpenter, 1992; Lewis, 1996)³. From the evidence accumulating from the previous psycho- and neurolinguistic studies (see Introduction), it can be conjectured that integration costs are associated with behavioral or neurophysiological markers showing up at certain discrete points of parsing, usually at or around the predicted gap site. Storage costs, by contrast, reveal themselves as extended intervals of specific behavioral or neural response over a range of input that coincides with, or is very close to, the area between the filler and the gap, in the form of a reading slowdown or increased sustained voltage deflection in the ERP signal.

The theories of storage proposed to date differ with respect to the question as to what processing units may incur a memory storage cost (see Chen et al., 2005 for review). It has been proposed that storage be measured in units such as incomplete clauses (Kimball, 1973), incomplete phrase structure rules (Yngve, 1960; Chomsky and Miller, 1963), incomplete thematic role assignments (Hakuta, 1981; Gibson, 1991), incomplete Case dependencies (Stabler, 1994), and predicted syntactic heads (Gibson, 1998, 2000). For instance, under the theory taking incomplete phrase structure rules to be relevant storage units, a center-embedded structure as in “This is the malt that the rat that the cat that the dog worried killed ate” elicits storage costs quantifiable in terms of the number of the phrase structure rules such as S → NP VP that have to be kept in memory as more embedded material is processed. Similarly, under the predicted syntactic head theory, storage is quantified in terms of the number of syntactic heads expected to complete a dependency.

We believe that investigating wh-adjunct dependencies may reliably distinguish between these theories. In particular, the theories that take storage units to be incomplete thematic role assignments (Hakuta, 1981; Gibson, 1991) predict that wh-adjuncts should not elicit storage costs, simply because there are no thematic roles associated with them that need to be stored. A similar prediction is made by the theories that take the relevant storage units to be incomplete Case dependencies (Stabler, 1994): wh-adjuncts are usually adverbials or prepositional phrases; as such, they are not subject to the Case requirement, hence no Case information needs to be stored. Thematic roles, lexical head predictions and/or Case predictions are all part of the class of theories that take lexical factors as the cornerstone of relevance when it comes to computing storage costs.

On the other hand, theories that assume that incomplete phrase structure rules are stored during processing (see above) do predict storage costs for wh-adjuncts similarly to wh-arguments. As Chen et al. (2005) point out, these theories can be adapted to handle storage costs in filler-gap wh-dependencies utilizing the analytical tools of, e.g., head-driven phrase structure grammar (Pollard and Sag, 1994) and/or generalized phrase structure grammar (Gazdar et al., 1985). In these models, the mediation between the wh-filler and the verb is achieved via the SLASH feature which may propagate across syntactic nodes or rules down to the integration site thematically associated with the verb, and thus marks the path of the wh-dependency (Pollard and Sag, 1994; Sag and Fodor, 1994). The crucial point here is that the SLASH feature is insensitive to the syntactic category of the missing constituent. Thus, all else equal, it predicts the storage costs for wh-arguments, as well as for wh-adjuncts.

Similarly, theories that compute syntactic predictions in terms of expected syntactic heads (Gibson, 1998, 2000) predict storage costs for wh-adjuncts as well as wh-arguments. In contrast to the incomplete phrase structure rule theories, the expected syntactic head model does not make direct reference to hierarchical constituent structure, but only to its lowest level of representation, the level of syntactic heads. Let us assume for the moment that the gap expectation for how is at the point linearly following the lower VP constituent, as in (4). For concreteness, let us also assume the algorithm for quantitative estimation of storage costs based on predicted syntactic heads, as in Gibson (2000). Consider the relevant storage costs of the sentence in (7) (concentrating on the embedded clause) that can potentially be assigned in this model, with and without the wh-adjunct, all else being equal:

(7) I didn't know that/how you fixed the car yesterday

(8)  Input word:    …how   you   fixed   the   car   yesterday
     Storage cost:  3      2     2       2     1     0

     Input word:    …that  you   fixed   the   car   yesterday
     Storage cost:  2      1     1       1     0     0

Given the nature of adjuncts as event modifiers, the storage cost at the point when how is processed will be 3. This reflects, in addition to the associated gap position, two more heads to describe an event (e.g., it happened or John arrived). Upon encountering the subject you, the storage cost value is reduced to 2, expecting a predicate and (still) a gap. At the point when the comes, the parser expects the noun and a gap position. Finally, at car, only the gap position is expected. In the non-wh version, storage costs are correspondingly reduced.
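The comparison between the two versions of (7) can be made explicit with a few lines of arithmetic. The sketch below simply copies the storage-cost values given in (8) and subtracts them word by word; it does not derive the values from a parsing model, and the variable names are our own.

```python
# Storage-cost values copied from (8); nothing here is computed from a
# parsing model. Variable names are illustrative only.
words = ["how/that", "you", "fixed", "the", "car", "yesterday"]
cost_wh = [3, 2, 2, 2, 1, 0]      # ...how you fixed the car yesterday
cost_that = [2, 1, 1, 1, 0, 0]    # ...that you fixed the car yesterday

# Extra storage attributable to the wh-adjunct at each word.
extra = [w - t for w, t in zip(cost_wh, cost_that)]

for word, w, t, d in zip(words, cost_wh, cost_that, extra):
    print(f"{word:10s} wh={w}  that={t}  extra={d}")

# Last word at which the wh-version still carries an extra cost.
last_costly = max(i for i, d in enumerate(extra) if d > 0)
print("extra cost ends after:", words[last_costly])
```

On these values the wh-version carries exactly one additional unit of storage at every word up to and including car, and the two versions converge at yesterday, which is where the assumed gap position has been passed.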

To sum up, wh-adjuncts may provide important evidence to distinguish between theories that place crucial weight on the lexical properties of fillers and those that do not. If wh-adjuncts incur storage costs, that would argue in favor of the latter type of theories. The present study seeks to provide such evidence.

Endpoint of the Storage Costs

The second interesting aspect of storage costs has to do with understanding the way storage costs for wh-adjuncts are related to the two potential grammatical possibilities for the integration site considered above. The existing theories that take storage and integration both to be active components of the working memory, largely take it for granted that the integration site marks the retrieval of the wh-filler, hence the endpoint of the storage costs. For wh-arguments, the endpoint of the storage costs at the grammatically expected point (e.g., verb for the object filler) would not be particularly surprising. For wh-adjuncts, things are not that trivial. Current grammatical theories do not always offer reliable clues as to the end-/integration point of the wh-adjunct in the syntactic structure, due to their loose and mobile syntactic character that follows from the lack of thematic and/or subcategorizational anchors. The situation gets even more complicated considering that the integration site is different for different wh-adjuncts in the same language (see footnote 2 and Experiment 2). Naturally, wh-adjunct dependencies appear somewhat more elusive for tracking with current experimental methods than their wh-argument counterparts. The endpoint of the storage costs in this situation could then provide important processing evidence for grammatical theory, to the extent that it demarcates a likely integration site.

In this respect, it is also interesting to investigate the role of the Active Filler Strategy, a parsing strategy which assigns high priority to integrating the filler at the earliest point allowed by the grammar (see Fodor, 1978; Frazier and Clifton, 1989; de Vincenzi, 1991). The Active Filler Strategy bears on the “filled gap” effect of integrating a wh-argument like subject or object with its corresponding syntactic position in the input, as in the following sentences from the self-paced reading study in Stowe (1986):

(9) a. My brother wanted to know if Ruth will bring us home to Mom at Christmas.

b. My brother wanted to know [who]_i G_i will bring us home to Mom at Christmas.

c. My brother wanted to know [who]_i Ruth will bring *G_i us home to G_i at Christmas.

Longer reading times were reported at us in (9c) compared to (9a) and (9b). This is expected if the position immediately after bring is the earliest potential position where the object wh-filler who, temporarily kept in memory, can be integrated. Because that position turns out to be occupied by us, the parser is forced into reanalysis. In contrast, (9a) and (9b) involve no such reanalysis⁴. This and other studies investigating the Active Filler Strategy are usually based on processing verbal arguments. The interest in investigating the role of this strategy in processing wh-adjuncts consists primarily in determining (a) whether it is operative at all; and (b) if it is, what sort of evidence the parser uses in order to determine the earliest position, in the absence of thematic or lexically-oriented cues.

In the present study we report two self-paced reading experiments targeting wh-adjunct dependencies in Slovenian and English. Specifically, we focus on two examples of structurally low, VP-modifying adjuncts, as well as an example of a structurally high (reason) adjunct. Low or VP-modifying adjuncts offer a good source of evidence pertaining to research question (3a) above. Since the canonical word order in SVO languages presupposes some non-trivial distance between the occurrence of the filler and the VP in the linear representation of the interrogative sentence, a filler-gap dependency in this case can potentially be identified in parsing by a storage effect which extends across some part or all of the corresponding range in the input, much along the lines of the previous studies of storage costs incurred by wh-arguments. This is not the case with structurally high wh-adjuncts whose integration sites are likely to be close to their surface position or even identical to it (see also Section Storage Cost Predictions for why). Utilizing this idea, Experiment 1 aims at detecting a filler-gap dependency with the VP-modifying adjunct kdaj “when” in Slovenian, as well as investigating the endpoint of such dependency. Experiment 2, using English materials, addresses research question (3b) as well as (3a). It compares the storage cost patterns of the structurally low wh-adjunct how quickly and the structurally high wh-adjunct why, asking whether these processing patterns differ in a way that correlates with the syntactic and semantic properties of these two wh-items. This experiment also targets the endpoint of a filler-gap dependency in greater detail.

Inclusion of Slovenian in our study was justified on several grounds. Aside from the obvious benefit of expanding the empirical database of processing storage cost effects cross-linguistically, and the fact that Slovenian usually receives little attention in behavioral psycholinguistics, working with certain kinds of wh-adjunct dependencies in Slovenian turns out to be preferable because some wh-adjuncts in Slovenian are free of the inherent lexical ambiguities typical of their counterparts in other languages, including English (as is the case of when, see below). This allows for a cleaner experimental design, avoiding potential confounds in the construction of stimuli. The present study is also the first, to our knowledge, to compare storage costs in filler-gap dependencies in two languages within the same experimental setup. In doing so, we show that storage costs elicited by wh-adjuncts are a language-independent phenomenon, a naturally expected result in the context of the general inquiry into the nature of the human parsing system.

Experiment 1: Slovenian kdaj (“when”)

Experiment 1 is a self-paced reading study in which we investigate potential storage costs elicited by the structurally low, VP-modifying wh-adjunct kdaj “when” in Slovenian. If a wh-adjunct like when in the beginning of the sentence instantiates a filler-gap dependency as wh-arguments do, thus functioning as a filler, we may expect a storage cost effect extending across the range in the input which is commensurable with the structural distance between the filler and its corresponding gap in the VP area. The experiment tests this scenario.

In addition, Experiment 1 aims to shed light on the issue regarding the endpoint of the storage cost for when. As noted above, there are two main theoretical possibilities to consider with respect to this endpoint. One is that the dependency terminates at the verb, as is the case for wh-arguments. The other is that the dependency terminates at some point predicted by phrase structure rules for VP. Regarding the latter possibility, in a sentence with a transitive verb it makes sense to expect a gap at or after the relevant part of the argument structure, viz. verb plus object, is processed [cf. (4) above]. The working assumption, trivial for wh-arguments, but non-trivial for wh-adjuncts, is that the end of the storage costs (that is, a point where reading times are equalized compared to the input not involving a wh-dependency) signals the gap site. Based on the results in Chen et al. (2005) for wh-arguments, we thus expect to see a region of increased reading times to last until either the first or the second suspected gap site:

(10) a. I didn't know when John bought (G1?) the newspaper (G2?) in the kiosk.

b. I didn't know that John bought the newspaper in the kiosk.

If the endpoint of the storage costs is at the verb, this would support the approach to storage costs based on the featural association of the verb and the adjunct (see above). On the other hand, if the endpoint of the storage costs is at or after the direct object, this would be consistent with the phrase structural theories, as well as with the predicted syntactic head theories of storage costs.

Note that when in English is ambiguous. It can be used in its truly interrogative sense (cf. Peter asked when the parcel would arrive) or in another, related, but non-interrogative, guise (cf. Peter left when the parcel arrived). When used in embedded contexts, the truly interrogative version of when is selected by a particular class of verbs such as ask, wonder, or know. When used in its non-interrogative sense, when does not need to be selected at all. It is often difficult to distinguish these two usages in English and other languages which use a single lexical item for both. In Slovenian, on the other hand, the two usages of when are lexically disambiguated: kdaj is used in the respective interrogative contexts, and ko in non-interrogative ones. Because of that, Slovenian is an excellent choice to study the online behavior of the interrogative when and rule out potential confounds caused by its non-interrogative usage (which may not trigger a wh-dependency at all).

We thus concentrated on kdaj in Slovenian, and compared performance over the region corresponding to the argument structure (in (11), je umetnik izdelal tisti koš) in simple embedded wh-questions such as the following (the items are described further in the Section Materials and Methodology below):

(11) a. Kritik je potrdil, da je umetnik izdelal tisti koš v svoji delavnici.

Critic is confirmed that is artist created this basket in his workshop

“The critic has confirmed that the artist created this basket in his workshop”

b. Kritik je potrdil, kdaj je umetnik izdelal tisti koš v svoji delavnici.

Critic is confirmed when is artist created this basket in his workshop

“The critic has confirmed (the date) when the artist created this basket in his workshop”

Since in (11a) the argument structure ends at the point koš, this is the point where we expect the storage costs to equalize with those observed at the same point in (11b). Following the critical region was either a locative PP (e.g., v svoji delavnici “in his workshop”) or a further optional specification of the object noun [e.g., (koš) božičnih daril “(basket) of X-mas presents”], which contained two to three words.

Methods

Participants⁵

Seventy-four monolingual speakers of Slovenian from the academic communities of the University of Nova Gorica and University of Ljubljana volunteered to participate in the experiment for no material compensation. All participants were naïve to the purposes of the study.

Materials and Methodology

Twenty-four sets of sentences, each with the two conditions described above, were carefully constructed. Since we were interested in evaluating the actual “boundaries” of the filler-gap dependencies reflected in online storage costs, the sentences in the two conditions were exactly identical except for the value of the embedded clausal head, or Complementizer: this value was either a declarative da “that” or interrogative kdaj. To control for length of a wh-dependency, all sentences were made exactly 12 words long. Each sentence began with an introductory part involving a one-word subject, a past tense auxiliary and a main verb [cf. (11)]. The main verbs were carefully chosen so that they may embed either a wh-interrogative clause, or a declarative that-clause. In English, typical representatives of this class of verbs are know and figure out (e.g., I know that the guests came vs. I know when the guests came; see also Experiment 2). In general, at least for when, the set of such ambiguously embedding verbs is much larger in Slovenian than in English, so that no verb was repeated across items. The fourth word is the embedded complementizer appearing in one of the two versions outlined above. Words five through nine represent the region of interest as they minimally describe an argument structure that can be modified by when. The sixth word is the embedded subject, the seventh is the embedded verb and the eighth and ninth words represent the direct object, where the eighth word is always a demonstrative determiner. This was done in order to make the object structurally “heavier,” but not to the point where the complexity of its structure would potentially interfere with determination of the right boundary point. In choosing nouns used for embedded subjects and objects, as well as embedded verbs, we controlled for their plausibility and corpus frequency, for the latter using the FidaPLUS-JOS1M corpus (Erjavec et al., 2010).

The remaining three words always describe a location of the event in the form of a prepositional phrase compatible with the locative specification. The locational content of the prepositional phrase was chosen so that it would have a clear bias toward modification of the event, not of the last phrase (object). It should also be noted that Slovenian is a language in which verbal clitics must always appear in the second position in the clause. The test sentences are all in the past tense, whose grammatical manifestation in Slovenian requires a particular verbal clitic. That is why the second and fifth words in the test sentences are always verbal clitics, either singular or plural, depending on the grammatical number of the subject.

The target sentences were split into individualized lists balancing all factors in a Latin Square design, so that a different such list is activated for each participant. Each list was combined with 50 filler sentences of various syntactic types and of comparable length. The experimental items and fillers were thoroughly checked by a native speaker of Slovenian who is also a linguist. A complete list of target items along with their English glosses and translations is provided in Supplementary Material.
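The list construction described above can be illustrated with a minimal sketch. It assumes a simple rotation scheme for the Latin Square; the function name `latin_square_lists` and the condition labels are hypothetical and are not taken from the experimental software actually used.

```python
def latin_square_lists(n_items, conditions):
    """Build one presentation list per rotation of the conditions.

    In list k, item i appears in condition (i + k) mod len(conditions),
    so each item occurs once per list and in every condition across lists.
    """
    n_cond = len(conditions)
    return [
        [(item, conditions[(item + k) % n_cond]) for item in range(n_items)]
        for k in range(n_cond)
    ]


# 24 item sets and two complementizer conditions, as in Experiment 1.
lists = latin_square_lists(24, ["da", "kdaj"])

# Hypothetical assignment rule: participants alternate between the lists.
participant_list = {p: lists[p % len(lists)] for p in range(4)}
```

With two conditions, this scheme yields two counterbalanced lists, each containing 12 items per condition; how lists were distributed over participants in the actual study is not specified beyond the description above.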

Subjects performed a self-paced reading task implemented using the Ibex software (by Alex Drummond, http://spellout.net/ibexfarm/). We used a word-by-word centered-window presentation of stimuli. In this design, a subject initially sees two dashes in the center of the screen. When the subject presses the space bar, the first word of the sentence appears in place of the dashes. With each subsequent press of the space bar, the current word is replaced with the next word in the sentence, until the end of the sentence is reached. The reason we did not use the currently more popular moving window version of the self-paced reading task (see Just et al., 1982) was to rule out potential topological cues helping one to identify the left and right boundaries of a filler-gap dependency based on the positions of the relevant words or their placeholders (viz. dashes) in the linear representation of the sentence. Ruling out this possibility reinforces the scenario whereby processing filler-gap dependencies is based on the resources of working memory only, which is of primary interest from the point of view of evaluating storage costs. The order of stimulus presentation was pseudo-randomized for each participant by the experimental software, such that at least one filler intervened between any two target items.
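The ordering constraint just described (at least one filler between any two target items) can be sketched as follows. This is only one possible way to satisfy the constraint and is not the algorithm used by the experimental software; the function name `pseudo_randomize` and the item labels are hypothetical.

```python
import random


def pseudo_randomize(targets, fillers, rng):
    """Return a random order in which no two targets are adjacent.

    Targets are dropped into distinct gaps between (or around) the
    shuffled fillers, so at least one filler separates any two targets.
    """
    targets, fillers = list(targets), list(fillers)
    if len(targets) > len(fillers) + 1:
        raise ValueError("not enough fillers to separate all targets")
    rng.shuffle(targets)
    rng.shuffle(fillers)
    # Gap g lies immediately before filler g; the last gap follows all fillers.
    chosen_gaps = set(rng.sample(range(len(fillers) + 1), len(targets)))
    remaining_targets = iter(targets)
    order = []
    for gap in range(len(fillers) + 1):
        if gap in chosen_gaps:
            order.append(next(remaining_targets))
        if gap < len(fillers):
            order.append(fillers[gap])
    return order


# 24 targets and 50 fillers, as in Experiment 1 (labels are placeholders).
example_order = pseudo_randomize(
    [("target", i) for i in range(24)],
    [("filler", i) for i in range(50)],
    random.Random(1),
)
```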

At the beginning of the experiment, participants were instructed to read the sentences at a natural pace and to make sure they understood what they read. To ensure that participants paid attention to the content of the reading task, half of the target items and one third of the fillers were followed by a yes-no comprehension question. Subjects were instructed to answer the question as quickly and accurately as possible. Feedback was provided when an incorrect response to a comprehension question was given, and subjects were told to take it as an indication to read more carefully. No feedback was given after correct answers. Failure to respond within 4 s counted as an incorrect response. Before the start of the experiment, subjects read a short list of practice sentences and comprehension questions in order to familiarize themselves with the task. Each session lasted between 20 and 25 min per participant.

Statistical Procedures

We used the same statistical procedures for all experiments in this study. To control for differences in word length across conditions as well as overall differences in participants' reading speed, a regression equation predicting reading time from word length was constructed for each participant, on the basis of all filler and experimental items (see Ferreira and Clifton, 1986). At each word position, the reading time predicted by the participant's regression equation was subtracted from the actual measured reading time to obtain a residual reading time. The resulting residual reading times are the dependent variable used in all analyses (Tables 2–4 also include raw reading times, to provide a more interpretable scale for the effects).
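As an illustration of the residualization step, the following minimal sketch fits an ordinary least-squares line of reading time on word length separately for each participant and subtracts the predicted value from each observation. The data layout (tuples of participant, word length in characters, and reading time in ms) and the function name `residual_rts` are assumptions made for the example.

```python
from collections import defaultdict


def residual_rts(trials):
    """Compute residual reading times per observation.

    trials: list of (participant, word_length, rt_ms) tuples covering
    all filler and experimental items read by each participant.
    Returns residuals in the same order as the input.
    """
    by_participant = defaultdict(list)
    for participant, length, rt in trials:
        by_participant[participant].append((length, rt))

    coefficients = {}
    for participant, obs in by_participant.items():
        n = len(obs)
        mean_len = sum(length for length, _ in obs) / n
        mean_rt = sum(rt for _, rt in obs) / n
        sxx = sum((length - mean_len) ** 2 for length, _ in obs)
        sxy = sum((length - mean_len) * (rt - mean_rt) for length, rt in obs)
        slope = sxy / sxx if sxx else 0.0  # flat line if all lengths are equal
        coefficients[participant] = (mean_rt - slope * mean_len, slope)

    return [
        rt - (coefficients[p][0] + coefficients[p][1] * length)
        for p, length, rt in trials
    ]
```

A positive residual indicates that a word was read more slowly than expected for its length by that particular participant, and a negative residual that it was read faster.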

TABLE 2

Table 2. Mean (standard error) comprehension question performance in percent correct as a function of condition, by subject.

TABLE 3

Table 3. Mean residual RTs as ms/word by participants as a function of condition, for the post-COMP regions in Experiment 1, rounded to units (raw RTs in parentheses).

TABLE 4

Table 4. Mean residual RTs as ms/word by participants as a function of condition in Experiment 2, rounded to units (raw RTs in parentheses).

For all analyses of reading time data, we used linear mixed-effects models (Baayen et al., 2008), and for question-answering data we used a logistic mixed effects model for binary data (Jaeger, 2008). The only fixed effect in our analyses was COMP(lementizer), taking values corresponding to the respective [+interrogative] or [−interrogative] complementizers, with subjects and items entered as random effects. Our constructed models utilized the maximal random effect structure with random intercepts for subjects and items and random slopes for the fixed effect term in subjects and items (Barr et al., 2013). We report p-values based on the likelihood-ratio test, whereby a model containing the fixed effect of interest is compared to a model that is identical in all respects except the fixed effect in question. The p-values are computed by treating the t statistic resulting from linear mixed effects analysis as approximately normally distributed (justified for datasets of our size; see Baayen et al., 2008), as also supported by visual inspection of residual plots which did not reveal any obvious deviations from homoscedasticity or normality. Analyses were performed using the lme4 package (Bates et al., 2014) in R (R Development Core Team, 2011).
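For readers who wish to relate the reported χ² statistics to their p-values, the following sketch computes the likelihood-ratio statistic from two log-likelihoods and the corresponding upper-tail p-value for one degree of freedom (the two nested models differ only in the single fixed effect COMP). The function names are hypothetical, and the sketch does not reproduce the lme4 model fitting itself.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom.

    For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# Statistic reported for the larger aggregated region in Experiment 1.
p_example = lrt_p_value_df1(5.3896)
```

Applied to the statistic of 5.3896, this gives a p-value of approximately 0.020, in line with the value reported in the Results.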

Results

Data from five participants were omitted from all analyses because of overall poor comprehension question performance (< 67% accuracy overall). No subjects were removed on the basis of slow overall reading time (>4 standard deviations from the mean across subjects). Consequently, data from 69 subjects were used in subsequent analyses. For these subjects, reading time data from items with incorrectly answered comprehension questions were excluded from the analysis. In addition, residual reading time data points that were greater than three standard deviations from the subject mean were also excluded. This affected around 1.0% of the data overall for this experiment.
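The exclusion steps can be sketched as follows. The sketch assumes that "greater than three standard deviations from the subject mean" refers to the absolute deviation and that the sample standard deviation is used; neither detail is stated explicitly. The function names and data layout are hypothetical.

```python
from statistics import mean, stdev


def keep_accurate_subjects(accuracy_by_subject, threshold=0.67):
    """Keep subjects whose overall comprehension accuracy meets the threshold."""
    return {s for s, acc in accuracy_by_subject.items() if acc >= threshold}


def trim_outliers(residuals_by_subject, n_sd=3.0):
    """Drop residual RTs lying more than n_sd standard deviations from
    the subject's own mean (absolute deviation; an assumption).

    Each subject needs at least two observations for stdev to be defined.
    """
    trimmed = {}
    for subject, values in residuals_by_subject.items():
        centre, spread = mean(values), stdev(values)
        trimmed[subject] = [v for v in values if abs(v - centre) <= n_sd * spread]
    return trimmed
```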

Comprehension questions

Overall, comprehension questions following the experimental items were answered correctly in 84% of the trials. The percentages of correct answers for each condition are presented in Table 2. A paired t-test revealed no significant effects [t₍₃₅₇₎ = −0.37, p = 0.71]. To control for item (and subject) variability, we also fit a logistic mixed effects model and obtained a similar result: COMP was not a significant predictor of question response accuracy [$\chi^{2}_{(1)} = 0.07$, p = 0.7935].

Reading times

For the primary analyses, we treated each of the 12 words within each item as its own region, according to the following schema:

(12)

Figure 1 and Table 3 show average residual reading times for each of the 12 primary regions per condition.

FIGURE 1

Figure 1. Plot of mean (standard error) residual RTs per word by region in Experiment 1.

There was no significant effect of complementizer type in the COMP region [$\chi^{2}_{(1)} = 0.71$, p = 0.3987]. Since COMP is selected and/or subcategorized by the matrix verb, the absence of variation suggests that the parser is equally likely to expect a [−interrogative] and a [+interrogative] complementizer after the selecting verbs. This is in line with the special properties of the verbs we used in our materials: they support both types of subcategorization. This result persists across each of the four verbs used in the stimuli, validating the design in terms of balancing different types of subcategorization for the same verb.

As Figure 1 and Table 3 illustrate, the interrogative kdaj sentences were read more slowly than the declarative da sentences in the post-COMP area until region 9 (Obj). We defined two aggregation regions in accord with the two different kinds of theoretical predictions considered above. Recall that the first class of theories predicts that the storage costs for wh-adjuncts are distributed more or less in conformity with those for wh-arguments, that is, they are bound to the verb. Thus the first region, indicated by the smaller circle in Figure 1, spans the range of stimuli between Cl2, the first post-COMP element, and V. The second type of theory predicts that the endpoint of storage costs extends beyond the verb, namely, across the VP domain generally. Thus the second aggregation domain, indicated by the larger circle in Figure 1, includes the first and extends further to the direct object phrase, up to the first follow-up word FU1.

In the first aggregated region, spanning the area from Cl2 until V, linear mixed models revealed a main effect of COMP, which, however, reached only marginal significance [$\chi^{2}_{(1)} = 3.365$, p = 0.0666]. In the second, larger, aggregated region, there is a significant main effect of COMP [$\chi^{2}_{(1)} = 5.3896$, p = 0.0203], with the relevant portions of kdaj clauses being read about 10 ms/word ± 3.9 ms/word (standard errors) slower than da clauses. We also asked whether there is a main effect of COMP specifically in the direct object area (Det + Obj), which differentiates our two aggregated regions. This turned out to be the case [$\chi^{2}_{(1)} = 4.427$, p = 0.0353], indicating that the slowdown in kdaj clauses persists across this particular area. Finally, regions FU1-FU3 following the direct object showed no main effect of COMP, suggesting that there is no significant difference in reading times between the two conditions [$\chi^{2}_{(1)} = 0.08$, p = 0.7795].

Discussion

There were three main results of this experiment. The first result is that storage costs obtain for the wh-adjunct kdaj “when,” similarly to wh-arguments. This result holds under the assumption that the resource-consuming memory processes relevant for sentence processing involve both storage and integrations into the partially processed structure, an assumption shared in some form by most theories of storage costs to date. Since our compared conditions differ only in the COMP value, they involve the same number of integrations. Therefore, the increased reading times in the kdaj-condition are likely attributable to a storage effect. Furthermore, the temporal span of this effect suggests that it is related specifically to the processing of kdaj, which requires additional memory resources reflected in the reading slowdown.

The second result concerns the observed time-course pattern for the storage cost effect with respect to the predictions of the two classes of theories. If the first class of theories (the filler is associated with the verb) is correct, we should expect a significant difference across the first aggregated region, but not across the second. If the second class of theories (the gap is grammatically defined as following the lowest VP constituent) is correct, then we expect the storage cost effect across the second aggregated region, including the first region as well as the area differentiating the two regions. The results indicate that the latter is the case. We have seen that the difference in reading times persists until the end of the direct object area. Under the direct (feature-) association theories predicting association of the wh-adjunct with the verb, the continuing storage effect after the verb region would remain unexplained. At the same time the observed time-course pattern of the storage cost effect is consistent with the phrase structure theories predicting a gap, or the endpoint of the storage effect, after the direct object.

An alternative interpretation of this result, suggested by a reviewer, might be that the slowdown effect over the direct object area Det+Obj is due to (spill-over) integration costs, rather than storage costs per se. This interpretation would then be consistent with the theories which directly associate the gap with the verb, similarly to wh-arguments, and it would confine storage costs to the pre-V region only. While this possibility cannot a priori be ruled out given the design of Experiment 1, we believe it is unlikely to be the case. Such a scenario would imply that integration of a wh-adjunct filler is just too costly: it persists through a sequence of three items (V+Det+Obj), which takes considerable time (ca. 1.5 s; see Table 3). It is true that wh-adjunct fillers are semantically more complex than wh-arguments (see Section Base/Integration Points of Wh-adjuncts). However, the alleged difficulty appears incommensurable with the relatively simple semantics of when as well as with the general pattern of processing filler-gap dependencies. In particular, no spill-over effects have been reported in previous studies of filler-gap dependencies with wh-arguments (e.g., the ERP study of Phillips et al., 2005, mentioned in Section Introduction, observed a sustained P600 effect ending at the verb, interpreted by these authors as a temporary storage cost). In addition, the fact that the post-V slowdown is restricted exactly to the entire direct object area (and not to some point before or after it) would be a suspicious coincidence under the spill-over scenario, whereas it is expected as a storage cost that conforms to the phrase structure-based theories (see above). Given these considerations, we continue to treat the direct object area as part of the relevant storage region.

The third result of Experiment 1, which stems from the second, suggests that the endpoint of the storage costs for a wh-adjunct can be a predictor of its potential gap position, or integration point. This result is important again in light of the grammatical theories that, due to excessive mobility of wh-adjuncts in the syntactic structure, often do not provide reliable diagnostics for their base position. The processing pattern is revealing in cases when the grammatical theory predicts more than one potential base position (see above), as well as in cases when it predicts a gap in a position different from the one found in a processing paradigm. An online processing study thus provides one with an efficient tool to carefully probe for the gap position in such non-trivial examples.

The results of this experiment are suggestive, but cannot be fully generalized because they are based on the processing of a single wh-item. It may be argued that the observed increased storage costs arise because of some specific lexical property of kdaj or, alternatively, because of some effect of interaction between this item and the syntactic structure independent of the storage cost effect. This experiment also raised an issue about the specific pattern of filler storage over the clausal subject regions. In particular, we would like to know whether the drop in the reading times is an idiosyncratic effect that occurs with specific wh-adjuncts, or representative of a more systematic pattern. Slovenian is a language whose grammar allows null/unexpressed subjects, so one could imagine a scenario where the storage cost effect expected over the subject would actually already be encoded over the preceding clitic (Cl2) region, given that this is a verbal clitic morphologically specified with the morphosemantic features of the subject (person, number, gender). Consequently, under the distance-based theory of storage costs, for instance, reading the actual subject would not count toward calculating the overall storage costs for the wh-adjunct. Finally, not all wh-adjuncts are created equal. Unlike wh-arguments, which are usually NPs with predictable syntactic behavior dictated by thematic considerations, wh-adjuncts may differ dramatically from each other from a syntactic point of view. There is thus an important question as to whether other wh-adjuncts elicit a similar kind of storage cost effect, possibly correlating with their lexical, syntactic and semantic properties. Experiment 2 addresses these issues.

Experiment 2: English how quickly and why

Experiment 1 was concerned with the wh-adjunct when in Slovenian, which falls under the category of VP-modifying adverbs attached relatively low in the syntactic tree (see Table 1). Experiment 2 uses English materials in a self-paced reading task and takes the investigation of storage cost effects in wh-adjunct dependencies further. The goal of this experiment was threefold. First, we wanted to replicate the Slovenian pattern of storage costs in a language in which storage costs for wh-arguments have been previously investigated in reasonable detail and at present are better understood (see Section The Lexically-based vs. Syntactically-based Views on Storage Costs), with the aim of strengthening the cross-linguistic dimension of our inquiry. English is a natural choice in this regard. Second, now that there are reasons to believe that wh-adjuncts elicit storage costs as much as wh-arguments do, the main question we ask is whether these storage costs correlate with the syntactic base position of a particular wh-adjunct, along the lines outlined in Section Endpoint of the Storage Costs. Thus in Experiment 2 we focused on the comparison between the VP modifier how quickly and the wh-adjunct why. As shown in the syntactic literature, the syntactic behavior of why is quite different from that of VP-modifying adjuncts, and the most robust grammatical evidence for that again comes from English (see Section Storage Cost Predictions for why). Since we wanted to compare the patterns of storage costs for these two modifiers, our corresponding processing predictions can therefore be better grounded in this language.

Yet another goal of Experiment 2 was to investigate more closely the integration point of the wh-adjunct in light of the relevant storage costs. Experiment 1 showed that the parser may have to wait until the direct object is parsed in order to integrate the wh-adjunct. In this respect, we were interested in the role of the Active Filler Strategy as the parser's tendency to fill the adjunct gap as soon as possible. In particular, in cases of complex direct object phrases such as a glass of water, does the parser wait for bottom-up evidence that the end of the direct object constituent has been reached in order to discharge the wh-adjunct, or does it do so as soon as this becomes grammatically permissible (in our example, upon encountering a glass, without waiting until the end of the direct object to determine whether the phrase is complete)? This question gains particular importance in light of the proposals in the literature that derive the Active Filler Strategy from a requirement to saturate a thematic role of the wh-filler as soon as possible (Pritchett, 1992; Gibson et al., 1994; Aoshima et al., 2004). If the Active Filler Strategy is indeed a thematic-oriented strategy, it should not be relevant in the case of wh-adjunct processing. On the other hand, if the Active Filler Strategy is, in principle, independent of the thematic factor (and may or may not interact with it), then it, or some version of it, should also apply in the case of wh-adjunct dependencies, and the gap should be filled on the first grammatically permissible occasion.

Storage Cost Predictions for how quickly

Syntactically, how quickly is a low, VP-modifying adjunct, and in this property it is similar to kdaj used in Experiment 1. Both items also have a comparable semantic status of event modifiers. Processing-wise, how quickly may be slightly more complex than kdaj because it contains an additional word⁶. For the purposes of this experiment, we will, however, treat how quickly as a single unit (both words were presented simultaneously to participants). Given the results of Experiment 1 using the VP adjunct kdaj, we expect a storage cost effect for how quickly across a range of input extending between the filler and the syntactically determined gap site or endpoint, which would comport with the filler's VP-modifying syntactic status. The presence of a storage effect of how quickly would provide further evidence strengthening the empirical validity of storage costs incurred by VP-modifying adjuncts, both on a cross-item as well as on a cross-linguistic basis.

Storage Cost Predictions for why

Our particular interest in why in the present study is dictated by the growing consensus in grammatical research that the syntactic and semantic status of why is fundamentally different from that of VP-modifying adjuncts like how quickly and when. Specifically, why has different scopal properties, different restrictions on co-occurrence with other wh-phrases, and different behavior under syntactic ellipsis, sentential negation and other root phenomena (e.g., Subject-Aux inversion), compared to the other wh-adjuncts. Why also has semantic properties that make it different from the other wh-adjuncts. Whereas the latter are either event or predicate modifiers, why is a functor over an entire proposition (thus a question Why did John leave the room? has some sort of a proposition as the answer, e.g., Because he was hurrying, rather than a predicate modifier such as quickly). As an explanation for this differing behavior, it has been proposed in the syntactic literature that why is base-generated in its surface syntactic position in COMP at the left periphery of the sentence, or in a position very close to COMP (Bromberger, 1992; Rizzi, 2001; Stepanov and Tsai, 2008; Shlonsky and Soare, 2011). This amounts to the claim that why does not instantiate a filler-gap dependency in the usual sense of a long-distance dependency requiring encoding, storage and subsequent retrieval of the wh-filler. Under the standard compositional semantics approach, if why is a functor of propositions, why would then be interpreted as a sister of a syntactic node denoting a proposition, which is consistent with its base-generation at the clausal left periphery.

These syntactic and semantic accounts make a very clear prediction for a psycholinguistic study: if why does not initiate a filler-gap dependency, then there should be no storage effect in the case of why, as opposed to how quickly. All else equal, processing-wise, why is predicted to behave similarly to the complementizer that in a pair of sentences like (13): both expect a proposition afterwards.

(13) a. Peter knows that John fixed the car

b. Peter knows why John fixed the car

We thus expect that how quickly and why will show a contrast in terms of expected storage costs. While how quickly should elicit storage costs similarly to Slovenian kdaj, why is not expected to elicit any additional storage costs compared to the that control. Experiment 2 tests this prediction for English.