Prenuclear L∗+H Activates Alternatives for the Accented Word

Braun, Bettina; Biezma, María

doi:10.3389/fpsyg.2019.01993

ORIGINAL RESEARCH article

Front. Psychol., 24 September 2019

Sec. Psychology of Language

Volume 10 - 2019 | https://doi.org/10.3389/fpsyg.2019.01993

Prenuclear L^∗+H Activates Alternatives for the Accented Word

1. Department of Linguistics, University of Konstanz, Konstanz, Germany
2. Spanish and Portuguese Department, University of Massachusetts Amherst, Amherst, MA, United States

Abstract

Previous processing studies have shown that constituents that are prosodically marked as focus lead to an activation of alternatives. We investigate the processing of constituents that are prosodically marked as contrastive topics. In German, contrastive topics are prosodically realized by prenuclear L^∗+H accents. Our study tests (a) whether prenuclear accents (as opposed to nuclear accents) are able to activate contrastive alternatives, (b) whether they do this in the same way as constituents prosodically marked as focus with nuclear accents do, which is important for semantic modeling, and (c) whether the activation of alternatives is caused by pitch accent type (prenuclear L^∗+H as contrastive accent vs. prenuclear L+H^∗ as non-contrastive accent) or by differences in F0-excursion (related to prominence). We conducted two visual-world eye-tracking studies, in which German listeners heard declarative utterances (e.g., The swimmer wanted to put on flappers) and watched displays that depicted four printed words: one that was a contrastive alternative to the subject noun (e.g., diver), one that was non-contrastively related to it (e.g., sports), the object (e.g., flappers), which had to be clicked, and an unrelated distractor. Experiment 1 presented participants with two naturally produced intonation conditions, a broad focus control condition with a prenuclear L+H^∗ accent on the subject and a contrastive topic condition with a prenuclear L^∗+H accent. The results showed that participants fixated more on the contrastive alternative when the subject was produced with an L^∗+H accent, with the same effect size and timing as reported for focus constituents. Experiment 2 resynthesized the stimuli so that peak height and F0-excursion were the same across intonation conditions. The effect was the same, but the time course was slightly later. Our results suggest that prenuclear L^∗+H immediately leads to the activation of alternatives during online processing, and that the F0-excursion of the accent lends little. The results are discussed with regard to the processing of contrastive focus accents and theories of contrastive topic.

Introduction

In intonation languages, utterances may be produced with a series of pitch accents, i.e., tonal targets or movements that are associated with the stressed syllables of accented words, see Example (1) – stressed syllables are underlined.

The utterance in (1) is produced as one intonation phrase (IP), i.e., without further phrasing. The last accent in an IP (or intermediate phrase, in languages that assume two layers of intonational phrases in the prosodic hierarchy, cf. Pierrehumbert, 1980; Grice et al., 2005) is called the nuclear accent. As detailed below, the nuclear accent has received particular attention in the prosodic, semantic, and processing literature. Particularly relevant for this paper is the finding that nuclear accents with certain pitch accents make alternatives more accessible (Weber et al., 2006; Ito and Speer, 2008; Braun and Tagliapietra, 2010; Husband and Ferreira, 2012, 2016; Gotzner et al., 2013; Gotzner, 2014). That is, listeners think of concepts that are contrastively related to the word bearing certain types of nuclear accent (see below), which results in priming effects and more fixations to contrastively related words or objects. Within the semantics/pragmatics literature, it is argued that nuclear accents determine the information structural category of a constituent as focus (shorthand “F”), where a focus constituent is a constituent that evokes alternatives relevant for interpretation (Rooth, 1985, 1992; Krifka, 2008).¹ In this study, we deal with prenuclear accents that signal a contrastive topic interpretation as in the German example in (2) and test whether these accents activate alternatives and if so, whether they do so in the same way as nuclear accents do [unless otherwise indicated, the label contrastive topic and the shorthand “CT” is used to refer, descriptively, to constituents with a special prosody that forces a particular interpretation, spelled out by the optional follow up between parenthesis in (2) and whose prosodic features are not of our concern here].

In short (see below for a more detailed discussion on the semantics and interpretation of contrastive topic utterances within the semantics and pragmatics literature), this special prosody (CT-prosody) indicates that the speaker decided to first say something only about a subset of the salient domain, e.g., only about the boys in (2), while there are other (contrastive) entities that s/he is not saying anything about.² One question this paper tries to shed light on is the status of CT within information structure, i.e., whether this prosody identifies constituents as belonging to a basic notion of information structure (CTs would be then taken to encode a different information category from, e.g., focus), whether it is related to focus, or whether there is no need of an additional information category and focus can also cover these cases. This links the question of CTs as a (possible) notion of information structure with the question about what prosodic cues are used to activate alternatives, in terms of pitch accent type and phonetic realization.

In the remainder of the introduction, we first review the current state-of-the-art on the processing of nuclear vs. prenuclear accents (section “Nuclear vs. Prenuclear Accents”). We then turn to the concept of contrastive topics (section “Theories of Contrastive Topics”); they are interesting because they can be realized with a prenuclear accent in German, L^∗+H, and because contrastive topics are claimed to trigger contrastive alternatives as well. In section “Intonational Realization of Contrastive Topics,” the prosodic realization of contrastive topics is reviewed, first for English, then for the target language German. It is shown that the contrast between contrastive and non-contrastive topics in German is realized on a continuum between L^∗+H and L+H^∗, with more acustically salient prosodic characteristics in contrastive than non-contrastive contexts, but that German listeners prefer prenuclear L^∗+H in contexts that trigger a contrastive topic reading. In section “Outline and Hypotheses” we put forth the hypotheses regarding the activation of alternatives.

We then present two visual-world eye-tracking paradigm studies (Cooper, 1974; Eberhard et al., 1995; Tanenhaus et al., 1995), one with naturally produced contours – Experiment 1 – and one with resynthesized contours – Experiment 2 – to investigate four research questions, (a) whether subject constituents that are prosodically marked with prenuclear L^∗+H lead to more fixations to a contrastive alternative than those marked with prenuclear L+H^∗, (b) whether the fixation differences occur immediately while the constituent is processed and can hence be attributed to the pitch accent realization, (c) whether there is a difference in fixation pattern between contrastive topic and focus constituents, and (d) whether the activation of alternatives is caused more by pitch accent type or by its phonetic realization (in particular peak height and F0-excursion, which are related to perceived prominence). The answers to these questions will further our understanding on the role of prenuclear accents during speech comprehension, will allow us to contribute to the discussion regarding how to best formally model contrastive topics and overall to the discussion of the taxonomy of information structural categories, and to clarify the role of phonology and phonetic implementation in the activation of alternatives.

Nuclear vs. Prenuclear Accents

The terms nuclear and prenuclear accent stem from the British School (e.g., Halliday, 1967a; Crystal, 1969; O’Connor and Arnold, 1973). In the nowadays dominant framework of autosegmental-metrical phonology (Pierrehumbert, 1980), all pitch accents have the same status. The difference between nuclear and prenuclear accent lies in their distribution in the utterance: nuclear accents form the head of the prosodic phrase and typically occur before a phrase break (intermediate phrase break in case there is one in the intonational phonology of the language, else intonation phase break), i.e., they are the last accent in the phrase. Prenuclear accents precede nuclear accents in the same phrase. In Example (3), if produced as a single phrase, there are hence two prenuclear accents (H^∗ and L+H^∗) and one nuclear accent (L+H^∗), followed by a low boundary tone (L−L%).

Nuclear accents have a number of interesting properties. First, they are more prominent to the listener than prenuclear accents, possibly owing to their special structural position. It has been shown that, if a prenuclear and a nuclear accent in the same phrase have the same F0-excursion, the nuclear accent sounds more prominent than the prenuclear one (e.g., Terken, 1991; Gussenhoven et al., 1997), see also Baumann and Winter (2018) for more recent evidence on German. Conversely, a nuclear accent needs less F0-excursion to be perceived as equally prominent as a prenuclear accent. Second, nuclear accents can signal focus and focal information is memorized better (Birch and Garnsey, 1995). Third, in terms of meaning contribution, the choice of nuclear accent type is claimed to signal differences in information status, i.e., whether a referent is new or accessible (e.g., Kohler, 1991; Baumann, 2006), focus location and domain (Eady and Cooper, 1986; Eady et al., 1986; Birch and Clifton, 1995; Baumann et al., 2006; Breen et al., 2010), illocution type (Braun et al., 2018b), as well as attitudinal information, such as sarcasm (e.g., Lommel and Michalsky, 2017).

The past approximately 20 years have accumulated knowledge on how nuclear pitch accents are processed online as the utterance unfolds over time (Dahan et al., 2002; Weber et al., 2006; Chen et al., 2007; Ito and Speer, 2008; Watson et al., 2008; Dennison and Schafter, 2010; Esteve-Gibert et al., 2016; Husband and Ferreira, 2016). In a frequently cited study, Dahan et al. (2002) investigated the effect of accentuation on reference resolution using the visual world eye tracking paradigm. Participants heard two instructions: In the first instruction, they were asked to move an object in a display (e.g., the candle in Put the candle above the triangle); according to a second instruction they had to move either the same object again (candle) or a lexical cohort competitor (candy). Object and competitor were either accented (nuclear H^∗ or L+H^∗, which was not controlled) or unaccented, resulting in four conditions. The results showed that before the cohort competitors were disambiguated segmentally, participants fixated the competitor candy more when the noun was accented, suggesting that listeners immediately exploited the relation between pitch accents and discourse structure for reference resolution. Notice that in Dahan et al. (2002) the experimental contrast was between a nuclear accent vs. no accent at all, which is a very prominent intonational contrast. Later studies have also shown that listeners are sensitive to smaller accentual contrasts, i.e., those between different types of nuclear accents (Chen et al., 2007; Watson et al., 2008). Moving from discourse effects to the immediate processing of pitch accents, Braun et al. (2018a) recently used the visual-world eye-tracking paradigm to test whether pitch accent type directly affects the fixation of contrastive alternatives, without an explicit context. In Experiment 1a of Braun et al. (2018a), German listeners heard declarative utterances (e.g., The swimmer wanted to put on flappers) and watched displays that depicted four printed words: one that was a contrastive alternative to the subject noun (e.g., diver), one that was non-contrastively related (e.g., sports), the object that had to be clicked (e.g., flappers), and an unrelated distractor. That experiment compared a nuclear L+H^∗ accent on the subject [indicating that the subject was in focus, see Example (4)] to a prenuclear L+H^∗ on the subject with a later nuclear accent on the object noun [indicating that the subject was part of a broad focus constituent, see Example (5)].

The results showed that participants directed more fixations to the contrastive alternative when the subject was realized with a nuclear L+H^∗ accent [Example (4)] than when it was realized with prenuclear L+H^∗ accent [Example (5)]. When the utterances were presented with a nuclear H+L^∗ accent on the subject, an accent suitable to mark accessible information (Baumann and Grice, 2006), there was no difference in fixations compared to the prenuclear L+H^∗. Also, there were no differences in fixations to the visually presented non-contrastive associate (e.g., sports). To account for the asymmetric fixation patterns for contrastive and non-contrastive associates, the authors argued against a priming account by which all kinds of related words are more strongly activated when the word is realized with a prominent nuclear accent (L+H^∗). Instead, they concluded that the fixation data are better captured by the contrast in the semantic/pragmatic import of the two complex accents: the nuclear L+H^∗ accent evokes contrastive alternatives while nuclear H+L^∗ does not. Because there were differential results for the two nuclear accents L+H^∗ and H+L^∗, such that nuclear L+H^∗ did and nuclear H+L^∗ did not activate alternatives compared to prenuclear L+H^∗, the authors argued that the fixation differences cannot be due to the status of the accents alone (nuclear vs. prenuclear), but to their interpretations.

Let us briefly discuss an alternative interpretation for the findings in Braun et al. (2018a), which will be addressed in more detail in this paper: the role of perceived prominence. According to e.g., Mixdorff and Widera (2001), accents with a higher peak are judged as more prominent in German (cf. Ladd and Morton, 1997 for English); this effect may not be due to peak height alone, but due to the increased F0-excursion of the tonal movement, as Gussenhoven (2002) pointed out: “[m]any perception experiments […] have shown that higher pitch peaks sound more prominent, everything else being equal. Interestingly, the effect is not simply due to peak height. Rather, it is an estimate of how wide the pitch excursion is, given some choice of pitch register, and the listener’s impression therefore results from an estimate of the pitch span in relation to some choice of pitch register” (Gussenhoven, 2002, p. 50). In the materials of Braun et al. (2018a), the nuclear accents L+H^∗ and H+L^∗ both had a higher peak and a larger F0-excursion than the prenuclear L+H^∗ in the control condition: on average 9 semitones (st) for nuclear accents vs. 5st for the prenuclear accent. So pure peak height or F0-excursion cannot explain the fixation data in Braun et al. (2018a) either. However, we also know that pitch accent type matters for perceived prominence: Baumann and Röhr (2015) tested the prominence of a range of nuclear accent types that followed a prenuclear H^∗ accent. Their findings showed that L+H^∗ (with a F0-excursion of 5st) was judged most prominent, followed in prominence by L^∗+H (also 5st) and H^∗ (1.2st), all with ratings above 70 on a scale from 0 to 100 (from least to most prominent). H+L^∗ (with an F0-excursion of 6st), the accent that did not result in fixation differences compared to prenuclear L+H^∗ in Braun et al. (2018a), was judged to be less prominent (average prominence rating: 58), despite of its larger F0-excursion compared to nuclear L^∗+H and L+H^∗ accents. Prenuclear accents were not included in the prominence study by Baumann and Röhr (2015). In a more recent experiment, Baumann and Winter (2018) used the rapid prosody transcription task (Cole et al., 2010) and tested more varied sentence materials and also prenuclear accents. Their data showed that prenuclear accents were less often judged prominent than nuclear accents, but accent type and position (prenuclear/nuclear) were not orthogonally varied so it is not clear whether there is an interaction between the two factors. The perceived prominence of an accent may hence contribute to the activation of alternatives. This is in line with Calhoun (2009) who argued that the more phonetically prominent an accent the more likely a contrastive interpretation. We address the issue of prominence in the activation of alternatives in Experiment 2.

Prenuclear accents have generally been somewhat neglected in the semantic and processing literature, except for studies on their phonetic realization (e.g., Arvaniti et al., 1998; Atterer and Ladd, 2004). Semantically, prenuclear accents have been described as ornamental (Büring, 2007), serving a mostly rhythmic purpose (Calhoun, 2010; Chodroff and Cole, 2018). In a learning paradigm, Kapatsinski et al. (2017) showed that listeners focus more on the nuclear contour and largely ignore the prenuclear accents (cf. Roettger and Cole, 2018 for higher accuracy for whole contours and nuclear tunes compared to prenuclear accents in an artificial language paradigm). Prenuclear L^∗+H accents may be an exception as this accent type is very prominent as a nuclear accent (Baumann and Röhr, 2015) and its inherent prominence may be used to trigger a CT-reading. This is the accent of interest in the present study.

Theories of Contrastive Topics

There are different theories on how CTs are formalized. While (6) illustrates what is identified as contrastive topic constructions in the literature (which assumes a specific prosody that will be reviewed below for English and German), researchers differ on what they take contrastive topics to be and how they are interpreted. We overview the differences between the alternative approaches in (6a–c) below.

All researchers agree that the interpretation of the answers in (6), with the special prosodic features discussed below, can be paraphrased along the lines of “as for the boys, they played hockey” (following Jackendoff, 1972). In these utterances, the boys is what is called the contrastive topic constituent while hockey is the sentence’s (narrow) focus. However, researchers disagree on how we arrive at such an interpretation and on how many basic notions of information structure are necessary to model it (ultimately disagreeing on the taxonomy of information structural categories). These differences are what the contrast in (6a–c) tries to represent (we elaborate on these differences below). The results in this paper won’t allow us to discard any of the formal approaches to CT-constructions altogether, but they will allow us to critically evaluate different implementations of such approaches and narrow down the possibilities. On this respect, this paper tries to contribute to a discussion regarding how empirical investigations can inform formal and pragmatic modeling of CT-phenomena and narrow down the landscape. The hope is that future work will continue this discussion. We proceed below to evaluate the different formal approaches.

There are roughly two main camps in the formal semantics and pragmatics literature on contrastive topics (see also Constant, 2014 for an overview): those approaches that appeal to an independent notion of topic (syntactically, semantically or pragmatically defined) and that argue that a contrastive topic is a topic that contrasts with other topics (see Molnár, 1998; Vallduví and Vilkuna, 1998; Steedman, 2000; Krifka, 2008), and those who do not appeal to any independent notion of topic to understand contrastive topics (see, e.g., Gyuris, 2002, 2009; Büring, 2003, 2016; Tomioka, 2010a, b; Constant, 2014). In fact, a related question in the literature is whether CTs are basic notions of information structure or not. The discussion on CTs is part of a larger debate regarding the taxonomy of information structural categories. For some authors (see, e.g., Krifka, 2008) CTs are topic constituents containing focus (focus being a basic notion of information structure while the status of topic not being that clear). For others (see, e.g., Büring, 2003, 2016) CTs are a basic notion of information structure on their own. Finally, there are others (see, e.g., Tomioka, 2010b; Wagner, 2012; Constant, 2014) for whom CTs are just focus constituents. We provide a brief overview of these approaches and how they differ, and we hope that the sketches below can illuminate the discussion of the empirical results presented in this paper and how they contribute to the discussion of how to best formally model CTs. For the sake of concreteness, we focus below on Krifka (2008) as a representative of theories appealing to independent notions of topic to understand CTs, (6a). We dub this the focus within topic approach. We then sketch Büring (2003, 2016) and Constant (2014) as proposals in which understanding CTs does not require an additional notion of topic. These two proposals crucially differ on considerations regarding whether the taxonomy of information structural categories needs to contain both CT and focus (Büring, 2003, 2016), (6b), or whether the notion of focus is enough (Constant, 2014), (6c). We identify these last two approaches by the name of their respective proponents.

Let us start the discussion with the focus within topic approach as spelled out in Krifka (2008). Contrastive topics in Krifka (2008) are taken to be cases of aboutness topics containing an element marked as focus. In this approach to CTs we need both a notion of topic independently defined and a notion of focus. In Krifka’s view, the topic constituent is the constituent in the sentence identifying the entity or set of entities under which the information expressed should be stored in the common ground (understood in Stalnakerian terms as the information accepted by participants for the purpose of the conversation). This notion of topic is the notion of aboutness topic in Strawson (1964), Halliday (1967b), Reinhart (1981), Gundel (1988), Klein (2008) and goes together with a “structured” view of information update: when accepting the information communicated in an utterance we store it with respect to the topic entity, i.e., we identify the constituent in the utterance that is encoding what the utterance is about, the topic, and the constituent that is encoding what is being said about such entity, the comment, and store that for the given topic the comment has been predicated (this is, e.g., equivalent to the “link” in Vallduví and Engdahl, 1996). In the example in (6a), this would amount to identifying the kids as the topic and being able to organize information storage in such a way that we can store a bulk of information specifically about the kids. In particular, in (6a) we are asked to add the information that they played hockey. As for focus, in Krifka’s approach a focus element (where focus is a basic notion of information structure) is an element that evokes alternatives relevant for the interpretation [very much the proposal put forward in Rooth (1985), which is also the notion of focus in Büring (2003, 2016) and Constant (2014)]. CTs are then a combination of aboutness topic and focus. In the case of CTs the alternatives that are evoked are alternative topics, i.e., CTs are topics that contrast with other topics (Krifka, 2008, p. 45). Summing up, CT-interpretations are then arrived at by identifying a constituent as being the utterance’s aboutness topic and factoring in that it contains focus. This is what we will call the focus within topic account. In terms of processing, this view of contrastive topic is compatible with two formal implementations reflecting two processing procedures. One possibility is that conventional linguistic cues (in this case prosodic cues) could both identify a constituent as being the aboutness topic and as containing focus. In this approach the interpretation of the utterance as a contrastive topic would take place online. The other possible implementation involves arriving first at a complete syntactic analysis of the utterance (together with the information-structural analysis) to be able to identify the utterance’s aboutness topic and that the focus constituent is indeed within the topic. In this implementation contrastive topics are not processed online.

Let us see how this proposal differs from proposals in which the notion of CT does not depend on an independent notion of topic.³Büring (2003, 2016) and Constant (2014) share important features regarding the interpretation of CTs. The interpretation of the sentences in (6b–c), assuming the special prosody discussed below, can be more precisely paraphrased as “as for the boys, they played hockey; the others, I’m not saying (because either I don’t know or because I don’t want to say).” Büring (and much subsequent work including Constant’s) follows the literature on formal discourse models (most importantly Roberts, 1996) and assume that utterances are embedded in a particular discourse structure, where discourse is a hierarchical order of moves organized around (implicit) questions that participants agree on addressing (discourse is a communal inquiry). The assumption in this approach is that “all that is given at the sentential level, conventionally, are certain sorts of presuppositions about the place and function of the utterance in the [intentional structure] of the discorse in which it occurs” (Roberts, 1996, p. 2). Following Rooth (1985), this literature takes focus to be one of the main conventional clues to link the utterance to discourse,⁴ since the focus structure of a particular utterance triggers the presupposition that there is a particular question open in the context that is being addressed (i.e., focus anaphora to a contextual question). That this is the case can be illustrated with question-answer pairs. The utterance in (7a) can be the answer to the spelled out question in (7), but (7b) can’t. The idea in focus theory is that even when the question is not spelled out, the focus structure allows us to identify what question the speaker is answering: (7a) and (7b) presuppose a different question in the context/discourse [the utterance in (7b) presuppose a question of the form who drinks coffee?].

In this line of work, the utterance with CT-prosody presuppose a complex question: CT-utterances are analyzed as a partial answer to a (implicit) general complex inquiry of the form, e.g., who did what? The responses to the question in (6), assuming the specific prosody, signals that the speaker is resolving only a sub-issue (e.g., what did the boys do? in the running example) while s/he is leaving un-answered other contrastive sub-issues (e.g., a contrastive (implicit) question of the form what did the girls do?) that should be addressed to provide a complete answer to the complex question.⁵ In this way, the speaker is offering only a partial answer to the more general question. Considering that the question that speakers address in the discourse is the topic of conversation, Büring rightfully calls these utterances (as containing) contrastive topics (they address a (sub)-topic that contrasts with other topics). What differs between Büring’s and Constant’s work is how we arrive at this partial-answerhood interpretation. In Büring’s system (e.g., Büring, 1997, 2003, 2016), prosody reflects a specific marking in the syntax, CT-marking, see (6b), that comes with its own interpretational rules and lead to the right semantic interpretation (crucially, this marking is different from F(ocus)-marking in the Roothian sense and, hence, CT and focus are taken to be two independent notions of information structure). In Constant’s (2014) proposal (see also Tomioka, 2010b; Wagner, 2012), on the other hand, CT-phrases are no more than a F-marked phrase (in the Roothian sense) with special instructions regarding how the evoked alternatives enter into the semantic computation, see (6c).⁶ In Constant’s system, contrastive topic is not an independent category of information structure. Contrastive topic constituents are just focus constituents (i.e., F-marked constituents in the Roothian sense) plus some instructions regarding how the evoked focus alternatives are to be handled in the interpretation.⁷ In this sense, Constant’s proposal offers a simpler ontology of information structure categories.

What are the predictions made by these two theories? As said, Büring considers CTs as an information-structural category on their own. This alone may predict a different prosodic realization from F-phrases (the special prosody found in CTs would mark its status as a different information structural category). Notice, however, that in Büring’s theory the alternatives evoked by F-marked phrases and CT-marked phrases are different: syntactic F-marking evokes alternative propositions while syntactic CT-marking evokes alternative questions. In Constant’s system CTs are focus phrases. This approach hence makes the prediction that CT-phrases evoke alternatives in the same way as F-phrases do. Constant’s theory also makes predictions regarding the different prosodic realizations found in CT-phrases and F-phrases by virtue of their syntax. CT-phrases in Constant’s system are taken to be in the left periphery, either because they are moved there or because they are generated there, and it is this syntax that is responsible for the special prosody. How do we choose between the two systems? In what follows we sketch our reasoning in this paper.

The empirical investigation presented in this paper is related to how alternatives are activated in CT-constructions in contrast to what we find in narrow focus. That the alternatives that are evoked in CTs are different from those in narrow focus constructions (e.g., alternative propositions vs. alternative questions, as in Büring’s system) does not warrant a prediction that we should observe differences in the way alternatives are evoked/activated in contrastive topics constituents and focus constituents but, if we did observe such difference, we may consider it as partial support for contrastive topics being different from focus (against Constant’s proposal). At the same time, if there is no difference between how alternatives are evoked in contrastive topics constituents and focus constituents, we would lack support for a system that considers contrastive topics different from focus. That is, everything else being equal, if we are to choose between two systems, one simpler than the other, we need arguments to support that the more complex system is justified, e.g., in terms of processing. One way to do that is by showing that the way alternatives are evoked for contrastive topic and focus is different, explaining why we need two different information structural categories (cashed out formally in a different syntactic marking and interpretational mechanisms). If two models can derive the same results, in the absence of support for a more complex model we shall prefer the simpler approach.

Regarding how alternatives are evoked in CTs we investigate whether, as in the case of focus, alternatives are evoked online. Both (6b) and (6c) are compatible with alternatives being evoked online. However, for (6a) we saw that there are different possible implementations. The analysis is compatible with alternatives being evoked as soon as the accent is processed (online processing), but it is also compatible with late activation, once the listener has already assigned a syntactic analysis of the constituent as topic.

All proposals depicted in (6) predict that there is a difference between the answers in (6) and (8). Given the provided context-question, an exhaustive (neutral) answer⁸ is not expected to have the same prosodic marking as the CT-utterance in (6).

An important question addressed in this paper concerns the way we process utterances triggering CT-interpretations and whether this differs from the processing of focused constituents.

Intonational Realization of Contrastive Topics

Since utterances with contrastive topic and focus constituents and broad focus utterances can have the same (surface) structure [see Examples (6) and (8)], when heard out of context, it is the intonational realization that distinguishes the interpretation of the grammatical subject as contrastive topic or focus or neither. Contrastive topics are often realized with different pitch accents from focal constituents. In English, Jackendoff (1972) described the prosodic realization of contrastive topics in English as B-accents (falling-rising contours) and foci as A-accents (falling contours), see Example (9). In the autosegmental-metrical framework, the B-accent contour is a complex phenomenon, represented as L+H^∗ L−H% (authors also consider L^∗+H as a possible complex accent for CT-phrases, see, e.g., Constant, 2014), while the A-accent contour is equivalent to H^∗ L−L%, the prosodic realization of an exhaustive focus in English.

In German, however, contrastive topics are realized with a prenuclear rising L^∗+H accent, while the (exhaustive) focus is realized as falling nuclear accent (Féry, 1993, p. 131). Unlike in English, there is typically no IP break between the contrastive topic and focus constituent (and hence there is no L−H% boundary tone). In German, the contrastive topic and the focus constituent are often produced in the same prosodic phrase. It is also often argued that the F0 contour between the rising accent on the contrastive topic and the fall on the focus remains high, resulting in the so-called hat pattern (originally described for Dutch by Cohen and ’t Hart, 1967)⁹. This realization is exemplified in Example (10), using the prosodic notation of the GToBI, German Tone and Break Indices, system (Baumann et al., 2001; Grice et al., 2005). German hence marks contrastive topics with a prenuclear accent. The prenuclear accent is prototypically an L^∗+H, an accent that is judged as one of the two most prominent accents when placed in nuclear position (Baumann and Röhr, 2015). The nuclear accent on the focus constituent, H^∗, is one that is not judged very prominent.

Experimental studies with identical sentences in different information structures showed that the prosodic difference between utterances identified as triggering a CT-reading [Example (10)] and those lacking this interpretation is not categorical (Braun, 2005, 2006, 2007). Instead, contrastive topics are typically realized with a later and higher peak and longer duration than the prenuclear rise in utterances without CT-interpretations. The hat pattern is not mandatory either. From the listeners’ perspective, while the prosodic contrast in the prenuclear accent in CT- and non-CT utterances is not necessarily categorical, prenuclear L^∗+H is interpreted as contrastive topic, prenuclear L+H^∗ is not. This was shown in a binary forced-choice context-matching experiment, in which participants received a written context (e.g., ‘Jetzt geht es um einen Sohn und eine Tochter. Der Sohn beschäftigt sich mit Latein und…’ “The next story is about a son and a daughter. The son is occupying himself with Latin and…”) and heard a target sentence (Die Tochter beschäftigt sich mit Mathe. “The daughter is occupying herself with mathematics.”) in one of eight conditions, manipulating prenuclear accent type (L^∗+H vs. L+H^∗), nuclear accent type (H^∗ vs. H+L^∗) and the F0-transition between prenuclear and nuclear accent (high plateau vs. dip). The highest acceptance came from utterances with a prenuclear L^∗+H accent and a nuclear H+L^∗ accent, while the F0 transition between the two did not matter (81.6% for the high plateau, 89.3% for the dip). It is interesting to note that the preferred focus accent in CT-constructions in German is nuclear H+L^∗, an accent type that is not judged particularly prominent in Baumann and Röhr (2015). In a context that did not trigger a CT-interpretation (CONTEXT: Die Tochter beschäftigt sich mit Mathe. “The daughter is occupying herself with mathematics.”, TARGET., weil sie morgen eine Klausur schreibt. “… because she will have a test tomorrow.”), participants gave highest agreement to contours with a prenuclear L+H^∗ accent on the subject and a nuclear H^∗ accent, irrespective of the F0 transition (69% for the high plateau, 68% for the dip). Given all these results, we will use a prenuclear L^∗+H accent on the subject constituent and a nuclear H+L^∗ accent as focus for the CT-condition in the experiments reported below. Since the F0 transition between the prenuclear and nuclear accent did not have an influence on perception, we stuck to one pattern, the hat pattern, which was more natural for the speaker. Regarding the phonetic implementation of prenuclear accents, offline acceptability studies have shown that participants find prenuclear rising accents with higher peaks more appropriate in contexts that triggered a CT-interpretation, accents with later but lower peaks were less acceptable but more appropriate than rises with earlier and lower peaks. In unmarked all-new contexts (Braun, 2004, 2005), there was no preference. Note that prenuclear L^∗+H has also been reported as neutral prenuclear accent in Truckenbrodt (2002), who analyzed a not further specified sample of Southern German and Austrian speakers.

Outline and Hypotheses

While the interpretation of the CT-constituent is often linked to contrast and some theories even link the CT-constituent directly to focus (see discussion above) this has not been supported by empirical findings in the literature yet. If CT-constituents were shown to activate alternatives, this would be the first demonstration that CT is processed like focus and that certain types of prenuclear accents (in addition to nuclear accents) have the potential to do so. Furthermore, depending on how this activation compares to the activation of alternatives found for utterances with narrow focus, the findings could provide empirical support to theories linking CT to focus in its treatment.

We use the visual-world eye-tracking paradigm with printed words (McQueen and Viebahn, 2007), which allows us to study the processing of contrastive alternatives without interference from visual relatedness (Huettig and McQueen, 2007). For the sake of comparability, we closely replicate Experiment 1a in Braun et al. (2018a), see examples (4) and (5). In Experiment 1 in this paper we compare two intonation conditions, naturally produced prenuclear L^∗+H (contrastive topic, CT, condition) to naturally produced prenuclear L+H^∗ (broad focus control condition). We measure participants’ fixations toward these referents while they process utterances in the two intonation conditions. A higher number of fixations to the contrastive associate in the contrastive topic compared to the control condition is interpreted as increased activation of the contrastive alternative in the contrastive topic condition. Note that the term “activation” is understood here as shorthand for “consider as lexical or conceptual alternatives,” In Experiment 2, we manipulate the intonation contours (PSOLA resynthesis) to reduce phonetic differences between contours.

Based on the semantic literature and the available processing data, we pose the following hypotheses on the activation of alternatives. The literature reviewed above results in a number of conflicting hypotheses on the role of prenuclear accents in processing (H1), on the comparison of contrastive topics accents and focus accents (H2), and on the role of F0-excursion of an accent for the activation of alternatives (H3). In what follows, we briefly lay out the possible hypotheses and advance some possible points of contention working against them.

H1.
The available processing literature suggests that prenuclear accents are not processed as deeply (semantically) as nuclear accents. From that perspective, one would expect no differences in fixations between prenuclear L^∗+H and prenuclear L+H^∗. However, since prenuclear L^∗+H has the potential to signal CT-constituents (among other things), we predict that prenuclear L^∗+H leads to more fixations to the contrastive associate than prenuclear L+H^∗ accents.
H2.
Given that prenuclear L^∗+H leads to a CT-reading, according to semantic/pragmatic proposals we predict that this accent has the same potential to activate contrastive alternatives than the nuclear L+H^∗ focus accent of Experiment 1a in Braun et al. (2018a). If CT equals focus, we expect a similar effect size and a similar timing as for the focus data of Experiment 1a in Braun et al. (2018a).
H3.
If a large F0-excursion is the decisive factor for the activation of alternatives, we predict that the fixation difference disappears when using resynthesized stimuli with the same F0-excursion of the rise for prenuclear L^∗+H and L+H^∗. These two accents did not differ in perceived prominence in Baumann and Röhr (2015) in nuclear position, where they had the same F0-excursion. If the interpretation of the accent type that is relevant, we hypothesize the same fixation differences between prenuclear L^∗+H and prenuclear L+H^∗ with resynthesized stimuli.

Hypotheses H1 and H2 are tested in Experiment 1, hypothesis H3 mainly in Experiment 2. Note that the experimental results with respect to H1 and H2 will allow us to discuss the different semantic/pragmatic formal theories in view of the psychological reality of contrastive topics.

Experiment 1

Methods

Participants

Forty native speakers of German between 19 and 33 years (average 25.7 years) participated for a small fee. Twenty-eight were female, 12 male. They were unaware of the purpose of the experiment and had not taken part in experiments involving similar materials. All participants reported to have normal hearing and had normal or corrected-to-normal vision. Written informed consent was obtained.

Materials

Sentences and visual displays

The experiment used the same sentence materials and displays as in Braun et al. (2018a). There were 24 experimental sentences and 24 filler sentences. All experimental sentences started with a subject-NP (see Table A1 in the Appendix), followed by a disyllabic auxiliary (wollte “wanted to”, hatte “had”, konnte “could”, and sollte “should”), an object noun and a non-finite verb (Der Turner hatte Blasen bekommen “The gymnast had gotten blisters”). Most of the subject-referents had penultimate stress and between two and four syllables. None of them had ultimate stress. The filler sentences were similar to the experimental sentences and also started with a definite subject-NP followed by a disyllabic auxiliary. However, they occassionally contained disyllabic verbs and temporal adverbials.

The words for the display in experimental trials had been selected as follows. For each of the subject nouns, there was one noun that was contrastively related and one that was non-contrastively related. The non-contrastive associate was collected in a free association task. Participants saw one noun at a time (e.g., gymnast), printed on screen, and had to type in the first word that came to their mind (e.g., sports). Due to this procedure of collecting highly active non-contrastive associates, these associates do not all have the same relation to the auditory target, i.e., some stand in a hyponym-hyperonym relation, others in a part-whole relation or refer to a typical instrument or location. While the hyponyms and hypernyms would qualify as replacements for the auditory target, the part-whole relations do not. It was not possible, however, to find enough non-contrastive associates with the same relation to the target. To collect the contrastive associate, participants saw a sentence fragment with a negated subject noun (e.g., “Not the gymnast had gotten blisters but the…”) and had to type in the most plausible continuation. For both the contrastive and the non-contrastive associates we chose the most frequent responses making sure that they differed from each other, were not onset competitors and had similar word lengths and lexical frequencies (factors that are known to affect fixation behavior, cf. Dahan et al., 2001; Kliegl et al., 2004). The average association strength, lexical frequency and number of characters of the selected contrastive and non-contrastive associates were matched, see Table 1. Each experimental trial showed the contrastive and non-contrastive associate, the grammatical object that had to be clicked as well as an unrelated distractor. The four words in any given experimental trial differed in onset letters.

TABLE 1

	Contrastive associate	Non-contrastive associate
Association strength (percentage)	30.3 (SD = 14.9)	27.9 (SD = 16.6)
Lexical frequency (occurrences per million)	1.5 (SD = 2.1)	4.6 (SD = 5.5)
Number of characters	6.8 (SD = 1.5)	5.9 (SD = 1.9)

Average association strength, lexical frequency and number of characters (and standard deviations) of contrastive and non-contrastive associates to the subject nouns.

In filler trials, the display showed the contrastive associate, the grammatical object that had to be clicked, a word that was non-contrastively related to the object and an unrelated distractor. In filler trials, the four words also differed in onset letters.

Recordings

The control condition (see Figure 1) and the fillers were the same as in Braun et al. (2018a). The experimental utterances (CT condition) were recorded anew, by the same female speaker of German under the same conditions (44.1 kHz, 16 Bit), see Figure 2. All sentences in the experiment were preceded by the prelude Und ich habe gehört “And I have heard,” to increase the preview time for the words in a natural way. This prelude was recorded once and spliced in front of all sentences with a pause of 1000 ms in-between.

FIGURE 1

FIGURE 2

Acoustically, prenuclear L^∗+H (contrastive topics) differed from prenuclear L+H^∗ in that they had a significantly later alignment of the L and H targets, a larger F0-excursion, and a longer duration of the stressed syllable, of the F0-rise and the entire subject-NP compared to prenuclear L+H^∗. The mean values and standard deviations for each of these measurements in the two intonation conditions are listed in Table A2 in the Appendix. The sound files are availabe at Supplementary Data Sheets S1–S3.

Procedure

Intonation condition was manipulated as a within-subjects factor (but for every participant between-items), i.e., each participant saw all of the 24 experimental trials, but each target sentence was presented in only one of the two intonation conditions (totaling in 12 trials for each intonation condition). Across the experiment, the position of each of the different types of printed words was balanced (i.e., it occurred equally often in the upper left and right, lower left and right parts of the screen).

Two basic experimental lists were constructed, following a Latin Square Design. Each list further contained all the filler sentences. The two basic experimental lists were pseudo-randomized four times with the restriction of at most three experimental trials in a row (but at most two of the same intonation condition). After each block of five trials, an automatic drift correction was initiated. In total, we had eight experimental lists, to which participants were randomly assigned (five participants for each list).

Every trial started with a fixation cross which was shown until participants clicked on it. In all trials, the same token of the prelude (with a duration of 897 ms) was used. This was followed by a 1000 ms silence, after which the target utterance was auditorily presented. After participants had clicked on the respective object, there was a 1000 ms inter-trial interval. Eye-movement data (fixations, blinks, saccades) were recorded throughout the experiment.

The testing procedure was the same as in Braun et al. (2018a). Participants were tested individually in a sound attenuated room at the University of Konstanz. They were instructed in writing to listen to the utterances and to click on the object that is mentioned therein as quickly as possible. The instructions gave an example to make sure that participants knew what the object is.

Participants sat at a distance of approximately 70 cm from a 20 inch LCD screen, so that they could freely move the computer mouse. They rested their chin on the provided chin rest. Their dominant eye was calibrated with an SMI Eyelink 1000 system (pupil and corneal reflection at a sampling rate of 250 Hz). The same sampling rate was used during trials. The auditory stimuli were presented via headphones (Sennheiser PMX90) at a comfortable loudness.

Results

The eye-tracking data were processed as in Braun et al. (2018a). That is, the eye movement record was sampled in 4 ms steps and automatically parsed into saccades, fixations, and blinks by the EyeLink software (using normal saccade sensitivity). Only fixations were further processed. They were automatically coded as pertaining to a given word if they fell within a rectangle of 100 × 100 pixels, centered on the middle of that word. The grand average of evolution of fixations to the four words in the two intonation conditions is shown in Figure 3 (using the VWPre package in R, see Porretta et al., 2017). The gray vertical dashed lines indicate the segmental reference points, i.e., word boundaries from left to right. Note that it takes approximately 200 ms to launch a saccade (Fischer, 1992; Matin et al., 1993; Altmann and Kamide, 2004), which is also the delay in our studies: The fixations to the target (the grammatical object that had to be clicked, blue line in Figure 3) increased at approximately 1000 ms after utterance onset in the broad focus condition, i.e., approximately 200 ms after the onset of the grammatical object. The same delay of 200 ms is observed in the prenuclear L^∗+H condition and is hence a good approximation for the time it took participants to launch saccades based on the auditory input. Hence, only after this time fixations can be interpreted as a response to the acoustic signal.

FIGURE 3

The interesting line for our research question is the red line in the time window from about 330 ms to 770 ms (i.e., 200 ms after the onset of the subject noun till 200 ms after its offset). This line shows fixations to the contrastive associate while participants were processing the subject noun. In Figure 4, the fixations to the contrastive associate in the two intonation conditions are compared directly.

FIGURE 4

For statistical analysis we analyzed participants’ fixations to the contrastive referent in consecutive 100 ms steps (cf. McQueen and Viebahn, 2007). We calculated the empirical logits of fixations to the contrastive associate in consecutive 100 ms windows starting from 100 ms after the onset of the utterance until 800 ms after its onset, dividing the fixations to that word by fixations that were directed elsewhere. A constant of 0.5 was added to both the denominator and the numerator (Barr et al., 2011). Empirical logits were analyzed using linear mixed effects regression models with intonation condition (prenuclear L^∗+H vs. L+H^∗) as fixed factor (dummy coded) and random intercepts for participants and items (Baayen, 2008; Baayen et al., 2008). The model further included random slopes for the two within-group factors when this improved the fit of the model, as determined by LogLikelihood comparisons, using the R-function anova(). P-values were calculated using the Satterthwaite approximation of degrees-of-freedom in the R-package lmerTest (Kuznetsova et al., 2016), which is based on lme4 (Bates et al., 2014).

In the time window 500–600 ms after the onset of the utterance, there were significantly more fixations to the contrastive associate in the contrastive topic condition (average logits = −1.7) than in the broad focus control condition (average logits = −2.3, ß = 0.56, SE = 0.19, df = 922, t = 2.9, p < 0.005), see Table 2 for p-values in all time windows. Note that there were no other significant differences in fixations to the contrastive associate in the entire time window shown in Figure 3. Given the time needed to plan a saccade, this difference is well within the time during which participants were processing the subject noun (170–270 ms after the onset of the subject noun, a period in time when all items are already unique when considering part-of-speech, grammatical gender, segments and stress, as indicated by a CELEX search). Note that fixations to the contrastive associate were numerically higher in the prenuclear L^∗+H than in the prenuclear L+H^∗ condition from the start of the utterance, but this difference was not significant. At the moment, we don’t have an explanation for this slight preference of the contrastive associate in the contrastive topic condition.

TABLE 2

	100–200 ms	200–300 ms	300–400 ms	400–500 ms	500–600 ms	600–700 ms	700–800 ms
Contrastive associate	p = 0.1	p = 0.1	p = 0.07	p = 0.1	p < 0.005	p < 0.07	p = 0.3
Non-contr. associate	p = 0.9	p = 0.8	p = 0.1	p = 0.9	p = 0.4	p = 0.4	p = 0.2

Summary of p-values of comparisons of fixations to the contrastive associate (first row) and non-contrastive associate (second row) across intonation conditions in consecutive 100 ms analysis windows of Experiment 1.

Bold face indicates a significant difference at α = 0.05. The subject noun starts on average 130 ms after the onset of the sentence; it ends on average 580 ms after the onset of the sentence (averages over both intonation conditions).

In both intonation conditions, there were also many fixations to the non-contrastive associate, but these fixations to the non-contrastive associate were not affected by intonation condition (second row of Table 2). There were more fixations to the target (i.e., the grammatical object that had to be clicked) in the broad focus control condition than in the contrastive topic condition. This effect approached significance in the time windows from 200–500 ms after the onset of the sentence (see Table A3 in the Appendix). This is the opposite pattern as for the fixations to the contrastive associate, which suggests that target fixations are reduced in the contrastive topic condition because of increased fixations to the contrastive associate.

We then compared whether the effect of intonation condition was stronger here, in the prenuclear L^∗+H condition than in the nuclear L+H^∗ (contrastive focus) condition of Experiment 1a in Braun et al. (2018a). In that experiment, there was an effect of intonation condition in the same time window, but with a smaller magnitude (ß = 0.4 in Braun et al. (2018a) compared to ß = 0.56 in this experiment). To this end, we combined the data set and calculated the interaction between experiment and condition (contrastive topic/focus vs. broad focus control). The model showed no interaction between experiment and condition (p = 0.5); there was only a significant effect of condition in the combined data set, with more fixations to the contrastive alternative in the contrastive accents (nuclear L+H^∗ and prenuclear L^∗+H) than in the control condition (ß = 0.6, SE = 0.19, df = 1839.9, t = 2.9, p = 0.003). The lack of an interaction does not allow for strong conclusions. An additional Bayes Factor analysis indicated that the simpler model was more than 200 times more likely than the model with the interaction (Morey and Rouder, 2018). This suggests that the activation of contrastive alternatives is not different for nuclear L+H^∗ accents and prenuclear L^∗+H accents.

Discussion

The eye-tracking data showed that participants fixated more on contrastive associates to the subject constituent when it was produced with a prenuclear L^∗+H accent compared to a prenuclear L+H^∗ accent. The difference was significant in the time window from 500–600 ms after the onset of the utterance, i.e., immediately while participants were processing the subject noun. We interpret these differences in fixations to the contrastive associate as evidence for an activation of alternatives upon hearing subjects with a prenuclear L^∗+H accent as compared to prenuclear L+H^∗. Given the lack of a difference for fixations to the non-contrastive associate, the data speak in favor of a model in which prenuclear L^∗+H is a contrastive accent in the sense that it leads to an increased activation of contrastive alternatives. Note that this difference in fixations to the contrastive associate for prenuclear L^∗+H (vs. prenuclear L+H^∗) is the same as the difference in fixations reported for comparison of nuclear L+H^∗ (contrastive focus vs. prenuclear L+H^∗) reported in Experiment 1a in Braun et al. (2018a). It is of similar magnitude and occurs at the same time window, specifically between 500–600 ms after the onset of the utterance. The data hence suggest that nuclear L+H^∗ and prenuclear L^∗+H have the same potential to activate alternatives, vis-à-vis a non-contrastive prenuclear L+H^∗ accent. This finding has interesting implications for the modeling of contrastive topics (see General Discussion).

We now focus on the time course of the effect of intonation condition to determine which part of the contour may have resulted in the activation of alternatives. We observe significant differences in fixations to the contrastive associate in the time window 500–600 ms after utterance onset. These fixations are triggered by acoustic information that occurred around 300–400 ms after utterance onset the latest (170–270 ms after the onset of the subject noun). This suggests that participants’ fixations are guided directly by the F0 information before and on the stressed syllable. Ritter and Grice (2015) already showed that German listeners are particularly sensitive to this “onglide” information, but only for nuclear accents. We add to this that prenuclear accents do not differ in this respect. Note that in this analysis window, only information on the pitch-level of the accented syllable is available (L^∗ vs. H^∗) and some information on the direction (rising or falling), but no information on the following pitch movement (dipping in broad focus, high plateau in contrastive topic condition). It hence seems that the pitch accent alone is sufficient to trigger the contrastive interpretation. This ties in with offline acceptability judgments, in which participants judged utterances with a combination of prenuclear L^∗+H followed by a nuclear H+L^∗ nuclear accent as more appropriate in a contrast that elicits a CT-interpretation, while the intervening pitch contour (the presence/absence of hat contour) had no effect (Braun and Asano, 2013). It is also consistent with findings on German that suggest that the onglide (the F0-information prior to the stressed syllable) is important for interpretation (Ritter and Grice, 2015).

Experiment 2

Experiment 2 tested whether the differences in fixations to the contrastive alternatives are solely due to the differences in accent type (prenuclear L^∗+H vs. L+H^∗ here) or due to the differences in phonetic implementation of these accent types (in particular the peak height and the F0-excursion of the rise and the concomitant differences in perceived prominence). Since there are different opinions on whether prominence is related more to F0-excursion or the scaling of the tonal targets, we manipulated the F0-contour of both intonation conditions to make their F0-excursions (and the scaling of the low and high tonal targets of the accents) the same. Specifically, we (a) raised the low tonal target in the L^∗+H condition, while keeping the high tonal target unaltered (making the CT accents less prominent under the view that L^∗-accents are more prominent the lower the L-target and under the view that F0-excursion is related to perceived prominence) and (b) lowered the entire register of the L+H^∗ condition, to have exactly the same F0-scaling for low and high tonal targets and a similar degree of unnaturalness induced by the resynthesis procedure.