<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2016.00844</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Salience and Attention in Surprisal-Based Accounts of Language Processing</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Zarcone</surname> <given-names>Alessandra</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/241830/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>van Schijndel</surname> <given-names>Marten</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/337324/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Vogels</surname> <given-names>Jorrig</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/69853/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Demberg</surname> <given-names>Vera</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/351854/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Computational Linguistics and Phonetics, Universit&#x000E4;t des Saarlandes</institution> <country>Saarbr&#x000FC;cken, Germany</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Linguistics, The Ohio State University</institution> <country>Columbus, OH, USA</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Alice Julie Blumenthal-Dram&#x000E9;, Albert-Ludwigs-Universit&#x000E4;t Freiburg, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: LouAnn Gerken, The University of Arizona, USA; Stefan Frank, Radboud University Nijmegen, Netherlands</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Alessandra Zarcone <email>zarcone&#x00040;coli.uni-saarland.de</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>06</day>
<month>06</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>7</volume>
<elocation-id>844</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>02</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>05</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2016 Zarcone, van Schijndel, Vogels and Demberg.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>Zarcone, van Schijndel, Vogels and Demberg</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>The notion of <italic>salience</italic> has been singled out as the explanatory factor for a diverse range of linguistic phenomena. In particular, perceptual salience (e.g., visual salience of objects in the world, acoustic prominence of linguistic sounds) and semantic-pragmatic salience (e.g., prominence of recently mentioned or topical referents) have been shown to influence language comprehension and production. A different line of research has sought to account for behavioral correlates of cognitive load during comprehension as well as for certain patterns in language usage using information-theoretic notions, such as <italic>surprisal</italic>. Surprisal and salience both affect language processing at different levels, but the relationship between the two has not been adequately elucidated, and the question of whether salience can be reduced to surprisal / predictability is still open. Our review identifies two main challenges in addressing this question: terminological inconsistency and lack of integration between high and low levels of representations in salience-based accounts and surprisal-based accounts. We capitalize upon work in visual cognition in order to orient ourselves in surveying the different facets of the notion of salience in linguistics and their relation with models of surprisal. We find that work on salience highlights aspects of linguistic communication that models of surprisal tend to overlook, namely the role of attention and relevance to current goals, and we argue that the Predictive Coding framework provides a unified view which can account for the role played by attention and predictability at different levels of processing and which can clarify the interplay between low and high levels of processes and between predictability-driven expectation and attention-driven focus.</p></abstract>
<kwd-group><kwd>attention</kwd>
<kwd>goals</kwd>
<kwd>language</kwd>
<kwd>predictive coding</kwd>
<kwd>predictability</kwd>
<kwd>relevance</kwd>
<kwd>salience</kwd>
<kwd>surprisal</kwd></kwd-group>
<contract-num rid="cn001">SFB 1102</contract-num>
<contract-num rid="cn001">Cluster of Excellence Multimodal Computing and Interaction EXC284</contract-num>
<contract-num rid="cn002">DGE-1343012</contract-num>
<contract-sponsor id="cn001">Deutsche Forschungsgemeinschaft<named-content content-type="fundref-id">10.13039/501100001659</named-content></contract-sponsor>
<contract-sponsor id="cn002">National Science Foundation<named-content content-type="fundref-id">10.13039/100000001</named-content></contract-sponsor>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="5"/>
<ref-count count="175"/>
<page-count count="17"/>
<word-count count="16523"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction: The attentive brain and the anticipating brain</title>
<p>The perceptual experience we are continuously subjected to while awake is an &#x0201C;embarrassment of riches&#x0201D; (Wolfe and Horowitz, <xref ref-type="bibr" rid="B167">2004</xref>): for example, when we process a visual scene, we need to focus our maximum visual acuity (the fovea) on the most useful or interesting parts of the scene (Mackworth and Morandi, <xref ref-type="bibr" rid="B107">1967</xref>). In doing so, we are guided by attention: the &#x0201C;attentive brain&#x0201D; selects the <italic>relevant</italic> information, prioritizing among stimuli and giving certain stimuli a special status, thus easing the processing burden. The stimuli attracting attention are said to be <italic>salient</italic> (literally, &#x0201C;standing out from the ground&#x0201D;, Chiarcos et al., <xref ref-type="bibr" rid="B18">2011</xref>). The notion of <italic>salience</italic> has been widely used in linguistics as the explanatory factor for a diverse range of phenomena: to indicate a property of a sociolinguistic variable that makes it cognitively prominent and thus noticeable (Trudgill, <xref ref-type="bibr" rid="B150">1986</xref>; Kerswill and Williams, <xref ref-type="bibr" rid="B84">2002</xref>; R&#x000E1;cz, <xref ref-type="bibr" rid="B124">2013</xref>), or a property of discourse entities exploited in anaphoric binding (Grosz et al., <xref ref-type="bibr" rid="B59">1995</xref>; Osgood and Bock, <xref ref-type="bibr" rid="B120">1977</xref>; Prat-Sala and Branigan, <xref ref-type="bibr" rid="B123">2000</xref>), but also, according to a simulation view of language comprehension, the property of prominent entities in the described situation (Claus, <xref ref-type="bibr" rid="B23">2011</xref>).</p>
<p>The <italic>predictability</italic> of the stimulus also affects our perceptual experience. Our brain&#x00027;s ability to anticipate new stimuli is key to its adaptive success (Bar, <xref ref-type="bibr" rid="B5">2011</xref>; Clark, <xref ref-type="bibr" rid="B21">2013</xref>): the &#x0201C;anticipating brain&#x0201D; keeps track of what it has experienced (and how often), adapts to regularities, predicts upcoming stimuli based on recent context, but also detects <italic>surprising</italic> stimuli and reacts when its predictions go wrong (Ranganath and Rainer, <xref ref-type="bibr" rid="B126">2003</xref>). For example, when looking at a series of static pictures implying motion, people mentally simulate implicit motion, going beyond what they see in the pictures and preparing for what is coming next (Freyd, <xref ref-type="bibr" rid="B47">1983</xref>; Hubbard, <xref ref-type="bibr" rid="B69">2005</xref>). Language is no exception: the linguistic units we process (at different levels: phonemes, words, syntactic constituents) may be expected or unexpected, depending on preceding context. The difference between expected and unexpected stimuli is determined by their frequency and <italic>conditional probability</italic> given preceding context. <italic>Surprisal</italic> is a function of the input&#x00027;s conditional probability given preceding context, corresponding to how predictable the input is, and has been shown to influence processing costs as well as production choices (Hale, <xref ref-type="bibr" rid="B61">2001</xref>; Levy, <xref ref-type="bibr" rid="B102">2008</xref>).</p>
<p>Salience has been identified with (e.g., R&#x000E1;cz, <xref ref-type="bibr" rid="B124">2013</xref>) or at least related to surprisal / predictability (e.g., Blumenthal-Dram&#x000E9; et al., <xref ref-type="bibr" rid="B9">2014</xref>), and given the success of information-theoretic models of language it would be tempting (and theoretically elegant) to reduce salience to surprisal. While it is clear that both predictability and salience(s) affect language processing, the relationship between the two has not been adequately elucidated, leaving open the question of whether salience can be reduced to surprisal. The main goal of this review is to address this question by disentangling the notions of salience and predictability and the role they both play during linguistic processing, distinguishing between their cognitive correlates and identifying their interplay.</p>
<p>The first challenge to face is undoubtedly a lack of terminological consistency among linguists: while in visual cognition the term <italic>salience</italic> refers to bottom-up stimulus-driven perceptual salience, linguists use the term to refer either to bottom-up, perceptual properties of incongruous stimuli (low-predictability stimuli, expected to require additional processing effort, Hanul&#x000ED;kov&#x000E1; et al., <xref ref-type="bibr" rid="B62">2012</xref>; Blumenthal-Dram&#x000E9; et al., <xref ref-type="bibr" rid="B9">2014</xref>), or to top-down, discourse-driven properties of accessible, congruous or recently accessed entities (high-predictability stimuli, expected to facilitate processing, Claus, <xref ref-type="bibr" rid="B23">2011</xref>). This inconsistency leads to potentially contradictory hypotheses on the relationship between predictability and salience (salience corresponds to low-predictability vs. salience corresponds to high-predictability).</p>
<p>The second challenge pertains to the interaction between high- and low-level representations involved in language processing. Predictability-based approaches to language comprehension have shown that high-level information (e.g., what we know about the speaker or the situation) might influence lower-level predictions, at a phoneme or word level. For example, because of our world knowledge including the information that <italic>men do not get pregnant</italic>, when we listen to a man&#x00027;s voice we don&#x00027;t expect him to say he&#x00027;s <italic>pregnant</italic> (van Berkum, <xref ref-type="bibr" rid="B153">2009</xref>). However, the interplay between low and high levels of processing and representation has not been explicitly modeled. This interplay becomes clearer if we factor in the role played by attention. For example, people can overlook very unexpected events if they are paying attention to other aspects of the scene: if people are asked to count passes in a basketball video, they will not notice a person in a gorilla costume walking across the scene (inattentional blindness effect, Simons and Chabris, <xref ref-type="bibr" rid="B142">1999</xref>). Similarly, if asked <italic>How many animals of each kind did Moses put on the Ark?</italic> (Van Oostendorp and De Mul, <xref ref-type="bibr" rid="B159">1990</xref>) people might be too focused on the high-level task of answering the question to notice that, at the word level, <italic>Noah</italic> should be in place of <italic>Moses</italic> (see Sanford and Sturt, <xref ref-type="bibr" rid="B134">2002</xref>, for a review of similar phenomena).</p>
<p>We will argue that the comprehender&#x00027;s attentional focus weights surprisal effects from one level or another, depending on the current goals and on perceived rewards. The Predictive Coding framework (Rao and Ballard, <xref ref-type="bibr" rid="B127">1999</xref>; Friston, <xref ref-type="bibr" rid="B49">2010</xref>; Clark, <xref ref-type="bibr" rid="B21">2013</xref>) provides a unified view which can clarify the interplay between low- and high-levels of processing and between bottom-up, stimulus-driven salience and top-down, goal-directed attentional control, and has the potential to reconcile low-level computations of surprisal, high-level representations, and goal-mediated attentional control.</p>
<p>We first give a brief overview of studies providing evidence for predictability-driven language comprehension, with a particular focus on recent results from information-theoretic approaches (Section 2). We then address the notion of salience (Section 3), first by drawing from work in visual cognition and then surveying the different facets of this notion in linguistics, seeking parallels with visual cognition. We look at visual cognition because predictability and salience are arguably relevant to many cognitive domains (such as vision and language) and reflect very basic properties of cognition, but also because the field of visual cognition provides us with tools and categories which have been extensively modeled and discussed and have the potential to bring some clarity to the rather contradictory terminology employed in linguistics. We find that work on salience uncovers aspects of linguistic processing that models of surprisal tend to overlook, namely the role of attention, mediated by the perceiver&#x00027;s category system, by relevance to current goals and by affect. We then focus on recent work in the Predictive Coding framework, and on how surprisal and attention can be understood within this framework (Section 4). Finally, we discuss how surprisal models can be extended to account for the role of salience and attention (Section 5).</p>
</sec>
<sec id="s2">
<title>2. Predictability and language</title>
<p>Every linguistic stimulus we process comes with a context: for example a visual scene, or a previously processed language input, or the situation we are in. Depending on previously processed contextual information, a stimulus can be more or less expected. Decades of experimental work in expectation-based approaches to language processing (e.g., Altmann and Kamide, <xref ref-type="bibr" rid="B1">1999</xref>; Trueswell et al., <xref ref-type="bibr" rid="B151">1994</xref>; Elman et al., <xref ref-type="bibr" rid="B33">2005</xref>) has shown that comprehenders draw context-based expectations about upcoming linguistic input at different levels: they build expectations for the next word (Morris, <xref ref-type="bibr" rid="B116">1994</xref>; Ehrlich and Rayner, <xref ref-type="bibr" rid="B32">1981</xref>; McDonald and Shillcock, <xref ref-type="bibr" rid="B110">2003</xref>), but also for its phonological form (DeLong et al., <xref ref-type="bibr" rid="B29">2005</xref>) and gender inflection (van Berkum et al., <xref ref-type="bibr" rid="B155">2005</xref>), for syntactic parses (Spivey-Knowlton et al., <xref ref-type="bibr" rid="B145">1993</xref>; MacDonald et al., <xref ref-type="bibr" rid="B106">1994</xref>; Demberg and Keller, <xref ref-type="bibr" rid="B30">2008</xref>), for discourse relations (K&#x000F6;hne and Demberg, <xref ref-type="bibr" rid="B89">2013</xref>; Drenhaus et al., <xref ref-type="bibr" rid="B31">2014</xref>; Rohde and Horton, <xref ref-type="bibr" rid="B132">2014</xref>), for semantic categories (Federmeier and Kutas, <xref ref-type="bibr" rid="B35">1999</xref>), for typical event participants (Bicknell et al., <xref ref-type="bibr" rid="B8">2010</xref>; Matsuki et al., <xref ref-type="bibr" rid="B109">2011</xref>), for the next referent to be mentioned (Altmann and Kamide, <xref ref-type="bibr" rid="B1">1999</xref>), for the next event to happen in a sequence (Chwilla and Kolk, <xref ref-type="bibr" rid="B20">2005</xref>; van der Meer et al., 
<xref ref-type="bibr" rid="B158">2005</xref>; Khalkhali et al., <xref ref-type="bibr" rid="B85">2012</xref>), and for typical implicit events (Zarcone et al., <xref ref-type="bibr" rid="B171">2014</xref>). The effects of predictability are measurable, as expectation-matching input facilitates processing, and deviation from expectations produces an increase in processing costs. Predictable words are read faster: they are fixated for less time and are more likely to be skipped than unpredictable words (Ehrlich and Rayner, <xref ref-type="bibr" rid="B32">1981</xref>; Balota et al., <xref ref-type="bibr" rid="B4">1985</xref>; McDonald and Shillcock, <xref ref-type="bibr" rid="B110">2003</xref>; Frisson et al., <xref ref-type="bibr" rid="B48">2005</xref>; Demberg and Keller, <xref ref-type="bibr" rid="B30">2008</xref>); also, the amplitude of the N400 event-related potential increases in a graded way as a word&#x00027;s predictability decreases (Kutas and Hillyard, <xref ref-type="bibr" rid="B97">1984</xref>; Federmeier and Kutas, <xref ref-type="bibr" rid="B35">1999</xref>; Kutas and Federmeier, <xref ref-type="bibr" rid="B96">2011</xref>; Frank et al., <xref ref-type="bibr" rid="B45">2013</xref>).</p>
<p>These and more studies have shown that during language processing comprehenders do not just rely on transitional probabilities between words (McDonald and Shillcock, <xref ref-type="bibr" rid="B110">2003</xref>; Frisson et al., <xref ref-type="bibr" rid="B48">2005</xref>) but exploit various sources of information to narrow down predictions for upcoming input, such as verb subcategorization biases and thematic fit (Trueswell et al., <xref ref-type="bibr" rid="B152">1993</xref>, <xref ref-type="bibr" rid="B151">1994</xref>; Hare et al., <xref ref-type="bibr" rid="B64">2003</xref>, <xref ref-type="bibr" rid="B63">2009</xref>; van Schijndel et al., <xref ref-type="bibr" rid="B161">2014</xref>), verb aspect (Ferretti et al., <xref ref-type="bibr" rid="B36">2007</xref>), but also visual context (Kamide et al., <xref ref-type="bibr" rid="B80">2003</xref>), generalized knowledge about typical events and their participants (Ferretti et al., <xref ref-type="bibr" rid="B37">2001</xref>; Bicknell et al., <xref ref-type="bibr" rid="B8">2010</xref>), knowledge about scenarios (van der Meer et al., <xref ref-type="bibr" rid="B157">2002</xref>, <xref ref-type="bibr" rid="B158">2005</xref>; Khalkhali et al., <xref ref-type="bibr" rid="B85">2012</xref>), discourse markers (K&#x000F6;hne and Demberg, <xref ref-type="bibr" rid="B89">2013</xref>; Drenhaus et al., <xref ref-type="bibr" rid="B31">2014</xref>; Xiang and Kuperberg, <xref ref-type="bibr" rid="B169">2015</xref>), and pragmatic inferences about the speaker&#x00027;s identity and status (van Berkum et al., <xref ref-type="bibr" rid="B156">2008</xref>). 
These different types of information are drawn upon by language comprehenders at multiple levels of representation (syntactic, lexical, semantic, and pragmatic) at each point in processing to reach a provisional analysis and build expectations at multiple levels based on this provisional analysis (van Berkum, <xref ref-type="bibr" rid="B154">2010</xref>; Kutas et al., <xref ref-type="bibr" rid="B95">2011</xref>; Kuperberg, <xref ref-type="bibr" rid="B92">2016</xref>; Kuperberg and Jaeger, <xref ref-type="bibr" rid="B93">2016</xref>). The flow of information goes both ways: the encountered input activates high-level representations in a bottom-up fashion (e.g., triggering expectations for new syntactic structures, event knowledge, scenarios), and, depending on contextual information, high-level representations influence low-level predictions (Kuperberg, <xref ref-type="bibr" rid="B92">2016</xref>). For example, knowledge about events and their participants cued by previous context (<italic>The day was breezy so the boy went outside to fly a&#x02026;</italic>) determines a prediction for a word (&#x02026;<italic>kite</italic>) but also triggers expectations for one phonological realization of the article over another (<underline><italic>a</italic></underline> <italic>kite</italic> / <underline><italic>an</italic></underline> <italic>airplane</italic>, DeLong et al., <xref ref-type="bibr" rid="B29">2005</xref>).</p>
<sec>
<title>2.1. Models of surprisal</title>
<p>Information-theoretic notions, such as <italic>surprisal</italic> (Hale, <xref ref-type="bibr" rid="B61">2001</xref>; Levy, <xref ref-type="bibr" rid="B102">2008</xref>), have been proposed to account for the relationship between predictability and processing costs. <italic>Surprisal</italic> is a function of the input&#x00027;s conditional probability given preceding context, corresponding to how predictable the input is and how much information it carries (highly predictable input conveys little information):
<disp-formula id="E1"><mml:math id="M1"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Surprisal</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">linguistic_unit</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textrm" mathvariant="normal">linguistic_unit</mml:mtext><mml:mo>|</mml:mo><mml:mtext class="textrm" mathvariant="normal">context</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>The surprisal of a word is equivalent to the difference between the probability distributions of possible utterances before and after encountering that word (Kullback-Leibler divergence), quantifying the amount of information conveyed by that word (Levy, <xref ref-type="bibr" rid="B102">2008</xref>). Surprisal Theory has sought to account for certain patterns in language usage as well as for behavioral correlates of cognitive load during comprehension, with the underlying linking hypotheses that cognitive load is proportional to the amount of information conveyed by the input (its surprisal) given preceding context, and that the speakers&#x00027; production choices tend to keep the amount of information constant (<italic>Uniform Information Density Hypothesis</italic>, Jaeger and Levy, <xref ref-type="bibr" rid="B73">2007</xref>, see also Jurafsky et al., <xref ref-type="bibr" rid="B76">2001</xref>; Gahl and Garnsey, <xref ref-type="bibr" rid="B51">2004</xref>). Surprisal can be modeled at different levels (phonemes, phrases, words) and is often estimated using relatively simple statistical models such as <italic>n</italic>-gram language models or Probabilistic Context-Free Grammars (Hale, <xref ref-type="bibr" rid="B61">2001</xref>; Demberg and Keller, <xref ref-type="bibr" rid="B30">2008</xref>; Frank, <xref ref-type="bibr" rid="B43">2009</xref>; Roark et al., <xref ref-type="bibr" rid="B131">2009</xref>). A word&#x00027;s surprisal has been shown to correlate with its reading time (Hale, <xref ref-type="bibr" rid="B61">2001</xref>; Demberg and Keller, <xref ref-type="bibr" rid="B30">2008</xref>; Levy, <xref ref-type="bibr" rid="B102">2008</xref>; Fossum and Levy, <xref ref-type="bibr" rid="B42">2012</xref>; Smith and Levy, <xref ref-type="bibr" rid="B143">2013</xref>; van Schijndel and Schuler, <xref ref-type="bibr" rid="B160">2015</xref>) and with the amplitude of the N400 at the word (Frank et al., <xref ref-type="bibr" rid="B45">2013</xref>).</p>
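As a concrete illustration (ours, not part of the original article), word surprisal under a simple bigram language model can be sketched as follows; the toy corpus, echoing the <italic>fly a kite</italic> example discussed earlier, is invented for the purpose, and a real model would of course be trained on far more data with smoothing.

```python
import math
from collections import Counter

# Invented toy corpus standing in for a real language model's training data.
corpus = ("the boy went outside to fly a kite . "
          "the boy went outside to fly a plane .").split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
unigrams = Counter(corpus)                  # counts of single words

def surprisal(word, context):
    """Surprisal(word) = -log2 P(word | context) under a bigram model, in bits."""
    p = bigrams[(context, word)] / unigrams[context]
    return -math.log2(p)

# 'kite' follows 'a' once out of the two occurrences of 'a', so P = 0.5:
print(surprisal("kite", "a"))   # 1.0 bit
# 'boy' always follows 'the' in this corpus, so it carries no information:
print(surprisal("boy", "the"))  # 0.0 bits
```

A fully predictable continuation has zero surprisal; halving the conditional probability adds one bit, mirroring the logarithmic cost function assumed by Surprisal Theory.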
</sec>
<sec>
<title>2.2. Limitations of models of surprisal</title>
<p>A surprisal-based model is defined by the linguistic units it considers and by what level it can condition on. Typically, surprisal-based models do not tackle the problem of how different levels of representation interact with each other, as the probability of a linguistic unit (e.g., a phoneme, a phrase, a word, a situation model) is conditioned on the preceding units at the same level (e.g., preceding phonemes, phrases, words, situation models). Comprehenders, though, exploit information at different levels to build expectations for upcoming input. There have been some attempts at integrating surprisal estimates with a model of semantic surprisal (Mitchell et al., <xref ref-type="bibr" rid="B114">2010</xref>; Frank and Vigliocco, <xref ref-type="bibr" rid="B46">2011</xref>; Sayeed et al., <xref ref-type="bibr" rid="B138">2015</xref>), but no unified account has shown how the probability of lower-level units (e.g., perceptual features) can be conditioned on higher-level units (e.g., situation, world knowledge) to predict processing costs, or how higher-level information can be exploited to predictively pre-activate information at lower levels of representation (Kutas et al., <xref ref-type="bibr" rid="B95">2011</xref>; Kuperberg, <xref ref-type="bibr" rid="B92">2016</xref>). We will argue that such an account should include the role played by attention in shifting the focus between different levels to determine at what level surprisal influences processing costs.</p>
<p>Surprisal-based models rely on the linking hypothesis that high surprisal corresponds to high processing costs. But does this relationship between surprisal and processing cost always hold? Kidd et al. (<xref ref-type="bibr" rid="B86">2012</xref>) have shown that infants focus their visual attention on sequences whose complexity (surprisal) is neither too low nor too high, but <italic>just right</italic>, that is, it falls within certain optimal complexity margins (this effect is known as the <italic>Goldilocks effect</italic>). Arguably, some sort of Goldilocks effect also affects the attention of adult comprehenders, who react to extreme values of the complexity/predictability spectrum by diverting their attention from extremely complex stimuli that are too demanding or unpredictable (for example, when they are pushed beyond their memory capacity, see Nicenboim et al., <xref ref-type="bibr" rid="B118">2015</xref>, or when they hear a foreign language), or from extremely predictable stimuli. For example, utterances about very predictable events (&#x0201C;<italic>John went shopping. He paid the cashier&#x0201D;</italic>) may trigger pragmatic inferences (<italic>John is a shoplifter</italic>, Kravtchenko and Demberg, <xref ref-type="bibr" rid="B91">2015</xref>), simply because we expect our interlocutors to be informative (if they think it&#x00027;s worth mentioning that <italic>John paid the cashier</italic>, it must be an exceptional event). Also, as noted by van Berkum (<xref ref-type="bibr" rid="B154">2010</xref>), &#x0201C;predictions are even useful when they are wrong&#x0201D;: less expected (marked) combinations (e.g., a cleft sentence construction) may be a way of marking the delivery of a message as worthy of extra attention, thus easing the processing burden on an otherwise surprising stimulus. Previous context may also lead the hearer to expect surprise, e.g., <italic>You&#x00027;ll never believe it! The thing John was brushing his teeth with was a knife the day before yesterday</italic> (Futrell, <xref ref-type="bibr" rid="B50">2012</xref>).</p>
<p>A third point concerns the relationship between the model we use to estimate surprisal, and the input&#x00027;s probability of occurrence in the world. As observed by Pierrehumbert (<xref ref-type="bibr" rid="B122">2006</xref>), (log-)frequencies of occurrences, while going a long way in explaining processing costs, do not tell us the whole story: between the frequencies of events and the frequency of memories, &#x0201C;lies a process of attention, recognition, and coding which is not crudely reflective of frequency.&#x0201D; What we store in our memory, and then exploit in expectation-based processing, depends on where our attention is focused, on what stimuli we consider relevant, but also on what valence we associate with them. We will argue in Section 4 that we need to factor in the role played by the affect system, that is, the neural circuitry that processes valence in the brain, to fill the gap between probability distributions of events in the world and our memory&#x00027;s probability distributions.</p>
</sec>
<sec>
<title>2.3. Bayesian surprise and the snow-screen paradox</title>
<p>Surprisal does not quantify how useful or relevant the stimulus is, but solely how predictable it is. Itti and Baldi (<xref ref-type="bibr" rid="B70">2009</xref>) introduced a Bayesian theory of <italic>surprise</italic>, which weights the predictability of a stimulus by its usefulness or relevance, determining how unexpected we perceive the stimulus to be. The observer&#x00027;s background beliefs (for example, the probability of seeing <italic>CNN</italic> or <italic>BBC</italic> when turning on the TV) are represented as a prior probability distribution, which is updated using Bayes&#x00027; theorem as new observations are made (e.g., <italic>CNN is on</italic>). <italic>Bayesian surprise</italic> is the difference (Kullback-Leibler divergence) in the belief distribution before and after an observation, indicating how much the observation changed our beliefs about the world. If <italic>CNN</italic> is the most expected outcome given our prior beliefs, when we turn on the TV and see <italic>CNN</italic> the surprise will be minimal. If <italic>BBC</italic> is shown instead, there will be a small amount of surprise and a subsequent belief update. Every subsequent change on the screen (a newscaster&#x00027;s mouth moving, a commercial break) will also update our beliefs and thus our predictions about upcoming TV content accordingly.</p>
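The belief update behind Bayesian surprise can be sketched in a few lines (our illustration, not the authors&#x00027; implementation); the prior over channels and the observation likelihoods are invented numbers chosen to mimic the TV example.

```python
import math

def kl(posterior, prior):
    """Kullback-Leibler divergence D(posterior || prior) in bits:
    the Bayesian surprise of the observation that caused the update."""
    return sum(q * math.log2(q / p)
               for q, p in zip(posterior, prior) if q > 0)

def update(prior, likelihood):
    """Bayes' theorem: posterior proportional to prior times likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Invented prior beliefs about what appears when we turn on the TV:
# [CNN, BBC, snow] -- snow is considered very unlikely.
prior = [0.60, 0.39, 0.01]
# An observation strongly consistent with a snow screen:
likelihood = [0.01, 0.01, 0.98]

posterior = update(prior, likelihood)
print(kl(posterior, prior))  # large belief shift: high Bayesian surprise

# Once beliefs favor snow, seeing more snow shifts beliefs much less:
posterior2 = update(posterior, likelihood)
print(kl(posterior2, posterior))  # smaller than the first surprise
```

The second divergence is smaller than the first even though the snow frames themselves remain maximally unpredictable pixel by pixel, which is exactly the dissociation between surprise and surprisal that the snow-screen paradox turns on.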
<p>Itti and Baldi (<xref ref-type="bibr" rid="B70">2009</xref>) illustrate the difference between <italic>surprisal</italic> and <italic>surprise</italic> using the so-called &#x0201C;snow-screen paradox&#x0201D;: if a random pixel pattern (known as <italic>snow</italic> or <italic>static</italic>) appears when we turn on the TV or while we are watching it, we will be highly surprised, because this outcome is extremely unexpected. At a high level, our belief that the snow would appear was very low (high surprise). At a low level, the pixel configuration before the snow would not have helped us predict the random black-and-white pixel configuration when it first appeared (high surprisal). Also, the snow is interesting at a high-level, because it signals a malfunction, so, after observing it, we will experience a large shift between prior and posterior distributions, strongly favoring the snow against other channels. But if the snow persists after the belief update, it is no longer interesting, because it is now the most expected outcome based on our updated belief (low surprise). At a pixel level, though, the snow frames are still continuously changing at random, making it impossible to predict the status of any pixel at any moment (high surprisal). In Itti and Baldi&#x00027;s words (<xref ref-type="bibr" rid="B70">2009</xref>, p. 1297), &#x0201C;random snow, although in the long term the most boring of all television programs, carries the largest amount of Shannon information&#x0201D; (that is, surprisal). Bayesian surprise differs from surprisal in that it quantifies the <italic>belief update</italic> of the model given the observation, whereas surprisal quantifies how much information the observation conveys (how predictable it is) given a current model, without taking into account a model update.</p>
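The contrast between surprisal and Bayesian surprise can be made concrete with a minimal numerical sketch (our own illustration with invented probabilities, not Itti and Baldi&#x00027;s actual implementation, which operates over image feature distributions):

```python
import math

def surprisal(p):
    """Shannon surprisal of an outcome with probability p, in bits."""
    return -math.log2(p)

def kl_divergence(posterior, prior):
    """Kullback-Leibler divergence D(posterior || prior), in bits."""
    return sum(q * math.log2(q / prior[h]) for h, q in posterior.items())

def bayes_update(prior, likelihood):
    """Posterior over hypotheses after an observation, via Bayes' theorem."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Prior beliefs about what is on the TV (illustrative numbers).
prior = {"CNN": 0.60, "BBC": 0.39, "snow": 0.01}

# Likelihood of the observed frame under each hypothesis: a snow frame
# is (nearly) only produced under the "snow" hypothesis.
likelihood = {"CNN": 0.001, "BBC": 0.001, "snow": 1.0}

posterior = bayes_update(prior, likelihood)

# First snow frame: a rare outcome, hence high surprisal and a large
# shift between prior and posterior (high Bayesian surprise).
print(surprisal(prior["snow"]))          # ~6.64 bits
print(kl_divergence(posterior, prior))   # large belief update

# Further snow frames: still unpredictable pixel-by-pixel (the signal
# keeps its high surprisal), but beliefs now barely move (low surprise).
posterior2 = bayes_update(posterior, likelihood)
print(kl_divergence(posterior2, posterior))  # much smaller update
```

The first snow frame has both high surprisal and high Bayesian surprise; once beliefs have been updated, subsequent snow frames retain their high signal-level surprisal but produce only a small additional belief shift.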
<p>Griffiths and Tenenbaum (<xref ref-type="bibr" rid="B57">2007</xref>) also argue that surprisingness / interestingness rather than mere low probability determines the difference between a simply unlikely event and what we consider to be a coincidence: a coincidence (e.g., many coin flips, all turning out to be heads) is not only an unlikely event, but it is an event which is <italic>less</italic> likely under our currently adopted explanation for the observed state of things than under an alternative explanation (<italic>the coin is unfair</italic>, or <italic>the person flipping the coin can magically control it</italic>), which nevertheless does not have enough support to be adopted through a belief update. If interesting coincidences continue to occur, and if we pay attention to them, then the coincidence can turn into evidence and the alternative hypothesis can be supported via a belief update.</p>
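Griffiths and Tenenbaum&#x00027;s distinction between a coincidence and evidence can be sketched as a comparison of posterior odds (a hedged illustration with arbitrary numbers; the &#x0201C;trick coin&#x0201D; hypothesis and its heads probability of 0.99 are our assumptions, not values from their paper):

```python
import math

def log_posterior_odds(n_heads, prior_trick=1e-6):
    """Log posterior odds of 'trick coin' vs. 'fair coin' after a run
    of n_heads consecutive heads (fair: P(heads)=0.5; trick: 0.99)."""
    log_prior_odds = math.log(prior_trick / (1 - prior_trick))
    # Each head favors "trick" by a constant log likelihood ratio.
    log_bayes_factor = n_heads * (math.log(0.99) - math.log(0.5))
    return log_prior_odds + log_bayes_factor

# A short run of heads is already *more likely* under the alternative
# (positive Bayes factor), yet the low prior keeps the odds against it:
# an interesting coincidence, not yet enough support for a belief update.
print(log_posterior_odds(5))    # negative: keep the "fair coin" belief

# If heads keep coming, the coincidence turns into evidence and the
# alternative hypothesis wins the belief update.
print(log_posterior_odds(30))   # positive: adopt the "trick" hypothesis
```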
<p>The snow-screen paradox shows that the level of representation that is most relevant to us determines how affected we are by one outcome or the other, and so does our category system: the snow is only interesting at its onset insofar as it signals a malfunction, but its random pixel changes have no relevance for us. If the observer neither understands English nor knows about different English-speaking channels, both <italic>CNN</italic> and <italic>BBC</italic> are categorized as <italic>TV channels I don&#x00027;t understand</italic>, and it makes very little difference in her belief update which one is showing. Similarly, language learners initially filter the L2-input (and try to build predictions about it) using the categories in their L1, which in turn determine what is surprising in the L2-input and what is not. Also, they rely heavily on L1-L2 similarities, for example by exploiting overlapping categories in the lexical aspect domain or in the grammatical aspect domain (depending on what dimension is marked in their L1) in learning the tense-aspect system of the new language (Izquierdo and Collins, <xref ref-type="bibr" rid="B72">2008</xref>; Shirai, <xref ref-type="bibr" rid="B141">2009</xref>). Learners do not pay attention to the <italic>snow</italic> in L2, that is to stimuli that are highly unpredictable to them because they are beyond their level, but focus on stimuli which they have a meaningful category for (see also Palm, <xref ref-type="bibr" rid="B121">2012</xref>).</p>
<p>In a similar vein, Relevance Theory (Sperber and Wilson, <xref ref-type="bibr" rid="B144">1986</xref>; Wilson and Sperber, <xref ref-type="bibr" rid="B165">2004</xref>) argues that comprehenders are driven by a search for <italic>relevance</italic>, under a <italic>presumption of optimal relevance</italic>. As the goal of comprehension is to construct a plausible hypothesis about the speaker&#x00027;s meaning, stimuli are optimally relevant if and only if (1) they are compatible with what we know of the communicator&#x00027;s abilities and preferences and (2) they are worth the audience&#x00027;s processing effort, because they contribute to confirming or correcting our hypotheses about the speaker&#x00027;s meaning (Wilson and Sperber, <xref ref-type="bibr" rid="B165">2004</xref>). Stimuli that are not relevant enough or that do not yield any cognitive effect (that is, do not confirm a hypothesis or correct a mistaken assumption about the speaker&#x00027;s meaning) are disregarded as not worth the processing effort. <italic>Snow</italic> stimuli are not worth the processing effort as they do not have any effect in confirming or correcting our hypotheses.</p>
</sec>
<sec>
<title>Summary</title>
<p>Predictability-based models have been very successful in accounting for processing costs during language comprehension, but (at least in their current implementations) they seem to have overlooked some aspects of linguistic processing, which suggest that the unexpectedness of a stimulus may not be the only factor determining how useful, interesting or difficult the stimulus is. In the next section, we will pinpoint these aspects in terms of salience and attention. In order to do so, we will first clarify some terminological issues related to salience in linguistics and its relation with predictability.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Salience in vision and salience in language</title>
<p><italic>Salience</italic> is a widely used term in linguistics, often referring to very different aspects of language comprehension and production (Chiarcos et al., <xref ref-type="bibr" rid="B18">2011</xref>; Blumenthal-Dram&#x000E9; et al., <xref ref-type="bibr" rid="B9">2014</xref>), such as the <italic>acoustic salience</italic> of the linguistic input (R&#x000E1;cz, <xref ref-type="bibr" rid="B124">2013</xref>) or the <italic>visual salience</italic> of a scene during language-relevant tasks (Kelleher, <xref ref-type="bibr" rid="B83">2011</xref>), but also the <italic>discourse salience</italic> of referents (Osgood and Bock, <xref ref-type="bibr" rid="B120">1977</xref>) or the salience of entities in the described situation (<italic>simulation-based</italic> or <italic>situation-based salience</italic>; Claus, <xref ref-type="bibr" rid="B23">2011</xref>). As with visual cognition, language understanding also seems to be influenced by low-level properties (of the visual scene or of the linguistic stimulus) and by high-level conceptual representations and goals. While in visual cognition salience is mainly used to refer to perceptual salience driven by low-level visual properties, in linguistics the same term is used to refer to two potentially contrasting properties of the stimulus (Blumenthal-Dram&#x000E9; et al., <xref ref-type="bibr" rid="B9">2014</xref>): for example, <italic>acoustic salience</italic> is typically meant to be a low-level perceptual property of the signal (depending on its transitional probabilities), attracting attention in a bottom-up fashion as <italic>visual salience</italic> does, whereas <italic>discourse</italic> and <italic>simulation-based salience</italic> typically exert a top-down influence which makes certain upcoming input more expected.</p>
<p>This terminological inconsistency is not completely unmotivated, as we will see in Section 3.3, but it leads to an apparent paradox when it comes to linking these models to measures of processing cost and to relating salience to predictability. Bottom-up salience, being a property of low-predictability stimuli, is expected to require additional processing effort (Hanul&#x000ED;kov&#x000E1; et al., <xref ref-type="bibr" rid="B62">2012</xref>), whereas top-down salience, being a property of accessible, high-predictability or recently accessed entities, is argued to facilitate processing (Claus, <xref ref-type="bibr" rid="B23">2011</xref>). We will now address this inconsistency by capitalizing on work on visual search in order to clarify the relationship between predictability and salience.</p>
<sec>
<title>3.1. Salience in visual cognition</title>
<p>Attention is a cognitive necessity: the amount of information our optic nerve receives<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref> far exceeds what our brain can process and transform into conscious experience. Attention selects the relevant information, easing the processing burden (Wolfe and Horowitz, <xref ref-type="bibr" rid="B167">2004</xref>; Awh et al., <xref ref-type="bibr" rid="B3">2012</xref>). Attention is also an evolutionarily beneficial trait: our survival depends on our ability to filter and prioritize useful or interesting parts of our perceptual experience (attention-capturing or <italic>salient</italic> parts) over overly predictable or uninteresting ones, in order to quickly identify and react to potentially dangerous or rewarding stimuli. Research in visual cognition has long focused on pinning down factors that drive attention (Mackworth and Morandi, <xref ref-type="bibr" rid="B107">1967</xref>; Loftus and Mackworth, <xref ref-type="bibr" rid="B105">1978</xref>), and has identified two main components of attentional deployment (see Itti and Koch, <xref ref-type="bibr" rid="B71">2000</xref>, for a review): a bottom-up, fast mechanism based on the stimulus <italic>salience</italic> and a slower, top-down mechanism based on goals and tasks.</p>
<p><italic>Salience</italic> or <italic>saliency</italic> is defined by early features of the visual stimulus, such as color, intensity and orientation, which are claimed to drive preattentive selection (Koch and Ullman, <xref ref-type="bibr" rid="B88">1985</xref>; Itti and Koch, <xref ref-type="bibr" rid="B71">2000</xref>), determining effects such as the <italic>pop-out</italic> effect (observed when a target stimulus differs from its background distractors on at least one feature dimension). Itti and Koch (<xref ref-type="bibr" rid="B71">2000</xref>) describe a computational model of preattentive selection based on <italic>saliency maps</italic>, where each unit is activated based on low-level perceptual features and the competition among active units determines a single, winning location (the most salient one), predicting the location of gaze; the winning location is then promptly inhibited and a new winning location is chosen, predicting gaze at the next step, so that the map is able to scan the visual input by visiting different parts in a sequential fashion. Bruce and Tsotsos (<xref ref-type="bibr" rid="B15">2009</xref>) start from the idea that efficient sampling should focus on the areas maximizing information, and define salience in information-theoretic terms, as local information (how informative / unexpected the content of a region is, based on surrounding context). Salient parts of the stimulus are outliers (Tatler et al., <xref ref-type="bibr" rid="B147">2011</xref>), deviating from the surrounding area, and are prioritized by efficient sampling strategies as they carry the most information.</p>
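The winner-take-all scan with inhibition of return described by Itti and Koch can be sketched in a few lines (a toy illustration over a handful of labeled locations with made-up activations; their actual model computes the map from feature channels of an image):

```python
def scan_saliency_map(saliency, n_fixations, inhibition=0.0):
    """Sequentially select the most active location (winner-take-all),
    then inhibit it so a new winner can emerge (inhibition of return)."""
    saliency = dict(saliency)  # work on a copy
    fixations = []
    for _ in range(n_fixations):
        winner = max(saliency, key=saliency.get)
        fixations.append(winner)
        saliency[winner] = inhibition  # suppress the attended location
    return fixations

# Toy map: activation per location, as if driven by low-level feature
# contrast (color, intensity, orientation); numbers are illustrative.
saliency_map = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1}
print(scan_saliency_map(saliency_map, 3))  # ['A', 'C', 'B']
```

The sequence of winners predicts the order of fixations: the map visits different parts of the input one at a time, rather than attending to everything at once.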
<p>Salience is a good predictor of gaze during free visual search, but top-down factors such as current goals, task relevance and rewards (Yarbus, <xref ref-type="bibr" rid="B170">1967</xref>; Folk et al., <xref ref-type="bibr" rid="B40">1992</xref>; Hayhoe and Ballard, <xref ref-type="bibr" rid="B65">2005</xref>) and recent selection history (see Awh et al., <xref ref-type="bibr" rid="B3">2012</xref>, for a review) have been shown to influence gaze and attention in the performance of a task and in the presence of real-world scenes with clear semantic content, competing with and prevailing over bottom-up attention capture (Folk et al., <xref ref-type="bibr" rid="B40">1992</xref>; Chen and Zelinsky, <xref ref-type="bibr" rid="B17">2006</xref>). The computational model in Rao et al. (<xref ref-type="bibr" rid="B128">2002</xref>) captures such top-down effects by computing salience as a function of the similarity between the low-level perceptual features of the stimulus and a search target, creating a <italic>top-down saliency map</italic>. Top-down factors pose the problem of modeling local and global sources of information within the same framework (e.g., Navalpakkam and Itti, <xref ref-type="bibr" rid="B117">2005</xref>; Torralba et al., <xref ref-type="bibr" rid="B149">2006</xref>; Zelinsky et al., <xref ref-type="bibr" rid="B172">2006</xref>), finding a suitable interaction between bottom-up models such as the salience-based model in Itti and Koch (<xref ref-type="bibr" rid="B71">2000</xref>) and top-down ones such as the target-based model in Rao et al. (<xref ref-type="bibr" rid="B128">2002</xref>).</p>
<p>Torralba et al. (<xref ref-type="bibr" rid="B149">2006</xref>) argue that a holistic representation of scene context needs to be taken into account when modeling gaze in search tasks on real-world scenes: their Contextual Guidance Model combines low-level saliency and global high-level and context features (e.g., scene priors and tasks) to create a <italic>scene-modulated saliency map</italic> selecting fixation sites. Similarly, Henderson et al. (<xref ref-type="bibr" rid="B66">2009</xref>) show that visually non-salient targets in expected locations are found more easily than salient regions that are not likely target locations. According to their Cognitive Relevance Framework, visual search is guided top-down by cognitive relevance, that is by the need of the cognitive system to make sense of the scene (based on task, semantic knowledge about the type of scene and episodic knowledge about the particular scene being viewed): objects will be prioritized depending on current information-gathering needs over their low-level visual salience.</p>
<p>Work in visual cognition has shown that the stimulus in itself can capture the perceiver&#x00027;s attention if it <italic>pops out</italic> from the background due to its low-level perceptual features (its visual salience), carrying information given its surround. Top-down factors such as the perceiver&#x00027;s goals, the features of a search target, relevance to the task, recent selection history, and cognitive relevance (prior semantic knowledge about the scene and expected objects) can override bottom-up factors in determining what locations capture attention. Linguistic salience can also be defined as a property of linguistic stimuli &#x0201C;standing out&#x0201D; from a ground. We will now show how this term has been used in linguistics to refer to both low-level attention-capturing properties of the stimulus and to top-down activation of contextually-relevant elements.</p>
</sec>
<sec>
<title>3.2. Linguistic salience as a stimulus-specific property</title>
<p>A common use of the term <italic>salience</italic> in linguistics indicates a property of a sociolinguistic variable that makes it cognitively prominent (Trudgill, <xref ref-type="bibr" rid="B150">1986</xref>; Kerswill and Williams, <xref ref-type="bibr" rid="B84">2002</xref>). For example, Definite Article Reduction (DAR) in North England is the realization of the definite article as a glottal stop before consonants and vowels, which is cognitively salient (noticeable) to a speaker of a different variety of English (R&#x000E1;cz, <xref ref-type="bibr" rid="B124">2013</xref>). What makes a variable in dialect <italic>D</italic> noticeable to a speaker of dialect <italic>D</italic>&#x02032; is not its frequency per se, but a notable relative difference between its occurrence in <italic>D</italic> and its occurrence in <italic>D</italic>&#x02032; that makes the variable &#x0201C;stand out.&#x0201D; A speaker of <italic>D</italic>&#x02032; would not commonly expect a glottal stop between vowels or before a stressed vowel: the DAR occurs in positions in <italic>D</italic> where it is much less likely to occur in <italic>D</italic>&#x02032;, and therefore has a low transitional probability (large surprisal) for a speaker of <italic>D</italic>&#x02032;. A variable that has cognitively salient realizations can, in turn, be a marker of social indexation, becoming socially salient.</p>
<p>These studies indicate that transitional probabilities may guide attention by selecting interesting parts of the acoustic signal, which crucially are those with high surprisal / high information content. Similarly, marked (and less frequent) prosodic or syntactic constructions (Lambrecht, <xref ref-type="bibr" rid="B98">1994</xref>) can be used by the speaker to direct the listener&#x00027;s focus on a part of the signal, emphasizing it by way of the low predictability of the construction (e.g., <italic>It was</italic> <underline><italic>Moses</italic></underline> <italic>who put two animals of each kind on the ark</italic>, see also Giv&#x000F3;n, <xref ref-type="bibr" rid="B53">1988</xref>). Acoustic salience and syntactic focus are low-level properties of the linguistic signal that capture the hearer&#x00027;s attention in a bottom-up fashion (similarly to <italic>pop-out</italic> effects in visual cognition) and that depend on the transitional probabilities of the relevant segments, that is on their <italic>surprisal</italic>. Identifying linguistic salience with surprisal is a tempting and, arguably, a theoretically elegant option. Salience in linguistics, on the other hand, has also been used to indicate aspects of processing that are not as easily accounted for by models of predictability and that we will now review.</p>
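The link between transitional probability and acoustic salience can be illustrated with a small sketch (the probabilities are invented for illustration; <italic>D</italic> and <italic>D&#x02032;</italic> stand for the two dialects in the DAR example):

```python
import math

def surprisal(prob):
    """Surprisal (information content) of an event, in bits."""
    return -math.log2(prob)

# Hypothetical transitional probabilities of a glottal stop between
# vowels under the bigram statistics of each dialect (made-up numbers):
p_glottal_between_vowels = {"D": 0.20, "D_prime": 0.001}

# The same segment carries far more information, and is thus more
# noticeable, for a listener whose expectations come from D':
print(surprisal(p_glottal_between_vowels["D"]))        # ~2.3 bits
print(surprisal(p_glottal_between_vowels["D_prime"]))  # ~10 bits
```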
</sec>
<sec>
<title>3.3. Linguistic salience as a situation-driven property</title>
<p>The term <italic>salience</italic> has been used in linguistics not only to refer to the property of a stimulus that stands out from a perceptual ground, but also to qualify entities that are prominent in the discourse model or the situation and influence comprehension in a top-down fashion, as in the case of <italic>discourse salience</italic> and <italic>situation-based salience</italic> (also referred to as <italic>semantic-pragmatic salience</italic>, see also Giora, <xref ref-type="bibr" rid="B52">2003</xref>). The idea behind these notions of salience is that, when understanding language, comprehenders maintain in their working memory a model of the evolving discourse context (Kamp, <xref ref-type="bibr" rid="B81">1981</xref>; Asher, <xref ref-type="bibr" rid="B2">1993</xref>; Kamp and Reyle, <xref ref-type="bibr" rid="B82">1993</xref>; Grosz et al., <xref ref-type="bibr" rid="B59">1995</xref>; Lascarides and Asher, <xref ref-type="bibr" rid="B99">2007</xref>) or, in a simulation-view of language comprehension, they run a mental simulation of the described situation (Zwaan and Radvansky, <xref ref-type="bibr" rid="B175">1998</xref>). If perceptual attention is necessary because we cannot focus on every aspect of the stimulus simultaneously, here the focus is on a different cognitive necessity, that is the limited capacity of our working memory: &#x0201C;only a few elements of the situation are available at any one time, that is the most salient ones at a particular time during processing&#x0201D; (Claus, <xref ref-type="bibr" rid="B23">2011</xref>). Salience is then accessibility in the discourse or situation model. 
High-accessibility entities are available for anaphoric binding and are likely to be mentioned in upcoming context (Osgood and Bock, <xref ref-type="bibr" rid="B120">1977</xref>; Levelt, <xref ref-type="bibr" rid="B101">1989</xref>; Grosz et al., <xref ref-type="bibr" rid="B59">1995</xref>; Vogels et al., <xref ref-type="bibr" rid="B163">2013</xref>). Discourse- and situation-based salience drive top-down predictions (derived from high-level information, be it the discourse model or the situation model) for what is going to be mentioned next, that is, for high-predictability entities.</p>
<p>Several factors may make an entity cognitively accessible / salient. An entity may be accessible because it is perceptually available in the shared visual context (Kelleher, <xref ref-type="bibr" rid="B83">2011</xref>, see Section 3.4), because it is mentioned (and possibly highlighted) in discourse<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref> (for example, if it is the subject, Vogels et al., <xref ref-type="bibr" rid="B163">2013</xref>), or because of a mental simulation of the described situation. Consider this example discussed by Claus (<xref ref-type="bibr" rid="B23">2011</xref>):
<list list-type="order">
<list-item><p><italic>John was preparing for a marathon in August. After doing a few warm-up exercises, he put on / took off his sweatshirt and went jogging. He jogged halfway around the lake without too much difficulty</italic>. (Glenberg et al., <xref ref-type="bibr" rid="B55">1987</xref>).</p></list-item>
</list></p>
<p>In the first version (<italic>put on</italic>), the <italic>sweatshirt</italic> is still part of the situation involving John at the end of the story (it is part of the Here and Now of the protagonist, Claus, <xref ref-type="bibr" rid="B23">2011</xref>), whereas in the second version (<italic>took off</italic>) it is not: the entity&#x00027;s accessibility depends on the situational representation. The Here and Now of the protagonist includes not only what is visible to her, but also what she can act upon, what is relevant to her goals and to her mental state (see also Carreiras et al., <xref ref-type="bibr" rid="B16">1997</xref>; Radvansky and Curiel, <xref ref-type="bibr" rid="B125">1998</xref>; Zwaan et al., <xref ref-type="bibr" rid="B174">2000</xref>; Borghi et al., <xref ref-type="bibr" rid="B13">2004</xref>), and determines which elements are accessible and likely to be mentioned next.</p>
<p>Situation-based salience can drive predictions that are different than those coming from lower-level representations. Consider the following examples:
<list list-type="order">
<list-item><p><italic>For breakfast the boys / the eggs would only eat / bury toast and jam</italic>. (Kuperberg et al., <xref ref-type="bibr" rid="B94">2003</xref>).</p></list-item>
<list-item><p><italic>A huge blizzard ripped through town last night. My kids ended up getting the day off from school. They spent the whole day outside building a big snowman / towel / jacket in the front yard</italic>. (Metusalem et al., <xref ref-type="bibr" rid="B113">2012</xref>).</p></list-item>
</list></p>
<p>As in visual cognition, when the context evokes a clear scenario (the <italic>breakfast</italic> scenario, the <italic>playing in the snow</italic> scenario), relevant elements, perfectly congruent with the scenario, are activated (<italic>eggs</italic> and <italic>eating</italic> in the first, <italic>snowman</italic> and <italic>jacket</italic> in the second). In one case, though, the scenario-fitting element (<italic>the eggs would only eat</italic> and <italic>building a big jacket</italic>) does not fit the verb&#x00027;s selectional preferences: the higher-level predictions coming from the scenario are incompatible with lower-level predictions coming from the lexical semantic level. The congruity with the scenario reduces the N400 effect, which is evoked by a semantic violation due to the scenario-incongruent element (<italic>They spent the whole day outside building a big towel</italic>) and by a verb which is not supported by context (<italic>For breakfast the boys would only bury</italic>). High-level salient representations are activated and generate predictions for upcoming input even when they would be an anomalous continuation from the lower, lexical-semantic level of representation.</p>
<p>High-level predictions depend on generalized knowledge about real-world events and their typical participants, which is acquired both from first-hand participation and from second-hand experience (including language) and stored in our long-term memory (McRae and Matsuki, <xref ref-type="bibr" rid="B111">2009</xref>). An interesting open question, in line with the discrepancy between frequency of events and frequency of memories which we brought up in Section 2.2, is how we map between our experience of these events and our representations. When we experience people <italic>making coffee</italic>, inferring the protagonist&#x00027;s goals and intentions may be as important as observing what things typically happen in the sequence. We might remember better to <italic>use filtered water</italic> rather than tap water if we know that the point is to avoid limestone deposits in our coffee machine: knowing <italic>why</italic> (inferring goals) may help us remember <italic>what</italic> is part of the scenario, making a difference between an uninteresting detail in the scenario and a relevant, even if infrequent, step in the process. Between experience and memory there is again a process of &#x0201C;attention, recognition, and coding,&#x0201D; mediated by the affect system (see Section 4) and shaped by hypotheses about what is relevant to us and to other people, which shapes our memory&#x00027;s probability distributions. Current models of surprisal, which work on the <italic>linguistic signal</italic> as it is, lack a mechanism to weight certain aspects of the signal more than others.</p>
<p>We have classified existing notions of salience in linguistics into two main categories, while also clarifying how they relate to predictability-driven language processing: stimulus-specific attributes, which attract the comprehender&#x00027;s attention in a bottom-up fashion, and situation- and discourse-driven accessibility of entities, which guides the comprehender&#x00027;s top-down predictions for upcoming stimuli. These two categories have something in common: they are properties of entities &#x0201C;standing out&#x0201D; from a ground (perceptual in one case, cognitive in the other) and are properties we rely on to deal with limitations of our cognitive resources (attention in one case and working memory in the other). Nevertheless, salience as a stimulus-specific property is characterized as high surprisal, whereas entities which are salient with regard to the discourse or to the situation are highly predictable (low surprisal). We will now clarify how one type of salience may influence the other and interact with visual salience, and we will then explain the interaction between bottom-up focus and top-down predictions.</p>
</sec>
<sec>
<title>3.4. Interactions between bottom-up visual and linguistic salience and situation-driven salience</title>
<p>Given that language comprehension and production often take place within a non-linguistic, perceptual context, predictions in language processing will in many cases be shaped by a combination of linguistic and visual salience. Indeed, there is ample evidence that speakers and listeners use stimulus-based properties of the visual environment in language planning and processing (e.g., Clark et al., <xref ref-type="bibr" rid="B22">1983</xref>; Tanenhaus et al., <xref ref-type="bibr" rid="B146">1995</xref>; Coco and Keller, <xref ref-type="bibr" rid="B24">2009</xref>; Koolen et al., <xref ref-type="bibr" rid="B90">2015</xref>). It is less clear how stimulus-specific visual cues interact with either bottom-up linguistic salience or with top-down situation-driven salience. Results from scene description experiments have suggested that visual cues can tap directly into the lexical-syntactic representation of the sentence, allowing them to interact with the lexical accessibility of a reference to an entity (e.g., Tomlin, <xref ref-type="bibr" rid="B148">1997</xref>; Gleitman et al., <xref ref-type="bibr" rid="B54">2007</xref>). More recent studies (e.g., Vogels et al., <xref ref-type="bibr" rid="B163">2013</xref>; Coco and Keller, <xref ref-type="bibr" rid="B25">2015</xref>), however, corroborate the view that visual cues only play a role in the high-level global apprehension of the scene, which in turn affects lower (lexical-syntactic) levels of linguistic processing (Griffin and Bock, <xref ref-type="bibr" rid="B56">2000</xref>; Bock et al., <xref ref-type="bibr" rid="B10">2004</xref>). Hence, stimulus-driven visual salience influences the situation model, but only situation-driven salience in turn affects linguistic formulation.</p>
<p>In this view, low-level visual features help &#x0201C;set the scene,&#x0201D; using attention to select important or relevant information. In language production, this influences how information is structured in an utterance (e.g., what is mentioned first). In language comprehension, visual saliency cues may be used to give weight to an entity (provided the listener has access to the same visual environment as the speaker), so as to adjust predictions about what will be mentioned next. Hence, what starts as a perceptual bottom-up, high-surprisal cue can become a top-down, high-predictability cue: a visually salient entity pops out as surprising, which gives it a salient status within the situation model; next, the mental representation of the salient entity will be highly accessible by virtue of its high news value. Consequently, this entity will be likely to be mentioned, and hence is predictable. Salience is thus a way to describe what is in the current focus of attention, even though in one stage of processing this attentional focus may be due to a bottom-up surprising stimulus, whereas in a later stage of processing the same stimulus may be in focus because it is now highly predictable.</p>
<p>Top-down predictions arising from low-level visual cues may interact with predictions coming from other sources. For example, bottom-up linguistic salience can also focus attention on a certain entity, as when it is marked as new information or as &#x02018;in focus&#x02019; (in the information structural sense, as in &#x0201C;Once upon a time there was <italic>a girl</italic>&#x0201D;). As pointed out in Section 3.3, this may influence top-down accessibility at different levels of representation (situation-level, discourse-level, lexical-syntactic). In turn, each level of representation sprouts its own predictions and production choices, such as &#x02018;which topic will be discussed next?&#x02019; (situation level) or &#x0201C;what linguistic form is appropriate here?&#x0201D; (lexical-syntactic level). These predictions may be either in line or in conflict with predictions induced by the visual context (e.g., when the girl is either very visually prominent or not at all), and hence may lead to reduced or increased processing cost, respectively. In addition, linguistic saliency cues from different levels of representation may be either in line or in conflict with each other, which may show up as a modulation in correlates of processing cost (as with the <italic>breakfast-eggs</italic> example).</p>
<p>In general, when multiple saliency cues from different sources (visual, linguistic, bottom-up, top-down) can potentially be used to weight parts of the perceptual input, they may affect language planning and processing in different ways: they may influence either the same level or separate levels of processing, and their combined influence may show up as interactive or additive effects, or one cue may override the others. Hence, the effect of bottom-up salience on processing difficulty and production choices can either be boosted or tempered by the integration with other stimulus-based cues or simulation-driven predictions. Crucially, whether one cue takes precedence over another is highly dependent on current task goals. For example, visual salience may play a different role in an object naming task than in a memorization task or a visual search task, because different parts of the scene will be relevant in each task (Coco et al., <xref ref-type="bibr" rid="B26">2014</xref>; Montag and MacDonald, <xref ref-type="bibr" rid="B115">2014</xref>). Comprehenders will also use their beliefs about the speaker&#x00027;s intention to guide their focus of attention.</p>
<p>In sum, comprehenders&#x00027; predictions as well as speakers&#x00027; production choices are influenced by different stimulus-based and situation-based saliency cues at different levels of processing: salience on a situation-model level may influence predictions about the likelihood of mention of an entity, while local linguistic predictions, such as which lexical form will be used, may be influenced by salience on a more local, lexical-syntactic level (Kaiser and Trueswell, <xref ref-type="bibr" rid="B78">2008</xref>; Vogels et al., <xref ref-type="bibr" rid="B163">2013</xref>). At the same time, low-level, stimulus-based salience (surprisal) may also exert an influence on high-level, situation-model salience, resulting in a complex interplay between predictions at different levels of representation. Finally, the weighting of all those different saliency cues will be highly dependent on task goals and speaker intentions.</p>
</sec>
<sec>
<title>Summary</title>
<p>Work in visual cognition has shown that a stimulus&#x00027;s low-level perceptual features (its visual salience) as well as top-down factors (goals, tasks, cognitive relevance) determine which locations capture attention. Salience-based approaches to language typically do not tackle the interaction between stimulus-specific properties of the linguistic signal and discourse- and situation-based salience, often adopting misleading terminology by calling both <italic>salience</italic>, and are ultimately not explicit with regard to the relationship between salience(s) and surprisal. We have shown that some aspects of linguistic salience (e.g., acoustic salience, markedness of prosodic or syntactic constructions), which capture the comprehender&#x00027;s attention in a bottom-up fashion, can easily be conflated with surprisal, but discourse- and situation-based salience cannot, as they are deeply intertwined with goals, tasks, and attention.</p>
<p>Predictability-based approaches go a long way in accounting for processing costs, but current surprisal-based models of language comprehension do not include a mechanism to focus on relevant levels of representation or on relevant parts of the stimulus based on the comprehender&#x00027;s task or on the recognition of the speaker&#x00027;s or the protagonist&#x00027;s goals. We will now review the Predictive Coding framework, illustrating how high- and low-level representations can influence expectations at the relevant level of processing, how top-down information can focus attention on particular stimuli, how stimulus properties can in turn capture attention and influence top-down predictions, and how attention, goals, and salience can be reconciled with surprisal.</p>
</sec>
</sec>
<sec id="s4">
<title>4. The predictive coding framework</title>
<p>Early studies in visual cognition argued that &#x0201C;perception is no passive sampling from external events&#x0201D; (Mackworth and Morandi, <xref ref-type="bibr" rid="B107">1967</xref>) and that there is &#x0201C;no perception without recognition&#x0201D; (Hake, <xref ref-type="bibr" rid="B60">1957</xref>). With the Predictive Coding framework (Rao and Ballard, <xref ref-type="bibr" rid="B127">1999</xref>; Friston, <xref ref-type="bibr" rid="B49">2010</xref>; Clark, <xref ref-type="bibr" rid="B21">2013</xref>) cognitive science completed a paradigm shift from the view of the brain as a &#x0201C;transformer of ambient sensations into cognition&#x0201D; to &#x0201C;a generator of predictions and inferences that interprets experience&#x0201D; (Mesulam, <xref ref-type="bibr" rid="B112">2008</xref>, p. 368). Predictive coding is fully compatible with the results from predictability-based approaches to language reviewed in Section 2 and has been argued to be the most appropriate framework to shed light on the interaction between high- and low-level representations in prediction-driven language comprehension (van Berkum, <xref ref-type="bibr" rid="B154">2010</xref>; Kuperberg, <xref ref-type="bibr" rid="B92">2016</xref>; Kuperberg and Jaeger, <xref ref-type="bibr" rid="B93">2016</xref>). Additionally, we argue that it provides a unique way to integrate surprise, surprisal and attention, and is thus an ideal candidate to model the interplay between salience and predictability.</p>
<p>In the Predictive Coding framework, the brain is conceptualized as a hierarchical architecture in which high- and low-level representations can influence predictions for expected input, and top-down models predict the flow of sensory data by modeling the source of the sensory input, that is, by actively generating a representation of the upcoming input before perceiving it. The information flow is bidirectional: perception involves explaining away the sensory input by cascading predictions from high-level units down to lower-level units, generating the desired activity in those units, then matching the predictions against the input and transmitting only the prediction error back to the higher levels. The prediction error, or surprisal, is the mismatch between the expected representation and the perceived representation. For example, if we are watching a video, our brain prepares for the next frame by predicting a representation of the figure in motion in the next stage of its movement. If the next frame depicts the expected continuation of the movement, the prediction error will be low; if the motion is interrupted or changes trajectory, or if the frame shows something completely unexpected, the prediction error will be high. Perceptually similar items and items that tend to occur in similar contexts will share a high degree of similarity in their representations. The prediction error is transmitted by dedicated &#x0201C;error units&#x0201D; and is used in turn to adjust future predictions to better match the input, resulting in a continuous cycle of prediction and error correction (Rao and Ballard, <xref ref-type="bibr" rid="B127">1999</xref>).</p>
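<p>The prediction-and-error-correction cycle can be made concrete with a toy sketch (our illustration, not an implementation of Rao and Ballard&#x00027;s hierarchical model; the constant input and the learning rate are arbitrary choices): feeding the prediction error back into the next prediction drives the error toward zero for a stable input.</p>

```python
# Minimal predictive-coding-style loop: a single unit predicts its input,
# and the mismatch (prediction error) is fed back to correct the next
# prediction, yielding a continuous cycle of prediction and error correction.

def predictive_cycle(signal, learning_rate=0.5):
    prediction = 0.0
    errors = []
    for observed in signal:
        error = observed - prediction          # mismatch: expected vs. perceived
        errors.append(abs(error))
        prediction += learning_rate * error    # adjust future predictions
    return prediction, errors

# A stationary input (an uninterrupted motion, say) yields decaying error;
# a sudden change in the input would make the error spike again.
prediction, errors = predictive_cycle([1.0] * 10)
```

<p>A sudden change in the signal (e.g., an interrupted motion) would produce a large error on that step, mirroring the high prediction error described above.</p>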
<p>The brain attempts to minimize prediction error through perception, action, and attention. <italic>Perception</italic> minimizes prediction error by trying to infer the nature of the signal source from the varying input signal, extracting repeating patterns and statistical regularities guided by the statistical history of events in our environment; <italic>action</italic> is used by the observer to move the sensors and resample the world, actively seeking expected stimuli (for example, by moving the body so as to receive a better signal). But not all error-unit responses have the same weight: <italic>attention</italic> is a means to weight reliable / relevant error-unit responses more than unreliable / irrelevant ones (Clark, <xref ref-type="bibr" rid="B21">2013</xref>). We will now see how the brain encodes prediction, as well as how it can use top-down information to inhibit bottom-up information, maximizing attention to task-relevant stimuli and suppressing task-irrelevant ones.</p>
<sec>
<title>4.1. Neural correlates of top-down and bottom-up processes</title>
<p>Communication in the brain occurs through neural firing, but, in order to parallelize operations, the brain operates multiple simultaneous communication channels at different firing frequencies (<italic>frequency-division multiplexing</italic>). Bottom-up information from perceptual stimuli is generally thought to be processed using high-frequency brain waves, such as those found in the gamma band (30&#x02013;100 Hz; e.g., Roux and Uhlhaas, <xref ref-type="bibr" rid="B133">2014</xref>). Top-down information, on the other hand, is generally thought to be stored as low-frequency brain waves, as in the theta (4&#x02013;7 Hz) or alpha (8&#x02013;12 Hz) bands, and several studies have suggested that lower frequencies serve to gate higher frequencies as a top-down control mechanism (e.g., Klimesch et al., <xref ref-type="bibr" rid="B87">2007</xref>; Sauseng et al., <xref ref-type="bibr" rid="B137">2010</xref>; Jensen et al., <xref ref-type="bibr" rid="B74">2012</xref>; Roux and Uhlhaas, <xref ref-type="bibr" rid="B133">2014</xref>).</p>
<p>Theta-band frequencies are thought to provide top-down envelopes that modulate the activation of bottom-up sequential information (Lisman and Buzs&#x000E1;ki, <xref ref-type="bibr" rid="B104">2008</xref>; Sauseng et al., <xref ref-type="bibr" rid="B135">2009</xref>; Holz et al., <xref ref-type="bibr" rid="B67">2010</xref>; Roux and Uhlhaas, <xref ref-type="bibr" rid="B133">2014</xref>). Essentially, the phase of the lower frequency encodes sequence positions, so when a high-frequency encoding of a stimulus is associated with a particular phase angle (sequence position) in the low-frequency signal, a corresponding association is made between the given stimulus and the selected sequence position. During each phase angle of the low-frequency brain wave, the amplitude of any associated bottom-up neural firing is boosted, producing a stronger signal for that percept. This mechanism, where the phase of a given frequency modulates the amplitude of a higher frequency, is called <italic>phase-amplitude coupling</italic> and uses frequency-division multiplexing to distinguish separate operations and time-division multiplexing to distinguish separate items (that is, each item corresponds to a separate point in the low-frequency phase).</p>
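<p>A toy simulation can illustrate phase-amplitude coupling (a deliberately simplified sketch: the frequencies, the coupling function, and the binned coupling measure are our own illustrative choices, not a standard analysis pipeline such as a Hilbert-transform-based modulation index):</p>

```python
import math

# Toy phase-amplitude coupling: the phase of a slow (theta-like) wave
# modulates the amplitude of a fast (gamma-like) wave.

def coupled_signal(n=1000, f_slow=6.0, f_fast=40.0, rate=1000.0, coupling=1.0):
    samples = []
    for k in range(n):
        t = k / rate
        slow_phase = 2 * math.pi * f_slow * t
        # Fast-wave amplitude is boosted at a particular slow-wave phase.
        amp = 1.0 + coupling * (1.0 - math.cos(slow_phase)) / 2.0
        samples.append((slow_phase % (2 * math.pi),
                        amp * math.sin(2 * math.pi * f_fast * t)))
    return samples

def modulation_depth(samples, bins=8):
    # Bin the fast wave's rectified amplitude by slow-wave phase; a large
    # spread across phase bins indicates phase-amplitude coupling.
    totals = [0.0] * bins
    counts = [0] * bins
    for phase, value in samples:
        b = min(int(phase / (2 * math.pi) * bins), bins - 1)
        totals[b] += abs(value)
        counts[b] += 1
    means = [tot / c for tot, c in zip(totals, counts) if c > 0]
    return max(means) - min(means)

coupled = modulation_depth(coupled_signal(coupling=1.0))
uncoupled = modulation_depth(coupled_signal(coupling=0.0))
```

<p>With coupling present, the fast-wave amplitude varies systematically with the slow-wave phase, so the depth measure is larger than for the uncoupled control signal.</p>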
<p>In contrast to sequence-based prediction, perceptual salience is controlled by phase-amplitude coupling between gamma-band and alpha-band frequencies (Jensen et al., <xref ref-type="bibr" rid="B75">2002</xref>; Klimesch et al., <xref ref-type="bibr" rid="B87">2007</xref>; Sauseng et al., <xref ref-type="bibr" rid="B135">2009</xref>; Bonnefond and Jensen, <xref ref-type="bibr" rid="B12">2015</xref>). Alpha-band waves generally inhibit other neural activation, so at the peak of an alpha wave, other signals can be completely suppressed. As the alpha wave transitions to a lower-power phase of its cycle, it exerts less inhibitory influence on other signals and can reveal those signals it would otherwise suppress (Klimesch et al., <xref ref-type="bibr" rid="B87">2007</xref>; Jensen et al., <xref ref-type="bibr" rid="B74">2012</xref>). Conversely, as the alpha wave transitions back to its peak, other signals become increasingly (re-)suppressed, which can produce an effect known as the <italic>attentional blink</italic>, whereby an alpha-band signal at a certain phase can inhibit or completely suppress the processing of a stimulus, such that the subject does not perceive the stimulus at all (Raymond et al., <xref ref-type="bibr" rid="B129">1992</xref>; Olivers, <xref ref-type="bibr" rid="B119">2007</xref>). Subjects seem to exploit this mechanism by adjusting the phase and power of their alpha waves in response to bottom-up observations, maximizing exposure to task-relevant stimuli and maximally suppressing task-irrelevant distractors (e.g., Worden et al., <xref ref-type="bibr" rid="B168">2000</xref>; Sauseng et al., <xref ref-type="bibr" rid="B136">2005</xref>; Mathewson et al., <xref ref-type="bibr" rid="B108">2009</xref>; Bonnefond and Jensen, <xref ref-type="bibr" rid="B11">2012</xref>, though see Firestone and Scholl, <xref ref-type="bibr" rid="B38">2015</xref>, for a dissenting review).</p>
<p>Phase-amplitude coupling thus uses the phase of top-down low-frequency control signals to increase the activation of select bottom-up high-frequency information signals, which literally increases the importance (salience) of those signals. Therefore, the communication frameworks that underlie our neurological operations seem to rely on simultaneous but distinct top-down and bottom-up processing signals, which can be independently measured during processing. For example, a future study might test how the N400 is modulated by varying target predictability (measurable by theta-gamma phase-amplitude coupling) and by varying the amount of target perceptual salience (measurable by alpha-gamma phase-amplitude coupling) afforded by the chosen task. Such a study would not have to rely on a priori, extrinsic measures of predictability (e.g., computed from <italic>n</italic>-gram statistics or incremental parsers) or salience (e.g., the number of words since a previous referent mention) but could instead model the actual probability and salience of each target and determine how those factors (as actually manifested during the experiment) influence processing.</p>
<p>Phase-amplitude coupling has already provided some support for the Predictive Coding framework (in addition to a wide array of other neurological evidence; see Lewis and Bastiaansen, <xref ref-type="bibr" rid="B103">2015</xref>, for a review of evidence from other neural measures). Intracranial electroencephalography (iEEG) studies (e.g., Zion Golumbic et al., <xref ref-type="bibr" rid="B173">2013</xref>; Fontolan et al., <xref ref-type="bibr" rid="B41">2014</xref>) have shown that top-down neural firing entrains to task-relevant auditory input, amplifying relevant input while suppressing irrelevant input. These results also suggest that top-down attention in auditory association cortex is modulated as a function of bottom-up information from primary auditory cortex. Thus, top-down frequencies tune attention by focusing on aspects of bottom-up input that are made relevant both by the task and by accumulated sources of prediction error.</p>
</sec>
<sec>
<title>4.2. Attention and goals</title>
<p>Attention balances the interaction between top-down predictions and bottom-up influences, weighting reliable / useful sources of prediction error more, and ultimately determining what levels and what parts of the stimulus are relevant at each moment. Attention is thus an ideal candidate to switch between levels of processing, which can account for a number of task- and goal-related effects in language comprehension.</p>
<p>Experimental work has shown that task influences the level of processing: Chwilla et al. (<xref ref-type="bibr" rid="B19">1995</xref>) contrasted a lexical decision task (<italic>is the target a Dutch word?</italic>) with a physical task (<italic>did the target appear in uppercase?</italic>) and observed semantic priming effects (on the N400 and on reaction times) only when the task required accessing the word-meaning level (the lexical decision task). Rayner and Raney (<xref ref-type="bibr" rid="B130">1996</xref>) showed that frequency effects found in a reading task disappeared when participants were instead given the task of searching for a target word in the text, while in Kaakinen and Hy&#x000F6;n&#x000E4; (<xref ref-type="bibr" rid="B77">2010</xref>) and Schotter et al. (<xref ref-type="bibr" rid="B139">2014</xref>) the effect of frequency was instead increased in a proofreading task compared to a reading-for-comprehension task. Schotter et al. (<xref ref-type="bibr" rid="B139">2014</xref>) additionally showed that the size of the frequency effect increased in proofreading when misspelled words were non-words, while the size of the predictability effect increased when the relationship between words was crucial to identifying spelling errors (that is, when misspelled words happened to be real words and the spelling mistake was only revealed by context). Xiang and Kuperberg (<xref ref-type="bibr" rid="B169">2015</xref>) contrasted a reading-for-comprehension task with a coherence rating task, showing that the coherence rating task facilitated a deeper situation-level representation of the context and the subsequent prediction of upcoming words. Tasks and goals determine what level we pay attention to, which level is relevant in the architecture, and ultimately how detailed and specified our predictions are.</p>
</sec>
<sec>
<title>4.3. Attention and affect</title>
<p>Both the ability to predict what comes next and the ability to focus our attention on relevant stimuli are evolutionarily beneficial traits. The interoceptive and exteroceptive sensations perceived by our body (<italic>affective</italic> bodily changes, Barrett and Bar, <xref ref-type="bibr" rid="B6">2009</xref>; Craig, <xref ref-type="bibr" rid="B27">2009</xref>) determine the <italic>valence</italic> of perceived stimuli, that is, their being perceived as pleasant and rewarding or as painful and dangerous, a dimension which is possibly even more important for our survival. Valence is arguably also involved in language processing: van Berkum (<xref ref-type="bibr" rid="B154">2010</xref>) argues that language use, being an instance of social interaction, is entrenched in valence and affect, which arguably are part of the representations not only of emotionally-loaded lexical items, such as <italic>abortion</italic> or <italic>euthanasia</italic>, but of all lexical semantic content grounded in experience. The affect system is the neural circuitry that processes valence, and includes a broad set of cortical and subcortical brain areas such as the amygdala, the ventral striatum, the orbitofrontal cortex, the ventromedial prefrontal cortex, the cingulate cortex, the hypothalamus, and autonomic control centers in the brainstem (Barrett and Bar, <xref ref-type="bibr" rid="B6">2009</xref>; LeDoux, <xref ref-type="bibr" rid="B100">2000</xref>).</p>
<p>Valence is an integral dimension of perception and attention: the neurotransmitter dopamine, a key player in motivated and goal-directed behavior and in the resampling of stimuli that have been associated with rewards (reinforcement learning, Wise, <xref ref-type="bibr" rid="B166">2004</xref>), is also activated by surprising stimuli, such as sudden visual or auditory stimuli, that have never been associated with rewards (Horvitz, <xref ref-type="bibr" rid="B68">2000</xref>). Kakade and Dayan (<xref ref-type="bibr" rid="B79">2002</xref>) have proposed that dopamine activations are <italic>novelty bonuses</italic> that increase the probability of re-sampling not only typically rewarding stimuli, but also surprising stimuli (see Barto et al., <xref ref-type="bibr" rid="B7">2013</xref>, for a discussion of novelty vs. surprise), acting as a facilitator of exploratory action and perception. These properties make dopamine an ideal candidate for encoding precision of error units in the Predictive Coding framework (Fletcher and Frith, <xref ref-type="bibr" rid="B39">2009</xref>; Clark, <xref ref-type="bibr" rid="B21">2013</xref>). Interestingly, dopamine is also involved in the &#x02018;stamping-in&#x02019; of memory (Wise, <xref ref-type="bibr" rid="B166">2004</xref>), by loading environmental stimuli with motivational importance. Attention, affect and value drive learning, determining the strength of learned representations and ultimately making learning possible. The somatic marker hypothesis (Damasio, <xref ref-type="bibr" rid="B28">1994</xref>) and, more recently, the affective prediction hypothesis (Vuilleumier, <xref ref-type="bibr" rid="B164">2005</xref>; Barrett and Bar, <xref ref-type="bibr" rid="B6">2009</xref>) and the interoceptive Predictive Coding model (Seth et al., <xref ref-type="bibr" rid="B140">2011</xref>) suggest that affect and valence do not follow perception but instead are an integral part of it, for example driving object recognition. 
In a similar vein, Clark (<xref ref-type="bibr" rid="B21">2013</xref>) argues that nearly every aspect of perception is permeated by goal- and affect-laden expectations, and that the very division between emotional and non-emotional components may prove to be illusory. The affect system is arguably also the missing piece of the puzzle between physical experience and memory, a process shaped not only by frequency but also by our attentional processes and valence systems.</p>
</sec>
<sec>
<title>Summary</title>
<p>The studies reviewed here show that surprisal is not the only factor determining processing costs. The stimuli&#x00027;s relevance to the perceiver&#x00027;s goals, their valence and, crucially in the case of linguistic communication, their relevance to what we know of the speaker&#x00027;s abilities and preferences and their utility in confirming or correcting our hypotheses about the speaker&#x00027;s meaning determine what we pay attention to and what we are surprised by. At the two extremes of the predictability scale, stimuli can turn out to be too predictable (thus incompatible with what we assume to be relevant for the speaker&#x00027;s communicative goals) or too unpredictable (too costly and irrelevant, not worth the processing effort, or impossible to accommodate within our system of categories), and we may divert our attention from both. On the other hand, relevant, unattended stimuli can be prioritized over task-irrelevant ones (for example, we can become aware of a deer by the side of the road, Jensen et al., <xref ref-type="bibr" rid="B74">2012</xref>), or incongruent objects may capture our attention if their perceptual salience is high enough (Coco et al., <xref ref-type="bibr" rid="B26">2014</xref>). Tasks and goals determine what level of processing is relevant and thus what level we pay attention to. A linking hypothesis aimed at indexing predictability and salience needs to account for these phenomena: high-level <italic>surprise</italic> may only be influenced by the relevant level of processing at any given time, and <italic>surprisal</italic> at lower levels may not influence the behavioral response (unless it surpasses a certain threshold).</p>
<p>Predictive Coding provides an interesting framework for reconciling low-level computations of surprisal, high-level representations and hypotheses about the world and attentional focus mechanisms. We have reviewed recent work in neuroscience showing how our brain exploits multiple simultaneous channels at different firing frequencies to process perceptual stimuli bottom-up using high-frequency brain waves, while top-down information, at low-frequency brain waves, maximizes exposure to task-relevant stimuli by modulating the activation of relevant bottom-up information and suppressing task-irrelevant distractors. Attention is the mechanism we use to weight error-unit responses (in response to high-surprisal, attention-capturing input, or in response to relevant, interesting input, or as a function of the stimulus valence) over less interesting or informative ones. By weighting reliable sources of prediction error, attention and affect are the filter between perception and learned representations, and in the long-term shape our memories and beliefs. In the next section we will discuss in what way current surprisal models can be conceptually extended to yield more accurate accounts of language processing behavior.</p>
</sec>
</sec>
<sec id="s5">
<title>5. Implications for models of processing difficulty: surprise, attention, affect</title>
<p>As discussed in this article, surprisal is a promising measure. Nevertheless, if our goal is not only to measure the amount of information contained in the linguistic signal, but also to describe how this amount of information relates to human processing difficulty, we need to also take into account effects of attention, namely (a) attention shifts from extremely predictable or too unpredictable stimuli, (b) the interplay of high- and low-level representations during language processing, mediated by attention and relevance, (c) the goal-dependent influence of higher-level representations, and (d) affect and valence and their influence on the learning of higher-level abstractions. We have argued that predictability and attention find a natural integration in the Predictive Coding framework, which accounts for how and why comprehenders generate predictions at multiple levels when processing language. In this framework, bottom-up properties of the signal are integrated with predicted percepts based on stored representations at multiple levels and grains of representation (van Berkum, <xref ref-type="bibr" rid="B154">2010</xref>; Farmer et al., <xref ref-type="bibr" rid="B34">2013</xref>; Kuperberg and Jaeger, <xref ref-type="bibr" rid="B93">2016</xref>). During processing, a new percept will in turn be used to generate updated predictions about the next part of the input. The Predictive Coding framework is however not an implemented computational model that we can run on a new text (or multi-modal input) to obtain processing difficulty predictions. Therefore, we will now propose how a computational model of surprisal could be extended to account for effects of attention. In particular, we argue that each representational level (auditory / visual, lexical, structural, situational) might need its own attention modulation.</p>
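<p>The proposal that each representational level might need its own attention modulation can be sketched as follows (a purely hypothetical illustration: the levels, probabilities, and task-dependent weights are invented, and a real model would have to estimate all of them):</p>

```python
import math

# Hypothetical sketch: overall processing cost as an attention-weighted sum
# of surprisal at each representational level, with weights set by the task.

def surprisal(p):
    return -math.log2(p)  # surprisal in bits

def weighted_cost(level_probs, attention_weights):
    # level_probs: per-level probability of the percept given context
    # attention_weights: task-dependent weighting of each level's error signal
    return sum(attention_weights[level] * surprisal(p)
               for level, p in level_probs.items())

level_probs = {"lexical": 0.25, "syntactic": 0.5, "situation": 0.125}
# A comprehension task might weight the situation level most heavily,
# a proofreading task the lexical level (both weightings invented):
comprehension = {"lexical": 0.2, "syntactic": 0.2, "situation": 0.6}
proofreading = {"lexical": 0.6, "syntactic": 0.2, "situation": 0.2}

cost_comprehension = weighted_cost(level_probs, comprehension)
cost_proofreading = weighted_cost(level_probs, proofreading)
```

<p>With these invented numbers, the same stimulus incurs different costs under the two tasks, because each task weights the prediction error of a different level.</p>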
<p>Surprisal models are trained to accurately account for upcoming words, that is, the objective function during training of such models is to minimize prediction error. Consider for example an <italic>n</italic>-gram model, which predicts the surprisal of a word <italic>w</italic><sub><italic>i</italic></sub> based on the preceding sequence of <italic>n</italic> words, formalized as
<disp-formula id="E2"><mml:math id="M2"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Surprisal</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
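<p>For concreteness, the formula above can be instantiated as a bigram (<italic>n</italic> = 2) model estimated from raw counts (a toy sketch with an invented miniature corpus and no smoothing; real models use large corpora and smoothed estimates):</p>

```python
import math

# Bigram instance of the surprisal formula: Surprisal(w_i) = -log2 P(w_i | w_{i-1}),
# with probabilities estimated from raw counts over a tiny invented corpus.

corpus = "the girl saw the dog and the girl smiled".split()

bigram_counts = {}
context_counts = {}
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[(prev, word)] = bigram_counts.get((prev, word), 0) + 1
    context_counts[prev] = context_counts.get(prev, 0) + 1

def bigram_surprisal(word, prev):
    p = bigram_counts.get((prev, word), 0) / context_counts[prev]
    return -math.log2(p)

# "girl" follows "the" in 2 of 3 cases, "dog" in 1 of 3, so "dog" is
# less predictable and carries higher surprisal.
s_girl = bigram_surprisal("girl", "the")
s_dog = bigram_surprisal("dog", "the")
```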
<p>In <italic>n</italic>-gram models there is no explicit modeling of syntax, semantic similarities, situational context representations or world knowledge. These models might therefore miss important generalizations or phenomena that are conditioned on words outside a window of <italic>n</italic> preceding words. However, with a lot of data and large contexts, many of the relevant statistics may be learned and represented by the model implicitly. <italic>N</italic>-gram models might therefore deliver good surprisal estimates for upcoming words, i.e., they might successfully predict upcoming words. Unfortunately, though, it is not clear how attention-based effects could be implemented in a model where the representation of linguistic knowledge is merely implicit. In such a model, the surprisal estimates would represent a combination of prediction errors and updates at all representation levels, i.e., they would be an approximation of the overall prediction error of a hierarchical architecture transmitting the prediction error up through all higher levels, and passing new updated anticipatory activations down. In order to adapt to a different task (e.g., reading for comprehension vs. spell checking), the model would have to be re-trained with a different objective function reflecting task-dependent costs of prediction errors.</p>
<p>A potential solution for modeling the hierarchical prediction process could therefore be to build models that themselves have a hierarchical architecture. Models with richer internal representations of linguistic structure and situational knowledge have recently been proposed. For instance, syntactic surprisal models internally represent syntactic structure (syntax tree <italic>t</italic> &#x02208; <italic>T</italic>) to estimate the predictability of upcoming words by calculating the difference in prefix probabilities (that is, the probability of observing sentence prefix <italic>w</italic><sub>1</sub>..<italic>w</italic><sub><italic>i</italic></sub>) before vs. after observing a word <italic>w</italic><sub><italic>i</italic></sub>. As Levy (<xref ref-type="bibr" rid="B102">2008</xref>) shows, this formula is equivalent to our definition of surprisal &#x02212;log<italic>P</italic>(<italic>w</italic><sub><italic>i</italic></sub>|<italic>w</italic><sub>1</sub>..<italic>w</italic><sub><italic>i</italic>&#x02212;1</sub>).</p>
<disp-formula id="E3"><mml:math id="M3"><mml:mtable columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Surprisal</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
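<p>The prefix-probability formulation can be illustrated with a toy example (the table of joint probabilities <italic>P</italic>(<italic>t</italic>, <italic>w</italic><sub>1</sub>..<italic>w</italic><sub><italic>i</italic></sub>) is invented for illustration; a real syntactic surprisal model would derive these quantities from an incremental probabilistic parser):</p>

```python
import math

# Surprisal as the log-ratio of total probability mass, summed over
# candidate analyses t, before vs. after observing word w_i.

# Invented joint probabilities P(t, w1..wi) for two competing analyses.
prefix_mass = {
    "the horse raced": {"main-verb": 0.08, "reduced-relative": 0.02},
    "the horse raced past": {"main-verb": 0.008, "reduced-relative": 0.012},
}

def prefix_surprisal(prefix_before, prefix_after):
    mass_before = sum(prefix_mass[prefix_before].values())
    mass_after = sum(prefix_mass[prefix_after].values())
    # -log sum_t P(t, w1..wi) + log sum_t P(t, w1..wi-1), in bits
    return -math.log2(mass_after) + math.log2(mass_before)

s_past = prefix_surprisal("the horse raced", "the horse raced past")
```

<p>Note that the surprisal depends only on the total mass over analyses, even though observing the word redistributes probability among the analyses themselves.</p>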
<p>There have also been attempts to further extend computational models to capture topic context (e.g., Griffiths et al., <xref ref-type="bibr" rid="B57">2007</xref>), semantic surprisal (e.g., Mitchell et al., <xref ref-type="bibr" rid="B114">2010</xref>) or situation and event sequence knowledge (Frank et al., <xref ref-type="bibr" rid="B44">2008</xref>; Venhuizen et al., <xref ref-type="bibr" rid="B162">2016</xref>). A situation model representing situations <italic>S</italic> compatible with the prefix perceived so far and syntactic trees <italic>T</italic> that are consistent with the sentence prefix <italic>w</italic><sub>1</sub>..<italic>w</italic><sub><italic>i</italic>&#x02212;1</sub> could be represented as<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>
<disp-formula id="E4"><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">Surprisal</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x0002B;</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
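To make the marginalization concrete, here is a minimal toy sketch (not the authors' implementation; the joint probabilities are invented purely for illustration) that computes the surprisal of the next word from a table of joint prefix probabilities <italic>P</italic>(<italic>t</italic>, <italic>s</italic>, <italic>w</italic><sub>1</sub>..<italic>w</italic><sub><italic>i</italic></sub>), summing over situations and trees exactly as in the equation above:

```python
import math

def surprisal(joint_prefix_prob, prefix, word):
    """Surprisal(w_i) = -log P(w_1..w_i) + log P(w_1..w_{i-1}),
    where each prefix probability marginalizes over situations s and
    syntactic trees t: P(w_1..w_i) = sum_{s,t} P(t, s, w_1..w_i)."""
    def prefix_prob(words):
        return sum(p for (t, s, ws), p in joint_prefix_prob.items() if ws == words)
    return -math.log(prefix_prob(prefix + (word,))) + math.log(prefix_prob(prefix))

# Invented joint distribution P(t, s, w_1..w_i) over two trees and two situations.
joint = {
    ("t1", "s1", ("the", "cat")): 0.20,
    ("t2", "s1", ("the", "cat")): 0.10,
    ("t1", "s2", ("the", "cat", "sat")): 0.15,
    ("t2", "s2", ("the", "cat", "sat")): 0.05,
}
print(surprisal(joint, ("the", "cat"), "sat"))  # -log(0.20/0.30) = log 1.5 ≈ 0.405 nats
```

The two logarithmic terms correspond to the two sums in the formula: the marginal probability of the prefix including <italic>w</italic><sub><italic>i</italic></sub> and that of the prefix up to <italic>w</italic><sub><italic>i</italic>&#x02212;1</sub>.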
<p>A hierarchical model (see also Farmer et al., <xref ref-type="bibr" rid="B34">2013</xref>; Kuperberg, <xref ref-type="bibr" rid="B92">2016</xref>; Kuperberg and Jaeger, <xref ref-type="bibr" rid="B93">2016</xref>) then allows us to calculate surprisal separately at each level of representation. We can decompose the overall joint prefix probability used to calculate the information update from one word to the next into prefix probabilities with respect to each level of representation:
<disp-formula id="E5"><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mo>-</mml:mo><mml:mo class="qopname">log</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>S</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x02208;</mml:mo><mml:mi>T</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x000D7;</mml:mo><mml:mtext>&#x000A0;</mml:mtext><mml:mi>P</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x000D7;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>.</mml:mo><mml:mo>.</mml:mo><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>The information update can thus be calculated separately for each specific level of representation, and is equivalent to Itti and Baldi&#x00027;s (<xref ref-type="bibr" rid="B70">2009</xref>) Bayesian <italic>surprise</italic> for that level. With such a hierarchical model, it would be possible to attach a separate linking theory to each level of representation. These could then be used to model the time course of processing, or specific ERPs.</p>
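The per-level information update can be sketched as a Kullback-Leibler divergence between a level's distribution before and after the new word, in the spirit of Itti and Baldi's Bayesian surprise (the distributions and situation labels below are invented for illustration, not taken from any implementation):

```python
import math

def bayesian_surprise(prior, posterior):
    """KL(posterior || prior): the information update at one
    representational level after perceiving a new word."""
    return sum(q * math.log(q / prior[x]) for x, q in posterior.items() if q > 0)

# Toy example: a distribution over candidate situation models before and
# after a disambiguating word.  A level whose distribution barely moves
# contributes little to observable processing difficulty.
prior     = {"s_picnic": 0.5, "s_storm": 0.5}
posterior = {"s_picnic": 0.9, "s_storm": 0.1}
print(bayesian_surprise(prior, posterior))  # ≈ 0.368 nats
```

A separate linking function per level could then map each such per-level update onto a behavioral or neural measure.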
<p>In our review, we observed that attention is distributed among incoming stimuli and processing levels, that goals may affect processing and attention, and that not all error signals, even large ones, will necessarily affect higher-level representations. We will now briefly discuss how each of these aspects can be addressed by a hierarchical model with separate linking theories per representation level.</p>
<p>Attention is limited and hence has to be distributed among different stimuli. The reviewed evidence also supports the idea that not all representations and levels of processing need to be actively &#x0201C;at work&#x0201D; to the same extent in all tasks: in a task such as spell-checking, levels that are not relevant to the task (e.g., coherence, meaning) may receive little attention and contribute little to observable processing difficulty. Sanford and Sturt (<xref ref-type="bibr" rid="B134">2002</xref>) make the case for underspecified representations: we do not need to fully specify the linguistic signal at all possible levels; full specification is only needed for the levels of representation that are in the focus of attention, whereas levels outside the focus of attention may be subject to shallower or incomplete processing. Sanford and Sturt (<xref ref-type="bibr" rid="B134">2002</xref>) also observe that underspecified representations sometimes lead to errors, such as semantic illusions, which can be avoided by manipulating focus (e.g., <italic>It was Moses who put two of each kind of animal on the ark</italic>; Bredart and Modolo, <xref ref-type="bibr" rid="B14">1988</xref>). To model phenomena like semantic illusions, one would assume that the lexical semantic representation layer for the actor (<italic>Moses/Noah</italic>) is not in the focus of attention during the critical region of this stimulus, and hence elicits only a small (or no) prediction error. The mismatch may therefore fail to propagate to other levels of representation and not affect the overall interpretation (that is, slip through unnoticed). The hierarchical model could specify a different linking function for each level of representation, and could then naturally account for task-dependent effects, such as the different strengths of predictability effects across tasks.</p>
<p>Another apparent paradox that we discussed in Section 2.3 was the snow-screen paradox (Itti and Baldi, <xref ref-type="bibr" rid="B70">2009</xref>): processing difficulty for an uninteresting fixed screen (e.g., a blue screen) and for a randomly-changing snow screen is intuitively similar, even though the surprisal of these two percepts differs enormously. While the prediction error when viewing a snow screen may be very large at the level of the visual cortex, this prediction error does not serve to update higher-level representations of the relevant semantics, as no interpretation of exact snow-screen patterns exists in the viewer&#x00027;s mind (the relevant categorization that could react to the incoming prediction error is not in place). The formulation of higher-level surprise also makes it explicit that a prediction error at a lower level only affects probability estimates at higher-level representations insofar as it also changes higher-level probability distributions: an exact pattern of snow might be very unpredictable, but the probability distribution over TV programmes <italic>P</italic>(<italic>TV_program</italic>|<italic>pixels</italic>) will not be affected by the likelihood of the exact pixel arrangement in the snow (at least not after a few snow screens have already been perceived). Hence, these higher-level representations do not show any prediction error, and the overall processing difficulty is low.</p>
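The snow-screen intuition can be made concrete with invented toy numbers (for illustration only): each exact snow frame has a tiny probability at the pixel level, so low-level surprisal is large, yet the induced distribution over high-level interpretations does not move, so the high-level information update is zero:

```python
import math

# Low level: each exact snow frame is one of many equiprobable patterns,
# so its surprisal is large.
n_patterns = 2 ** 20
pixel_surprisal = -math.log(1.0 / n_patterns)  # = 20 * log 2 ≈ 13.9 nats

# High level: the distribution over interpretations is the same before
# and after each new snow frame, so the information update is zero.
def kl(posterior, prior):
    return sum(q * math.log(q / prior[x]) for x, q in posterior.items() if q > 0)

prior     = {"tv_program": 0.01, "no_signal": 0.99}
posterior = {"tv_program": 0.01, "no_signal": 0.99}  # unchanged by the new frame
print(pixel_surprisal, kl(posterior, prior))  # large low-level surprisal vs. 0.0
```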
<p>A similar situation could occur when a comprehender listens to somebody speaking in English (a language that the listener understands) who then switches to Finnish (a language she does not understand). In that case, processing difficulty would not go to infinity; more likely, she would stop predicting and processing the Finnish input in depth: while there may be a very high prediction error at the word level, this prediction error does not serve to update any of the other representational layers, as it cannot be interpreted. During L2 acquisition, new higher-level representations are learned, which can then &#x0201C;react&#x0201D; to certain input patterns from lower levels. This mechanism would also naturally explain Goldilocks effects during learning, where learners only react to some types of prediction errors: most easily to those that also have representations in their native language, or to those that are at the <italic>just right</italic> level of predictability. This provides a theoretical explanation for observations in the language learning literature.</p>
</sec>
<sec sec-type="conclusions" id="s6">
<title>6. Conclusions</title>
<p>Prediction is a key aspect of cognition, and in particular of language processing: comprehenders form context-based expectations about upcoming input at different levels, relying and conditioning on multiple levels of representation at each point in processing, and experience a decrease in processing costs when those expectations are met and an increase when they are not. Current surprisal models go a long way in accounting for processing costs, but they still leave certain aspects unaccounted for, namely (1) phenomena at the extremes of the predictability scale (extremely high or low predictability), (2) the interaction between high and low levels of processing, (3) effects of task and goals, and (4) the influence of affect and valence. Work on linguistic salience, by putting the emphasis on attention and relevance, has the potential to account for these aspects, but has not exhaustively elucidated the interplay of salience and surprisal.</p>
<p>We have resolved terminological inconsistencies related to salience in linguistics by showing that, while perceptual acoustic salience and prosodic or syntactic focus can be accounted for in terms of surprisal-driven bottom-up attentional capture, discourse- and situation-based salience require an account of goal-driven attentional deployment that current models of surprisal lack. The Predictive Coding framework provides an integrated account of prediction-driven perception, in which perception, action, and attention share the common task of minimizing prediction error: perception by extracting statistical regularities from the signal, action by moving the sensors to resample the world and actively seek expected stimuli, and attention by weighting reliable, goal-relevant, and affect-laden error-unit responses more than unreliable or irrelevant ones. The Predictive Coding framework is thus an ideal candidate to reconcile surprisal with attention and salience, and to account for how these guide comprehenders in expectation-driven language processing at different levels.</p>
<p>We argued that current models of surprisal need to be extended to account for the role played by attention and goals. This extension can potentially be achieved by providing the model with richer internal representations of linguistic structure, situational knowledge, event sequence knowledge, and beliefs, and by weighting predictions at different levels according to their relevance, that is, to the way they affect the interpretation at higher levels. Such models would be able to calculate surprisal at different levels, modeling the comprehension process in more detail, and to activate or inhibit processing levels or parts of the stimulus depending on their relevance, in order to model processing difficulty as a function of task-mediated attentional focus.</p>
</sec>
<sec id="s7">
<title>Author contributions</title>
<p>AZ, VD, JV, and MV conceived the review; AZ wrote the paper with the exceptions of Section 3.4 (written by JV), Section 4.1 (written by MV), and Section 5 (written by VD). All authors contributed critical comments and revision of the review and agreed to the final content of the article.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack>
<p>This research was funded by the German Research Foundation (DFG) as part of SFB 1102 &#x0201C;Information Density and Linguistic Encoding&#x0201D; and the Cluster of Excellence &#x0201C;Multimodal Computing and Interaction&#x0201D; (EXC 284). This material is partially based upon work supported by the National Science Foundation Graduate Research Fellowship Program under grant no. DGE-1343012.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Altmann</surname> <given-names>G.</given-names></name> <name><surname>Kamide</surname> <given-names>Y.</given-names></name></person-group> (<year>1999</year>). <article-title>Incremental interpretation at verbs: Restricting the domain of subsequent reference</article-title>. <source>Cognition</source> <volume>73</volume>, <fpage>247</fpage>&#x02013;<lpage>264</lpage>. <pub-id pub-id-type="pmid">10585516</pub-id></citation>
</ref>
<ref id="B2">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Asher</surname> <given-names>N.</given-names></name></person-group> (<year>1993</year>). <source>Reference to Abstract Objects in Discourse</source>. <publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Kluwer</publisher-name>.</citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Awh</surname> <given-names>E.</given-names></name> <name><surname>Belopolsky</surname> <given-names>A. V.</given-names></name> <name><surname>Theeuwes</surname> <given-names>J.</given-names></name></person-group> (<year>2012</year>). <article-title>Top-down versus bottom-up attentional control: a failed theoretical dichotomy</article-title>. <source>Trends Cogn. Sci.</source> <volume>16</volume>, <fpage>437</fpage>&#x02013;<lpage>443</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2012.06.010</pub-id><pub-id pub-id-type="pmid">22795563</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Balota</surname> <given-names>D. A.</given-names></name> <name><surname>Pollatsek</surname> <given-names>A.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name></person-group> (<year>1985</year>). <article-title>The interaction of contextual constraints and parafoveal visual information in reading</article-title>. <source>Cogn. Psychol.</source> <volume>17</volume>, <fpage>364</fpage>&#x02013;<lpage>390</lpage>. <pub-id pub-id-type="pmid">4053565</pub-id></citation>
</ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bar</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <source>Predictions in the Brain: Using Our Past to Generate a Future</source>. <publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barrett</surname> <given-names>L. F.</given-names></name> <name><surname>Bar</surname> <given-names>M.</given-names></name></person-group> (<year>2009</year>). <article-title>See it with feeling: affective predictions during object perception</article-title>. <source>Philos. Trans. R. Soc. B Biol. Sci.</source> <volume>364</volume>, <fpage>1325</fpage>&#x02013;<lpage>1334</lpage>. <pub-id pub-id-type="doi">10.1098/rstb.2008.0312</pub-id><pub-id pub-id-type="pmid">19528014</pub-id></citation>
</ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barto</surname> <given-names>A.</given-names></name> <name><surname>Mirolli</surname> <given-names>M.</given-names></name> <name><surname>Baldassarre</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Novelty or surprise?</article-title> <source>Front. Psychol.</source> <volume>4</volume>:<issue>907</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2013.00907</pub-id><pub-id pub-id-type="pmid">27225533</pub-id></citation>
</ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bicknell</surname> <given-names>K.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name> <name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name></person-group> (<year>2010</year>). <article-title>Effects of event knowledge in processing verbal arguments</article-title>. <source>J. Mem. Lang.</source> <volume>63</volume>, <fpage>489</fpage>&#x02013;<lpage>505</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2010.08.004</pub-id><pub-id pub-id-type="pmid">21076629</pub-id></citation>
</ref>
<ref id="B9">
<citation citation-type="other"><person-group person-group-type="editor"><name><surname>Blumenthal-Dram&#x000E9;</surname> <given-names>A.</given-names></name> <name><surname>Hanul&#x000ED;kov&#x000E1;</surname> <given-names>A.</given-names></name> <name><surname>Kortmann</surname> <given-names>B.</given-names></name></person-group> eds. (<year>2014</year>). <article-title>Perceptual linguistic salience: Modelling causes and consequences</article-title>, in <source>Proceedings of the FRIAS Workshop</source> (<publisher-loc>Freiburg</publisher-loc>).</citation>
</ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bock</surname> <given-names>K.</given-names></name> <name><surname>Irwin</surname> <given-names>D. E.</given-names></name> <name><surname>Davidson</surname> <given-names>D. J.</given-names></name></person-group> (<year>2004</year>). <article-title>Putting first things first</article-title>, in <source>The Interface of Language, Vision, and Action: Eye Movements and the Visual World</source>, eds <person-group person-group-type="editor"><name><surname>Henderson</surname> <given-names>J. M.</given-names></name> <name><surname>Ferreira</surname> <given-names>F.</given-names></name></person-group> (<publisher-loc>New York, NY</publisher-loc>: <publisher-name>Psychology Press</publisher-name>), <fpage>249</fpage>&#x02013;<lpage>278</lpage>.</citation>
</ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonnefond</surname> <given-names>M.</given-names></name> <name><surname>Jensen</surname> <given-names>O.</given-names></name></person-group> (<year>2012</year>). <article-title>Alpha oscillations serve to protect working memory maintenance against anticipated distracters</article-title>. <source>Curr. Biol.</source> <volume>22</volume>, <fpage>1969</fpage>&#x02013;<lpage>1974</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2012.08.029</pub-id><pub-id pub-id-type="pmid">23041197</pub-id></citation>
</ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bonnefond</surname> <given-names>M.</given-names></name> <name><surname>Jensen</surname> <given-names>O.</given-names></name></person-group> (<year>2015</year>). <article-title>Gamma activity coupled to alpha phase as a mechanism for top-down controlled gating</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0128667</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0128667</pub-id><pub-id pub-id-type="pmid">26039691</pub-id></citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borghi</surname> <given-names>A. M.</given-names></name> <name><surname>Glenberg</surname> <given-names>A. M.</given-names></name> <name><surname>Kaschak</surname> <given-names>M. P.</given-names></name></person-group> (<year>2004</year>). <article-title>Putting words in perspective</article-title>. <source>Mem. Cogn.</source> <volume>32</volume>, <fpage>863</fpage>&#x02013;<lpage>873</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196865</pub-id><pub-id pub-id-type="pmid">15673175</pub-id></citation>
</ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bredart</surname> <given-names>S.</given-names></name> <name><surname>Modolo</surname> <given-names>K.</given-names></name></person-group> (<year>1988</year>). <article-title>Moses strikes again: Focalization effect on a semantic illusion</article-title>. <source>Acta Psychol.</source> <volume>67</volume>, <fpage>135</fpage>&#x02013;<lpage>144</lpage>.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bruce</surname> <given-names>N. D.</given-names></name> <name><surname>Tsotsos</surname> <given-names>J. K.</given-names></name></person-group> (<year>2009</year>). <article-title>Saliency, attention, and visual search: an information theoretic approach</article-title>. <source>J. Vis.</source> <volume>9</volume>, <fpage>1</fpage>&#x02013;<lpage>24</lpage>. <pub-id pub-id-type="doi">10.1167/9.3.5</pub-id><pub-id pub-id-type="pmid">19757944</pub-id></citation>
</ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Carreiras</surname> <given-names>M.</given-names></name> <name><surname>Carriedo</surname> <given-names>N.</given-names></name> <name><surname>Alonso</surname> <given-names>M. A.</given-names></name> <name><surname>Fern&#x000E1;ndez</surname> <given-names>A.</given-names></name></person-group> (<year>1997</year>). <article-title>The role of verb tense and verb aspect in the foregrounding of information during reading</article-title>. <source>Mem. Cogn.</source> <volume>25</volume>, <fpage>438</fpage>&#x02013;<lpage>446</lpage>. <pub-id pub-id-type="pmid">9259622</pub-id></citation>
</ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Zelinsky</surname> <given-names>G. J.</given-names></name></person-group> (<year>2006</year>). <article-title>Real-world visual search is dominated by top-down guidance</article-title>. <source>Vision Res.</source> <volume>46</volume>, <fpage>4118</fpage>&#x02013;<lpage>4133</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2006.08.008</pub-id><pub-id pub-id-type="pmid">17005231</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chiarcos</surname> <given-names>C.</given-names></name> <name><surname>Claus</surname> <given-names>B.</given-names></name> <name><surname>Grabski</surname> <given-names>M.</given-names></name></person-group> (<year>2011</year>). <source>Salience: Multidisciplinary Perspectives on Its Function in Discourse</source>, <volume>Vol. 227</volume>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Walter de Gruyter</publisher-name>.</citation>
</ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chwilla</surname> <given-names>D. J.</given-names></name> <name><surname>Brown</surname> <given-names>C. M.</given-names></name> <name><surname>Hagoort</surname> <given-names>P.</given-names></name></person-group> (<year>1995</year>). <article-title>The N400 as a function of the level of processing</article-title>. <source>Psychophysiology</source> <volume>32</volume>, <fpage>274</fpage>&#x02013;<lpage>285</lpage>. <pub-id pub-id-type="pmid">7784536</pub-id></citation>
</ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chwilla</surname> <given-names>D. J.</given-names></name> <name><surname>Kolk</surname> <given-names>H. H.</given-names></name></person-group> (<year>2005</year>). <article-title>Accessing world knowledge: evidence from N400 and reaction time priming</article-title>. <source>Cogn. Brain Res.</source> <volume>25</volume>, <fpage>589</fpage>&#x02013;<lpage>606</lpage>. <pub-id pub-id-type="pmid">16202570</pub-id></citation>
</ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Whatever next? Predictive brains, situated agents, and the future of cognitive science</article-title>. <source>Behav. Brain Sci.</source> <volume>36</volume>, <fpage>181</fpage>&#x02013;<lpage>204</lpage>. <pub-id pub-id-type="pmid">23663408</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Clark</surname> <given-names>H. H.</given-names></name> <name><surname>Schreuder</surname> <given-names>R.</given-names></name> <name><surname>Buttrick</surname> <given-names>S.</given-names></name></person-group> (<year>1983</year>). <article-title>Common ground at the understanding of demonstrative reference</article-title>. <source>J. Verbal Learn. Verbal Behav.</source> <volume>22</volume>, <fpage>245</fpage>&#x02013;<lpage>258</lpage>.</citation>
</ref>
<ref id="B23">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Claus</surname> <given-names>B.</given-names></name></person-group> (<year>2011</year>). <article-title>Establishing salience during narrative text comprehension: a simulation view account</article-title>, in <source>Salience: Multidisciplinary Perspectives on Its Function in Discourse</source>, eds <person-group person-group-type="editor"><name><surname>Chiarcos</surname> <given-names>C.</given-names></name> <name><surname>Claus</surname> <given-names>B.</given-names></name> <name><surname>Grabski</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Walter de Gruyter</publisher-name>), <fpage>251</fpage>&#x02013;<lpage>277</lpage>.</citation>
</ref>
<ref id="B24">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Coco</surname> <given-names>M. I.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2009</year>). <article-title>The impact of visual information on reference assignment in sentence production</article-title>, in <source>Proceedings of the 31st Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Amsterdam</publisher-loc>).</citation>
</ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coco</surname> <given-names>M. I.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2015</year>). <article-title>The interaction of visual and linguistic saliency during syntactic ambiguity resolution</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>68</volume>, <fpage>46</fpage>&#x02013;<lpage>74</lpage>. <pub-id pub-id-type="doi">10.1080/17470218.2014.936475</pub-id><pub-id pub-id-type="pmid">25176109</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coco</surname> <given-names>M. I.</given-names></name> <name><surname>Malcolm</surname> <given-names>G. L.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>The interplay of bottom-up and top-down mechanisms in visual guidance during object naming</article-title>. <source>Q. J. Exp. Psychol.</source> <volume>67</volume>, <fpage>1096</fpage>&#x02013;<lpage>1120</lpage>. <pub-id pub-id-type="pmid">24224949</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Craig</surname> <given-names>A. D.</given-names></name></person-group> (<year>2009</year>). <article-title>How do you feel &#x02013; now? The anterior insula and human awareness</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>10</volume>, <fpage>59</fpage>&#x02013;<lpage>70</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2555</pub-id><pub-id pub-id-type="pmid">19096369</pub-id></citation>
</ref>
<ref id="B28">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Damasio</surname> <given-names>A. R.</given-names></name></person-group> (<year>1994</year>). <source>Descartes&#x00027; Error: Emotion, Reason, and the Human Brain</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Grosset / Putnam</publisher-name>.</citation>
</ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>DeLong</surname> <given-names>K. A.</given-names></name> <name><surname>Urbach</surname> <given-names>T. P.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name></person-group> (<year>2005</year>). <article-title>Probabilistic word pre-activation during language comprehension inferred from electrical brain activity</article-title>. <source>Nat. Neurosci.</source> <volume>8</volume>, <fpage>1117</fpage>&#x02013;<lpage>1121</lpage>. <pub-id pub-id-type="doi">10.1038/nn1504</pub-id><pub-id pub-id-type="pmid">16007080</pub-id></citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Demberg</surname> <given-names>V.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2008</year>). <article-title>Data from eye-tracking corpora as evidence for theories of syntactic processing complexity</article-title>. <source>Cognition</source> <volume>109</volume>, <fpage>193</fpage>&#x02013;<lpage>210</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2008.07.008</pub-id><pub-id pub-id-type="pmid">18930455</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Drenhaus</surname> <given-names>H.</given-names></name> <name><surname>Demberg</surname> <given-names>V.</given-names></name> <name><surname>K&#x000F6;hne</surname> <given-names>J.</given-names></name> <name><surname>Delogu</surname> <given-names>F.</given-names></name></person-group> (<year>2014</year>). <article-title>Incremental and predictive discourse processing based on causal and concessive discourse markers: ERP studies on German and English</article-title>, in <source>Proceedings of the 36th Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Qu&#x000E9;bec City, QC</publisher-loc>).</citation>
</ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ehrlich</surname> <given-names>S. F.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name></person-group> (<year>1981</year>). <article-title>Contextual effects on word perception and eye movements during reading</article-title>. <source>J. Verbal Learn. Verbal Behav.</source> <volume>20</volume>, <fpage>641</fpage>&#x02013;<lpage>655</lpage>. <pub-id pub-id-type="doi">10.1016/S0022-5371(81)90220-6</pub-id></citation>
</ref>
<ref id="B33">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Elman</surname> <given-names>J. L.</given-names></name> <name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name></person-group> (<year>2005</year>). <article-title>Cues, constraints, and competition in sentence processing</article-title>, in <source>Beyond Nature-Nurture: Essays in Honor of Elizabeth Bates</source>, eds <person-group person-group-type="editor"><name><surname>Tomasello</surname> <given-names>M.</given-names></name> <name><surname>Slobin</surname> <given-names>D.</given-names></name></person-group> (<publisher-loc>Mahwah, NJ</publisher-loc>: <publisher-name>Lawrence Erlbaum Associates</publisher-name>), <fpage>111</fpage>&#x02013;<lpage>138</lpage>.</citation>
</ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Farmer</surname> <given-names>T. A.</given-names></name> <name><surname>Brown</surname> <given-names>M.</given-names></name> <name><surname>Tanenhaus</surname> <given-names>M. K.</given-names></name></person-group> (<year>2013</year>). <article-title>Prediction, explanation, and the role of generative models in language processing</article-title>. <source>Behav. Brain Sci.</source> <volume>36</volume>, <fpage>211</fpage>&#x02013;<lpage>212</lpage>. <pub-id pub-id-type="doi">10.1017/S0140525X12002312</pub-id><pub-id pub-id-type="pmid">23663410</pub-id></citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Federmeier</surname> <given-names>K. D.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name></person-group> (<year>1999</year>). <article-title>A rose by any other name: long-term memory structure and sentence processing</article-title>. <source>J. Mem. Lang.</source> <volume>41</volume>, <fpage>469</fpage>&#x02013;<lpage>495</lpage>.</citation>
</ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferretti</surname> <given-names>T. R.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name></person-group> (<year>2007</year>). <article-title>Verb aspect and the activation of event knowledge</article-title>. <source>J. Exp. Psychol.</source> <volume>33</volume>, <fpage>182</fpage>&#x02013;<lpage>196</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.33.1.182</pub-id><pub-id pub-id-type="pmid">17201561</pub-id></citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ferretti</surname> <given-names>T. R.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name> <name><surname>Hatherell</surname> <given-names>A.</given-names></name></person-group> (<year>2001</year>). <article-title>Integrating verbs, situation schemas, and thematic role concepts</article-title>. <source>J. Mem. Lang.</source> <volume>44</volume>, <fpage>516</fpage>&#x02013;<lpage>547</lpage>. <pub-id pub-id-type="doi">10.1006/jmla.2000.2728</pub-id></citation>
</ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Firestone</surname> <given-names>C.</given-names></name> <name><surname>Scholl</surname> <given-names>B. J.</given-names></name></person-group> (<year>2015</year>). <article-title>Cognition does not affect perception: evaluating the evidence for &#x02018;top-down&#x02019; effects</article-title>. <source>Behav. Brain Sci.</source> <pub-id pub-id-type="doi">10.1017/S0140525X15000965</pub-id><pub-id pub-id-type="pmid">27237758</pub-id>. [Epub ahead of print].</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fletcher</surname> <given-names>P. C.</given-names></name> <name><surname>Frith</surname> <given-names>C. D.</given-names></name></person-group> (<year>2009</year>). <article-title>Perceiving is believing: a Bayesian approach to explaining the positive symptoms of schizophrenia</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>10</volume>, <fpage>48</fpage>&#x02013;<lpage>58</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2536</pub-id><pub-id pub-id-type="pmid">19050712</pub-id></citation>
</ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Folk</surname> <given-names>C. L.</given-names></name> <name><surname>Remington</surname> <given-names>R. W.</given-names></name> <name><surname>Johnston</surname> <given-names>J. C.</given-names></name></person-group> (<year>1992</year>). <article-title>Involuntary covert orienting is contingent on attentional control settings</article-title>. <source>J. Exp. Psychol.</source> <volume>18</volume>, <fpage>1030</fpage>&#x02013;<lpage>1044</lpage>. <pub-id pub-id-type="pmid">1431742</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fontolan</surname> <given-names>L.</given-names></name> <name><surname>Morillon</surname> <given-names>B.</given-names></name> <name><surname>Liegeois-Chauvel</surname> <given-names>C.</given-names></name> <name><surname>Giraud</surname> <given-names>A.-L.</given-names></name></person-group> (<year>2014</year>). <article-title>The contribution of frequency-specific activity to hierarchical information processing in the human auditory cortex</article-title>. <source>Nat. Commun.</source> <volume>5</volume>, <fpage>1</fpage>&#x02013;<lpage>10</lpage>. <pub-id pub-id-type="doi">10.1038/ncomms5694</pub-id><pub-id pub-id-type="pmid">25178489</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Fossum</surname> <given-names>V.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>Sequential vs. hierarchical syntactic models of human incremental sentence processing</article-title>, in <source>Proceedings of the 3rd Annual Workshop on Cognitive Modeling and Computational Linguistics</source> (<publisher-loc>Montreal, QC</publisher-loc>).</citation>
</ref>
<ref id="B43">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. L.</given-names></name></person-group> (<year>2009</year>). <article-title>Surprisal-based comparison between a symbolic and a connectionist model of sentence processing</article-title>, in <source>Proceedings of the 31st Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Amsterdam</publisher-loc>).</citation>
</ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. L.</given-names></name> <name><surname>Koppen</surname> <given-names>M.</given-names></name> <name><surname>Noordman</surname> <given-names>L. G.</given-names></name> <name><surname>Vonk</surname> <given-names>W.</given-names></name></person-group> (<year>2008</year>). <article-title>World knowledge in computational models of discourse comprehension</article-title>. <source>Discourse Process.</source> <volume>45</volume>, <fpage>429</fpage>&#x02013;<lpage>463</lpage>. <pub-id pub-id-type="doi">10.1080/01638530802069926</pub-id></citation>
</ref>
<ref id="B45">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. L.</given-names></name> <name><surname>Otten</surname> <given-names>L. J.</given-names></name> <name><surname>Galli</surname> <given-names>G.</given-names></name> <name><surname>Vigliocco</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>Word surprisal predicts N400 amplitude during reading</article-title>, in <source>Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics</source> (<publisher-loc>Sofia</publisher-loc>).</citation>
</ref>
<ref id="B46">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frank</surname> <given-names>S. L.</given-names></name> <name><surname>Vigliocco</surname> <given-names>G.</given-names></name></person-group> (<year>2011</year>). <article-title>Sentence comprehension as mental simulation: an information-theoretic perspective</article-title>. <source>Information</source> <volume>2</volume>, <fpage>672</fpage>&#x02013;<lpage>696</lpage>. <pub-id pub-id-type="doi">10.3390/info2040672</pub-id></citation>
</ref>
<ref id="B47">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Freyd</surname> <given-names>J. J.</given-names></name></person-group> (<year>1983</year>). <article-title>The mental representation of movement when static stimuli are viewed</article-title>. <source>Percept. Psychophys.</source> <volume>33</volume>, <fpage>575</fpage>&#x02013;<lpage>581</lpage>. <pub-id pub-id-type="pmid">6622194</pub-id></citation>
</ref>
<ref id="B48">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Frisson</surname> <given-names>S.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name> <name><surname>Pickering</surname> <given-names>M. J.</given-names></name></person-group> (<year>2005</year>). <article-title>Effects of contextual predictability and transitional probability on eye movements during reading</article-title>. <source>J. Exp. Psychol.</source> <volume>31</volume>, <fpage>862</fpage>&#x02013;<lpage>877</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.31.5.862</pub-id><pub-id pub-id-type="pmid">16248738</pub-id></citation>
</ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friston</surname> <given-names>K.</given-names></name></person-group> (<year>2010</year>). <article-title>The free-energy principle: a unified brain theory?</article-title> <source>Nat. Rev. Neurosci.</source> <volume>11</volume>, <fpage>127</fpage>&#x02013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1038/nrn2787</pub-id><pub-id pub-id-type="pmid">20068583</pub-id></citation>
</ref>
<ref id="B50">
<citation citation-type="thesis"><person-group person-group-type="author"><name><surname>Futrell</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <source>Processing Effects of the Expectation of Informativity</source>. MA thesis, Stanford University, Stanford.</citation>
</ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gahl</surname> <given-names>S.</given-names></name> <name><surname>Garnsey</surname> <given-names>S. M.</given-names></name></person-group> (<year>2004</year>). <article-title>Knowledge of grammar, knowledge of usage: syntactic probabilities affect pronunciation variation</article-title>. <source>Language</source> <volume>80</volume>, <fpage>748</fpage>&#x02013;<lpage>775</lpage>. <pub-id pub-id-type="doi">10.1353/lan.2004.0185</pub-id></citation>
</ref>
<ref id="B52">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Giora</surname> <given-names>R.</given-names></name></person-group> (<year>2003</year>). <source>On Our Mind: Salience, Context, and Figurative Language</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation>
</ref>
<ref id="B53">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Giv&#x000F3;n</surname> <given-names>T.</given-names></name></person-group> (<year>1988</year>). <article-title>The pragmatics of word order: predictability, importance and attention</article-title>, in <source>Studies in Syntactic Typology</source>, eds <person-group person-group-type="editor"><name><surname>Hammond</surname> <given-names>M.</given-names></name> <name><surname>Moravcsik</surname> <given-names>E. A.</given-names></name> <name><surname>Wirth</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>John Benjamins</publisher-name>), <fpage>243</fpage>&#x02013;<lpage>284</lpage>.</citation>
</ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gleitman</surname> <given-names>L. R.</given-names></name> <name><surname>January</surname> <given-names>D.</given-names></name> <name><surname>Nappa</surname> <given-names>R.</given-names></name> <name><surname>Trueswell</surname> <given-names>J. C.</given-names></name></person-group> (<year>2007</year>). <article-title>On the give and take between event apprehension and utterance formulation</article-title>. <source>J. Mem. Lang.</source> <volume>57</volume>, <fpage>544</fpage>&#x02013;<lpage>569</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2007.01.007</pub-id><pub-id pub-id-type="pmid">18978929</pub-id></citation>
</ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Glenberg</surname> <given-names>A. M.</given-names></name> <name><surname>Meyer</surname> <given-names>M.</given-names></name> <name><surname>Lindem</surname> <given-names>K.</given-names></name></person-group> (<year>1987</year>). <article-title>Mental models contribute to foregrounding during text comprehension</article-title>. <source>J. Mem. Lang.</source> <volume>26</volume>, <fpage>69</fpage>&#x02013;<lpage>83</lpage>.</citation>
</ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffin</surname> <given-names>Z. M.</given-names></name> <name><surname>Bock</surname> <given-names>K.</given-names></name></person-group> (<year>2000</year>). <article-title>What the eyes say about speaking</article-title>. <source>Psychol. Sci.</source> <volume>11</volume>, <fpage>274</fpage>&#x02013;<lpage>279</lpage>. <pub-id pub-id-type="doi">10.1111/1467-9280.00255</pub-id><pub-id pub-id-type="pmid">11273384</pub-id></citation>
</ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffiths</surname> <given-names>T. L.</given-names></name> <name><surname>Steyvers</surname> <given-names>M.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name></person-group> (<year>2007</year>). <article-title>Topics in semantic representation</article-title>. <source>Psychol. Rev.</source> <volume>114</volume>, <fpage>211</fpage>&#x02013;<lpage>244</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.114.2.211</pub-id><pub-id pub-id-type="pmid">17500626</pub-id></citation>
</ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffiths</surname> <given-names>T. L.</given-names></name> <name><surname>Tenenbaum</surname> <given-names>J. B.</given-names></name></person-group> (<year>2007</year>). <article-title>From mere coincidences to meaningful discoveries</article-title>. <source>Cognition</source> <volume>103</volume>, <fpage>180</fpage>&#x02013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2006.03.004</pub-id><pub-id pub-id-type="pmid">16678145</pub-id></citation>
</ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Grosz</surname> <given-names>B. J.</given-names></name> <name><surname>Weinstein</surname> <given-names>S.</given-names></name> <name><surname>Joshi</surname> <given-names>A. K.</given-names></name></person-group> (<year>1995</year>). <article-title>Centering: a framework for modeling the local coherence of discourse</article-title>. <source>Comput. Linguist.</source> <volume>21</volume>, <fpage>203</fpage>&#x02013;<lpage>225</lpage>.</citation>
</ref>
<ref id="B60">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Hake</surname> <given-names>H. W.</given-names></name></person-group> (<year>1957</year>). <source>Contribution of Psychology to the Study of Pattern Vision</source>. USAF WADC Tech. Rept. 1957.</citation>
</ref>
<ref id="B61">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Hale</surname> <given-names>J.</given-names></name></person-group> (<year>2001</year>). <article-title>A probabilistic Earley parser as a psycholinguistic model</article-title>, in <source>Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics</source> (<publisher-loc>Pittsburgh, PA</publisher-loc>).</citation>
</ref>
<ref id="B62">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanul&#x000ED;kov&#x000E1;</surname> <given-names>A.</given-names></name> <name><surname>Van Alphen</surname> <given-names>P. M.</given-names></name> <name><surname>Van Goch</surname> <given-names>M. M.</given-names></name> <name><surname>Weber</surname> <given-names>A.</given-names></name></person-group> (<year>2012</year>). <article-title>When one person&#x00027;s mistake is another&#x00027;s standard usage: the effect of foreign accent on syntactic processing</article-title>. <source>J. Cogn. Neurosci.</source> <volume>24</volume>, <fpage>878</fpage>&#x02013;<lpage>887</lpage>. <pub-id pub-id-type="doi">10.1162/jocn_a_00103</pub-id><pub-id pub-id-type="pmid">21812565</pub-id></citation>
</ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name> <name><surname>Tabaczynski</surname> <given-names>T.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>The wind chilled the spectators, but the wine just chilled: sense, structure, and sentence comprehension</article-title>. <source>Cogn. Sci.</source> <volume>33</volume>, <fpage>610</fpage>&#x02013;<lpage>628</lpage>. <pub-id pub-id-type="doi">10.1111/j.1551-6709.2009.01027.x</pub-id><pub-id pub-id-type="pmid">19750146</pub-id></citation>
</ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name></person-group> (<year>2003</year>). <article-title>Sense and structure: meaning as a determinant of verb subcategorization preferences</article-title>. <source>J. Mem. Lang.</source> <volume>48</volume>, <fpage>281</fpage>&#x02013;<lpage>303</lpage>. <pub-id pub-id-type="doi">10.1016/S0749-596X(02)00516-8</pub-id></citation>
</ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hayhoe</surname> <given-names>M.</given-names></name> <name><surname>Ballard</surname> <given-names>D.</given-names></name></person-group> (<year>2005</year>). <article-title>Eye movements in natural behavior</article-title>. <source>Trends Cogn. Sci.</source> <volume>9</volume>, <fpage>188</fpage>&#x02013;<lpage>194</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2005.02.009</pub-id><pub-id pub-id-type="pmid">15808501</pub-id></citation>
</ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Henderson</surname> <given-names>J. M.</given-names></name> <name><surname>Malcolm</surname> <given-names>G. L.</given-names></name> <name><surname>Schandl</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Searching in the dark: cognitive relevance drives attention in real-world scenes</article-title>. <source>Psychon. Bull. Rev.</source> <volume>16</volume>, <fpage>850</fpage>&#x02013;<lpage>856</lpage>. <pub-id pub-id-type="doi">10.3758/PBR.16.5.850</pub-id><pub-id pub-id-type="pmid">19815788</pub-id></citation>
</ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holz</surname> <given-names>E. M.</given-names></name> <name><surname>Glennon</surname> <given-names>M.</given-names></name> <name><surname>Prendergast</surname> <given-names>K.</given-names></name> <name><surname>Sauseng</surname> <given-names>P.</given-names></name></person-group> (<year>2010</year>). <article-title>Theta&#x02013;gamma phase synchronization during memory matching in visual working memory</article-title>. <source>Neuroimage</source> <volume>52</volume>, <fpage>326</fpage>&#x02013;<lpage>335</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.04.003</pub-id><pub-id pub-id-type="pmid">20382239</pub-id></citation>
</ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Horvitz</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). <article-title>Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events</article-title>. <source>Neuroscience</source> <volume>96</volume>, <fpage>651</fpage>&#x02013;<lpage>656</lpage>. <pub-id pub-id-type="doi">10.1016/S0306-4522(00)00019-1</pub-id><pub-id pub-id-type="pmid">10727783</pub-id></citation>
</ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hubbard</surname> <given-names>T. L.</given-names></name></person-group> (<year>2005</year>). <article-title>Representational momentum and related displacements in spatial memory: a review of the findings</article-title>. <source>Psychon. Bull. Rev.</source> <volume>12</volume>, <fpage>822</fpage>&#x02013;<lpage>851</lpage>. <pub-id pub-id-type="doi">10.3758/BF03196775</pub-id><pub-id pub-id-type="pmid">16524000</pub-id></citation>
</ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Baldi</surname> <given-names>P.</given-names></name></person-group> (<year>2009</year>). <article-title>Bayesian surprise attracts human attention</article-title>. <source>Vision Res.</source> <volume>49</volume>, <fpage>1295</fpage>&#x02013;<lpage>1306</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2008.09.007</pub-id><pub-id pub-id-type="pmid">18834898</pub-id></citation>
</ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Itti</surname> <given-names>L.</given-names></name> <name><surname>Koch</surname> <given-names>C.</given-names></name></person-group> (<year>2000</year>). <article-title>A saliency-based search mechanism for overt and covert shifts of visual attention</article-title>. <source>Vision Res.</source> <volume>40</volume>, <fpage>1489</fpage>&#x02013;<lpage>1506</lpage>. <pub-id pub-id-type="doi">10.1016/S0042-6989(99)00163-7</pub-id><pub-id pub-id-type="pmid">10788654</pub-id></citation>
</ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Izquierdo</surname> <given-names>J.</given-names></name> <name><surname>Collins</surname> <given-names>L.</given-names></name></person-group> (<year>2008</year>). <article-title>The facilitative role of L1 influence in tense&#x02013;aspect marking: a comparison of hispanophone and anglophone learners of French</article-title>. <source>Modern Lang. J.</source> <volume>92</volume>, <fpage>350</fpage>&#x02013;<lpage>368</lpage>.</citation>
</ref>
<ref id="B73">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Jaeger</surname> <given-names>T. F.</given-names></name> <name><surname>Levy</surname> <given-names>R. P.</given-names></name></person-group> (<year>2007</year>). <article-title>Speakers optimize information density through syntactic reduction</article-title>, in <source>Advances in Neural Information Processing Systems Vol. 19</source>, eds <person-group person-group-type="editor"><name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name> <name><surname>Platt</surname> <given-names>J.</given-names></name> <name><surname>Hoffman</surname> <given-names>T.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>849</fpage>&#x02013;<lpage>856</lpage>.</citation>
</ref>
<ref id="B74">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname> <given-names>O.</given-names></name> <name><surname>Bonnefond</surname> <given-names>M.</given-names></name> <name><surname>VanRullen</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>An oscillatory mechanism for prioritizing salient unattended stimuli</article-title>. <source>Trends Cogn. Sci.</source> <volume>16</volume>, <fpage>200</fpage>&#x02013;<lpage>206</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2012.03.002</pub-id><pub-id pub-id-type="pmid">22436764</pub-id></citation>
</ref>
<ref id="B75">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jensen</surname> <given-names>O.</given-names></name> <name><surname>Gelfand</surname> <given-names>J.</given-names></name> <name><surname>Kounios</surname> <given-names>J.</given-names></name> <name><surname>Lisman</surname> <given-names>J. E.</given-names></name></person-group> (<year>2002</year>). <article-title>Oscillations in the alpha band (9&#x02013;12 Hz) increase with memory load during retention in a short-term memory task</article-title>. <source>Cereb. Cortex</source> <volume>12</volume>, <fpage>877</fpage>&#x02013;<lpage>882</lpage>. <pub-id pub-id-type="doi">10.1093/cercor/12.8.877</pub-id><pub-id pub-id-type="pmid">12122036</pub-id></citation>
</ref>
<ref id="B76">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jurafsky</surname> <given-names>D.</given-names></name> <name><surname>Bell</surname> <given-names>A.</given-names></name> <name><surname>Gregory</surname> <given-names>M.</given-names></name> <name><surname>Raymond</surname> <given-names>W. D.</given-names></name></person-group> (<year>2001</year>). <article-title>Probabilistic relations between words: evidence from reduction in lexical production</article-title>. <source>Typol. Stud. Lang.</source> <volume>45</volume>, <fpage>229</fpage>&#x02013;<lpage>254</lpage>. <pub-id pub-id-type="doi">10.1075/tsl.45.13jur</pub-id></citation>
</ref>
<ref id="B77">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaakinen</surname> <given-names>J. K.</given-names></name> <name><surname>Hy&#x000F6;n&#x000E4;</surname> <given-names>J.</given-names></name></person-group> (<year>2010</year>). <article-title>Task effects on eye movements during reading</article-title>. <source>J. Exp. Psychol.</source> <volume>36</volume>, <fpage>1561</fpage>&#x02013;<lpage>1566</lpage>. <pub-id pub-id-type="doi">10.1037/a0020693</pub-id><pub-id pub-id-type="pmid">20854008</pub-id></citation>
</ref>
<ref id="B78">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaiser</surname> <given-names>E.</given-names></name> <name><surname>Trueswell</surname> <given-names>J. C.</given-names></name></person-group> (<year>2008</year>). <article-title>Interpreting pronouns and demonstratives in Finnish: evidence for a form-specific approach to reference resolution</article-title>. <source>Lang. Cogn. Process.</source> <volume>23</volume>, <fpage>709</fpage>&#x02013;<lpage>748</lpage>. <pub-id pub-id-type="doi">10.1080/01690960701771220</pub-id></citation>
</ref>
<ref id="B79">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kakade</surname> <given-names>S.</given-names></name> <name><surname>Dayan</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). <article-title>Dopamine: generalization and bonuses</article-title>. <source>Neural Netw.</source> <volume>15</volume>, <fpage>549</fpage>&#x02013;<lpage>559</lpage>. <pub-id pub-id-type="doi">10.1016/S0893-6080(02)00048-5</pub-id><pub-id pub-id-type="pmid">12371511</pub-id></citation>
</ref>
<ref id="B80">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kamide</surname> <given-names>Y.</given-names></name> <name><surname>Altmann</surname> <given-names>G. T.</given-names></name> <name><surname>Haywood</surname> <given-names>S. L.</given-names></name></person-group> (<year>2003</year>). <article-title>The time-course of prediction in incremental sentence processing: evidence from anticipatory eye movements</article-title>. <source>J. Mem. Lang.</source> <volume>49</volume>, <fpage>133</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1016/S0749-596X(03)00023-8</pub-id></citation>
</ref>
<ref id="B81">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kamp</surname> <given-names>H.</given-names></name></person-group> (<year>1981</year>). <article-title>A theory of truth and semantic representation</article-title>, in <source>Formal Methods in the Study of Language</source>, eds <person-group person-group-type="editor"><name><surname>Portner</surname> <given-names>P. H.</given-names></name> <name><surname>Partee</surname> <given-names>B. H.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Foris</publisher-name>), <fpage>277</fpage>&#x02013;<lpage>322</lpage>.</citation>
</ref>
<ref id="B82">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kamp</surname> <given-names>H.</given-names></name> <name><surname>Reyle</surname> <given-names>U.</given-names></name></person-group> (<year>1993</year>). <source>From Discourse to Logic</source>. <publisher-loc>Dordrecht</publisher-loc>: <publisher-name>Kluwer</publisher-name>.</citation>
</ref>
<ref id="B83">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kelleher</surname> <given-names>J. D.</given-names></name></person-group> (<year>2011</year>). <article-title>Visual salience and the other one</article-title>, in <source>Salience: Multidisciplinary Perspectives on Its Function in Discourse</source>, eds <person-group person-group-type="editor"><name><surname>Chiarcos</surname> <given-names>C.</given-names></name> <name><surname>Claus</surname> <given-names>B.</given-names></name> <name><surname>Grabski</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Walter de Gruyter</publisher-name>), <fpage>205</fpage>&#x02013;<lpage>228</lpage>.</citation>
</ref>
<ref id="B84">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kerswill</surname> <given-names>P.</given-names></name> <name><surname>Williams</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <article-title>&#x0201C;Salience&#x0201D; as an explanatory factor in language change: evidence from dialect levelling in urban England</article-title>, in <source>Language Change: The Interplay of Internal, External and Extra-Linguistic Factors</source>, eds <person-group person-group-type="editor"><name><surname>Jones</surname> <given-names>M. C.</given-names></name> <name><surname>Esch</surname> <given-names>E.</given-names></name></person-group> (<publisher-loc>Berlin; New York, NY</publisher-loc>: <publisher-name>Mouton de Gruyter</publisher-name>), <fpage>81</fpage>&#x02013;<lpage>110</lpage>.</citation>
</ref>
<ref id="B85">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Khalkhali</surname> <given-names>S.</given-names></name> <name><surname>Wammes</surname> <given-names>J.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name></person-group> (<year>2012</year>). <article-title>Integrating words that refer to typical sequences of events</article-title>. <source>Can. J. Exp. Psychol.</source> <volume>66</volume>, <fpage>106</fpage>&#x02013;<lpage>114</lpage>. <pub-id pub-id-type="doi">10.1037/a0027369</pub-id><pub-id pub-id-type="pmid">22686159</pub-id></citation>
</ref>
<ref id="B86">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kidd</surname> <given-names>C.</given-names></name> <name><surname>Piantadosi</surname> <given-names>S. T.</given-names></name> <name><surname>Aslin</surname> <given-names>R. N.</given-names></name></person-group> (<year>2012</year>). <article-title>The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e36399</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0036399</pub-id><pub-id pub-id-type="pmid">22649492</pub-id></citation>
</ref>
<ref id="B87">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Klimesch</surname> <given-names>W.</given-names></name> <name><surname>Sauseng</surname> <given-names>P.</given-names></name> <name><surname>Hanslmayr</surname> <given-names>S.</given-names></name></person-group> (<year>2007</year>). <article-title>EEG alpha oscillations: The inhibition-timing hypothesis</article-title>. <source>Brain Res. Rev.</source> <volume>53</volume>, <fpage>63</fpage>&#x02013;<lpage>88</lpage>. <pub-id pub-id-type="doi">10.1016/j.brainresrev.2006.06.003</pub-id><pub-id pub-id-type="pmid">16887192</pub-id></citation>
</ref>
<ref id="B88">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koch</surname> <given-names>C.</given-names></name> <name><surname>Ullman</surname> <given-names>S.</given-names></name></person-group> (<year>1985</year>). <article-title>Shifts in selective visual attention: towards the underlying neural circuitry</article-title>. <source>Hum. Neurobiol.</source> <volume>4</volume>, <fpage>219</fpage>&#x02013;<lpage>227</lpage>. <pub-id pub-id-type="pmid">3836989</pub-id></citation>
</ref>
<ref id="B89">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>K&#x000F6;hne</surname> <given-names>J.</given-names></name> <name><surname>Demberg</surname> <given-names>V.</given-names></name></person-group> (<year>2013</year>). <article-title>The time-course of processing discourse connectives</article-title>, in <source>Proceedings of the 35th Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Berlin</publisher-loc>).</citation>
</ref>
<ref id="B90">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koolen</surname> <given-names>R.</given-names></name> <name><surname>Krahmer</surname> <given-names>E.</given-names></name> <name><surname>Swerts</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>How distractor objects trigger referential overspecification: testing the effects of visual clutter and distractor distance</article-title>. <source>Cogn. Sci.</source> <pub-id pub-id-type="doi">10.1111/cogs.12297</pub-id><pub-id pub-id-type="pmid">26432277</pub-id>. [Epub ahead of print].</citation>
</ref>
<ref id="B91">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Kravtchenko</surname> <given-names>E.</given-names></name> <name><surname>Demberg</surname> <given-names>V.</given-names></name></person-group> (<year>2015</year>). <article-title>Semantically underinformative utterances trigger pragmatic inferences</article-title>, in <source>Proceedings of the 37th Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Pasadena, CA</publisher-loc>).</citation>
</ref>
<ref id="B92">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuperberg</surname> <given-names>G. R.</given-names></name></person-group> (<year>2016</year>). <article-title>Separate streams or probabilistic inference? What the N400 can tell us about the comprehension of events</article-title>. <source>Lang. Cogn. Neurosci.</source> <volume>31</volume>, <fpage>602</fpage>&#x02013;<lpage>616</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2015.1130233</pub-id></citation>
</ref>
<ref id="B93">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuperberg</surname> <given-names>G. R.</given-names></name> <name><surname>Jaeger</surname> <given-names>T. F.</given-names></name></person-group> (<year>2016</year>). <article-title>What do we mean by prediction in language comprehension?</article-title> <source>Lang. Cogn. Neurosci.</source> <volume>31</volume>, <fpage>32</fpage>&#x02013;<lpage>59</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2015.1102299</pub-id><pub-id pub-id-type="pmid">27135040</pub-id></citation>
</ref>
<ref id="B94">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kuperberg</surname> <given-names>G. R.</given-names></name> <name><surname>Sitnikova</surname> <given-names>T.</given-names></name> <name><surname>Caplan</surname> <given-names>D.</given-names></name> <name><surname>Holcomb</surname> <given-names>P. J.</given-names></name></person-group> (<year>2003</year>). <article-title>Electrophysiological distinctions in processing conceptual relationships within simple sentences</article-title>. <source>Cogn. Brain Res.</source> <volume>17</volume>, <fpage>117</fpage>&#x02013;<lpage>129</lpage>. <pub-id pub-id-type="doi">10.1016/S0926-6410(03)00086-7</pub-id><pub-id pub-id-type="pmid">12763198</pub-id></citation>
</ref>
<ref id="B95">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>DeLong</surname> <given-names>K. A.</given-names></name> <name><surname>Smith</surname> <given-names>N. J.</given-names></name></person-group> (<year>2011</year>). <article-title>A look around at what lies ahead: Prediction and predictability in language processing</article-title>, in <source>Predictions in the Brain: Using Our Past to Generate a Future</source>, ed <person-group person-group-type="editor"><name><surname>Bar</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>), <fpage>190</fpage>&#x02013;<lpage>207</lpage>.</citation>
</ref>
<ref id="B96">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>Federmeier</surname> <given-names>K. D.</given-names></name></person-group> (<year>2011</year>). <article-title>Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP)</article-title>. <source>Annu. Rev. Psychol.</source> <volume>62</volume>, <fpage>621</fpage>&#x02013;<lpage>647</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.psych.093008.131123</pub-id><pub-id pub-id-type="pmid">20809790</pub-id></citation>
</ref>
<ref id="B97">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>Hillyard</surname> <given-names>S. A.</given-names></name></person-group> (<year>1984</year>). <article-title>Brain potentials during reading reflect word expectancy and semantic association</article-title>. <source>Nature</source> <volume>307</volume>, <fpage>161</fpage>&#x02013;<lpage>163</lpage>. <pub-id pub-id-type="pmid">6690995</pub-id></citation>
</ref>
<ref id="B98">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lambrecht</surname> <given-names>K.</given-names></name></person-group> (<year>1994</year>). <source>Information Structure and Sentence Form: A Theory of Topic, Focus, and the Mental Representations of Discourse Referents</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation>
</ref>
<ref id="B99">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Lascarides</surname> <given-names>A.</given-names></name> <name><surname>Asher</surname> <given-names>N.</given-names></name></person-group> (<year>2007</year>). <article-title>Segmented discourse representation theory: dynamic semantics with discourse structure</article-title>, in <source>Computing Meaning</source>, <volume>Vol. 3</volume>, (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>87</fpage>&#x02013;<lpage>124</lpage>.</citation>
</ref>
<ref id="B100">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>LeDoux</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). <article-title>Emotion circuits in the brain</article-title>. <source>Annu. Rev. Neurosci.</source> <volume>23</volume>, <fpage>155</fpage>&#x02013;<lpage>184</lpage>. <pub-id pub-id-type="doi">10.1146/annurev.neuro.23.1.155</pub-id><pub-id pub-id-type="pmid">10845062</pub-id></citation>
</ref>
<ref id="B101">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Levelt</surname> <given-names>W. J. M.</given-names></name></person-group> (<year>1989</year>). <source>Speaking. From Intention to Articulation</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>The MIT Press</publisher-name>.</citation>
</ref>
<ref id="B102">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Levy</surname> <given-names>R.</given-names></name></person-group> (<year>2008</year>). <article-title>Expectation-based syntactic comprehension</article-title>. <source>Cognition</source> <volume>106</volume>, <fpage>1126</fpage>&#x02013;<lpage>1177</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2007.05.006</pub-id><pub-id pub-id-type="pmid">17662975</pub-id></citation>
</ref>
<ref id="B103">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>A. G.</given-names></name> <name><surname>Bastiaansen</surname> <given-names>M.</given-names></name></person-group> (<year>2015</year>). <article-title>A predictive coding framework for rapid neural dynamics during sentence-level language comprehension</article-title>. <source>Cortex</source> <volume>68</volume>, <fpage>155</fpage>&#x02013;<lpage>168</lpage>. <pub-id pub-id-type="doi">10.1016/j.cortex.2015.02.01</pub-id><pub-id pub-id-type="pmid">25840879</pub-id></citation>
</ref>
<ref id="B104">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lisman</surname> <given-names>J.</given-names></name> <name><surname>Buzs&#x000E1;ki</surname> <given-names>G.</given-names></name></person-group> (<year>2008</year>). <article-title>A neural coding scheme formed by the combined function of gamma and theta oscillations</article-title>. <source>Schizophr. Bull.</source> <volume>34</volume>, <fpage>974</fpage>&#x02013;<lpage>980</lpage>. <pub-id pub-id-type="doi">10.1093/schbul/sbn060</pub-id><pub-id pub-id-type="pmid">18559405</pub-id></citation>
</ref>
<ref id="B105">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Loftus</surname> <given-names>G. R.</given-names></name> <name><surname>Mackworth</surname> <given-names>N. H.</given-names></name></person-group> (<year>1978</year>). <article-title>Cognitive determinants of fixation location during picture viewing</article-title>. <source>J. Exp. Psychol.</source> <volume>4</volume>, <fpage>565</fpage>&#x02013;<lpage>572</lpage>. <pub-id pub-id-type="pmid">722248</pub-id></citation>
</ref>
<ref id="B106">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>MacDonald</surname> <given-names>M. C.</given-names></name> <name><surname>Pearlmutter</surname> <given-names>N. J.</given-names></name> <name><surname>Seidenberg</surname> <given-names>M. S.</given-names></name></person-group> (<year>1994</year>). <article-title>The lexical nature of syntactic ambiguity resolution</article-title>. <source>Psychol. Rev.</source> <volume>101</volume>, <fpage>676</fpage>&#x02013;<lpage>703</lpage>. <pub-id pub-id-type="pmid">7984711</pub-id></citation>
</ref>
<ref id="B107">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mackworth</surname> <given-names>N. H.</given-names></name> <name><surname>Morandi</surname> <given-names>A. J.</given-names></name></person-group> (<year>1967</year>). <article-title>The gaze selects informative details within pictures</article-title>. <source>Percept. Psychophys.</source> <volume>2</volume>, <fpage>547</fpage>&#x02013;<lpage>552</lpage>.</citation>
</ref>
<ref id="B108">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mathewson</surname> <given-names>K. E.</given-names></name> <name><surname>Gratton</surname> <given-names>G.</given-names></name> <name><surname>Fabiani</surname> <given-names>M.</given-names></name> <name><surname>Beck</surname> <given-names>D. M.</given-names></name> <name><surname>Ro</surname> <given-names>T.</given-names></name></person-group> (<year>2009</year>). <article-title>To see or not to see: prestimulus alpha phase predicts visual awareness</article-title>. <source>J. Neurosci.</source> <volume>29</volume>, <fpage>2725</fpage>&#x02013;<lpage>2732</lpage>. <pub-id pub-id-type="doi">10.1523/JNEUROSCI.3963-08.2009</pub-id><pub-id pub-id-type="pmid">26753742</pub-id></citation>
</ref>
<ref id="B109">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Matsuki</surname> <given-names>K.</given-names></name> <name><surname>Chow</surname> <given-names>T.</given-names></name> <name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name> <name><surname>Scheepers</surname> <given-names>C.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name></person-group> (<year>2011</year>). <article-title>Event-based plausibility immediately influences on-line language comprehension</article-title>. <source>J. Exp. Psychol.</source> <volume>37</volume>, <fpage>913</fpage>&#x02013;<lpage>934</lpage>. <pub-id pub-id-type="doi">10.1037/a0022964</pub-id><pub-id pub-id-type="pmid">21517222</pub-id></citation>
</ref>
<ref id="B110">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McDonald</surname> <given-names>S. A.</given-names></name> <name><surname>Shillcock</surname> <given-names>R. C.</given-names></name></person-group> (<year>2003</year>). <article-title>Eye movements reveal the on-line computation of lexical probabilities during reading</article-title>. <source>Psychol. Sci.</source> <volume>14</volume>, <fpage>648</fpage>&#x02013;<lpage>652</lpage>. <pub-id pub-id-type="doi">10.1046/j.0956-7976.2003.psci_1480.x</pub-id><pub-id pub-id-type="pmid">14629701</pub-id></citation>
</ref>
<ref id="B111">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>McRae</surname> <given-names>K.</given-names></name> <name><surname>Matsuki</surname> <given-names>K.</given-names></name></person-group> (<year>2009</year>). <article-title>People use their knowledge of common events to understand language, and do so as quickly as possible</article-title>. <source>Lang. Linguist. Compass</source> <volume>3</volume>, <fpage>1417</fpage>&#x02013;<lpage>1429</lpage>. <pub-id pub-id-type="doi">10.1111/j.1749-818X.2009.00174.x</pub-id><pub-id pub-id-type="pmid">22125574</pub-id></citation>
</ref>
<ref id="B112">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mesulam</surname> <given-names>M.</given-names></name></person-group> (<year>2008</year>). <article-title>Representation, inference, and transcendent encoding in neurocognitive networks of the human brain</article-title>. <source>Ann. Neurol.</source> <volume>64</volume>, <fpage>367</fpage>&#x02013;<lpage>378</lpage>. <pub-id pub-id-type="doi">10.1002/ana.21534</pub-id><pub-id pub-id-type="pmid">18991346</pub-id></citation>
</ref>
<ref id="B113">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Metusalem</surname> <given-names>R.</given-names></name> <name><surname>Kutas</surname> <given-names>M.</given-names></name> <name><surname>Urbach</surname> <given-names>T. P.</given-names></name> <name><surname>Hare</surname> <given-names>M.</given-names></name> <name><surname>McRae</surname> <given-names>K.</given-names></name> <name><surname>Elman</surname> <given-names>J. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Generalized event knowledge activation during online sentence comprehension</article-title>. <source>J. Mem. Lang.</source> <volume>66</volume>, <fpage>545</fpage>&#x02013;<lpage>567</lpage>. <pub-id pub-id-type="doi">10.1016/j.jml.2012.01.001</pub-id><pub-id pub-id-type="pmid">22711976</pub-id></citation>
</ref>
<ref id="B114">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Mitchell</surname> <given-names>J.</given-names></name> <name><surname>Lapata</surname> <given-names>M.</given-names></name> <name><surname>Demberg</surname> <given-names>V.</given-names></name> <name><surname>Keller</surname> <given-names>F.</given-names></name></person-group> (<year>2010</year>). <article-title>Syntactic and semantic factors in processing difficulty: an integrated measure</article-title>, in <source>Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</source> (<publisher-loc>Uppsala</publisher-loc>).</citation>
</ref>
<ref id="B115">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Montag</surname> <given-names>J. L.</given-names></name> <name><surname>MacDonald</surname> <given-names>M. C.</given-names></name></person-group> (<year>2014</year>). <article-title>Visual salience modulates structure choice in relative clause production</article-title>. <source>Lang. Speech</source> <volume>57</volume>, <fpage>163</fpage>&#x02013;<lpage>180</lpage>. <pub-id pub-id-type="doi">10.1177/0023830913495656</pub-id><pub-id pub-id-type="pmid">25102604</pub-id></citation>
</ref>
<ref id="B116">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Morris</surname> <given-names>R. K.</given-names></name></person-group> (<year>1994</year>). <article-title>Lexical and message-level sentence context effects on fixation times in reading</article-title>. <source>J. Exp. Psychol.</source> <volume>20</volume>, <fpage>92</fpage>&#x02013;<lpage>103</lpage>. <pub-id pub-id-type="pmid">8138791</pub-id></citation>
</ref>
<ref id="B117">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Navalpakkam</surname> <given-names>V.</given-names></name> <name><surname>Itti</surname> <given-names>L.</given-names></name></person-group> (<year>2005</year>). <article-title>Modeling the influence of task on attention</article-title>. <source>Vision Res.</source> <volume>45</volume>, <fpage>205</fpage>&#x02013;<lpage>231</lpage>. <pub-id pub-id-type="doi">10.1016/j.visres.2004.07.042</pub-id><pub-id pub-id-type="pmid">15581921</pub-id></citation>
</ref>
<ref id="B118">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nicenboim</surname> <given-names>B.</given-names></name> <name><surname>Vasishth</surname> <given-names>S.</given-names></name> <name><surname>Gattei</surname> <given-names>C.</given-names></name> <name><surname>Sigman</surname> <given-names>M.</given-names></name> <name><surname>Kliegl</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Working memory differences in long-distance dependency resolution</article-title>. <source>Front. Psychol.</source> <volume>6</volume>:<fpage>312</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2015.00312</pub-id><pub-id pub-id-type="pmid">25852623</pub-id></citation>
</ref>
<ref id="B119">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Olivers</surname> <given-names>C. N.</given-names></name></person-group> (<year>2007</year>). <article-title>The time course of attention: it is better than we thought</article-title>. <source>Curr. Direct. Psychol. Sci.</source> <volume>16</volume>, <fpage>11</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-8721.2007.00466.x</pub-id></citation>
</ref>
<ref id="B120">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Osgood</surname> <given-names>C. E.</given-names></name> <name><surname>Bock</surname> <given-names>J. K.</given-names></name></person-group> (<year>1977</year>). <article-title>Salience and sentencing: Some production principles</article-title>, in <source>Sentence Production: Developments in Research and Theory</source>, ed <person-group person-group-type="editor"><name><surname>Rosenberg</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>Hillsdale, NJ</publisher-loc>: <publisher-name>Erlbaum</publisher-name>), <fpage>89</fpage>&#x02013;<lpage>140</lpage>.</citation>
</ref>
<ref id="B121">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Palm</surname> <given-names>G.</given-names></name></person-group> (<year>2012</year>). <source>Novelty, Information and Surprise</source>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>.</citation>
</ref>
<ref id="B122">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pierrehumbert</surname> <given-names>J. B.</given-names></name></person-group> (<year>2006</year>). <article-title>The next toolkit</article-title>. <source>J. Phon.</source> <volume>34</volume>, <fpage>516</fpage>&#x02013;<lpage>530</lpage>. <pub-id pub-id-type="doi">10.1016/j.wocn.2006.06.003</pub-id></citation>
</ref>
<ref id="B123">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Prat-Sala</surname> <given-names>M.</given-names></name> <name><surname>Branigan</surname> <given-names>H. P.</given-names></name></person-group> (<year>2000</year>). <article-title>Discourse constraints on syntactic processing in language production: a cross-linguistic study in English and Spanish</article-title>. <source>J. Mem. Lang.</source> <volume>42</volume>, <fpage>168</fpage>&#x02013;<lpage>182</lpage>. <pub-id pub-id-type="doi">10.1006/jmla.1999.2668</pub-id></citation>
</ref>
<ref id="B124">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>R&#x000E1;cz</surname> <given-names>P.</given-names></name></person-group> (<year>2013</year>). <source>Salience in Sociolinguistics: A Quantitative Approach</source>. <publisher-loc>Berlin</publisher-loc>: <publisher-name>Walter de Gruyter</publisher-name>.</citation>
</ref>
<ref id="B125">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Radvansky</surname> <given-names>G. A.</given-names></name> <name><surname>Curiel</surname> <given-names>J. M.</given-names></name></person-group> (<year>1998</year>). <article-title>Narrative comprehension and aging: The fate of completed goal information</article-title>. <source>Psychol. Aging</source> <volume>13</volume>, <fpage>69</fpage>&#x02013;<lpage>79</lpage>. <pub-id pub-id-type="pmid">9533191</pub-id></citation>
</ref>
<ref id="B126">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ranganath</surname> <given-names>C.</given-names></name> <name><surname>Rainer</surname> <given-names>G.</given-names></name></person-group> (<year>2003</year>). <article-title>Neural mechanisms for detecting and remembering novel events</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>4</volume>, <fpage>193</fpage>&#x02013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1052</pub-id><pub-id pub-id-type="pmid">12612632</pub-id></citation>
</ref>
<ref id="B127">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>1999</year>). <article-title>Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects</article-title>. <source>Nat. Neurosci.</source> <volume>2</volume>, <fpage>79</fpage>&#x02013;<lpage>87</lpage>. <pub-id pub-id-type="pmid">10195184</pub-id></citation>
</ref>
<ref id="B128">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rao</surname> <given-names>R. P.</given-names></name> <name><surname>Zelinsky</surname> <given-names>G. J.</given-names></name> <name><surname>Hayhoe</surname> <given-names>M. M.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>2002</year>). <article-title>Eye movements in iconic visual search</article-title>. <source>Vision Res.</source> <volume>42</volume>, <fpage>1447</fpage>&#x02013;<lpage>1463</lpage>. <pub-id pub-id-type="doi">10.1016/S0042-6989(02)00040-8</pub-id><pub-id pub-id-type="pmid">12044751</pub-id></citation>
</ref>
<ref id="B129">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Raymond</surname> <given-names>J. E.</given-names></name> <name><surname>Shapiro</surname> <given-names>K. L.</given-names></name> <name><surname>Arnell</surname> <given-names>K. M.</given-names></name></person-group> (<year>1992</year>). <article-title>Temporary suppression of visual processing in an RSVP task: An attentional blink?</article-title> <source>J. Exp. Psychol.</source> <volume>18</volume>, <fpage>849</fpage>&#x02013;<lpage>860</lpage>. <pub-id pub-id-type="pmid">1500880</pub-id></citation>
</ref>
<ref id="B130">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rayner</surname> <given-names>K.</given-names></name> <name><surname>Raney</surname> <given-names>G. E.</given-names></name></person-group> (<year>1996</year>). <article-title>Eye movement control in reading and visual search: Effects of word frequency</article-title>. <source>Psychon. Bull. Rev.</source> <volume>3</volume>, <fpage>245</fpage>&#x02013;<lpage>248</lpage>. <pub-id pub-id-type="pmid">24213875</pub-id></citation>
</ref>
<ref id="B131">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Roark</surname> <given-names>B.</given-names></name> <name><surname>Bachrach</surname> <given-names>A.</given-names></name> <name><surname>Cardenas</surname> <given-names>C.</given-names></name> <name><surname>Pallier</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing</article-title>, in <source>Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</source> (<publisher-loc>Singapore</publisher-loc>).</citation>
</ref>
<ref id="B132">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rohde</surname> <given-names>H.</given-names></name> <name><surname>Horton</surname> <given-names>W. S.</given-names></name></person-group> (<year>2014</year>). <article-title>Anticipatory looks reveal expectations about discourse relations</article-title>. <source>Cognition</source> <volume>133</volume>, <fpage>667</fpage>&#x02013;<lpage>691</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2014.08.012</pub-id><pub-id pub-id-type="pmid">25247235</pub-id></citation>
</ref>
<ref id="B133">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roux</surname> <given-names>F.</given-names></name> <name><surname>Uhlhaas</surname> <given-names>P. J.</given-names></name></person-group> (<year>2014</year>). <article-title>Working memory and neural oscillations: alpha&#x02013;gamma versus theta&#x02013;gamma codes for distinct WM information?</article-title> <source>Trends Cogn. Sci.</source> <volume>18</volume>, <fpage>16</fpage>&#x02013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2013.10.010</pub-id><pub-id pub-id-type="pmid">24268290</pub-id></citation>
</ref>
<ref id="B134">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sanford</surname> <given-names>A. J.</given-names></name> <name><surname>Sturt</surname> <given-names>P.</given-names></name></person-group> (<year>2002</year>). <article-title>Depth of processing in language comprehension: Not noticing the evidence</article-title>. <source>Trends Cogn. Sci.</source> <volume>6</volume>, <fpage>382</fpage>&#x02013;<lpage>386</lpage>. <pub-id pub-id-type="doi">10.1016/S1364-6613(02)01958-7</pub-id><pub-id pub-id-type="pmid">12200180</pub-id></citation>
</ref>
<ref id="B135">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sauseng</surname> <given-names>P.</given-names></name> <name><surname>Klimesch</surname> <given-names>W.</given-names></name> <name><surname>Heise</surname> <given-names>K.</given-names></name> <name><surname>Gruber</surname> <given-names>W.</given-names></name> <name><surname>Holz</surname> <given-names>E.</given-names></name> <name><surname>Karim</surname> <given-names>A.</given-names></name> <name><surname>Glennon</surname> <given-names>M.</given-names></name> <name><surname>Gerloff</surname> <given-names>C.</given-names></name> <name><surname>Birbaumer</surname> <given-names>N.</given-names></name> <name><surname>Hummel</surname> <given-names>F.</given-names></name></person-group> (<year>2009</year>). <article-title>Brain oscillatory substrates of human visual short-term memory capacity</article-title>. <source>Curr. Biol.</source> <volume>19</volume>, <fpage>1846</fpage>&#x02013;<lpage>1852</lpage>. <pub-id pub-id-type="doi">10.1016/j.cub.2009.08.062</pub-id></citation>
</ref>
<ref id="B136">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sauseng</surname> <given-names>P.</given-names></name> <name><surname>Klimesch</surname> <given-names>W.</given-names></name> <name><surname>Stadler</surname> <given-names>W.</given-names></name> <name><surname>Schabus</surname> <given-names>M.</given-names></name> <name><surname>Doppelmayr</surname> <given-names>M.</given-names></name> <name><surname>Hanslmayr</surname> <given-names>S.</given-names></name> <name><surname>Gruber</surname> <given-names>W. R.</given-names></name> <name><surname>Birbaumer</surname> <given-names>N.</given-names></name></person-group> (<year>2005</year>). <article-title>A shift of visual spatial attention is selectively associated with human EEG alpha activity</article-title>. <source>Eur. J. Neurosci.</source> <volume>22</volume>, <fpage>2917</fpage>&#x02013;<lpage>2926</lpage>. <pub-id pub-id-type="doi">10.1111/j.1460-9568.2005.04482.x</pub-id><pub-id pub-id-type="pmid">16324126</pub-id></citation>
</ref>
<ref id="B138">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Sayeed</surname> <given-names>A.</given-names></name> <name><surname>Fischer</surname> <given-names>S.</given-names></name> <name><surname>Demberg</surname> <given-names>V.</given-names></name></person-group> (<year>2015</year>). <article-title>Vector-space calculation of semantic surprisal for predicting word pronunciation duration</article-title>, in <source>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing</source> (<publisher-loc>Beijing</publisher-loc>).</citation>
</ref>
<ref id="B139">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schotter</surname> <given-names>E. R.</given-names></name> <name><surname>Bicknell</surname> <given-names>K.</given-names></name> <name><surname>Howard</surname> <given-names>I.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name> <name><surname>Rayner</surname> <given-names>K.</given-names></name></person-group> (<year>2014</year>). <article-title>Task effects reveal cognitive flexibility responding to frequency and predictability: evidence from eye movements in reading and proofreading</article-title>. <source>Cognition</source> <volume>131</volume>, <fpage>1</fpage>&#x02013;<lpage>27</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2013.11.018</pub-id><pub-id pub-id-type="pmid">24434024</pub-id></citation>
</ref>
<ref id="B140">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seth</surname> <given-names>A. K.</given-names></name> <name><surname>Suzuki</surname> <given-names>K.</given-names></name> <name><surname>Critchley</surname> <given-names>H. D.</given-names></name></person-group> (<year>2011</year>). <article-title>An interoceptive predictive coding model of conscious presence</article-title>. <source>Front. Psychol.</source> <volume>2</volume>:<issue>395</issue>. <pub-id pub-id-type="doi">10.3389/fpsyg.2011.00395</pub-id><pub-id pub-id-type="pmid">22291673</pub-id></citation>
</ref>
<ref id="B141">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Shirai</surname> <given-names>Y.</given-names></name></person-group> (<year>2009</year>). <article-title>Temporality in first and second language acquisition</article-title>, in <source>The Expression of Time</source>, eds <person-group person-group-type="editor"><name><surname>Klein</surname> <given-names>W.</given-names></name> <name><surname>Li</surname> <given-names>P.</given-names></name></person-group> (<publisher-loc>Berlin</publisher-loc>: <publisher-name>Mouton de Gruyter</publisher-name>), <fpage>167</fpage>&#x02013;<lpage>194</lpage>.</citation>
</ref>
<ref id="B142">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simons</surname> <given-names>D. J.</given-names></name> <name><surname>Chabris</surname> <given-names>C. F.</given-names></name></person-group> (<year>1999</year>). <article-title>Gorillas in our midst: sustained inattentional blindness for dynamic events</article-title>. <source>Perception</source> <volume>28</volume>, <fpage>1059</fpage>&#x02013;<lpage>1074</lpage>. <pub-id pub-id-type="pmid">10694957</pub-id></citation>
</ref>
<ref id="B143">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname> <given-names>N. J.</given-names></name> <name><surname>Levy</surname> <given-names>R.</given-names></name></person-group> (<year>2013</year>). <article-title>The effect of word predictability on reading time is logarithmic</article-title>. <source>Cognition</source> <volume>128</volume>, <fpage>302</fpage>&#x02013;<lpage>319</lpage>. <pub-id pub-id-type="doi">10.1016/j.cognition.2013.02.013</pub-id><pub-id pub-id-type="pmid">23747651</pub-id></citation>
</ref>
<ref id="B144">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sperber</surname> <given-names>D.</given-names></name> <name><surname>Wilson</surname> <given-names>D.</given-names></name></person-group> (<year>1986</year>). <source>Relevance: Communication and Cognition</source>. <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>Harvard University Press</publisher-name>.</citation>
</ref>
<ref id="B145">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Spivey-Knowlton</surname> <given-names>M. J.</given-names></name> <name><surname>Trueswell</surname> <given-names>J. C.</given-names></name> <name><surname>Tanenhaus</surname> <given-names>M. K.</given-names></name></person-group> (<year>1993</year>). <article-title>Context effects in syntactic ambiguity resolution: discourse and semantic influences in parsing reduced relative clauses</article-title>. <source>Can. J. Exp. Psychol.</source> <volume>47</volume>, <fpage>276</fpage>&#x02013;<lpage>309</lpage>. <pub-id pub-id-type="pmid">8364532</pub-id></citation>
</ref>
<ref id="B146">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tanenhaus</surname> <given-names>M. K.</given-names></name> <name><surname>Spivey-Knowlton</surname> <given-names>M. J.</given-names></name> <name><surname>Eberhard</surname> <given-names>K. M.</given-names></name> <name><surname>Sedivy</surname> <given-names>J. C.</given-names></name></person-group> (<year>1995</year>). <article-title>Integration of visual and linguistic information in spoken language comprehension</article-title>. <source>Science</source> <volume>268</volume>, <fpage>1632</fpage>&#x02013;<lpage>1634</lpage>. <pub-id pub-id-type="pmid">7777863</pub-id></citation>
</ref>
<ref id="B147">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tatler</surname> <given-names>B. W.</given-names></name> <name><surname>Hayhoe</surname> <given-names>M. M.</given-names></name> <name><surname>Land</surname> <given-names>M. F.</given-names></name> <name><surname>Ballard</surname> <given-names>D. H.</given-names></name></person-group> (<year>2011</year>). <article-title>Eye guidance in natural vision: reinterpreting salience</article-title>. <source>J. Vis.</source> <volume>11</volume>:<issue>5</issue>, <fpage>1</fpage>&#x02013;<lpage>23</lpage>.</citation>
</ref>
<ref id="B148">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Tomlin</surname> <given-names>R. S.</given-names></name></person-group> (<year>1997</year>). <article-title>Mapping conceptual representations into linguistic representations: the role of attention in grammar</article-title>, in <source>Language and Conceptualization</source>, eds <person-group person-group-type="editor"><name><surname>Nuyts</surname> <given-names>J.</given-names></name> <name><surname>Pederson</surname> <given-names>E.</given-names></name></person-group> (<publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>), <fpage>162</fpage>&#x02013;<lpage>189</lpage>.</citation>
</ref>
<ref id="B149">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Torralba</surname> <given-names>A.</given-names></name> <name><surname>Oliva</surname> <given-names>A.</given-names></name> <name><surname>Castelhano</surname> <given-names>M. S.</given-names></name> <name><surname>Henderson</surname> <given-names>J. M.</given-names></name></person-group> (<year>2006</year>). <article-title>Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search</article-title>. <source>Psychol. Rev.</source> <volume>113</volume>, <fpage>766</fpage>&#x02013;<lpage>786</lpage>. <pub-id pub-id-type="doi">10.1037/0033-295X.113.4.766</pub-id><pub-id pub-id-type="pmid">17014302</pub-id></citation>
</ref>
<ref id="B150">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Trudgill</surname> <given-names>P.</given-names></name></person-group> (<year>1986</year>). <source>Dialects in Contact</source>. <publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Blackwell</publisher-name>.</citation>
</ref>
<ref id="B151">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trueswell</surname> <given-names>J.</given-names></name> <name><surname>Tanenhaus</surname> <given-names>M.</given-names></name> <name><surname>Garnsey</surname> <given-names>S.</given-names></name></person-group> (<year>1994</year>). <article-title>Semantic influences on parsing: use of thematic role information in syntactic ambiguity resolution</article-title>. <source>J. Mem. Lang.</source> <volume>33</volume>, <fpage>285</fpage>&#x02013;<lpage>318</lpage>.</citation>
</ref>
<ref id="B152">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Trueswell</surname> <given-names>J. C.</given-names></name> <name><surname>Tanenhaus</surname> <given-names>M. K.</given-names></name> <name><surname>Kello</surname> <given-names>C.</given-names></name></person-group> (<year>1993</year>). <article-title>Verb-specific constraints in sentence processing: separating effects of lexical preference from garden-paths</article-title>. <source>J. Exp. Psychol.</source> <volume>19</volume>, <fpage>528</fpage>&#x02013;<lpage>553</lpage>. <pub-id pub-id-type="pmid">8501429</pub-id></citation>
</ref>
<ref id="B153">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>van Berkum</surname> <given-names>J. J.</given-names></name></person-group> (<year>2009</year>). <article-title>The neuropragmatics of &#x02018;simple&#x02019; utterance comprehension: an ERP review</article-title>, in <source>Semantics and Pragmatics: From Experiment to Theory</source>, eds <person-group person-group-type="editor"><name><surname>Sauerland</surname> <given-names>U.</given-names></name> <name><surname>Yatsushiro</surname> <given-names>K.</given-names></name></person-group> (<publisher-loc>Basingstoke, UK</publisher-loc>: <publisher-name>Palgrave Macmillan</publisher-name>), <fpage>276</fpage>&#x02013;<lpage>316</lpage>.</citation>
</ref>
<ref id="B154">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Berkum</surname> <given-names>J. J.</given-names></name></person-group> (<year>2010</year>). <article-title>The brain is a prediction machine that cares about good and bad &#x02013; any implications for neuropragmatics?</article-title> <source>Ital. J. Linguist.</source> <volume>22</volume>, <fpage>181</fpage>&#x02013;<lpage>208</lpage>.</citation>
</ref>
<ref id="B155">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Berkum</surname> <given-names>J. J.</given-names></name> <name><surname>Brown</surname> <given-names>C. M.</given-names></name> <name><surname>Zwitserlood</surname> <given-names>P.</given-names></name> <name><surname>Kooijman</surname> <given-names>V.</given-names></name> <name><surname>Hagoort</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>Anticipating upcoming words in discourse: evidence from ERPs and reading times</article-title>. <source>J. Exp. Psychol.</source> <volume>31</volume>, <fpage>443</fpage>&#x02013;<lpage>467</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.31.3.443</pub-id><pub-id pub-id-type="pmid">15910130</pub-id></citation>
</ref>
<ref id="B156">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Berkum</surname> <given-names>J. J.</given-names></name> <name><surname>Van den Brink</surname> <given-names>D.</given-names></name> <name><surname>Tesink</surname> <given-names>C. M.</given-names></name> <name><surname>Kos</surname> <given-names>M.</given-names></name> <name><surname>Hagoort</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <article-title>The neural integration of speaker and message</article-title>. <source>J. Cogn. Neurosci.</source> <volume>20</volume>, <fpage>580</fpage>&#x02013;<lpage>591</lpage>. <pub-id pub-id-type="doi">10.1162/jocn.2008.20054</pub-id><pub-id pub-id-type="pmid">18052777</pub-id></citation>
</ref>
<ref id="B157">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Meer</surname> <given-names>E.</given-names></name> <name><surname>Beyer</surname> <given-names>R.</given-names></name> <name><surname>Heinze</surname> <given-names>B.</given-names></name> <name><surname>Badel</surname> <given-names>I.</given-names></name></person-group> (<year>2002</year>). <article-title>Temporal order relations in language comprehension</article-title>. <source>J. Exp. Psychol.</source> <volume>28</volume>, <fpage>770</fpage>&#x02013;<lpage>779</lpage>. <pub-id pub-id-type="doi">10.1037/0278-7393.28.4.770</pub-id><pub-id pub-id-type="pmid">12109767</pub-id></citation>
</ref>
<ref id="B158">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van der Meer</surname> <given-names>E.</given-names></name> <name><surname>Kr&#x000FC;ger</surname> <given-names>F.</given-names></name> <name><surname>Nuthmann</surname> <given-names>A.</given-names></name></person-group> (<year>2005</year>). <article-title>The influence of temporal order information in general event knowledge on language comprehension</article-title>. <source>Z. Psychol.</source> <volume>213</volume>, <fpage>142</fpage>&#x02013;<lpage>151</lpage>. <pub-id pub-id-type="doi">10.1026/0044-3409.213.3.142</pub-id></citation>
</ref>
<ref id="B159">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Oostendorp</surname> <given-names>H.</given-names></name> <name><surname>De Mul</surname> <given-names>S.</given-names></name></person-group> (<year>1990</year>). <article-title>Moses beats Adam: a semantic relatedness effect on a semantic illusion</article-title>. <source>Acta Psychol.</source> <volume>74</volume>, <fpage>35</fpage>&#x02013;<lpage>46</lpage>.</citation>
</ref>
<ref id="B160">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>van Schijndel</surname> <given-names>M.</given-names></name> <name><surname>Schuler</surname> <given-names>W.</given-names></name></person-group> (<year>2015</year>). <article-title>Hierarchic syntax improves reading time prediction</article-title>, in <source>Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source> (<publisher-loc>Denver, CO</publisher-loc>).</citation>
</ref>
<ref id="B161">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>van Schijndel</surname> <given-names>M.</given-names></name> <name><surname>Schuler</surname> <given-names>W.</given-names></name> <name><surname>Culicover</surname> <given-names>P. W.</given-names></name></person-group> (<year>2014</year>). <article-title>Frequency effects in the processing of unbounded dependencies</article-title>, in <source>Proceedings of the 36th Annual Meeting of the Cognitive Science Society</source> (<publisher-loc>Qu&#x000E9;bec City, QC</publisher-loc>).</citation>
</ref>
<ref id="B162">
<citation citation-type="other"><person-group person-group-type="author"><name><surname>Venhuizen</surname> <given-names>N.</given-names></name> <name><surname>Brouwer</surname> <given-names>H.</given-names></name> <name><surname>Crocker</surname> <given-names>M.</given-names></name></person-group> (<year>2016</year>). <article-title>When the food arrives before the menu: modeling event-driven surprisal in language comprehension</article-title>, in <source>Abstract Presented at Events in Language and Cognition, Pre-CUNY Workshop on Event Structure</source> (<publisher-loc>Gainesville, FL</publisher-loc>).</citation>
</ref>
<ref id="B163">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vogels</surname> <given-names>J.</given-names></name> <name><surname>Krahmer</surname> <given-names>E.</given-names></name> <name><surname>Maes</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Who is where referred to how, and why? The influence of visual saliency on referent accessibility in spoken language production</article-title>. <source>Lang. Cogn. Process.</source> <volume>28</volume>, <fpage>1323</fpage>&#x02013;<lpage>1349</lpage>. <pub-id pub-id-type="doi">10.1080/01690965.2012.682072</pub-id></citation>
</ref>
<ref id="B164">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vuilleumier</surname> <given-names>P.</given-names></name></person-group> (<year>2005</year>). <article-title>How brains beware: neural mechanisms of emotional attention</article-title>. <source>Trends Cogn. Sci.</source> <volume>9</volume>, <fpage>585</fpage>&#x02013;<lpage>594</lpage>. <pub-id pub-id-type="doi">10.1016/j.tics.2005.10.011</pub-id><pub-id pub-id-type="pmid">16289871</pub-id></citation>
</ref>
<ref id="B165">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Wilson</surname> <given-names>D.</given-names></name> <name><surname>Sperber</surname> <given-names>D.</given-names></name></person-group> (<year>2004</year>). <article-title>Relevance theory</article-title>, in <source>Handbook of Pragmatics</source>, eds <person-group person-group-type="editor"><name><surname>Ward</surname> <given-names>G.</given-names></name> <name><surname>Horn</surname> <given-names>L.</given-names></name></person-group> (<publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Blackwell</publisher-name>), <fpage>607</fpage>&#x02013;<lpage>632</lpage>.</citation>
</ref>
<ref id="B166">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wise</surname> <given-names>R. A.</given-names></name></person-group> (<year>2004</year>). <article-title>Dopamine, learning and motivation</article-title>. <source>Nat. Rev. Neurosci.</source> <volume>5</volume>, <fpage>483</fpage>&#x02013;<lpage>494</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1406</pub-id><pub-id pub-id-type="pmid">15152198</pub-id></citation>
</ref>
<ref id="B167">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wolfe</surname> <given-names>J. M.</given-names></name> <name><surname>Horowitz</surname> <given-names>T. S.</given-names></name></person-group> (<year>2004</year>). <article-title>What attributes guide the deployment of visual attention and how do they do it?</article-title> <source>Nat. Rev. Neurosci.</source> <volume>5</volume>, <fpage>495</fpage>&#x02013;<lpage>501</lpage>. <pub-id pub-id-type="doi">10.1038/nrn1411</pub-id><pub-id pub-id-type="pmid">15152199</pub-id></citation>
</ref>
<ref id="B168">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Worden</surname> <given-names>M. S.</given-names></name> <name><surname>Foxe</surname> <given-names>J. J.</given-names></name> <name><surname>Wang</surname> <given-names>N.</given-names></name> <name><surname>Simpson</surname> <given-names>G. V.</given-names></name></person-group> (<year>2000</year>). <article-title>Anticipatory biasing of visuospatial attention indexed by retinotopically specific &#x003B1;-band electroencephalography increases over occipital cortex</article-title>. <source>J. Neurosci.</source> <volume>20</volume>:<fpage>RC63</fpage>. <pub-id pub-id-type="pmid">10704517</pub-id></citation>
</ref>
<ref id="B169">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xiang</surname> <given-names>M.</given-names></name> <name><surname>Kuperberg</surname> <given-names>G.</given-names></name></person-group> (<year>2015</year>). <article-title>Reversing expectations during discourse comprehension</article-title>. <source>Lang. Cogn. Neurosci.</source> <volume>30</volume>, <fpage>648</fpage>&#x02013;<lpage>672</lpage>. <pub-id pub-id-type="doi">10.1080/23273798.2014.995679</pub-id><pub-id pub-id-type="pmid">25914891</pub-id></citation>
</ref>
<ref id="B170">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Yarbus</surname> <given-names>A. L.</given-names></name></person-group> (<year>1967</year>). <source>Eye Movements and Vision</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Plenum</publisher-name>.</citation>
</ref>
<ref id="B171">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zarcone</surname> <given-names>A.</given-names></name> <name><surname>Pad&#x000F3;</surname> <given-names>S.</given-names></name> <name><surname>Lenci</surname> <given-names>A.</given-names></name></person-group> (<year>2014</year>). <article-title>Logical metonymy resolution in a words-as-cues framework: evidence from self-paced reading and probe recognition</article-title>. <source>Cogn. Sci.</source> <volume>38</volume>, <fpage>973</fpage>&#x02013;<lpage>996</lpage>. <pub-id pub-id-type="doi">10.1111/cogs.12108</pub-id><pub-id pub-id-type="pmid">24628505</pub-id></citation>
</ref>
<ref id="B172">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zelinsky</surname> <given-names>G.</given-names></name> <name><surname>Zhang</surname> <given-names>W.</given-names></name> <name><surname>Yu</surname> <given-names>B.</given-names></name> <name><surname>Chen</surname> <given-names>X.</given-names></name> <name><surname>Samaras</surname> <given-names>D.</given-names></name></person-group> (<year>2006</year>). <article-title>The role of top-down and bottom-up processes in guiding eye movements during visual search</article-title>, in <source>Advances in Neural Information Processing Systems</source>, <volume>Vol. 18</volume>, eds <person-group person-group-type="editor"><name><surname>Weiss</surname> <given-names>Y.</given-names></name> <name><surname>Sch&#x000F6;lkopf</surname> <given-names>B.</given-names></name> <name><surname>Platt</surname> <given-names>J.</given-names></name></person-group> (<publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>), <fpage>1569</fpage>&#x02013;<lpage>1576</lpage>.</citation>
</ref>
<ref id="B173">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zion Golumbic</surname> <given-names>E. M.</given-names></name> <name><surname>Ding</surname> <given-names>N.</given-names></name> <name><surname>Bickel</surname> <given-names>S.</given-names></name> <name><surname>Lakatos</surname> <given-names>P.</given-names></name> <name><surname>Schevon</surname> <given-names>C. A.</given-names></name> <name><surname>McKhann</surname> <given-names>G. M.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Mechanisms underlying selective neuronal tracking of attended speech at a &#x02018;cocktail party&#x02019;</article-title>. <source>Neuron</source> <volume>77</volume>, <fpage>980</fpage>&#x02013;<lpage>991</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuron.2012.12.037</pub-id></citation>
</ref>
<ref id="B174">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwaan</surname> <given-names>R. A.</given-names></name> <name><surname>Madden</surname> <given-names>C. J.</given-names></name> <name><surname>Whitten</surname> <given-names>S. N.</given-names></name></person-group> (<year>2000</year>). <article-title>The presence of an event in the narrated situation affects its availability to the comprehender</article-title>. <source>Mem. Cogn.</source> <volume>28</volume>, <fpage>1022</fpage>&#x02013;<lpage>1028</lpage>. <pub-id pub-id-type="doi">10.3758/BF03209350</pub-id><pub-id pub-id-type="pmid">11105528</pub-id></citation>
</ref>
<ref id="B175">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zwaan</surname> <given-names>R. A.</given-names></name> <name><surname>Radvansky</surname> <given-names>G. A.</given-names></name></person-group> (<year>1998</year>). <article-title>Situation models in language comprehension and memory</article-title>. <source>Psychol. Bull.</source> <volume>123</volume>, <fpage>162</fpage>&#x02013;<lpage>185</lpage>. <pub-id pub-id-type="pmid">9522683</pub-id></citation>
</ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>On the order of 10<sup>8</sup> bits per second (Itti and Koch, <xref ref-type="bibr" rid="B71">2000</xref>).</p></fn>
<fn id="fn0002"><p><sup>2</sup>Arguably, highlighting an entity through syntactic focus affects its bottom-up salience. The acquired focus will then cause the entity to be salient in the discourse model, exerting a top-down influence on predictions (see also Section 3.4).</p></fn>
<fn id="fn0003"><p><sup>3</sup><italic>S</italic> and <italic>T</italic> are chosen for the sake of the example; we do not intend to specifically argue for cognitive representations of syntax trees.</p></fn>
</fn-group>
</back>
</article>
